Merge pull request #2 from MICS-Lab/dev

Dev
MICS-Lab · Sep 2, 2022 · 2628235
2 parents bb252a8 + fd2ce26

Showing 63 changed files with 3,142 additions and 181 deletions.
diff --git a/README.md b/README.md
@@ -9,7 +9,6 @@
 [![Downloads](https://pepy.tech/badge/scyan)](https://pepy.tech/project/scyan)
 [![License](https://img.shields.io/pypi/l/scyan.svg)](https://github.com/MICS-Lab/scyan/blob/master/LICENSE)
 [![Imports: isort](https://img.shields.io/badge/imports-isort-blueviolet)](https://pycqa.github.io/isort/)
-[![DOI](https://zenodo.org/badge/516048412.svg)](https://zenodo.org/badge/latestdoi/516048412)
 
 Scyan stands for **S**ingle-cell **Cy**tometry **A**nnotation **N**etwork. Based on biological knowledge prior, it provides a fast cell population annotation without requiring any training label. Scyan is an interpretable model that also corrects batch-effect and can be used for debarcoding, cell sampling, and population discovery.
 
@@ -101,13 +100,13 @@ Scyan is a **Python** library based on:
         module/               # Folder containing neural network modules
             coupling_layer.py # Coupling layer
             distribution.py   # Prior distribution (called U in the article)
+            mmd.py            # Maximum Mean Discrepancy implementation
             real_nvp.py       # Normalizing Flow
             scyan_module      # Core module
         plot/                 # Plotting tools
             ...
         tools/
             ...               # Preprocessing tools and more
-        mmd.py                # Maximum Mean Discrepancy implementation
         model.py              # Scyan model class
         utils.py              # Misc functions
     .gitattributes

diff --git a/config/project/aml.yaml b/config/project/aml.yaml
@@ -1,6 +1,6 @@
 name: aml
 wandb_project_name: aml
 label: cell_type
-size: default
+version: default
 # batch_key: subject
 # batch_ref: H1
diff --git a/config/project/bmmc.yaml b/config/project/bmmc.yaml
@@ -1,4 +1,4 @@
 name: bmmc
 wandb_project_name: bmmc
 label: cell_type
-size: default
+version: default
diff --git a/config/project/infinity_flow.yaml b/config/project/infinity_flow.yaml
@@ -1,4 +1,4 @@
 name: infinity_flow
 wandb_project_name: inf-flow
 label: cell_type
-size: full
+version: full
diff --git a/config/project/pop_durva.yaml b/config/project/pop_durva.yaml
@@ -1,3 +1,3 @@
 name: pop_durva
 wandb_project_name: pop
-size: default
+version: default
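
The four project configs above all make the same change: the key `size` is renamed to `version`. For a local project config written against the old key, the same rename could be applied with a small script. This is only a sketch: the helper name `rename_size_key` is hypothetical and not part of the repository, and it assumes `size` appears as a top-level key at the start of a line, as in the files above.

```python
import re


def rename_size_key(yaml_text: str) -> str:
    """Rename a top-level `size:` key to `version:`; all other lines are kept as-is."""
    # The pattern is anchored at the start of a line, so commented-out lines
    # and keys that merely end in "size" (e.g. `batch_size:`) are not touched.
    return re.sub(r"^size:", "version:", yaml_text, flags=re.MULTILINE)


# Old-style content, mirroring config/project/bmmc.yaml before this commit
old_config = "name: bmmc\nwandb_project_name: bmmc\nlabel: cell_type\nsize: default\n"
new_config = rename_size_key(old_config)
print(new_config)
```

A plain text substitution is used here rather than a YAML parser so that comments such as `# batch_key: subject` survive unchanged.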
diff --git a/docs/advanced/advice.md b/docs/advanced/advice.md
@@ -24,7 +24,7 @@
 ### What should I do if Scyan seems wrong?
 
 - First thing to do is to check your table again. You may have made a typo that could confuse the model. Typically, if you have written `Marker+` for a population that is `Marker-` (or the opposite), it can perturb the prediction toward this population **and** toward other populations.
-- Try using [`scyan.plot.probs_per_marker`](../api/probs_per_marker.md). Many markers may show up dark on the heatmap at places they shouldn't, but the errors may be due to only one marker. You can find which marker it is by checking actual marker expressions on [a UMAP plot](../api/plot_umap.md), or with a [scatter plot](../api/scatter.md), and then update your table or read some literature again.
+- Try using [scyan.plot.probs_per_marker](/api/plots/#scyan.plot.probs_per_marker). Many markers may show up dark on the heatmap at places they shouldn't, but the errors may be due to only one marker. You can find which marker it is by checking actual marker expressions on [a UMAP plot](/api/plots/#scyan.plot.umap), or with a [scatter plot](/api/plots/#scyan.plot.scatter), and then update your table or read some literature again.
 - One reason for not predicting a population may be an unbalanced knowledge quantity between two related populations. For instance, having 10 values inside the table for `CD4 T CM` cells versus 5 values for `CD4 T EM` cells will probably make the model predict very few `CD4 T CM` cells. Indeed, `CD4 T CM` has many constraints compared to `CD4 T EM`, which becomes the "easy prediction" (indeed, very few constraints are applied to this population). In that case, read the advice related to the scatter plot above again.
 
 !!! info "Example about how Scyan handles NA"

diff --git a/docs/advanced/data.md b/docs/advanced/data.md
@@ -1,32 +1,30 @@
 # Add your own dataset
 
-## Prepare your Python objects
+Existing datasets and versions can be listed via [`scyan.data.list()`](/api/datasets/#scyan.data.list). By default, only public datasets are available, but you can add some if you follow the next steps.
 
-You must prepare your cytometry data and create your knowledge table as described in the [preprocessing tutorial](../tutorials/preprocessing.ipynb).
+## 1. Prepare your Python objects
 
-!!! tips
-
-    Read our [advice](../advanced/advice.md) to create the knowledge table. A great table leads to better predictions.
+You must prepare your `cytometry data` and create your `knowledge table` as described in the [preprocessing tutorial](/tutorials/preprocessing). You can also read our [advice](/advanced/advice) to create the knowledge table (a great table leads to better predictions!).
 
 !!! info
 
-    If needed, you have an example of an `adata` object and a knowledge table if you run:
+    If needed, you have an example of an `adata` object and a knowledge table (a.k.a `marker_pop_matrix`) if you run:
 
     ```python
     adata, marker_pop_matrix = scyan.data.load("aml")
     ```
 
-## Save your dataset
+## 2. Save your dataset
 
-Now that you have created an `adata` object and a `marker_pop_matrix` table, you can simply save them (for more details, see [scyan.data.add](../api/add.md)):
+Now that you have created an `adata` object and a `marker_pop_matrix` table, you can simply save them (for more details, see [scyan.data.add](/api/datasets/#scyan.data.add)):
 
 ```python
 scyan.data.add("<your-project-name>", adata, marker_pop_matrix)
 ```
 
-## Load your dataset
+## 3. Load your dataset
 
-Congrats, you can now load your dataset with `scyan`:
+Congrats, you can now load your dataset (for more details, see [scyan.data.load](/api/datasets/#scyan.data.load)):
 
 ```python
 adata, marker_pop_matrix = scyan.data.load("<your-project-name>")

diff --git a/docs/advanced/hydra_wandb.md b/docs/advanced/hydra_wandb.md
@@ -2,16 +2,16 @@
 
 If needed, you can use [Hydra](https://hydra.cc/docs/intro/) to manage your configuration. It allows to run the scripts or run hyperparameter optimization easily. You can also monitor your jobs with [Weight & Biases](https://wandb.ai/site)
 
-For that, clone the repository and make an editable install of the project (see [Getting Started](https://mics-lab.github.io/scyan/getting_started/)). Then, you have to follow the step listed below.
+For that, clone the repository and make an editable install of the project (see [Getting Started](/getting_started)). Then, you have to follow the step listed below.
 
 ## Create a new project configuration
 
-Create a new project at `config/project/<your-project-name>.yaml`, where `<your-project-name>` is the one you used to [create your dataset](./data.md).
+Create a new project at `config/project/<your-project-name>.yaml`, where `<your-project-name>` is the one you used to [create your dataset](/advanced/data).
 In this file, add `name: <your-project-name>`.
 
 Add optionally:
 
-- `size` or `table` if you don't want to use your dataset's default table or anndata files.
+- `version` or `table` if you don't want to use your dataset's default table or anndata files.
 - `batch_key` (and eventually `batch_ref`) if you want to correct the batch effect.
 - You can add some `continuous_covariate_keys` and `categorical_covariate_keys` (as a list of items).
 - `wandb_project_name`, the name of your Weight and Biases project for model monitoring. It will log all the metrics over the epochs and save different figures online.
@@ -34,4 +34,4 @@ Update `config/sweeper/optuna.yaml` to select the parameters you want to optimiz
 
 !!! check
 
-    Now that you have configured your project, you can run the scripts (see [running scripts](./scripts.md)) by providing the argument `project=<your-project-name>`.
+    Now that you have configured your project, you can run the scripts (see [running scripts](/advanced/scripts)) by providing the argument `project=<your-project-name>`.
diff --git a/docs/advanced/parameters.md b/docs/advanced/parameters.md
@@ -1,10 +1,10 @@
 !!! note
 
-    If not done yet, you may be interested in reading our [advice to improve your knowledge table](../advice). The default `Scyan` parameters should work well for most cases, so you should first check your knowledge table.
+    If not done yet, you may be interested in reading our [advice to improve your knowledge table](/advanced/advice). The default `Scyan` parameters should work well for most cases, so you should first check your knowledge table.
 
 ## Main parameters
 
-We provide some help to choose [scyan.Scyan](../api/model.md) parameters. We listed below the most important ones.
+We provide some help to choose [scyan.Scyan](/api/model) parameters. We listed below the most important ones.
 
 - `prior_std` is probably one of the most important parameters. Its default value should work for most of the usage, but it can be changed if needed. A low `prior_std` (about `0.15`) will help better separate the populations, but it may be too stringent, and some small populations may disappear. In contrast, a high `prior_std` (about `0.4`) increases the chances of having a large diversity of populations, but their separation may be less clear. For a project where populations are easy to identify, we thus recommend lowering the `prior_std`.
 - Reducing the `temperature` can help better capture small populations. For instance, you can lower the temperature to `0.5`. If it is not enough, try using `modulo_temp = 3`.
@@ -13,4 +13,4 @@ We provide some help to choose [scyan.Scyan](../api/model.md) parameters. We lis
 
 ## Hyperparameter search
 
-If you want to automate the choice of the hyperparameters, you can also run a hyperparameter optimization with [Hydra](https://hydra.cc/docs/intro/). See how to [configure your project and run Hydra](./hydra_wandb.md).
+If you want to automate the choice of the hyperparameters, you can also run a hyperparameter optimization with [Hydra](https://hydra.cc/docs/intro/). See how to [configure your project and run Hydra](/advanced/hydra_wandb).
diff --git a/docs/advanced/scripts.md b/docs/advanced/scripts.md
@@ -2,7 +2,7 @@
 
 !!! caution
 
-    You can run `Scyan` without using these scripts. Use it **only if** you want to use Hydra and/or Weight and Biases. Before continuing, ensure that you have [configured your project](./hydra_wandb.md).
+    You can run `Scyan` without using these scripts. Use it **only if** you want to use Hydra and/or Weight and Biases. Before continuing, ensure that you have [configured your project](/advanced/hydra_wandb).
 
 ## Usage examples
 
@@ -28,7 +28,7 @@ python -m scripts.run project=<project-name> model.temperature=0.5 model.prior_s
 
 ## Reproduce the article results
 
-The hyperparameters were obtained by (unsupervised) hyperparameter optimization (see [Hydra configuration](./hydra_wandb.md)).
+The hyperparameters were obtained by (unsupervised) hyperparameter optimization (see [Hydra configuration](/advanced/hydra_wandb)).
 
 ```bash
 # Testing on BMMC

diff --git a/docs/api/add.md b/docs/api/add.md
diff --git a/docs/api/analysis.md b/docs/api/analysis.md
@@ -0,0 +1,7 @@
+::: scyan.tools.count_cell_populations
+    options:
+      show_root_heading: true
+
+::: scyan.tools.mean_intensities
+    options:
+      show_root_heading: true
diff --git a/docs/api/asinh.md b/docs/api/asinh.md
diff --git a/docs/api/auto_logicle.md b/docs/api/auto_logicle.md
diff --git a/docs/api/datasets.md b/docs/api/datasets.md
@@ -0,0 +1,11 @@
+::: scyan.data.list
+    options:
+      show_root_heading: true
+
+::: scyan.data.load
+    options:
+      show_root_heading: true
+
+::: scyan.data.add
+    options:
+      show_root_heading: true
diff --git a/docs/api/io.md b/docs/api/io.md
@@ -0,0 +1,7 @@
+::: scyan.read_fcs
+    options:
+      show_root_heading: true
+
+::: scyan.write_fcs
+    options:
+      show_root_heading: true
diff --git a/docs/api/kde_per_population.md b/docs/api/kde_per_population.md
diff --git a/docs/api/latent_expressions.md b/docs/api/latent_expressions.md
diff --git a/docs/api/latent_heatmap.md b/docs/api/latent_heatmap.md
diff --git a/docs/api/load.md b/docs/api/load.md
diff --git a/docs/api/loss_mmd.md b/docs/api/loss_mmd.md
@@ -1 +1 @@
-::: scyan.mmd.LossMMD
+::: scyan.module.LossMMD
diff --git a/docs/api/plot_umap.md b/docs/api/plot_umap.md
diff --git a/docs/api/plots.md b/docs/api/plots.md
@@ -0,0 +1,35 @@
+::: scyan.plot.scatter
+    options:
+      show_root_heading: true
+
+::: scyan.plot.umap
+    options:
+      show_root_heading: true
+
+::: scyan.plot.probs_per_marker
+    options:
+      show_root_heading: true
+
+::: scyan.plot.latent_heatmap
+    options:
+      show_root_heading: true
+
+::: scyan.plot.subclusters
+    options:
+      show_root_heading: true
+
+::: scyan.plot.latent_expressions
+    options:
+      show_root_heading: true
+
+::: scyan.plot.kde_per_population
+    options:
+      show_root_heading: true
+
+::: scyan.plot.all_groups
+    options:
+      show_root_heading: true
+
+::: scyan.plot.one_group
+    options:
+      show_root_heading: true
diff --git a/docs/api/preprocessing.md b/docs/api/preprocessing.md
@@ -0,0 +1,19 @@
+::: scyan.tools.auto_logicle_transform
+    options:
+      show_root_heading: true
+
+::: scyan.tools.asinh_transform
+    options:
+      show_root_heading: true
+
+::: scyan.tools.inverse_transform
+    options:
+      show_root_heading: true
+
+::: scyan.tools.scale
+    options:
+      show_root_heading: true
+
+::: scyan.tools.unscale
+    options:
+      show_root_heading: true
diff --git a/docs/api/probs_per_marker.md b/docs/api/probs_per_marker.md
diff --git a/docs/api/read_fcs.md b/docs/api/read_fcs.md
diff --git a/docs/api/representation.md b/docs/api/representation.md
@@ -0,0 +1,7 @@
+::: scyan.tools.umap
+    options:
+      show_root_heading: true
+
+::: scyan.tools.subcluster
+    options:
+      show_root_heading: true
diff --git a/docs/api/scale.md b/docs/api/scale.md
diff --git a/docs/api/scatter.md b/docs/api/scatter.md
diff --git a/docs/api/subcluster.md b/docs/api/subcluster.md
diff --git a/docs/api/subclusters.md b/docs/api/subclusters.md
diff --git a/docs/api/umap.md b/docs/api/umap.md
diff --git a/docs/api/unscale.md b/docs/api/unscale.md
diff --git a/docs/api/write_fcs.md b/docs/api/write_fcs.md
diff --git a/docs/getting_started.md b/docs/getting_started.md
@@ -68,21 +68,21 @@ This code should run in approximately 40 seconds (once the dataset is loaded).
 
 ### Inputs details
 
-- `adata` is an [AnnData](https://anndata.readthedocs.io/en/latest/) object, whose variables (`adata.var`) corresponds to markers, and observations (`adata.obs`) to cells. `adata.X` is a matrix of size ($N$ cells, $M$ markers) representing cell-marker expressions after being **preprocessed** ([asinh](./api/asinh.md) or [logicle](./api/auto_logicle.md)) and [**standardized**](./api/scale.md).
+- `adata` is an [AnnData](https://anndata.readthedocs.io/en/latest/) object, whose variables (`adata.var`) corresponds to markers, and observations (`adata.obs`) to cells. `adata.X` is a matrix of size ($N$ cells, $M$ markers) representing cell-marker expressions after being **preprocessed** ([asinh](/api/preprocessing/#scyan.tools.asinh_transform) or [logicle](/api/preprocessing/#scyan.tools.auto_logicle_transform)) and [**standardized**](/api/preprocessing/#scyan.tools.scale).
 - `marker_pop_matrix` is a [pandas DataFrame](https://pandas.pydata.org/) with $P$ rows (one per population) and $M$ columns (one per marker). Each value represents the knowledge about the expected expression, i.e. `-1` for negative expression, `1` for positive expression, or `NA` if we don't know. It can also be any float value such as `0` or `0.5` for mid and low expressions, respectively (use it only when necessary).
 
 !!! note "Help to create the `adata` object and the `marker_pop_matrix`"
 
-    Read the [preprocessing tutorial](./tutorials/preprocessing.ipynb) if you have an FCS file and want explanations to initialize `Scyan`.
+    Read the [preprocessing tutorial](/tutorials/preprocessing) if you have an FCS file and want explanations to initialize `Scyan`.
 
 !!! check
 
     Make sure every marker from the table (i.e. columns names of the DataFrame) is inside the data, i.e. in `adata.var_names`.
 
 ## Resources to guide you
 
-- Read the tutorials (e.g. [how to prepare your data](./tutorials/preprocessing.ipynb) or [usage example with interpretability](./tutorials/usage.ipynb)).
-- Read our [advice](./advanced/advice.md) to design the knowledge table.
-- Read the API to know more about what you can do (e.g. [scyan.Scyan](./api/model.md)).
-- [Save and load your own dataset](./advanced/data.md).
-- [How to choose the model parameters if you don't want to use the default ones](./advanced/parameters.md).
+- Read the tutorials (e.g. [how to prepare your data](/tutorials/preprocessing) or [usage example with interpretability](/tutorials/usage)).
+- Read our [advice](/advanced/advice) to design the knowledge table.
+- Read the API to know more about what you can do (e.g. [scyan.Scyan](/api/model)).
+- [Save and load your own dataset](/advanced/data).
+- [How to choose the model parameters if you don't want to use the default ones](/advanced/parameters).
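
The check in `docs/getting_started.md` above (every marker in the table must be in `adata.var_names`) can be sketched in plain Python. The helper `missing_markers` and the toy marker names are hypothetical and used for illustration only; with real objects, the two arguments would presumably be `marker_pop_matrix.columns` and `adata.var_names`.

```python
from typing import Iterable, List


def missing_markers(table_markers: Iterable[str], data_markers: Iterable[str]) -> List[str]:
    """Return the table markers that are absent from the data, in table order."""
    available = set(data_markers)
    return [marker for marker in table_markers if marker not in available]


# Toy stand-ins (hypothetical marker names) for the table columns
# and for the marker names available in the data.
table_markers = ["CD3", "CD4", "CD8", "CD19"]
data_markers = ["CD3", "CD4", "CD8", "CD45"]

missing = missing_markers(table_markers, data_markers)
if missing:
    print(f"Markers in the table but not in the data: {missing}")
```

Returning the list of missing names, rather than a boolean, makes it easier to spot a typo in a column name of the knowledge table.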
diff --git a/docs/index.md b/docs/index.md
@@ -11,7 +11,7 @@ Scyan stands for **S**ingle-cell **Cy**tometry **A**nnotation **N**etwork. Based
 Scyan is a Bayesian probabilistic model composed of a deep invertible neural network called a normalizing flow (the function $f_{\phi}$). It maps a latent distribution of cell expressions into the empirical distribution of cell expressions. This cell distribution is a mixture of gaussian-like distributions representing the sum of a cell-specific and a population-specific term. Also, interpretability and batch effect correction are based on the model latent space — more details in the article's Methods section.
 
 <figure markdown>
-  ![Image title](./assets/overview.png)
+  ![Image title](/assets/overview.png)
   <figcaption>a) Overview of the tasks that Scyan can perform. b) Overview of the model architecture. c) One coupling layer, i.e., the elementary unit that composes the Normalizing Flow.</figcaption>
 </figure>
 
@@ -42,13 +42,13 @@ See [Scyan on Github](https://github.com/MICS-Lab/scyan)
         module/               # Folder containing neural network modules
             coupling_layer.py # Coupling layer
             distribution.py   # Prior distribution (called U in the article)
+            mmd.py            # Maximum Mean Discrepancy implementation
             real_nvp.py       # Normalizing Flow
             scyan_module      # Core module
         plot/                 # Plotting tools
             ...
         tools/
             ...               # Preprocessing tools and more
-        mmd.py                # Maximum Mean Discrepancy implementation
         model.py              # Scyan model class
         utils.py              # Misc functions
     .gitattributes