Merge pull request #2 from MICS-Lab/dev
Dev
quentinblampey authored Sep 2, 2022
2 parents bb252a8 + fd2ce26 commit 2628235
Showing 63 changed files with 3,142 additions and 181 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -9,7 +9,6 @@
[![Downloads](https://pepy.tech/badge/scyan)](https://pepy.tech/project/scyan)
[![License](https://img.shields.io/pypi/l/scyan.svg)](https://github.com/MICS-Lab/scyan/blob/master/LICENSE)
[![Imports: isort](https://img.shields.io/badge/imports-isort-blueviolet)](https://pycqa.github.io/isort/)
[![DOI](https://zenodo.org/badge/516048412.svg)](https://zenodo.org/badge/latestdoi/516048412)

Scyan stands for **S**ingle-cell **Cy**tometry **A**nnotation **N**etwork. Based on biological knowledge prior, it provides a fast cell population annotation without requiring any training label. Scyan is an interpretable model that also corrects batch-effect and can be used for debarcoding, cell sampling, and population discovery.

@@ -101,13 +100,13 @@ Scyan is a **Python** library based on:
module/ # Folder containing neural network modules
coupling_layer.py # Coupling layer
distribution.py # Prior distribution (called U in the article)
mmd.py # Maximum Mean Discrepancy implementation
real_nvp.py # Normalizing Flow
scyan_module # Core module
plot/ # Plotting tools
...
tools/
... # Preprocessing tools and more
mmd.py # Maximum Mean Discrepancy implementation
model.py # Scyan model class
utils.py # Misc functions
.gitattributes
2 changes: 1 addition & 1 deletion config/project/aml.yaml
@@ -1,6 +1,6 @@
name: aml
wandb_project_name: aml
label: cell_type
size: default
version: default
# batch_key: subject
# batch_ref: H1
2 changes: 1 addition & 1 deletion config/project/bmmc.yaml
@@ -1,4 +1,4 @@
name: bmmc
wandb_project_name: bmmc
label: cell_type
size: default
version: default
2 changes: 1 addition & 1 deletion config/project/infinity_flow.yaml
@@ -1,4 +1,4 @@
name: infinity_flow
wandb_project_name: inf-flow
label: cell_type
size: full
version: full
2 changes: 1 addition & 1 deletion config/project/pop_durva.yaml
@@ -1,3 +1,3 @@
name: pop_durva
wandb_project_name: pop
size: default
version: default
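
The four `config/project/*.yaml` hunks above all make the same change: the top-level `size` key becomes `version`. A minimal sketch of how that rename could be applied to a local config file is shown below. The helper name `rename_size_key` is hypothetical and not part of the repository; it only touches an uncommented, top-level `size:` key.

```python
import re


def rename_size_key(yaml_text: str) -> str:
    """Rename a top-level `size:` key to `version:`, keeping its value.

    Hypothetical helper: commented-out or indented keys are left untouched.
    """
    # (?m) makes ^ match at the start of every line, not only the first one.
    return re.sub(r"(?m)^size:", "version:", yaml_text)


if __name__ == "__main__":
    old = "name: aml\nwandb_project_name: aml\nlabel: cell_type\nsize: default\n"
    print(rename_size_key(old))
```

Only the key name changes, so values such as `default` or `full` carry over as they do in the hunks above.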
2 changes: 1 addition & 1 deletion docs/advanced/advice.md
@@ -24,7 +24,7 @@
### What should I do if Scyan seems wrong?

- First thing to do is to check your table again. You may have made a typo that could confuse the model. Typically, if you have written `Marker+` for a population that is `Marker-` (or the opposite), it can perturb the prediction toward this population **and** toward other populations.
- Try using [`scyan.plot.probs_per_marker`](../api/probs_per_marker.md). Many markers may show up dark on the heatmap at places they shouldn't, but the errors may be due to only one marker. You can find which marker it is by checking actual marker expressions on [a UMAP plot](../api/plot_umap.md), or with a [scatter plot](../api/scatter.md), and then update your table or read some literature again.
- Try using [scyan.plot.probs_per_marker](/api/plots/#scyan.plot.probs_per_marker). Many markers may show up dark on the heatmap at places they shouldn't, but the errors may be due to only one marker. You can find which marker it is by checking actual marker expressions on [a UMAP plot](/api/plots/#scyan.plot.umap), or with a [scatter plot](/api/plots/#scyan.plot.scatter), and then update your table or read some literature again.
- One reason for not predicting a population may be an unbalanced knowledge quantity between two related populations. For instance, having 10 values inside the table for `CD4 T CM` cells versus 5 values for `CD4 T EM` cells will probably make the model predict very few `CD4 T CM` cells. Indeed, `CD4 T CM` has many constraints compared to `CD4 T EM`, which becomes the "easy prediction" (indeed, very few constraints are applied to this population). In that case, read the advice related to the scatter plot above again.

!!! info "Example about how Scyan handles NA"
18 changes: 8 additions & 10 deletions docs/advanced/data.md
@@ -1,32 +1,30 @@
# Add your own dataset

## Prepare your Python objects
Existing datasets and versions can be listed via [`scyan.data.list()`](/api/datasets/#scyan.data.list). By default, only public datasets are available, but you can add some if you follow the next steps.

You must prepare your cytometry data and create your knowledge table as described in the [preprocessing tutorial](../tutorials/preprocessing.ipynb).
## 1. Prepare your Python objects

!!! tips

Read our [advice](../advanced/advice.md) to create the knowledge table. A great table leads to better predictions.
You must prepare your `cytometry data` and create your `knowledge table` as described in the [preprocessing tutorial](/tutorials/preprocessing). You can also read our [advice](/advanced/advice) to create the knowledge table (a great table leads to better predictions!).

!!! info

If needed, you have an example of an `adata` object and a knowledge table if you run:
If needed, you have an example of an `adata` object and a knowledge table (a.k.a `marker_pop_matrix`) if you run:

```python
adata, marker_pop_matrix = scyan.data.load("aml")
```

## Save your dataset
## 2. Save your dataset

Now that you have created an `adata` object and a `marker_pop_matrix` table, you can simply save them (for more details, see [scyan.data.add](../api/add.md)):
Now that you have created an `adata` object and a `marker_pop_matrix` table, you can simply save them (for more details, see [scyan.data.add](/api/datasets/#scyan.data.add)):

```python
scyan.data.add("<your-project-name>", adata, marker_pop_matrix)
```

## Load your dataset
## 3. Load your dataset

Congrats, you can now load your dataset with `scyan`:
Congrats, you can now load your dataset (for more details, see [scyan.data.load](/api/datasets/#scyan.data.load)):

```python
adata, marker_pop_matrix = scyan.data.load("<your-project-name>")
8 changes: 4 additions & 4 deletions docs/advanced/hydra_wandb.md
@@ -2,16 +2,16 @@

If needed, you can use [Hydra](https://hydra.cc/docs/intro/) to manage your configuration. It lets you run the scripts or a hyperparameter optimization easily. You can also monitor your jobs with [Weights & Biases](https://wandb.ai/site).

For that, clone the repository and make an editable install of the project (see [Getting Started](https://mics-lab.github.io/scyan/getting_started/)). Then, you have to follow the steps listed below.
For that, clone the repository and make an editable install of the project (see [Getting Started](/getting_started)). Then, you have to follow the steps listed below.

## Create a new project configuration

Create a new project at `config/project/<your-project-name>.yaml`, where `<your-project-name>` is the one you used to [create your dataset](./data.md).
Create a new project at `config/project/<your-project-name>.yaml`, where `<your-project-name>` is the one you used to [create your dataset](/advanced/data).
In this file, add `name: <your-project-name>`.

Add optionally:

- `size` or `table` if you don't want to use your dataset's default table or anndata files.
- `version` or `table` if you don't want to use your dataset's default table or anndata files.
- `batch_key` (and optionally `batch_ref`) if you want to correct the batch effect.
- You can add some `continuous_covariate_keys` and `categorical_covariate_keys` (as a list of items).
- `wandb_project_name`, the name of your Weights & Biases project for model monitoring. It will log all the metrics over the epochs and save different figures online.
@@ -34,4 +34,4 @@ Update `config/sweeper/optuna.yaml` to select the parameters you want to optimiz

!!! check

Now that you have configured your project, you can run the scripts (see [running scripts](./scripts.md)) by providing the argument `project=<your-project-name>`.
Now that you have configured your project, you can run the scripts (see [running scripts](/advanced/scripts)) by providing the argument `project=<your-project-name>`.
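
The project file described in this hunk is a flat list of keys, so its content can be sketched with a few lines of Python. The function `project_config` below is a hypothetical helper, not part of Scyan; it only knows the keys named in the documentation above (`version`, `table`, `batch_key`, `batch_ref`, the two covariate lists, and `wandb_project_name`) and emits them as plain YAML text.

```python
# Optional keys listed in docs/advanced/hydra_wandb.md, in the order they are emitted.
OPTIONAL_KEYS = (
    "version",
    "table",
    "batch_key",
    "batch_ref",
    "continuous_covariate_keys",
    "categorical_covariate_keys",
    "wandb_project_name",
)


def project_config(name, **optional):
    """Return the text of a `config/project/<name>.yaml` file (illustrative sketch)."""
    unknown = set(optional) - set(OPTIONAL_KEYS)
    if unknown:
        raise ValueError(f"Unknown keys: {sorted(unknown)}")

    lines = [f"name: {name}"]
    for key in OPTIONAL_KEYS:
        if key not in optional:
            continue
        value = optional[key]
        if isinstance(value, (list, tuple)):
            # Covariate keys are written as a YAML list of items.
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(project_config("aml", version="default", batch_key="subject", batch_ref="H1"))
```

Writing the returned text to `config/project/<your-project-name>.yaml` would give a file shaped like the `aml.yaml` hunk earlier in this commit, minus keys such as `label` that the sketch does not cover.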
6 changes: 3 additions & 3 deletions docs/advanced/parameters.md
@@ -1,10 +1,10 @@
!!! note

If not done yet, you may be interested in reading our [advice to improve your knowledge table](../advice). The default `Scyan` parameters should work well for most cases, so you should first check your knowledge table.
If not done yet, you may be interested in reading our [advice to improve your knowledge table](/advanced/advice). The default `Scyan` parameters should work well for most cases, so you should first check your knowledge table.

## Main parameters

We provide some help to choose [scyan.Scyan](../api/model.md) parameters. We listed below the most important ones.
We provide some help to choose [scyan.Scyan](/api/model) parameters. We listed below the most important ones.

- `prior_std` is probably one of the most important parameters. Its default value should work for most of the usage, but it can be changed if needed. A low `prior_std` (about `0.15`) will help better separate the populations, but it may be too stringent, and some small populations may disappear. In contrast, a high `prior_std` (about `0.4`) increases the chances of having a large diversity of populations, but their separation may be less clear. For a project where populations are easy to identify, we thus recommend lowering the `prior_std`.
- Reducing the `temperature` can help better capture small populations. For instance, you can lower the temperature to `0.5`. If it is not enough, try using `modulo_temp = 3`.
@@ -13,4 +13,4 @@ We provide some help to choose [scyan.Scyan](../api/model.md) parameters. We lis

## Hyperparameter search

If you want to automate the choice of the hyperparameters, you can also run a hyperparameter optimization with [Hydra](https://hydra.cc/docs/intro/). See how to [configure your project and run Hydra](./hydra_wandb.md).
If you want to automate the choice of the hyperparameters, you can also run a hyperparameter optimization with [Hydra](https://hydra.cc/docs/intro/). See how to [configure your project and run Hydra](/advanced/hydra_wandb).
4 changes: 2 additions & 2 deletions docs/advanced/scripts.md
@@ -2,7 +2,7 @@

!!! caution

You can run `Scyan` without using these scripts. Use them **only if** you want to use Hydra and/or Weights & Biases. Before continuing, ensure that you have [configured your project](./hydra_wandb.md).
You can run `Scyan` without using these scripts. Use them **only if** you want to use Hydra and/or Weights & Biases. Before continuing, ensure that you have [configured your project](/advanced/hydra_wandb).

## Usage examples

@@ -28,7 +28,7 @@ python -m scripts.run project=<project-name> model.temperature=0.5 model.prior_s

## Reproduce the article results

The hyperparameters were obtained by (unsupervised) hyperparameter optimization (see [Hydra configuration](./hydra_wandb.md)).
The hyperparameters were obtained by (unsupervised) hyperparameter optimization (see [Hydra configuration](/advanced/hydra_wandb)).

```bash
# Testing on BMMC
1 change: 0 additions & 1 deletion docs/api/add.md

This file was deleted.

7 changes: 7 additions & 0 deletions docs/api/analysis.md
@@ -0,0 +1,7 @@
::: scyan.tools.count_cell_populations
    options:
      show_root_heading: true

::: scyan.tools.mean_intensities
    options:
      show_root_heading: true
1 change: 0 additions & 1 deletion docs/api/asinh.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/api/auto_logicle.md

This file was deleted.

11 changes: 11 additions & 0 deletions docs/api/datasets.md
@@ -0,0 +1,11 @@
::: scyan.data.list
    options:
      show_root_heading: true

::: scyan.data.load
    options:
      show_root_heading: true

::: scyan.data.add
    options:
      show_root_heading: true
7 changes: 7 additions & 0 deletions docs/api/io.md
@@ -0,0 +1,7 @@
::: scyan.read_fcs
    options:
      show_root_heading: true

::: scyan.write_fcs
    options:
      show_root_heading: true
1 change: 0 additions & 1 deletion docs/api/kde_per_population.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/api/latent_expressions.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/api/latent_heatmap.md

This file was deleted.

10 changes: 0 additions & 10 deletions docs/api/load.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/api/loss_mmd.md
Original file line number Diff line number Diff line change
@@ -1 +1 @@
::: scyan.mmd.LossMMD
::: scyan.module.LossMMD
1 change: 0 additions & 1 deletion docs/api/plot_umap.md

This file was deleted.

35 changes: 35 additions & 0 deletions docs/api/plots.md
@@ -0,0 +1,35 @@
::: scyan.plot.scatter
    options:
      show_root_heading: true

::: scyan.plot.umap
    options:
      show_root_heading: true

::: scyan.plot.probs_per_marker
    options:
      show_root_heading: true

::: scyan.plot.latent_heatmap
    options:
      show_root_heading: true

::: scyan.plot.subclusters
    options:
      show_root_heading: true

::: scyan.plot.latent_expressions
    options:
      show_root_heading: true

::: scyan.plot.kde_per_population
    options:
      show_root_heading: true

::: scyan.plot.all_groups
    options:
      show_root_heading: true

::: scyan.plot.one_group
    options:
      show_root_heading: true
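
The new grouped API pages (`plots.md`, `datasets.md`, `io.md`, and so on) all repeat the same three-line mkdocstrings block per documented object. A small sketch of how such a page could be generated from a list of dotted paths follows. The function `api_page` is hypothetical, and the 4-space and 6-space indentation is the usual mkdocstrings layout, which is an assumption here rather than something read from the repository.

```python
def api_page(identifiers):
    """Render one mkdocstrings block per identifier, each with its root heading shown."""
    blocks = [
        f"::: {identifier}\n    options:\n      show_root_heading: true\n"
        for identifier in identifiers
    ]
    # Blocks are separated by a blank line, as in the pages added by this commit.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(api_page(["scyan.read_fcs", "scyan.write_fcs"]))
```

Called with the two identifiers above, this yields the content of a page like `docs/api/io.md`.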
19 changes: 19 additions & 0 deletions docs/api/preprocessing.md
@@ -0,0 +1,19 @@
::: scyan.tools.auto_logicle_transform
    options:
      show_root_heading: true

::: scyan.tools.asinh_transform
    options:
      show_root_heading: true

::: scyan.tools.inverse_transform
    options:
      show_root_heading: true

::: scyan.tools.scale
    options:
      show_root_heading: true

::: scyan.tools.unscale
    options:
      show_root_heading: true
1 change: 0 additions & 1 deletion docs/api/probs_per_marker.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/api/read_fcs.md

This file was deleted.

7 changes: 7 additions & 0 deletions docs/api/representation.md
@@ -0,0 +1,7 @@
::: scyan.tools.umap
    options:
      show_root_heading: true

::: scyan.tools.subcluster
    options:
      show_root_heading: true
1 change: 0 additions & 1 deletion docs/api/scale.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/api/scatter.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/api/subcluster.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/api/subclusters.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/api/umap.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/api/unscale.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/api/write_fcs.md

This file was deleted.
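
Because the per-function API pages are deleted and replaced by grouped pages, every relative link to them has to point at a new anchor. The sketch below shows one way such a rewrite could be automated. The mapping contains only the old-to-new pairs that are visible in this commit's own link changes; the names `API_REDIRECTS` and `rewrite_api_links` are hypothetical, and pages not in the mapping are left unchanged.

```python
import re

# Old per-function page -> new anchor, taken from the link updates in this commit.
API_REDIRECTS = {
    "add.md": "/api/datasets/#scyan.data.add",
    "probs_per_marker.md": "/api/plots/#scyan.plot.probs_per_marker",
    "plot_umap.md": "/api/plots/#scyan.plot.umap",
    "scatter.md": "/api/plots/#scyan.plot.scatter",
    "asinh.md": "/api/preprocessing/#scyan.tools.asinh_transform",
    "auto_logicle.md": "/api/preprocessing/#scyan.tools.auto_logicle_transform",
    "scale.md": "/api/preprocessing/#scyan.tools.scale",
}

# Matches the target of a Markdown link such as `](../api/scatter.md)` or `](./api/scale.md)`.
_API_LINK = re.compile(r"\]\((?:\.{1,2}/)+api/([\w-]+\.md)\)")


def rewrite_api_links(markdown: str) -> str:
    """Replace links to removed API pages with their new absolute anchors."""

    def _replace(match):
        target = API_REDIRECTS.get(match.group(1))
        # Unknown pages keep their original link text.
        return f"]({target})" if target else match.group(0)

    return _API_LINK.sub(_replace, markdown)


if __name__ == "__main__":
    print(rewrite_api_links("See the [scatter plot](../api/scatter.md)."))
```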

14 changes: 7 additions & 7 deletions docs/getting_started.md
@@ -68,21 +68,21 @@ This code should run in approximately 40 seconds (once the dataset is loaded).

### Inputs details

- `adata` is an [AnnData](https://anndata.readthedocs.io/en/latest/) object, whose variables (`adata.var`) corresponds to markers, and observations (`adata.obs`) to cells. `adata.X` is a matrix of size ($N$ cells, $M$ markers) representing cell-marker expressions after being **preprocessed** ([asinh](./api/asinh.md) or [logicle](./api/auto_logicle.md)) and [**standardized**](./api/scale.md).
- `adata` is an [AnnData](https://anndata.readthedocs.io/en/latest/) object, whose variables (`adata.var`) corresponds to markers, and observations (`adata.obs`) to cells. `adata.X` is a matrix of size ($N$ cells, $M$ markers) representing cell-marker expressions after being **preprocessed** ([asinh](/api/preprocessing/#scyan.tools.asinh_transform) or [logicle](/api/preprocessing/#scyan.tools.auto_logicle_transform)) and [**standardized**](/api/preprocessing/#scyan.tools.scale).
- `marker_pop_matrix` is a [pandas DataFrame](https://pandas.pydata.org/) with $P$ rows (one per population) and $M$ columns (one per marker). Each value represents the knowledge about the expected expression, i.e. `-1` for negative expression, `1` for positive expression, or `NA` if we don't know. It can also be any float value such as `0` or `0.5` for mid and low expressions, respectively (use it only when necessary).

!!! note "Help to create the `adata` object and the `marker_pop_matrix`"

Read the [preprocessing tutorial](./tutorials/preprocessing.ipynb) if you have an FCS file and want explanations to initialize `Scyan`.
Read the [preprocessing tutorial](/tutorials/preprocessing) if you have an FCS file and want explanations to initialize `Scyan`.

!!! check

Make sure every marker from the table (i.e. columns names of the DataFrame) is inside the data, i.e. in `adata.var_names`.

## Resources to guide you

- Read the tutorials (e.g. [how to prepare your data](./tutorials/preprocessing.ipynb) or [usage example with interpretability](./tutorials/usage.ipynb)).
- Read our [advice](./advanced/advice.md) to design the knowledge table.
- Read the API to know more about what you can do (e.g. [scyan.Scyan](./api/model.md)).
- [Save and load your own dataset](./advanced/data.md).
- [How to choose the model parameters if you don't want to use the default ones](./advanced/parameters.md).
- Read the tutorials (e.g. [how to prepare your data](/tutorials/preprocessing) or [usage example with interpretability](/tutorials/usage)).
- Read our [advice](/advanced/advice) to design the knowledge table.
- Read the API to know more about what you can do (e.g. [scyan.Scyan](/api/model)).
- [Save and load your own dataset](/advanced/data).
- [How to choose the model parameters if you don't want to use the default ones](/advanced/parameters).
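
The `!!! check` note in this hunk asks that every marker of the knowledge table also appears in `adata.var_names`. That check can be sketched in plain Python as below. The helper `missing_markers` is hypothetical and works on any two sequences of marker names, so it does not depend on Scyan itself.

```python
def missing_markers(table_columns, var_names):
    """Return the table markers that are absent from the data, in table order."""
    available = set(var_names)
    return [marker for marker in table_columns if marker not in available]


if __name__ == "__main__":
    # Illustrative marker names only.
    print(missing_markers(["CD3", "CD4", "CD8"], ["CD3", "CD8", "CD45"]))
```

With real objects, the call would presumably be `missing_markers(marker_pop_matrix.columns, adata.var_names)`; an empty result means the table and the data agree.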
4 changes: 2 additions & 2 deletions docs/index.md
@@ -11,7 +11,7 @@ Scyan stands for **S**ingle-cell **Cy**tometry **A**nnotation **N**etwork. Based
Scyan is a Bayesian probabilistic model composed of a deep invertible neural network called a normalizing flow (the function $f_{\phi}$). It maps a latent distribution of cell expressions into the empirical distribution of cell expressions. This cell distribution is a mixture of gaussian-like distributions representing the sum of a cell-specific and a population-specific term. Also, interpretability and batch effect correction are based on the model latent space — more details in the article's Methods section.

<figure markdown>
![Image title](./assets/overview.png)
![Image title](/assets/overview.png)
<figcaption>a) Overview of the tasks that Scyan can perform. b) Overview of the model architecture. c) One coupling layer, i.e., the elementary unit that composes the Normalizing Flow.</figcaption>
</figure>

@@ -42,13 +42,13 @@ See [Scyan on Github](https://github.com/MICS-Lab/scyan)
module/ # Folder containing neural network modules
coupling_layer.py # Coupling layer
distribution.py # Prior distribution (called U in the article)
mmd.py # Maximum Mean Discrepancy implementation
real_nvp.py # Normalizing Flow
scyan_module # Core module
plot/ # Plotting tools
...
tools/
... # Preprocessing tools and more
mmd.py # Maximum Mean Discrepancy implementation
model.py # Scyan model class
utils.py # Misc functions
.gitattributes