Release prep for v0.0.1
kevin931 committed Jun 18, 2022
1 parent 6b960f1 commit de94fa8
Showing 8 changed files with 310 additions and 106 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -8,6 +8,9 @@

# Python Build Files and Directories
*.egg-info
/build
/dist
/dist_conda

# CI: Testing and coverage
*.coverage
2 changes: 1 addition & 1 deletion LICENSE_associated.txt
@@ -30,7 +30,7 @@ Usage: Project ideation and links to HDCytoData server

Project: poetic
Link: https://github.com/kevin931/poetic
Usage: Implementation of DataLoader in data.py; setup.py build commands.

The MIT License (MIT)

200 changes: 120 additions & 80 deletions README.md
@@ -5,84 +5,80 @@ This package is an all-in-one CyTOF data analysis package for your experiments.

## Installation

You can install ``PyCytoData`` easily from ``pip``:

```shell
pip install PyCytoData
```

or from ``conda``:

```shell
conda install pycytodata -c kevin931 -c bioconda
```

If you wish to use ``CytofDR`` along with ``PyCytoData``, you can optionally install it as well:

```shell
pip install CytofDR
```

For more information on optional dependencies or installation details, see the [installation guide](https://pycytodata.readthedocs.io/en/latest/installation.html).
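
Because ``CytofDR`` is optional, it can be handy to check whether it is importable before relying on the DR integration. A minimal sketch using only the standard library; the helper name ``has_cytofdr`` is hypothetical and not part of ``PyCytoData``, and it assumes the package is imported under the name ``CytofDR``:

```python
import importlib.util


def has_cytofdr() -> bool:
    """Return True if the optional CytofDR package can be imported.

    Hypothetical helper for illustration; not part of PyCytoData's API.
    """
    # find_spec returns None when no top-level module with this name exists
    return importlib.util.find_spec("CytofDR") is not None


if __name__ == "__main__":
    if has_cytofdr():
        print("CytofDR is installed; DR integration is available.")
    else:
        print("CytofDR is not installed; run `pip install CytofDR` to enable it.")
```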

## Install and Load Benchmark Datasets

You can load a benchmark dataset with the following Python snippet:

```python
>>> from PyCytoData import DataLoader

>>> exprs = DataLoader.load_dataset(dataset = "levine13")
>>> exprs.expression_matrix # Expression matrix
>>> exprs.cell_types # Cell types
>>> exprs.sample_index # Sample index
>>> exprs.features # The feature/marker names
```

The resulting ``exprs`` is a ``PyCytoData`` object. The expression matrix, cell types (if available), and sample index are directly accessible as attributes, and they are all stored as **numpy.ndarray** objects. You can also access some metadata of the object with the following attributes:

```python
>>> exprs.n_cells
>>> exprs.n_cell_types
>>> exprs.n_samples
>>> exprs.n_features
```
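
To show how these counts relate to the underlying arrays, here is a small NumPy sketch. The helper ``summarize`` is hypothetical; it assumes cells are rows and features are columns, and ``PyCytoData`` computes and protects these values internally, possibly handling edge cases (such as missing cell types) differently:

```python
import numpy as np


def summarize(expression_matrix, cell_types, sample_index):
    """Hypothetical helper: derive metadata counts from the raw arrays."""
    expression_matrix = np.asarray(expression_matrix)
    return {
        "n_cells": expression_matrix.shape[0],       # one row per cell
        "n_features": expression_matrix.shape[1],    # one column per marker
        "n_cell_types": np.unique(cell_types).size,  # distinct cell-type labels
        "n_samples": np.unique(sample_index).size,   # distinct sample labels
    }


expr = np.zeros((4, 3))
print(summarize(expr, ["T", "B", "T", "NK"], ["s1", "s1", "s2", "s2"]))
# {'n_cells': 4, 'n_features': 3, 'n_cell_types': 3, 'n_samples': 2}
```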

All of these metadata attributes are set automatically and are protected against unintended changes. You can also add a sample as follows:

```python
>>> exprs.add_sample(expression_matrix, cell_types, sample_index) # All inputs should be ArrayLike
```
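
Conceptually, adding a sample appends rows to the expression matrix and matching entries to the cell-type and sample-index arrays. A minimal NumPy sketch of that bookkeeping, assuming all three inputs describe the same new cells; ``append_sample`` is hypothetical and is not how ``PyCytoData`` implements ``add_sample``:

```python
import numpy as np


def append_sample(expression_matrix, cell_types, sample_index,
                  new_matrix, new_cell_types, new_sample_index):
    """Hypothetical sketch: append one sample's cells to existing arrays."""
    new_matrix = np.asarray(new_matrix)
    new_cell_types = np.asarray(new_cell_types)
    new_sample_index = np.asarray(new_sample_index)
    # Every new cell needs exactly one cell type and one sample label
    n_new = new_matrix.shape[0]
    if new_cell_types.shape[0] != n_new or new_sample_index.shape[0] != n_new:
        raise ValueError("Inputs must have the same number of cells (rows).")
    # Features (columns) must match the existing matrix
    if new_matrix.shape[1] != expression_matrix.shape[1]:
        raise ValueError("New sample must have the same features (columns).")
    return (
        np.vstack([expression_matrix, new_matrix]),
        np.concatenate([cell_types, new_cell_types]),
        np.concatenate([sample_index, new_sample_index]),
    )


expr = np.ones((2, 3))
types = np.array(["T", "B"])
samples = np.array(["s1", "s1"])
expr, types, samples = append_sample(expr, types, samples,
                                     np.zeros((1, 3)), ["NK"], ["s2"])
print(expr.shape, types.tolist(), samples.tolist())
# (3, 3) ['T', 'B', 'NK'] ['s1', 's1', 's2']
```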

**Note**: The data are downloaded from a server instead of being shipped with this package. Each dataset only needs to be downloaded once, and this is managed automatically. During the first-time download of a dataset, a command-line confirmation is needed.
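
The "download once" behavior follows a familiar caching pattern: check for a local copy and fetch only when it is missing. A minimal sketch of that pattern under stated assumptions; ``fetch_once``, the file name ``levine13.txt``, and the stand-in downloader are all hypothetical, and ``PyCytoData``'s actual cache location, file layout, and confirmation prompt are not shown:

```python
import tempfile
from pathlib import Path
from typing import Callable


def fetch_once(name: str, cache_dir: str, download: Callable[[Path], None]) -> Path:
    """Hypothetical download-once cache: call `download` only if the file is missing."""
    path = Path(cache_dir) / name
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        download(path)  # a real implementation would fetch from the data server here
    return path


# Stand-in downloader that records each call instead of using the network
calls = []


def fake_download(path: Path) -> None:
    calls.append(path.name)
    path.write_text("placeholder")


with tempfile.TemporaryDirectory() as tmp:
    fetch_once("levine13.txt", tmp, fake_download)  # first call: downloads
    fetch_once("levine13.txt", tmp, fake_download)  # second call: served from cache

print(calls)  # ['levine13.txt']
```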

## Bring Your Own Dataset (BYOD)

Yes, you read that right! You can load your own datasets. Currently, we only support reading plain-text files saved with delimiters. The data need to have cells as rows and features as columns. To load them as a ``PyCytoData`` object, you can simply do the following:

```python
>>> from PyCytoData import FileIO

>>> FileIO.load_delim(files="/path", # Path to file
...                   col_names=True, # Whether the first row is feature (column) names
...                   delim="\t" # Delimiter
...                   )
```
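
To make the expected layout concrete (a header row of feature names, then one row per cell), the sketch below writes a tiny tab-delimited file and parses it with NumPy. This is only an illustration of the file format, not how ``FileIO.load_delim`` is implemented; the file name ``example.txt`` and the values are made up:

```python
import os
import tempfile

import numpy as np

# Header row of feature names, then one row per cell (cells x features)
contents = "CD4\tCD8\tCD15\n1.0\t0.5\t0.0\n2.0\t0.0\t3.5\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as f:
        f.write(contents)

    # The first row holds feature (column) names, as with col_names=True
    with open(path) as f:
        features = f.readline().rstrip("\n").split("\t")
    # The remaining rows form the numeric expression matrix
    matrix = np.loadtxt(path, delimiter="\t", skiprows=1)

print(features)      # ['CD4', 'CD8', 'CD15']
print(matrix.shape)  # (2, 3)
```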

If your experiment has multiple samples, you can simply import them together:

```python
>>> from PyCytoData import FileIO

>>> expression_paths = ["path1", "path2", "path3"]
>>> FileIO.load_delim(files=expression_paths, # Paths to files
...                   col_names=True, # Whether the first row is feature (column) names
...                   delim="\t" # Delimiter
...                   )
```

In this case, the expression matrices are concatenated automatically without any normalization. To access a particular sample, use the ``sample_index`` attribute together with standard ``numpy`` indexing.
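
For example, a boolean mask built from the sample index selects the rows (cells) of one sample. This sketch uses plain NumPy arrays standing in for ``expression_matrix`` and ``sample_index``; the sample labels here are hypothetical, so check the actual values stored in your object's ``sample_index``:

```python
import numpy as np

# Hypothetical concatenated data: 6 cells from three samples
expression_matrix = np.arange(12, dtype=float).reshape(6, 2)
sample_index = np.array(["path1", "path1", "path2", "path2", "path2", "path3"])

# Boolean mask selecting the cells that belong to one sample
mask = sample_index == "path2"
sample2 = expression_matrix[mask]
print(sample2.shape)  # (3, 2)

# Loop over all samples, counting cells in each
for label in np.unique(sample_index):
    print(label, expression_matrix[sample_index == label].shape[0])
# path1 2
# path2 3
# path3 1
```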
@@ -94,44 +90,58 @@ In this case, the expression matrices are concatenated automatically without any
Currently, ``levine13``, ``levine32``, and ``samusik`` have already been mostly preprocessed. All you need to do is perform the ``arcsinh`` transformation:

```python
>>> from PyCytoData import DataLoader

>>> exprs = DataLoader.load_dataset(dataset = "levine13")
>>> exprs.preprocess(arcsinh=True)
```
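
The arcsinh transformation is the usual variance-stabilizing step for CyTOF intensities: each value `x` is mapped to `arcsinh(x / c)` for a cofactor `c`, and a cofactor of 5 is a common convention for CyTOF data. A minimal NumPy sketch assuming that cofactor; ``arcsinh_transform`` is a hypothetical helper, and ``PyCytoData``'s own default and options may differ, so check the API reference:

```python
import numpy as np


def arcsinh_transform(expression_matrix, cofactor=5.0):
    """Apply arcsinh(x / cofactor) elementwise (cofactor 5 is a common CyTOF choice)."""
    return np.arcsinh(np.asarray(expression_matrix, dtype=float) / cofactor)


raw = np.array([[0.0, 5.0], [50.0, 500.0]])
print(np.round(arcsinh_transform(raw), 3).tolist())
# [[0.0, 0.881], [2.998, 5.298]]
```

Small values stay roughly linear while large values are compressed roughly logarithmically, which is why this transform suits count-like CyTOF intensities.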

With BYOD, you have much more flexibility:

```python
>>> from PyCytoData import FileIO

>>> byod = FileIO.load_delim(files="/path", # Path to file
...                          col_names=True, # Whether the first row is feature (column) names
...                          delim="\t" # Delimiter
...                          )
>>> byod.lineage_channels = ["CD4", "CD8", "FoxP3", "CD15"]
>>> byod.preprocess(arcsinh=True,
...                 gate_debris_removal=True,
...                 gate_intact_cells=True,
...                 gate_live_cells=True,
...                 gate_center_offset_residual=True,
...                 bead_normalization=True)

>>> byod.expression_matrix # This is preprocessed
```
As the example shows, ``preprocess`` supports several steps: the arcsinh transformation, four gating steps, and bead normalization. You can use any subset of them to suit your needs. By default, we automatically detect the necessary channels, such as "Bead1" or "Center". However, if your channels are named unconventionally, the auto-detection may fail, so you can specify them manually:

```python
>>> byod.preprocess(arcsinh=True,
...                 gate_debris_removal=True,
...                 gate_intact_cells=True,
...                 gate_live_cells=True,
...                 gate_center_offset_residual=True,
...                 bead_normalization=True,
...                 bead_channels = ["1bead", "2bead"],
...                 time_channel = ["clock"])
```
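
To see why unconventional names can defeat name-based detection, consider a simple pattern-matching rule. The sketch below is a hypothetical detector, not ``PyCytoData``'s actual auto-detection algorithm: it recognizes channels named like "Bead1" but misses "1bead", which is exactly the case where passing ``bead_channels`` manually helps:

```python
import re


def detect_bead_channels(features):
    """Hypothetical name-based rule: match channels named like 'Bead1' (any case)."""
    pattern = re.compile(r"^bead\d*$", re.IGNORECASE)
    return [name for name in features if pattern.match(name)]


print(detect_bead_channels(["CD4", "Bead1", "Bead2", "Center"]))  # ['Bead1', 'Bead2']
print(detect_bead_channels(["CD4", "1bead", "2bead", "clock"]))   # []
```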

## Dimension Reduction

If you have ``CytofDR`` installed, you can also run dimension reduction (DR) on your dataset (this assumes you have already loaded and preprocessed it):

```python
>>> exprs.run_dr_methods(methods = ["PCA", "UMAP", "ICA"])
Running PCA
Running ICA
Running UMAP
>>> type(exprs.reductions)
<class 'CytofDR.dr.Reductions'>
```
The ``reductions`` attribute is a ``Reductions`` object from ``CytofDR``. You can perform all downstream DR workflows as usual.
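
For intuition about what each reduction contains: a DR method maps the cells-by-features expression matrix to a low-dimensional embedding with one row per cell. A minimal NumPy sketch of PCA via SVD on synthetic data; this is a conceptual illustration, not ``CytofDR``'s implementation, and ``pca_embedding`` is a hypothetical helper:

```python
import numpy as np


def pca_embedding(expression_matrix, n_components=2):
    """Project cells onto the top principal components via SVD (conceptual sketch)."""
    X = np.asarray(expression_matrix, dtype=float)
    X_centered = X - X.mean(axis=0)  # center each feature (column)
    # Rows of vt are principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T  # cells x n_components


rng = np.random.default_rng(0)
exprs_matrix = rng.normal(size=(100, 10))  # synthetic: 100 cells, 10 markers
embedding = pca_embedding(exprs_matrix)
print(embedding.shape)  # (100, 2)
```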

## Datasets Supported

We only support the following datasets as of now. The *Literal* is the string used in this package to refer to each dataset, whereas the *Dataset Name* is the name by which the dataset is more commonly known.
@@ -146,29 +156,59 @@ More datasets will be added in the future to be fully compatible with HDCytoData

## Documentation

For detailed documentation along with tutorials and the API reference, please visit our [Official Documentation](https://pycytodata.readthedocs.io/en/latest/), which is updated automatically.

If you prefer to build the documentation on your own, refer to [this guide](https://pycytodata.readthedocs.io/en/latest/change/build.html) for more details.

## Latest Release: 0.0.1

This is our latest pre-release with the following release notes:

- This is the first official prerelease of the ``PyCytoData`` package.
- We have proper support for the following workflows:
    - Downloading data
    - Using PyCytoData as a CyTOF data analysis pipeline
    - FileIO
    - CyTOF DR Integration
- Releases on PyPI and conda

### Known Issue

There is a potential compatibility issue with ``CytofDR`` on ``conda``. If a problem occurs, try installing with ``pip`` instead.

## References

If you use ``PyCytoData`` to perform DR, citing [our DR review paper](https://doi.org/10.1101/2022.04.26.489549) is highly appreciated:

```
@article {Wang2022.04.26.489549,
author = {Wang, Kaiwen and Yang, Yuqiu and Wu, Fangjiang and Song, Bing and Wang, Xinlei and Wang, Tao},
title = {Comparative Analysis of Dimension Reduction Methods for Cytometry by Time-of-Flight Data},
elocation-id = {2022.04.26.489549},
year = {2022},
doi = {10.1101/2022.04.26.489549},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2022/06/02/2022.04.26.489549},
eprint = {https://www.biorxiv.org/content/early/2022/06/02/2022.04.26.489549.full.pdf},
journal = {bioRxiv}
}
```

If you use ``Cytomulate`` with this package, please cite [our paper](https://doi.org/10.1101/2022.06.14.496200):

```
@article {Yang2022.06.14.496200,
author = {Yang, Yuqiu and Wang, Kaiwen and Lu, Zeyu and Wang, Tao and Wang, Xinlei},
title = {Cytomulate: Accurate and Efficient Simulation of CyTOF data},
elocation-id = {2022.06.14.496200},
year = {2022},
doi = {10.1101/2022.06.14.496200},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2022/06/16/2022.06.14.496200},
eprint = {https://www.biorxiv.org/content/early/2022/06/16/2022.06.14.496200.full.pdf},
journal = {bioRxiv}
}
```

If you use the built-in datasets, please visit our [Reference Page](https://pycytodata.readthedocs.io/en/latest/references.html) and cite the papers accordingly.
19 changes: 12 additions & 7 deletions docs/source/change/index.rst
@@ -11,14 +11,19 @@ Latest Release

v0.0.1
********

- This is the first official prerelease of the ``PyCytoData`` package.
- We have proper support for the following workflows:

  - Downloading data
  - Using PyCytoData as a CyTOF data analysis pipeline
  - FileIO
  - CyTOF DR Integration

- Releases on PyPI and conda

.. warning::

    There is a potential compatibility issue with ``CytofDR`` on ``conda``. If a problem occurs, try
    installing with ``pip`` instead.


.. toctree::
27 changes: 18 additions & 9 deletions docs/source/change/releases.rst
@@ -1,13 +1,22 @@
##########
Releases
##########

------------------

********
v0.0.1
********

- This is the first official prerelease of the ``PyCytoData`` package.
- We have proper support for the following workflows:

  - Downloading data
  - Using PyCytoData as a CyTOF data analysis pipeline
  - FileIO
  - CyTOF DR Integration

- Releases on PyPI and conda

.. warning::

    There is a potential compatibility issue with ``CytofDR`` on ``conda``. If a problem occurs, try
    installing with ``pip`` instead.
19 changes: 15 additions & 4 deletions docs/source/installation.rst
@@ -12,18 +12,29 @@ if you prefer. Just follow the instructions below and you are good to go!
Conda
***********

You can install our package from ``conda``:

.. code-block::

    conda install pycytodata -c kevin931 -c bioconda

Our ``conda`` package is published `here <https://anaconda.org/kevin931/pycytodata>`_.

----------------

***********
PyPI
***********

You can also install our package from ``PyPI``:

.. code-block::

    pip install PyCytoData

Our ``PyPI`` package is published `on this page <https://pypi.org/project/PyCytoData/>`_.

----------------

*************
Dependencies