Cell differentiation mapping from lineage tracing data using Carta. (a) The input to Carta is one or more cell lineage trees, whose leaves are labeled by the measured cell type (labeled A, B, C) of the sequenced cells. Typically, some cells that are present at the time of the experiment are not sampled (denoted by dotted lines). (b) Carta infers a cell differentiation map that describes the progenitor cell types – represented as a potency set – and cell type transitions that occurred during development. (c) Carta quantifies the trade-off between the number k of progenitor cell types in the cell differentiation map and its discrepancy with the cell lineage trees by computing the Pareto front of optimal solutions. A cell differentiation map with the optimal number k* of progenitors is chosen by identifying an elbow of the Pareto front.
- numpy
- pandas
- gurobipy
- networkx
- cassiopeia
- ete3
- (for generating simulation and real data instances) snakemake (>=5.2.0)
-
Create a new conda environment named "carta" and install dependencies:
conda create -n carta python==3.10.13 pip
-
Then activate the created environment:
conda activate carta
-
Clone the git repository
git clone [email protected]:raphael-group/CARTA.git
-
Use
pip
to install the package:cd carta pip install .
-
Create a new conda environment named "carta" and install dependencies:
conda create -n carta python==3.10.13 pip
-
Then activate the created environment:
conda activate carta
-
Install carta-cell from pip:
pip install carta-cell==1.0
-
Install cassiopeia:
pip install git+https://github.com/YosefLab/Cassiopeia@master#egg=cassiopeia-lineage
The input for CARTA is
- A tab-delimited file, which has on each line the locations of the newick and state annotation files of the set of cell lineage trees over which to infer the cell differentiation map.
- Example for location file
test/test_locations.txt
- Example for newick file
test/concat_TLS.nwk
- Example for state annotation file
test/concat_TLS_meta_reduced.txt
- Example for location file
- A file containing all terminal cell types must be provided, with each cell type on its own line.
- Example for states file:
test/TLS_states.txt
- Example for states file:
- [Optional] A file with list of putative progenitors in csv format encoding a binary matrix, where each row is progenitor and column is a cell type.
- Example for putative progenitors file:
test/progen_mat.csv
- Example for putative progenitors file:
usage: carta [-h] [--prefix PREFIX] [-ks [KLIST ...]] [--file_locations FILE_LOCATIONS] [--states_file STATES_FILE]
[--normalize_method {no_normalization,cell_proportion_before_pruning,cell_proportion_after_pruning}]
[--time_limit_sec TIME_LIMIT_SEC] [--enforce_tree] [-m {exact,round}] [--progen_matrix PROGEN_MATRIX]
options:
-h, --help show this help message and exit
--prefix PREFIX filepath prefix for this run
-ks [KLIST ...], --klist [KLIST ...]
list of number of progenitors
--file_locations FILE_LOCATIONS
a file that on each line specifies the filepath of a newick file and a corresponding state metadata file in
that order, seperated by a tab
--states_file STATES_FILE
a file containing the cell states present in this dataset
--normalize_method {no_normalization,cell_proportion_before_pruning,cell_proportion_after_pruning}
the scheme by which to normalize the objective function
--time_limit_sec TIME_LIMIT_SEC
time limit on ilp runs
--enforce_tree whether or not to enforce that the progenitors can create a tree
-m {exact,round}, --mode {exact,round}
mode for heuristic progenitor set solver
--progen_matrix PROGEN_MATRIX
- Without putative progenitors
- Progenitors file: provides the set of selected progenitors in the cell differentiation map. Each line is a progenitor.
- Example:
test/test_Carta_progenitors_3.txt
- Example:
- Node labeling file: tab-separated file where each row descrbeis the progenitor labeling of a node in a cell linage tree.
- Example:
test/test_Carta_nodeLabels_3.txt
- Example:
- Progenitors file: provides the set of selected progenitors in the cell differentiation map. Each line is a progenitor.
- With putative progenitors
- Progenitors file: provides the set of selected progenitors in the cell differentiation map. Each line is a progenitor.
- Example:
test/test_fastCarta_progenitors_3.txt
- Example:
- Edge labeling file: tab-separated file where each row describes the progenitor labeling of source and target node of an each in a cell lineage tree.
- Progenitors file: provides the set of selected progenitors in the cell differentiation map. Each line is a progenitor.
An example of usage is as follows. This command can be run from the directory that contains this README file.
carta --file_locations test/test_locations.txt --states_file test/TLS_states.txt --klist 1 2 3 --prefix test/test_Carta
An example while providing a set of putative progenitors as input.
carta --file_locations test/test_locations.txt --states_file test/TLS_states.txt --klist 1 2 3 --prefix test/test_fastCarta --progen_matrix test/progen_mat.csv
The newick files for the TLS cell lineage trees are stored in data/TLS/input_trees
and the metadata files containing the cell type annotations are stored in data/TLS/formatted_and_reduced_labels
.
The input files for mouse hematopoiesis data are stored in data/larry/input
. The format of this data is slightly different and we use the code scripts/larry.py
which implements the CARTA-tree ILP that can take the mouse hematopoiesis data as input. The raw mouse hematopoiesis data is availabe at https://github.com/AllonKleinLab/paper-data/tree/master/Lineage_tracing_on_transcriptional_landscapes_links_state_to_fate_during_differentiation
The Snakefiles to generate the simulations and run cell differentiation mapping methods (including Carta) are provided in data/simulations
.