[{"path":"index.html","id":"about","chapter":"1 About","heading":"1 About","text":"This user guide describes how to perform cell-type classification as part of the single-cell\neQTLGen consortium. The source code of the project can be\naccessed at https://github.com/sc-eQTLgen-consortium/WG2-pipeline-classification. The current version and state of the singularity image at the time of generating this documentation correspond to commit 9ae2a75. For inquiries please contact: Jose Alquicira Hernandez ([email protected]) or Lieke Michielsen ([email protected])","code":""},{"path":"general.html","id":"general","chapter":"2 General","heading":"2 General","text":"The cell-type annotation framework implemented in this project consists of supervised classification using two approaches: Azimuth: a method implemented in Seurat V4 that classifies cells by mapping the query dataset onto a supervised PCA space constructed from modal-weighted neighbors calculated by combining RNA and protein data. Hierarchical scPred: an approach based on two methods:\nscPred: a supervised classification method that learns a low-dimensional representation of the reference dataset. 
Unlike Azimuth (which relies on a lazy classifier), scPred trains binary classifiers for each cell type using a one-vs.-all approach. The query dataset is projected onto the training low-dimensional space and labels are assigned according to probability scores. For details see https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1862-5\nHierarchical progressive learning (scHPL): a method that decomposes the classification problem into smaller sub-tasks according to a hierarchy representing the cell-type relationships. Cells are classified progressively using a top-to-bottom approach instead of a flat classifier. For details see https://www.nature.com/articles/s41467-021-23196-8\nThe cell-type classification is guided using a CITE-seq reference containing: 161,764 human PBMCs, 20,953 genes, 224 protein markers (CITE-seq), and 31 cell types. “This PBMC reference dataset was generated as part of the Hao\net al, Cell 2021 paper. It is comprised of data from eight volunteers\nenrolled in an HIV vaccine trial, with three time point samples\ntaken at day 0, 3, and 7 following vaccination. All 24 samples were\nprocessed with a CITE-seq panel of 228 TotalSeq antibodies to\ngenerate single-cell RNA and ADT data. The data were integrated\nusing the methodology described in the pre-print linked above to generate a\nweighted nearest neighbor (WNN) representation of the RNA and protein\ndata jointly. 
This WNN representation is then used in the Azimuth app to\nassign cell types, embed the reference in a UMAP, and impute protein\nlevels for the query dataset”. Hao Y, et al. Cell. (2021)","code":""},{"path":"getting-started.html","id":"getting-started","chapter":"3 Getting started","heading":"3 Getting started","text":"","code":""},{"path":"getting-started.html","id":"downloading-singularity-image","chapter":"3 Getting started","heading":"3.1 Downloading singularity image","text":"The core scripts and data necessary to perform cell-type classification are\navailable in a singularity container on Dropbox: https://www.dropbox.com/sh/ekvdocei8r45jxq/AACA17z7PFNbVkeuSavFvjPNa?dl=0","code":""},{"path":"getting-started.html","id":"cell-type-classification-workflow","chapter":"3 Getting started","heading":"3.2 Cell type classification workflow","text":"The algorithm to perform cell type classification follows a split-map-reduce approach.\nIf multiple experimental units are contained in a Seurat object (e.g. pool, batch, or\ndataset), the data is first split into multiple objects. Cell type classification is then\nperformed individually for each batch and finally the results are aggregated into a\nsingle Seurat object. 
The following figure shows a general description of the process: Overview of the cell type classification process. The description of each R script included in the previous figure is given in the\nnext section.","code":""},{"path":"getting-started.html","id":"singularity-image-components","chapter":"3 Getting started","heading":"3.3 Singularity image components","text":"The singularity image cell_classification.sif contains the following scripts: split_object.R: splits a Seurat data object into multiple objects according to the\nlevels of a column in the metadata. azimuth.R: performs cell type classification using the azimuth approach. hier_scpred.R: applies hierarchical scPred to classify cells. reduce.R: after the initial data split, this script aggregates the\nresults into a single Seurat object. compare_results.R: once azimuth and hierarchical scPred have been run, this\nscript creates a heatmap and a long-format contingency matrix containing the\nclassification concordance between the two methods. The previous scripts share the following parameters: --file: input RDS object or directory. --out: base filename for the output files. --path: output directory to store the results; if not specified, the\ncurrent directory is used to store the output(s). The image also contains the following reference files: pbmc_multimodal.h5seurat: used for azimuth classification. hier_scpred.RDS: hierarchical trained model for scPred classification. To call the components within the container, we commonly use the following syntax. However, to make a given directory visible to the container, use the\n-B flag. The previous command will make the current directory visible. If multiple paths need to be visible (e.g. the input and output directories are not\ncontained in the current directory), please see https://sylabs.io/guides/3.0/user-guide/bind_paths_and_mounts.html","code":"# do not run\nsingularity exec cell_classification.sif [COMMAND]\n# do not run\nsingularity exec -B $PWD cell_classification.sif [COMMAND]"},{"path":"split-data.html","id":"split-data","chapter":"4 Split data","heading":"4 Split data","text":"For data integration and cell type classification purposes, it is recommended to\nsplit the data into smaller subsets where each reflects a single homogeneous experimental\nunit. 
In the context of cell type classification, we define an experimental unit as a\ngroup of cells sharing similar biological and technical variance components. A\nbatch (e.g. a pool/well in a single-cell experiment) represents a common\nexperimental unit. Therefore, we recommend splitting the data for: Faster classification/integration. Splitting the data avoids an extensive search of\ncell anchors and neighbors in the azimuth approach. Parallelization. Each experimental unit can be processed and analyzed independently\nusing multiple CPU workers. Avoiding intra-batch effects in the query dataset. Cell differences driven by\nbatch effects within the query dataset may influence the performance of the integration/alignment. We recommend splitting the data by experimental units (e.g. batches). If the Seurat\nquery object represents a single experimental unit, splitting is not necessary and this\nsection can be skipped. To split the data, run the following code. First, let’s create the output directory. Let’s assume a Seurat RDS object called query.RDS with a metadata column\ncalled pool containing the values 1 and 2 corresponding to two independent pools. We can split the data by pool using the split.R module. Let’s examine the parameter information: We can now split the data as follows: split.R takes two main arguments. --file corresponds to the seurat object\nfilename in .RDS format. --batch expects a string corresponding to a column name in the\n@metadata slot. One RDS file per group following the *{out}_{batch levels}.RDS*\nfilename pattern will be created. 
For example, two files called query_1.RDS and query_2.RDS will be\nstored in the step1_split directory. We strongly recommend storing results in different directories to avoid file\noverwriting.","code":"mkdir step1_split\nsingularity exec -B $PWD cell_classification.sif \\\n Rscript /split.R --help\nUsage: split.R [options]\n\n\nOptions:\n --file=CHARACTER\n RDS object file name\n\n --batch=CHARACTER\n Batch column to split the data\n\n --out=CHARACTER\n Output file name [default= split_object]\n\n --path=CHARACTER\n Output path to store results [default= .]\n\n -h, --help\n Show this help message and exit\nsingularity exec -B $PWD cell_classification.sif \\\n Rscript /split.R \\\n --file query.RDS \\\n --batch pool \\\n --out query \\\n --path step1_split"},{"path":"azimuth-classification.html","id":"azimuth-classification","chapter":"5 Azimuth classification","heading":"5 Azimuth classification","text":"Cell type classification using the azimuth method can be performed using\nmap_azimuth.R. Let’s create an output directory to store the results. And check the input arguments: Further data splitting for classification can be performed within map_azimuth.R;\nhowever, here we’ll classify cells from the partitions already created. Although the batch parameter is provided as an option to perform cell type\nclassification within map_azimuth.R, we encourage users to first split the data by batch using split.R and run sequential or parallel jobs per batch as\nshown, to reduce RAM memory usage and computation time. If batch is used, the future.globals.maxSize parameter of the future package can be manually changed via the --mem argument when running map_azimuth.R. By default, this value is set to infinity to allow complete use of the memory allocated per CPU. However, users may experience issues depending on their computing infrastructure, and the value can therefore be changed under such circumstances. 
The --mem value must be specified in GB. We can classify each batch within a loop using map_azimuth.R as follows: In this example, the output is stored in step2_azimuth and includes a\nSeurat object containing: Cell type classification (predicted.celltype.l2) + prediction scores\n(predicted.celltype.l2.scores) stored in the metadata. Reference-based reductions:\nazimuth_spca: supervised PCA.\nazimuth_umap: UMAP generated using the WNN graph.\nA new assay called predicted_ADT containing imputed protein data based on the\nRNA. Additionally, plots of the azimuth_spca and azimuth_umap reductions are included in the\noutputs for exploratory data analysis: the PCA embeddings projected onto the reference supervised PCA, and the query dataset projected onto the reference UMAP.","code":"mkdir step2_azimuth\nsingularity exec -B $PWD cell_classification.sif \\\n Rscript /map_azimuth.R --help\nUsage: map_azimuth.R [options]\n\n\nOptions:\n --file=CHARACTER\n RDS object file name\n\n --batch=CHARACTER\n Batch column. 
If provided, each group in the batch column is mapped to the reference independently\n\n --plan=CHARACTER\n Strategy to resolve future [default= sequential]:\n multisession\n multicore\n cluster\n remote\n transparent\n\n --workers=NUMERIC\n Number of workers used for parallelization\n [default= 1]\n\n --mem=NUMERIC\n Maximum allowed total size (in GB) of global variables identified\n [default= Inf]\n\n --out=CHARACTER\n Output file name [default= azimuth]\n\n --path=CHARACTER\n Output path to store results [default= .]\n\n -h, --help\n Show this help message and exit\nfor i in $(ls step1_split);\ndo\n out=$(echo $i | awk 'gsub(\".RDS\", \"\")') # Use same base filename as output\n singularity exec -B $PWD cell_classification.sif \\\n Rscript /map_azimuth.R \\\n --file step1_split/${i} \\\n --path step2_azimuth \\\n --out ${out}\ndone"},{"path":"azimuth-classification.html","id":"parallelize-classification","chapter":"5 Azimuth classification","heading":"5.1 Parallelize classification","text":"","code":""},{"path":"azimuth-classification.html","id":"sge-example","chapter":"5 Azimuth classification","heading":"5.2 SGE example","text":"The following array job code for SGE (Sun Grid Engine) can be used as a guide to\nclassify each pool in individual jobs. This code snippet was used to\nclassify multiple Seurat objects (75 pools) from the OneK1K dataset. After saving the previous code in a file (e.g. 
run_azimuth.sh), we can launch an\narray job iterating over each pool name","code":"#$ -N classify_cells\n#$ -q short.q\n#$ -l mem_requested=50G\n#$ -S /bin/bash\n#$ -r yes\n#$ -cwd \n#$ -o results/2021-10-28_cell_type_annotation\n#$ -e results/2021-10-28_cell_type_annotation\n\n# mkdir results/2021-10-28_cell_type_annotation\n\ncd $SGE_O_WORKDIR\n\n# Set environmental variables\ninput=results/2021-10-28_pools\noutput=results/2021-10-28_cell_type_annotation\n\n# Get job info\necho \"JOB: $JOB_ID TASK: $SGE_TASK_ID HOSTNAME: $HOSTNAME\"\n\n# Get basefile name\nfiles=($(ls ${input} | grep \".RDS\"))\ni=\"$(($SGE_TASK_ID-1))\"\n\nfilename=${files[$i]}\nfilename=$(echo $filename | sed 's/.RDS//')\n\necho \"Classifying: $filename\"\n\n# Run main command\nsingularity exec -B $SGE_O_WORKDIR bin/cell_classification.sif \\\n Rscript /map_azimuth.R \\\n --file ${input}/${filename}.RDS \\\n --out ${filename}_out \\\n --path ${output}\nqsub -t 1-75 bin/run_azimuth.sh\n# -t Vector of length equal to the number of pools (.RDS files)"},{"path":"azimuth-classification.html","id":"slurm-example","chapter":"5 Azimuth classification","heading":"5.3 SLURM example","text":"Likewise, we can run the code using the SLURM scheduler as follows: After saving the previous code in a file (e.g. 
run_azimuth.sbatch), we can launch an\narray job iterating over each batch","code":"#!/bin/bash\n#SBATCH -J azimuth\n#SBATCH -N 1\n#SBATCH -n 1\n#SBATCH --time=0:30:00 \n#SBATCH --mem=40GB\n#SBATCH --error=job.%J.err\n#SBATCH --output=job.%J.out\n#SBATCH --mail-type=END,FAIL\n#SBATCH [email protected]\n\n# Clear the environment from any previously loaded modules\nmodule purge \nmodule add container/singularity/3.7.3/gcc.8.3.1\n\n# Set environmental variables\ninput=DataGroningen/step1_split\noutput=DataGroningen/output_Azimuth\n\n# Get job info\necho \"Starting at `date`\"\necho \"JOB: $SLURM_JOB_ID TASK: $SLURM_ARRAY_TASK_ID\"\n\n# Get basefile name\nfiles=($(ls ${input} | grep \".RDS\"))\ni=\"$SLURM_ARRAY_TASK_ID\"\n\nfilename=${files[$i]}\nfilename=$(echo $filename | sed 's/.RDS//')\n\necho \"Classifying: $filename\"\n\n# Run main command\nsingularity exec -B $PWD cell_classification.sif \\\n Rscript /map_azimuth.R \\\n --file ${input}/${filename}.RDS \\\n --batch lane \\\n --out ${filename}_out \\\n --path ${output}\nsbatch -a 0-30 run_azimuth.sbatch\n# -a Vector of length equal to the number of .RDS files"},{"path":"hierarchical-scpred-classification.html","id":"hierarchical-scpred-classification","chapter":"6 Hierarchical scPred classification","heading":"6 Hierarchical scPred classification","text":"Cell type classification using the Hierarchical scPred method can be performed using the\nmap_hierscpred.R module. Note: if both azimuth and hierarchical scPred are used for cell type annotation, please\nuse the output RDS files generated by map_azimuth.R as input for map_hierscpred.R\n(or vice versa) to guarantee that the classification labels are appended to the same files and that the\nSeurat object is not unnecessarily duplicated. This is also a requirement for the\nremaining part of the pipeline. Let’s create an output directory to store the results. And check the input arguments: Further data splitting for classification can be performed within map_hierscpred.R\nvia the future package; however, here we’ll classify cells from the partitions\nalready created. Let’s assume we have already classified cells using the azimuth approach. 
We can use the output Seurat .RDS files as input for hierarchical scPred as follows: Similar to map_azimuth.R, map_hierscpred.R will return a Seurat object\nincluding: Cell type classification (scpred_prediction) as a column in the metadata","code":"mkdir step3_hierscpred\nsingularity exec -B $PWD cell_classification.sif \\\n Rscript /map_hierscpred.R --help\nUsage: map_hierscpred.R [options]\n\n\nOptions:\n --file=CHARACTER\n RDS object file name\n\n --batch=CHARACTER\n Batch column. If provided, each group in the batch column is mapped to the reference independently\n\n --thr=CHARACTER\n Threshold for rejection. By default no rejection is implemented\n\n --iter=CHARACTER\n Maximum number of Harmony iterations\n\n --plan=CHARACTER\n Strategy to resolve future [default= sequential]:\n multisession\n multicore\n cluster\n remote\n transparent\n\n --workers=NUMERIC\n Number of workers used for parallelization\n [default= 1]\n\n --mem=NUMERIC\n Maximum allowed total size (in GB) of global variables identified\n [default= Inf]\n\n --out=CHARACTER\n Output file name [default= hier_scpred]\n\n --path=CHARACTER\n Output path to store results [default= .]\n\n -h, --help\n Show this help message and exit\nfor i in $(ls step2_azimuth | grep \".RDS\");\ndo\n out=$(echo $i | awk 'gsub(\".RDS\", \"\")')\n singularity exec -B $PWD cell_classification.sif \\\n Rscript /map_hierscpred.R --file step2_azimuth/${i} --path step3_hierscpred --out ${out}\ndone"},{"path":"hierarchical-scpred-classification.html","id":"parallelize-classification-1","chapter":"6 Hierarchical scPred classification","heading":"6.1 Parallelize classification","text":"","code":""},{"path":"hierarchical-scpred-classification.html","id":"sge-example-1","chapter":"6 Hierarchical scPred classification","heading":"6.1.1 SGE example","text":"The following array job code for SGE (Sun Grid Engine) can be used as a guide to\nclassify each pool in individual jobs. This code snippet was used to\nclassify multiple Seurat objects (75 pools) from the OneK1K dataset. After saving the previous code in a file (e.g. 
run_hierscpred.sh), we can launch an\narray job iterating over each pool name","code":"#$ -N classify_cells\n#$ -q short.q\n#$ -l mem_requested=50G\n#$ -S /bin/bash\n#$ -r yes\n#$ -cwd \n#$ -o results/2021-11-19_hier_scpred\n#$ -e results/2021-11-19_hier_scpred\n\n# mkdir results/2021-11-19_hier_scpred\n\ncd $SGE_O_WORKDIR\n\n# Set environmental variables\ninput=results/2021-10-28_pools\noutput=results/2021-11-19_hier_scpred\n\n# Get job info\necho \"JOB: $JOB_ID TASK: $SGE_TASK_ID HOSTNAME: $HOSTNAME\"\n\n# Get basefile name\nfiles=($(ls ${input} | grep \".RDS\"))\ni=\"$(($SGE_TASK_ID-1))\"\n\nfilename=${files[$i]}\nfilename=$(echo $filename | sed 's/.RDS//')\n\necho \"Running for: $filename\"\n\n# Run main command\nsingularity exec -B $SGE_O_WORKDIR bin/cell_classification.sif \\\n Rscript /map_hierscpred.R \\\n --file ${input}/${filename}.RDS \\\n --out ${filename}_out \\\n --path ${output}\nqsub -t 1-75 bin/run_hierscpred.sh\n# -t Vector of length equal to the number of pools (.RDS files)"},{"path":"hierarchical-scpred-classification.html","id":"slurm","chapter":"6 Hierarchical scPred classification","heading":"6.1.2 SLURM","text":"Likewise, we can run the code using the SLURM scheduler as follows: After saving the previous code in a file (e.g. 
run_hierscpred.sbatch), we can launch an\narray job iterating over each batch","code":"#!/bin/bash\n#SBATCH -J hierscpred\n#SBATCH -N 1\n#SBATCH -n 1\n#SBATCH --time=1:00:00 \n#SBATCH --mem=10GB\n#SBATCH --error=job.%J.err\n#SBATCH --output=job.%J.out\n#SBATCH --mail-type=END,FAIL\n#SBATCH [email protected]\n\n# Clear the environment from any previously loaded modules\nmodule purge \nmodule add container/singularity/3.7.3/gcc.8.3.1\n\n# Set environmental variables\ninput=DataGroningen/output_Azimuth\noutput=DataGroningen/output_HierscPred\n\n# Get job info\necho \"Starting at `date`\"\necho \"JOB: $SLURM_JOB_ID TASK: $SLURM_ARRAY_TASK_ID\"\n\n# Get basefile name\nfiles=($(ls ${input} | grep \".RDS\"))\ni=\"$SLURM_ARRAY_TASK_ID\"\n\nfilename=${files[$i]}\nfilename=$(echo $filename | sed 's/.RDS//')\n\necho \"Classifying: $filename\"\n\n# Run main command\nsingularity exec -B $PWD cell_classification.sif \\\n Rscript /map_hierscpred.R \\\n --file ${input}/${filename}.RDS \\\n --out ${filename}_out \\\n --path ${output}\nsbatch -a 0-30 run_hierscpred.sbatch\n# -a Vector of length equal to the number of .RDS files"},{"path":"merge-data.html","id":"merge-data","chapter":"7 Merge data","heading":"7 Merge data","text":"Once cell type classification has been performed across experimental units (e.g. batches),\nthe data can be merged into a single Seurat object using the reduce.R script. Let’s create an output directory to store the results. And check the input arguments: Let’s assume we have a directory called step3_hierscpred containing\nmultiple .RDS files corresponding to different batches. 
We can\nmerge the Seurat objects as follows: reduce.R will output a single Seurat object called reduced_data.RDS within the\nstep4_reduce directory.","code":"mkdir step4_reduce\nsingularity exec -B $PWD cell_classification.sif \\\n Rscript /reduce.R --help\nsingularity exec -B $PWD cell_classification.sif \\\n Rscript /reduce.R \\\n --file step3_hierscpred \\\n --out reduced_data \\\n --path step4_reduce"},{"path":"compare-classifications.html","id":"compare-classifications","chapter":"8 Compare classifications","heading":"8 Compare classifications","text":"If two methods were used to classify cells (e.g. azimuth and hierarchical scPred),\nwe can plot a contingency table showing the concordance of each cell type between the two methods. Let’s create an output directory. And check the input arguments: Let’s assume we have a directory called step4_reduce with a single .RDS file in which\ncells were classified by both azimuth and hierarchical scPred. We can generate\nheatmaps showing the counts and proportions of cell type concordance between the\ntwo methods, as well as a text file with the numeric results, as follows: compare.R will output two heatmap plots corresponding to the counts and proportions of\ncell type concordance between the two methods, stored in the step5_compare\ndirectory.","code":"mkdir step5_compare\nsingularity exec -B $PWD cell_classification.sif \\\n Rscript /compare.R --help\nUsage: compare.R [options]\n\n\nOptions:\n --file=CHARACTER\n RDS object file name\n\n --xaxis=CHARACTER\n Column in metadata\n\n --yaxis=CHARACTER\n Column in metadata\n\n --sort=CHARACTER\n Sort labels in both axes to match cell type hierarchy?\n\n --out=CHARACTER\n Output file name [default= comp]\n\n --path=CHARACTER\n Output path to store results [default= .]\n\n -h, --help\n Show this help message and exit\nsingularity exec -B $PWD cell_classification.sif \\\n Rscript /compare.R \\\n --file step4_reduce/reduced_data.RDS \\\n --out comparison \\\n --path step5_compare"}]
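As a compact illustration of the naming convention used by the array-job scripts in this guide (an input `pool_1.RDS` becomes output base name `pool_1_out`), the base-name derivation can be sketched in plain bash. The pool filenames below are hypothetical examples, and `${f%.RDS}` is an equivalent pure-bash alternative to the `sed 's/.RDS//'` call used in the scripts.

```shell
# Derive the --out base name for each input object, mirroring the
# sed 's/.RDS//' step in the SGE/SLURM array scripts.
# The pool filenames here are hypothetical examples.
files=(pool_1.RDS pool_2.RDS pool_3.RDS)

for f in "${files[@]}"; do
  filename="${f%.RDS}"   # strip the .RDS suffix (pure-bash alternative to sed)
  echo "--file ${f} --out ${filename}_out"
done
# prints:
# --file pool_1.RDS --out pool_1_out
# --file pool_2.RDS --out pool_2_out
# --file pool_3.RDS --out pool_3_out
```

In an SGE or SLURM array job, the same derivation is applied to a single element of the `files` array selected by the task ID, so each task classifies exactly one pool.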