Extract data from selected samples or cells from a BAM file
The following examples assume sample names are encoded in the SM tag present in each bam/sam record.
In this example we will extract the reads of two selected cells from input.bam and write them to cell314and315.bam.
First prepare a file containing the names of the samples/cells you wish to extract from the input bam file:
my_experiment_cell_314
my_experiment_cell_315
Store this list of sample names in a text file, for example extract_test.txt.
Then run:
bamExtractSamples.py input.bam extract_test.txt -o cell314and315.bam
This extracts the reads belonging to samples my_experiment_cell_314 and my_experiment_cell_315 from the bam file input.bam and writes them to cell314and315.bam.
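The selection step can be pictured with a small pure-Python sketch over SAM text records. It illustrates the idea only and is not the implementation of bamExtractSamples.py. The helper names sample_of, read_sample_list and extract_records are hypothetical, and the sketch assumes the SM value is stored as an optional field of the form SM:Z:<name>, as in the SAM text format.

```python
def sample_of(sam_line):
    """Return the SM tag value of one SAM text record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == "SM":
            return parts[2]
    return None


def read_sample_list(text):
    """Parse a one-name-per-line sample list, skipping blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def extract_records(sam_lines, wanted):
    """Yield header lines plus the records whose SM tag is in `wanted`."""
    for line in sam_lines:
        if line.startswith("@") or sample_of(line) in wanted:
            yield line


# Made-up records, for illustration only.
records = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\tSM:Z:my_experiment_cell_314",
    "read2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tFFFF\tSM:Z:my_experiment_cell_001",
    "read3\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tFFFF\tSM:Z:my_experiment_cell_315",
]
wanted = read_sample_list("my_experiment_cell_314\nmy_experiment_cell_315\n")
kept = list(extract_records(records, wanted))
```

Here kept holds the header line plus read1 and read3. Reads without an SM tag are dropped, because sample_of returns None for them.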
To extract multiple groups of cells in one go, add a second column to the extraction text file. In this example we extract cells 20 to 25 to one bam file (case sample) and cells 26 to 30 to another (control sample).
extract_test.txt looks like this:
cell_20 CASE
cell_21 CASE
cell_22 CASE
cell_23 CASE
cell_24 CASE
cell_25 CASE
cell_26 CONTROL
cell_27 CONTROL
cell_28 CONTROL
cell_29 CONTROL
cell_30 CONTROL
We then run
bamExtractSamples.py input.bam extract_test.txt -o output_.bam
This reads input.bam and writes the reads from the samples listed in extract_test.txt to two bam files: output_CASE.bam and output_CONTROL.bam. If required, you can extract more than two groups at once by specifying more group names in the second column.
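The grouping can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's own code: the function names read_groups and output_path are hypothetical, the columns are assumed to be separated by whitespace, and the naming rule (group name inserted before the file extension) is inferred only from the output_.bam example above.

```python
import os


def read_groups(text):
    """Map each sample name to its group; None when no second column is given."""
    groups = {}
    for line in text.splitlines():
        parts = line.split()  # assumption: whitespace-separated columns
        if not parts:
            continue
        groups[parts[0]] = parts[1] if len(parts) > 1 else None
    return groups


def output_path(template, group):
    """Insert the group name before the extension of the -o template."""
    stem, ext = os.path.splitext(template)
    return f"{stem}{group}{ext}"


listing = "cell_20 CASE\ncell_21 CASE\ncell_26 CONTROL\n"
groups = read_groups(listing)
# One output file per distinct group name.
targets = {g: output_path("output_.bam", g) for g in set(groups.values())}
```

With this listing, targets maps CASE to output_CASE.bam and CONTROL to output_CONTROL.bam. Adding a third group name in the second column would simply add a third entry.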
To split the bam file by the value of the SM tag, using single_ as the output prefix, run:
bamSplitByTag.py input.bam SM -o single_
Extracting a subset of reads can also be done using bamFilter.py.
The following line extracts the reads with sample names (SM) my_experiment_cell_315 and my_experiment_cell_314, and writes them to cell_315_and_314.bam.
bamFilter.py input.bam "r.get_tag('SM') in ['my_experiment_cell_315','my_experiment_cell_314']" -o cell_315_and_314.bam
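The quoted argument is a Python expression in which r stands for a read. The sketch below shows how such an expression behaves for individual reads, using a hypothetical StubRead class in place of a real read object. It assumes r behaves like a pysam read, whose get_tag raises KeyError when the tag is missing. How bamFilter.py itself evaluates the expression, and what it does with reads lacking the tag, is not covered here.

```python
class StubRead:
    """Hypothetical stand-in for a read, offering only get_tag and has_tag."""

    def __init__(self, **tags):
        self._tags = tags

    def get_tag(self, name):
        return self._tags[name]  # KeyError when the tag is missing

    def has_tag(self, name):
        return name in self._tags


EXPRESSION = "r.get_tag('SM') in ['my_experiment_cell_315','my_experiment_cell_314']"
# A guarded variant that evaluates to False for reads without an SM tag.
GUARDED = "r.has_tag('SM') and " + EXPRESSION


def passes(expression, read):
    """Evaluate a filter expression with `r` bound to one read."""
    return bool(eval(expression, {}, {"r": read}))


selected = passes(EXPRESSION, StubRead(SM="my_experiment_cell_314"))
rejected = passes(EXPRESSION, StubRead(SM="my_experiment_cell_001"))
untagged = passes(GUARDED, StubRead())
```

If some reads in the input may lack the SM tag, the guarded form avoids the KeyError that the plain expression would raise in this sketch.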