This tool extracts BioSample metadata into a user-friendly table. It expects a list of valid BioSample IDs and writes out a table.
If an existing table is provided, it is updated: samples not already in the table are added, along with any additional columns found in the samples provided.
biosample2table.py [--in sample_list.txt] [--sample sampleid sampleid2 ...] --out result_table.tsv [--update]
Look up a single BioSample and write a tab-delimited file:
./scripts/biosample2table.py -s SAMN12327137 --out result_table.tsv -e [email protected]
Results will be
BioSample collection_date geo_loc_name host isolate sample_type strain
SAMN12327137 2006 USA:Sierra Nevada Mtns Rana sierrae Bd_JAM81 isolate GPL
Giving the output file a .csv extension writes comma-delimited output instead:
./scripts/biosample2table.py -s SAMN12327137 --out result_table.csv -e [email protected]
BioSample,collection_date,geo_loc_name,host,isolate,sample_type,strain
SAMN12327137,2006,USA:Sierra Nevada Mtns,Rana sierrae,Bd_JAM81,isolate,GPL
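The two examples above differ only in the output filename, and the --out description states that the extension (.tab/.tsv or .csv) selects the format, with tab-delimited as the fallback. A minimal sketch of that rule follows; `delimiter_for` is a hypothetical helper written for illustration and is not taken from biosample2table.py.

```python
import os


def delimiter_for(path):
    """Return the field delimiter implied by an output filename.

    Hypothetical helper: mirrors the documented rule, not the script's code.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".csv":
        return ","
    # .tab, .tsv, and any unrecognized extension fall back to tab
    return "\t"
```

Under this sketch, `delimiter_for("result_table.csv")` gives a comma, while `result_table.tsv` or an unrecognized extension such as `result_table.txt` gives a tab.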
Look up a sample by its SRA run number with the --sra option:
./scripts/biosample2table.py -s SRR14174621 --sra --out result_table.tsv -e [email protected]
Results will be
BioSample,SRA_Run,SRA_SampID,Sample No.,age,biomaterial_provider,isolate,sex,tissue,treatment
SAMN18650164,SRR14174621,SRS8658905,17,54,Liverpool Hospital,Diabetic Patient,male,Skin,Midpoint
If you have a list of IDs, either SRA or BioSample, you can use the --in option:
./scripts/biosample2table.py --in samplefile.txt --out result_table.tsv -e [email protected]
If you have a list of IDs to pass from another program, you can pipe it in and omit the --in option:
echo SAMN18650164 | ./scripts/biosample2table.py --out result_table.tsv -e [email protected]
Omitting the --out option prints the results to STDOUT:
echo SAMN18650164 | ./scripts/biosample2table.py -e [email protected]
Mixing --in (or STDIN with no --in) with -s will prefer the sample list provided by the -s option and ignore any STDIN or --in file input.
echo SAMN18650164 | ./scripts/biosample2table.py -e [email protected]
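The precedence described above (-s first, otherwise an --in file, otherwise STDIN) could be sketched as below. The function `resolve_sample_ids` and its parameter names are assumptions made for illustration; the script's actual internals may differ.

```python
import io


def resolve_sample_ids(samples=None, infile=None, stdin=None):
    """Choose the ID source: -s values win, then an --in file, then STDIN.

    Hypothetical helper; `stdin` is any file-like object (e.g. sys.stdin).
    """
    if samples:
        # IDs given with -s override every other input source
        return list(samples)
    if infile and infile != "-":
        with open(infile) as handle:
            lines = handle.readlines()
    elif stdin is not None:
        # no --in file, or "-" given as the filename: read from STDIN
        lines = stdin.readlines()
    else:
        lines = []
    # one ID per line; blank lines are skipped
    return [line.strip() for line in lines if line.strip()]


# Example: the piped ID is ignored once -s supplies a sample
piped = io.StringIO("SAMN18650164\n")
ids = resolve_sample_ids(samples=["SAMN12327137"], stdin=piped)
```

Here `ids` holds only `SAMN12327137`, since the -s list takes priority over the piped input.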
Argument | Description |
---|---|
-o/--out | Write the table to this file. If the file already exists, it is parsed and its existing data is reused and added to. The file extension (.tab/.tsv or .csv) determines the output format (tab-delimited by default if the extension doesn't match). If --out is omitted, the table is written to STDOUT (and previously cached results cannot be used). |
--update | Use the existing table for sample IDs, but look up the fields again |
-i/--in | Input file of BioSamples for the query, one per line. If - is given as the filename, the list is read from STDIN |
-s/--sample | List of samples to query, instead of providing an input file |
--sra | Input IDs are SRR numbers instead of BioSamples |
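
The --out and --update rows describe reusing rows from an existing table and adding columns as new fields appear. One way such a merge could work is sketched here, assuming each table is held as a dict mapping sample ID to a dict of fields. `merge_records` is a hypothetical function, and the alphabetical column order is an assumption chosen only to keep the output stable.

```python
def merge_records(existing, fetched, update=False):
    """Merge newly fetched rows into rows parsed from an existing table.

    existing, fetched: dict of sample ID -> dict of field name -> value.
    Hypothetical sketch; not taken from biosample2table.py.
    """
    merged = {sid: dict(fields) for sid, fields in existing.items()}
    for sid, fields in fetched.items():
        if sid in merged and not update:
            continue  # reuse the cached row as-is
        # new sample, or update requested: take the fetched values
        merged.setdefault(sid, {}).update(fields)
    # the column set is the union of fields seen across all rows
    columns = sorted({name for fields in merged.values() for name in fields})
    return merged, columns
```

Without `update`, a sample already in the table keeps its cached values and only new samples contribute new columns; with `update=True`, fetched values overwrite the cached ones.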
Jason Stajich - jason.stajich[at]ucr.edu, http://lab.stajich.org