This is an example spacegraphcats project using some sequences from a Doryteuthis RNAseq assembly to demonstrate the basic workflow.
Follow the spacegraphcats install instructions for conda.
Then,
git clone https://github.com/spacegraphcats/spacegraphcats-dory-example
cd spacegraphcats-dory-example/
and now you can run the spacegraphcats workflow specified in config.yaml:
spacegraphcats run config.yaml extract_reads
The example includes these input files:
./data/dory-subset.fa - the data used to make the graph (e.g. reads or an assembly).
./query/dory-head.fa - query sequences.
config.yaml - a YAML config file that provides settings for the workflow.
This is the cDBG build directory.
bcalm.dory.k21.unitigs.fa - compact de Bruijn graph (cDBG) of dory-subset.fa, built by BCALM.
bcalm.dory.k21.unitigs.fa.log.txt - BCALM build log.
bcalm.dory.k21.unitigs.fa.sig - a sourmash signature of the cDBG.
dory.reads.bgz - BGZF file of input reads.
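To make "compact" concrete, the sketch below builds a small node-centric de Bruijn graph in memory and merges maximal non-branching paths into unitigs. It is a toy illustration, not BCALM's algorithm: it only considers the forward strand (BCALM also merges reverse complements) and holds every k-mer in a Python set. The function names build_dbg and compact are hypothetical. The example files use k = 21; the demo uses k = 3 so the output stays readable.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_dbg(seqs, k):
    """Node-centric de Bruijn graph: nodes are the distinct k-mers, and
    u -> v is an edge when the last k-1 bases of u equal the first k-1 of v."""
    nodes = set()
    for s in seqs:
        nodes.update(kmers(s, k))
    succ, pred = defaultdict(list), defaultdict(list)
    for u in sorted(nodes):
        for base in "ACGT":
            v = u[1:] + base
            if v in nodes:
                succ[u].append(v)
                pred[v].append(u)
    return nodes, succ, pred


def compact(nodes, succ, pred):
    """Merge maximal non-branching paths into unitig strings."""

    def internal(u, v):
        # u -> v lies inside a unitig only if it is u's sole out-edge
        # and v's sole in-edge.
        return len(succ[u]) == 1 and len(pred[v]) == 1

    def spell(path):
        # Consecutive k-mers overlap by k-1 bases, so each adds one base.
        return path[0] + "".join(p[-1] for p in path[1:])

    unitigs, seen = [], set()
    # Pass 1: start a unitig at every node that has no internal in-edge.
    for start in sorted(nodes):
        if len(pred[start]) == 1 and internal(pred[start][0], start):
            continue
        path, u = [start], start
        seen.add(start)
        while len(succ[u]) == 1 and internal(u, succ[u][0]) and succ[u][0] not in seen:
            u = succ[u][0]
            path.append(u)
            seen.add(u)
        unitigs.append(spell(path))
    # Pass 2: any node left over lies on an isolated cycle; cut it anywhere.
    for start in sorted(nodes - seen):
        if start in seen:
            continue
        path, u = [start], start
        seen.add(start)
        while succ[u][0] not in seen:
            u = succ[u][0]
            path.append(u)
            seen.add(u)
        unitigs.append(spell(path))
    return unitigs


if __name__ == "__main__":
    # Two reads sharing the (k-1)-mer "CG" create branch points.
    print(compact(*build_dbg(["AACGT", "TTCGA"], k=3)))
    # ['AACG', 'CGA', 'CGT', 'TTCG']
```

A single read with no repeated (k-1)-mers collapses back into one unitig; the shared "CG" in the demo is what splits the graph into four.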
This directory contains the information needed for graph queries. Building it is generally the most time-consuming step. The directory name is constructed from the parameters in the config.yaml file.
cdbg.gxt - cDBG structure.
commands.log - an (incomplete) log of executed commands.
contigs.fa.gz - a (potentially cleaned up) collection of cDBG contigs.
contigs.fa.gz.indices - an index database for contigs.fa.gz.
contigs.fa.gz.info.csv - summary information for each cDBG node.
contigs.fa.gz.mphf - Minimal Perfect Hash Function database for contigs.fa.gz.
contigs.fa.gz.sig - a sourmash signature of the contigs, after post-processing.
first_doms.txt - dominating set.
catlas.csv - the catlas.
reads.bgz.labels - labeled reads for fast retrieval.
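In graph terms, an r-dominating set is a set D of nodes such that every node lies within distance r of some node in D (r = 1 gives the classic dominating set). The sketch below checks that property with breadth-first search and builds one with a simple greedy rule. The greedy construction only illustrates the definition; it is not the method spacegraphcats uses to build first_doms.txt or the catlas, and the function names are hypothetical.

```python
from collections import deque


def ball(adj, v, r):
    """Return every vertex within distance r of v (breadth-first search)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # don't expand beyond radius r
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def is_r_dominating(adj, dom, r):
    """True if every vertex is within distance r of some vertex in dom."""
    covered = set()
    for d in dom:
        covered |= ball(adj, d, r)
    return covered == set(adj)


def greedy_r_dominating(adj, r):
    """Greedy illustration: repeatedly take the vertex whose r-ball covers
    the most still-undominated vertices (ties go to the smallest label)."""
    undominated = set(adj)
    dom = []
    while undominated:
        best = max(sorted(adj), key=lambda v: len(ball(adj, v, r) & undominated))
        dom.append(best)
        undominated -= ball(adj, best, r)
    return dom


if __name__ == "__main__":
    # A path graph 0 - 1 - 2 - 3 - 4, stored as an adjacency dict.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    dom = greedy_r_dominating(path, r=1)
    print(dom, is_r_dominating(path, dom, r=1))  # [1, 3] True
```

Raising r shrinks the set: with r = 2 the single middle vertex 2 dominates the whole path.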
This directory contains the output of the search.
command.txt - an (incomplete) log of executed commands.
dory-head.fa.cdbg_ids.reads.fa.gz - all of the input sequences in the neighborhood of this query; THIS IS THE MOST INTERESTING FILE :)
dory-head.fa.cdbg_ids.txt.gz - list of cDBG nodes.
dory-head.fa.contigs.sig - sourmash signature of the output.
dory-head.fa.frontier.txt.gz - list of frontier nodes (catlas).
dory-head.fa.response.txt - response curve.
results.csv - summary of results from all searches.
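Since dory-head.fa.cdbg_ids.reads.fa.gz is a gzip-compressed FASTA file, a quick sanity check after a run is to count its records and bases. The sketch below does that with only the Python standard library. The function name fasta_stats is hypothetical, and the demo writes a throwaway file instead of reading real output; point it at the file in your own search output directory.

```python
import gzip
import os
import tempfile


def fasta_stats(path):
    """Return (number of records, total sequence length) for a FASTA file,
    reading it as gzip if the name ends in .gz. Handles multi-line records."""
    opener = gzip.open if path.endswith(".gz") else open
    n_records = 0
    total_bp = 0
    with opener(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                n_records += 1  # each header line starts a new record
            else:
                total_bp += len(line)  # sequence lines add to the total
    return n_records, total_bp


if __name__ == "__main__":
    # Throwaway demo file; the second record spans two lines.
    demo = os.path.join(tempfile.mkdtemp(), "demo.cdbg_ids.reads.fa.gz")
    with gzip.open(demo, "wt") as fh:
        fh.write(">read1\nACGTACGT\n>read2\nTTGA\nCC\n")
    print(fasta_stats(demo))  # (2, 14)
```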