Running GIL from the Command Line
GIL has two commands, generate_indexes and create_sample_sheets, each with a separate set of arguments, which are described below.
If you installed GIL as a package, run GIL as follows:
GIL [COMMAND] [ARGUMENTS]
If instead you cloned the repo without installing, GIL must be run from the top-level GIL directory as follows:
python -m GIL.[COMMAND] [ARGUMENTS]
Below are explanations of the arguments for index generation (generate_indexes command) and for creating sample sheets for combinatorial indexing (create_sample_sheets command).
See the usage examples for how to use the two commands.
The following arguments can be used to customize the indexes and primers generated with the generate_indexes command:
The length of the index. Default: 8.
The minimum Levenshtein distance between generated indexes. Also used as the minimum distance when filtering out sequences with the --blocklist argument. Default: 3.
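To make the distance criterion concrete, here is a minimal sketch of a Levenshtein distance check. The function names `levenshtein` and `is_compatible` are hypothetical and chosen for illustration; this is not necessarily how GIL computes the distance internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def is_compatible(candidate, accepted, min_distance=3):
    """True if candidate is at least min_distance away from every accepted index."""
    return all(levenshtein(candidate, other) >= min_distance for other in accepted)
```

With the default minimum distance of 3, a candidate that differs from an already accepted index by only two edits would be rejected under this sketch.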
Number of indexes to randomly sample at the start. Default: 5000 for index length >= 10. For index length < 10, random sampling is not used by default and all possible indexes are generated at the start; supplying this argument overrides that behavior and samples the given number of indexes instead.
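The two starting strategies described above can be sketched as follows. This is an illustration only: `initial_candidates` and its parameters are hypothetical names, and whether GIL removes duplicates from the random sample is not stated here, so the sketch does not.

```python
import itertools
import random

BASES = "ACGT"


def initial_candidates(length, n_sample=None, rng=None):
    """Build the starting pool of candidate indexes (hypothetical helper)."""
    rng = rng or random.Random()
    if n_sample is None and length < 10:
        # Short indexes: enumerate all 4**length possible sequences.
        return ["".join(p) for p in itertools.product(BASES, repeat=length)]
    if n_sample is None:
        n_sample = 5000  # documented default for index length >= 10
    # Otherwise draw n_sample random sequences (duplicates are possible).
    return ["".join(rng.choice(BASES) for _ in range(length))
            for _ in range(n_sample)]
```

Enumerating every sequence is cheap for short indexes (4**8 = 65536 candidates for length 8) but grows by a factor of four with each extra base, which is presumably why sampling becomes the default for longer indexes.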
The lower cutoff for GC content (percent). All indexes will have a strictly higher GC content. Default: 25 (CGCATATT passes, CGATATTA fails).
The upper cutoff for GC content (percent). All indexes will have a strictly lower GC content. Default: 75 (CGCCGATT passes, CGGCGCTA fails).
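The strict lower and upper GC cutoffs can be checked with a few lines of Python. This is a sketch with hypothetical function names (`gc_percent`, `passes_gc`), not GIL's own code; it reproduces the pass/fail examples given above.

```python
def gc_percent(index: str) -> float:
    """Percentage of G and C bases in the index."""
    index = index.upper()
    return 100.0 * sum(base in "GC" for base in index) / len(index)


def passes_gc(index: str, lower: float = 25.0, upper: float = 75.0) -> bool:
    """Both bounds are strict: exactly 25% or exactly 75% fails."""
    return lower < gc_percent(index) < upper
```

For an 8-base index, the defaults therefore allow between 3 and 5 G or C bases.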
The maximum length of homopolymer repeat that will be present in the generated indexes. Default: 2 (CCAGTTAG passes, CCCTAGAC fails).
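A homopolymer check along these lines could look like the sketch below. The helper names are hypothetical and the code is only meant to illustrate the rule, not to mirror GIL's implementation.

```python
from itertools import groupby


def longest_homopolymer(index: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(index)), default=0)


def passes_homopolymer(index: str, max_length: int = 2) -> bool:
    """True if no base is repeated more than max_length times in a row."""
    return longest_homopolymer(index) <= max_length
```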
The maximum length of dinucleotide repeats that will be present in the generated indexes. Default: 2 (ATATGCAC passes, ATATATCG fails).
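One way to count dinucleotide repeats is to scan every two-base unit and count how many times it repeats back to back. The sketch below uses hypothetical names and assumes "length" means the number of consecutive copies of the unit, which matches the examples above (AT twice passes, AT three times fails).

```python
def longest_dinucleotide_repeat(index: str) -> int:
    """Largest number of back-to-back copies of any two-base unit."""
    longest = 0
    for start in range(len(index) - 1):
        unit = index[start:start + 2]
        count = 1
        pos = start + 2
        # Keep stepping two bases at a time while the same unit repeats.
        while index[pos:pos + 2] == unit:
            count += 1
            pos += 2
        longest = max(longest, count)
    return longest


def passes_dinucleotide(index: str, max_repeats: int = 2) -> bool:
    return longest_dinucleotide_repeat(index) <= max_repeats
```

Note that in this sketch a unit of two identical bases (such as CC) is counted like any other unit, so long homopolymer runs are also caught here.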
Allow index reads to start with G. Still filters out indexes that start with two Gs (or i5 indexes that end with two Cs).
E.g. both GTATGCAC and GGTAATCG fail by default, but with this flag, GTATGCAC passes while GGTAATCG still fails.
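The leading-G rule can be expressed as a small predicate on the expected index read. This is a sketch: `passes_start_filter` and its `allow_start_g` parameter are hypothetical stand-ins for the flag described above, and the sketch does not handle the i5 case of primer sequences ending in two Cs.

```python
def passes_start_filter(index_read: str, allow_start_g: bool = False) -> bool:
    """Apply the leading-G rule to an index as it would be read."""
    if index_read.startswith("GG"):
        return False  # two leading Gs are always rejected
    if index_read.startswith("G") and not allow_start_g:
        return False  # a single leading G is rejected unless the flag is set
    return True
```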
Don't filter indexes for high self-priming potential. By default, indexes with high self-priming potential are filtered out.
Changes the type of library that primer sequences are designed for. Options: TruSeq, Nextera, or custom.
By default the program makes TruSeq-compatible primers. If custom is chosen, sequences for the 5' and 3' ends of both forward and reverse primers must be supplied with the --primer_sequences option.
Used to supply custom primer sequences. --library_type custom must be chosen to use this option.
Usage is --primer_sequences "<forward 5'> <forward 3'> <reverse 5'> <reverse 3'>". For example, supplying the default TruSeq sequences:
--primer_sequences "AATGATACGGCGACCACCGAGATCTACAC ACACTCTTTCCCTACACGACG CAAGCAGAAGACGGCATACGAGAT GTGACTGGAGTTCAGACGTG"
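The option value is a single quoted string holding four space-separated sequences. A minimal sketch of how such a value could be split and validated is shown below; `PrimerParts` and `parse_primer_sequences` are hypothetical names and do not reflect GIL's actual argument handling.

```python
from typing import NamedTuple


class PrimerParts(NamedTuple):
    forward_5: str
    forward_3: str
    reverse_5: str
    reverse_3: str


def parse_primer_sequences(value: str) -> PrimerParts:
    """Split the quoted option value into its four primer segments."""
    parts = value.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 sequences, got {len(parts)}")
    return PrimerParts(*(part.upper() for part in parts))
```

Quoting the value on the command line matters: without the quotes, the shell would pass four separate arguments instead of one string.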
The name that appears in all generated files. By default this is the same as --library_type, which is TruSeq by default.
The path to a file containing indexes that the generated indexes should be compatible with; i.e. no generated index will have a Levenshtein distance below the minimum distance (3 by default) to any index in the blocklist.
The file should be a plain text file with one index per line. The index sequences should be the expected index read (for i5 indexes, the expected read from MiniSeq®, NextSeq®, HiSeq4000, and HiSeq3000 machines). This is the reverse complement of the index sequence present in the primer.
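If your existing indexes are recorded as they appear in the primer, they need to be reverse complemented before going into the blocklist file. A minimal sketch of that conversion, with a hypothetical helper name:

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Complement each base, then reverse the whole sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


# Hypothetical primer-side indexes converted to expected index reads,
# one per line as the blocklist file expects.
primer_indexes = ["AACGTGAT", "CGCTGATC"]
blocklist_text = "\n".join(reverse_complement(seq) for seq in primer_indexes)
```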
Format used for the oligo order sheet. Different companies use different formats for their order sheets and for specifying a phosphorothioate bond. Choices: IDT, Thermo, or Sigma. Eurofins uses the same format as IDT.
Default: IDT.
For IDT and Sigma order sheets, don't add a '*' between the last two bases of the primers in the order sheet, which specifies a phosphorothioate bond (protects primers against 3'-5' exonuclease activity of high fidelity polymerases). For Thermo, don't add the letter code for phosphorothioate bond (a different letter for each base). By default, this modification is added to primers in the order sheet.
Seed for randomly selecting compatible sequences. Setting this allows for reproducible generation of indexes. By default, sequences will be different each time the program is run (no seed).
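The effect of a seed can be illustrated with Python's standard `random` module. This sketch uses a hypothetical `sample_indexes` helper and only demonstrates the general principle; how GIL applies the seed internally is not shown here.

```python
import random


def sample_indexes(pool, k, seed=None):
    """Pick k indexes from pool; the same seed always gives the same picks."""
    rng = random.Random(seed)  # seed=None falls back to system entropy
    return rng.sample(pool, k)


pool = ["ACGTACGT", "TGCATGCA", "CATGCATG", "GTACGTAC", "AGCTAGCT"]
first = sample_indexes(pool, 3, seed=42)
second = sample_indexes(pool, 3, seed=42)
# first and second contain the same indexes in the same order.
```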
The path of the directory that all files will be saved to. By default, this is "Output".
The following arguments can be used when making sample sheets (e.g. for combinatorial dual index plates) with the create_sample_sheets command:
Create sample sheets for unique dual indexes. Without this flag, create_sample_sheets creates sample sheets for combinatorial dual indexes.
Required. Path to a TSV file containing i7 indexes, arranged according to their position in the 96-well plate (8 rows of 12 indexes each). The sequence of the indexes should match the index sequence in the primer, not the sequence read during sequencing.
Required. Path to a TSV file containing i5 indexes, arranged according to their position in the 96-well plate (8 rows of 12 indexes each). The sequence of the indexes should match the index sequence in the primer, not the sequence read during sequencing.
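The expected plate layout (8 tab-separated rows of 12 indexes) can be read and validated as sketched below. The `read_plate` helper is hypothetical, and the sketch assumes the file has no header row or row labels, which should be checked against a file produced by generate_indexes.

```python
import csv
import io

ROW_LABELS = "ABCDEFGH"


def read_plate(handle):
    """Map well names (A1..H12) to index sequences from a TSV plate layout."""
    plate = [row for row in csv.reader(handle, delimiter="\t") if row]
    if len(plate) != 8 or any(len(row) != 12 for row in plate):
        raise ValueError("expected 8 rows of 12 indexes")
    return {
        f"{ROW_LABELS[r]}{c + 1}": sequence
        for r, row in enumerate(plate)
        for c, sequence in enumerate(row)
    }


# Usage with an in-memory file holding placeholder values.
example = "\n".join(
    "\t".join(f"IDX{r}{c:02d}" for c in range(12)) for r in range(8)
)
wells = read_plate(io.StringIO(example))
```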
Required. Name of the plate. This will appear in the index and sample sheet filenames.
Required unless the --unique flag is used. The row (A-H) from the i7 plate used to make the combinatorial dual index (CDI) plate.
Required unless the --unique flag is used. The row (A-H) from the i5 plate used to make the CDI plate.
The path to the directory that index and sample sheets will be saved to. Default is Output/Sample_Sheets, which is the same directory that index and sample sheets are saved to when generating indexes.
If you've modified the code and you'd like to run our tests, run
python -m unittest
from the top-level GIL directory.