Running GIL from the Command Line

GIL has two commands, generate_indexes and create_sample_sheets. Each has its own set of arguments, described below.

If you installed GIL as a package, run GIL as follows:

GIL [COMMAND] [ARGUMENTS]

If instead you cloned the repo without installing, GIL must be run from the top-level GIL directory as follows:

python -m GIL.[COMMAND] [ARGUMENTS]

Below are explanations of the arguments for index generation (generate_indexes command) and creating sample sheets for combinatorial indexing (create_sample_sheets command).

See Usage Examples for worked examples of both commands.

Arguments for Index Generation

The following arguments can be used to customize the indexes and primers generated with the generate_indexes command:

--length

The length of the index. Default: 8.

--dist

The minimum Levenshtein distance between generated indexes. Also used as the minimum distance when filtering out sequences with the --blocklist argument. Default: 3.
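To illustrate the distance criterion, here is a minimal sketch of a Levenshtein distance check. It is not taken from GIL's source; the function names `levenshtein` and `far_enough` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance counting substitutions, insertions, and deletions."""
    # prev[j] is the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def far_enough(candidate, accepted, min_dist=3):
    """True if candidate is at least min_dist edits from every accepted index."""
    return all(levenshtein(candidate, s) >= min_dist for s in accepted)
```

Unlike a position-by-position (Hamming) comparison, this also counts insertions and deletions, so two indexes that differ only by a one-base shift, such as ACGTACGT and CGTACGTA, are just 2 edits apart.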

--sample-n

The number of candidate indexes to randomly sample at the start. Default: 5000 for index lengths >= 10. For index lengths < 10, random sampling is not used by default and all possible indexes are generated instead; supplying this argument overrides that behavior.
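The two starting strategies can be sketched as follows. This is an illustration under assumptions, not GIL's code: the function name `candidate_indexes` is hypothetical, and whether GIL samples with or without duplicates is not stated here (this sketch may produce duplicates).

```python
import itertools
import random


def candidate_indexes(length, sample_n=None, rng=None):
    """Return starting candidates: all sequences, or a random sample of them."""
    # Mirror the documented default: sample 5000 only for lengths >= 10
    if sample_n is None and length >= 10:
        sample_n = 5000
    if sample_n is None:
        # Short indexes: enumerate every possible sequence (4 ** length of them)
        return ["".join(p) for p in itertools.product("ACGT", repeat=length)]
    rng = rng or random.Random()
    # Draw sample_n random sequences; duplicates are possible in this sketch
    return ["".join(rng.choices("ACGT", k=length)) for _ in range(sample_n)]
```

For the default length of 8, full enumeration yields 4^8 = 65,536 candidates, which is why sampling only becomes the default for longer indexes.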

--min-GC

The lower cutoff for GC content (percent). All indexes will have a strictly higher GC content. Default: 25 (CGCATATT passes, CGATATTA fails).

--max-GC

The upper cutoff for GC content (percent). All indexes will have a strictly lower GC content. Default: 75 (CGCCGATT passes, CGGCGCTA fails).
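The strict cutoffs for --min-GC and --max-GC can be checked with a short sketch. The function names are hypothetical, and the code only reproduces the behavior described above rather than GIL's own implementation.

```python
def gc_percent(index):
    """Percentage of bases in the index that are G or C."""
    return 100 * sum(base in "GC" for base in index.upper()) / len(index)


def passes_gc(index, min_gc=25, max_gc=75):
    """Both cutoffs are strict: exactly 25% or exactly 75% fails."""
    return min_gc < gc_percent(index) < max_gc
```

With the defaults, CGATATTA (2 of 8 bases, exactly 25%) and CGGCGCTA (6 of 8 bases, exactly 75%) are rejected, while CGCATATT (37.5%) and CGCCGATT (62.5%) pass.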

--max-homopolymer

The maximum length of homopolymer repeat that will be present in the generated indexes. Default: 2 (CCAGTTAG passes, CCCTAGAC fails).
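A minimal sketch of the homopolymer check, using hypothetical function names and assuming the rule is exactly as described above (a run longer than the maximum fails):

```python
from itertools import groupby


def longest_homopolymer(index):
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(index)), default=0)


def passes_homopolymer(index, max_homopolymer=2):
    return longest_homopolymer(index) <= max_homopolymer
```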

--max-dinu

The maximum length of dinucleotide repeats that will be present in the generated indexes. Default: 2 (ATATGCAC passes, ATATATCG fails).
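The examples above suggest that "length" here counts consecutive copies of a two-base unit (ATAT is 2 copies, ATATAT is 3). Under that assumption, the check can be sketched with a regular expression; the function name is hypothetical and this is not GIL's implementation.

```python
import re


def passes_dinucleotide(index, max_dinu=2):
    """Fail if any 2-base unit occurs more than max_dinu times in a row."""
    # ([ACGT]{2}) captures a unit; \1{n,} requires at least n further copies,
    # so a match means at least max_dinu + 1 consecutive copies in total
    pattern = r"([ACGT]{2})\1{%d,}" % max_dinu
    return re.search(pattern, index) is None
```

Note that this pattern also matches long homopolymers (AAAAAA is three copies of AA), which the homopolymer filter would already catch.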

--allow-start-G

Allow index reads to start with G. Indexes that start with two Gs (or i5 indexes that end with two Cs) are still filtered out.

E.g. both GTATGCAC and GGTAATCG fail by default, but with this flag, GTATGCAC passes while GGTAATCG still fails.
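The rule can be sketched as below for a sequence given in read orientation. The function name is hypothetical; the separate check for i5 indexes ending in two Cs is the same rule applied to the other strand and is left out of this sketch.

```python
def passes_start_g(index_read, allow_start_g=False):
    """Check the leading-G rule on an index in read orientation."""
    if allow_start_g:
        # With the flag, a single leading G is fine but GG is still rejected
        return not index_read.startswith("GG")
    # Default: any leading G is rejected
    return not index_read.startswith("G")
```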

--no-filter-self-priming

Don't filter indexes for high self-priming potential. By default, indexes with high self-priming potential are filtered out.

--library-type

The type of library that primer sequences are designed for. Options: TruSeq, Nextera, or custom. Default: TruSeq. If custom is chosen, sequences for the 5' and 3' ends of both the forward and reverse primers must be supplied with the --primer-sequences option.

--primer-sequences

Used to supply custom primer sequences. --library-type custom must be chosen to use this option. Usage is --primer-sequences "<forward 5'> <forward 3'> <reverse 5'> <reverse 3'>", with the four sequences separated by spaces inside a single quoted string. For example, supplying the default TruSeq sequences: --primer-sequences "AATGATACGGCGACCACCGAGATCTACAC ACACTCTTTCCCTACACGACG CAAGCAGAAGACGGCATACGAGAT GTGACTGGAGTTCAGACGTG"
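To make the expected shape of the quoted value concrete, here is a sketch that splits it into its four parts. The function name and dictionary keys are hypothetical labels for illustration, not names used by GIL.

```python
def parse_primer_sequences(value):
    """Split a quoted primer-sequences value into its four named parts."""
    parts = value.split()
    if len(parts) != 4:
        raise ValueError("expected four space-separated sequences, got %d" % len(parts))
    # Order follows the usage string: forward 5', forward 3', reverse 5', reverse 3'
    keys = ("forward_5", "forward_3", "reverse_5", "reverse_3")
    return dict(zip(keys, parts))


truseq = parse_primer_sequences(
    "AATGATACGGCGACCACCGAGATCTACAC ACACTCTTTCCCTACACGACG "
    "CAAGCAGAAGACGGCATACGAGAT GTGACTGGAGTTCAGACGTG"
)
```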

--library-name

The name that appears in all generated files. By default this is the same as --library-type, which is TruSeq by default.

--blocklist

The path to a file containing indexes that the generated indexes should be compatible with; i.e. no generated index will have a Levenshtein distance to any index in the blocklist that is less than the minimum set by --dist (3 by default).

The file should be a plain text file with one index per line. The index sequences should be the expected index read (for i5 indexes, the expected read from MiniSeq®, NextSeq®, HiSeq4000, and HiSeq3000 machines). This is the reverse complement of the index sequence present in the primer.
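If your existing indexes are recorded as they appear in the primer, they need to be reverse complemented before going into the blocklist. The sketch below shows that conversion and a tolerant parser for the one-index-per-line format; both function names are hypothetical and the code is not part of GIL.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def parse_blocklist(text):
    """Return one upper-cased index per non-empty line of a blocklist file's text."""
    return [line.strip().upper() for line in text.splitlines() if line.strip()]
```

For example, a primer index of AACG corresponds to a blocklist entry of CGTT.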

--company

Format used for oligo order sheet. Different companies use different formats for their order sheets and for specifying a phosphorothioate bond. Choices: IDT, Thermo, or Sigma. Eurofins uses the same format as IDT. Default: IDT.

--no-mod

For IDT and Sigma order sheets, don't add a '*' between the last two bases of the primers in the order sheet, which specifies a phosphorothioate bond (protects primers against 3'-5' exonuclease activity of high fidelity polymerases). For Thermo, don't add the letter code for phosphorothioate bond (a different letter for each base). By default, this modification is added to primers in the order sheet.
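The '*' notation described above for IDT and Sigma can be sketched as follows. The function name is hypothetical, and the Thermo letter codes are not reproduced because they are not listed here.

```python
def add_phosphorothioate(primer):
    """Insert '*' between the last two bases of a primer sequence."""
    if len(primer) < 2:
        raise ValueError("primer must contain at least two bases")
    return primer[:-1] + "*" + primer[-1]
```

Applied to a sequence ending in ...ACGTG, this gives ...ACGT*G.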

--seed

Seed for randomly selecting compatible sequences. Setting this allows for reproducible generation of indexes. By default, sequences will be different each time the program is run (no seed).
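The effect of a seed can be illustrated with Python's standard random module. This is a generic sketch of seeded selection with a hypothetical function name; how GIL uses the seed internally is not described here.

```python
import random


def pick(candidates, n, seed=None):
    """Randomly choose n candidates; the same seed gives the same choice."""
    rng = random.Random(seed)  # seed=None draws fresh entropy on each call
    return rng.sample(candidates, n)


pool = ["AAAC", "AACA", "ACAA", "CAAA", "GGGT"]
first = pick(pool, 3, seed=7)
second = pick(pool, 3, seed=7)  # identical to first because the seed matches
```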

--out-dir

The path of the directory that all files will be saved to. By default, this is "Output".

Arguments for Sample Sheet Creation

The following arguments can be used when making sample sheets (e.g. for combinatorial dual index plates) with the create_sample_sheets command:

--unique

Create sample sheets for unique dual indexes. Without this flag, create_sample_sheets creates sample sheets for combinatorial dual indexes.

--i7s

Required. Path to a TSV file containing i7 indexes, arranged according to their position in the 96-well plate (8 rows of 12 indexes each). The sequence of the indexes should match the index sequence in the primer, not the sequence read during sequencing.

--i5s

Required. Path to a TSV file containing i5 indexes, arranged according to their position in the 96-well plate (8 rows of 12 indexes each). The sequence of the indexes should match the index sequence in the primer, not the sequence read during sequencing.
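The plate layout expected for --i7s and --i5s can be sketched as a small reader that maps row letters to their 12 indexes. It assumes the file has no header row or label column, which is not confirmed here; the function name is hypothetical.

```python
import csv


def read_plate(handle):
    """Read an 8 x 12 TSV plate into a dict of row letter -> list of 12 indexes."""
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if len(rows) != 8 or any(len(row) != 12 for row in rows):
        raise ValueError("expected 8 rows of 12 tab-separated indexes")
    # Rows are labelled A-H from top to bottom, as on a 96-well plate
    return dict(zip("ABCDEFGH", rows))
```

A row letter such as the one passed to --i7-row or --i5-row would then select one list of 12 indexes, e.g. `read_plate(open_file)["B"]`.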

--plate-name

Required. Name of the plate. This will appear in the index and sample sheet filenames.

--i7-row

Required unless the --unique flag is used. The row (A-H) from the i7 plate used to make the combinatorial dual index (CDI) plate.

--i5-row

Required unless the --unique flag is used. The row (A-H) from the i5 plate used to make the CDI plate.

--out-dir

The path to the directory that index and sample sheets will be saved to. Default is Output/Sample_Sheets, which is the same directory that index and sample sheets are saved to when generating indexes.

Testing

If you've modified the code and you'd like to run our tests, run:

python -m unittest
