- Migrate to python3 in full and stop supporting python2
- Open-source project
- Bug fixes to chunkify from raw data
- Take `--drop` into account when reporting label accuracy during training
- Can train and basecall off raw data
  - Examples in `models/` directory use temporal convolution to extract features from the raw signal
  - Training data may be labelled from events or remapped on the fly using a pretrained raw data model
  - The interfaces for most scripts have changed to support raw data:
    - `./bin/basecall_network` and `./bin/train_network.py` provide `raw` and `events` routes
    - `./bin/chunkify.py` provides `identity`, `remap`, `raw_identity` and `raw_remap` routes
- `./bin/basecall_network.py` may be used via the `./bin/basecall_network` entry point that sets up its environment
- Full builds are performed on CI only for `master`, `release` and branches whose names have the `ci_` prefix
- Layers:
  - All layers now have `insize` and `size` attributes; models pickled with previous versions of Sloika cannot be unpickled in Sloika 1.2
  - Added `Convolution` and `MaxPool` layers
- Improvements to `pekarnya.py`:
  - Updated database schema
  - Runs are marked with time stamps and git commits
  - Uniquely generated output directories facilitate restart of failed jobs
- New script `./bin/extract_reference.py` for extraction of references from a directory of fast5 files
- Minimum required numpy version bumped to 1.9.0
- `verify_network.py` tests network execution on random inputs
- `align.py` outputs reference coordinates
- `align.py` fixed under python3
- Randomly choosing chunk size during training is now the default
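
To illustrate the `--drop` item above, here is a minimal sketch of reporting label accuracy while ignoring edge events. It is not Sloika's actual code: the function name `label_accuracy` and its arguments are hypothetical, and labels are assumed to be plain per-event sequences. The default of 20 mirrors the default this changelog gives for `--drop`.

```python
def label_accuracy(predicted, expected, drop=20):
    """Fraction of events whose labels agree, ignoring `drop` events at each end.

    Hypothetical helper for illustration only; `predicted` and `expected`
    are equal-length sequences of per-event labels.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if drop < 0 or 2 * drop >= len(predicted):
        raise ValueError("drop must leave at least one event to score")
    # Keep only the central events; the first and last `drop` are excluded
    end = len(predicted) - drop
    kept = list(zip(predicted[drop:end], expected[drop:end]))
    return sum(p == e for p, e in kept) / len(kept)


if __name__ == "__main__":
    predicted = [0, 1, 2, 3, 4, 5]
    expected = [9, 1, 2, 3, 4, 9]
    # Errors sit only at the edges, so dropping one event per end hides them
    print(label_accuracy(predicted, expected, drop=0))
    print(label_accuracy(predicted, expected, drop=1))
```

Scoring only the central events keeps the reported accuracy consistent with a loss that also ignores the chunk edges.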
Addresses the following issue:
- Make Sloika 1.1 avoid h5py version 2.7.0. This version works in the Dev environment, but we are unable to import it on CI nodes (see h5py/h5py#803)
Addresses the following issues:
- Brown bag 1 did not update changelog https://git.oxfordnanolabs.local/algorithm/sloika/issues/43
- Fixes TypeError in json dump script when invoked with `--out_file` option https://git.oxfordnanolabs.local/algorithm/sloika/issues/52
Addresses the following issues:
- Remapping doesn't appear to work with 2d reads https://git.oxfordnanolabs.local/algorithm/sloika/issues/40
- Can't change segmentation in chunkify remap https://git.oxfordnanolabs.local/algorithm/sloika/issues/41
- Work around Isilon problems https://git.oxfordnanolabs.local/algorithm/sloika/issues/42
- Activation functions have been separated into their own module and many new functions have been added
  - See https://wiki/display/~tmassingham/2016/10/17/Activation+functions
  - Note: this rearrangement breaks compatibility with older model pickle files
- Refactoring of `NBASE` constant
  - Now has a single source of responsibility: `sloika/variables.py`
  - Models importing `_NBASE` from `sloika/module_tools.py` should now import `NBASE` instead
- Training and basecalling default to transducer-based models
- Compilation of networks is handled automatically by `basecall_network.py`
  - Compiled network may be saved for future use
  - `compile_network.py` executable has been removed
- Recurrent layers
  - New recurrent unit types have been added
  - Detailed tests to ensure recurrent layers work
  - Type of gate function is now an option on layer initialisation
- Pekarnya server for scheduling model training jobs https://wiki/display/RES/Pekarnya
- Considerable work on the building and testing infrastructure
  - Stable and development branches were created
  - Binary artefacts are built for each commit in development branch
  - Artefacts are automatically versioned in development branch
  - Unit and acceptance tests exercise each artefact before it is marked as a release candidate
- Remapping using RNN from fast5 directly to chunks (`chunkify.py`)
  - `chunkify.py identity` has similar behaviour to `chunk_hdf5.py`
  - `chunkify.py remap` will remap a directory of fast5 files using a transducer RNN before chunking
  - `remap_hdf5.py` and `chunk_hdf5.py` removed in favour of `chunkify.py`
- Per chunk normalisation optional `--normalisation`
  - Default is still to normalise over entire read
- Chunk size for training can be randomly selected from batch to batch `--chunk_len_range min max`
  - Default is to always train with maximum possible chunk size
  - Chunk size chosen uniformly in specified interval
- Edge events are not used when assessing loss function `--drop n`
  - Default to drop 20 events from start and end before assessing loss
- Changed default trimming from ends of sequence
- Fix to allow trimming of zero events
- Minimum read length (in events) for chunking to take place
- Removed vestigial `networks.py` file that has been replaced by the contents of the `models/` directory
- Seed for random number generator can be set on command line of `train_network.py`
- Enable HDF5 compression
- Fix to ensure every chunk starts with a non-zero (not stay) label
- Trim first and last events from loss function calculation (burn-in)
- Fix bug in how kmers are merged into sequence in low complexity regions
- Increased PEP8 compliance
- Default location of segmentation information has changed (see Untangled 0.5.1)
- Location of segmentation information can now be given as a command-line option in many programs
- Trainer copies logging information to stdout. May be silenced with `--quiet`
- JSON may be dumped to file rather than stdout
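
The per-batch chunk size selection listed above (`--chunk_len_range min max`, combined with a seed set on the command line) can be sketched as follows. This is an illustrative sketch under assumptions, not the code in `train_network.py`: the helper `chunk_lengths` is hypothetical, and treating both ends of the interval as inclusive is an assumption.

```python
import random


def chunk_lengths(min_len, max_len, n_batches, seed=None):
    """Yield one chunk length per batch, drawn uniformly from [min_len, max_len].

    Hypothetical helper for illustration only. Passing a seed makes the
    sequence of lengths reproducible from run to run.
    """
    if min_len < 1 or min_len > max_len:
        raise ValueError("require 1 <= min_len <= max_len")
    # A private generator avoids disturbing the global random state
    rng = random.Random(seed)
    for _ in range(n_batches):
        # randint includes both endpoints
        yield rng.randint(min_len, max_len)


if __name__ == "__main__":
    for batch, length in enumerate(chunk_lengths(200, 500, 5, seed=42)):
        print("batch", batch, "chunk length", length)
```

Setting `min_len` equal to `max_len` gives a fixed chunk size, which corresponds to the default of always training with the maximum possible chunk size.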
Initial release