Major updates to CPU, SLURM, and UGER. We cannot test PBS or LSF, so those versions are not updated; since AWS will soon be superseded by ENCODE, we have also ceased development on that branch.
This is the last release before we convert entirely to Juicer 2.0 (the ENCODE version, currently under pre-release).
Major:
- Intra-fragment reads are NO LONGER discarded by default. To discard them from the hic file, use the flag
--skip-intra-frag
when calling "pre". This depends on using the latest jar for Juicer Tools: https://github.com/aidenlab/juicer/wiki/Download . Using old jars will result in the old behavior (silently discarding intra-fragment reads). The latest jar has extensive bug fixes.
- BWA now aligns in paired-end mode. This requires BWA version 0.7.17 or higher; short read and short end modes are now deprecated.
- Changed chimeric blacklist to handle quadruple reads and eliminate MT exception
- Rewrite of generate_site_positions
- The default site is now "none" if no site is sent in
- Fragment maps are no longer included in the Hi-C file by default. Previously you excluded them with the -x flag; now use -f to include them.
- Dups has a bug fix for some degenerate cases that resulted in large memory usage. There is also a new flag, -j, for "just exact matches"; it eliminates only exact-match duplicates. Overall this flag is not recommended. However, if your jobs often get stuck at the dedup phase, the cause may be low complexity or low mapping quality, and this flag will allow the jobs to finish much faster. You will still be left with near-duplicates in your library, so use caution when interpreting results. In particular, note that near-duplicates are usually machine errors, not true biological results, and thus ought to be removed.
- Multiple ligation junctions now supported in juicer and statistics.pl script
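
To illustrate the difference between exact-match and near-duplicate removal described for the -j flag above, here is a minimal Python sketch. It is not the Juicer dups script: the Contact record, the is_duplicate and dedup functions, and the wobble tolerance value are all hypothetical names and numbers chosen for illustration. Setting wobble=0 mimics "just exact matches"; a positive wobble also removes near-duplicates.

```python
from typing import List, NamedTuple


class Contact(NamedTuple):
    """Hypothetical record for one read pair (not Juicer's actual format)."""
    strand1: int
    chrom1: str
    pos1: int
    strand2: int
    chrom2: str
    pos2: int


def is_duplicate(a: Contact, b: Contact, wobble: int) -> bool:
    """True if both ends agree on chromosome and strand and lie within
    `wobble` base pairs of each other. wobble=0 means exact match only."""
    return (
        a.chrom1 == b.chrom1
        and a.chrom2 == b.chrom2
        and a.strand1 == b.strand1
        and a.strand2 == b.strand2
        and abs(a.pos1 - b.pos1) <= wobble
        and abs(a.pos2 - b.pos2) <= wobble
    )


def dedup(contacts: List[Contact], wobble: int = 0) -> List[Contact]:
    """Keep the first contact of each duplicate group.

    Quadratic in the number of kept contacts; a simple illustration,
    not an efficient implementation.
    """
    kept: List[Contact] = []
    for c in contacts:
        if not any(is_duplicate(c, k, wobble) for k in kept):
            kept.append(c)
    return kept


contacts = [
    Contact(0, "chr1", 100, 16, "chr1", 5000),
    Contact(0, "chr1", 100, 16, "chr1", 5000),  # exact duplicate of the first
    Contact(0, "chr1", 102, 16, "chr1", 5001),  # near-duplicate of the first
    Contact(0, "chr2", 100, 16, "chr1", 5000),  # different chromosome
]

exact_only = dedup(contacts, wobble=0)  # near-duplicate survives
with_wobble = dedup(contacts, wobble=4)  # near-duplicate removed (4 is an assumed value)
```

With the sample data, exact-only deduplication keeps three contacts while the wobble-tolerant pass keeps two, which is why libraries processed with -j still contain near-duplicates.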
Minor:
- The chimera handling script now includes the header and prints tab-delimited output, for better conversion to BAM; it also no longer looks for the /1, /2 read name suffixes but instead uses the SAM flag
- We dedup collisions now
- An addition to the dups script makes it run faster and with less memory when there are a lot of duplicates
- Statistics updated in CPU to properly handle multiple ligations; also added scripts in CPU that were missing for mega
- Made the names correct in the stats_sub script
- Count ligations explicitly excludes the readname in the fastq file
- LibraryComplexity no longer a separate jar
- No more stats calculation on duplicates
- Java memory options now exported instead of separate scripts
- Multiple ligation junctions handled
- Flag added for no wobble / just exact duplicates (-j)
- Added options to the mega script; updated memory requirements
- Made scripts more consistent by having them check the system for the juicer_tools path
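
As a sketch of the SAM-flag approach mentioned in the chimera handling item above, the Python snippet below determines whether an alignment record is the first or second read of a pair from the FLAG field rather than from a /1 or /2 suffix on the read name. The bit values 0x40 (first in pair) and 0x80 (second in pair) come from the SAM format; the function name mate_number and the sample records are hypothetical, and this is not the code used in the Juicer script.

```python
FIRST_IN_PAIR = 0x40   # SAM FLAG bit: first segment in the template
SECOND_IN_PAIR = 0x80  # SAM FLAG bit: last segment in the template


def mate_number(sam_line: str) -> int:
    """Return 1 or 2 for the first or second read of a pair, 0 if neither
    bit is set. Header lines (starting with '@') are rejected."""
    if sam_line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # FLAG is the second tab-delimited column
    if flag & FIRST_IN_PAIR:
        return 1
    if flag & SECOND_IN_PAIR:
        return 2
    return 0


# Hypothetical, truncated records: only the first columns matter here.
read1 = "readA\t99\tchr1\t100\t60\t50M"
read2 = "readA\t147\tchr1\t300\t60\t50M"
```

Flag 99 has the 0x40 bit set and flag 147 has the 0x80 bit set, so the two sample records resolve to mates 1 and 2 without inspecting the read name.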