My machine's built-in gcc is older than version 6. Even after I set LD_LIBRARY_PATH and PATH to use a newer gcc, the build still tried to use the old gcc in /usr/bin and compilation failed.

The key is to set LD_LIBRARY_PATH and PATH before running cmake ./, because cmake records the compiler it finds on its first run and keeps using it afterwards. The easiest way may be:

  1. Delete the current source code with the command rm -rf anchorwave.
  2. Set LD_LIBRARY_PATH, PATH, CC, and CXX to point to the new gcc.
  3. Re-clone the AnchorWave repository and compile it.
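
The three steps above can be sketched as a small shell snippet. The helper name use_gcc_prefix is hypothetical, and the install prefix /usr/local/gcc-7.3.0 is only an example location; adjust both to your system. The destructive and network steps are left as comments so that nothing runs by accident.

```shell
# Hypothetical helper: point the build environment at a gcc installed under PREFIX.
use_gcc_prefix() {
    prefix=$1
    export PATH="$prefix/bin:$PATH"
    # Append the old LD_LIBRARY_PATH only if it was already set.
    export LD_LIBRARY_PATH="$prefix/lib64:$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export CC="$prefix/bin/gcc"
    export CXX="$prefix/bin/g++"
}

# Intended order of operations (commented out on purpose):
# rm -rf anchorwave                          # 1. drop the old tree and its cmake cache
# use_gcc_prefix /usr/local/gcc-7.3.0        # 2. set the environment before cmake runs
# git clone <AnchorWave repository URL> anchorwave   # 3. re-clone
# cd anchorwave && cmake ./ && make          #    and compile
```

Setting the environment before the first cmake run matters because a fresh clone has no cached compiler choice to override.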

Error message for installation: unrecognized command line option '-std=gnu++14'

Please check the log output of the cmake ./ command:
-- The C compiler identification is GNU *****
-- The CXX compiler identification is GNU *****

Were the C and CXX compilers correctly recognized by cmake? Is their version 7.0 or later?
If you have a newer version of GNU gcc installed but cmake does not recognize it correctly, you can tell cmake where to find the correct gcc. I used the following commands:

export LD_LIBRARY_PATH=/usr/local/gcc-7.3.0/lib64:/usr/local/gcc-7.3.0/lib:$LD_LIBRARY_PATH
export CC=/usr/local/gcc-7.3.0/bin/gcc
export CXX=/usr/local/gcc-7.3.0/bin/g++

On a different computer, you need to figure out where those programs are located.
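
One way to confirm that a candidate compiler meets the version requirement before running cmake is sketched below. The helper names version_at_least_7 and check_compiler are hypothetical. The check relies only on gcc's -dumpversion option, which prints the version number (some builds print only the major version).

```shell
# Hypothetical helper: succeed if a version string such as "7.3.0" has major version >= 7.
version_at_least_7() {
    major=${1%%.*}          # keep the text before the first dot
    [ "$major" -ge 7 ]
}

# Hypothetical helper: report whether the compiler at the given path is new enough.
check_compiler() {
    ver=$("$1" -dumpversion) || return 1
    if version_at_least_7 "$ver"; then
        echo "$1: version $ver is OK"
    else
        echo "$1: version $ver is too old"
        return 1
    fi
}

# Example (uses the CC and CXX values exported above; adjust for your machine):
# check_compiler "$CC" && check_compiler "$CXX"
```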

AnchorWave is very slow when using a network storage

To save memory and increase the number of threads that can run in parallel, AnchorWave does not cache genome sequences in memory but reads them from disk on demand, many times over. Please avoid network storage if possible. Alternatively, if you have plenty of memory, you can copy the genome files, especially the query genome file, into memory-backed storage such as /dev/shm.
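
Staging a genome file in memory-backed storage could look like the following sketch. The helper name stage_in_memory and the file name query.fa are hypothetical, and /dev/shm is assumed to be a memory-backed mount with enough free space for the file.

```shell
# Hypothetical helper: copy FILE into a memory-backed directory (default /dev/shm)
# and print the path of the copy, so that path can be given to AnchorWave instead.
stage_in_memory() {
    src=$1
    dest_dir=${2:-/dev/shm}
    dest="$dest_dir/$(basename "$src")"
    cp "$src" "$dest" && echo "$dest"
}

# Example (query.fa is a placeholder file name):
# query_in_memory=$(stage_in_memory query.fa)
# ... run anchorwave with "$query_in_memory" as the query genome ...
# rm -f "$query_in_memory"      # free the memory once the alignment is done
```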

Should I perform genome repeat masking before feeding genomes into AnchorWave?

Genome masking is not expected to improve the performance of AnchorWave.
AnchorWave does not use any soft-masking information, and hard masking would increase its computational cost.

AnchorWave ends with the error message "TransferGffWithNucmerResult.cpp:203"

anchorwave: /net/eichler/vol26/projects/primate_sv/nobackups/Tools/anchorwave/src/service/TransferGffWithNucmerResult.cpp:203: void readSam(std::vector<AlignmentMatch>&, std::ifstream&, std::map<std::__cxx11::basic_string<char>, Transcript>&, int&, const double&, double&, std::set<std::__cxx11::basic_string<char> >&, const string&, int32_t&, int32_t&, int32_t&, int32_t&, int&, bool&, int&, std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >&): Assertion 'databaseStart < databaseEnd' failed. Aborted (core dumped)

We observed three possibilities that cause this problem:

  1. AnchorWave may check the open reading frames defined by the input GFF file and genome sequence. If many transcript annotations do not start with a start codon, do not end with a stop codon, contain a premature stop codon, or have non-standard splice sites, AnchorWave may fail.
  2. If the input CDS FASTA file was generated by another application, AnchorWave could fail. The anchorwave gff2seq command filters out some short CDS records from the GFF file, because minimap2 does not handle short CDS sequences well. Please use the FASTA file generated by the anchorwave gff2seq command as input for minimap2 and for anchorwave proali or anchorwave genoAli.
  3. If you set the parameters -x or -m for anchorwave gff2seq, please set identical values for anchorwave proali or anchorwave genoAli.

"Segmentation fault (core dumped)" error from genoAli function

The genoAli function aligns the query chromosome sequence against the reference chromosome with the same name. Those pairs of sequences with identical names should be similar to each other.

  1. Some assemblies contain contigs or scaffolds. Contigs or scaffolds with identical names in the reference genome and the query genome are generally not similar to each other. They should be removed from the input files before performing genome alignment.
  2. Some assemblies concatenate contigs or scaffolds together as chr0, chr00, or something similar. Such sequences from the reference genome and the query genome should not be aligned using the global alignment strategy implemented in the proali function.
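
Removing contigs, scaffolds, and chr0-style records before alignment can be sketched with awk. The helper name keep_named_records and all file names are hypothetical. The sketch assumes that the sequence name is the first word of each FASTA header line and that the names to keep are listed one per line.

```shell
# Hypothetical helper: print only the FASTA records whose name appears in NAMES_FILE.
# Usage: keep_named_records NAMES_FILE GENOME_FASTA
keep_named_records() {
    awk '
        NR == FNR { keep[$1] = 1; next }      # first file: remember the wanted names
        /^>/      { name = substr($1, 2)      # header line: drop ">" from the first word
                    printing = (name in keep) }
        printing                              # print lines that belong to wanted records
    ' "$1" "$2"
}

# Example: apply the same chromosome list to both genomes.
# keep_named_records chromosomes.txt ref.fa   > ref.chr.fa
# keep_named_records chromosomes.txt query.fa > query.chr.fa
```

Using one shared list for both genomes keeps the set of sequence names identical on the reference and query side.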