kraken2-build error when creating sequence ID to taxonomy ID map #18

Open
joshsimcock opened this issue May 9, 2022 · 9 comments

@joshsimcock

Hi,

I'm near the end of the Struo2 pipeline, trying to create a custom kraken2 database using GTDB r207.

I've hit a wall at the kraken2-build command, specifically at one spot within the build_kraken2_db.sh script that the command calls. It seems that this section:

echo "Creating sequence ID to taxonomy ID map (step 1)..."
if [ -d "library/added" ]; then
  find library/added/ -name 'prelim_map_*.txt' | xargs cat > library/added/prelim_map.txt
fi
seqid2taxid_map_file=seqid2taxid.map
if [ -e "$seqid2taxid_map_file" ]; then
  echo "Sequence ID to taxonomy ID map already present, skipping map creation."
else
  step_time=$(get_current_time)
  find library/ -maxdepth 2 -name prelim_map.txt | xargs cat > taxonomy/prelim_map.txt
  if [ ! -s "taxonomy/prelim_map.txt" ]; then
    echo "No preliminary seqid/taxid mapping files found, aborting."
    exit 1
  fi
  grep "^TAXID" taxonomy/prelim_map.txt | cut -f 2- > $seqid2taxid_map_file.tmp || true
  if grep "^ACCNUM" taxonomy/prelim_map.txt | cut -f 2- > accmap_file.tmp; then
    if compgen -G "taxonomy/*.accession2taxid" > /dev/null; then
      lookup_accession_numbers accmap_file.tmp taxonomy/*.accession2taxid > seqid2taxid_acc.tmp
      cat seqid2taxid_acc.tmp >> $seqid2taxid_map_file.tmp
      rm seqid2taxid_acc.tmp
    else
      echo "Accession to taxid map files are required to build this DB."
      echo "Run 'kraken2-build --db $KRAKEN2_DB_NAME --download-taxonomy' again?"
      exit 1
    fi
  fi
  rm -f accmap_file.tmp
  finalize_file $seqid2taxid_map_file
  echo "Sequence ID to taxonomy ID map complete. [$(report_time_elapsed $step_time)]"
fi

Produces the error messages:

Accession to taxid map files are required to build this DB.
Run 'kraken2-build --db $KRAKEN2_DB_NAME --download-taxonomy' again?

When I run this section line by line myself, everything is fine until this command:

lookup_accession_numbers accmap_file.tmp taxonomy/*.accession2taxid > seqid2taxid_acc.tmp

At that point I get the error:

Found 0/1363031 targets...lookup_accession_numbers: unable to open taxonomy/*.accession2taxid: No such file or directory

My ./taxonomy/ directory only contains the following:
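The script excerpt above routes each prelim_map.txt line by its first field: lines starting with TAXID go straight into the map, while lines starting with ACCNUM are sent to lookup_accession_numbers, which is the only step that needs accession2taxid files. A minimal sketch for counting how many entries take each route follows. The helper name count_map_routes and the demo file contents are illustrative assumptions, not part of kraken2 or Struo2.

```shell
#!/bin/sh
# count_map_routes: hypothetical helper, not part of kraken2.
# Prints how many lines of a prelim_map.txt start with each marker
# (the build script greps for "^TAXID" and "^ACCNUM").
count_map_routes() {
  # The first tab-separated field is the route marker.
  cut -f 1 "$1" | sort | uniq -c
}

# Demo on a small stand-in file with made-up entries.
# In a real database directory, pass taxonomy/prelim_map.txt instead.
demo_map=$(mktemp)
printf 'TAXID\tseq1|kraken:taxid|562\t562\n' > "$demo_map"
printf 'ACCNUM\tseq2\tseq2\n' >> "$demo_map"
printf 'ACCNUM\tseq3\tseq3\n' >> "$demo_map"
count_map_routes "$demo_map"
rm -f "$demo_map"
```

If every sequence was expected to carry a taxid in its header, a non-zero ACCNUM count suggests that some headers were not parsed as intended, which would explain why the accession lookup is attempted at all.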

-rw-r--r--+ 1  names.dmp
-rw-r--r--+ 1  nodes.dmp
drwxr-sr-x+ 2  .
-rw-r--r--+ 1  prelim_map.txt
drwxr-sr-x+ 5  ..

Should there be accession2taxid files in here? If so, when should they have been generated?

Happy to post on the kraken2 GitHub if that is more appropriate, but I figured this may be something that should have been generated elsewhere in the Struo2 pipeline.

Any help much appreciated, thanks!

@joshsimcock joshsimcock changed the title kraken error when creating sequence ID to taxonomy ID map kraken-build error when creating sequence ID to taxonomy ID map May 9, 2022
@joshsimcock joshsimcock changed the title kraken-build error when creating sequence ID to taxonomy ID map kraken2-build error when creating sequence ID to taxonomy ID map May 9, 2022
@nick-youngblut
Contributor

hmm... an accession2taxid file shouldn't be needed, unless that recently changed. Can you please try just creating an empty accession2taxid file in the appropriate directory?

@joshsimcock
Author

Thanks for the quick suggestion, no luck unfortunately.

Creating a blank file named .accession2taxid or accession2taxid gives the same error. Giving the file a name like 1.accession2taxid, test.accession2taxid, or blank.accession2taxid just produces:

lookup_accession_numbers: unable to mmap taxonomy/1.accession2taxid: Invalid argument
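Both outcomes are consistent with how the glob and the lookup behave. By default a shell glob such as taxonomy/*.accession2taxid does not match a file whose name begins with a dot, and it cannot match a name with no dot before "accession2taxid", so those two blank files are never seen. A blank file that does match is passed to lookup_accession_numbers, and memory-mapping a zero-length file on Linux fails with "Invalid argument". A minimal sketch for checking which files the glob picks up and whether they are empty follows. The helper name check_acc_maps and the demo file names are illustrative assumptions.

```shell
#!/bin/sh
# check_acc_maps: hypothetical helper, not part of kraken2.
# Lists files matching <dir>/*.accession2taxid and flags empty ones.
check_acc_maps() {
  tax_dir=$1
  found=0
  for f in "$tax_dir"/*.accession2taxid; do
    # An unmatched glob stays as a literal string; skip it.
    [ -e "$f" ] || continue
    found=1
    if [ -s "$f" ]; then
      echo "non-empty: $f"
    else
      echo "empty: $f"
    fi
  done
  [ "$found" -eq 1 ] || echo "no match for $tax_dir/*.accession2taxid"
}

# Demo in a scratch directory that mimics the attempts described above.
demo_dir=$(mktemp -d)
: > "$demo_dir/.accession2taxid"     # hidden file: not matched by the glob
: > "$demo_dir/accession2taxid"      # no dot before the suffix: not matched
: > "$demo_dir/1.accession2taxid"    # matched, but empty
printf 'accession\taccession.version\ttaxid\tgi\n' > "$demo_dir/nucl_demo.accession2taxid"
check_acc_maps "$demo_dir"
rm -rf "$demo_dir"
```

In a real database directory the call would be check_acc_maps taxonomy. A non-empty placeholder file gets past the mmap error, but it does not by itself supply any accession mappings.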

@nick-youngblut
Contributor

I’m on vacation this week, but I’ll have a look at the problem ASAP

@joshsimcock
Author

No worries, thanks! Enjoy your vacation.

@nick-youngblut
Contributor

@joshsimcock I haven't been able to reproduce this issue. Can you provide more info, such as:

  • the version of snakemake that you are using
  • the versions of kraken2 & bracken in the conda env that is used by snakemake (in the .snakemake/conda/ directory)

@nick-youngblut
Contributor

FYI: I'm working on creating Kraken2 & Bracken databases for Release 207 (followed later by the humann3 databases). They should be complete by the end of the week.

@joshsimcock
Author

@nick-youngblut Sorry for the long delay in replying.

snakemake = 7.6.2
kraken2 = 2.1.2
bracken = 2.5

Thanks for uploading the 207 release! It saves me a lot of time. If you can figure out what happened here, great, but there is no rush, as I can use your r207 builds for now. Thanks!

@haolilan

I encountered the same problem and, after much time and effort, resolved it with "chmod +x n*.dmp". I suspect the problem is that names.dmp and nodes.dmp could not be read, as the files in your ./taxonomy/ directory were also "-rw-r--r--".

@haolilan

haolilan commented Sep 25, 2023

After solving the above problem, I encountered another problem with the same warning. I found that the headers in my library .fna files contain many spaces, which may prevent the kraken taxid from being read; the affected sequences were reported in accmap_file.tmp.
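One way to check for this is to list the FASTA headers whose sequence ID lacks a taxid tag. The sketch below assumes that kraken2 treats the text between ">" and the first whitespace as the sequence ID and looks there for a tag of the form kraken:taxid|<number>; that assumption, the helper name list_untagged_headers, and the demo sequences are not taken from this thread.

```shell
#!/bin/sh
# list_untagged_headers: hypothetical helper, not part of kraken2.
# Prints FASTA headers whose first whitespace-delimited token has no
# kraken:taxid|<digits> tag (assumed header convention, see above).
list_untagged_headers() {
  awk '/^>/ { if ($1 !~ /kraken:taxid[|][0-9]+/) print }' "$1"
}

# Demo on a small stand-in FASTA file with made-up sequences.
demo_fna=$(mktemp)
cat > "$demo_fna" <<'EOF'
>seq1|kraken:taxid|562 tag is inside the first token
ACGT
>seq 2 kraken:taxid|562 tag comes after a space
ACGT
EOF
list_untagged_headers "$demo_fna"
rm -f "$demo_fna"
```

Under that assumption, any header printed here would end up on the ACCNUM route and trigger the accession2taxid lookup.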
