Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI failing with new BioPython deprecation warning #1727

Open
victorlin opened this issue Jan 17, 2025 · 5 comments · May be fixed by #1731
Open

CI failing with new BioPython deprecation warning #1727

victorlin opened this issue Jan 17, 2025 · 5 comments · May be fixed by #1731

Comments

@victorlin
Copy link
Member

victorlin commented Jan 17, 2025

First appeared on today's scheduled run:

Previously, the FASTA parser silently ignored comments at the beginning of the FASTA file (before the first sequence).

Nowadays, the FASTA file format is usually understood not to have any such comments, and most software packages do not allow them. Therefore, the use of comments at the beginning of a FASTA file is now deprecated in Biopython.

In a future Biopython release, this deprecation warning will be replaced by a ValueError. To avoid this, there are three options:

  1. Modify your FASTA file to remove such comments at the beginning of the file.

  2. Use SeqIO.parse with the 'fasta-pearson' format instead of 'fasta'. This format is consistent with the FASTA format defined by William Pearson's FASTA aligner software. Thie format allows for comments before the first sequence; lines starting with the ';' character anywhere in the file are also regarded as comment lines and are ignored.

  3. Use the 'fasta-blast' format. This format regards any lines starting with '!', '#', or ';' as comment lines. The 'fasta-blast' format may be safer than the 'fasta-pearson' format, as it explicitly indicates which lines are comments.

Timing aligns with yesterday's release of Biopython version 1.85 (changes).

@tsibley
Copy link
Member

tsibley commented Jan 17, 2025

(What a nice error message though!)

@victorlin
Copy link
Member Author

victorlin commented Jan 17, 2025

The status quo for both Biopython and Augur has been to support comments in FASTA files. With version 1.85, Biopython is switching things up. Our options are:

  1. Let Biopython decide when Augur should stop supporting comments in FASTA files.
    • Expose this warning, or at least some variant, to our users when reading/parsing FASTA files.
    • The case of parsing with format='fasta' on a GenBank file in ancestral.py is tricky since it appears when attempting to parse a GenBank file as FASTA, however suppression of the warning will also suppress it for when the root sequence is a FASTA with comments.
  2. Decide independently if/when Augur should stop supporting comments in FASTA files.
    • This warning should not be exposed to our users when reading/parsing sequence files. It can be suppressed by something like warnings.filterwarnings("ignore", category=BiopythonDeprecationWarning, message="Previously, the FASTA parser …").
    • To continue supporting comments from FASTA files, use format='fasta-pearson' and pin Biopython >=1.85 as that's the first version to support the format. We may not want to add this pin immediately to allow users to install Augur with older versions of Biopython for some more time.

I think (2) is the way to go.

@tsibley
Copy link
Member

tsibley commented Jan 17, 2025

  • To continue supporting comments from FASTA files, use format='fasta-pearson' and pin Biopython >=1.85 as that's the first version to support the format. We may not want to add this pin immediately to allow users to install Augur with older versions of Biopython for some more time.

Or don't pin but use feature/version detection to use format='fasta' for <1.85 and format='fasta-pearson' for >=1.85? They're functionally equivalent, right? (Or close to it?)

@corneliusroemer
Copy link
Member

Here's the full deprecation announcement (unfortunately hidden in DEPRECATED.rst and not at all mentioned in the release notes): https://github.com/biopython/biopython/blob/e8d27e912ee8c9e75483728f6ac8094713563c1d/DEPRECATED.rst#bioseqiofastaio

Or don't pin but use feature/version detection to use format='fasta' for <1.85 and format='fasta-pearson' for >=1.85? They're functionally equivalent, right? (Or close to it?)

I agree this is the way to go. We have no reason to change.

fasta-pearson is unfortunately only supported in >=1.85 (it's missing here https://biopython.org/docs/1.84/api/Bio.SeqIO.FastaIO.html) so we need to do a version switch here, but that's better than having inconsistent behaviour depending on a dependency's version.

@victorlin
Copy link
Member Author

Thanks for the reviews @corneliusroemer. I'll try to address things properly in #1730 and #1731, which means bigger changes in code rather than a quick fix. In the meantime, can we disable the failing test to improve usefulness of CI?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants