FEAT: add csv2fasta #61

Closed
wants to merge 6 commits into from

DriesSchaumont
Contributor

Description

FEAT: add csv2fasta

Issue ticket number

Closes #xxxx

Checklist before requesting a review

  • I have performed a self-review of my code

  • Conforms to the Contributing guidelines

  • Proposed changes are described in the CHANGELOG.md

  • I have tested my code with viash ns test --parallel -q <name or namespace>

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Documentation
    • Bug fixes

Contributor

@tverbeiren tverbeiren left a comment

Comprehensive tests, very nice!

One of the tests is failing (as you probably have noticed yourself).

I wonder if we're going to run into issues with the --quote_character. CSV parsing libraries exist that auto-detect this and I wonder if we should do the same? The same holds for the delimiter, people often don't know what Excel will generate?

Review comments on src/sequenceformats/csv2fasta/config.vsh.yaml
@DriesSchaumont
Contributor Author

DriesSchaumont commented Jun 12, 2024

Comprehensive tests, very nice!

One of the tests is failing (as you probably have noticed yourself).

I wonder if we're going to run into issues with the --quote_character. CSV parsing libraries exist that auto-detect this and I wonder if we should do the same? The same holds for the delimiter, people often don't know what Excel will generate?

I agree. The Python csv reader is used here, which defaults to the 'excel' dialect; see https://docs.python.org/3/library/csv.html#csv.reader. Alternatively, we could use a Sniffer to infer the dialect: https://docs.python.org/3/library/csv.html#csv.Sniffer. I would keep the 'quote_character' and 'delimiter' options so that users can override what the sniffer detects (with no override by default). That is not really useful in automated pipelines, but worth including nonetheless. WDYT?
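
To make the proposal concrete, here is a minimal sketch of sniffing with optional overrides. It is not the component's actual code: the function name `read_rows` and its parameter names are hypothetical, chosen to mirror the 'delimiter' and 'quote_character' options, and falling back to the 'excel' defaults when sniffing fails is an assumption.

```python
import csv
import io


def read_rows(text, delimiter=None, quote_character=None):
    """Parse CSV text, sniffing the dialect unless an override is given.

    Hypothetical helper for illustration only.
    """
    try:
        # Let the Sniffer guess delimiter and quote character from a sample.
        sniffed = csv.Sniffer().sniff(text[:4096])
        fmt = {"delimiter": sniffed.delimiter, "quotechar": sniffed.quotechar}
    except csv.Error:
        # Assumption: fall back to the 'excel' dialect defaults.
        fmt = {"delimiter": ",", "quotechar": '"'}

    # Explicit options win over whatever was sniffed (no override by default).
    if delimiter is not None:
        fmt["delimiter"] = delimiter
    if quote_character is not None:
        fmt["quotechar"] = quote_character

    return list(csv.reader(io.StringIO(text, newline=""), **fmt))


if __name__ == "__main__":
    sample = "id;sequence\n'seq;1';ACGT\n'seq;2';TTGA\n"
    for row in read_rows(sample):
        print(row)
```

Leaving both options as `None` keeps the sniffed values, so a pipeline only needs to pass them when the detection is known to be wrong for its input.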

@tverbeiren
Contributor

I would keep the 'quote_character' and 'delimiter' options, in order to override what the sniffer provided (with no override by default). Not really useful in automated pipelines, but worth including nonetheless WDYT?

Sure, good idea!

@rcannood
Contributor

@DriesSchaumont Is this PR ready to be merged?

Contributor

@rcannood rcannood left a comment

We decided to move this component to a separate package :)

@DriesSchaumont
Contributor Author

Closing in favour of viash-hub/craftbox#1

@rcannood rcannood deleted the feat/add_csv2fasta branch July 4, 2024 11:17
3 participants