confusing ID column names #129

Open
redbluewater opened this issue May 6, 2020 · 1 comment


@redbluewater

I am putting this comment here, but it also impacts the examples for mmvec (and maybe other programs).

The Red Sea example for songbird uses sampleid as the identifier for the sequence data (feature_metadata.txt) and also uses sampleid as the identifier for the samples (redsea_metadata.txt). As I am still learning QIIME, I am not sure of the best way around this. However, having only two choices (some variant of sampleid and featureid) does not seem like enough. For example:

  • sequence data: needs sequence_metadata
  • metabolite data: needs metabolite_metadata
  • sample data: needs sample_metadata

How about this idea:

  • sequence data: sequence_id
  • metabolite data: metabolite_id
  • sample data: sample_id

This would be more precise than 'sampleid' or 'featureid', especially for a mass spectrometry group like ours, which uses 'features' to mean peaks in mass spectrometry data (the opposite of how 'features' are used in the examples here).
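To make the proposal concrete, here is a minimal sketch of how kind-specific ID column names would let a reader catch a mix-up that is invisible when every table starts with `sampleid`. The `ID_COLUMNS` mapping and the `read_metadata` helper are hypothetical and only illustrate the suggested convention; they are not part of QIIME 2, songbird, or mmvec, and the tables are assumed to be tab-separated with the ID in the first column.

```python
import csv
import io

# Hypothetical mapping from data type to the ID column name proposed above.
# These names illustrate the suggested convention only; they are not an
# existing QIIME 2 feature.
ID_COLUMNS = {
    "sequence": "sequence_id",
    "metabolite": "metabolite_id",
    "sample": "sample_id",
}


def read_metadata(text, kind):
    """Parse a tab-separated metadata table and check that its first column
    is the ID column expected for `kind`.

    Returns a dict mapping each ID to its row (column name -> value).
    Raises ValueError if the table is empty or names a different kind of ID.
    """
    expected = ID_COLUMNS[kind]
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if not reader.fieldnames:
        raise ValueError("metadata table has no header row")
    id_column = reader.fieldnames[0]
    if id_column != expected:
        raise ValueError(
            f"expected {kind} metadata with ID column {expected!r}, "
            f"found {id_column!r}"
        )
    return {row[id_column]: row for row in reader}


# Hypothetical example tables.
sample_tsv = "sample_id\tdepth\nS1\t10\nS2\t50\n"
sequence_tsv = "sequence_id\ttaxon\nASV1\tBacteria\n"

samples = read_metadata(sample_tsv, "sample")
sequences = read_metadata(sequence_tsv, "sequence")

# Passing sequence metadata where sample metadata is expected now fails loudly.
try:
    read_metadata(sequence_tsv, "sample")
except ValueError as err:
    print(err)
```

If both files used `sampleid`, as in the Red Sea example, the last call would silently succeed. Supporting other omics data types would, in this sketch, only mean adding entries to the hypothetical `ID_COLUMNS` mapping.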

Thanks as ever for developing these tools. They are extremely useful and I am excited to use them to dig into my own data.

@mortonjt
Collaborator

@KujawinskiLaboratory this is a very good point, one that stems from the traditional definitions of metadata.

There have been a couple of discussions about this in other contexts, in particular:
biocore/emperor#726
qiime2/q2-emperor#81

I think it would take a fairly extensive refactor of QIIME 2 to make sure these types propagate correctly (e.g., what about all of the other omics data types, such as transcriptomics and proteomics?).
CC @ElDeveloper @ebolyen for further discussion
