confusing ID column names #129

redbluewater · 2020-05-06T13:54:51Z

I am putting this comment here, but it also impacts the examples for mmvec (and maybe other programs).

The 'red sea' example for songbird uses sampleid as the identifier for the sequence data (feature_metadata.txt) and also uses sampleid as the identifier for the samples (redsea_metadata.txt). As I am still learning qiime, I am not sure of the best way around this. However, having only two choices (some variant of sampleid and featureid) does not seem like enough. For example:

sequence data .... needs sequence_metadata
metabolites data .... needs metabolites_metadata
sample data ... needs sample_metadata

How about this idea:

sequence data .... sequence_id
metabolites data .... metabolite_id
sample data ... sample_id

This is more precise than 'sampleid' or 'featureid', especially for a mass spectrometry group like ours, which uses 'features' to mean peaks in mass spectrometry data (the opposite of how features are used in the examples here).
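A minimal sketch of the renaming idea, assuming the metadata files are tab-separated with the ID in a header cell named `sampleid`. The helper `rename_id_column` and the toy rows are hypothetical, made up for illustration. QIIME 2 only recognizes a fixed set of ID column headers, so files renamed this way would likely be for local bookkeeping and would need to be mapped back before being passed to songbird or mmvec.

```python
import csv
import io


def rename_id_column(tsv_text, old_name, new_name):
    """Return tsv_text with the header cell `old_name` renamed to `new_name`.

    Hypothetical helper for illustration; not part of songbird, mmvec, or QIIME 2.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows or old_name not in rows[0]:
        raise ValueError(f"no header cell named {old_name!r}")
    # Only the header row changes; data rows are written back untouched.
    rows[0] = [new_name if cell == old_name else cell for cell in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Toy stand-ins for feature_metadata.txt and redsea_metadata.txt (contents made up).
feature_metadata = "sampleid\tTaxon\nseq1\tBacteria\n"
sample_metadata = "sampleid\tdepth\nS1\t10\n"

# Give each table an ID header that says what the rows actually are.
sequence_table = rename_id_column(feature_metadata, "sampleid", "sequence_id")
sample_table = rename_id_column(sample_metadata, "sampleid", "sample_id")

print(sequence_table)
print(sample_table)
```

With distinct headers, a sequence table and a sample table can no longer be mistaken for each other just by looking at the first column.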

Thanks as ever for developing these tools. They are extremely useful and I am excited to use them to dig into my own data.

mortonjt · 2020-05-22T22:20:49Z

@KujawinskiLaboratory this is a very good point - one that stemmed from the traditional definitions of metadata.

There have been a couple of discussions about this in other contexts, in particular:
biocore/emperor#726
qiime2/q2-emperor#81

I think it would take a fairly extensive refactor of qiime2 to make sure that these types propagate accordingly (e.g. what about all of the other omics datatypes, such as transcriptomics and proteomics?).
CC @ElDeveloper @ebolyen for further discussion
