What is the format for input using a csv file? #3

DadongZ · 2019-12-18T15:13:17Z

I have a large list of peptides and am wondering what the format is when using a csv file as input. I have tried a few ways, but none of them work. Is there a template?

ikizhvatov · 2019-12-18T19:48:58Z

I see that mentioning csv in the help is confusing. The input csv should contain one peptide per line. I have added example_input.csv and the corresponding output csv to the repo.
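For anyone preparing such a file from a list held in memory, here is a minimal sketch of the one-peptide-per-line layout described above. The function name and the file name are hypothetical, and it assumes the file carries no header row and no extra columns; example_input.csv in the repo is the authoritative reference for the format.

```python
def write_peptide_csv(peptides, path):
    """Write one peptide per line and return the number of lines written.

    Assumes no header row and no extra columns (not confirmed in this
    thread; compare with example_input.csv in the repo).
    """
    written = 0
    with open(path, "w") as handle:
        for peptide in peptides:
            peptide = peptide.strip()
            if not peptide:
                continue  # skip blank entries so no empty lines are written
            handle.write(peptide + "\n")
            written += 1
    return written


if __name__ == "__main__":
    # "peptides.csv" is a placeholder file name.
    count = write_peptide_csv(["ACDEFGHIK", "LMNPQRSTV"], "peptides.csv")
    print("wrote %d peptides" % count)
```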

DadongZ · 2019-12-26T01:35:00Z

Thanks! That's helpful.

DadongZ · 2020-01-03T18:28:36Z

Thanks. I got the error below with a csv as input (~150K peptides):

Traceback (most recent call last):
  File "/home/dz33/.conda/envs/hmmhc/bin/hmmhc-predict", line 8, in <module>
    sys.exit(main())
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/hmmhc/cmdline.py", line 53, in main
    predictions = predictor.predict(peptides)
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/hmmhc/hmmhc.py", line 70, in predict
    normalizedLogOdds = self.computeLogOdds(peptidesList)
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/hmmhc/hmmhc.py", line 103, in computeLogOdds
    sequenceSetBlocks = self.toSequenceSetBlocks(peptideList)
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/hmmhc/hmmhc.py", line 156, in toSequenceSetBlocks
    [list(p) for p in peptideList[rangeStart:rangeEnd]]
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/ghmm.py", line 948, in __init__
    internalInput = [self.emissionDomain.internalSequence(seq) for seq in sequenceSetInput]
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/ghmm.py", line 393, in internalSequence
    result = map(lambda i: self.index[i], result)
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/ghmm.py", line 393, in <lambda>
    result = map(lambda i: self.index[i], result)
KeyError: 'X'

Some suggestions?

ikizhvatov · 2020-01-03T18:44:37Z

At least one of your peptide strings contains the character 'X', which denotes an unknown amino acid and is not supported by the predictor. 'U' is also not supported.

As a quick solution, for now please input only the peptide strings containing the 20 amino acids ACDEFGHIKLMNPQRSTVWY.

We will consider filtering the peptides with unsupported amino acids in the tool itself.
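Until that filtering exists in the tool, the workaround above can be applied as a pre-processing step. This is a rough sketch, not part of hmmhc: the function name is hypothetical, and it assumes peptides are upper-case strings, so anything outside the 20 letters listed above (including lower-case letters) is rejected.

```python
# The 20 amino acid letters the predictor supports, as listed above.
CANONICAL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def split_supported(peptides):
    """Separate peptides made only of supported letters from all others."""
    supported, rejected = [], []
    for peptide in peptides:
        cleaned = peptide.strip()
        if cleaned and set(cleaned) <= CANONICAL_AMINO_ACIDS:
            supported.append(cleaned)
        else:
            rejected.append(peptide)
    return supported, rejected


if __name__ == "__main__":
    ok, bad = split_supported(["ACDK", "AXK", "MKL*", "UAA"])
    print(ok)   # peptides safe to pass to the predictor
    print(bad)  # peptides to inspect or drop
```

Keeping the rejected peptides rather than discarding them silently makes it easier to see how much of the input is affected.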

DadongZ · 2020-01-03T18:53:27Z

I removed all peptides containing X and U but still got an error:

Traceback (most recent call last):
  File "/home/dz33/.conda/envs/hmmhc/bin/hmmhc-predict", line 8, in <module>
    sys.exit(main())
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/hmmhc/cmdline.py", line 53, in main
    predictions = predictor.predict(peptides)
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/hmmhc/hmmhc.py", line 70, in predict
    normalizedLogOdds = self.computeLogOdds(peptidesList)
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/hmmhc/hmmhc.py", line 103, in computeLogOdds
    sequenceSetBlocks = self.toSequenceSetBlocks(peptideList)
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/hmmhc/hmmhc.py", line 156, in toSequenceSetBlocks
    [list(p) for p in peptideList[rangeStart:rangeEnd]]
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/ghmm.py", line 948, in __init__
    internalInput = [self.emissionDomain.internalSequence(seq) for seq in sequenceSetInput]
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/ghmm.py", line 393, in internalSequence
    result = map(lambda i: self.index[i], result)
  File "/home/dz33/.conda/envs/hmmhc/lib/python2.7/site-packages/ghmm.py", line 393, in <lambda>
    result = map(lambda i: self.index[i], result)
KeyError: '*'

ikizhvatov · 2020-01-03T19:22:32Z

As said, peptide strings must contain only the 20 canonical amino acids. You have '*' (it usually denotes a stop codon) in at least one of the strings.
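Since each run stops at the first unsupported character it meets, it can help to list every offending character in one pass before running the predictor. Here is a minimal sketch under the same assumption about the supported alphabet; the function name is hypothetical and not part of hmmhc.

```python
from collections import Counter

# The 20 canonical amino acid letters named earlier in this thread.
CANONICAL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def count_unsupported_characters(peptides):
    """Count each character that is not one of the 20 canonical letters."""
    counts = Counter()
    for peptide in peptides:
        for character in peptide.strip():
            if character not in CANONICAL_AMINO_ACIDS:
                counts[character] += 1
    return counts


if __name__ == "__main__":
    report = count_unsupported_characters(["ACDK", "AXK", "MKL*", "XX"])
    for character, count in report.most_common():
        print("%r occurs %d times" % (character, count))
```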

DadongZ · 2020-01-03T20:35:07Z

A stop codon in a peptide? These peptides come from gene CDS, though, and I have removed all X/U letters.

ikizhvatov self-assigned this Jan 3, 2020

ikizhvatov added the enhancement label Jan 3, 2020
