A question about the example. #1
Hello, you have done a great job and thank you for your contribution to the field of protein-based genome representation.
To embed the genome, I used the sample file you provided (test.faa) and ran the command (esm-embed --input ./test1.faa --outdir ./test_output --esm esm2_t6_8M --torch-hub test) to convert the protein sequences into ESM2 embeddings. I then ran the command (pst graphify --file ./test_output/esm2_t6_8M_results.h5 --fasta-file ./test.faa --output ./test_graph.h5) to convert the ESM2 embeddings into a graph structure. The graph conversion failed with this error:
###################################################
ValueError: FASTA file headers must be in prodigal format: '>scaffold_ptn#' with the additional metadata separated by ' # '
####################################################
The problem seems to be with the file passed to --fasta-file (./test.faa). The README and the -h output say that the headers should conform to the prodigal format (>scaffold_ptn#). I tried changing the header lines (the lines starting with >) in test.faa to that format (>SAMEA.110_1 or >SAMEA_1), but the same error still occurred. How can I solve this problem? I also noticed that you did not provide sample files covering the complete process. If I have misunderstood something, could you describe the prodigal format in detail, or provide all the sample files required for the complete process? Thank you.
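For reference, here is a minimal sketch of how I understand the header format, based only on the error message and on typical Prodigal output, where the protein name is followed by fields separated by ' # ' (start, end, strand, attributes). The function name `parse_prodigal_header` and the regular expression are my own assumptions, not the check that pst actually performs. The headers I tried have no ' # ' fields at all, which may or may not be the cause of the error.

```python
import re

# Assumed name pattern: scaffold name, underscore, protein number (e.g. SAMEA.110_1).
# This is a guess from the error message, not pst's actual validation code.
NAME_PATTERN = re.compile(r"(?P<scaffold>\S+)_(?P<ptn>\d+)")


def parse_prodigal_header(line):
    """Split a FASTA header into scaffold, protein number, and ' # ' metadata fields.

    Returns None if the line does not look like '>scaffold_ptn#'.
    """
    if not line.startswith(">"):
        return None
    # Prodigal separates the name from its metadata with ' # '.
    fields = line[1:].rstrip("\n").split(" # ")
    match = NAME_PATTERN.fullmatch(fields[0].strip())
    if match is None:
        return None
    return {
        "scaffold": match["scaffold"],
        "ptn": int(match["ptn"]),
        "metadata": fields[1:],  # empty list when no ' # ' fields are present
    }


if __name__ == "__main__":
    # Hypothetical headers: the first mimics Prodigal output, the second is what I tried.
    for header in (">SAMEA.110_1 # 2 # 181 # 1 # ID=1_1", ">SAMEA_1"):
        print(header, "->", parse_prodigal_header(header))
```

If the missing ' # ' fields are the issue, a header like the first one in the usage example would be the shape to aim for, but I would appreciate confirmation of what pst expects.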