FMTree: error when running search #2

Open
Pennyroyal-Tea opened this issue Mar 10, 2023 · 4 comments

Comments

@Pennyroyal-Tea

[Screenshot of the error]

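Dr. Cheng, following your paper, I preprocessed dna.200MB and then built the index with sampling distance D=2. The patterns were 10 random substrings of length 5 extracted from the same text. When I ran the search, I got the error shown in the screenshot. What is causing this?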

@chhylp123
Owner

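Sorry, the code is fairly old, so I may be misremembering some details. The main problem seems to be that index.occ could not be found, which means the index itself was not found. You need to build the index first and only then run queries. So did you first enter 1 to build the index, and then enter 2 to search?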

@Pennyroyal-Tea
Author

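[Screenshot]
The steps were run in the right order, but the search still fails. It looks like a pointer refers to an invalid address; I can't tell which part of the code is causing it.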

@chhylp123
Owner

Could you please post the complete commands you ran, along with the input files? It may take me some time to look into this. Thanks.

@Pennyroyal-Tea
Author

Pennyroyal-Tea commented Mar 16, 2023

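The small dataset is dna.200MB (http://pizzachili.dcc.uchile.cl/texts/dna/dna.200MB.gz). I first preprocessed it with ./preprocess --index dna.200MB, which produced dna.200MB.not_N, and then ran ./FMtree.

Step 1, build the index: input file name dna.200MB.not_N, sampling distance D=2. The index was built successfully and produced 6 files, including dna.200MB.not_N.index.occ and dna.200MB.not_B.bwt.

Step 2, make patterns: the input file was dna.200MB.not_N, from which 10 random substrings of length 5 were generated and written to patterns.txt. This step also succeeded.

Step 3, search: the program asks to "input the prefix of index name". I entered dna.200MB.not_N and got the error "Failed to open .index.occ!". I suspected the prefix was the problem, so I repeated all of the steps with simpler names such as dna200 and ref200, but got the same error.

The large dataset is human.fasta (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz). Following the same steps and entering human.fasta at the search step, I got "2902918 segmentation fault ./FMtree", which may point to a memory bug (possibly a leak) somewhere in the code. FMtree, Original_s, and Original_v all show the problems described above.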

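My research is also based on the FM-index. I also noticed that FMAlign, published by what may be your junior labmate at USTC, is built on FMtree as well, so being able to reproduce FMtree successfully matters a lot to me. I hope you can help.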
