Guidance on abundance.tsv file and output #109
Comments
Hm, that is strange, are you sure that these values fit exactly?
Hi, and thank you for your quick reply. Your calculations are also what I expected to happen. When I use grep -c on my readmappings.tsv file, I find that it has a total of 1333338 lines, with the counts of lines and percentage: So while not exact, my readmappings file fits the abundances I put in the abundance.tsv file more closely than what I would expect from making these calculations myself. Is there a better way to check the fraction of reads simulated besides grep -c on the readmappings.tsv?
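One possible alternative to running grep -c once per organism is to tally all genomes in a single pass. The sketch below is only an illustration: the function name `read_fractions`, the header line, the genome IDs, and the assumption that the genome ID sits in the second tab-separated column are all hypothetical, so check them against your own readmappings.tsv before relying on it.

```python
from collections import Counter


def read_fractions(lines, genome_column=1):
    """Return the fraction of reads per genome from tab-separated mapping lines.

    genome_column is the 0-based index of the genome ID column; the default
    of 1 is an assumption about the file layout, not a documented guarantee.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        counts[fields[genome_column]] += 1
    total = sum(counts.values())
    return {genome: n / total for genome, n in counts.items()}


# Tiny in-memory example with made-up read and genome IDs.
sample = [
    "#anonymous_read_id\tgenome_id\ttax_id\tread_id",
    "S0R0\tgenome_A\t1\tread-0",
    "S0R1\tgenome_A\t1\tread-1",
    "S0R2\tgenome_B\t2\tread-2",
    "S0R3\tgenome_C\t3\tread-3",
]
for genome, fraction in sorted(read_fractions(sample).items()):
    print(f"{genome}\t{fraction:.2f}")
```

For a real file, the same function can be given an open file handle, for example `with open("readmappings.tsv") as handle: read_fractions(handle)`.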
I think
wgsim, so error-free simulation. Could this cause the behaviour?
I will have to look into it, but I have the strong suspicion that this is a bug in CAMISIM when using
sure!
Hi again, I reran the simulation with readsim=tools/art_illumina-2.3.6/art_illumina. This resulted in:
The expected row is what I (and you :) ) expected my distribution to look like. It is a very good match when using art!
Okay, that is good to hear. It is still a bug/inconsistency for
Dear authors,
I remember asking this question before, but looking back, I still have some remaining doubts that I was hoping you could help me with.
This is a summary of each organism, its abundance in abundance.tsv, and its genome size:
Organism | Abundance | Genome size
Salmonella bongori | 0.10 | 4487548
Salmonella enterica | 0.4 | 5028552
Escherichia coli | 0.5 | 4643559
When I use CAMISIM to simulate my sample, I run grep -c on the readmappings.tsv output file to check my abundance.
In this output file, exactly 10% of the reads belong to S. bongori, 40% to S. enterica, and 50% to E. coli.
Is this expected behaviour? I read that the simulation is supposed to take genome length into consideration, so I expected the number of reads belonging to S. enterica to be inflated, since its genome is bigger. Why is this not the case?
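To make the expectation concrete, here is a minimal sketch of the calculation behind the question. It assumes that each abundance is a relative genome copy number, so that the expected share of reads is proportional to abundance times genome size; that interpretation is an assumption for illustration, not a statement of how CAMISIM works internally. The function name `expected_read_fractions` is hypothetical.

```python
def expected_read_fractions(abundance, genome_size):
    """Expected read share per genome, assuming reads are proportional to
    abundance * genome size (an assumption made for this illustration)."""
    weights = {name: abundance[name] * genome_size[name] for name in abundance}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Values taken from the summary in this post.
abundance = {
    "Salmonella bongori": 0.10,
    "Salmonella enterica": 0.4,
    "Escherichia coli": 0.5,
}
genome_size = {
    "Salmonella bongori": 4487548,
    "Salmonella enterica": 5028552,
    "Escherichia coli": 4643559,
}

fractions = expected_read_fractions(abundance, genome_size)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.4f}")
```

Under that assumption the read shares come out at roughly 9.4% for S. bongori, 42.1% for S. enterica, and 48.6% for E. coli, rather than the 10/40/50 split seen in readmappings.tsv.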