What does readID represent in the output simulation data? #40

Open
xujialupaoli opened this issue Jun 23, 2024 · 0 comments


xujialupaoli commented Jun 23, 2024

Thank you for providing such useful software. I used sim3C to simulate Hi-C data for my E. coli genome,
using the following command:

sim3C --profile mycom.txt -n 5000000 -l 150 -e DpnII -m hic /home/work/jialu/tetraploid_assembly/simulate_data/strain_fq/ref_genome/hap4.fa hap4_R1.fq hap4_R2.fq

The content of mycom.txt is as follows:

(screenshot of mycom.txt; image not reproduced here)

Here is the readID of the simulated data output:


$ cat  /home/work//simulate_data/strain_fq/hic/hap1/hap1_R2.fq |grep "^@" |head -n 7
@SIM3C:3C:1:1:1:1 2:Y:18:1 HIC hap1:2220392 hap1:4072595
@SIM3C:WGS:1:1:1:2 2:Y:18:1 WGS hap1:3356458..3356840:R
@SIM3C:3C:1:1:1:3 2:Y:18:1 HIC hap1:3940006 hap1:3456994
@SIM3C:3C:1:1:1:4 2:Y:18:1 HIC hap1:22115 hap1:195624
@SIM3C:WGS:1:1:1:5 2:Y:18:1 WGS hap1:4349782..4350231:F
@SIM3C:WGS:1:1:1:6 2:Y:18:1 WGS hap1:3599187..3599569:R
@SIM3C:3C:1:1:1:7 2:Y:18:1 HIC hap1:622455 hap1:4763592


$ cat  /home/work/simulate_data/strain_fq/hic/hap4/hap4_R2.fq |grep "^@" |head -n 7
@SIM3C:WGS:1:1:1:1 2:Y:18:1 WGS hap4:3272044..3272421:F
@SIM3C:WGS:1:1:1:2 2:Y:18:1 WGS hap4:4037903..4038223:F
@SIM3C:WGS:1:1:1:3 2:Y:18:1 WGS hap4:555578..556018:F
@SIM3C:WGS:1:1:1:4 2:Y:18:1 WGS hap4:88266..88635:F
@SIM3C:3C:1:1:1:5 2:Y:18:1 HIC hap4:3929062 hap4:3222724
@SIM3C:3C:1:1:1:6 2:Y:18:1 HIC hap4:2390316 hap4:397284
@SIM3C:3C:1:1:1:7 2:Y:18:1 HIC hap4:1763726 hap4:706931


I don't understand what the read ID naming in the output means. Why do some reads start with "@SIM3C:3C" and others with "@SIM3C:WGS:"? Is there any difference between these reads? And does that difference explain why the trailing fields have different formats, e.g. "WGS hap1:3356458..3356840:R" versus "HIC hap1:22115 hap1:195624"?
Looking forward to your reply!
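
To make the question concrete, here is a minimal Python sketch that splits the two header layouts shown above into fields. It is based only on the sample lines in this post, not on sim3C's documentation, so the field names (`serial`, `pos_a`, `pos_b`, `strand`) and the function `parse_header` are my own hypothetical labels, and reading F/R as a strand flag is an assumption.

```python
import re
from typing import NamedTuple, Optional


class ParsedHeader(NamedTuple):
    kind: str               # "HIC" or "WGS", taken from the third field
    serial: int             # last number of the first field (looks like a counter)
    seq_a: str              # first sequence name
    pos_a: int              # first coordinate (HIC) or range start (WGS)
    seq_b: str              # second sequence name (same as seq_a for WGS)
    pos_b: int              # second coordinate (HIC) or range end (WGS)
    strand: Optional[str]   # "F"/"R" for WGS lines; assumed to be a strand flag


# Layouts inferred from the sample output only (an assumption, not a spec).
HIC_RE = re.compile(r"^@SIM3C:3C:\S+:(\d+) \S+ HIC (\S+):(\d+) (\S+):(\d+)$")
WGS_RE = re.compile(r"^@SIM3C:WGS:\S+:(\d+) \S+ WGS (\S+):(\d+)\.\.(\d+):([FR])$")


def parse_header(line: str) -> Optional[ParsedHeader]:
    """Split one FASTQ header line into fields; return None if it matches neither layout."""
    line = line.strip()
    m = HIC_RE.match(line)
    if m:
        serial, seq_a, pos_a, seq_b, pos_b = m.groups()
        return ParsedHeader("HIC", int(serial), seq_a, int(pos_a), seq_b, int(pos_b), None)
    m = WGS_RE.match(line)
    if m:
        serial, seq, start, end, strand = m.groups()
        return ParsedHeader("WGS", int(serial), seq, int(start), seq, int(end), strand)
    return None


if __name__ == "__main__":
    for header in (
        "@SIM3C:3C:1:1:1:4 2:Y:18:1 HIC hap1:22115 hap1:195624",
        "@SIM3C:WGS:1:1:1:2 2:Y:18:1 WGS hap1:3356458..3356840:R",
    ):
        print(parse_header(header))
```

Parsed this way, the HIC lines carry two separate sequence:position pairs, while the WGS lines carry one sequence with a start..end range and an F/R flag. What I would like confirmed is what each of these fields actually represents.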
