purpose of the -group and -target flags #4

Open
derekcg opened this issue Feb 2, 2023 · 1 comment

@derekcg
derekcg commented Feb 2, 2023

Hello Dr. Qian Feng,

I'm investigating recombination in a large gene family, and I think detREC is a good approach, but I'm still struggling to implement the parallelization pipeline that you and Dr. Gerry Tonkin-Hill used in your two papers. Could you explain how the -group and -target flags work in mosaic? I can't find any documentation on them. Specifically, I'm trying to understand the "-group 2 db target -target target" part of the mosaic command used both in 1nd_mosaic_est_par.sh and in Gerry's supplemental scripts. When I include it in a mosaic command, I get the error "Could not assign sequence to group". The command seems to run without that part, but I don't want to drop it without understanding what it does.

Also, to replicate Dr. Gerry Tonkin-Hill's pipeline for estimating the recombination rate parameter, I'll ultimately want to perform some 1000-vs-all runs like he did, where one set of sequences (a subset of 1000 sequences) is aligned to another set (all of the sequences). However, I don't know how to get mosaic to use two sets of sequences like that. I suspect it involves the -group and -target flags, but that's just a guess.

I'd appreciate any insight you can offer.

Best,
Derek Conkle-Gutierrez

@qianfeng2
Owner

Hi Derek,

Very sorry for the late reply. I didn't notice this comment until today.

We ran mosaic by first defining target and source sequences. The target sequences are searched against all source sequences to obtain their mosaic representations, so there are two groups of sequences (target and source). All target sequence names must start with the prefix "target_", and all source sequence names must start with the prefix "db_". See the examples in https://github.com/qianfeng2/Ghana_data_analysis/blob/main/mosaic_processed_data/results_final_alignment/Protein_translateable_pilot_upper_centroids_run100.fasta_align.txt. You have to follow this naming rule to run the mosaic program (the requirement comes from its source scripts). The 1000-vs-all runs use the same rule.
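As a rough illustration of the naming rule, here is a minimal Python sketch (not part of the original scripts) that writes a combined FASTA with "target_" and "db_" header prefixes and flags any header carrying neither prefix. The function names (`label_records`, `parse_fasta`, `unlabeled_headers`, `subset_vs_all`) are hypothetical. Two further assumptions: that a header without either prefix is what triggers "Could not assign sequence to group" (our reading of the "-group 2 db target -target target" flags, not confirmed here), and that in a 1000-vs-all run every sequence, including the sampled targets, is also written as a "db_" source. Check both against 1nd_mosaic_est_par.sh before relying on them.

```python
import random


def label_records(targets, sources):
    """Return FASTA text with 'target_' / 'db_' header prefixes.

    targets, sources: lists of (name, sequence) tuples.
    """
    lines = []
    for prefix, records in (("target_", targets), ("db_", sources)):
        for name, seq in records:
            lines.append(f">{prefix}{name}")
            lines.append(seq)
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def unlabeled_headers(text):
    """Headers with neither prefix (assumed cause of the group-assignment error)."""
    return [name for name, _ in parse_fasta(text)
            if not name.startswith(("target_", "db_"))]


def subset_vs_all(records, n, seed=0):
    """Hypothetical n-vs-all setup: n random targets searched against all records."""
    rng = random.Random(seed)
    targets = rng.sample(records, min(n, len(records)))
    return label_records(targets, records)
```

For example, with `records = [("seqA", "MKV"), ("seqB", "MKI"), ("seqC", "MRV")]`, `label_records(records[:1], records[1:])` returns `">target_seqA\nMKV\n>db_seqB\nMKI\n>db_seqC\nMRV\n"`, and `unlabeled_headers` on that text returns an empty list. For a 1000-vs-all run, `subset_vs_all(all_records, 1000)` would be the analogous call under the assumptions above.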

Please let me know if you run into any problems.

Kind regards,
Qian
