/data3/debaryomyces/debaryomyces_merce/
Contents (Folders that start with a number come from Mahesha):
- 00_ReferenceGenome
  - Reference genome CBS767
  - Sequence of plasmid pDH1A
- 03_Assembly
  - One folder per strain with the assemblies done.
- 04_Validation
  - One folder per strain with the validation of the assemblies.
- 05_Final_Assembly
  - Fasta file of the final assembly for each strain and the stats file.
- RNAseq_data
  - 5 RNAseq fastq files.
- data_references
  - Reference1: CBS767
  - Reference2: MTCC234
- Work_files: workspace of Merce Montoliu Nerin, with README files explaining the contents and results of each folder.
- Mapping of raw reads to the Debaryomyces hansenii reference strain CBS767; results can be found here.
- Mapping of raw reads to the Saccharomyces cerevisiae reference strain S288c; results can be found here.
- Mapping of raw reads to species known to have been misidentified previously, such as Meyerozyma guilliermondii; results can be found here.
First meeting - 15th of September, first planned workflow.
- Used on raw reads, see here
- blastn and blastx tests here.
Second meeting - 2nd of November, summary of results.
- All information here, part left aside for now.
Third meeting - 8th of December, results, more summarized results.
Not possible: it needs a GI number to work, and that number needs to be included in the taxonomy.
- Bowtie2 used. Results can be found here
- Bowtie2 used. Results can be found here
- Process and results here
- Only for 1006 and 1012; can be found here.
- Only commands and config files; results are stored on sparc1:
/data3/debaryomyces/debaryomyces_merce/Work_files/merce_assemblies_workfolder
- To check if it is worth continuing to try to improve the assembly.
- Can be found here. Not many differences between before and after removing duplicates.
- From Mahesh. Found here.
- Table of coverages here. Some have very poor coverage, which makes it impossible to improve the assembly. Those should be sequenced again.
- Done with blastn and parsing the resulting table; results here.
- Check the results by comparing them with other strains known to be Debaryomyces hansenii (in the same folder).
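The species check above (blastn followed by parsing the resulting table) could be sketched roughly as follows. This is a minimal illustration and not the script used in the project: it assumes blastn was run with the standard 12-column tabular output (`-outfmt 6`), which the notes do not state, and the query names, subject names, and hit values in `example` are invented for the demonstration.

```python
def best_hits(tabular_text):
    """Keep the highest-bitscore hit per query from BLAST tabular output.

    Assumes the standard 12 tab-separated columns of -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    Returns {query_id: (subject_id, percent_identity, bitscore)}.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        pident, bitscore = float(cols[2]), float(cols[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best


# Invented example rows (hypothetical names and values, for illustration only).
example = (
    "strainA_seq\tDhansenii_CBS767\t99.8\t500\t1\t0\t1\t500\t1\t500\t0.0\t920\n"
    "strainA_seq\tMguilliermondii\t91.2\t480\t40\t2\t1\t480\t1\t478\t1e-150\t610\n"
    "strainB_seq\tMguilliermondii\t99.5\t495\t2\t0\t1\t495\t1\t495\t0.0\t905\n"
)

for query, (subject, pident, bitscore) in sorted(best_hits(example).items()):
    print(query, subject, pident, bitscore)
```

Ranking by bitscore is only one possible choice; ranking by e-value or percent identity would need a one-line change in the comparison.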
Fourth meeting - 22nd of January, report.
- Conclusions for next steps: SNP calling, phylogeny.
- Alignment - Already done in step 2 with Bowtie2. Usage was found here.
- Remove PCR duplicates - Usage here.
- Add read groups - Usage here.
- Local realignment/BAQ - Usage here.
- Usage here.
- Problems and pipeline here.
- Usage here.
- Because of the publication of its genome.
- Results here.
- Results freebayes here.
- Initial information about control files here.
- First tests on CBS767. Control files here.
- Analysis and results of the D. hansenii strains here.
- Different strains still running.
- All the info can be found here.
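As a rough aid for inspecting the freebayes results mentioned above, the sketch below tallies the records of a VCF file into SNPs, other variants, and low-quality calls. It is not part of the project pipeline: the `min_qual` threshold of 20 is a hypothetical value chosen for illustration, the classification is deliberately simple (a record counts as a SNP only when REF and every ALT allele are one base long), and the records in `example_vcf` are invented.

```python
def count_variants(vcf_text, min_qual=20.0):
    """Tally VCF records as 'snp', 'other', or 'low_qual'.

    min_qual is a hypothetical cutoff; records with a missing QUAL (".")
    are counted as low quality.
    """
    counts = {"snp": 0, "other": 0, "low_qual": 0}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.split("\t")
        ref, alts, qual = fields[3], fields[4].split(","), fields[5]
        if qual == "." or float(qual) < min_qual:
            counts["low_qual"] += 1
        elif len(ref) == 1 and all(len(alt) == 1 for alt in alts):
            counts["snp"] += 1
        else:
            counts["other"] += 1
    return counts


# Invented records (hypothetical contig name and positions).
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chrA\t100\t.\tA\tG\t50.0\t.\t.\n"
    "chrA\t200\t.\tAT\tA\t35.5\t.\t.\n"
    "chrA\t300\t.\tC\tT\t3.2\t.\t.\n"
)

print(count_variants(example_vcf))
```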
Fifth meeting - 26th of February, pdf.
- Start from scratch again to merge BAM files.
- Pipeline: Pre_vcalling, vcalling, Post_vcalling.
- Complete failure of the Variant calling 2 pipeline: it never reached the vcalling step. Error in freebayes due to the read groups (RG), even after using AddOrReplaceReadGroups, even twice! The problem might be in the merging. New attempt using the flag -c.
- Pipeline: Pre-vcalling3, vcalling.
- It was a space!!!! D: D:
- Pipeline: PCRduplicates removal, Merge fastq, Bowtie2, Pre-vcalling and vcalling, Post-vcalling, Filtering and tests.
- All the strains except for S. cerevisiae.
- Can be found here.
https://github.com/The-Bioinformatics-Group/Debaryomyces_hansenii/tree/master/Project_pipeline
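The read-group failure recorded under the fifth meeting turned out to be caused by a stray space. The notes do not say where that space was, so the following is only a sketch of one way to catch this class of problem early, assuming it sat in an `@RG` header line of a SAM/BAM file. It takes the header as plain text (for example, as printed by `samtools view -H`) and reports `@RG` fields whose value contains a space, as well as `@RG` lines that are not tab-separated at all. The sample names in `example_header` are hypothetical.

```python
def rg_fields_with_spaces(sam_header_text):
    """Return (tag, value) pairs from @RG lines that contain a space.

    An @RG line whose fields are not tab-separated is reported whole,
    as ("@RG", line).
    """
    problems = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@RG"):
            continue  # only read-group lines are checked
        fields = line.split("\t")
        if fields[0] != "@RG":
            # Spaces were probably used instead of tabs as separators.
            problems.append(("@RG", line))
            continue
        for field in fields[1:]:
            tag, _, value = field.partition(":")
            if " " in tag or " " in value:
                problems.append((tag, value))
    return problems


# Hypothetical header: the first read group has a space in its sample name.
example_header = (
    "@HD\tVN:1.5\tSO:coordinate\n"
    "@RG\tID:1006\tSM:strain 1006\tPL:ILLUMINA\n"
    "@RG\tID:1012\tSM:strain1012\tPL:ILLUMINA\n"
)

print(rg_fields_with_spaces(example_header))
```

Spaces are legal in some header fields (free-text descriptions, for instance), so a hit from this check is a prompt to look closer, not proof of an error.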
Extract sequences: https://github.com/The-Bioinformatics-Group/Debaryomyces_hansenii/tree/master/Work_files/Only_Debaryomyceshansenii/Phylogeny/genes/BLAST