The-Bioinformatics-Group/Debaryomyces_hansenii

Debaryomyces hansenii project

Organization of folders (sparc1):

/data3/debaryomyces/debaryomyces_merce/

Contents (Folders that start with a number come from Mahesha):

  • 00_ReferenceGenome
    • Reference genome CBS767
    • Sequence of a plasmid pDH1A
  • 03_Assembly
    • Folder for each strain with the assemblies done.
  • 04_Validation
    • Folder for each strain with the validation of the assemblies.
  • 05_Final_Assembly
    • Fasta file of the final assembly for each strain and the stats file.
  • RNAseq_data
    • 5 RNAseq fastq sequences.
  • data_references
    • Reference1: CBS767
    • Reference2: MTCC234
  • Work_files: workspace for Merce Montoliu Nerin. README files in each folder explain its contents and results.

Workflow

1. Information about strains

Alternative names

Origin of strains

2. Bowtie2
  • Mapping of raw reads to the Debaryomyces hansenii reference strain CBS767; results can be found here.

  • Mapping of raw reads to the Saccharomyces cerevisiae reference strain S288c; results can be found here.

  • Mapping of raw reads to species that have previously been mistaken for D. hansenii, such as Meyerozyma guilliermondii; results can be found here.
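The mapping runs above can be sketched as a small command builder. This is only an illustration: it assumes paired-end reads and an index already built with bowtie2-build, and the file names and index prefixes are hypothetical, not the ones used in this project.

```python
import shlex

def bowtie2_command(index_prefix, reads_1, reads_2, sam_out, threads=4):
    """Build a paired-end Bowtie2 command as an argument list."""
    return [
        "bowtie2",
        "-p", str(threads),   # number of alignment threads
        "-x", index_prefix,   # index prefix created with bowtie2-build
        "-1", reads_1,        # forward reads
        "-2", reads_2,        # reverse reads
        "-S", sam_out,        # SAM output file
    ]

# Hypothetical index prefixes and read files, for illustration only.
references = {"CBS767": "index/CBS767", "S288c": "index/S288c"}
for name, index in references.items():
    cmd = bowtie2_command(index, "strain_R1.fastq", "strain_R2.fastq",
                          f"strain_vs_{name}.sam")
    print(shlex.join(cmd))
```

Building the command as a list keeps it safe to pass to subprocess.run without shell quoting problems; the loop only prints the commands and does not run Bowtie2.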

First meeting - 15th of September, first planned workflow.

3. Information on how kraken works.
  • Manual here

  • Article for more specific information here.

4. Kraken standard database to check contamination
5. VarScan first tests
  • Used on raw reads, see here
6. BLAST tests
  • blastn and blastx tests here.

Second meeting - 2nd of November, summary of results.

7. Kraken custom database
  • Raw reads here

  • Mahesh assemblies here.

8. ITS regions study
  • All information here; this part is set aside for now.

Third meeting - 8th of December, results, more summarized results.

9. Include hybrid strains in Kraken database

Not possible: Kraken needs a GI number for each sequence, so the hybrid strains would have to be included in the taxonomy.
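A quick way to see which sequences would be rejected is to scan the FASTA headers for a GI number. The sketch below assumes the legacy NCBI header style `>gi|<number>|...`; the function name and the example headers are hypothetical.

```python
import re

# Assumption: legacy NCBI-style headers such as ">gi|12345|ref|...".
GI_PATTERN = re.compile(r"^>gi\|(\d+)\|")

def headers_without_gi(fasta_lines):
    """Return FASTA header lines that do not start with a 'gi|<number>|' tag."""
    return [
        line.rstrip("\n")
        for line in fasta_lines
        if line.startswith(">") and not GI_PATTERN.match(line)
    ]

# Hypothetical records: one database-style header, one assembly scaffold.
example = [">gi|12345|ref|XX_000001.1| chromosome A", "ACGT",
           ">scaffold_1", "GGCC"]
print(headers_without_gi(example))
```

Locally assembled scaffolds, like the second record, typically carry no GI number, which matches the problem described above.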

10. Map raw reads to its own assembly to check assemblies
  • Bowtie2 used. Results can be found here
11. Map the weird strains against each other to check how close they are.
  • Bowtie2 used. Results can be found here
12. PreQC on 1006 and 1012 to prepare for an improvement of the assembly.
  • Process and results here
13. SOAPdenovo assembly
  • Only for 1006 and 1012; can be found here.

  • The repository holds only the commands and config files; results are stored on sparc1: /data3/debaryomyces/debaryomyces_merce/Work_files/merce_assemblies_workfolder

14. Coverage before and after removing duplicates of 1006 and 1012
  • To check whether it is worth continuing to try to improve the assembly.

  • Can be found here. Not many differences between before and after removing duplicates.
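The before/after comparison can be illustrated with a toy model. This is a simplified sketch, not the tool used in the project: alignments are hypothetical tuples of (contig, start, strand, aligned length), and a duplicate is any alignment sharing contig, start and strand with an earlier one.

```python
def mean_coverage(alignments, genome_length):
    """Mean depth: total aligned bases divided by genome length."""
    return sum(length for _, _, _, length in alignments) / genome_length

def remove_duplicates(alignments):
    """Keep the first alignment per (contig, start, strand) key."""
    seen = set()
    kept = []
    for aln in alignments:
        key = aln[:3]
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept

# Toy data: the second alignment duplicates the first.
alignments = [("c1", 10, "+", 100), ("c1", 10, "+", 100), ("c1", 50, "-", 100)]
before = mean_coverage(alignments, genome_length=1000)
after = mean_coverage(remove_duplicates(alignments), genome_length=1000)
print(f"before: {before:.2f}x, after: {after:.2f}x")
```

A small gap between the two values, as reported above, suggests that duplicates are not what limits the assembly.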

15. Debaryomyces Quality Assessment Report
  • From Mahesh. Found here.
16. Coverage of all the raw data of all the strains.
  • Table of coverages here. Some strains have very low coverage, which makes it impossible to improve their assemblies. Those should be sequenced again.
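Raw-read coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The sketch below follows that definition; the function names, the toy FASTQ records and the genome size are hypothetical and are not values from this project.

```python
def fastq_total_bases(lines):
    """Sum sequence lengths in a FASTQ stream (line 2 of each 4-line record)."""
    return sum(len(line.strip()) for i, line in enumerate(lines) if i % 4 == 1)

def estimated_coverage(total_bases, genome_size):
    """Expected mean depth if all bases mapped uniformly to the genome."""
    return total_bases / genome_size

# Two toy reads against a toy 5 bp genome.
fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGTAC", "+", "IIIIII"]
bases = fastq_total_bases(fastq)
print(bases, estimated_coverage(bases, genome_size=5))
```

For real data, the lines would come from an opened FASTQ file, and the genome size from the reference or the assembly stats file.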
17. Compare weird strains with CBS767 to check regions in common
  • Done with blastn and by parsing the resulting table; results here.

  • Check the results by comparing them with other strains known to be Debaryomyces hansenii (in the same folder).
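One way to parse such a table is sketched below. It assumes the blastn output is in the standard 12-column tabular format (-outfmt 6) and reports, per reference sequence, how many bases are covered by hits above an identity threshold. The threshold and the function name are hypothetical, and this is not necessarily how the project's parsing was done.

```python
from collections import defaultdict

def covered_bases(blast_lines, min_identity=90.0):
    """Sum non-overlapping subject bases covered by hits at or above min_identity."""
    intervals = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        sseqid, pident = fields[1], float(fields[2])
        sstart, send = int(fields[8]), int(fields[9])
        if pident >= min_identity:
            # On the minus strand sstart > send, so order the pair.
            intervals[sseqid].append((min(sstart, send), max(sstart, send)))
    totals = {}
    for sseqid, spans in intervals.items():
        spans.sort()
        total = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:      # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
            else:                         # gap: close the current interval
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        totals[sseqid] = total
    return totals
```

Dividing each total by the length of the corresponding reference sequence gives the fraction of CBS767 shared with a strain, which can then be compared with the values obtained for known D. hansenii strains.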

Fourth meeting - 22nd of January, report.

  • Conclusions for next steps: SNP calling, phylogeny.
18. Pre-Variant calling
  • Alignment - Already done in step 2 with Bowtie2. Usage was found here.

  • Remove PCR duplicates - Usage here

  • Add read groups - Usage here.

  • Local realignment/BAQ - Usage here.

19. Variant calling - Freebayes (cohort and individual)
20. Post-freebayes
  • Problems and pipeline here.
21. VarScan
22. Bowtie2 mapping against Debaryomyces fabryi
23. Post-varscan
24. Variant calling in numbers
  • Results freebayes here.
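Counting variants from a VCF file can be sketched as follows. The classification rule is a simplification chosen for illustration (single-base REF and ALT alleles count as SNPs, any length difference counts as an indel), and the function name is hypothetical.

```python
def count_variants(vcf_lines):
    """Count SNPs, indels and other records in VCF text lines."""
    counts = {"snp": 0, "indel": 0, "other": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        if all(len(ref) == 1 and len(alt) == 1 for alt in alts):
            counts["snp"] += 1
        elif any(len(alt) != len(ref) for alt in alts):
            counts["indel"] += 1
        else:
            counts["other"] += 1  # e.g. multi-nucleotide substitutions
    return counts
```

Running it on each strain's VCF gives per-strain totals that can be tabulated side by side.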
25. Annotation using Maker
  • Initial information about control files here

  • First tests on CBS767. Control files here

  • Analysis and results of the D. hansenii strains here

  • Different strains still running.

26. Repeat post-Variant calling freebayes - without filtering
  • All the info can be found here.

Fifth meeting - 26th of February, pdf.

27. Variant calling 2.
28. Variant calling 3.
  • Complete failure of the Variant calling 2 pipeline: it never reached the variant calling step. Freebayes raised an error due to the read groups (RG), even after AddOrReplaceReadGroups had been run twice. The problem might be in the merging step. New pipeline using the -c flag.

  • Pipeline: Pre-vcalling3, vcalling
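Read-group problems of this kind can often be spotted in the BAM/SAM header before running the variant caller. The sketch below checks @RG header lines for missing IDs, duplicate IDs and missing SM (sample) tags. It is a hypothetical helper, not part of the pipeline above, and it assumes the header lines have already been extracted as text.

```python
def read_groups(sam_header_lines):
    """Parse @RG header lines into dictionaries of tag -> value."""
    groups = []
    for line in sam_header_lines:
        if line.startswith("@RG"):
            fields = line.rstrip("\n").split("\t")[1:]
            groups.append(dict(field.split(":", 1) for field in fields))
    return groups

def read_group_problems(sam_header_lines):
    """List read groups that lack an ID, reuse an ID, or lack an SM tag."""
    problems = []
    seen_ids = set()
    for tags in read_groups(sam_header_lines):
        rg_id = tags.get("ID")
        if rg_id is None:
            problems.append("read group without ID")
            continue
        if rg_id in seen_ids:
            problems.append(f"duplicate read group ID: {rg_id}")
        seen_ids.add(rg_id)
        if "SM" not in tags:
            problems.append(f"read group {rg_id} has no SM (sample) tag")
    return problems
```

An empty list means the header passes these three checks; it does not guarantee that every read is actually tagged with one of the declared read groups.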

29. Variant calling 4.

Sixth meeting - 23rd of March

30. Phylogeny
31. Map Reads to Assemblies in pairs
  • All the strains except for S. cerevisiae

  • Can be found here.
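The list of pairwise mappings can be generated as below. This sketch assumes ordered pairs (reads of one strain mapped to the assembly of another) and uses a hypothetical strain list; only 1006 and 1012 are names taken from this document.

```python
from itertools import permutations

def mapping_pairs(strains, excluded=("S. cerevisiae",)):
    """Return (reads_strain, assembly_strain) pairs, skipping excluded strains."""
    kept = [s for s in strains if s not in excluded]
    return list(permutations(kept, 2))

# Hypothetical strain list, for illustration only.
for reads, assembly in mapping_pairs(["1006", "1012", "S. cerevisiae"]):
    print(f"map reads of {reads} to assembly of {assembly}")
```

If each pair only needs to be mapped in one direction, itertools.combinations can replace permutations.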

32. Pipelines added to GitHub

https://github.com/The-Bioinformatics-Group/Debaryomyces_hansenii/tree/master/Project_pipeline

33. CD-Hit

https://github.com/The-Bioinformatics-Group/Debaryomyces_hansenii/tree/master/Work_files/Only_Debaryomyceshansenii/Phylogeny/cd-hit

34. GPD1 and ACT1 phylogenies

Extract sequences: https://github.com/The-Bioinformatics-Group/Debaryomyces_hansenii/tree/master/Work_files/Only_Debaryomyceshansenii/Phylogeny/genes/BLAST

ACT1 Phylogeny: https://github.com/The-Bioinformatics-Group/Debaryomyces_hansenii/tree/master/Work_files/Only_Debaryomyceshansenii/Phylogeny/genes/phylogeny_act1

GPD1 Phylogeny: https://github.com/The-Bioinformatics-Group/Debaryomyces_hansenii/tree/master/Work_files/Only_Debaryomyceshansenii/Phylogeny/genes/phylogeny_gpd1
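Extracting a gene sequence from an assembly, given BLAST hit coordinates, can be sketched as follows. It assumes 1-based inclusive subject coordinates as reported by BLAST, with sstart greater than send for hits on the minus strand. The function name is hypothetical and this is not necessarily the method used in the linked folders.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def extract_hit(sequence, sstart, send):
    """Return the hit region; reverse-complement it when sstart > send."""
    if sstart <= send:
        return sequence[sstart - 1:send]
    return sequence[send - 1:sstart].translate(COMPLEMENT)[::-1]

# Toy contig: a plus-strand hit and a minus-strand hit.
contig = "AACCGGTT"
print(extract_hit(contig, 3, 6))
print(extract_hit(contig, 4, 1))
```

Returning minus-strand hits as their reverse complement keeps all extracted ACT1 or GPD1 sequences in the same orientation before alignment.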

Master's thesis final report

https://github.com/The-Bioinformatics-Group/Debaryomyces_hansenii/blob/master/Mastersthesis_mercemontoliunerin.pdf
