3C-based methods, such as Hi-C, produce huge amounts of raw data in the form of paired DNA reads from genomic regions that are spatially close in the cell nucleus. Once converted into interaction matrices, these data have been used to study how the genome folds within the nucleus, one of the most fascinating problems in modern biology. Rigorous computational analysis of the paired reads has been essential to fully exploit the experimental technique and to study how the genome is folded in space. The wealth of data on genome structure is currently expanding rapidly, with many Hi-C datasets available at resolutions down to 1 kb (see, for example, http://hic.umassmed.edu/welcome/welcome.php, http://promoter.bx.psu.edu/hi-c/view.php or http://www.aidenlab.org/data.html). In this course, participants will learn to use TADbit, a software package designed to handle Hi-C data across all its dimensions:
- 1D - Map paired-end sequences to generate Hi-C interaction matrices
- 2D - Normalize matrices and identify constitutive domains (compartments, TADs)
- 3D - Generate populations of model structures which reproduce the Hi-C interaction matrices
- 4D - Compare samples at different time points
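To make the 2D normalization step more concrete: one widely used method for it is iterative correction (ICE; Imakaev et al., 2012). The sketch below shows the general technique under its core assumption, that each observed count $O_{ij}$ between bins $i$ and $j$ factors into per-bin biases $b_i$, $b_j$ and an underlying contact frequency $T_{ij}$ whose rows all sum to the same value. It illustrates the idea only; TADbit's own normalization options and defaults may differ.

$$
O_{ij} = b_i \, b_j \, T_{ij}, \qquad \sum_j T_{ij} = c \quad \text{for every bin } i
$$

Starting from $W^{(0)} = O$ and $b^{(0)}_i = 1$, each iteration $k$ computes

$$
s_i = \sum_j W^{(k)}_{ij}, \qquad
\Delta_i = \frac{s_i}{\bar{s}}, \qquad
W^{(k+1)}_{ij} = \frac{W^{(k)}_{ij}}{\Delta_i \, \Delta_j}, \qquad
b^{(k+1)}_i = b^{(k)}_i \, \Delta_i
$$

where $\bar{s}$ is the mean of the $s_i$. Iteration stops once every $\Delta_i$ is close to 1, and the final $W$ is the estimate of $T$.

For example, with $O = \begin{pmatrix} 4 & 2 \\ 2 & 1 \end{pmatrix}$ the row sums are $s = (6, 3)$, $\bar{s} = 4.5$ and $\Delta = (4/3, 2/3)$, so

$$
W^{(1)} = \begin{pmatrix} 4/(16/9) & 2/(8/9) \\ 2/(8/9) & 1/(4/9) \end{pmatrix}
= \begin{pmatrix} 9/4 & 9/4 \\ 9/4 & 9/4 \end{pmatrix}
$$

Both rows now sum to $9/2$, so the next $\Delta = (1, 1)$ and the procedure has converged with $b = (4/3, 2/3)$; as a check, $b_1 b_2 T_{12} = (8/9)(9/4) = 2 = O_{12}$.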
Participants can bring specific biological questions and/or their own 3C data to analyze during the course. By the end of the course, participants will be familiar with the TADbit software and able to carry out a complete analysis of Hi-C data. Note: although TADbit is central to this course, alternative software will be discussed for each part of the analysis.
Marc A. Marti-Renom obtained a Ph.D. in Biophysics from the Universidad Autonoma de Barcelona, where he worked on protein folding under the supervision of B. Oliva, F.X. Aviles and M. Karplus. He then moved to the US for postdoctoral training in protein structure modeling at the Sali Lab (Rockefeller University) as the recipient of the Burroughs Wellcome Fund fellowship. He was later appointed Assistant Adjunct Professor at UCSF. Between 2006 and 2011, he headed the Structural Genomics Group at the CIPF in Valencia (Spain). Marc is currently an ICREA research professor and leads the Structural Genomics Group at the National Center for Genomic Analysis - Centre for Genomic Regulation (CNAG-CRG) in Barcelona. His group is broadly interested in how RNA, proteins and genomes organize and regulate cell fate. Finally, Marc is an Associate Editor of PLoS Computational Biology and has published over 90 articles in international peer-reviewed journals.
Affiliation: Centro Nacional de Análisis Genómico (CNAG) and Center for Genomic Regulation (CRG), Barcelona, ES
François Serra obtained his Degree in Biology, specialized in Physiology and Neurophysiology, his Master's Degree in Structural Genomics and Bioinformatics (Strasbourg I University, France) and his PhD in Evolutionary Genomics in the Department of Bioinformatics at the CIPF (Valencia). He is now part of the Structural Genomics team of Marc Marti-Renom at CNAG and CRG (Barcelona). His main research interests lie in comparative genomics and evolution, with a special focus on the effect of evolution on the structural arrangement of genomes. He has taught the MEPA and 3DMOG courses for GTPB, as well as similar courses at CIPF (Valencia, ES) and at the Department of Genetics of the University of Cambridge (UK).
Affiliation: Centro Nacional de Análisis Genómico (CNAG) and Center for Genomic Regulation (CRG), Barcelona, ES
David Castillo obtained his MSc in Photonics from the Universitat Politècnica de Catalunya in Barcelona (Spain), where he worked on super-resolution microscopy. He has a background in Physics and Engineering. He works as a technician in the Structural Genomics team of Marc A. Martí-Renom at CNAG-CRG (Barcelona), developing tools for the analysis, modelling and visualization of Hi-C data. He is also interested in integrating microscopy data into the modelling of 3D genome structures.
Affiliation: Centro Nacional de Análisis Genómico (CNAG) and Center for Genomic Regulation (CRG), Barcelona, ES
The course is designed for experimental researchers and bioinformaticians at the graduate and post-graduate levels who are interested in the spatial organization of the genome.
Participants are likely either to be planning to generate Hi-C data for chromosome structure determination, or simply to want to correctly interpret and analyse publicly available data.
Recommended: Linux and basic Python programming skills; graduate level in Life Sciences. All hands-on sessions will be given at three levels of computational expertise (web platform, command-line tool and Python scripting).
This tutorial is associated with a specific version of TADbit. If you wish to reproduce the results in the notebooks exactly, you should use the version of TADbit tagged 3DAROC_2018.
To install this version, run:
git clone https://github.com/3dgenomes/tadbit
cd tadbit
git checkout tags/3DAROC_2018
sudo python setup.py install
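After installing, it can be worth confirming that the working copy really sits on the course tag, since notebooks written against 3DAROC_2018 may not reproduce with other versions. The following is a minimal sketch, assuming git 1.8.5 or later (needed for `git -C`); the function name `check_course_tag` is a hypothetical helper, not part of TADbit.

```shell
#!/bin/sh
# Sketch: check that a TADbit working copy is pinned to the course tag.
# "check_course_tag" is a hypothetical helper, not a TADbit command.

check_course_tag() {
  # Directory of the cloned repository (default: ./tadbit)
  repo_dir=${1:-tadbit}
  # Tag the course notebooks were written against
  expected=${2:-3DAROC_2018}

  # Prints a tag name only if HEAD sits exactly on a tag; empty otherwise
  current=$(git -C "$repo_dir" describe --tags --exact-match 2>/dev/null)

  if [ "$current" = "$expected" ]; then
    echo "OK: $repo_dir is at $expected"
    return 0
  fi
  echo "WARNING: $repo_dir is not at $expected (found: ${current:-no tag})" >&2
  return 1
}

# Usage, from the directory where "git clone" was run:
#   check_course_tag tadbit 3DAROC_2018
```

A non-zero exit status means the installed code may differ from the one the notebooks expect. If root access is not available, `python setup.py install --user` is a common alternative to `sudo`, although the course instructions use the system-wide install.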
Most tasks of the "core pipeline" can be run directly from the command line (without writing any Python) using the TADbit tool. Have a look at the commands and at the metadata of the results.
For now, the TADbit tool is not included in the general documentation, as it is still "under construction". Use it carefully, and don't hesitate to report any unexpected behaviour you observe.
For small datasets, the TADbit core pipeline can be run through a new Virtual Research Environment (VRE) hosted by the MuG (Multiscale Genomics) project.
This might also be the best place to try TADkit for visualizing genomes in 3D together with interaction matrices and other genomic tracks.
Day | Lectures (pdf) | Core pipeline (notebooks) | Annex (notebooks) |
---|---|---|---|
Day1 | | | |
Day2 | | | |
Day3 | | | |
Day4 | | | |
Day5 | | | |
Schedule (provisional)
Day #1 | Monday, Sep 17th |
---|---|
09:30 - 10:00 | Welcome and introductions |
10:00 - 11:00 | Overview on structure determination |
11:00 - 11:30 | Coffee Break |
11:30 - 12:30 | 3D modeling of genomes and genomic domains |
12:30 - 14:00 | Lunch Break |
14:00 - 15:00 | Introduction to Linux and Python: the Jupyter notebook |
15:00 - 16:00 | Next Generation Sequencing (NGS) and data handling |
16:00 - 16:30 | Tea Break |
16:30 - 18:00 | From raw data to Hi-C contact matrices |
Day #2 | Tuesday, Sep 18th |
09:30 - 10:00 | Morning wrap-up: what have we done so far? |
10:00 - 11:00 | Chromatin structure and Hi-C data |
11:00 - 11:30 | Coffee Break |
11:30 - 12:30 | Integrative modeling applied to chromatin |
12:30 - 14:00 | Lunch Break |
14:00 - 16:00 | Biological applications (I) |
16:00 - 16:30 | Tea Break |
16:30 - 18:00 | Hi-C contact matrices: filtering and normalization |
Day #3 | Wednesday, Sep 19th |
09:30 - 10:00 | Morning wrap-up: what have we done so far? |
10:00 - 11:00 | Biological applications (II) |
11:00 - 11:30 | Coffee Break |
11:30 - 12:30 | Compartment detection and analysis |
12:30 - 14:00 | Lunch Break |
14:00 - 16:00 | Topologically Associated Domains detection and analysis |
16:00 - 16:30 | Tea Break |
16:30 - 18:00 | Comparison between experiments |
Day #4 | Thursday, Sep 20th |
09:30 - 10:00 | Morning wrap-up: what have we done so far? |
10:00 - 11:00 | Biological applications (III) |
11:00 - 11:30 | Coffee Break |
11:30 - 12:30 | 3D Modeling of real Hi-C data with TADbit (I) |
12:30 - 14:00 | Lunch Break |
14:00 - 16:00 | 3D Modeling of real Hi-C data with TADbit (II) |
16:00 - 16:30 | Tea Break |
16:30 - 18:00 | Final wrap-up session |
Day #5 | Friday, Sep 21st |
09:30 - 10:00 | Morning wrap-up: what have we done so far? |
10:00 - 11:00 | Multiscale Genomics: From genomes to structures |
11:00 - 11:30 | Coffee Break |
11:30 - 12:30 | Nucleosome positioning and Nucleosome Dynamics |
12:30 - 14:00 | Lunch Break |
14:00 - 16:00 | Coarse-Grained DNA |
16:00 - 16:30 | Tea Break |
16:30 - 18:00 | Chromatin Dynamics |