Bioinformatics and Data Science Part II Spring 2023
BIOL 792-1036
Prof: Julie Allen; SFB 206; [email protected]
Class: Tuesdays and Thursdays 3:00 - 4:15; PE 102
Office Hours: By appointment
The nature of biological datasets has changed dramatically in the last few decades, and the need for bioinformatic and data science skills is rapidly growing. The goal of the second part of this two-part series is to continue building on the Linux and Python skills students learned in the first semester and to add an understanding of data science and tools for managing large datasets. The course will focus on Python programming and working in the shell, along with an introduction to data standards and version control, tools for cleaning dirty data, data visualization, relational databases, and working with High Performance Clusters (HPCs).
With an understanding of how to integrate different data sources, we will increase not only the creativity of our science but also our ability to do broad-scale research. The prerequisites for this course are enrollment as an M.S. or PhD student and completion of Data_Science_For_Biology_I. If you have not taken that course, email me to determine eligibility. The course will be capped at 15 students.
The goal of the course is to learn data science tools, tricks, and hacks from a bioinformatics angle. By the end of the course you should feel comfortable with the tools data scientists use in biology and be able to solve and/or troubleshoot both small and large-scale data challenges in biology.
All readings, lab instructions, datasets, etc. will be available here.
Because this is a graduate class, I expect full attendance and participation, including all in-class exercises, homework, and projects.
Homework assignments (40%): Assignments will involve working in Unix, writing simple Python scripts, and other small tasks given during each module. They will use datasets provided over the course of the semester. Assignments will be evaluated based on completion. You can work in teams of 2 or 3, but each of you will turn in your own notes and scripts for each assignment. More guidelines on these files and each specific assignment will be available on GitHub.
Participation (20%): Participation entails showing up for class prepared and doing your best to work through assigned tasks and programming example problems. Because all classes build on previous classes, contact me if you need to miss a class. Some of the material we cover might be easy and quick to figure out. Other material and tasks will present roadblocks that are more difficult. We are building a positive community in this class, so your attitude and helpfulness will be evaluated.
Independent project (40%): Everyone will be responsible for an independent project (this can be done either individually or as a group of no more than 2 people). The goal of your semester project is to incorporate the tools learned in this classroom into a project of your design. Ideally this will be something related to your research that helps you move your PhD forward, but you could decide to work on a new project. A requirement of the project is to incorporate at least 2 tools learned in the class to resolve a biological question or computational problem. You will turn in a 1-2 page write-up of the project and how you will solve it by week 6. On the last day of class you will turn in a one- to three-page write-up of the project, put the documented code on GitHub (or submit it to me), and present your project in a 10-15 min presentation.
- 1-2 page White Paper: The 1-2 page write-up should follow a format similar to a white paper. It should therefore open with an introduction to the biological or other type of problem you are trying to solve (with references), followed by a methods section. The methods will fully describe your plan. For example, "I will write a Python script to convert the data from PHYLIP format to FASTA format". At least two techniques from the class should be used (e.g. Python, shell scripts, GitHub, relational databases, cleaning data).
- 1-2 page Project Paper: The 1-2 page final paper should follow a format similar to the white paper, with added results and discussion sections. Explain in detail the tools from class you used. In the discussion, talk about how this helped your project, what you would do next, and what you learned.
- 10-15 min presentation: On the last day of class, each of you will present your project to the class, taking no more than 15 min each. Feel free to show GitHub repos and/or run code in class.
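To give a sense of the scale of task named in the white paper example, a PHYLIP-to-FASTA conversion might be sketched as below. This is only an illustration, not a required solution. It assumes a relaxed, sequential PHYLIP file (a header line with the number of taxa and the alignment length, then one taxon per line with a name that contains no spaces). The function name `phylip_to_fasta` and the sample alignment are hypothetical and do not come from the course materials.

```python
def phylip_to_fasta(phylip_text: str, width: int = 60) -> str:
    """Convert relaxed, sequential PHYLIP text to FASTA text.

    Assumption: one taxon per line, name and sequence separated by whitespace.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in phylip_text.splitlines() if line.strip()]

    # The first line holds the number of taxa and the alignment length.
    n_taxa, n_chars = (int(value) for value in lines[0].split())

    records = []
    for line in lines[1:]:
        # The first field is the name; any remaining fields are sequence chunks.
        name, *chunks = line.split()
        sequence = "".join(chunks)
        if len(sequence) != n_chars:
            raise ValueError(f"{name}: expected {n_chars} sites, found {len(sequence)}")
        records.append((name, sequence))

    if len(records) != n_taxa:
        raise ValueError(f"expected {n_taxa} taxa, found {len(records)}")

    # Write each record as a '>' header line followed by wrapped sequence lines.
    output = []
    for name, sequence in records:
        output.append(f">{name}")
        output.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(output) + "\n"


# Hypothetical three-taxon alignment used only to exercise the function.
example = """3 8
taxon_A ACGTACGT
taxon_B ACGTTCGT
taxon_C AC-TACGA
"""

fasta = phylip_to_fasta(example)
print(fasta)
```

Strict or interleaved PHYLIP files (fixed 10-character names, sequences split across blocks) would need a different parser, so a real project should state which variant its data uses.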
*This is a tentative outline of the schedule; events may change according to the speed and needs of the students in the course. The course is set up in 5 parts.
Week | Month | Date | Day | Class | Due |
---|---|---|---|---|---|
1 | Jan | 24 | Tues | Course intro | |
1 | Jan | 26 | Thurs | Part 1 Linux Refresh | |
2 | Jan | 31 | Tues | Go over homework 1 and start Part 2 Version Control with Git | Homework_1 Linux_Refresh |
2 | Feb | 2 | Thurs | Tracking Changes | |
3 | Feb | 7 | Tues | Exploring History, Gitignore, Remotes in GitHub, Practice | |
3 | Feb | 9 | Thurs | Collaborating | |
4 | Feb | 14 | Tues | no class work on projects | |
4 | Feb | 16 | Thurs | no class Conflicts - work on Homework 2 | |
5 | Feb | 21 | Tues | work on projects | |
5 | Feb | 23 | Thurs | Work on Homework 2 | |
6 | Feb | 28 | Tues | Version Control Finish/Introduction to Programming/Notebooks | Homework_2 Github |
6 | Mar | 2 | Thurs | Work on Homework 3 | |
7 | Mar | 7 | Tues | Intro to Pandas Chandra Sarkar | Homework_3 Python Refresh |
7 | Mar | 9 | Thurs | Pandas | |
8 | Mar | 14 | Tues | Part 4 Data Visualization - ggplot2 Avery Grant | Homework_4 Pandas |
8 | Mar | 16 | Thurs | GGplot Avery Grant | |
SB | Mar | 21 - 23 | Tues-Thurs | Spring Break | |
9 | Mar | 28 | Tues | Data Visualization - Bobby del Carlo | Homework_5 DV ggplot |
9 | Mar | 30 | Thurs | Data Visualization | Data Vis Homework 6 |
10 | Apr | 4 | Tues | Part 5 Data Science + Open Refine | *1-2 Page Project Writeup Due |
10 | Apr | 6 | Thurs | MetaData Relational Databases - Sqlite | |
11 | Apr | 11 | Tues | Sqlite | Homework_7_Open Refine |
11 | Apr | 13 | Thurs | Sqlite + homework | |
12 | Apr | 18 | Tues | Working with HPCs | Homework_8 Sqlite |
13 | Apr | 20 | Thurs | Slurm scripts aTRAM intro | |
15 | Apr | 25 | Tues | Working with Pronghorn | |
15 | Apr | 27 | Thurs | Machine Learning | |
16 | May | 2 | Tues | Machine Learning | |
16 | May | 4 | Thurs | Project Prep | Homework_9 High Performance Clusters |
17 | May | 9 | Tues | Project presentations | *presentations due |
"Cheating, plagiarism or otherwise obtaining grades under false pretenses constitute academic dishonesty according to the code of this university. Academic dishonesty will not be tolerated and penalties can include canceling a student's enrollment without a grade, giving an F for the course or for the assignment. For more details, see the University of Nevada, Reno General Catalog."
Statement of Disability Services For Traditional and Seated Classrooms: “Any student with a disability needing academic adjustments or accommodations is requested to speak with me or the Disability Resource Center (Pennington Achievement Center Suite 230) as soon as possible to arrange for appropriate accommodations.”
"Surreptitious or covert video-taping of class or unauthorized audio recording of class is prohibited by law and by Board of Regents policy. This class may be videotaped or audio recorded only with the written permission of the instructor. In order to accommodate students with disabilities, some students may be given permission to record class lectures and discussions. Therefore, students should understand that their comments during class may be recorded."
The University of Nevada, Reno is committed to providing a safe learning and work environment for all. If you believe you have experienced discrimination, sexual harassment, sexual assault, domestic/dating violence, or stalking, whether on or off campus, or need information related to immigration concerns, please contact the University's Equal Opportunity & Title IX office at 775-784-1547. Resources and interim measures are available to assist you. For more information, please visit the
Statement on Academic Success Services: Your student fees cover usage of the Math Center (775) 784-4433, Tutoring Center (775) 784-6801, and University Writing Center (775) 784-6030. These centers support your classroom learning; it is your responsibility to take advantage of their services. Keep in mind that seeking help outside of class is the sign of a responsible and successful student.