Reproducible and FAIR Bioinformatics Analysis of Omics Data
A training course for graduate students, post-doctoral trainees, and others who would like to incorporate bioinformatics into their biomedical research
- July 6-20, 2022
- MDI Biological Laboratory
Overview
This course is an updated and extended version of our previous Applied Bioinformatics course. The renewed focus is on FAIR data, that is, data that are Findable, Accessible, Interoperable and Reusable. This focus addresses a key initiative of the NIH and will prepare participants to benefit from the vast amount of publicly available biomedical data. We have maintained our emphasis on teaching students how to analyze gene expression data, because the skills required to analyze large transcriptomic data sets are readily transferable to proteomics and metabolomics.
The course begins with a complete introduction to the R statistical programming environment, and is designed throughout to be comfortable for participants who are new to R, bioinformatics and biostatistics. At the same time, the course is designed to be rewarding for participants with substantial experience in these areas, because each learning module includes exercises appropriate for beginner, intermediate and advanced students. A substantial amount of the course is dedicated to independent work on assigned problems. We have found that this approach leads to much higher levels of confidence and better retention of key concepts as long as challenges are appropriate to a specific student and students have plenty of access to knowledgeable teaching assistants. This class will have at least one teaching assistant for every six attendees.
The two-week format of Reproducible and FAIR Bioinformatics Analysis of Omics Data enables students to build confidence in diverse areas including the following:
- Planning Omics Experiments
- Accessing the UNIX Environment
- Identifying Differentially Expressed Genes
- Pathway Analysis of Gene Expression Data
- Applying Machine-Learning and Data-Driven Approaches to Gene Expression Data
- Taking Advantage of Publicly Available Data
- Ensuring Rigor and Reproducibility
- Creating Publication Quality Visualizations of Complex Data
- Sharing Code and Data
- Analyzing Single-Cell RNA-seq Experiments
- Analyzing Microbiome Data
- Documenting Statistical Approach in a Publication
- Developing a Data Management Plan
Course Directors
Thomas H. Hampton
Geisel School of Medicine at Dartmouth College
Bruce Stanton
Geisel School of Medicine at Dartmouth College
Course Faculty
Sam Neff
Dartmouth College
Rebecca Valls
Dartmouth College
Pamela Bagley
Dartmouth College
Richard Brittain
Dartmouth College
Andrew Creamer
Brown University
Britton Goodale
Geisel School of Medicine, Dartmouth College
Katja Koeppen
Boehringer Ingelheim
Zhongyou Li
Geisel School of Medicine at Dartmouth
Todd MacKenzie
Geisel School of Medicine at Dartmouth
Jaclyn Taroni
Childhood Cancer Data Lab
W. Kelley Thomas
University of New Hampshire
Devin Thomas
University of New Hampshire
Invited Speakers
Julia Oh
The Jackson Laboratory
Gary Churchill
The Jackson Laboratory
Jane Disney
MDI Biological Laboratory
Ben Brown
Biosciences
Stephanie Hicks
Johns Hopkins Bloomberg School of Public Health
Courtney (Kozul) Horvath
Novartis Institutes for BioMedical Research
Schedule
Day 1
1:00 pm Course Introduction
2:00 pm Break
2:30 pm Defining your RNA-sequencing strategy
3:30 pm Break
4:00 pm Introduction to high-throughput data analysis
6:00 pm Dinner
7:00 pm Reception: Student Talks
Day 2
11:00 am Responsible Conduct of Research
12:00 pm Lunch
1:00 pm Introduction to RStudio
2:00 pm Break
2:30 pm R data types, exploratory statistics and graphs
3:30 pm Break
4:00 pm R logic, loops and functions
6:00 pm Dinner
7:00 pm Promises and challenges in contemporary biology
Day 3
11:00 am Understanding GitHub
12:00 pm Lunch
1:00 pm Getting comfortable with UNIX server environments
2:00 pm Break
2:30 pm Installing open-source UNIX software
3:30 pm Break
4:00 pm Pre-processing RNA-sequencing data with fastp
6:00 pm Dinner
7:00 pm Collaborative open research: lessons learned from working reproducibly with others
Day 4
11:00 am Quantification with salmon
12:00 pm Lunch
1:00 pm Gene ID conversion in R
2:00 pm Break
2:30 pm Exploratory data analysis and normalization of transcriptomic data
3:30 pm Break
4:00 pm edgeR and differential gene expression
6:00 pm Dinner
Day 5
11:00 am Over-representation analysis
12:00 pm Lunch
1:00 pm Analyzing contingency tables
2:00 pm Break
2:30 pm GSEA and pathway activation analysis
3:30 pm Break
4:00 pm Online tools for gene set and pathway analysis
6:00 pm Dinner
7:00 pm How data drives translational medicine in drug development
Day 6
11:00 am Group project: RNA-sequencing data reanalysis
12:00 pm Lunch
1:00 pm Group project: RNA-sequencing data reanalysis
2:00 pm Break
2:30 pm RNA-sequencing data reanalysis
3:30 pm Break
4:00 pm RNA-sequencing data reanalysis
6:00 pm Dinner
7:00 pm Group project: presentations
Day 7
11:00 am Group project: Choosing a machine learning approach to match your research question
12:00 pm Lunch
1:00 pm Group project: Rigor and reproducibility in machine learning
2:00 pm Break
2:30 pm Hands-on machine learning in R
3:30 pm Break
4:00 pm Office hours: MICR 150 students
6:00 pm Dinner
7:00 pm How data reuse is revolutionizing biomedical research
Day 8
11:00 am Group project: Findability, accessibility, interoperability and reusability in the biomedical context
12:00 pm Lunch
1:00 pm Group project: Tools to access publicly available transcriptomic databases
2:00 pm Break
2:30 pm Overview of rigor and reproducibility
3:30 pm Break
4:00 pm Data cleaning and merging multiple databases
6:00 pm Dinner
7:00 pm Integrating library and institutional data to facilitate management, sharing and reuse (Reception)
Day 9
12:00 pm Lunch
1:00 pm Introduction to graded projects for MICR 150 students
2:00 pm Break
2:30 pm Linear models
3:30 pm Break
4:00 pm Modeling batch effects
6:00 pm Dinner
7:00 pm Identifying and avoiding technical bias in ‘omics research
Day 10
11:00 am Principal component analysis
12:00 pm Lunch
1:00 pm Factors that make transcriptomic results difficult to reproduce
2:00 pm Break
2:30 pm Introduction to ggplot: basic syntax and data format
3:30 pm Break
4:00 pm Common ggplot geometries and statistics
6:00 pm Dinner
7:00 pm Statistical modeling for research scientists
Day 11
11:00 am Special ggplot modules: dendextend and ComplexHeatmap
12:00 pm Lunch
1:00 pm Special ggplot modules: factoextra and GGally
2:00 pm Break
2:30 pm Pirate plots and corrplots
3:30 pm Break
4:00 pm R Markdown
6:00 pm Dinner
Day 12
11:00 am Writing a data management plan
12:00 pm Lunch
1:00 pm Reanalysis of publicly available data on a Shiny web server
2:00 pm Break
2:30 pm Public repositories for ‘omics data
3:30 pm Break
4:00 pm Sharing metadata: how to annotate your experiment
6:00 pm Dinner
7:00 pm The human microbiome in health and disease
Day 13-A
11:00 am Introduction to single-cell RNA-sequencing
12:00 pm Lunch
1:00 pm Data QC, filtering and normalization
2:00 pm Break
2:30 pm Feature selection and clustering
3:30 pm Break
4:00 pm Differential expression analysis
6:00 pm Dinner
7:00 pm Statistical and methodological approaches to increase the reproducibility and rigor of single-cell RNA-sequencing
Day 13-B
11:00 am Introduction to microbiome analysis and experiment design
12:00 pm Lunch
1:00 pm Denoising and QC filtering
2:00 pm Break
2:30 pm Taxonomic analysis
3:30 pm Break
4:00 pm Phylogenetic diversity analysis
6:00 pm Dinner
7:00 pm Statistical and methodological approaches to increase the reproducibility and rigor of single-cell RNA-sequencing
Day 14
11:00 am Basic UNIX systems administration concepts for users
12:00 pm Lunch
1:00 pm Shell script basics
2:00 pm Break
2:30 pm Implementing data pipelines
3:30 pm Break
4:00 pm Course evaluations
6:00 pm Lobster bake!
Day 15
11:00 am Graded presentations
Tuition
2022 rates for the in-person course:
Students/post-docs: $1,750 USD
Faculty/professionals: $2,000 USD
Limited financial aid may be available for students with need. The funding request form is included in the online application.
CEUs
Students currently enrolled in the Molecular and Cellular Biology (MCB) Graduate Program at the Geisel School of Medicine at Dartmouth College may receive 1 full credit for completing this course as an elective.
Funding
This research training opportunity is supported by a research education grant from the National Human Genome Research Institute of the National Institutes of Health under grant number R25 HG011447.