MDI Biological Laboratory

Reproducible and FAIR Bioinformatics Analysis of Omics Data

A training course for graduate students, post-doctoral trainees, and others who would like to incorporate bioinformatics into their biomedical research

20220615

Overview

This course is an updated and extended version of our previous Applied Bioinformatics course. The renewed focus is on FAIR data, that is, data that are Findable, Accessible, Interoperable and Reusable. This focus addresses a key initiative of the NIH and will prepare participants to benefit from the vast amount of publicly available biomedical data. We have maintained our emphasis on teaching students how to analyze gene expression data, because the skills required to analyze large transcriptomic data sets are readily transferable to proteomics and metabolomics.

The course begins with a complete introduction to the R statistical programming environment, and is designed throughout to be comfortable for participants who are new to R, bioinformatics and biostatistics. At the same time, the course is designed to be rewarding for participants with substantial experience in these areas, because each learning module includes exercises appropriate for beginner, intermediate and advanced students. A substantial amount of the course is dedicated to independent work on assigned problems. We have found that this approach leads to much higher levels of confidence and better retention of key concepts as long as challenges are appropriate to a specific student and students have plenty of access to knowledgeable teaching assistants. This class will have at least one teaching assistant for every six attendees.

The two-week format of Reproducible and FAIR Bioinformatics Analysis of Omics Data enables students to build confidence in diverse areas, including the following:

  • Planning Omics Experiments
  • Accessing the UNIX Environment
  • Identifying Differentially Expressed Genes
  • Pathway Analysis of Gene Expression Data
  • Applying Machine-Learning and Data-Driven Approaches to Gene Expression Data
  • Taking Advantage of Publicly Available Data
  • Ensuring Rigor and Reproducibility
  • Creating Publication Quality Visualizations of Complex Data
  • Sharing Code and Data
  • Analyzing Single-Cell RNA-seq Experiments
  • Analyzing Microbiome Data
  • Documenting Statistical Approach in a Publication
  • Developing a Data Management Plan

 

Course Directors

Course Faculty

Invited Speakers

Schedule

Day 1

1:00 pm Course Introduction

2:00 pm Break

2:30 pm Defining your RNA-sequencing strategy

3:30 pm Break

4:00 pm Introduction to high-throughput data analysis

6:00 pm Dinner

7:00 pm Reception: Student Talks

Day 2

11:00 am Responsible Conduct of Research

12:00 pm Lunch

1:00 pm Introduction to RStudio

2:00 pm Break

2:30 pm R data types, exploratory statistics and graphs

3:30 pm Break

4:00 pm R logic, loops and functions

6:00 pm Dinner

7:00 pm Promises and challenges in contemporary biology

Day 3

11:00 am Understanding GitHub

12:00 pm Lunch

1:00 pm Getting comfortable with UNIX server environments

2:00 pm Break

2:30 pm Installing open-source UNIX software

3:30 pm Break

4:00 pm Pre-processing RNA-sequencing data with fastp

6:00 pm Dinner

7:00 pm Collaborative open research: lessons learned from working reproducibly with others

Day 4

11:00 am Quantification with salmon

12:00 pm Lunch

1:00 pm Gene ID conversion in R

2:00 pm Break

2:30 pm Exploratory data analysis and normalization of transcriptomic data

3:30 pm Break

4:00 pm edgeR and differential gene expression

6:00 pm Dinner

Day 5

11:00 am Over-representation analysis

12:00 pm Lunch

1:00 pm Analyzing contingency tables

2:00 pm Break

2:30 pm GSEA and pathway activation analysis

3:30 pm Break

4:00 pm Online tools for gene set and pathway analysis

6:00 pm Dinner

7:00 pm How data drives translational medicine in drug development

Day 6

11:00 am Group project: RNA-sequencing data reanalysis

12:00 pm Lunch

1:00 pm Group project: RNA-sequencing data reanalysis

2:00 pm Break

2:30 pm RNA-sequencing data reanalysis

3:30 pm Break

4:00 pm RNA-sequencing data reanalysis

6:00 pm Dinner

7:00 pm Group project: presentations

Day 7

11:00 am Group project: Choosing a machine learning approach to match your research question

12:00 pm Lunch

1:00 pm Group project: Rigor and reproducibility in machine learning

2:00 pm Break

2:30 pm Hands-on machine learning in R

3:30 pm Break

4:00 pm Office hours: MICR 150 students

6:00 pm Dinner

7:00 pm How data reuse is revolutionizing biomedical research

Day 8

11:00 am Group project: Findability, accessibility, interoperability and reusability in the biomedical context

12:00 pm Lunch

1:00 pm Group project: Tools to access publicly available transcriptomic databases

2:00 pm Break

2:30 pm Overview of rigor and reproducibility

3:30 pm Break

4:00 pm Data cleaning and merging multiple databases

6:00 pm Dinner

7:00 pm Integrating library and institutional data to facilitate management, sharing and reuse (Reception)

Day 9

12:00 pm Lunch

1:00 pm Introduction to graded projects for MICR 150 students

2:00 pm Break

2:30 pm Linear models

3:30 pm Break

4:00 pm Modeling batch effects

6:00 pm Dinner

7:00 pm Identifying and avoiding technical bias in ‘omics research

Day 10

11:00 am Principal component analysis

12:00 pm Lunch

1:00 pm Factors that make transcriptomic results difficult to reproduce

2:00 pm Break

2:30 pm Introduction to ggplot: basic syntax and data format

3:30 pm Break

4:00 pm Common ggplot geometries and statistics

6:00 pm Dinner

7:00 pm Statistical modeling for research scientists

Day 11

11:00 am Special ggplot modules: dendextend and ComplexHeatmap

12:00 pm Lunch

1:00 pm Special ggplot modules: factoextra and GGally

2:00 pm Break

2:30 pm Pirate plot and corrplots

3:30 pm Break

4:00 pm R Markdown

6:00 pm Dinner

Day 12

11:00 am Writing a data management plan

12:00 pm Lunch

1:00 pm Reanalysis of publicly available data on a Shiny web server

2:00 pm Break

2:30 pm Public repositories for ‘omics data

3:30 pm Break

4:00 pm Sharing metadata: how to annotate your experiment

6:00 pm Dinner

7:00 pm The human microbiome in health and disease

Day 13-A

11:00 am Introduction to single-cell RNA-sequencing

12:00 pm Lunch

1:00 pm Data QC, filtering and normalization

2:00 pm Break

2:30 pm Feature selection and clustering

3:30 pm Break

4:00 pm Differential expression analysis

6:00 pm Dinner

7:00 pm Statistical and methodological approaches to increase the reproducibility and rigor of single-cell RNA-sequencing

Day 13-B

11:00 am Introduction to microbiome analysis and experiment design

12:00 pm Lunch

1:00 pm Denoising and QC filtering

2:00 pm Break

2:30 pm Taxonomic analysis

3:30 pm Break

4:00 pm Phylogenetic diversity analysis

6:00 pm Dinner

7:00 pm Statistical and methodological approaches to increase the reproducibility and rigor of single-cell RNA-sequencing

Day 14

11:00 am Basic UNIX systems administration concepts for users

12:00 pm Lunch

1:00 pm Shell script basics

2:00 pm Break

2:30 pm Implementing data pipelines

3:30 pm Break

4:00 pm Course evaluations

6:00 pm Lobster bake!

Day 15

11:00 am Graded presentations

Tuition

2022 rates for the in-person course:

Students/post-docs: $1,750 USD

Faculty/professionals: $2,000 USD

Limited financial aid may be available for students with need. The funding request form is included in the online application.

CEUs

Students currently enrolled in the Molecular and Cellular Biology (MCB) Graduate Program at the Geisel School of Medicine at Dartmouth College may receive 1 full credit for completing this as an elective.


Funding

This research training opportunity is supported by a research education grant from the National Human Genome Research Institute of the National Institutes of Health under grant number R25 HG011447.
