The goal of the Applied Bioinformatics Course is to provide hands-on training in major bioinformatics resources through the analysis of an RNA-Seq data set. Participants will identify differentially expressed genes, then investigate the previously described functions of those genes and the pathways in which they are involved.
Topics include web-based gene and protein resources, genome browsers, pathway and gene set enrichment analyses, and RNA-Seq data analysis. RNA-Seq data analysis will be conducted using CLC Genomics Workbench, the web-based Galaxy system, the R statistical computing environment and Ingenuity Pathway Analysis. The course is organized into several modules, each with written worked examples that demonstrate how to apply the major tools or resources featured in that module. Participants should have a strong background in molecular biology. Prior computer programming skills are not required, but participants need a strong interest in learning some programming concepts.
Course Directors
- Benjamin L. King, Ph.D., Assistant Professor of Bioinformatics, University of Maine
- Bruce A. Stanton, Ph.D., Andrew C. Vail Memorial Professor of Physiology, Dartmouth Medical School
Faculty
- Britton Goodale, Ph.D., Postdoctoral Fellow, Geisel School of Medicine, Dartmouth College
- Thomas H. Hampton, Ph.D., Senior Bioinformatics Analyst, Geisel School of Medicine at Dartmouth
- Katja Koeppen, Ph.D., Research Scientist, Geisel School of Medicine, Dartmouth College
- W. Kelley Thomas, Ph.D., Director, Hubbard Center for Genome Studies, University of New Hampshire
Additional Faculty
Guest Speaker
BioTeam
Sample schedule, subject to change
Day 1 (Introduction)
4:00 pm – 5:00 pm – Housing check-in
6:00 pm – 7:00 pm – Dinner
7:00 pm – 9:00 pm – Course Introduction and Overview
- Boundaries with biology, statistics, computer science
- Contemporary biological examples
- Cell Biology
- Evolution
- Biomedical
- Statistical Challenges and Solutions
- Raw Computational Challenges and Solutions
- Problems of Data Representation and Solutions
Day 2 (Sequence Analysis)
7:00 am – 9:00 am – Self-Serve Breakfast
9:00 am – 10:30 am – Introduction to High-Throughput Sequencing
- Technologies and Applications
- History
- Chemistry
- Instruments
- Costs
- High-level analysis workflow
10:30 am – 10:45 am – Break
10:45 am – 12:00 pm – Overview of Sequence Analysis Workflow
- Workflow outline
- Read diagnostics
- Trimming
- Read alignment
- Visualization of alignments
- Common file formats
- unaligned reads (FASTQ, .sff etc.)
- aligned reads (SAM/BAM)
12:00 pm – 1:00 pm – Lunch
1:00 pm – 2:30 pm – Read Alignment and Analysis Workshop (Part 1)
- Importing reads
- Trimming and QC
- Read mapping to reference sequence(s)
2:30 pm – 3:00 pm – Break
3:00 pm – 6:00 pm – Read Alignment and Analysis Workshop (Part 2)
- Quantifying gene expression
- Variant detection
- Visualization of aligned reads
6:00 pm – 7:00 pm – Dinner
7:00 pm – 8:00 pm – Keynote Lecture
Day 3 (Gene, Protein and Sequence Tools)
7:00 am – 9:00 am – Self-Serve Breakfast
9:00 am – 10:30 am – Gene, Protein and Sequence Resources
- NCBI Entrez system
- UniProt
- Gene Ontology
- miRNA databases
- RNA-Seq data repositories
- NCBI Gene Expression Omnibus and EBI ArrayExpress
- NCBI Sequence Read Archive and EBI European Nucleotide Archive
10:30 am – 10:45 am – Break
10:45 am – 12:00 pm – Genome Browsers & Data Retrieval
- UCSC Genome Browser
- UCSC Table Browser
- Ensembl
- BioMart
12:00 pm – 1:00 pm – Lunch
1:00 pm – 2:00 pm – Analysis of High Throughput Data
- Exploratory Analysis
- Normalization
- Inference
2:00 pm – 2:45 pm – R Power Tools: Way Beyond Word & Excel
- Why R
- Packages: CRAN, Bioconductor
- Reproducible, “literate” statistics
2:45 pm – 3:15 pm – Break
3:15 pm – 4:00 pm – Introduction to RStudio
4:00 pm – 5:00 pm – R Statistical Computing Environment I
- Basic math, statistics and plots
6:00 pm – 7:00 pm – Dinner (Dining Hall)
7:00 pm – 8:00 pm – Large-Scale Computing for Genomics
Day 4 (R Statistical Computing Environment)
7:00 am – 9:00 am – Self-Serve Breakfast
9:00 am – 10:00 am – R Statistical Computing Environment II
- Variables and Functions
- Simulation
10:00 am – 10:45 am – Advanced R and Exploratory Data Analysis I
- Introduction to dataset visualization
10:45 am – 11:00 am – Break
11:00 am – 12:00 pm – Advanced R and Exploratory Data Analysis II
- PCA
- Clustering
12:00 pm – 1:00 pm – Lunch
1:00 pm – 2:45 pm – edgeR and Differential Expression
- Specify Design
- Normalization
- Estimating Common Dispersion
- Identify Differentially Expressed Genes
2:45 pm – 3:15 pm – Break
3:15 pm – 5:00 pm – Gene Set Enrichment
- Concepts: Hypergeometric distribution
- Gene Ontology and KEGG Pathway annotation
6:00 pm – 7:00 pm – Dinner
Day 5 (Hands-on Bioinformatics & Ingenuity Pathway Analysis)
7:00 am – 9:00 am – Self-Serve Breakfast
9:00 am – 10:45 am – Small Group Exercise: Exploratory Analysis in R
- Practice exploratory data analysis in R using an example dataset
- Submit at least one figure + explanation
10:45 am – 11:00 am – Break
11:00 am – 12:00 pm – Small Group Exercise: Exploratory Analysis in R
- Group presentations of results
12:00 pm – 1:00 pm – Lunch
1:00 pm – 3:00 pm – Ingenuity Pathway Analysis
- Pathway analysis of edgeR-identified differentially expressed genes
- Canonical pathways, networks and upstream regulators
3:00 pm – 3:30 pm – Break
3:30 pm – 6:00 pm – Your
- Applied Bioinformatics Consultation Clinic
6:00 pm – 7:00 pm – Dinner (Dining Hall)
7:00 pm – 8:30 pm – Applied Bioinformatics Real World Examples
Day 6 (Machine Learning)
7:00 am – 9:00 am – Self-Serve Breakfast
9:00 am – 10:00 am – Beyond What is Known – Machine Learning
10:00 am – 10:15 am – Break
10:15 am – 11:15 am – Hands-on Machine Learning
- PILGRM
- IMP
- ScanGEO
11:15 am – 12:00 pm – Course Summary & Evaluations
12:00 pm – 1:00 pm – Lunch & Departure
On-campus housing is included in tuition.