The goal of the Applied Bioinformatics Course is to provide hands-on training in major bioinformatics resources. Participants analyze an RNA-Seq data set to find differentially expressed genes, then investigate the previously described functions of those genes and the pathways in which they are involved.
Topics include web-based gene and protein resources, genome browsers, pathway and gene set enrichment analyses, and RNA-Seq data analysis. RNA-Seq data analysis will be conducted using CLC Genomics Workbench, the web-based Galaxy system, the R statistical computing environment, and Ingenuity Pathway Analysis. The course comprises several modules, each with written worked examples that demonstrate how to apply the major tools or resources featured in that module. Participants should have a strong background in molecular biology. Prior computer programming skills are not required, but participants need a strong interest in learning some programming concepts.
Course Directors
- Benjamin L. King, Ph.D., Assistant Professor of Bioinformatics, University of Maine
- Bruce A. Stanton, Ph.D., Andrew C. Vail Memorial Professor of Microbiology and Immunology, Geisel School of Medicine at Dartmouth
Additional Faculty
Geisel School of Medicine at Dartmouth
The Jackson Laboratory
Geisel School of Medicine at Dartmouth
University of New Hampshire
Guest Speaker
BioTeam
Guest Speaker
CLCBio
Guest Speaker
Ingenuity Systems
Thursday, Oct. 9, Day 1 (Introduction)
5:00 pm – 6:00 pm – Registration and housing check in
6:00 pm – 7:00 pm – Dinner (Dining Hall)
7:00 pm – 9:00 pm – Course Introduction and Overview: Introduction to Applied Bioinformatics (Ben King, Maren Auditorium)
- Boundaries with biology, statistics, computer science
- Contemporary biological examples
- Cell Biology
- Evolution
- Biomedical
- Statistical Challenges and Solutions
- Raw Computational Challenges and Solutions
- Problems of Data Representation and Solutions
Friday, Oct. 10, Day 2 (CLC Genomics Workbench)
7:00 am – 9:00 am – Continental Breakfast (Dining Hall)
9:00 am – 10:30 am – Introduction to High-Throughput Sequencing (Kelley Thomas, Maren Auditorium)
- Technologies and Applications
- History
- Chemistry
- Instruments
- Costs
- High-level analysis workflow
10:30 am – 10:45 am – Break
10:45 am – 12:00 pm – Reference Genomes and Alignment Concepts (CLC Team, Dahlgren Hall)
- Navigating the CLC Genomics Workbench – Screen Elements, Display setup
- Next Generation Sequencing (NGS) data import
- Unaligned reads (FASTQ, .sff, etc.)
- Aligned reads (SAM/BAM)
- Non-NGS data import
- Defining a reference genome
- Curating reference sequences with annotations of interest
- Working with Annotation Tracks
12:00 pm – 1:00 pm – Lunch (Dining Hall)
1:00 pm – 2:30 pm – CLC Genomics (CLC Team, Dahlgren Hall)
- Trimming and QC
- Read mapping to reference sequence(s)
- Exome or Amplicon sequencing – target enrichment and coverage analysis
- Variant detection, Filtering and Annotation
- Differential gene expression analysis
2:30 pm – 3:00 pm – Break
3:00 pm – 5:30 pm – CLC Genomics (CLC Team, Dahlgren Hall)
- De novo assembly
- Transcriptome assembly
- ChIP-Seq – Peak detection
- Small RNA analysis
- BLAST – Find and compare genes, protein products and place contigs
- Workflow Automation – Visually Creating and Editing Analysis Pipelines
6:00 pm – 7:00 pm – Dinner (Dining Hall)
7:00 pm – 8:00 pm – Keynote Lecture, TBA (Kelley Thomas, Maren Auditorium)
Saturday, Oct. 11, Day 3 (Gene, Protein and Sequence Tools)
7:00 am – 9:00 am – Continental Breakfast (Dining Hall)
9:00 am – 10:30 am – Gene, Protein and Sequence Resources (Gareth Howell + THH/KK/BG, Dahlgren Hall)
- NCBI Entrez system
- UniProt
- Gene Ontology
- miRNA databases
- RNA-Seq data repositories
- NCBI Gene Expression Omnibus and EBI ArrayExpress
- NCBI Sequence Read Archive and EBI European Nucleotide Archive
10:30 am – 10:45 am – Break
10:45 am – 12:00 pm – Genome Browsers & Data Retrieval (Gareth Howell + THH/KK/BG, Dahlgren Hall)
- UCSC Genome Browser
- UCSC Table Browser
- Ensembl
- BioMart
12:00 pm – 1:00 pm – Lunch (Dining Hall)
1:00 pm – 2:30 pm – RNA-Seq Experimental Design & Workflow (Steven Munger, Maren Auditorium)
2:30 pm – 3:00 pm – Break
3:00 pm – 5:00 pm – Analysis of High Throughput Data (Tom Hampton, Maren Auditorium)
- Exploration
- Pairs
- Histograms
- Heatmaps
- Principal Component Analysis
- Normalization
- Express as a fraction of total
- Means, medians and quantiles
- Ranks
- Inference
- Multiple testing
- CART models
- Comparison of multidimensional distances
6:00 pm – 7:00 pm – Dinner (Dining Hall)
Sunday, Oct. 12, Day 4 (R)
7:00 am – 9:00 am – Continental Breakfast (Dining Hall)
9:00 am – 10:00 am – R Power Tools: Way Beyond Word & Excel (Tom Hampton, Maren Auditorium)
- Why R
- Packages: CRAN, Bioconductor
- Reproducible, “literate” statistics
- RStudio
10:00 am – 10:45 am – R Statistical Computing Environment I (Tom Hampton + KK/BG, Dahlgren Hall)
- Basic Math, Stats and plots
10:45 am – 11:00 am – Break
11:00 am – 12:00 pm – R Statistical Computing Environment II (Tom Hampton + KK/BG, Dahlgren Hall)
- Variables and Functions
- Simulation
12:00 pm – 1:00 pm – Lunch (Dining Hall)
1:00 pm – 2:45 pm – edgeR and Differential Expression (Katja Koeppen + THH/BG, Dahlgren Hall)
- Specify Design
- Normalization
- Estimating Common Dispersion
- Identify Differentially Expressed Genes
2:45 pm – 3:15 pm – Break
3:15 pm – 5:00 pm – Gene Set Enrichment in R (Britton Goodale + THH/KK, Dahlgren Hall)
- Concepts: Hypergeometric distribution
- Paths: KEGG
- Simulation & Results
6:00 pm – 7:00 pm – Dinner (Dining Hall)
Monday, Oct. 13, Day 5 (Ingenuity)
7:00 am – 9:00 am – Continental Breakfast (Dining Hall)
9:00 am – 10:30 am – Pathway and Network Concepts (Stuart Tugendreich/Ingenuity, Maren Auditorium)
10:30 am – 10:45 am – Break
10:45 am – 12:00 pm – Ingenuity I (Ingenuity + THH/KK/BG, Dahlgren Hall)
- Load CLC data
- Load data from edgeR analysis
- Load other data
12:00 pm – 1:00 pm – Lunch (Dining Hall)
1:00 pm – 2:00 pm – Ingenuity II (Ingenuity + THH/KK/BG, Dahlgren Hall)
- Canonical Paths & Networks
2:00 pm – 3:00 pm – Ingenuity III (Ingenuity + THH/KK/BG, Dahlgren Hall)
- Upstream Regulation and Path Editing
3:00 pm – 3:30 pm – Break
3:30 pm – 6:00 pm – Cloud Computing Workshop Using Galaxy (Chris Dagdigian and Ben King, Maren Auditorium)
- Analyze RNA-Seq dataset using Tuxedo suite
6:00 pm – 7:00 pm – Dinner (Dining Hall)
7:00 pm – 8:00 pm – Large-Scale Computing for Genomics (Chris Dagdigian, Maren Auditorium)
Tuesday, Oct. 14, Day 6 (Machine Learning)
7:00 am – 9:00 am – Continental Breakfast (Dining Hall)
9:00 am – 10:00 am – Beyond What is Known – Machine Learning (Casey Greene, Maren Auditorium)
10:00 am – 10:15 am – Break
10:15 am – 11:15 am – Hands-on Machine Learning (Casey Greene + THH/KK/BG, Dahlgren Hall)
- PILGRM
- IMP
- ScanGEO
11:15 am – 12:00 pm – Course Summary & Evaluations
12:00 pm – 1:00 pm – Lunch & Departure
Tuition includes the cost of on-campus housing for course attendees. Housing units are double occupancy dorm rooms and shared cottages. Family lodging is not available.
Some partial tuition fellowships are available for students and trainees. Please indicate on the registration form if you wish to be considered.