Applied Bioinformatics
A two-day, hands-on workshop that takes you from raw FASTQ files to finished variant-calling and RNA-seq analyses. Eight modules, nine Jupyter notebooks, runnable shell pipelines, and a synthetic chr22 capstone dataset.
8 modules · 9 notebooks · 5 pipeline scripts · 1 capstone dataset
What you'll leave with
- ▸Run a complete short-read pipeline: QC → trim → align → call → annotate → interpret.
- ▸Read and manipulate FASTA, FASTQ, SAM/BAM, VCF, BED, and GFF3 files with confidence.
- ▸Drive command-line tools (BWA, SAMtools, GATK, Trimmomatic, fastp) from reproducible shell and Python scripts.
- ▸Run a differential expression analysis end-to-end with DESeq2.
- ▸Diagnose pipeline failures from FastQC and MultiQC reports.
Who it's for
- ·PhD students and postdocs entering bioinformatics from a wet-lab or stats background.
- ·Data scientists rotating into a genomics team.
- ·Research software engineers who need to read and own existing pipelines.
Prerequisites
- ·Comfortable opening a terminal and running basic commands.
- ·Reading-level Python (loops, dicts, simple functions).
- ·No prior bioinformatics experience required.
- ·Laptop with conda or Docker — see conda_env.yml.
Day 1 · Raw data to aligned reads
Day 1 slides
Linux CLI for Bioinformatics
Pipes, grep, awk, and the shell scripting that underpins most genomics pipelines.
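As a small illustration of the awk-style column arithmetic this module covers, the Python sketch below mirrors the one-liner `awk '{s += $3 - $2} END {print s}' regions.bed`, which totals the length of every interval in a BED file. The function name `total_bed_length` and the example intervals are hypothetical, and `regions.bed` stands in for any BED file.

```python
import io
from typing import TextIO


def total_bed_length(handle: TextIO) -> int:
    """Sum (end - start) over BED records, roughly like
    awk '{s += $3 - $2} END {print s}' regions.bed
    """
    total = 0
    for line in handle:
        # Skip blank lines plus comment and track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        # BED coordinates are 0-based and half-open, so end - start is the length.
        total += end - start
    return total


# Two hypothetical intervals on chr22: 150 bp + 100 bp.
example = io.StringIO("chr22\t100\t250\nchr22\t1000\t1100\n")
print(total_bed_length(example))  # 250
```

The explicit header skip is a small robustness addition over the bare awk one-liner.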
Sequence Data Formats
FASTA, FASTQ, SAM/BAM, VCF, BED, GFF — the file formats genomics runs on.
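FASTQ stores each read as four lines (header, sequence, `+` separator, quality string), with one ASCII character per base encoding a Phred quality score. A minimal Python reader, assuming unwrapped four-line records and the Phred+33 offset used by modern Illumina output, might look like this; `read_fastq` and `FastqRecord` are illustrative names, not part of any library.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: List[int]  # Phred scores, one per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ file with unwrapped 4-line records."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual_chars = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual_chars):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Phred+33: Q = ASCII code - 33, so 'I' -> 40 and '#' -> 2.
        quals = [ord(c) - 33 for c in qual_chars]
        # The read name is the header text up to the first whitespace.
        parts = header[1:].split(maxsplit=1)
        name = parts[0] if parts else ""
        yield FastqRecord(name, seq, quals)


example = io.StringIO("@read1 1:N:0\nACGT\n+\nII#5\n")
for rec in read_fastq(example):
    print(rec.name, rec.seq, rec.qual)  # read1 ACGT [40, 40, 2, 20]
```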
Quality Control + Read Trimming
FastQC, MultiQC, Trimmomatic, fastp — diagnose and clean raw reads.
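Window-based quality trimming is one of the core ideas behind Trimmomatic's `SLIDINGWINDOW` step (for example `SLIDINGWINDOW:4:20`): scan along the read from the 5' end and cut once the mean quality inside a window drops below a threshold. The Python sketch below implements only that simplified rule; Trimmomatic's exact choice of cut point differs in detail, and `sliding_window_trim` and `trim_read` are hypothetical helpers.

```python
from typing import List, Tuple


def sliding_window_trim(qual: List[int], window: int = 4, min_mean: float = 20.0) -> int:
    """Return the number of leading bases to keep.

    Scans windows from the 5' end and cuts at the start of the first window
    whose mean Phred quality is below min_mean. Reads shorter than the window
    are kept whole in this simplified version.
    """
    for start in range(len(qual) - window + 1):
        # Compare sums instead of means to avoid a division per window.
        if sum(qual[start:start + window]) < min_mean * window:
            return start
    return len(qual)


def trim_read(seq: str, qual: List[int], window: int = 4,
              min_mean: float = 20.0) -> Tuple[str, List[int]]:
    """Apply the cut point to both the sequence and its qualities."""
    keep = sliding_window_trim(qual, window, min_mean)
    return seq[:keep], qual[:keep]


seq = "ACGTACGTA"
qual = [40, 40, 40, 40, 30, 10, 5, 2, 2]
print(trim_read(seq, qual))  # ('ACGT', [40, 40, 40, 40])
```

Averaging over a window, rather than cutting at the first single low-quality base, keeps reads with an isolated bad call while still removing a degraded 3' tail.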
Read Alignment to Reference Genome
BWA-MEM2, SAMtools, and the mechanics of mapping short reads to a reference.
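Two SAM fields that often confuse newcomers to alignment output are the bitwise FLAG and the CIGAR string. The sketch below decodes FLAG bits using the bit meanings from the SAM specification (labels follow the names printed by `samtools flags`) and computes how many reference bases an alignment spans from its CIGAR. The helper names `decode_flag` and `reference_span` are hypothetical.

```python
import re
from typing import List

# FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# M, D, N, = and X consume reference bases; I, S, H and P do not.
REF_CONSUMING = set("MDN=X")


def decode_flag(flag: int) -> List[str]:
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment's CIGAR string."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar)
               if op in REF_CONSUMING)


print(decode_flag(99))                 # ['PAIRED', 'PROPER_PAIR', 'MREVERSE', 'READ1']
print(reference_span("5S90M2D10M3I"))  # 102
```

Because POS is 1-based, the last reference base an alignment covers is POS + reference_span(CIGAR) - 1.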
Day 2 · From aligned reads to biology
Day 2 slides
BAM Processing + Variant Calling
Picard, GATK4 HaplotypeCaller, BQSR, and variant filtering strategies.
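One common variant filtering strategy is hard filtering: marking sites whose annotations fall outside fixed thresholds, here QUAL, QD (QualByDepth) and FS (FisherStrand). The Python sketch below applies that idea directly to VCF text, in the spirit of GATK's VariantFiltration. The thresholds and filter names (`LowQual`, `LowQD`, `HighFS`) are illustrative assumptions, not recommended values, and a real tool would also add matching `##FILTER` header lines.

```python
import io
from typing import Dict, Iterator, List, TextIO


def parse_info(info: str) -> Dict[str, str]:
    """Turn a VCF INFO column like 'DP=30;QD=12.5;DB' into a dict."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item and item != ".":
            fields[item] = "true"  # flag-type INFO entry with no value
    return fields


def hard_filter(handle: TextIO, min_qual: float = 30.0, min_qd: float = 2.0,
                max_fs: float = 60.0) -> Iterator[str]:
    """Yield VCF lines with FILTER set to PASS or to the failed filter names.

    Thresholds are illustrative. Header lines pass through unchanged; this
    sketch does not add ##FILTER definitions for the new filter names.
    """
    for line in handle:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        qual = float(cols[5]) if cols[5] != "." else 0.0  # missing QUAL fails LowQual
        info = parse_info(cols[7])
        failed: List[str] = []
        if qual < min_qual:
            failed.append("LowQual")
        if "QD" in info and float(info["QD"]) < min_qd:
            failed.append("LowQD")
        if "FS" in info and float(info["FS"]) > max_fs:
            failed.append("HighFS")
        cols[6] = ";".join(failed) if failed else "PASS"  # column 7 is FILTER
        yield "\t".join(cols) + "\n"


vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr22\t100\t.\tA\tG\t50\t.\tDP=30;QD=12.5;FS=1.2\n"
    "chr22\t200\t.\tC\tT\t20\t.\tDP=8;QD=1.1;FS=75.0\n"
)
for line in hard_filter(io.StringIO(vcf)):
    print(line, end="")  # second record gets FILTER=LowQual;LowQD;HighFS
```

Marking failures in FILTER rather than deleting records keeps every call in the file, so thresholds can be revisited later without re-running the caller.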
RNA-seq: Quantification + Differential Expression
HISAT2, featureCounts/Salmon, and DESeq2 for differential gene expression.
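DESeq2's default library-size normalization is the median-of-ratios method: each gene's counts are divided by that gene's geometric mean across samples, and a sample's size factor is the median of those ratios over genes with no zero counts. A pure-Python sketch of the calculation, meant for building intuition rather than replacing `estimateSizeFactors()` in R, is below; the function name and the toy counts are hypothetical.

```python
import math
import statistics
from typing import List


def median_of_ratios_size_factors(counts: List[List[int]]) -> List[float]:
    """Size factors from a gene x sample count matrix (counts[gene][sample])."""
    n_samples = len(counts[0])
    ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for gene_counts in counts:
        # A zero anywhere makes the geometric mean zero, so skip the gene.
        if any(c == 0 for c in gene_counts):
            continue
        geo_mean = math.exp(sum(math.log(c) for c in gene_counts) / n_samples)
        for j, c in enumerate(gene_counts):
            ratios[j].append(c / geo_mean)
    # statistics.median raises StatisticsError if every gene was skipped.
    return [statistics.median(r) for r in ratios]


# Toy matrix: sample B was sequenced twice as deeply as sample A.
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 7],  # skipped: contains a zero
]
size_factors = median_of_ratios_size_factors(counts)
print([round(s, 4) for s in size_factors])  # [0.7071, 1.4142]
normalized = [[c / s for c, s in zip(row, size_factors)] for row in counts]
```

Dividing each column by its size factor, as `normalized` does, puts samples on a common scale for plotting; DESeq2 itself models the raw counts and uses the size factors inside the model.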
Visualization + Pathway Analysis
IGV, matplotlib, Biopython, and pathway-level interpretation.
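A common first pass at pathway-level interpretation is over-representation analysis: given a list of differentially expressed genes, ask whether a pathway contains more of them than expected by chance. Under the usual hypergeometric model that is a one-sided upper-tail probability, sketched below with exact integer arithmetic. The function name and example counts are hypothetical.

```python
from math import comb


def ora_p_value(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (universe)
    K: background genes annotated to the pathway
    n: genes in the differentially expressed list
    k: DE genes that fall in the pathway
    """
    # comb(a, b) returns 0 when b > a, so impossible overlaps add nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical: 20,000-gene background, 200-gene pathway, 500 DE genes, 15 overlap.
# About 5 overlapping genes would be expected by chance (500 * 200 / 20,000).
print(f"{ora_p_value(15, 500, 200, 20_000):.3g}")
```

This tail probability is the same quantity as a one-sided Fisher's exact test on the 2x2 table of DE/non-DE versus in/out of pathway. When many pathways are tested, the resulting p-values still need multiple-testing correction.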
Capstone Project
End-to-end mini-analysis on a synthetic dataset: QC → align → call → annotate → interpret.
Downloads
Book this workshop for your lab or team
Public cohorts, private corporate training, and self-paced licenses available. Custom modules on single-cell, ATAC-seq, or long-read sequencing on request.