Workshop · 2 days · 4 hours each · 2026

Applied Bioinformatics

A two-day, hands-on workshop that takes you from raw FASTQ files to a finished variant and RNA-seq analysis. Eight modules, nine Jupyter notebooks, runnable shell pipelines, and a synthetic chr22 capstone dataset.

What you'll leave with

▸ Run a complete short-read pipeline: QC → trim → align → call → annotate → interpret.
▸ Read and manipulate FASTA, FASTQ, SAM/BAM, VCF, BED, and GFF3 files with confidence.
▸ Drive command-line tools (BWA, SAMtools, GATK, Trimmomatic, fastp) from reproducible shell and Python scripts.
▸ Run a differential expression analysis end-to-end with DESeq2.
▸ Diagnose pipeline failures from FastQC and MultiQC reports.
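
To give a flavor of what driving these tools from a Python script can look like, here is a minimal dry-run sketch of the trim → align → sort → call steps. It is not the workshop's actual pipeline script: the function name `build_commands`, the sample and file names, and the `results` directory are all hypothetical, and the commands assume the reference has already been indexed for each tool. The script only prints the commands; it does not run them.

```python
import shlex


def build_commands(sample, ref, workdir="results"):
    """Return the argv lists for a single-end short-read run (dry run only).

    All paths are hypothetical placeholders. The reference is assumed to be
    indexed already (bwa-mem2 index, samtools faidx, sequence dictionary).
    """
    raw = f"{sample}.fastq.gz"
    trimmed = f"{workdir}/{sample}.trimmed.fastq.gz"
    sam = f"{workdir}/{sample}.sam"
    bam = f"{workdir}/{sample}.sorted.bam"
    vcf = f"{workdir}/{sample}.vcf.gz"
    # GATK needs a read group; bwa accepts literal "\t" separators in -R.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # Trim adapters and low-quality bases.
        ["fastp", "-i", raw, "-o", trimmed],
        # Align trimmed reads to the reference.
        ["bwa-mem2", "mem", "-R", read_group, "-o", sam, ref, trimmed],
        # Coordinate-sort and index the alignments.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Call variants on the sorted BAM.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", vcf],
    ]


commands = build_commands("sample1", "chr22.fa")
for argv in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Keeping each command as a list of arguments, rather than one long string, means the same list can later be handed to `subprocess.run(argv, check=True)` without shell-quoting surprises.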

Who it's for

· PhD students and postdocs entering bioinformatics from a wet-lab or stats background.
· Data scientists rotating into a genomics team.
· Research software engineers who need to read and own existing pipelines.

What you need

▸ Just a web browser. Every module (the shell, the notebooks, the visualizations) runs right here on this site. Nothing to install, no accounts, no setup before you start.
· No prior bioinformatics experience required.
· Reading-level Python helps, but the live notebooks walk you through every cell.
· Optional, for later: want to run the full pipelines on your own hardware? The conda_env.yml in Downloads rebuilds the toolset locally.

Day 1 · Raw data to aligned reads

Day 1 slides ↗

Module 1 · 09:00 – 09:45 · 45 min

Linux CLI for Bioinformatics

Pipes, grep, awk, and the shell scripting that powers every genomics pipeline.

Module 2 · 09:45 – 10:30 · 45 min

Sequence Data Formats

FASTA, FASTQ, SAM/BAM, VCF, BED, GFF — the file formats genomics runs on.
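
As a small taste of the format work, the sketch below reads FASTQ records and computes a mean Phred quality per read. It assumes the common layout of exactly four lines per record and Phred+33 quality encoding; the helper names `read_fastq` and `mean_phred` and the two example reads are made up for illustration and are not taken from the workshop notebooks.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed record near {header!r}")
        yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a non-empty quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


# Two made-up reads: 'I' encodes Phred 40, '!' encodes Phred 0.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
records = list(read_fastq(example))
for name, seq, qual in records:
    print(name, seq, mean_phred(qual))
```

For real data a maintained parser such as Biopython's `SeqIO` is the safer choice, since it also handles edge cases this sketch ignores.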

Module 3 · 10:30 – 11:30 · 60 min

Quality Control + Read Trimming

FastQC, MultiQC, Trimmomatic, fastp — diagnose and clean raw reads.

Module 4 · 11:30 – 13:00 · 90 min

Read Alignment to Reference Genome

BWA-MEM2, SAMtools, and the mechanics of mapping short reads to a reference.

Day 2 · From aligned reads to biology

Day 2 slides ↗

Module 5 · 09:00 – 10:10 · 70 min

BAM Processing + Variant Calling

Picard, GATK4 HaplotypeCaller, BQSR, and variant filtering strategies.

Module 6 · 10:10 – 11:20 · 70 min

RNA-seq: Quantification + Differential Expression

HISAT2, featureCounts/Salmon, and DESeq2 for differential gene expression.

Module 7 · 11:20 – 12:10 · 50 min

Visualization + Pathway Analysis

IGV, matplotlib, Biopython, and pathway-level interpretation.

Module 8 · 12:10 – 13:00 · 50 min

Capstone Project

End-to-end mini-analysis on a synthetic dataset: QC → align → call → annotate → interpret.

Downloads

Everything you need to learn is already on the module pages — these are extras for going deeper, teaching the workshop, or rebuilding the pipelines on your own machine.

· Workshop README: full schedule (markdown)
· conda_env.yml
· requirements.txt
· Environment setup script
· Instructor notes: how to teach this workshop (PDF)
· Example FASTA / FASTQ
· Capstone synthetic dataset

Book this workshop for your lab or team

Public cohorts, private corporate training, and self-paced licenses available. Custom modules on single-cell, ATAC-seq, or long-read sequencing on request.

Email to book · Contact form