Read Alignment to Reference Genome
BWA-MEM2, SAMtools, and the mechanics of mapping short reads to a reference.
What this module covers
- BWA-MEM2: index building, alignment, SAM output
- SAMtools: sort, index, flagstat, view, idxstats
- Alignment QC: coverage depth, mapping rate, insert size
- Visualization in IGV
Start here — the data journey
Live in your browser, no install.
Watch the data move through the pipeline below, then read on. Each section has its own interactive explorer embedded right where the code builds that figure, so you can turn the knobs as you go.
From clean reads to a pileup
The trimmed, adapter-free FASTQ we finished Module 3 with. Each read is a short string with no idea where on the genome it came from — yet.
The index is built once per reference (FM-index / BWT). It turns 'where does this read occur?' from a whole-genome scan into a near-instant lookup.
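To make the "near-instant lookup" concrete, here is a minimal sketch of an FM-index with backward search. It is a toy under stated assumptions: the suffix array is built by naive sorting, the occurrence table is stored in full, and the names `build_fm_index` and `find_occurrences` are illustrative, not part of bwa-mem2. Real aligners use compressed, sampled versions of the same tables.

```python
def build_fm_index(reference):
    """Build a toy FM-index: suffix array, C table, and Occ table."""
    text = reference + "$"                       # '$' sorts before A, C, G, T
    # Naive suffix array: fine for a toy, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = the character that precedes each sorted suffix (wraps to '$').
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[c] = number of characters in text strictly smaller than c.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ


def find_occurrences(pattern, sa, c_table, occ):
    """Backward search: return sorted 0-based start positions of pattern."""
    lo, hi = 0, len(sa)                          # half-open range of SA rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:                             # range collapsed: no match
            return []
    return sorted(sa[lo:hi])


# Hypothetical toy reference, for illustration only.
fm = build_fm_index("GATTACAGATTACA")
hits = find_occurrences("ATTA", *fm)
```

The loop in `find_occurrences` runs once per pattern character regardless of reference length, which is why the lookup cost depends on the read, not on the genome.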
Find short exact matches (seeds) between read and reference using the index. Cheap, and they pin down roughly where each read belongs.
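The seeding idea can be sketched with a plain k-mer hash table. This is a deliberately swapped-in technique: BWA-MEM2 finds its seeds through the FM-index rather than a hash of fixed-length k-mers, but the voting logic below shows the same principle. The function names, the toy sequences, and `k=4` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def find_seeds(read, index, k):
    """Return (read_offset, ref_position) pairs for every exact k-mer hit."""
    seeds = []
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], []):
            seeds.append((j, i))
    return seeds


def best_candidate_start(seeds):
    """Vote on the diagonal (ref_position - read_offset) with the most seeds."""
    if not seeds:
        return None
    votes = Counter(i - j for j, i in seeds)
    diagonal, _ = votes.most_common(1)[0]
    return diagonal


# Hypothetical toy data: the read is reference[10:22] with one mismatch.
reference = "ACGTTGCAAGCTTAGGCATCGATTACGGATCCA"
read = "CTTAGGTATCGA"
seeds = find_seeds(read, build_kmer_index(reference, 4), 4)
start = best_candidate_start(seeds)
```

The k-mers that overlap the mismatch find no hit, but the remaining seeds all agree on the same diagonal, so the candidate location survives the error. Extension then only has to examine that neighbourhood.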
Around each seed, run Smith-Waterman to extend the alignment over mismatches and indels. This is the algorithm you implement by hand in this module.
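A minimal sketch of the extension step follows, assuming a linear gap penalty and illustrative scores (match +2, mismatch -1, gap -2). Production aligners use affine gap penalties and banded, vectorised variants, so treat this as the textbook form of Smith-Waterman rather than what bwa-mem2 runs internally. The returned reference start is 0-based; SAM positions are 1-based.

```python
from itertools import groupby


def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Local alignment. Returns (score, 0-based ref start, CIGAR string)."""
    rows, cols = len(read) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)

    def sub(i, j):
        return match if read[i - 1] == ref[j - 1] else mismatch

    # Fill: every cell holds the best local alignment score ending there.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,                                  # start a fresh alignment
                score[i - 1][j - 1] + sub(i, j),    # match or mismatch
                score[i - 1][j] + gap,              # read base vs gap (I)
                score[i][j - 1] + gap,              # ref base vs gap (D)
            )
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)

    # Traceback from the best cell until the score drops to zero.
    ops = []
    i, j = best_cell
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            ops.append("M")
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            ops.append("I")
            i -= 1
        else:
            ops.append("D")
            j -= 1
    ops.reverse()

    # Read bases outside the local alignment become soft clips (S).
    lead, trail = i, len(read) - best_cell[0]
    cigar = f"{lead}S" if lead else ""
    for op, run in groupby(ops):
        cigar += f"{len(list(run))}{op}"
    cigar += f"{trail}S" if trail else ""
    return best, j, cigar


# Hypothetical toy reference; the second read has an unalignable tail.
toy_ref = "TTGACCGTAAGCTAGG"
exact = smith_waterman("ACCGTAAG", toy_ref)
clipped = smith_waterman("ACCGTAAGCCCC", toy_ref)
```

Because the score floor is zero, the alignment is free to stop where extending would cost more than it gains. That is where the trailing soft clip in the second example comes from.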
Each placed read becomes a record: reference position, a FLAG, mapping quality, and a CIGAR string (e.g. 75M2I73M) describing the match. BAM is the compressed form.
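The record layout can be unpacked with a few lines of Python. The sketch below parses the first six mandatory SAM columns, decodes the FLAG bits defined in the SAM specification, and measures how much of the read and of the reference a CIGAR string covers. The example line is made up for illustration; only the CIGAR `75M2I73M` comes from the text above.

```python
import re

# FLAG bit meanings from the SAM specification.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped",
    0x8: "mate_unmapped", 0x10: "reverse", 0x20: "mate_reverse",
    0x40: "first_in_pair", 0x80: "second_in_pair", 0x100: "secondary",
    0x200: "qc_fail", 0x400: "duplicate", 0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def query_length(ops):
    """Read bases consumed: M, I, S, = and X."""
    return sum(n for n, op in ops if op in "MIS=X")


def reference_span(ops):
    """Reference bases consumed: M, D, N, = and X."""
    return sum(n for n, op in ops if op in "MDN=X")


def parse_sam_line(line):
    """Parse the first six mandatory SAM columns of one alignment line."""
    qname, flag, rname, pos, mapq, cigar = line.rstrip("\n").split("\t")[:6]
    return {
        "qname": qname, "flag": int(flag), "rname": rname,
        "pos": int(pos),                  # 1-based leftmost mapped position
        "mapq": int(mapq), "cigar": parse_cigar(cigar),
    }


# Hypothetical record; SEQ and QUAL are '*' (not stored) to keep it short.
example_line = "read1\t99\tchr1\t1001\t60\t75M2I73M\t=\t1201\t350\t*\t*"
record = parse_sam_line(example_line)
end_pos = record["pos"] + reference_span(record["cigar"]) - 1
```

The two inserted bases count toward the read length (150) but not toward the reference span (148), which is why the two lengths differ for any read that carries an indel.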
samtools sort orders reads by reference position; samtools index builds an index so that any region can be fetched without scanning the whole file. samtools flagstat then reports the mapping rate, duplicate count, and properly paired reads.
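As a rough illustration of what these steps compute, the sketch below coordinate-sorts a handful of records and derives flagstat-style headline numbers from FLAG values alone. It is not a reimplementation of samtools: the function names are hypothetical, the records are invented, and the counts here are restricted to primary alignments, which is a simplifying assumption.

```python
def coordinate_sort(records, contig_order):
    """Sort (contig, position) records by contig order, then by position."""
    rank = {name: i for i, name in enumerate(contig_order)}
    return sorted(records, key=lambda r: (rank[r[0]], r[1]))


def flag_summary(flags):
    """Headline counts over primary alignments, derived from FLAG bits."""
    # Drop secondary (0x100) and supplementary (0x800) records.
    primary = [f for f in flags if not f & 0x900]
    mapped = [f for f in primary if not f & 0x4]
    return {
        "primary": len(primary),
        "mapped": len(mapped),
        "mapping_rate": len(mapped) / len(primary) if primary else 0.0,
        "duplicates": sum(1 for f in primary if f & 0x400),
        "properly_paired": sum(1 for f in primary if f & 0x2),
    }


# Hypothetical data: two proper pairs (one marked duplicate), one unmapped
# pair, and one supplementary alignment.
ordered = coordinate_sort(
    [("chr2", 5), ("chr1", 100), ("chr1", 7)], ["chr1", "chr2"]
)
summary = flag_summary([99, 147, 77, 141, 1123, 1171, 2064])
```

Sorting by position is what makes indexing possible: once records for a region sit next to each other in the file, an index only needs to remember where each region starts.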
Stack the aligned reads at each position and count the bases. Coverage depth and per-position allele frequency fall out — the raw input for variant calling in Module 5.
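The stacking step can be sketched as a walk along each read's CIGAR string, counting the base observed at every reference position it covers. The reads below are invented, positions are 0-based for simplicity (SAM uses 1-based), and the sketch ignores base and mapping quality filters that a real pileup would apply.

```python
import re
from collections import Counter, defaultdict


def pileup(reads):
    """Count observed bases per reference position.

    reads: iterable of (0-based start, CIGAR string, read sequence).
    Returns {reference position: Counter of bases}.
    """
    columns = defaultdict(Counter)
    for pos, cigar, seq in reads:
        ref_i, read_i = pos, 0
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
            n = int(n)
            if op in "M=X":                  # aligned: consumes both
                for k in range(n):
                    columns[ref_i + k][seq[read_i + k]] += 1
                ref_i += n
                read_i += n
            elif op in "IS":                 # insertion / soft clip: read only
                read_i += n
            elif op in "DN":                 # deletion / skip: reference only
                ref_i += n
            # H and P consume neither sequence.
    return columns


def depth(columns, pos):
    """Number of reads with an aligned base at this position."""
    return sum(columns[pos].values()) if pos in columns else 0


def allele_frequencies(columns, pos):
    """Fraction of aligned bases carrying each allele at this position."""
    total = depth(columns, pos)
    return {b: c / total for b, c in columns[pos].items()} if total else {}


# Hypothetical reads; the third carries a one-base insertion.
cols = pileup([
    (0, "4M", "ACGT"),
    (2, "4M", "GTTA"),
    (1, "2M1I2M", "CGAAT"),
])
freqs = allele_frequencies(cols, 3)
```

At position 3, two reads show T and one shows A, so the column is the kind of mixed signal a variant caller has to decide on. The inserted base in the third read contributes to no column because it has no reference coordinate.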
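Steps 2–4 (index, seed, extend) are what bwa-mem2 does for you: the index is built once with bwa-mem2 index, and seeding and extension both happen inside a single bwa-mem2 mem call. The rest is samtools. This module opens up the Extend step so the aligner stops being a black box.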
The notebook — live & editable
Runs in your browser, no install.
Every section's code is already filled in below. Press the ▶ next to any cell (or Shift+Enter) to run it, edit it and run again, or hit Run all to execute the whole notebook top to bottom. No Python or Jupyter install is needed: the kernel boots right here in your browser.