Linux CLI for Bioinformatics
Pipes, grep, awk, and the shell scripting that powers every genomics pipeline.
What this module covers
- File system navigation, pipes, grep, awk, cut, sort, uniq
- Working with compressed files (gzip, bgzip, tabix)
- Driving the shell from Python via subprocess and pathlib
- Writing reproducible shell scripts (set -euo pipefail)
Try it live — sandbox shell
New to the command line? You don't need to install anything: this is a simulated bash shell that runs in your browser, with the workshop's sample data already loaded. Work through the tasks on the right, or just type help to explore.
Start here — the data journey
Watch the data move through the pipeline below, then read on. Each section has its own interactive explorer, embedded right where the code builds the corresponding figure, so you can turn the knobs as you go.
The data journey: one input file, five small commands
1. genes.tsv: a tab-separated file with one gene per row and columns for chromosome, start, end and name, plus a header line starting with '#' on top.
2. grep -E '^chr': keep only lines that start with 'chr', i.e. the real data rows. The '#' header line is dropped.
3. cut -f1: slice out just the first field of every row (the chromosome). Everything else falls away.
4. sort: order the chromosomes lexically (so chr10 comes before chr2) so that identical values sit next to each other. This is required because uniq only collapses duplicate lines that are adjacent.
5. uniq -c: collapse each run of identical lines into a single line, prefixed by how many times it occurred.
6. sort -rn: sort by that leading count, numerically (-n) and in reverse (-r), so the busiest chromosome lands on top. That's your answer.
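Each of the steps above maps onto a line or two of plain Python, which can help when a stage's behavior is unclear. This is a minimal sketch, not the workshop notebook's own code: it assumes tab-separated lines with a '#' header as in genes.tsv, and the function name `chrom_counts` and the three sample rows are illustrative.

```python
from itertools import groupby


def chrom_counts(lines):
    """Python mirror of: grep -E '^chr' | cut -f1 | sort | uniq -c | sort -rn"""
    # grep -E '^chr': keep only data rows; the '#' header line is dropped
    rows = [line for line in lines if line.startswith("chr")]
    # cut -f1: keep the first tab-separated field (the chromosome)
    chroms = [row.rstrip("\n").split("\t")[0] for row in rows]
    # sort: place identical chromosome names next to each other
    chroms.sort()
    # uniq -c: collapse each run of identical names into (count, name)
    counted = [(len(list(run)), chrom) for chrom, run in groupby(chroms)]
    # sort -rn: highest count first (Python's stable sort keeps ties in
    # alphabetical order; GNU sort may order ties differently)
    counted.sort(key=lambda pair: pair[0], reverse=True)
    return counted


sample = [
    "#chrom\tstart\tend\tname",
    "chr22\t101\t900\tGENEA",
    "chr1\t50\t420\tGENEB",
    "chr22\t980\t1500\tGENEC",
]
for count, chrom in chrom_counts(sample):
    print(count, chrom)  # prints "2 chr22", then "1 chr1"
```

Because the function accepts any iterable of lines, it also works directly on an open file, e.g. `with open("genes.tsv") as fh: chrom_counts(fh)`; the `rstrip("\n")` handles the trailing newline on the last field.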
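Output preview (sample lines after each step; rows are separated by semicolons here, fields are tab-separated in the real file)
1. genes.tsv: #chrom start end name; chr22 101 900 GENEA; chr1 50 420 GENEB
2. grep -E '^chr': chr22 101 900 GENEA; chr1 50 420 GENEB; chr22 980 1500 GENEC
3. cut -f1: chr22; chr1; chr22; chrX; chr1
4. sort: chr1; chr1; chr22; chr22; chrX
5. uniq -c: 61 chr1; 58 chr2; 44 chr22; 12 chrX
6. sort -rn: 61 chr1; 58 chr2; 44 chr22; 12 chrX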
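The sort in step 4 matters because uniq -c only merges identical lines that are adjacent. A small Python sketch makes this concrete using the step-3 values from the preview above. `uniq_c` is a hypothetical helper that mimics uniq -c with itertools.groupby; for plain ASCII names like these, Python's default string order matches byte-order (LC_ALL=C) sorting, which a locale-aware sort may not.

```python
from itertools import groupby


def uniq_c(values):
    """Mimic `uniq -c`: count runs of *adjacent* identical values only."""
    return [(len(list(run)), value) for value, run in groupby(values)]


# Step 3 output from the preview (cut -f1), before sorting
unsorted = ["chr22", "chr1", "chr22", "chrX", "chr1"]

# Without sorting, no two equal names are adjacent, so every count is 1
print(uniq_c(unsorted))
# [(1, 'chr22'), (1, 'chr1'), (1, 'chr22'), (1, 'chrX'), (1, 'chr1')]

# After sorting, duplicates form runs and are counted correctly
print(uniq_c(sorted(unsorted)))
# [(2, 'chr1'), (2, 'chr22'), (1, 'chrX')]
```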
Each command does one job and passes its output through a pipe (|) to the next, so 240 rows narrow to a ranked count of 24 chromosomes. That is the shell philosophy in a single line:
grep -E '^chr' genes.tsv | cut -f1 | sort | uniq -c | sort -rn
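Since this module also covers driving the shell from Python and writing scripts with set -euo pipefail, here is a minimal sketch that runs the pipeline above against genes.tsv from Python and parses the ranked counts. It assumes bash and the standard grep/cut/sort/uniq tools are on PATH; the helper name `run_pipeline` and the temporary directory setup are illustrative, and the three sample rows come from the preview.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Strict-mode header from reproducible scripts, then the one-liner itself
PIPELINE = (
    "set -euo pipefail; "
    "grep -E '^chr' genes.tsv | cut -f1 | sort | uniq -c | sort -rn"
)


def run_pipeline(workdir: Path) -> list[tuple[int, str]]:
    """Run PIPELINE inside workdir and parse its `uniq -c` style output."""
    result = subprocess.run(
        ["bash", "-c", PIPELINE],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the pipeline fails
        env={**os.environ, "LC_ALL": "C"},  # byte-order sort on every machine
    )
    counts = []
    for line in result.stdout.splitlines():
        count, chrom = line.split()  # uniq -c left-pads the count with spaces
        counts.append((int(count), chrom))
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "genes.tsv").write_text(
            "#chrom\tstart\tend\tname\n"
            "chr22\t101\t900\tGENEA\n"
            "chr1\t50\t420\tGENEB\n"
            "chr22\t980\t1500\tGENEC\n"
        )
        print(run_pipeline(workdir))  # [(2, 'chr22'), (1, 'chr1')]
```

One consequence of pipefail worth knowing: grep exits with status 1 when no line matches, so a file with no 'chr' rows makes the whole pipeline fail and check=True raises CalledProcessError instead of silently returning an empty list.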
The notebook — live & editable
Every section's code is already filled in below. Press the ▶ next to any cell (or Shift+Enter) to run it, edit it and run it again, or hit Run all to execute the whole notebook from top to bottom. No Python or Jupyter install is needed: the kernel boots right here in your browser.