Day 1 · Module 1 · 09:00 – 09:45 · 45 min

Linux CLI for Bioinformatics

Pipes, grep, awk, and the shell scripting that powers every genomics pipeline.

What this module covers

  • File system navigation, pipes, grep, awk, cut, sort, uniq
  • Working with compressed files (gzip, bgzip, tabix)
  • Driving shell from Python via subprocess and pathlib
  • Writing reproducible shell scripts (set -euo pipefail)
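As a small taste of the compressed-files topic, here is a minimal Python sketch that writes and then streams a gzip-compressed TSV using only the standard library. The file name `genes.tsv.gz`, the helper names, and the three rows are illustrative assumptions (the rows mirror the preview further down this page). bgzip and tabix are separate command-line tools and are not shown here.

```python
import gzip
import tempfile
from pathlib import Path

# Illustrative rows only; the real workshop file is not reproduced here.
ROWS = [
    "#chrom\tstart\tend\tname",
    "chr22\t101\t900\tGENEA",
    "chr1\t50\t420\tGENEB",
]


def write_gzipped(path, lines):
    """Write text lines to a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")


def read_gzipped(path):
    """Stream a gzip file line by line, roughly what `zcat` does in the shell."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "genes.tsv.gz"  # hypothetical file name
    write_gzipped(gz_path, ROWS)
    round_trip = read_gzipped(gz_path)
    # Every gzip file starts with the two magic bytes 0x1f 0x8b.
    magic = gz_path.read_bytes()[:2]

print(round_trip[1])
```

Opening the file in text mode (`"rt"`) lets the rest of the code treat it like any other TSV, without decompressing to disk first.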

Workshop recording

The recorded walkthrough for this module will appear here.

Coming after the live workshop.

Try it live — sandbox shell

runs in your browser · no install

New to the command line? You don't need to install anything. This is a simulated bash shell with the workshop's sample data already loaded. Work through the tasks on the right, or just type help to explore.

Start here — the data journey

Watch the data move through the pipeline below, then read on — each section has its own interactive explorer embedded right where the code builds that figure, so you can turn the knobs as you go.

The data journey — one pipeline, six small commands

1. genes.tsv: a tab-separated file with one gene per row, columns for chromosome, start, end and name, and a header line on top.

output preview

#chrom  start  end   name
chr22   101    900   GENEA
chr1    50     420   GENEB

Each command does one job and pipes its output into the next, so 240 rows narrow to a ranked count of 24 chromosomes. That's the whole philosophy of the shell:

grep -E '^chr' | cut -f1 | sort | uniq -c | sort -rn
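To see what each stage contributes, the same pipeline can be mirrored step by step in plain Python. This is a sketch under stated assumptions: the `SAMPLE` text is a made-up five-gene stand-in for genes.tsv (not the workshop's 240 rows), and `count_per_chromosome` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical sample in the layout of genes.tsv (header line plus five genes).
SAMPLE = (
    "#chrom\tstart\tend\tname\n"
    "chr22\t101\t900\tGENEA\n"
    "chr1\t50\t420\tGENEB\n"
    "chr1\t600\t980\tGENEC\n"
    "chr22\t1200\t1800\tGENED\n"
    "chr1\t2000\t2500\tGENEE\n"
)


def count_per_chromosome(text):
    """Return (count, chromosome) pairs, largest count first."""
    # grep -E '^chr': keep lines that start with "chr" (drops the "#chrom" header)
    rows = [line for line in text.splitlines() if line.startswith("chr")]
    # cut -f1: keep only the first tab-separated field
    chroms = [row.split("\t")[0] for row in rows]
    # sort | uniq -c: count how often each distinct value occurs
    counts = Counter(chroms)
    # sort -rn: order by count, descending (tie order may differ from the shell)
    return sorted(((n, chrom) for chrom, n in counts.items()), reverse=True)


ranked = count_per_chromosome(SAMPLE)
for n, chrom in ranked:
    print(n, chrom)
```

In the shell, `sort` has to run before `uniq -c` because `uniq` only collapses adjacent duplicates; `Counter` has no such requirement, which is why those two stages fold into one line here.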

The notebook — live & editable

Every section's code is already filled in below. Press the ▶ next to any cell (or Shift+Enter) to run it, edit it and run again, or hit Run all to execute the whole notebook top to bottom. No Python or Jupyter install needed — the kernel boots right here in your browser.

Heads up: this module's pipeline uses command-line tools (e.g. bwa, samtools) that aren't available in the browser kernel. The Python cells run here; tool/shell lines print a note instead.
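One way such a "print a note instead" fallback could look in a regular Python interpreter is sketched below. The helper name `run_tool` and the placeholder tool name are hypothetical, and the notebook's own mechanism may differ; the sketch only shows the general pattern of checking for a tool with `shutil.which` before calling it through `subprocess`.

```python
import shutil
import subprocess
import sys


def run_tool(cmd):
    """Run cmd and return its stdout, or print a note and return None if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        print(f"[note] '{cmd[0]}' is not available here; skipping: {' '.join(cmd)}")
        return None
    # check=True raises CalledProcessError on a non-zero exit status,
    # similar in spirit to `set -e` in a shell script.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# A tool name that is assumed not to exist, to exercise the fallback path.
missing = run_tool(["no-such-tool-xyz", "--version"])
# The running Python interpreter stands in for a tool that is installed.
present = run_tool([sys.executable, "-c", "print('ok')"])
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.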