# Module 1: Exercises

Work through these at your own pace. Solutions are in `solutions.sh`; try each exercise
before looking.

## Exercise 1 — File exploration
Navigate to `data/example/` and list all files showing their sizes in human-readable
format. How many lines are in `example.fastq`?

```bash
# your command here
```

## Exercise 2 — Counting reads
A FASTQ file has 4 lines per read. How many reads are in `data/example/example.fastq`?
Write a one-liner using `wc -l` and arithmetic.

```bash
# hint: total_lines / 4 = read count
```
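As a warm-up, the arithmetic can be sketched on a throwaway file. The two-read FASTQ below is invented toy data written to a temporary file, not the course data, and the variable names are only illustrative.

```bash
# Toy data: a hypothetical two-read FASTQ in a temp file (not the course data).
tmp=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$tmp"

# Reading via < keeps the filename out of wc's output, so only the number is captured.
lines=$(wc -l < "$tmp")

# Bash integer arithmetic: 8 lines divided by 4 lines per read.
reads=$(( lines / 4 ))
echo "$reads"   # prints 2

rm -f "$tmp"
```

The same steps collapse into a one-liner by nesting the `wc -l` command substitution inside `$(( ... ))`.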

## Exercise 3 — Extracting headers
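Print only the header lines from the FASTQ file. A header is the first line of each
4-line record and starts with `@`; quality lines can also start with `@`, so matching
on `@` alone may over-count.
How many unique instrument names appear?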

```bash
# hint: grep + cut + sort + uniq
```
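The sketch below shows why position can be a safer way to pick headers than pattern matching. It uses invented toy data and assumes Illumina-style headers, where the instrument name is the first colon-separated field; the course file may use a different header layout. It also swaps `awk` in for the `grep` named in the hint, purely to select every fourth line.

```bash
# Toy data: two hypothetical reads; the second quality line deliberately starts with '@'.
tmp=$(mktemp)
printf '@M01:7:FC1:1:1101:10:20 1:N:0:1\nACGT\n+\nIIII\n@M02:7:FC1:1:1101:11:21 1:N:0:1\nTTGA\n+\n@III\n' > "$tmp"

# Pattern-based selection also counts the quality line that starts with '@'.
naive=$(grep -c '^@' "$tmp")

# Position-based selection: the header is line 1 of every 4-line record.
headers=$(awk 'NR % 4 == 1' "$tmp")

# Instrument name = first colon-delimited field, with the leading '@' removed.
instruments=$(printf '%s\n' "$headers" | cut -d: -f1 | sed 's/^@//' | sort -u)
n_instruments=$(printf '%s\n' "$instruments" | wc -l)

echo "lines matching ^@: $naive"
echo "$instruments"

rm -f "$tmp"
```

On this toy file the naive count is 3 even though there are only 2 reads, which is the over-match the position-based version avoids.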

## Exercise 4 — Working with compressed files
`data/raw/sample_R1.fastq.gz` is gzip-compressed. Count its reads WITHOUT
decompressing it to disk.

```bash
# hint: zcat (gzip -dc is the portable equivalent)
```
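A minimal sketch of streaming a compressed file through a pipe, using a small invented `.fastq.gz` in a scratch directory rather than the course data. It uses `gzip -dc`, which writes the decompressed bytes to standard output and leaves the file on disk untouched.

```bash
# Toy data: three hypothetical reads, compressed into a scratch directory.
scratch=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nGGCA\n+\nIIII\n' | gzip -c > "$scratch/toy.fastq.gz"

# Decompress to stdout only; nothing uncompressed is written to disk.
lines=$(gzip -dc "$scratch/toy.fastq.gz" | wc -l)
reads=$(( lines / 4 ))
echo "$reads"   # prints 3

rm -rf "$scratch"
```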

## Exercise 5 — Extracting columns
The file `data/example/genes.tsv` is tab-separated with columns
`gene_id`, `gene_name`, `chromosome`, `start`, `end`, `strand`.
Extract only `gene_name` and `chromosome`, sorted alphabetically by chromosome.

```bash
# hint: cut -f | sort
```

## Exercise 6 — Grep + regex
Find all genes on chromosome X or Y in `data/example/genes.tsv`.
Then find any genes whose names start with "BRCA".

```bash
# hint: grep -E
```
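One thing to watch with `grep` on tabular data is that a bare pattern matches anywhere on the line, so `X` would also hit a gene name such as XIST. The sketch below anchors the pattern to a column using tabs. The five rows are invented toy data, and it assumes no header row and chromosome values written like `chrX`; check how the real file names its chromosomes before reusing the pattern.

```bash
# Toy data: hypothetical rows in the gene_id, gene_name, chromosome, start, end, strand layout.
tmp=$(mktemp)
printf 'G1\tBRCA1\tchr17\t100\t200\t+\nG2\tBRCA2\tchr13\t100\t200\t+\nG3\tXIST\tchrX\t100\t200\t-\nG4\tSRY\tchrY\t100\t200\t-\nG5\tTP53\tchr17\t100\t200\t-\n' > "$tmp"

# $'...' lets bash turn \t into a literal tab, so the pattern only matches a whole field.
sex_chr=$(grep -E $'\tchr(X|Y)\t' "$tmp" | cut -f2)

# Skip the first field (anything but tabs), then require BRCA at the start of field 2.
brca=$(grep -E $'^[^\t]+\tBRCA' "$tmp" | cut -f2)

echo "$sex_chr"
echo "$brca"

rm -f "$tmp"
```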

## Exercise 7 — Counting occurrences
How many genes are on each chromosome? Output sorted by count (most to least).

```bash
# hint: cut | sort | uniq -c | sort -rn
```
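The pipeline in the hint depends on one detail: `uniq` only collapses adjacent duplicate lines, so the input has to be sorted first. A small sketch on invented three-column toy data (not the course file):

```bash
# Toy data: hypothetical tab-separated rows with the chromosome in column 3.
counts=$(printf 'G1\tA\tchr1\nG2\tB\tchr2\nG3\tC\tchr1\nG4\tD\tchrX\nG5\tE\tchr1\nG6\tF\tchr2\n' \
    | cut -f3 \
    | sort \
    | uniq -c \
    | sort -rn)

echo "$counts"

# uniq -c pads the count with spaces; awk normalizes the first row for comparison.
top=$(printf '%s\n' "$counts" | head -n 1 | awk '{print $1, $2}')
echo "$top"   # prints: 3 chr1
```

The first `sort` groups identical values together for `uniq -c`; the final `sort -rn` orders the rows numerically by count, largest first.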

## Exercise 8 — Write a loop
Write a bash for-loop that prints `Processing sample: <name>` for the samples
SRR7890001, SRR7890002, SRR7890003. Then modify it to create a directory for
each sample under `results/`.

```bash
#!/usr/bin/env bash
# your loop here
```
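One possible shape for the loop, sketched under the assumption that the output goes into a scratch directory instead of the real `results/`; `outdir` is an illustrative name, not something the course defines.

```bash
# Hypothetical scratch location standing in for the project root.
outdir=$(mktemp -d)

# Iterate over a literal, space-separated list of sample names.
for sample in SRR7890001 SRR7890002 SRR7890003; do
    echo "Processing sample: ${sample}"
    # -p creates missing parent directories and does not fail if the directory exists.
    mkdir -p "${outdir}/results/${sample}"
done

ls "${outdir}/results"
# When finished, the scratch directory can be removed with: rm -rf "$outdir"
```

Quoting the path and using `mkdir -p` makes the loop safe to rerun, since existing directories are left as they are.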