# Instructor Notes

## Before the Workshop

### 1 week out
- Send participants the setup instructions (environment/setup.sh)
- Test environment on a fresh machine — installation takes ~20 min
- Check that all tool versions in conda_env.yml are still current
- Run download_reference.sh and confirm chr22 files are intact
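To make "confirm files are intact" repeatable, you can record checksums once on a known-good copy and verify every later copy (re-download, USB drive) against them. This is a minimal sketch that assumes GNU coreutils `sha256sum` (Linux/WSL2). macOS ships `shasum -a 256` instead. The function names and the `checksums.sha256` manifest name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: record checksums for a known-good set of files,
# then verify any later copy against that manifest.
# Assumes GNU coreutils sha256sum; on macOS substitute `shasum -a 256`.

make_manifest() {
  # Writes "<hash>  <filename>" lines for the given files (relative paths)
  sha256sum "$@" > checksums.sha256
}

verify_manifest() {
  # Prints "<file>: OK" or "<file>: FAILED"; non-zero exit if anything differs or is missing
  sha256sum -c checksums.sha256
}

# Usage, run inside the reference directory:
#   make_manifest chr22.fa chr22.fa.fai chr22.gtf
#   verify_manifest
```

Copy `checksums.sha256` onto the USB drive with the data so participants can run the same check on their copy.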

### Day before
- Run all pipeline scripts end-to-end with sample data
- Pre-download the chr22 reference so participants can copy it locally (saves ~5 min on Day 1)
- Set up shared folder / USB drive with:
  - Miniconda installer (all platforms)
  - Pre-built conda package cache
  - chr22.fa + chr22.fa.fai
  - chr22.gtf
  - BWA-MEM2 pre-built index (chr22.fa.bwt.2bit.64 etc.)

### Morning of Day 1
- Arrive 30 min early
- Test the projector and JupyterLab on your machine
- Make sure WiFi can handle N participants downloading simultaneously
  (or serve files locally with `python3 -m http.server`)

## Common Issues + Fixes

### "conda activate not working"
```bash
# Run this first (bash, including WSL2):
conda init bash && source ~/.bashrc
# macOS default shell is zsh: conda init zsh && source ~/.zshrc
conda activate bioworkshop
```

### "Java not found" (Picard / GATK)
```bash
# GATK >= 4.4 and Picard >= 3.0 need Java 17
conda install -n bioworkshop -c conda-forge openjdk=17
```

### "BWA-MEM2 index not found"
The index files must sit next to the .fa file and share its full name as a prefix, because `bwa-mem2 mem` looks them up from the reference path you pass.
Files: chr22.fa.0123, chr22.fa.amb, chr22.fa.ann, chr22.fa.bwt.2bit.64, chr22.fa.pac
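A quick way to diagnose this on a participant's machine is to check for every index file at once. This is a minimal sketch. It assumes the index was built with `bwa-mem2 index <ref.fa>`, so each file uses the FASTA path as its prefix. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which BWA-MEM2 index files are missing or empty
# next to a FASTA. Assumes the default prefix (the FASTA path itself).
check_bwamem2_index() {
  local ref="$1" missing=0 ext
  for ext in 0123 amb ann bwt.2bit.64 pac; do
    if [[ ! -s "${ref}.${ext}" ]]; then
      echo "missing or empty: ${ref}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 only if all five files are present and non-empty
}

# Usage:
#   check_bwamem2_index /path/to/chr22.fa && echo "index OK"
```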

### Low RAM machines (< 8GB)
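- Reduce thread counts to 2 everywhere (`-t` for bwa-mem2, `-@` for samtools, `-p` for HISAT2, `-T` for featureCounts)
- Use `samtools sort -m 1G` to cap sort memory; `-m` is per thread, so peak use is roughly threads × 1G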
- Module 5 GATK may be slow — demo on projector while participants follow along
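To avoid editing every command by hand, you can set the thread count once from the machine's RAM and reuse it. This is a minimal sketch. The function name, the fallback value of 4 threads, and the `-Xmx3g` GATK heap size are illustrative assumptions, not workshop settings. It reads `/proc/meminfo`, so it covers Linux and WSL2 only. Note that `MemTotal` reports slightly less than the nominal size, so a nominal 8 GB laptop may land in the low-RAM branch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose a thread count from total RAM.
# Reads MemTotal (kB) from /proc/meminfo (Linux/WSL2); on macOS,
# total RAM in bytes is available from `sysctl -n hw.memsize`.
pick_threads() {
  # Optional argument: total RAM in kB (defaults to MemTotal)
  local mem_kb="${1:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)}"
  local limit_kb=$((8 * 1024 * 1024))   # 8 GiB expressed in kB
  if (( mem_kb < limit_kb )); then
    echo 2   # low-RAM setting from these notes
  else
    echo 4   # illustrative default for larger machines
  fi
}

# Example usage (shown, not run here):
#   THREADS=$(pick_threads)
#   bwa-mem2 mem -t "$THREADS" chr22.fa R1.fq.gz R2.fq.gz \
#     | samtools sort -@ "$THREADS" -m 1G -o sample.sorted.bam
#   # Cap the JVM heap for GATK on small machines:
#   gatk --java-options "-Xmx3g" HaplotypeCaller -R chr22.fa -I sample.sorted.bam -O sample.vcf.gz
```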

### Windows participants
- Must use WSL2 (not Git Bash / Cygwin)
- Docker fallback: `docker run -p 8888:8888 bioworkshop-image`
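When a Windows participant is stuck, it helps to confirm which shell they are actually in. This is a heuristic sketch based on the usual `uname -s` and `/proc/version` strings. Git Bash reports `MINGW64_NT-...`, MSYS2 `MSYS_NT-...`, and Cygwin `CYGWIN_NT-...`. WSL2 kernels contain `microsoft-standard`, while WSL1 reports `Microsoft`. The function name and the messages are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify the participant's shell environment.
# Heuristic: relies on typical uname/kernel strings, which could change.
classify_shell_env() {
  local uname_s="${1:-$(uname -s)}"
  local proc_version="${2:-$(cat /proc/version 2>/dev/null)}"
  case "$uname_s" in
    MINGW*|MSYS*) echo "git-bash/msys: unsupported, use WSL2" ;;
    CYGWIN*)      echo "cygwin: unsupported, use WSL2" ;;
    Linux)
      if [[ "$proc_version" == *microsoft-standard* ]]; then
        echo "wsl2"
      elif [[ "$proc_version" == *Microsoft* ]]; then
        echo "wsl1: upgrade with 'wsl --set-version <distro> 2' in PowerShell"
      else
        echo "linux"
      fi
      ;;
    *) echo "$uname_s" ;;   # e.g. Darwin on macOS
  esac
}

# Usage: classify_shell_env
```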

## Timing Tips

- Module 1 timing varies a lot: people tend to find the CLI either easy or hard, in roughly equal numbers
  - If fast: introduce `tmux` for running multiple pipelines at once
  - If slow: skip exercise 8, assign as homework
- Module 4 alignment is the longest — start it running, then explain while it runs
- Day 2 Module 5 (GATK) is compute-heavy — run on instructor machine + project output
- Module 8 capstone: keep Tier 1 mandatory, Tier 2/3 optional for advanced learners
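For "start it running, then explain while it runs", one option is to launch the long job in the background with its output in a log. This is a minimal sketch using `nohup`. The function name and the `<name>.log` / `<name>.pid` files are hypothetical. The tmux lines in the comments show the interactive alternative with tmux's default `Ctrl-b` prefix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a long command (e.g. the Module 4 alignment) in the
# background so the terminal stays free. Output goes to <name>.log and the job
# survives the terminal closing.
start_bg() {
  local name="$1"; shift
  nohup bash -c "$*" > "${name}.log" 2>&1 &
  echo "$!" > "${name}.pid"   # PID of the background job, for kill/ps
  echo "started ${name} (PID $!), follow with: tail -f ${name}.log"
}

# Usage:
#   start_bg align "bwa-mem2 mem -t 2 chr22.fa R1.fq.gz R2.fq.gz > sample.sam"
#
# Interactive alternative with tmux (not run here):
#   tmux new -s align        # start a named session, run the job inside it
#   Ctrl-b d                 # detach; the job keeps running
#   tmux attach -t align     # reattach later
```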

## Energy Management
- Day 1 break at 10:30 is crucial — people are absorbing a lot
- Day 2 break at 11:20 — post-DESeq2 processing gives natural pause
- Keep energy up: quick "what did we just learn?" check-ins after each module

## Q&A to anticipate

Q: "What's the difference between BWA-MEM and BWA-MEM2?"
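A: BWA-MEM2 produces the same alignments as BWA-MEM but runs faster (roughly 1.3-3.1x per the BWA-MEM2 README, depending on data and CPU) thanks to SIMD (SSE/AVX) vectorization. Default to MEM2; the trade-off is a larger index and higher memory use.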

Q: "Why not just use STAR instead of HISAT2?"
A: Both are good. HISAT2 uses much less RAM (about 8 GB vs about 30 GB for a human whole-genome index),
   which matters in this setting. STAR is faster on large genomes when enough RAM is available.

Q: "Should I use featureCounts or Salmon?"
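A: Salmon (lightweight mapping rather than full alignment) is faster and works well for well-annotated
   transcriptomes. featureCounts counts reads from a BAM produced by an aligner (here HISAT2), so the
   workflow leaves you with alignments you can inspect visually (e.g. in IGV), which is better for teaching.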

Q: "When would I use hard filtering vs VQSR?"
A: VQSR needs many samples (30+ for SNPs, 50+ for indels). Single-sample or small
   cohorts: use hard filtering or DeepVariant.
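To show learners what hard filtering actually does, the sketch below applies GATK's commonly cited starting thresholds for SNPs (QD < 2.0, QUAL < 30, SOR > 3.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0) to a VCF's INFO column with awk. It is a teaching illustration, not a substitute for `gatk VariantFiltration`. It assumes SNP-only, uncompressed VCF input (indels use different thresholds). The function name and the filter labels are hypothetical.

```bash
#!/usr/bin/env bash
# Teaching sketch: mark each SNP record PASS or list the thresholds it fails.
# Assumes uncompressed, SNP-only VCF; not a replacement for gatk VariantFiltration.
hard_filter_snps() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                 # header lines pass through unchanged
    {
      delete info                        # parse INFO key=value pairs
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) { split(kv[i], p, "="); info[p[1]] = p[2] }
      fails = ""
      if (("QD" in info) && info["QD"] + 0 < 2.0)    fails = fails ";QD2"
      if ($6 != "." && $6 + 0 < 30.0)                fails = fails ";QUAL30"
      if (("SOR" in info) && info["SOR"] + 0 > 3.0)  fails = fails ";SOR3"
      if (("FS" in info) && info["FS"] + 0 > 60.0)   fails = fails ";FS60"
      if (("MQ" in info) && info["MQ"] + 0 < 40.0)   fails = fails ";MQ40"
      if (("MQRankSum" in info) && info["MQRankSum"] + 0 < -12.5) fails = fails ";MQRankSum-12.5"
      if (("ReadPosRankSum" in info) && info["ReadPosRankSum"] + 0 < -8.0) fails = fails ";ReadPosRankSum-8"
      $7 = (fails == "") ? "PASS" : substr(fails, 2)   # write the FILTER column
      print
    }' "$@"
}

# Usage: hard_filter_snps sample.snps.vcf > sample.snps.filtered.vcf
```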