sort

Category: ALIGNMENT

Sort BAM file by coordinate, queryname, or template-coordinate

Description

Sort a BAM file using high-performance external merge-sort.

This tool provides efficient BAM sorting with support for multiple sort orders:

SORT ORDERS:

coordinate: Standard genomic coordinate sort (tid → pos → strand). Use for IGV visualization, variant calling, and fgumi review.
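As a rough illustration of the coordinate order, the sketch below sorts simplified records by (tid, pos, strand). The Read tuple is a hypothetical stand-in for a BAM record, not fgumi's data structure. Placing unmapped reads (tid = -1) last follows the SAM convention, and putting forward-strand reads before reverse-strand reads at the same position is an assumption of the sketch.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical, minimal stand-in for a BAM record."""
    name: str
    tid: int          # reference index; -1 means unmapped (no coordinate)
    pos: int          # 0-based leftmost position
    is_reverse: bool  # True if the read maps to the reverse strand


def coordinate_key(read: Read) -> tuple:
    # Unmapped reads have no coordinate, so push them to the end (False < True).
    unmapped = read.tid < 0
    # At equal tid/pos, forward-strand reads come first (assumed tie-break).
    return (unmapped, read.tid, read.pos, read.is_reverse)


reads = [
    Read("r3", 1, 100, False),
    Read("r1", 0, 500, True),
    Read("r4", -1, -1, False),
    Read("r2", 0, 500, False),
]
print([r.name for r in sorted(reads, key=coordinate_key)])  # ['r2', 'r1', 'r3', 'r4']
```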

queryname: Lexicographic read-name sort (fast; the default sub-sort). Use for fgumi zipper and other template-level operations. The sub-sort can also be named explicitly:

  • queryname::lex (short alias for lexicographic ordering; same as queryname)
  • queryname::lexicographic (explicit lexicographic ordering; same as queryname)
  • queryname::natural (natural numeric ordering, samtools-compatible)
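The difference between the two queryname sub-sorts shows up as soon as read names contain numbers of different widths. This minimal sketch compares a plain byte-wise key with a natural key that compares digit runs as integers; it is a simplified approximation (for example, it ignores samtools' leading-zero tie-breaking and orders digit runs before text at the same position), and the function names are hypothetical.

```python
import re


def lexicographic_key(name: str) -> bytes:
    # Plain byte-wise comparison of the read name.
    return name.encode()


def natural_key(name: str) -> list:
    # re.split with a capturing group alternates: text, digits, text, digits, ...
    parts = re.split(r"(\d+)", name)
    key = []
    for i, part in enumerate(parts):
        if i % 2:  # odd indices are the captured digit runs: compare numerically
            key.append((0, int(part), ""))
        elif part:  # skip the empty text runs re.split leaves at the boundaries
            key.append((1, 0, part))
    return key


names = ["read10", "read2", "read1"]
print(sorted(names, key=lexicographic_key))  # ['read1', 'read10', 'read2']
print(sorted(names, key=natural_key))        # ['read1', 'read2', 'read10']
```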

template-coordinate: Template-level position sort for UMI grouping. Use for fgumi group, fgumi dedup, and fgumi downsample input.
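Template-coordinate ordering keys both reads of a pair by the template's position rather than by each read's own position, so reads from the same molecule land next to each other. The sketch below is a heavily simplified, hypothetical key loosely modelled on the samtools-style template-coordinate order: it sorts by the lower and upper ends of the template (reference, unclipped 5' position, strand), then by read name. The real order has further tie-break fields that are omitted here, and End and template_key are illustrative names, not fgumi's API.

```python
from typing import NamedTuple, Tuple


class End(NamedTuple):
    """Hypothetical summary of one read in a read pair."""
    tid: int          # reference index
    five_prime: int   # unclipped 5' position (the end coordinate for reverse-strand reads)
    is_reverse: bool  # True if the read maps to the reverse strand


def template_key(name: str, r1: End, r2: End) -> Tuple:
    # NamedTuples compare field by field (tid, five_prime, is_reverse), so this
    # picks the "lower" end regardless of which read is R1 and which is R2.
    lower, upper = sorted([r1, r2])
    return (lower.tid, upper.tid, lower.five_prime, upper.five_prime,
            lower.is_reverse, upper.is_reverse, name)


templates = [
    ("t1", End(0, 300, True), End(0, 100, False)),   # R1 is the upper end here
    ("t2", End(0, 100, False), End(0, 250, True)),
    ("t3", End(0, 150, False), End(0, 400, True)),
]
order = sorted(templates, key=lambda t: template_key(*t))
print([t[0] for t in order])  # ['t2', 't1', 't3']
```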

PERFORMANCE:

  • 1.9x faster than samtools on template-coordinate sort
  • Handles BAM files larger than available RAM via spill-to-disk
  • Uses parallel sorting (--threads) for in-memory chunks
  • Configurable temp file compression (--temp-compression)
  • Default 768M per-thread memory limit (samtools-compatible); pass --max-memory auto to detect system memory (opt-in)
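Spill-to-disk sorting of inputs larger than RAM is the classic external merge sort: sort bounded chunks in memory, write each sorted chunk to a temp file, then k-way merge the sorted runs. The following is a minimal sketch, assuming records are picklable Python objects and that the limit is a record count rather than a byte budget. fgumi's own chunks are BAM records compressed with the --temp-codec setting, and neither that nor parallel chunk sorting is modelled here.

```python
import heapq
import os
import pickle
import tempfile
from typing import Any, Callable, Iterable, Iterator, List


def _write_chunk(chunk: List[Any], tmp_dir: str) -> str:
    """Spill one sorted chunk to a temp file and return its path."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".chunk")
    with os.fdopen(fd, "wb") as fh:
        for rec in chunk:
            pickle.dump(rec, fh)
    return path


def _read_chunk(path: str) -> Iterator[Any]:
    """Stream records back from a spilled chunk, one at a time."""
    with open(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:
                return


def external_sort(records: Iterable[Any], key: Callable[[Any], Any],
                  max_in_memory: int, tmp_dir: str) -> Iterator[Any]:
    """Sort an arbitrarily large iterable, holding at most max_in_memory records per chunk."""
    chunk_paths: List[str] = []
    chunk: List[Any] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) >= max_in_memory:
            chunk.sort(key=key)                               # sort the in-memory chunk
            chunk_paths.append(_write_chunk(chunk, tmp_dir))  # spill it to disk
            chunk = []
    chunk.sort(key=key)  # the final partial chunk stays in memory
    streams = [_read_chunk(p) for p in chunk_paths]
    try:
        # k-way merge: only the head record of each spilled run is held in memory.
        yield from heapq.merge(*streams, iter(chunk), key=key)
    finally:
        for s in streams:
            s.close()          # release any open chunk files
        for p in chunk_paths:
            os.remove(p)       # clean up spill files
```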

EXAMPLES:

Sort for fgumi group input

fgumi sort -i aligned.bam -o sorted.bam --order template-coordinate

Sort by coordinate for IGV

fgumi sort -i input.bam -o sorted.bam --order coordinate

Sort by queryname for zipper

fgumi sort -i input.bam -o sorted.bam --order queryname

Multi-threaded sort (default 768M per thread)

fgumi sort -i input.bam -o sorted.bam --order template-coordinate --threads 8

Override the per-thread memory limit

fgumi sort -i input.bam -o sorted.bam -m 2GiB --threads 8
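Assuming that both M and GiB are 1024-based units and that --memory-per-thread multiplies the -m value by --threads (the "samtools behavior" noted in the arguments table), the total sort budget for the two multi-threaded examples above can be worked out as follows. parse_memory and total_sort_memory are hypothetical helpers, not part of fgumi.

```python
import re

# 1024-based multipliers (assumed to match how the -m suffixes are interpreted).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_memory(text: str) -> int:
    """Parse sizes like '768M', '2G' or '2GiB' into bytes."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)(?:i?B)?\s*", text, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognised memory size: {text!r}")
    value, unit = m.groups()
    return int(float(value) * _UNITS[unit.upper()])


def total_sort_memory(max_memory: str, threads: int, per_thread: bool = True) -> int:
    """Total in-memory budget: the per-thread limit times the thread count, if scaling is on."""
    limit = parse_memory(max_memory)
    return limit * threads if per_thread else limit


GiB = 1024**3
print(total_sort_memory("768M", 8) / GiB)  # 6.0
print(total_sort_memory("2GiB", 8) / GiB)  # 16.0
```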

Opt in to auto-detected system memory (subtracts --memory-reserve)

fgumi sort -i input.bam -o sorted.bam -m auto --threads 8
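With -m auto, the budget is the detected system memory minus the reserve. A minimal sketch of that subtraction, assuming physical memory is read through POSIX sysconf (available on Linux) and that the result is clamped at zero; the helper names are hypothetical, and the sketch does not model how the default reserve ("auto") is chosen or how the budget is then split across threads.

```python
import os


def system_memory_bytes() -> int:
    # Physical memory = page size * number of physical pages (POSIX sysconf).
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def auto_memory_budget(total: int, reserve: int) -> int:
    # Memory available for sorting after setting aside the reserve, never negative.
    return max(0, total - reserve)


GiB = 1024**3
print(auto_memory_budget(64 * GiB, 12 * GiB) // GiB)  # 52
print(auto_memory_budget(system_memory_bytes(), 4 * GiB) // GiB)
```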

Reserve extra memory for bwa mem running in a pipeline

fgumi sort -i input.bam -o sorted.bam --memory-reserve 12GiB --threads 4

Verify a BAM file is correctly sorted

fgumi sort -i sorted.bam --verify --order template-coordinate
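Verifying an order needs only one streaming pass with constant memory: compute each record's sort key and check that it never drops below the previous key. In this sketch, key stands for whichever key function matches --order, and first_unsorted_index is a hypothetical name rather than fgumi's implementation.

```python
from typing import Any, Callable, Iterable, Optional

_MISSING = object()  # sentinel: no previous key seen yet


def first_unsorted_index(records: Iterable[Any], key: Callable[[Any], Any]) -> Optional[int]:
    """Return the index of the first record that sorts before its predecessor, or None if sorted."""
    prev = _MISSING
    for i, rec in enumerate(records):
        k = key(rec)
        # Equal keys are allowed; only a strict decrease means the input is out of order.
        if prev is not _MISSING and k < prev:
            return i
        prev = k
    return None


print(first_unsorted_index([1, 2, 2, 5, 3, 8], key=lambda x: x))  # 4
print(first_unsorted_index([1, 2, 2, 5, 8], key=lambda x: x))     # None
```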

Spread spill chunks across multiple temp dirs (round-robin, free-space aware)

fgumi sort -i in.bam -o out.bam -T /mnt/ssd1 -T /mnt/ssd2
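One way to read "round-robin, free-space aware" is to cycle through the -T directories in order and skip any whose free space is below a threshold. The sketch below implements that reading with shutil.disk_usage; the minimum-free-space policy and the spill_dir_cycle name are assumptions for illustration, since the page does not say how fgumi measures or weighs free space.

```python
import shutil
from typing import Iterator, List


def spill_dir_cycle(tmp_dirs: List[str], min_free_bytes: int) -> Iterator[str]:
    """Yield temp dirs round-robin, skipping any with less than min_free_bytes free."""
    if not tmp_dirs:
        raise ValueError("at least one temp dir is required")
    i = 0
    while True:
        # Try each directory at most once per request, continuing from where we left off.
        for _ in range(len(tmp_dirs)):
            d = tmp_dirs[i % len(tmp_dirs)]
            i += 1
            if shutil.disk_usage(d).free >= min_free_bytes:
                yield d
                break
        else:
            raise OSError("no temp dir has enough free space for another chunk")
```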

Same via FGUMI_TMP_DIRS env var (PATH-style list)

FGUMI_TMP_DIRS=/mnt/ssd1:/mnt/ssd2 fgumi sort -i in.bam -o out.bam
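Splitting a PATH-style value takes one line. This sketch assumes entries are separated by os.pathsep (":" on Linux and macOS, matching the example above) and drops empty entries, which is an assumption (in $PATH an empty entry means the current directory). The helper name is hypothetical, and the sketch does not model how FGUMI_TMP_DIRS interacts with -T.

```python
import os
from typing import List


def parse_tmp_dirs(value: str) -> List[str]:
    # PATH-style list: split on os.pathsep and ignore empty entries.
    return [d for d in value.split(os.pathsep) if d]


dirs = parse_tmp_dirs(os.environ.get("FGUMI_TMP_DIRS", ""))
print(dirs)
```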

Arguments

| Flag | Description | Default |
| --- | --- | --- |
| -i, --input <INPUT> | Input BAM file | required |
| -o, --output <OUTPUT> | Output BAM file (required unless --verify is used) | |
| --verify <VERIFY> | Verify the input file is correctly sorted (no output written) | false |
| --order <ORDER> | Sort order | template-coordinate |
| --key-types <KEY_TYPES> | Which optional lanes to keep in the template-coordinate sort key | |
| -m, --max-memory <MAX_MEMORY> | Maximum memory for in-memory sorting | 768M |
| --memory-reserve <MEMORY_RESERVE> | Memory to reserve for other processes when --max-memory=auto | auto |
| --memory-per-thread <MEMORY_PER_THREAD> | Scale memory limit by thread count (samtools behavior) | true |
| -T, --tmp-dir <TMP_DIRS> | Temporary directory for intermediate files. Repeatable. | |
| -@, --threads <THREADS> | Number of threads for parallel operations | 1 |
| --compression-level <COMPRESSION_LEVEL> | Compression level for output BAM (0-12) | 1 |
| --temp-compression <TEMP_COMPRESSION> | Compression level for temporary chunk files (0-9) | 1 |
| --temp-codec <TEMP_CODEC> | Codec used for temporary spill chunks: zstd (default) or bgzf | zstd |
| --write-index <WRITE_INDEX> | Write BAM index (.bai) alongside output | false |
| --async-reader <ASYNC_READER> | Enable async userspace prefetch on the input BAM | false |