sort
Category: ALIGNMENT
Sort BAM file by coordinate, queryname, or template-coordinate
Description
Sort a BAM file using high-performance external merge-sort.
This tool provides efficient BAM sorting with support for multiple sort orders:
SORT ORDERS:
coordinate                Standard genomic coordinate sort (tid → pos → strand).
                          Use for IGV visualization, variant calling, fgumi review.
queryname                 Lexicographic read name sort (fast, default sub-sort).
queryname::lex            Short alias for lexicographic ordering (same as above).
queryname::lexicographic  Explicit lexicographic ordering (same as above).
queryname::natural        Natural numeric ordering (samtools-compatible).
                          Use for fgumi zipper, template-level operations.
template-coordinate       Template-level position sort for UMI grouping.
                          Use for fgumi group, fgumi dedup, and fgumi downsample input.
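
The difference between the lexicographic and natural queryname orders is easiest to see on a few read names. The sketch below uses hypothetical names and GNU `sort -V` as a stand-in for natural ordering; it is an illustration only, and `sort -V` is not guaranteed to agree with samtools or fgumi on every read name.

```shell
# Three hypothetical read names whose numeric suffixes differ in length.
names='read10
read2
read1'

# Lexicographic: plain byte-wise comparison, so "read10" sorts before "read2".
lex=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Natural: runs of digits compare as numbers, so "read2" sorts before "read10".
# `sort -V` (version sort) is only an approximation of this behavior.
nat=$(printf '%s\n' "$names" | LC_ALL=C sort -V)

printf 'lexicographic:\n%s\n\nnatural:\n%s\n' "$lex" "$nat"
```

Lexicographic comparison needs no digit parsing, which is consistent with the "fast" note above; natural ordering is the choice when downstream tools expect samtools-style queryname order.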
PERFORMANCE:
- 1.9x faster than samtools on template-coordinate sort
- Handles BAM files larger than available RAM via spill-to-disk
- Uses parallel sorting (--threads) for in-memory chunks
- Configurable temp file compression (--temp-compression)
- Default 768M per-thread memory limit (samtools-compatible); pass
--max-memory auto to detect system memory (opt-in)
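
The spill-to-disk idea behind an external merge-sort can be sketched with standard Unix tools on a plain text file. This is a toy illustration, not fgumi's implementation: the input is a hypothetical list of integers rather than BAM records, and a three-line chunk size stands in for the memory limit.

```shell
# Work in a throwaway directory so the sketch leaves nothing behind.
work=$(mktemp -d)

# Hypothetical input: eight unsorted integer keys, one per line.
printf '%s\n' 42 7 19 3 88 61 5 30 > "$work/input.txt"

# 1. Split the input into chunks of three lines (the toy "memory limit").
split -l 3 "$work/input.txt" "$work/chunk."

# 2. Sort each chunk independently and write the sorted run to disk.
for chunk in "$work"/chunk.*; do
    sort -n -o "$chunk.sorted" "$chunk"
done

# 3. Merge the sorted runs; `sort -m` merges already-sorted inputs
#    without re-sorting them.
sorted=$(sort -n -m "$work"/chunk.*.sorted)

rm -rf "$work"
printf '%s\n' "$sorted"
```

Only one chunk has to be held in memory during step 2, and step 3 streams through the runs, which is why this pattern can handle inputs larger than available RAM.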
EXAMPLES:
Sort for fgumi group input
fgumi sort -i aligned.bam -o sorted.bam --order template-coordinate
Sort by coordinate for IGV
fgumi sort -i input.bam -o sorted.bam --order coordinate
Sort by queryname for zipper
fgumi sort -i input.bam -o sorted.bam --order queryname
Multi-threaded sort (default 768M per thread)
fgumi sort -i input.bam -o sorted.bam --order template-coordinate --threads 8
Override the per-thread memory limit
fgumi sort -i input.bam -o sorted.bam -m 2GiB --threads 8
Opt in to auto-detected system memory (subtracts --memory-reserve)
fgumi sort -i input.bam -o sorted.bam -m auto --threads 8
Reserve extra memory for bwa mem running in a pipeline
fgumi sort -i input.bam -o sorted.bam --memory-reserve 12GiB --threads 4
Verify a BAM file is correctly sorted
fgumi sort -i sorted.bam --verify --order template-coordinate
Spread spill chunks across multiple temp dirs (round-robin, free-space aware)
fgumi sort -i in.bam -o out.bam -T /mnt/ssd1 -T /mnt/ssd2
Same via FGUMI_TMP_DIRS env var (PATH-style list)
FGUMI_TMP_DIRS=/mnt/ssd1:/mnt/ssd2 fgumi sort -i in.bam -o out.bam
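
When the list of temp directories is assembled in a script, the PATH-style value for `FGUMI_TMP_DIRS` can be built by joining the paths with `:`. The sketch below is a minimal POSIX shell example; the two directories are the placeholder paths from the examples above and may not exist on your system.

```shell
# Placeholder scratch directories; substitute paths that exist on your system.
set -- /mnt/ssd1 /mnt/ssd2

# Join the positional parameters with ':' to form a PATH-style list.
# "$*" joins its arguments using the first character of IFS.
FGUMI_TMP_DIRS=$(IFS=:; printf '%s' "$*")
export FGUMI_TMP_DIRS

echo "$FGUMI_TMP_DIRS"
```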
Arguments
| Flag | Description | Default |
|---|---|---|
| -i, --input <INPUT> | Input BAM file | required |
| -o, --output <OUTPUT> | Output BAM file (required unless --verify is used) | |
| --verify <VERIFY> | Verify the input file is correctly sorted (no output written) | false |
| --order <ORDER> | Sort order | template-coordinate |
| --key-types <KEY_TYPES> | Which optional lanes to keep in the template-coordinate sort key | |
| -m, --max-memory <MAX_MEMORY> | Maximum memory for in-memory sorting | 768M |
| --memory-reserve <MEMORY_RESERVE> | Memory to reserve for other processes when --max-memory=auto | auto |
| --memory-per-thread <MEMORY_PER_THREAD> | Scale memory limit by thread count (samtools behavior) | true |
| -T, --tmp-dir <TMP_DIRS> | Temporary directory for intermediate files. Repeatable | |
| -@, --threads <THREADS> | Number of threads for parallel operations | 1 |
| --compression-level <COMPRESSION_LEVEL> | Compression level for output BAM (0-12) | 1 |
| --temp-compression <TEMP_COMPRESSION> | Compression level for temporary chunk files (0-9) | 1 |
| --temp-codec <TEMP_CODEC> | Codec used for temporary spill chunks: zstd (default) or bgzf | zstd |
| --write-index <WRITE_INDEX> | Write BAM index (.bai) alongside output | false |
| --async-reader <ASYNC_READER> | Enable async userspace prefetch on the input BAM | false |
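
Because `--memory-per-thread` defaults to true, the effective memory budget grows with `--threads`. The arithmetic below is a rough sketch under two assumptions that this page does not state outright: that `768M` means 768 MiB, and that scaling is a simple multiplication by the thread count. The tool's actual accounting may differ.

```shell
# Values taken from the defaults and examples on this page.
per_thread_mib=768   # default --max-memory, read here as 768 MiB (assumption)
threads=8            # as in the "--threads 8" example

# With --memory-per-thread true, assume the limit scales linearly with threads.
total_mib=$((per_thread_mib * threads))

echo "estimated sort memory budget: ${total_mib} MiB"
```

Under these assumptions an 8-thread run would budget about 6 GiB for in-memory chunks, which is worth checking against the memory needed by other processes in the same pipeline.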