Migration from fgbio
fgumi is the Rust successor to fgbio for UMI-based tools. This guide maps fgbio tools to their fgumi equivalents and highlights key differences.
Command Mapping
| fgbio Tool | fgumi Command | Notes |
|---|---|---|
| ExtractUmisFromBam | extract | Extracts directly from FASTQ (not BAM) |
| CorrectUmis | correct | |
| ZipperBams | zipper | Also replaces picard MergeBamAlignment; accepts SAM or BAM input |
| SortBam | sort | Adds template-coordinate sort order with optional cell barcode key |
| GroupReadsByUmi | group | Same strategies: identity, edit, adjacency, paired |
| CallMolecularConsensusReads | simplex | |
| CallDuplexConsensusReads | duplex | |
| CallCodecConsensusReads | codec | |
| FilterConsensusReads | filter | |
| ClipBam | clip | |
| CollectDuplexSeqMetrics | duplex-metrics | |
| (no equivalent) | simplex-metrics | New: simplex QC metrics (yield, family sizes, UMI counts) |
| (samtools merge) | merge | k-way merge of pre-sorted BAMs; supports all sort orders |
| ReviewConsensusVariants | review | |
Key Differences
Input Format
fgbio’s ExtractUmisFromBam takes an unmapped BAM as input. fgumi’s extract takes FASTQ files directly, which is more common in practice and avoids an unnecessary BAM conversion step.
Streaming Pipeline
fgumi supports Unix pipe-based streaming for the alignment workflow:
fgumi fastq --input unaligned.bam \
| bwa mem -p -K 150000000 -Y ref.fa - \
| fgumi zipper --unmapped unaligned.bam \
| fgumi sort --output sorted.bam --order template-coordinate
This replaces multiple separate fgbio/picard steps (SortBam, ZipperBams/MergeBamAlignment) with a single streaming pass. fgumi zipper accepts SAM or BAM on stdin or via --input; for best performance, pipe uncompressed BAM from the aligner (e.g. bwa-mem3 mem --bam=0).
Merging Multiple BAMs
fgbio users who relied on samtools merge to combine per-lane BAMs before grouping should use
fgumi merge instead. It performs an equivalent k-way merge and correctly handles
template-coordinate order with cell barcodes:
# fgbio/samtools workflow
samtools merge -n merged.bam lane1.bam lane2.bam lane3.bam
# fgumi equivalent (also supports template-coordinate and queryname sort orders)
fgumi merge --order template-coordinate --output merged.bam \
lane1.bam lane2.bam lane3.bam
If you produce queryname-sorted output from fgumi merge (or from any
other source, such as fgumi extract or samtools sort -n), insert an
fgumi sort --order template-coordinate step before fgumi group,
fgumi dedup, or fgumi downsample. Unlike fgbio’s GroupReadsByUmi,
fgumi group does not sort internally: it requires its input to be
template-coordinate sorted with the SS:template-coordinate header tag,
and rejects any other sort order with an actionable error.
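To catch a wrongly sorted input before fgumi group rejects it, the header can be checked up front. The sketch below is a hypothetical shell helper (the name `has_template_coordinate_ss` is not part of fgumi) that reads SAM header text on stdin and succeeds only if the @HD line carries an SS field ending in template-coordinate. It assumes the sub-sort may be written either as SS:template-coordinate or with a leading primary order such as SS:unsorted:template-coordinate; fgumi's own validation may be stricter or looser than this.

```shell
# Hypothetical helper, not part of fgumi: exit 0 if the @HD line read from
# stdin declares a template-coordinate sub-sort, exit 1 otherwise.
has_template_coordinate_ss() {
  awk -F'\t' '
    $1 == "@HD" {
      # Scan the tab-separated header fields for the SS tag.
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SS:([a-z]+:)?template-coordinate$/) found = 1
    }
    END { exit found ? 0 : 1 }
  '
}

# Typical use (needs samtools on PATH; in.bam is a placeholder file name):
#   samtools view -H in.bam | has_template_coordinate_ss \
#     || echo "in.bam is not template-coordinate sorted" >&2
```

Reading the header from stdin keeps the helper independent of any particular BAM reader, so it works with whatever tool you already use to dump headers.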
Simplex QC Metrics
fgbio has no equivalent to fgumi simplex-metrics. This command provides yield curves,
family size distributions, and UMI frequency statistics specifically for simplex sequencing
experiments, analogous to what duplex-metrics provides for duplex experiments.
Threading Model
fgumi uses a multi-threaded pipeline architecture where reading, processing, and writing happen
concurrently. Most commands accept --threads to control parallelism. See
Performance Tuning for details.
Grouping Strategies
fgumi supports the same four UMI assignment strategies as fgbio:
- identity: exact UMI matching only
- edit: edit-distance clustering
- adjacency: directional adjacency (recommended for most use cases)
- paired: paired adjacency for duplex workflows
The algorithms are equivalent but fgumi’s implementations are optimized for throughput.
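To make the difference between identity and edit concrete, the sketch below counts mismatching positions between two equal-length UMIs. It assumes a substitution-only (Hamming) distance purely for illustration; the helper name `umi_mismatches` is hypothetical, and this is not a description of how fgumi or fgbio compute distances internally.

```shell
# Hypothetical helper: print the number of positions at which two
# equal-length UMIs differ (Hamming distance). Returns 1 on length mismatch.
umi_mismatches() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    if (length(a) != length(b)) {
      print "UMIs differ in length" > "/dev/stderr"
      exit 1
    }
    n = 0
    for (i = 1; i <= length(a); i++)
      if (substr(a, i, 1) != substr(b, i, 1)) n++
    print n
  }'
}

umi_mismatches ACGTACGT ACGTACGA   # differs only at the last base
```

Under the identity strategy only a distance of 0 would place two reads in the same family, whereas the edit-based strategies tolerate a small number of mismatches.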
Group Metrics
fgumi’s group command now produces a third metrics file beyond family sizes and grouping
metrics: position_group_sizes.txt, a histogram of how many UMI families appear at each
genomic position. This has no fgbio equivalent but is useful for detecting UMI exhaustion or
abnormal duplication patterns.
Use the --metrics PREFIX flag to write all three files in one step.
Metrics Compatibility
fgumi’s simplex and duplex stats output uses the same three-column key-value format as
fgbio’s CallMolecularConsensusReads, so the two tools’ metrics can be compared directly with fgumi compare metrics.
Sort Orders
fgumi’s sort command supports the same sort orders as fgbio:
- coordinate: standard genomic coordinate sort
- queryname: sort by read name
- template-coordinate: sort by template 5’ positions (required input for group)
For single-cell data, fgumi sort --order template-coordinate automatically includes the CB
cell barcode tag in the sort key so that templates from different cells at the same locus are not
interleaved. fgbio’s template-coordinate sort does not support this.
Rejects BAM Sort Order
When --rejects is enabled on simplex, duplex, codec, or correct, fgumi writes
rejected records from worker threads in mutex-acquisition order, which is not guaranteed
to match input order under --threads > 1. Because of this, fgumi stamps the rejects BAM
header with SO:unsorted (and drops any GO/SS tags inherited from the input) so
downstream tools don’t assume the input’s sort order carried over.
fgbio’s equivalent tools copy the input header onto the rejects BAM unchanged, which can
leave a stale SO tag when more than one consensus-calling thread is used. If you were
relying on fgbio’s rejects header carrying the input’s sort order, sort the rejects BAM
explicitly after the fact.
Boolean Flag Values
fgumi boolean flags (e.g. --output-per-base-tags, --trim, --require-single-strand-agreement)
accept the following values: true/false, yes/no, y/n, t/f (case-insensitive).
fgbio uses standard true/false only.
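When a wrapper script has to pass the same setting to both tools, it can reduce the wider fgumi vocabulary to the plain true/false that fgbio expects. The helper below is a hypothetical sketch (the name `normalize_bool` is not part of either tool) built only from the value list above; it is not fgumi's actual parser.

```shell
# Hypothetical helper: map any boolean spelling listed above to the
# plain true/false form. Matching is case-insensitive.
normalize_bool() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    true|yes|y|t) echo true ;;
    false|no|n|f) echo false ;;
    *) echo "unrecognized boolean value: $1" >&2; return 1 ;;
  esac
}

normalize_bool Yes   # prints "true"
normalize_bool F     # prints "false"
```

Anything outside the listed spellings is reported on stderr with a non-zero status, so a typo fails loudly instead of being silently treated as false.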
Removed Options
The --sort-order flag has been removed from simplex and codec. Output sort order for
consensus reads is determined by the downstream pipeline step (zipper + sort), not by the
consensus caller itself.
What fgumi Does Not Replace
fgumi focuses on UMI-based tools. The following fgbio tools do not have fgumi equivalents:
- Non-UMI tools (e.g., TrimFastq, ErrorRateByReadPosition, EstimatePoolingFractions)
- VCF tools (e.g., FilterSomaticVcf, HapTyper)
- FASTQ/FASTA utilities (e.g., FastqToBam, HardMaskFasta)
Continue using fgbio for these tools.