Migration from fgbio

fgumi is the Rust successor to fgbio for UMI-based tools. This guide maps fgbio tools to their fgumi equivalents and highlights key differences.

Command Mapping

fgbio Tool                   | fgumi Command   | Notes
ExtractUmisFromBam           | extract         | Extracts directly from FASTQ (not BAM)
CorrectUmis                  | correct         |
ZipperBams                   | zipper          | Also replaces picard MergeBamAlignment; accepts SAM or BAM input
SortBam                      | sort            | Adds template-coordinate sort order with optional cell barcode key
GroupReadsByUmi              | group           | Same strategies: identity, edit, adjacency, paired
CallMolecularConsensusReads  | simplex         |
CallDuplexConsensusReads     | duplex          |
CallCodecConsensusReads      | codec           |
FilterConsensusReads         | filter          |
ClipBam                      | clip            |
CollectDuplexSeqMetrics      | duplex-metrics  |
(no equivalent)              | simplex-metrics | New: simplex QC metrics (yield, family sizes, UMI counts)
(samtools merge)             | merge           | k-way merge of pre-sorted BAMs; supports all sort orders
ReviewConsensusVariants      | review          |

Key Differences

Input Format

fgbio’s ExtractUmisFromBam takes an unmapped BAM as input. fgumi’s extract reads FASTQ files directly. FASTQ is the more common starting point in practice, and reading it directly avoids an unnecessary BAM conversion step.

Streaming Pipeline

fgumi supports Unix pipe-based streaming for the alignment workflow:

fgumi fastq --input unaligned.bam \
  | bwa mem -p -K 150000000 -Y ref.fa - \
  | fgumi zipper --unmapped unaligned.bam \
  | fgumi sort --output sorted.bam --order template-coordinate

This replaces multiple separate fgbio/picard steps (SortBam, ZipperBams/MergeBamAlignment) with a single streaming pass. fgumi zipper accepts SAM piped from the aligner (preferred) or a BAM file via --input.

Merging Multiple BAMs

fgbio users who relied on samtools merge to combine per-lane BAMs before grouping should use fgumi merge instead. It performs an equivalent k-way merge and correctly handles template-coordinate order with cell barcodes:

# fgbio/samtools workflow
samtools merge -n merged.bam lane1.bam lane2.bam lane3.bam

# fgumi equivalent (also supports template-coordinate and queryname sort orders)
fgumi merge --order template-coordinate --output merged.bam \
  lane1.bam lane2.bam lane3.bam

Simplex QC Metrics

fgbio has no equivalent to fgumi simplex-metrics. This command provides yield curves, family size distributions, and UMI frequency statistics specifically for simplex sequencing experiments, analogous to what duplex-metrics provides for duplex experiments.

Threading Model

fgumi uses a multi-threaded pipeline architecture where reading, processing, and writing happen concurrently. Most commands accept --threads to control parallelism. See Performance Tuning for details.

Grouping Strategies

fgumi supports the same four UMI assignment strategies as fgbio:

  • identity — exact UMI matching only
  • edit — edit-distance clustering
  • adjacency — directional adjacency (recommended for most use cases)
  • paired — paired adjacency for duplex workflows

The algorithms are equivalent, but fgumi’s implementations are optimized for throughput.

Group Metrics

fgumi’s group command now produces a third metrics file beyond family sizes and grouping metrics: position_group_sizes.txt, a histogram of how many UMI families appear at each genomic position. This has no fgbio equivalent but is useful for detecting UMI exhaustion or abnormal duplication patterns.

Use the --metrics PREFIX flag to write all three files in one step.
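To make the shape of such a histogram concrete, the sketch below computes the same kind of summary from a toy input using standard Unix tools. The input layout (one tab-separated position and family ID per read) and the function name position_group_histogram are assumptions made for illustration; the actual columns of position_group_sizes.txt may differ.

```shell
# Hypothetical illustration, not fgumi code.
# Input on stdin: "position<TAB>family_id", one line per read.
# Output: "families_at_position<TAB>number_of_positions".
position_group_histogram() {
  sort -u |                      # one line per distinct (position, family) pair
    cut -f1 |                    # keep only the position
    sort | uniq -c |             # count families at each position
    awk '{ print $1 }' |         # keep only the per-position family count
    sort -n | uniq -c |          # count positions sharing each family count
    awk '{ print $2 "\t" $1 }'   # emit family count first, then positions
}
```

A distribution skewed toward many families per position is the kind of pattern that would suggest looking more closely at UMI exhaustion or unusual duplication.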

Metrics Compatibility

fgumi’s simplex and duplex stats output uses the same three-column key-value format as fgbio’s CallMolecularConsensusReads, allowing direct comparison with fgumi compare metrics.

Sort Orders

fgumi’s sort command supports the same sort orders as fgbio:

  • coordinate — standard genomic coordinate sort
  • queryname — sort by read name
  • template-coordinate — sort by template 5’ positions (required input for group)

For single-cell data, fgumi sort --order template-coordinate automatically includes the CB cell barcode tag in the sort key so that templates from different cells at the same locus are not interleaved. fgbio’s template-coordinate sort does not support this.

Rejects BAM Sort Order

When --rejects is enabled on simplex, duplex, codec, or correct, worker threads write rejected records in mutex-acquisition order, which is not guaranteed to match input order when --threads is greater than 1. fgumi therefore stamps the rejects BAM header with SO:unsorted (and drops any GO/SS tags inherited from the input) so that downstream tools don’t assume the input’s sort order carried over.

fgbio’s equivalent tools copy the input header onto the rejects BAM unchanged, which can leave a stale SO tag when more than one consensus-calling thread is used. If you were relying on fgbio’s rejects header carrying the input’s sort order, sort the rejects BAM explicitly after the fact.
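One way to confirm what a rejects BAM header declares before deciding whether to re-sort is to inspect its @HD line. The helper below is a minimal sketch: header_sort_order is a hypothetical name, and it only parses SAM header text supplied on standard input, for example from samtools view -H.

```shell
# Hypothetical helper: print the SO value from the @HD line of a SAM header.
# Prints nothing if the header has no @HD line or no SO tag.
header_sort_order() {
  awk -F'\t' '$1 == "@HD" {
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^SO:/) print substr($i, 4)
    }
  }'
}

# Example usage (assumes samtools is installed and rejects.bam exists):
#   samtools view -H rejects.bam | header_sort_order
```

For a rejects BAM written by fgumi, this would be expected to print unsorted, per the behavior described above.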

Boolean Flag Values

fgumi boolean flags (e.g. --output-per-base-tags, --trim, --require-single-strand-agreement) accept the following values: true/false, yes/no, y/n, t/f (case-insensitive). fgbio uses standard true/false only.
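When porting wrapper scripts, it can help to check values against this list before invoking a command. The function below is a sketch of the accepted spellings as listed above, not fgumi's actual parser; normalize_bool is a hypothetical helper name.

```shell
# Hypothetical helper: map the accepted boolean spellings to true/false.
# Mirrors the list in this guide; it is not fgumi's own parsing code.
normalize_bool() {
  value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')  # case-insensitive
  case "$value" in
    true|yes|y|t)  echo true ;;
    false|no|n|f)  echo false ;;
    *) echo "invalid boolean: $1" >&2; return 1 ;;
  esac
}
```

A script that must run against both tools could pass the normalized value, since plain true and false are accepted by fgbio and fgumi alike.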

Removed Options

The --sort-order flag has been removed from simplex and codec. Output sort order for consensus reads is determined by the downstream pipeline step (zipper + sort), not by the consensus caller itself.
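A quick scan of existing pipeline scripts can surface invocations that still pass the removed flag. The following is a rough sketch; find_removed_sort_order is a hypothetical helper name, and it relies on the recursive -r option found in GNU and BSD grep.

```shell
# Hypothetical helper: list script lines that still pass --sort-order to
# fgumi simplex or fgumi codec. Only same-line matches are found; commands
# split across lines with backslashes need a manual check.
find_removed_sort_order() {
  grep -rnE -e 'fgumi[[:space:]]+(simplex|codec)[[:space:]].*--sort-order' "$@"
}

# Example usage: find_removed_sort_order path/to/pipeline/scripts
```

The function exits non-zero when nothing matches, following grep's usual convention, so it can also serve as a simple check in a migration script.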

What fgumi Does Not Replace

fgumi focuses on UMI-based tools. The following fgbio tools do not have fgumi equivalents:

  • Non-UMI tools (e.g., TrimFastq, ErrorRateByReadPosition, EstimatePoolingFractions)
  • VCF tools (e.g., FilterSomaticVcf, HapTyper)
  • FASTQ/FASTA utilities (e.g., FastqToBam, HardMaskFasta)

Continue using fgbio for these tools.