
downsample

Category: UTILITIES

Downsample a BAM file by UMI family using streaming

Description

Downsample a BAM file by UMI family using a single-pass streaming algorithm.

This tool reads a BAM file containing MI tags, as produced by fgumi group (or fgbio GroupReadsByUmi), uniformly samples UMI families, and writes the reads of the kept families directly to the output BAM file.

The input BAM must be in template-coordinate order, as indicated by these @HD header tags:

  • SO:unsorted (or not set)
  • GO:query
  • SS:unsorted:template-coordinate or SS:template-coordinate
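
As a rough illustration of these header rules, the hypothetical Python helper below (`is_template_coordinate`, not part of fgumi) parses the `@HD` line of a SAM header and applies them. It treats all three conditions as required, which is one reading of the list; fgumi's actual check may be stricter or more lenient.

```python
def is_template_coordinate(header_text: str) -> bool:
    """Hypothetical check: does the @HD line match the SO/GO/SS rules listed above?"""
    hd = {}
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # @HD fields are tab-separated TAG:VALUE pairs after the record type.
            # partition(":") splits only at the first colon, so
            # "SS:unsorted:template-coordinate" yields value "unsorted:template-coordinate".
            for field in line.split("\t")[1:]:
                tag, _, value = field.partition(":")
                hd[tag] = value
            break
    so_ok = hd.get("SO", "unsorted") == "unsorted"  # a missing SO counts as unsorted
    go_ok = hd.get("GO") == "query"
    ss_ok = hd.get("SS") in ("unsorted:template-coordinate", "template-coordinate")
    return so_ok and go_ok and ss_ok
```

For example, a header whose `@HD` line reads `@HD VN:1.6 SO:unsorted GO:query SS:unsorted:template-coordinate` (tab-separated) passes, while one with `SO:coordinate` does not.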

The tool processes families in a streaming fashion by grouping consecutive reads that share the same MI tag value. For each family, a single random decision, with probability set by the fraction parameter, determines whether all reads in that family are kept or rejected; a family is never split between the two outputs.
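
A minimal Python sketch of this keep/reject logic, assuming records are simplified to `(read_name, mi_tag)` tuples that already arrive grouped by MI. The function name `downsample` and the use of Python's `random.Random` are illustrative choices, not fgumi's implementation; in particular, fgumi's RNG (and therefore its exact output for a given `--seed`) will differ.

```python
import random
from itertools import groupby
from typing import Iterable, List, Optional, Tuple

# Simplified record: (read_name, mi_tag). Real BAM records carry much more.
Record = Tuple[str, str]


def downsample(records: Iterable[Record], fraction: float,
               seed: Optional[int] = None) -> Tuple[List[Record], List[Record]]:
    """Keep or reject whole MI families from a stream in which each family is contiguous."""
    # Mirrors the documented range: 0.0 exclusive to 1.0 inclusive.
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0.0, 1.0]")
    rng = random.Random(seed)
    kept: List[Record] = []
    rejected: List[Record] = []
    # groupby merges only *consecutive* records with the same MI value,
    # which is why the input must keep each family together.
    for _mi, family in groupby(records, key=lambda rec: rec[1]):
        reads = list(family)
        # One draw per family: rng.random() is in [0.0, 1.0), so the family is
        # kept with probability `fraction` and fraction=1.0 keeps everything.
        (kept if rng.random() < fraction else rejected).extend(reads)
    # This sketch collects results in lists for inspection; a streaming tool
    # would write each family out as soon as its decision is made.
    return kept, rejected
```

Because the generator is seeded, repeated calls with the same seed return the same split, which is the role `--seed` plays on the command line.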

Example usage:

    fgumi downsample -i grouped.bam -o downsampled.bam -f 0.1 --seed 42
    fgumi downsample -i grouped.bam -o kept.bam -f 0.5 --rejects rejected.bam
    fgumi downsample -i grouped.bam -o kept.bam -f 0.1 --histogram-kept kept_hist.txt

Arguments

| Flag | Description | Default |
|------|-------------|---------|
| `-i, --input <INPUT>` | Input BAM file | required |
| `-o, --output <OUTPUT>` | Output BAM file | required |
| `--async-reader <ASYNC_READER>` | Enable async userspace prefetch on the input BAM | false |
| `-f, --fraction <FRACTION>` | Fraction of UMI families to keep (0.0 exclusive to 1.0 inclusive) | required |
| `--rejects <REJECTS>` | Optional output BAM file for rejected reads | |
| `--seed <SEED>` | Random seed for reproducibility | |
| `--validate-mi-order <VALIDATE_MI_ORDER>` | Validate that MI tags appear in consecutive groups (error if seen non-consecutively) | false |
| `--histogram-kept <HISTOGRAM_KEPT>` | Output file for kept family size histogram | |
| `--histogram-rejected <HISTOGRAM_REJECTED>` | Output file for rejected family size histogram | |
| `--compression-level <COMPRESSION_LEVEL>` | Compression level for output BAM (1-12) | 1 |
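
To illustrate what `--validate-mi-order` guards against and what a family size histogram counts, here is a hypothetical Python sketch (`family_sizes` is an invented name, not an fgumi function). It assumes a histogram maps each family size to the number of families of that size; the layout of the histogram files fgumi actually writes is not described here.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, Iterable


def family_sizes(mi_tags: Iterable[str], validate_order: bool = False) -> Dict[int, int]:
    """Return a {family_size: number_of_families} histogram for a stream of MI tags.

    With validate_order=True, raise ValueError when an MI value reappears after
    its group has ended, i.e. when the tags are not in consecutive groups.
    """
    seen = set()
    histogram: Counter = Counter()
    for mi, group in groupby(mi_tags):
        if validate_order:
            if mi in seen:
                raise ValueError(f"MI {mi!r} seen non-consecutively")
            seen.add(mi)
        # Size of this run of identical MI values.
        histogram[sum(1 for _ in group)] += 1
    return dict(sorted(histogram.items()))
```

For example, `family_sizes(["1", "1", "2", "3", "3", "3"])` returns `{1: 1, 2: 1, 3: 1}`. Without validation, `family_sizes(["1", "2", "1"])` returns `{1: 3}`: the out-of-order family is silently counted as two separate singletons, which is the kind of error the validation option reports instead. Note that validation in this sketch must remember every MI value seen so far, so its memory grows with the number of families.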