zipper

Category: ALIGNMENT

Zip unmapped BAM with aligned BAM

Description

Merges unmapped and mapped BAM files, transferring tags and metadata.

Takes an unmapped BAM (typically from FASTQ) and a mapped BAM (after alignment) and merges them, copying tags from the unmapped to mapped reads. Both BAMs must be queryname sorted or grouped, and have the same read name ordering.

For reads mapped to the negative strand, tags can optionally be reversed or reverse-complemented. All QC pass/fail flags are also transferred from the unmapped to the mapped reads.
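The ordering requirement can be sanity-checked before running the tool. The following is a minimal sketch, not part of fgumi: the helper name is_queryname_ordered is hypothetical, and it only inspects the @HD header line for SO:queryname or GO:query. It cannot confirm that the two files list read names in the same order.

```shell
#!/bin/sh
# Hypothetical helper: reads a SAM header on stdin and succeeds if the @HD line
# declares queryname sort order (SO:queryname) or query grouping (GO:query).
is_queryname_ordered() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i == "SO:queryname" || $i == "GO:query") ok = 1
        }
        END { exit (ok ? 0 : 1) }
    '
}

# Usage (assumes samtools is installed and the file exists):
# samtools view -H unmapped.bam | is_queryname_ordered \
#     || echo "unmapped.bam is not queryname sorted or grouped" >&2
```

A header check like this is cheap, but a file can declare an order it does not actually follow, so treat a pass as necessary rather than sufficient.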

Tag Manipulation

You can specify which tags to manipulate for reads mapped to the negative strand:

  • --tags-to-reverse: Reverses array and string tags (e.g., [1,2,3] becomes [3,2,1])
  • --tags-to-revcomp: Reverse complements sequence tags (e.g., AGAGG becomes CCTCT)
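To make the two operations concrete, here is a small illustration on plain strings. It is only a sketch of the transformations described above, not the tool's implementation; the function names are made up for this example.

```shell
#!/bin/sh
# Illustration only: mimics on plain text what the two options are described
# as doing to tag values on negative-strand reads.

# Reverse a comma-separated array value, e.g. "1,2,3" -> "3,2,1".
reverse_array() {
    printf '%s\n' "$1" | awk -F, '{
        out = $NF
        for (i = NF - 1; i >= 1; i--) out = out "," $i
        print out
    }'
}

# Reverse a string character by character.
reverse_string() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
        print out
    }'
}

# Reverse complement: reverse the sequence, then swap A<->T and C<->G.
revcomp() {
    reverse_string "$1" | tr 'ACGTacgt' 'TGCAtgca'
}

reverse_array "1,2,3"   # prints 3,2,1
revcomp "AGAGG"         # prints CCTCT
```

Reversal keeps per-base values such as quality or depth arrays aligned with the read once the aligner has flipped it, while sequence-valued tags also need complementing.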

Named tag sets like “Consensus” are automatically expanded to their constituent tags:

  • Consensus: aD bD cD aM bM cM aE bE cE ad bd cd ae be ce ac bc
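The expansion of a named set can be pictured as a simple lookup. The sketch below is an assumption about the behavior described above, written outside the tool: expand_tags is a hypothetical helper, and RX is used only as an example of an ordinary tag name that passes through unchanged.

```shell
#!/bin/sh
# Hypothetical helper: prints one tag per line, replacing the named set
# "Consensus" with the constituent tags listed above.
expand_tags() {
    for name in "$@"; do
        case "$name" in
            Consensus)
                printf '%s\n' aD bD cD aM bM cM aE bE cE ad bd cd ae be ce ac bc
                ;;
            *)
                printf '%s\n' "$name"
                ;;
        esac
    done
}

expand_tags Consensus RX
```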

Default Behavior

By default, input is read from stdin and output is written to stdout, which allows streaming workflows such as the two below.

Recommended when the aligner can emit uncompressed BAM:

bwa-mem3 mem --bam=0 -t 16 -p -K 150000000 -Y ref.fa reads.fq | fgumi zipper -u unmapped.bam -r ref.fa | fgumi sort -i /dev/stdin -o output.bam --order template-coordinate

SAM-only aligners (e.g. classic bwa mem, bwa-mem2):

bwa mem -t 16 -p -K 150000000 -Y ref.fa reads.fq | fgumi zipper -u unmapped.bam -r ref.fa | fgumi sort -i /dev/stdin -o output.bam --order template-coordinate

Uncompressed BAM avoids the SAM text formatting/parsing round-trip in both processes and adds only ~26 bytes of BGZF framing per ~64 KiB block. Compressed BAM on a pipe is not recommended — it burns CPU on the writer and reader for data the sort step will re-compress anyway.
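The framing figure is easy to put in perspective with a quick calculation. This sketch takes the approximate numbers quoted above (26 bytes per 64 KiB block) at face value; the function name is made up for the example.

```shell
#!/bin/sh
# Overhead of per-block framing as a percentage of block size.
# Arguments: framing bytes per block, payload bytes per block.
bgzf_overhead_percent() {
    awk -v f="$1" -v b="$2" 'BEGIN { printf "%.4f\n", f / b * 100 }'
}

bgzf_overhead_percent 26 $((64 * 1024))   # prints 0.0397
```

At roughly 0.04% of the stream, the framing cost is negligible compared with the cost of compressing or parsing the same data.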

Arguments

| Flag | Description | Default |
|------|-------------|---------|
| -i, --input <INPUT> | Input mapped SAM or BAM file (or - for stdin; SAM or BAM is auto-detected). For streaming pipelines, uncompressed BAM (e.g. bwa-mem3 mem --bam=0) is the fastest option: it skips both SAM text formatting on the aligner side and SAM parsing on this side. SAM is fine if your aligner can’t emit BAM. Compressed BAM on a pipe wastes CPU on both ends | - |
| -u, --unmapped <UNMAPPED> | Input unmapped BAM file containing original tags | required |
| -r, --reference <REFERENCE> | Reference FASTA file (must have accompanying .dict file) | required |
| -o, --output <OUTPUT> | Output BAM file (or - for stdout) | - |
| --tags-to-remove <TAGS_TO_REMOVE> | Tags to remove from mapped reads before copying unmapped tags | |
| --tags-to-reverse <TAGS_TO_REVERSE> | Tags to reverse for reads mapped to negative strand | |
| --tags-to-revcomp <TAGS_TO_REVCOMP> | Tags to reverse complement for reads mapped to negative strand | |
| -b, --buffer <BUFFER> | Buffer size for template channel (default: 50000) | 50000 |
| -t, --threads <THREADS> | Number of threads to use for processing (default: 1, single-threaded) | 1 |
| --compression-level <COMPRESSION_LEVEL> | Compression level for output BAM (0-12) | 1 |
| -K, --bwa-chunk-size <BWA_CHUNK_SIZE> | BWA -K parameter value (bases per batch). Used to optimize buffer sizing for stdin input. The buffer grows adaptively based on observed bytes per batch. Default matches common bwa mem usage | 150000000 |
| --exclude-missing-reads <EXCLUDE_MISSING_READS> | Exclude reads from the unmapped BAM that are not present in the aligned BAM. Useful when reads were intentionally removed (e.g., by adapter trimming) prior to alignment | false |
| --skip-pa-tags <SKIP_TC_TAGS> | Skip adding pa (primary alignment) tags to secondary/supplementary reads. By default, zipper adds a pa tag containing the primary alignment’s template sort key coordinates, which enables correct template-coordinate sorting and deduplication of these reads. Use this flag if you don’t need this functionality | false |
| --restore-unconverted-bases <RESTORE_UNCONVERTED_BASES> | Restore unconverted bases in EM-seq consensus reads after bwameth re-alignment | false |
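Because --reference requires an accompanying .dict file, a pre-flight check can save a failed run. The sketch below is a guess at a reasonable check, not documented fgumi behavior: find_dict is a hypothetical helper, and the two candidate names it tries (extension replaced, or .dict appended) are assumptions about where the dictionary is expected.

```shell
#!/bin/sh
# Hypothetical helper: prints the path of a sequence dictionary for the given
# FASTA and succeeds, or fails if neither candidate name exists.
# Assumption: the dictionary is named ref.dict or ref.fa.dict next to ref.fa.
find_dict() {
    fasta=$1
    for candidate in "${fasta%.*}.dict" "$fasta.dict"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (assumes samtools is installed; creates ref.dict only if none is found):
# find_dict ref.fa || samtools dict -o ref.dict ref.fa
```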