duplex

Category: CONSENSUS

Call duplex consensus sequences from UMI-grouped reads

Description

Calls duplex consensus sequences from reads generated from the same double-stranded source molecule. Prior to running this tool, reads must have been grouped with the group command using the paired strategy. By default, grouping applies MI tags of the form */A and */B to all reads. Reads that share an identifier but carry different suffixes (/A and /B) are derived from opposite strands of the same source duplex molecule.

Reads from the same unique molecule are first partitioned by source strand and assembled into single strand consensus molecules as described by the simplex command. Subsequently, for molecules that have at least one observation of each strand, duplex consensus reads are assembled by combining the evidence from the two single strand consensus reads.
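The partitioning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not fgumi's implementation: the function names (split_mi, partition_by_strand, duplex_candidates) are hypothetical, and reads are represented as plain (read name, MI value) tuples rather than BAM records.

```python
from collections import defaultdict

def split_mi(mi):
    """Split an MI tag value such as '12/A' into (molecule_id, strand)."""
    molecule_id, sep, strand = mi.rpartition("/")
    if sep != "/" or strand not in ("A", "B"):
        raise ValueError(f"MI tag {mi!r} lacks a /A or /B suffix")
    return molecule_id, strand

def partition_by_strand(reads):
    """Group (read_name, mi) tuples by molecule, then by source strand."""
    molecules = defaultdict(lambda: {"A": [], "B": []})
    for name, mi in reads:
        molecule_id, strand = split_mi(mi)
        molecules[molecule_id][strand].append(name)
    return dict(molecules)

def duplex_candidates(molecules):
    """Return ids of molecules with at least one read from each strand."""
    return sorted(m for m, s in molecules.items() if s["A"] and s["B"])

# Hypothetical input: molecule 7 has both strands, molecule 8 only strand A.
reads = [("r1", "7/A"), ("r2", "7/A"), ("r3", "7/B"), ("r4", "8/A")]
molecules = partition_by_strand(reads)
print(duplex_candidates(molecules))  # ['7']
```

Only molecule 7 would go on to duplex consensus calling in this toy input, since molecule 8 has no /B observations.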

Because of the nature of duplex sequencing, this tool does not support fragment reads; if found in the input, they are ignored. Similarly, read pairs for which a consensus read cannot be generated for one or the other read (R1 or R2) are omitted from the output.

The consensus reads produced are unaligned, because inferring a consensus alignment is difficult and error-prone. Consensus reads should therefore be aligned after consensus calling. This is usually inexpensive, since there are likely far fewer consensus reads than raw input reads.

Consensus reads have a number of additional optional tags set in the resulting BAM file. The tag names follow a pattern where the first letter (a, b or c) denotes that the tag applies to the first single strand consensus (a), second single-strand consensus (b) or the final duplex consensus (c). The second letter is intended to capture the meaning of the tag (e.g. d=depth, m=min depth, e=errors/error-rate) and is upper case for values that are one per read and lower case for values that are one per base.
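As a reading aid, the naming pattern can be expressed as a small decoder. This is a hypothetical helper (describe_tag is not part of fgumi); it interprets a tag name according to the pattern described here and in the tag lists on this page, and does not check whether that particular tag is actually emitted.

```python
# First letter: which consensus the tag describes.
STRANDS = {
    "a": "first single-strand consensus",
    "b": "second single-strand consensus",
    "c": "duplex consensus",
}
# Second letter (case-insensitive): what the tag measures.
MEANINGS = {
    "d": "depth",
    "m": "min depth",
    "e": "errors/error-rate",
    "c": "bases",
    "q": "quals",
}

def describe_tag(tag):
    """Decode a two-letter consensus tag such as 'aD' or 'be'."""
    if len(tag) != 2 or tag[0] not in STRANDS or tag[1].lower() not in MEANINGS:
        raise ValueError(f"unrecognised consensus tag: {tag!r}")
    # Upper case second letter: one value per read; lower case: one per base.
    scope = "per read" if tag[1].isupper() else "per base"
    return STRANDS[tag[0]], MEANINGS[tag[1].lower()], scope

print(describe_tag("cD"))
print(describe_tag("be"))
```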

The tags break down into those that are single-valued per read:

consensus depth      [aD,bD,cD] (int)  : the maximum depth of raw reads at any point in the consensus reads
consensus min depth  [aM,bM,cM] (int)  : the minimum depth of raw reads at any point in the consensus reads
consensus error rate [aE,bE,cE] (float): the fraction of bases in raw reads disagreeing with the final consensus calls

And those that have a value per base (duplex values are not generated, but can be generated by summing):

consensus depth  [ad,bd] (short[]): the count of bases contributing to each single-strand consensus read at each position
consensus errors [ae,be] (short[]): the count of bases from raw reads disagreeing with the final single-strand consensus base
consensus bases  [ac,bc] (string) : the single-strand consensus bases
consensus quals  [aq,bq] (string) : the single-strand consensus qualities
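Since duplex per-base values are not written but can be obtained by summing, a downstream script might derive them as follows. This is a minimal sketch: duplex_per_base is a hypothetical helper, the arrays are made-up values, and it assumes the a and b arrays for a read have the same length and orientation.

```python
def duplex_per_base(a_values, b_values):
    """Sum per-base values from the two single-strand consensus reads."""
    if len(a_values) != len(b_values):
        raise ValueError("per-base arrays for the two strands must match in length")
    return [a + b for a, b in zip(a_values, b_values)]

# Made-up per-base tag values for one consensus read of length 4.
ad = [3, 3, 2, 3]  # depth, first single-strand consensus
bd = [2, 2, 2, 1]  # depth, second single-strand consensus
ae = [0, 1, 0, 0]  # errors, first single-strand consensus
be = [0, 0, 0, 0]  # errors, second single-strand consensus

duplex_depth = duplex_per_base(ad, bd)
duplex_errors = duplex_per_base(ae, be)
print(duplex_depth)   # [5, 5, 4, 4]
print(duplex_errors)  # [0, 1, 0, 0]
```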

The per base depths and errors are both capped at 32,767. In all cases no-calls (Ns) and bases below the min-input-base-quality are not counted in tag value calculations.
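The counting rule can be illustrated with a short sketch. It is not taken from fgumi's source: depth_at_position is a hypothetical function that counts the raw-read bases at one position, skipping Ns and bases below the minimum input base quality (10 by default, per the arguments table), and then applies the 32,767 cap.

```python
MAX_SHORT = 32_767  # cap stated above; the largest signed 16-bit value

def depth_at_position(bases, quals, min_input_base_quality=10):
    """Count usable raw-read bases at one position, capped at MAX_SHORT."""
    usable = sum(
        1
        for base, qual in zip(bases, quals)
        # Ns and bases below the quality threshold are not counted.
        if base.upper() != "N" and qual >= min_input_base_quality
    )
    return min(usable, MAX_SHORT)

# Four raw reads cover this position: one low-quality C and one N are skipped.
print(depth_at_position("ACNA", [30, 5, 30, 10]))  # 2
```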

The --min-reads option can take 1-3 values, similar to the filter command. For example:

fgumi duplex … --min-reads 10,5,3

If fewer than three values are supplied, the last value is repeated (i.e. 5,4 -> 5 4 4 and 1 -> 1 1 1). The first value applies to the final consensus read, the second value to one single-strand consensus, and the last value to the other single-strand consensus. It is required that if values two and three differ, the more stringent value comes earlier.
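The expansion rule can be sketched as below. parse_min_reads is a hypothetical helper, not fgumi's parser, and it assumes that "more stringent" means the larger minimum read count.

```python
def parse_min_reads(value):
    """Expand a --min-reads string into (duplex, strand_one, strand_two)."""
    parts = [int(p) for p in value.split(",")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("--min-reads takes between one and three values")
    while len(parts) < 3:
        parts.append(parts[-1])  # repeat the last supplied value
    duplex, first, second = parts
    # Assumption: the more stringent value is the larger one, so it must come first.
    if first < second:
        raise ValueError("the more stringent single-strand value must come first")
    return duplex, first, second

print(parse_min_reads("10,5,3"))  # (10, 5, 3)
print(parse_min_reads("5,4"))     # (5, 4, 4)
print(parse_min_reads("1"))       # (1, 1, 1)
```

Under this reading, 10,5,3 asks for at least 10 reads behind the final duplex consensus, at least 5 behind one single-strand consensus, and at least 3 behind the other.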

Arguments

| Flag | Description | Default |
|------|-------------|---------|
| -i, --input <INPUT> | Input BAM file | required |
| -o, --output <OUTPUT> | Output BAM file | required |
| --async-reader <ASYNC_READER> | Enable async userspace prefetch on the input BAM | false |
| -r, --rejects <REJECTS> | Optional output BAM file for rejected reads | |
| -s, --stats <STATS> | Optional output file for statistics | |
| -p, --read-name-prefix <READ_NAME_PREFIX> | Prefix for consensus read names | |
| -R, --read-group-id <READ_GROUP_ID> | Read group ID for consensus reads | A |
| -1, --error-rate-pre-umi <ERROR_RATE_PRE_UMI> | Phred-scaled error rate prior to UMI integration | 45 |
| -2, --error-rate-post-umi <ERROR_RATE_POST_UMI> | Phred-scaled error rate post UMI integration | 40 |
| -m, --min-input-base-quality <MIN_INPUT_BASE_QUALITY> | Minimum base quality in raw reads to use for consensus | 10 |
| -B, --output-per-base-tags <OUTPUT_PER_BASE_TAGS> | Produce per-base tags (cd, ce) in addition to per-read tags | true |
| --trim <TRIM> | Quality-trim reads before consensus calling (removes low-quality bases from ends) | false |
| --min-consensus-base-quality <MIN_CONSENSUS_BASE_QUALITY> | Minimum consensus base quality (output consensus bases below this are masked to N) | 2 |
| --consensus-call-overlapping-bases <CONSENSUS_CALL_OVERLAPPING_BASES> | Consensus call overlapping bases in read pairs before UMI consensus calling | true |
| --threads <THREADS> | Number of threads for the multi-threaded pipeline | |
| --compression-level <COMPRESSION_LEVEL> | Compression level for output BAM (0-12) | 1 |
| -M, --min-reads <MIN_READS> | Minimum reads for consensus calling. Can specify 1-3 values: [duplex] or [duplex, AB/BA] or [duplex, AB, BA] | 1 |
| --max-reads-per-strand <MAX_READS_PER_STRAND> | Maximum reads per strand (downsample if exceeded) | |
| --scheduler <SCHEDULER> | Scheduler strategy for thread work assignment | balanced-chase-drain |
| --pipeline-stats <PIPELINE_STATS> | Print detailed pipeline statistics at completion | false |
| --deadlock-timeout <DEADLOCK_TIMEOUT> | Timeout in seconds for deadlock detection (0 = disabled) | 10 |
| --deadlock-recover <DEADLOCK_RECOVER> | Enable automatic deadlock recovery (when false, detection only) | false |
| --queue-memory <QUEUE_MEMORY> | Pipeline queue memory limit per thread (default) or total | 768 |
| --queue-memory-per-thread <QUEUE_MEMORY_PER_THREAD> | Interpret --queue-memory as per-thread (true, default) or total (false) | true |
| --queue-memory-limit-mb <QUEUE_MEMORY_LIMIT_MB> | DEPRECATED: Use --queue-memory instead. Memory limit for pipeline queues in megabytes | |
| --methylation-mode <METHYLATION_MODE> | Methylation-aware consensus calling mode. EM-Seq: C→T at ref-C = unmethylated (enzymatic conversion); TAPs: C→T at ref-C = methylated. Emits MM/ML methylation tags and cu/ct per-base count tags on consensus reads. Requires --ref | |
| --ref <REFERENCE> | Path to the reference FASTA file (required when --methylation-mode is set) | |
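The two error-rate options are Phred-scaled, so their defaults of 45 and 40 correspond to small error probabilities. The standard conversion is p = 10^(-Q/10); the helper names below are illustrative only and not part of fgumi.

```python
import math

def phred_to_probability(q):
    """Convert a Phred-scaled value Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)

def probability_to_phred(p):
    """Convert an error probability back to a Phred-scaled value."""
    return -10 * math.log10(p)

# Defaults listed for --error-rate-pre-umi and --error-rate-post-umi.
for q in (45, 40):
    print(q, f"{phred_to_probability(q):.2e}")
```

A default of 40 therefore means roughly one error in 10,000 bases, and 45 roughly one in 31,600.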