duplex
Category: CONSENSUS
Call duplex consensus sequences from UMI-grouped reads
Description
Calls duplex consensus sequences from reads generated from the same double-stranded source molecule. Prior
to running this tool, reads must have been grouped with the group command using the paired strategy. By default,
grouping assigns MI tags of the form */A and */B to all reads, where the /A and /B suffixes on the same
identifier denote reads derived from opposite strands of the same source duplex molecule.
Reads from the same unique molecule are first partitioned by source strand and assembled into single strand consensus molecules as described by the simplex command. Subsequently, for molecules that have at least one observation of each strand, duplex consensus reads are assembled by combining the evidence from the two single strand consensus reads.
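The partitioning step described above can be sketched in a few lines of Python. This is only an illustration of the MI-tag convention, not fgumi's implementation: the function names `split_mi`, `partition_by_strand` and `duplex_candidates` are hypothetical, and the input is assumed to be a plain list of MI tag values already extracted from the BAM.

```python
from collections import defaultdict


def split_mi(mi):
    """Split an MI value such as '17/A' into (molecule_id, strand)."""
    molecule_id, sep, strand = mi.rpartition("/")
    if sep != "/" or not molecule_id or strand not in ("A", "B"):
        raise ValueError(f"MI tag {mi!r} does not end in /A or /B")
    return molecule_id, strand


def partition_by_strand(mi_values):
    """Count reads per source molecule and strand."""
    counts = defaultdict(lambda: {"A": 0, "B": 0})
    for mi in mi_values:
        molecule_id, strand = split_mi(mi)
        counts[molecule_id][strand] += 1
    return dict(counts)


def duplex_candidates(mi_values):
    """Return molecule ids with at least one observation of each strand."""
    counts = partition_by_strand(mi_values)
    return sorted(m for m, c in counts.items() if c["A"] > 0 and c["B"] > 0)


# Molecule 1 has both strands; molecule 2 only has /A reads.
print(duplex_candidates(["1/A", "1/B", "2/A", "2/A"]))
```

Only molecules returned by `duplex_candidates` would be eligible for a duplex consensus under this sketch; the actual consensus calling additionally depends on options such as --min-reads.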
Because of the nature of duplex sequencing, this tool does not support fragment reads; if found in the input, they are ignored. Similarly, read pairs for which a consensus read cannot be generated for one or the other read (R1 or R2) are omitted from the output.
The consensus reads produced are unaligned, because inferring the consensus alignment is difficult and error-prone. Consensus reads should therefore be aligned afterwards; this should not be too expensive, as there are likely far fewer consensus reads than raw input reads.
Consensus reads have a number of additional optional tags set in the resulting BAM file. The tag names follow a pattern where the first letter (a, b or c) denotes that the tag applies to the first single strand consensus (a), second single-strand consensus (b) or the final duplex consensus (c). The second letter is intended to capture the meaning of the tag (e.g. d=depth, m=min depth, e=errors/error-rate) and is upper case for values that are one per read and lower case for values that are one per base.
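As an illustration of this naming pattern, the sketch below decodes a two-letter tag into its parts. The helper `describe_tag` is hypothetical, and it only interprets the pattern for the d, m and e letters mentioned above; it does not check whether a given tag is actually written by the tool.

```python
# First letter: which consensus the tag describes.
SOURCES = {
    "a": "first single-strand consensus",
    "b": "second single-strand consensus",
    "c": "duplex consensus",
}

# Second letter (case-insensitive): what the value means.
MEANINGS = {"d": "depth", "m": "min depth", "e": "errors/error-rate"}


def describe_tag(tag):
    """Decode a tag such as 'cD' into (source, meaning, scope)."""
    if len(tag) != 2 or tag[0] not in SOURCES or tag[1].lower() not in MEANINGS:
        raise ValueError(f"{tag!r} does not follow the naming pattern")
    # Upper case means one value per read, lower case one value per base.
    scope = "per-read" if tag[1].isupper() else "per-base"
    return SOURCES[tag[0]], MEANINGS[tag[1].lower()], scope


print(describe_tag("cD"))
print(describe_tag("be"))
```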
The tags break down into those that are single-valued per read:
- consensus depth [aD,bD,cD] (int): the maximum depth of raw reads at any point in the consensus reads
- consensus min depth [aM,bM,cM] (int): the minimum depth of raw reads at any point in the consensus reads
- consensus error rate [aE,bE,cE] (float): the fraction of bases in raw reads disagreeing with the final consensus calls
And those that have a value per base (duplex values are not generated, but can be generated by summing):
- consensus depth [ad,bd] (short[]): the count of bases contributing to each single-strand consensus read at each position
- consensus errors [ae,be] (short[]): the count of bases from raw reads disagreeing with the final single-strand consensus base
- consensus bases [ac,bc] (string): the single-strand consensus bases
- consensus quals [aq,bq] (string): the single-strand consensus qualities
The per-base depths and errors are both capped at 32,767. In all cases, no-calls (Ns) and bases below --min-input-base-quality are not counted in tag value calculations.
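Since duplex per-base values are obtained by summing the two single-strand arrays, a minimal sketch of that summation follows. It assumes the ad/bd (or ae/be) arrays have already been read into Python lists of equal length and in the same orientation; the function names are hypothetical. Because each input is capped at 32,767, a sum at a saturated position is only a lower bound, so the sketch also reports those positions.

```python
CAP = 32_767  # per-base depths and errors are capped at this value


def sum_per_base(a_values, b_values):
    """Sum two per-base arrays (e.g. ad and bd) position by position."""
    if len(a_values) != len(b_values):
        raise ValueError("per-base arrays must have the same length")
    return [a + b for a, b in zip(a_values, b_values)]


def saturated_positions(a_values, b_values):
    """Positions where either input hit the cap, so the sum is a lower bound."""
    return [
        i
        for i, (a, b) in enumerate(zip(a_values, b_values))
        if a >= CAP or b >= CAP
    ]


ad = [3, 4, 5]
bd = [1, 0, 2]
print(sum_per_base(ad, bd))  # [4, 4, 7]
```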
The --min-reads option can take 1-3 values, similar to filter. For example:
fgumi duplex … --min-reads 10,5,3
If fewer than three values are supplied, the last value is repeated (i.e. 5,4 -> 5 4 4 and 1 -> 1 1 1). The
first value applies to the final consensus read, the second value to one single-strand consensus, and the last
value to the other single-strand consensus. It is required that if values two and three differ,
the more stringent value comes earlier.
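The expansion rule above can be expressed as a short Python sketch. The function `expand_min_reads` is hypothetical and only mirrors the rules stated here (repeat the last value, and require the second value to be at least as stringent as the third); it is not the tool's own argument parser.

```python
def expand_min_reads(spec):
    """Expand a --min-reads value such as '5,4' into (duplex, strand1, strand2)."""
    values = [int(v) for v in spec.split(",")]
    if not 1 <= len(values) <= 3:
        raise ValueError("--min-reads takes between one and three values")
    # Repeat the last value until three values are present.
    while len(values) < 3:
        values.append(values[-1])
    duplex, first_strand, second_strand = values
    # A higher minimum is more stringent, so it must come earlier.
    if first_strand < second_strand:
        raise ValueError("the more stringent single-strand value must come first")
    return duplex, first_strand, second_strand


print(expand_min_reads("5,4"))     # (5, 4, 4)
print(expand_min_reads("1"))       # (1, 1, 1)
print(expand_min_reads("10,5,3"))  # (10, 5, 3)
```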
Arguments
| Flag | Description | Default |
|---|---|---|
| -i, --input <INPUT> | Input BAM file | required |
| -o, --output <OUTPUT> | Output BAM file | required |
| --async-reader <ASYNC_READER> | Enable async userspace prefetch on the input BAM | false |
| -r, --rejects <REJECTS> | Optional output BAM file for rejected reads | |
| -s, --stats <STATS> | Optional output file for statistics | |
| -p, --read-name-prefix <READ_NAME_PREFIX> | Prefix for consensus read names | |
| -R, --read-group-id <READ_GROUP_ID> | Read group ID for consensus reads | A |
| -1, --error-rate-pre-umi <ERROR_RATE_PRE_UMI> | Phred-scaled error rate prior to UMI integration | 45 |
| -2, --error-rate-post-umi <ERROR_RATE_POST_UMI> | Phred-scaled error rate post UMI integration | 40 |
| -m, --min-input-base-quality <MIN_INPUT_BASE_QUALITY> | Minimum base quality in raw reads to use for consensus | 10 |
| -B, --output-per-base-tags <OUTPUT_PER_BASE_TAGS> | Produce per-base tags (cd, ce) in addition to per-read tags | true |
| --trim <TRIM> | Quality-trim reads before consensus calling (removes low-quality bases from ends) | false |
| --min-consensus-base-quality <MIN_CONSENSUS_BASE_QUALITY> | Minimum consensus base quality (output consensus bases below this are masked to N) | 2 |
| --consensus-call-overlapping-bases <CONSENSUS_CALL_OVERLAPPING_BASES> | Consensus call overlapping bases in read pairs before UMI consensus calling | true |
| --threads <THREADS> | Number of threads for the multi-threaded pipeline | |
| --compression-level <COMPRESSION_LEVEL> | Compression level for output BAM (0-12) | 1 |
| -M, --min-reads <MIN_READS> | Minimum reads for consensus calling. Can specify 1-3 values: [duplex] or [duplex, AB/BA] or [duplex, AB, BA] | 1 |
| --max-reads-per-strand <MAX_READS_PER_STRAND> | Maximum reads per strand (downsample if exceeded) | |
| --scheduler <SCHEDULER> | Scheduler strategy for thread work assignment | balanced-chase-drain |
| --pipeline-stats <PIPELINE_STATS> | Print detailed pipeline statistics at completion | false |
| --deadlock-timeout <DEADLOCK_TIMEOUT> | Timeout in seconds for deadlock detection (default: 10, 0 = disabled) | 10 |
| --deadlock-recover <DEADLOCK_RECOVER> | Enable automatic deadlock recovery (default: false, detection only) | false |
| --queue-memory <QUEUE_MEMORY> | Pipeline queue memory limit per thread (default) or total | 768 |
| --queue-memory-per-thread <QUEUE_MEMORY_PER_THREAD> | Interpret --queue-memory as per-thread (true, default) or total (false) | true |
| --queue-memory-limit-mb <QUEUE_MEMORY_LIMIT_MB> | DEPRECATED: Use --queue-memory instead. Memory limit for pipeline queues in megabytes | |
| --methylation-mode <METHYLATION_MODE> | Methylation-aware consensus calling mode. EM-Seq: C→T at ref-C = unmethylated (enzymatic conversion); TAPs: C→T at ref-C = methylated. Emits MM/ML methylation tags and cu/ct per-base count tags on consensus reads. Requires --ref | |
| --ref <REFERENCE> | Path to the reference FASTA file (required when --methylation-mode is set) | |
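
The --error-rate-pre-umi and --error-rate-post-umi defaults in the table are Phred-scaled. As a rough guide to what those numbers mean, the sketch below converts a Phred score to an error probability, assuming the standard definition Q = -10 * log10(p); the function name is hypothetical and this page does not itself spell out the formula.

```python
def phred_to_probability(q):
    """Convert a Phred-scaled score to an error probability (assumes Q = -10*log10(p))."""
    return 10 ** (-q / 10)


# Defaults from the arguments table: 45 (pre-UMI) and 40 (post-UMI).
for name, q in [("error-rate-pre-umi", 45), ("error-rate-post-umi", 40)]:
    print(f"{name}: Q{q} -> p = {phred_to_probability(q):.2e}")
```

Under that definition, Q40 corresponds to roughly one error in 10,000 bases, and Q45 to roughly one in 31,600.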