clip

Category: POST-CONSENSUS

Clip overlapping reads in BAM files

Description

Clips reads from the same template. Ensures that at least the requested minimum number of bases is clipped from each end of each read (i.e. R1 5’ end, R1 3’ end, R2 5’ end, and R2 3’ end). Optionally clips reads from the same template to eliminate overlap between them, so that downstream processes, particularly variant calling, cannot double-count evidence from the same template when both reads span a variant site.

Clipping overlapping reads is only performed on FR read pairs, and is implemented by clipping approximately half the overlapping bases from each read. By default soft clipping is performed.
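
The "approximately half from each read" rule can be illustrated with a minimal sketch. It works on reference coordinates only and ignores indels and existing clipping, which the real tool has to handle. The function name, and the choice to give the extra base of an odd-length overlap to the forward read, are assumptions made for illustration and are not taken from fgumi's implementation.

```python
def split_overlap(fwd_start, fwd_end, rev_start, rev_end):
    """Split the overlap of an FR pair roughly in half.

    Coordinates are 1-based, inclusive reference positions. Assumes the
    typical FR layout: fwd_start <= rev_start and fwd_end <= rev_end.

    Returns (bases to clip from the right end of the forward read,
             bases to clip from the left end of the reverse read).
    """
    overlap = fwd_end - rev_start + 1
    if overlap <= 0:
        return 0, 0  # the reads do not overlap, so nothing is clipped
    rev_clip = overlap // 2
    fwd_clip = overlap - rev_clip  # hypothetical tie-break: forward read takes the odd base
    return fwd_clip, rev_clip


# Forward read covers 100-199, reverse read covers 180-279: 20 bases overlap.
print(split_overlap(100, 199, 180, 279))  # (10, 10)
```

After clipping, the forward read in this example would end at position 189 and the reverse read would start at 190, so no reference position is covered by both reads.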

Secondary alignments and supplementary alignments are not clipped, but are passed through into the output.

In order to correctly clip reads by template and update mate information, the input BAM must be either queryname sorted or query grouped. If your input BAM is not in an appropriate order, the sort can be done in a streaming fashion with, for example:

fgumi sort -i in.bam --order queryname | fgumi clip -i /dev/stdin …
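
To decide whether the sort step is needed, one option is to inspect the `@HD` line of the BAM header (obtainable with, for example, `samtools view -H in.bam`). The sketch below relies on the `SO` and `GO` header tags defined by the SAM specification. Whether fgumi itself consults these tags is an assumption, and the function name is hypothetical.

```python
def is_clip_ready(hd_line: str) -> bool:
    """Return True if an @HD header line declares queryname sorting
    (SO:queryname) or query grouping (GO:query)."""
    # Skip the leading "@HD" token and parse the remaining TAG:VALUE fields.
    fields = dict(
        field.split(":", 1)
        for field in hd_line.rstrip("\n").split("\t")[1:]
        if ":" in field
    )
    return fields.get("SO") == "queryname" or fields.get("GO") == "query"


print(is_clip_ready("@HD\tVN:1.6\tSO:coordinate"))  # False
print(is_clip_ready("@HD\tVN:1.6\tSO:queryname"))   # True
```

A header can be wrong about the actual record order, so this is a convenience check and not a guarantee.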

The output sort order may be specified with --sort-order. If it is not given, the output will be in the same order as the input.

Any existing NM, UQ and MD tags are repaired, and mate-pair information is updated.

Three clipping modes are supported:

  1. soft - soft-clip the bases and qualities.
  2. soft-with-mask - soft-clip and mask the bases and qualities (make bases Ns and qualities the minimum).
  3. hard - hard-clip the bases and qualities.
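
The difference between the three modes can be seen on a toy read. This is a simplified sketch, not fgumi's code: it handles only a fully aligned read (CIGAR `<len>M`), clips from the end of the read as stored, and assumes a minimum quality of 2 for masked bases. That value and the function name are assumptions.

```python
MIN_QUAL = 2  # assumed "minimum" quality for masked bases; not confirmed by the source


def clip_end(bases, quals, n, mode):
    """Clip n bases from the end of a read whose CIGAR is '<len>M'.

    Returns (bases, quals, cigar) after clipping in the given mode.
    """
    if not 0 < n < len(bases):
        raise ValueError("n must be between 1 and len(bases) - 1")
    keep = len(bases) - n
    if mode == "soft":
        # Bases and qualities are retained; only the CIGAR changes.
        return bases, quals, f"{keep}M{n}S"
    if mode == "soft-with-mask":
        # Same CIGAR as soft, but the clipped bases become N at minimum quality.
        return bases[:keep] + "N" * n, quals[:keep] + [MIN_QUAL] * n, f"{keep}M{n}S"
    if mode == "hard":
        # Clipped bases and qualities are removed from the record.
        return bases[:keep], quals[:keep], f"{keep}M{n}H"
    raise ValueError(f"unknown clipping mode: {mode}")


for mode in ("soft", "soft-with-mask", "hard"):
    print(mode, clip_end("ACGTACGT", [30] * 8, 3, mode))
```

Soft clipping is reversible because the bases remain in the record, while hard clipping discards them, which is why the modes can be ordered by stringency.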

The --upgrade-clipping parameter converts all existing clipping in the input to the given more stringent mode: from soft to either soft-with-mask or hard, and from soft-with-mask to hard. In all other cases, existing clipping is left unchanged. The upgrade is applied before any other clipping criteria.
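
The upgrade rule amounts to an ordering of the modes by stringency. A minimal sketch of that rule follows; the names are hypothetical and it models only the mode transition, not the rewriting of records.

```python
# Modes ordered from least to most stringent, as described above.
STRINGENCY = {"soft": 0, "soft-with-mask": 1, "hard": 2}


def upgraded_mode(existing: str, target: str) -> str:
    """Return the mode that existing clipping ends up in when upgrading to `target`.

    Clipping is only ever made more stringent; otherwise it stays as it is.
    """
    return target if STRINGENCY[target] > STRINGENCY[existing] else existing


print(upgraded_mode("soft", "hard"))  # hard
print(upgraded_mode("hard", "soft"))  # hard
```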

Arguments

| Flag | Description | Default |
|------|-------------|---------|
| `-i, --input <INPUT>` | Input BAM file | required |
| `-o, --output <OUTPUT>` | Output BAM file | required |
| `--async-reader <ASYNC_READER>` | Enable async userspace prefetch on the input BAM | false |
| `-r, --reference <REFERENCE>` | Reference FASTA file (required for tag regeneration) | required |
| `-c, --clipping-mode <CLIPPING_MODE>` | Clipping mode: soft, soft-with-mask, or hard | hard |
| `-S, --sort-order <SORT_ORDER>` | Output sort order (if not specified, output is in same order as input) | |
| `--clip-overlapping-reads <CLIP_OVERLAPPING_READS>` | Clip overlapping read pairs | false |
| `--clip-bases-past-mate <CLIP_EXTENDING_PAST_MATE>` | Clip reads that extend past their mate’s start position | false |
| `--read-one-five-prime <READ_ONE_FIVE_PRIME>` | Minimum bases to clip from 5’ end of R1 | 0 |
| `--read-one-three-prime <READ_ONE_THREE_PRIME>` | Minimum bases to clip from 3’ end of R1 | 0 |
| `--read-two-five-prime <READ_TWO_FIVE_PRIME>` | Minimum bases to clip from 5’ end of R2 | 0 |
| `--read-two-three-prime <READ_TWO_THREE_PRIME>` | Minimum bases to clip from 3’ end of R2 | 0 |
| `-H, --upgrade-clipping <UPGRADE_CLIPPING>` | Upgrade existing clipping to the specified clipping mode | false |
| `-a, --auto-clip-attributes <AUTO_CLIP_ATTRIBUTES>` | Automatically clip extended attributes that match read length | false |
| `-m, --metrics <METRICS>` | Output file for clipping metrics | |
| `--threads <THREADS>` | Number of threads for the multi-threaded pipeline | |
| `--compression-level <COMPRESSION_LEVEL>` | Compression level for output BAM (1-12) | 1 |
| `--scheduler <SCHEDULER>` | Scheduler strategy for thread work assignment | balanced-chase-drain |
| `--pipeline-stats <PIPELINE_STATS>` | Print detailed pipeline statistics at completion | false |
| `--deadlock-timeout <DEADLOCK_TIMEOUT>` | Timeout in seconds for deadlock detection (default: 10, 0 = disabled) | 10 |
| `--deadlock-recover <DEADLOCK_RECOVER>` | Enable automatic deadlock recovery (default: false, detection only) | false |
| `--queue-memory <QUEUE_MEMORY>` | Pipeline queue memory limit per thread (default) or total | 768 |
| `--queue-memory-per-thread <QUEUE_MEMORY_PER_THREAD>` | Interpret `--queue-memory` as per-thread (true, default) or total (false) | true |
| `--queue-memory-limit-mb <QUEUE_MEMORY_LIMIT_MB>` | DEPRECATED: Use `--queue-memory` instead. Memory limit for pipeline queues in megabytes | |