clip
Category: POST-CONSENSUS
Clip overlapping reads in BAM files
Description
Clips reads from the same template. Ensures that at least N bases are clipped from each end of each read (i.e. the R1 5’ end, R1 3’ end, R2 5’ end, and R2 3’ end), where N is set per end with --read-one-five-prime, --read-one-three-prime, --read-two-five-prime, and --read-two-three-prime. Optionally, also clips reads from the same template to eliminate overlap between them. This ensures that downstream processes, particularly variant calling, cannot double-count evidence from a template when both of its reads span a variant site.
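A minimal sketch of how the per-end minimums could translate into extra clipping on one alignment. It assumes, as a labeled assumption, that existing soft and hard clipping counts toward the requested minimum, and that the 5' end is the left side for forward-strand reads and the right side for reverse-strand reads. The function name extra_clipping is hypothetical, not part of fgumi.

```python
# Hypothetical sketch: turn per-end minimums (e.g. --read-one-five-prime and
# --read-one-three-prime for R1) into additional bases to clip on each side
# of an alignment. Assumption: existing clipping counts toward the minimum.

def extra_clipping(is_reverse: bool,
                   existing_left: int,
                   existing_right: int,
                   min_five_prime: int,
                   min_three_prime: int) -> tuple[int, int]:
    """Return (extra_left, extra_right) in alignment (reference) orientation."""
    # Forward strand: 5' is the left side. Reverse strand: 5' is the right side.
    if is_reverse:
        min_left, min_right = min_three_prime, min_five_prime
    else:
        min_left, min_right = min_five_prime, min_three_prime
    # Only clip what is still missing to reach each minimum.
    extra_left = max(0, min_left - existing_left)
    extra_right = max(0, min_right - existing_right)
    return extra_left, extra_right


# Forward-strand read with 2 bases already clipped on the left,
# requesting 5 bases at the 5' end and 3 bases at the 3' end.
print(extra_clipping(False, 2, 0, 5, 3))  # (3, 3)
# The same request on a reverse-strand read swaps which side is 5'.
print(extra_clipping(True, 2, 0, 5, 3))   # (1, 5)
```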
Clipping of overlapping reads is only performed on FR read pairs, and is implemented by clipping approximately half of the overlapping bases from each read. By default, hard clipping is performed (see --clipping-mode).
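One way to split an FR overlap "approximately in half" can be sketched in reference coordinates. This is an illustration under stated assumptions, not fgumi's exact tie-breaking: spans are 1-based inclusive, indels are ignored, and the forward read is assumed to start and end no later than its reverse mate. Each read loses bases from its own 3' end, which is the right side of the forward read and the left side of the reverse read. The name split_overlap is hypothetical.

```python
# Hypothetical sketch: split the overlap of an FR pair roughly in half.
# Spans are 1-based inclusive reference (start, end) positions; indels ignored.

def split_overlap(fwd: tuple[int, int], rev: tuple[int, int]) -> tuple[int, int]:
    """Return (bases to clip from fwd's 3' end, bases to clip from rev's 3' end)."""
    f_start, f_end = fwd
    r_start, r_end = rev
    # Assumption: reads extending past their mate are a separate case.
    if not (f_start <= r_start and f_end <= r_end):
        raise ValueError("expected the forward read to start and end before its mate")
    overlap = f_end - r_start + 1
    if overlap <= 0:
        return 0, 0                    # the reads do not overlap
    clip_fwd = overlap // 2            # forward read gives up the smaller half
    clip_rev = overlap - clip_fwd      # reverse read gives up the rest
    return clip_fwd, clip_rev


print(split_overlap((100, 199), (151, 249)))  # (24, 25)
```

With these inputs the forward read would end at position 175 and the reverse read would start at 176, so the two reads abut without sharing any reference position.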
Secondary and supplementary alignments are not clipped, but are passed through to the output.
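The SAM specification marks secondary alignments with FLAG bit 0x100 and supplementary alignments with bit 0x800, so a pass-through check can be sketched as a bit test. The function name is_clip_candidate is hypothetical and only illustrates the rule described above.

```python
# Hypothetical sketch: primary alignments are clipping candidates;
# secondary (0x100) and supplementary (0x800) records pass through unchanged.
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_clip_candidate(flag: int) -> bool:
    """True if neither the secondary nor the supplementary bit is set."""
    return not (flag & (SECONDARY | SUPPLEMENTARY))


print(is_clip_candidate(99))    # True  (paired, proper pair, mate reverse, first in pair)
print(is_clip_candidate(355))   # False (99 | 0x100: secondary)
print(is_clip_candidate(2147))  # False (99 | 0x800: supplementary)
```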
In order to correctly clip reads by template and update mate information, the input BAM must be either queryname sorted or query grouped. If your input BAM is not in an appropriate order, it can be sorted in a streaming fashion, for example:
fgumi sort -i in.bam --order queryname | fgumi clip -i /dev/stdin …
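Queryname sorting or query grouping matters because every record of a template must be adjacent for both mates to be clipped and updated together. A minimal sketch of consuming such input, assuming records are reduced to their query names: it yields one group per template and rejects input where a name reappears after its group has ended. The function name templates is hypothetical, and a real tool would not need to keep every name in memory.

```python
from itertools import groupby
from typing import Iterable, Iterator


def templates(names: Iterable[str]) -> Iterator[list[str]]:
    """Yield runs of consecutive records sharing a query name.

    Raises ValueError if a name reappears after its run has ended, meaning
    the input is neither queryname sorted nor query grouped.
    """
    seen: set[str] = set()
    for name, run in groupby(names):
        if name in seen:
            raise ValueError(f"{name} is not contiguous; input is not query grouped")
        seen.add(name)
        yield list(run)


print(list(templates(["q1", "q1", "q2", "q3", "q3"])))
# [['q1', 'q1'], ['q2'], ['q3', 'q3']]
```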
The output sort order may be specified with --sort-order. If it is not given, the output will be in the same order as the input.
Any existing NM, UQ and MD tags are repaired, and mate-pair information is updated.
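Tags such as NM must be recomputed because clipping changes which bases are aligned. NM is the edit distance to the reference (mismatches plus inserted and deleted bases), excluding clipped bases. A simplified sketch, not fgumi's implementation: it treats any case-insensitive base difference as a mismatch and does not handle UQ or MD. The function name edit_distance is hypothetical.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def edit_distance(read: str, ref: str, pos: int, cigar: str) -> int:
    """Compute an NM-style edit distance; pos is the 0-based reference start."""
    nm = 0
    r = 0      # index into the read
    g = pos    # index into the reference
    for length_str, op in CIGAR_RE.findall(cigar):
        n = int(length_str)
        if op in "M=X":
            nm += sum(1 for i in range(n) if read[r + i].upper() != ref[g + i].upper())
            r += n
            g += n
        elif op == "I":
            nm += n
            r += n
        elif op == "D":
            nm += n
            g += n
        elif op == "N":
            g += n          # skipped reference region: not an edit
        elif op == "S":
            r += n          # soft-clipped bases do not count
        # H and P consume neither read nor reference
    return nm


ref = "ACGTACGTAC"
print(edit_distance("TTACGTACA", ref, 0, "2S7M"))    # 1: last aligned base mismatches
print(edit_distance("TTACGTACA", ref, 0, "2S6M1S"))  # 0: that base is now clipped
```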
Three clipping modes are supported:
- soft: soft-clip the bases and qualities.
- soft-with-mask: soft-clip and mask the bases and qualities (bases become Ns and qualities are set to the minimum).
- hard: hard-clip the bases and qualities.
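The three modes differ in what happens to SEQ, QUAL, and CIGAR. A minimal sketch for clipping n bases from the left of an alignment, under simplifying assumptions: the read has no existing clipping, and its CIGAR (a list of (length, op) pairs) begins with an M of at least n bases. The minimum quality of 2 used for masking is an assumption standing in for "the minimum", and clip_left is a hypothetical name. Clipping aligned bases on the left would also move the alignment start n positions to the right, which the sketch does not track.

```python
def clip_left(bases: str, quals: list[int], cigar: list[tuple[int, str]],
              n: int, mode: str, min_qual: int = 2):
    """Clip n leading aligned bases; returns (bases, quals, cigar).

    min_qual is an assumed value for the masked ("minimum") quality.
    """
    first_len, first_op = cigar[0]
    if first_op != "M" or first_len < n:
        raise ValueError("sketch only handles a leading M of at least n bases")
    rest = ([(first_len - n, "M")] if first_len > n else []) + cigar[1:]
    if mode == "soft":             # keep bases and qualities, mark them S
        return bases, quals, [(n, "S")] + rest
    if mode == "soft-with-mask":   # keep them as S, but mask to N / minimum quality
        return "N" * n + bases[n:], [min_qual] * n + quals[n:], [(n, "S")] + rest
    if mode == "hard":             # drop them from SEQ and QUAL, mark them H
        return bases[n:], quals[n:], [(n, "H")] + rest
    raise ValueError(f"unknown clipping mode: {mode}")


for mode in ("soft", "soft-with-mask", "hard"):
    print(mode, clip_left("ACGTACGT", [30] * 8, [(8, "M")], 2, mode))
```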
The --upgrade-clipping parameter converts all existing clipping in the input to the more stringent mode given by --clipping-mode: soft clipping is upgraded to either soft-with-mask or hard, and soft-with-mask clipping is upgraded to hard. In all other cases, existing clipping is left unchanged. This upgrade is applied before any other clipping criteria.
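The upgrade rule amounts to ordering the modes by stringency and only ever moving existing clipping up that order. A minimal sketch of that rule; the names STRINGENCY and upgraded_mode are hypothetical.

```python
# Clipping modes ordered from least to most stringent.
STRINGENCY = {"soft": 0, "soft-with-mask": 1, "hard": 2}


def upgraded_mode(existing: str, requested: str) -> str:
    """Mode that existing clipping ends up in; clipping is never relaxed."""
    if STRINGENCY[requested] > STRINGENCY[existing]:
        return requested
    return existing


print(upgraded_mode("soft", "hard"))            # hard
print(upgraded_mode("hard", "soft"))            # hard (unchanged)
print(upgraded_mode("soft-with-mask", "soft"))  # soft-with-mask (unchanged)
```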
Arguments
| Flag | Description | Default |
|---|---|---|
-i, --input <INPUT> | Input BAM file | required |
-o, --output <OUTPUT> | Output BAM file | required |
--async-reader <ASYNC_READER> | Enable async userspace prefetch on the input BAM | false |
-r, --reference <REFERENCE> | Reference FASTA file (required for tag regeneration) | required |
-c, --clipping-mode <CLIPPING_MODE> | Clipping mode: soft, soft-with-mask, or hard | hard |
-S, --sort-order <SORT_ORDER> | Output sort order (if not specified, output is in same order as input) | |
--clip-overlapping-reads <CLIP_OVERLAPPING_READS> | Clip overlapping read pairs | false |
--clip-bases-past-mate <CLIP_EXTENDING_PAST_MATE> | Clip reads that extend past their mate’s start position | false |
--read-one-five-prime <READ_ONE_FIVE_PRIME> | Minimum bases to clip from 5’ end of R1 | 0 |
--read-one-three-prime <READ_ONE_THREE_PRIME> | Minimum bases to clip from 3’ end of R1 | 0 |
--read-two-five-prime <READ_TWO_FIVE_PRIME> | Minimum bases to clip from 5’ end of R2 | 0 |
--read-two-three-prime <READ_TWO_THREE_PRIME> | Minimum bases to clip from 3’ end of R2 | 0 |
-H, --upgrade-clipping <UPGRADE_CLIPPING> | Upgrade existing clipping to the specified clipping mode | false |
-a, --auto-clip-attributes <AUTO_CLIP_ATTRIBUTES> | Automatically clip extended attributes that match read length | false |
-m, --metrics <METRICS> | Output file for clipping metrics | |
--threads <THREADS> | Number of threads for the multi-threaded pipeline | |
--compression-level <COMPRESSION_LEVEL> | Compression level for output BAM (1-12) | 1 |
--scheduler <SCHEDULER> | Scheduler strategy for thread work assignment | balanced-chase-drain |
--pipeline-stats <PIPELINE_STATS> | Print detailed pipeline statistics at completion | false |
--deadlock-timeout <DEADLOCK_TIMEOUT> | Timeout in seconds for deadlock detection (0 disables detection) | 10 |
--deadlock-recover <DEADLOCK_RECOVER> | Enable automatic deadlock recovery (when false, deadlocks are only detected) | false |
--queue-memory <QUEUE_MEMORY> | Pipeline queue memory limit per thread (default) or total | 768 |
--queue-memory-per-thread <QUEUE_MEMORY_PER_THREAD> | Interpret --queue-memory as per-thread (true, default) or total (false) | true |
--queue-memory-limit-mb <QUEUE_MEMORY_LIMIT_MB> | DEPRECATED: Use --queue-memory instead. Memory limit for pipeline queues in megabytes | |
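The --clip-bases-past-mate option targets FR pairs where one read's alignment runs beyond the 5' end of its mate (its "start" in read orientation). A minimal sketch, assuming 1-based inclusive reference spans and ignoring indels: the forward read's excess lies past the mate's rightmost position, and the reverse read's excess lies before the mate's leftmost position, so each read is trimmed from its own 3' end. The function name clip_past_mate is hypothetical.

```python
def clip_past_mate(fwd: tuple[int, int], rev: tuple[int, int]) -> tuple[int, int]:
    """Return (bases to clip from fwd's 3' end, bases to clip from rev's 3' end)."""
    f_start, f_end = fwd
    r_start, r_end = rev
    # Forward read: its 3' end is on the right; trim anything past the mate's end.
    clip_fwd = max(0, f_end - r_end)
    # Reverse read: its 3' end is on the left; trim anything before the mate's start.
    clip_rev = max(0, f_start - r_start)
    return clip_fwd, clip_rev


print(clip_past_mate((100, 260), (90, 250)))   # (10, 10)
print(clip_past_mate((100, 199), (150, 249)))  # (0, 0): neither read extends past its mate
```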