simplex-metrics

Category: POST-CONSENSUS

Collect QC metrics for simplex sequencing data

Description

Collects a suite of metrics to QC simplex sequencing data.

Inputs

The input to this tool must be a BAM file that is either:

  1. The exact BAM output by the group tool (in the sort-order it was produced in)
  2. A BAM file that has MI tags present on all reads (usually set by group) and has been sorted into template-coordinate order

Calculation of metrics may be restricted to a set of regions using the --intervals parameter. This can significantly affect results as off-target reads often have very different properties than on-target reads due to the lack of enrichment.

Outputs

The following output files are produced:

  1. <output>.family_sizes.txt: metrics on the frequency of CS and SS families of different sizes
  2. <output>.simplex_yield_metrics.txt: summary QC metrics produced using 5%, 10%, 15%…100% of the data
  3. <output>.umi_counts.txt: metrics on the frequency of observations of UMIs within reads and tag families
  4. <output>.simplex_qc.pdf: (optional) a series of plots generated from the preceding metrics files for visualization. This file is only produced if R is available with the required packages (ggplot2 and scales). Use --description to customize plot titles.
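As a rough illustration of the naming scheme above, the following sketch derives the expected file names from an output prefix and lists the data fractions used for the yield metrics. The helper names `expected_outputs` and `yield_fractions` are hypothetical, and the 5% step size is an assumption read from the "5%, 10%, 15%…100%" pattern rather than something the tool documents explicitly.

```python
def expected_outputs(prefix: str) -> list[str]:
    """Return the file names the tool is documented to write for a prefix."""
    suffixes = [
        ".family_sizes.txt",
        ".simplex_yield_metrics.txt",
        ".umi_counts.txt",
        # Optional: only produced when R with ggplot2 and scales is available.
        ".simplex_qc.pdf",
    ]
    return [prefix + suffix for suffix in suffixes]


def yield_fractions() -> list[float]:
    """Fractions of the data, assuming 5% increments from 5% to 100%."""
    return [pct / 100 for pct in range(5, 101, 5)]


if __name__ == "__main__":
    for path in expected_outputs("sample1"):
        print(path)
    print(len(yield_fractions()), "fractions")
```

A check like this can be useful in a pipeline to confirm that the three text files exist after a run, while treating the PDF as optional.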

Within the metrics files the prefixes CS and SS are used to mean:

  • CS: tag families where membership is defined solely on matching genome coordinates and strand
  • SS: single-stranded tag families where membership is defined by genome coordinates, strand and UMI
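The difference between the two family types can be sketched on toy data. This is a minimal illustration of the definitions above, not the tool's implementation: the read tuple layout `(chrom, start, end, strand, umi)` and the helper names are hypothetical, and the reading of `--min-reads` as a simple size threshold on SS families is an assumption.

```python
from collections import Counter

# Toy reads: (chrom, start, end, strand, umi). The layout is hypothetical.
reads = [
    ("chr1", 100, 250, "+", "ACGT"),
    ("chr1", 100, 250, "+", "ACGT"),
    ("chr1", 100, 250, "+", "TTGA"),
    ("chr1", 300, 450, "-", "ACGT"),
]

# CS: membership defined by genome coordinates and strand only.
cs_families = Counter(read[:4] for read in reads)

# SS: membership defined by genome coordinates, strand and UMI.
ss_families = Counter(reads)


def size_histogram(families: Counter) -> dict[int, int]:
    """Map family size -> number of families of that size."""
    return dict(sorted(Counter(families.values()).items()))


def families_passing(families: Counter, min_reads: int = 1) -> int:
    """Count families with at least min_reads reads (assumed threshold rule)."""
    return sum(1 for size in families.values() if size >= min_reads)


cs_hist = size_histogram(cs_families)  # {1: 1, 3: 1}
ss_hist = size_histogram(ss_families)  # {1: 2, 2: 1}
```

In this toy data the three reads at chr1:100-250 on the plus strand form one CS family of size 3, but split into two SS families (sizes 2 and 1) once the UMI is taken into account.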

Arguments

| Flag | Description | Default |
| --- | --- | --- |
| -i, --input <INPUT> | Input BAM file (UMI-grouped, from group) | required |
| -o, --output <OUTPUT> | Output prefix for metrics files | required |
| --min-reads <MIN_READS> | Minimum reads per SS family to count as a consensus family in yield metrics | 1 |
| -l, --intervals <INTERVALS> | Optional intervals file to restrict analysis (BED or Picard interval list format) | |
| --description <DESCRIPTION> | Optional sample name or description for PDF plot titles | |
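When the tool is driven from a script, the arguments above can be assembled programmatically. The sketch below builds an argument list from the documented flags only. The function name `build_args` is hypothetical, and the name of the executable that hosts `simplex-metrics` is not given on this page, so it is assumed to be supplied by the caller and that `simplex-metrics` is passed as a subcommand.

```python
from typing import Optional


def build_args(
    executable: str,
    input_bam: str,
    output_prefix: str,
    min_reads: int = 1,
    intervals: Optional[str] = None,
    description: Optional[str] = None,
) -> list[str]:
    """Assemble an argument list using only the flags documented above."""
    if min_reads < 1:
        raise ValueError("min_reads must be at least 1")
    args = [
        executable,          # assumption: caller supplies the real executable name
        "simplex-metrics",   # assumption: invoked as a subcommand
        "--input", input_bam,
        "--output", output_prefix,
        "--min-reads", str(min_reads),
    ]
    # Optional flags are added only when a value is provided.
    if intervals is not None:
        args += ["--intervals", intervals]
    if description is not None:
        args += ["--description", description]
    return args


if __name__ == "__main__":
    # "TOOL" is a placeholder, not a real executable name.
    print(build_args("TOOL", "grouped.bam", "sample1", intervals="targets.bed"))
```

The resulting list can be passed to `subprocess.run(args, check=True)` once the placeholder executable is replaced with the actual one.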