review

Category: POST-CONSENSUS

Extract data to review variant calls from consensus reads

Description

Extracts data that makes it easier to review variant calls made from consensus reads.

Creates a list of variant sites from the input VCF (SNPs only) or IntervalList, then extracts all consensus reads that do not contain the reference allele at those sites, along with all raw reads that contributed to those consensus reads. The extracted consensus reads therefore include any that carry the alternate allele, a third allele, a no-call, or a spanning deletion at the variant site.
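The selection rule above can be sketched as a small predicate. This is an illustrative sketch only, not the tool's implementation: the function name `is_non_reference`, the use of `None` to represent a spanning deletion, and the way `ignore_ns` interacts with no-calls (treating N as "skip this read") are all assumptions made for the example.

```python
from typing import Optional


def is_non_reference(base: Optional[str], ref_base: str, ignore_ns: bool = False) -> bool:
    """Return True if a consensus read would be extracted for a variant site.

    `base` is the read's base at the site, or None when the read spans the
    site with a deletion (hypothetical representation for this sketch).
    """
    if base is None:
        # Spanning deletion: the read does not carry the reference allele.
        return True
    base = base.upper()
    if base == "N":
        # No-call: extracted unless N bases are being ignored (assumed
        # meaning of --ignore-ns for this sketch).
        return not ignore_ns
    # Alternate or third allele: anything that differs from the reference.
    return base != ref_base.upper()
```

For a site with reference base C, a read showing T, G, N, or a deletion is kept by this predicate, while a read showing C is not.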

Reads are correlated between the consensus and grouped BAMs using a molecule ID stored in an optional attribute (MI by default). To support paired molecule IDs, where two or more molecule IDs are related (e.g. see the Paired assignment strategy in group), the molecule ID is truncated at the last / if one is present (e.g. 1/A => 1 and 2 => 2).
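The truncation rule can be illustrated with a short helper. This is a minimal sketch of the rule as described here; the function names `base_molecule_id` and `group_by_molecule` are hypothetical and do not come from the tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def base_molecule_id(mi: str) -> str:
    """Truncate a molecule ID at its last '/', if it has one."""
    head, sep, _tail = mi.rpartition("/")
    # rpartition returns an empty separator when '/' is absent.
    return head if sep else mi


def group_by_molecule(reads: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (read_name, MI value) pairs by truncated molecule ID."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, mi in reads:
        groups[base_molecule_id(mi)].append(name)
    return dict(groups)
```

With this rule, raw reads tagged 1/A and 1/B both map to molecule 1, so they are matched to the same consensus molecule.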

Both input BAMs must be coordinate sorted and indexed.

Output Files

A pair of output BAMs is created:

  • <output>.consensus.bam: Contains the relevant consensus reads from the consensus BAM
  • <output>.grouped.bam: Contains the relevant raw reads from the grouped BAM
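As a quick illustration of the naming scheme, the following sketch derives the file names from an output prefix, including the <output>.txt review file documented in this section. The helper name `output_paths` and the dictionary keys are hypothetical.

```python
from typing import Dict


def output_paths(prefix: str) -> Dict[str, str]:
    """Map an output prefix to the file names documented in this section."""
    return {
        "consensus_bam": f"{prefix}.consensus.bam",  # relevant consensus reads
        "grouped_bam": f"{prefix}.grouped.bam",      # relevant raw reads
        "review_txt": f"{prefix}.txt",               # per-variant review file
    }
```

For example, a prefix of `out/sample1` gives `out/sample1.consensus.bam`, `out/sample1.grouped.bam`, and `out/sample1.txt`.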

A review file <output>.txt is also created. The review file contains details on each variant position along with detailed information on each consensus read that supports the variant. If the --sample argument is supplied and the input is a VCF, genotype information for that sample will be retrieved. If the sample name isn’t supplied and the VCF contains only a single sample, that sample’s genotypes will be used.
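The sample-selection behaviour described above can be sketched as follows. The function name `select_sample` is hypothetical, and two behaviours are assumptions not stated in this section: raising an error when the requested sample is absent from the VCF, and returning no sample when a multi-sample VCF is given without --sample.

```python
from typing import Optional, Sequence


def select_sample(vcf_samples: Sequence[str], requested: Optional[str] = None) -> Optional[str]:
    """Pick the sample whose genotypes would be reported, or None."""
    if requested is not None:
        if requested not in vcf_samples:
            # Assumption for this sketch: an unknown sample name is an error.
            raise ValueError(f"sample {requested!r} not found in VCF")
        return requested
    if len(vcf_samples) == 1:
        # No name supplied, single-sample VCF: use that sample.
        return vcf_samples[0]
    # Assumption for this sketch: otherwise no genotypes are retrieved.
    return None
```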

The --maf parameter controls which variants get detailed per-read information in the output file. Only variants with a minor allele frequency at or below this threshold will have detailed information written.
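The threshold check is inclusive ("at or below"). A minimal sketch, assuming the minor allele frequency has already been computed for each variant; the function name `wants_detail` is hypothetical, and how the tool itself derives the MAF is not covered here.

```python
def wants_detail(maf: float, threshold: float = 0.05) -> bool:
    """Return True if a variant would get detailed per-read output."""
    # Inclusive comparison: a variant exactly at the threshold qualifies.
    return maf <= threshold
```

With the default threshold, a variant at 0.05 receives detailed output, while one at 0.2 does not.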

Arguments

| Flag | Description | Default |
|------|-------------|---------|
| -i, --input <INPUT> | Input VCF or IntervalList of variant locations | required |
| -c, --consensus-bam <CONSENSUS_BAM> | BAM file of consensus reads used to call variants | required |
| -g, --grouped-bam <GROUPED_BAM> | BAM file of grouped raw reads used to build consensuses | required |
| -r, --ref <REFERENCE> | Reference FASTA file | required |
| -o, --output <OUTPUT> | Output prefix for generated files | required |
| -s, --sample <SAMPLE> | Name of sample being reviewed (for VCF genotype extraction) | |
| -N, --ignore-ns <IGNORE_NS> | Ignore N bases in consensus reads | false |
| -m, --maf <MAF> | Only output detailed information for variants at or below this MAF | 0.05 |