review
Category: POST-CONSENSUS
Extract data to review variant calls from consensus reads
Description
Extracts data that makes it easier to review variant calls made from consensus reads.
Creates a list of variant sites from the input VCF (SNPs only) or IntervalList, then extracts all the consensus reads that do not contain a reference allele at the variant sites, along with all raw reads that contributed to those consensus reads. This includes consensus reads that carry the alternate allele, a third allele, a no-call or a spanning deletion at the variant site.
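The selection rule above can be sketched as a small predicate. This is an illustrative sketch only, not the tool's implementation: the function name `keep_consensus_read`, the use of `None` to represent a spanning deletion, and the assumption that `--ignore-ns` causes no-call (N) reads to be skipped are all hypothetical.

```python
def keep_consensus_read(base_at_site, ref_base, ignore_ns=False):
    """Return True if a consensus read would be extracted for review.

    base_at_site: the read's base at the variant site, "N" for a no-call,
                  or None for a spanning deletion (hypothetical encoding).
    ref_base:     the reference base at the site.
    ignore_ns:    assumed to mean that no-call reads are skipped.
    """
    if base_at_site is None:
        # Spanning deletion: the read does not carry the reference allele.
        return True
    base = base_at_site.upper()
    if base == "N":
        # No-call: kept unless N bases are being ignored (assumption).
        return not ignore_ns
    # Alternate or third allele: anything that differs from the reference.
    return base != ref_base.upper()


kept = [b for b in ["A", "T", "G", "N", None] if keep_consensus_read(b, "A")]
```

With a reference base of `A`, the read carrying `A` is dropped and the reads carrying `T`, `G`, `N` and the spanning deletion are kept.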
Reads are correlated between consensus and grouped BAMs using a molecule ID stored in an optional
attribute, MI by default. In order to support paired molecule IDs where two or more molecule IDs
are related (e.g. see the Paired assignment strategy in group) the molecule ID is truncated at
the last / if present (e.g. 1/A => 1 and 2 => 2).
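As a rough illustration of the truncation rule, the following sketch strips everything from the last `/` onward and groups reads by the resulting ID. The function name `base_molecule_id` and the sample read names are hypothetical; only the `1/A => 1` and `2 => 2` behaviour comes from the description above.

```python
from collections import defaultdict


def base_molecule_id(mi):
    """Truncate a molecule ID at the last '/', if one is present."""
    head, sep, _ = mi.rpartition("/")
    # rpartition returns an empty separator when '/' does not occur,
    # in which case the ID is returned unchanged.
    return head if sep else mi


# Hypothetical (read name, MI value) pairs from a grouped BAM.
reads = [("r1", "1/A"), ("r2", "1/B"), ("r3", "2")]

by_molecule = defaultdict(list)
for name, mi in reads:
    by_molecule[base_molecule_id(mi)].append(name)
```

Here `r1` and `r2` end up under molecule `1`, so both strands of a paired molecule are correlated with the same consensus read, while `r3` stays under molecule `2`.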
Both input BAMs must be coordinate sorted and indexed.
Output Files
A pair of output BAMs is created:
- <output>.consensus.bam: Contains the relevant consensus reads from the consensus BAM
- <output>.grouped.bam: Contains the relevant raw reads from the grouped BAM
A review file <output>.txt is also created. The review file contains details on each variant
position along with detailed information on each consensus read that supports the variant. If the
--sample argument is supplied and the input is a VCF, genotype information for that sample is
retrieved. If no sample name is supplied and the VCF contains only a single sample, that sample’s
genotypes are used.
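The sample selection described above could be sketched as follows. This is a minimal sketch under stated assumptions: the function name `select_sample` is hypothetical, and returning `None` when no sample is named and the VCF holds several samples is an assumption, since the text does not say what happens in that case.

```python
def select_sample(requested, vcf_samples):
    """Pick the sample whose genotypes would be reported, or None.

    requested:   the value given to --sample, or None if not supplied.
    vcf_samples: the sample names found in the VCF header.
    """
    if requested is not None:
        # An explicitly named sample always wins.
        return requested
    if len(vcf_samples) == 1:
        # A single-sample VCF needs no --sample argument.
        return vcf_samples[0]
    # Assumption: with no name and several samples, no genotypes are used.
    return None
```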
The --maf parameter controls which variants get detailed per-read information in the output file.
Only variants with a minor allele frequency at or below this threshold will have detailed information
written.
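Because the threshold is inclusive ("at or below"), a variant whose minor allele frequency equals the `--maf` value still gets detailed output. A minimal sketch of that comparison, assuming a hypothetical helper `writes_detail` and leaving aside how the frequency itself is computed:

```python
def writes_detail(maf, threshold=0.05):
    """Return True if per-read detail would be written for a variant.

    maf:       the variant's minor allele frequency (computed elsewhere).
    threshold: the --maf value; 0.05 is the documented default.
    """
    # Inclusive comparison: a MAF equal to the threshold qualifies.
    return maf <= threshold


decisions = {maf: writes_detail(maf) for maf in (0.01, 0.05, 0.2)}
```

Under the default threshold, variants at 0.01 and 0.05 get detailed rows and a variant at 0.2 does not.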
Arguments
| Flag | Description | Default |
|---|---|---|
| -i, --input <INPUT> | Input VCF or IntervalList of variant locations | required |
| -c, --consensus-bam <CONSENSUS_BAM> | BAM file of consensus reads used to call variants | required |
| -g, --grouped-bam <GROUPED_BAM> | BAM file of grouped raw reads used to build consensuses | required |
| -r, --ref <REFERENCE> | Reference FASTA file | required |
| -o, --output <OUTPUT> | Output prefix for generated files | required |
| -s, --sample <SAMPLE> | Name of sample being reviewed (for VCF genotype extraction) | |
| -N, --ignore-ns <IGNORE_NS> | Ignore N bases in consensus reads | false |
| -m, --maf <MAF> | Only output detailed information for variants at or below this MAF | 0.05 |