snv-indels

The snv-indels module aligns the reads to the reference and calls SNVs and insertions/deletions (indels).

Tools

This module uses STAR to align the reads to the reference in two-pass mode. `VarDict <https://github.com/AstraZeneca-NGS/VarDictJava>`_ is used to call variants, which are then annotated using VEP. For each variant, this module determines whether it is located inside one of the regions defined in bed_variant_hotspots.
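The hotspot check described above can be illustrated with a minimal sketch. It assumes that bed_variant_hotspots follows the standard BED convention (0-based, half-open intervals) and that variant positions are 1-based, as in VCF. The function names ``parse_bed`` and ``in_hotspot`` are hypothetical and are not taken from the module itself.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end) in BED convention: 0-based start, exclusive end
Region = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Parse the first three columns of BED lines, skipping headers and blanks."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def in_hotspot(chrom: str, pos: int, regions: List[Region]) -> bool:
    """Return True if a 1-based position lies inside any BED region."""
    # A 1-based position p covers the 0-based interval [p - 1, p),
    # so it is inside [start, end) exactly when start < p <= end.
    return any(c == chrom and start < pos <= end for c, start, end in regions)
```

A linear scan like this is sufficient for a handful of hotspot regions; an interval index would only be worth the extra complexity for much larger BED files.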

The variants annotated by VEP are then filtered based on a number of different criteria:

Variants that are present on the blacklist are excluded.
Only variants that are present on one of the specified transcripts in ref_id_mapping are included.
Only variants that match one of the consequences defined in vep_include_consequence are included.
Variants that have a population frequency of more than 1% in the gnomADe population are excluded.
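The four criteria above can be sketched as a single predicate. This is an illustration only, not the module's implementation: the dictionary keys (``id``, ``transcript``, ``consequences``, ``gnomad_af``) are hypothetical stand-ins for the corresponding VEP fields, and keeping variants without a recorded population frequency is an assumption.

```python
from typing import Any, Dict, Set


def passes_filters(
    variant: Dict[str, Any],
    blacklist: Set[str],
    transcripts: Set[str],
    consequences: Set[str],
    max_af: float = 0.01,
) -> bool:
    """Apply the four filter criteria to one annotated variant."""
    # 1. Exclude blacklisted variants.
    if variant["id"] in blacklist:
        return False
    # 2. Keep only variants on a transcript of interest (ref_id_mapping).
    if variant["transcript"] not in transcripts:
        return False
    # 3. Keep only variants with at least one included consequence.
    if not consequences.intersection(variant["consequences"]):
        return False
    # 4. Exclude variants with a population frequency above 1%.
    #    Assumption: a missing frequency does not exclude the variant.
    af = variant.get("gnomad_af")
    if af is not None and af > max_af:
        return False
    return True
```

The checks are ordered from cheapest to most specific, so a blacklisted variant is rejected before any other field is inspected.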

Picard is used to generate various alignment statistics.

Input

The input for this module is a single pair of FASTQ files per sample, specified in a PEP configuration file, as shown below.

Example input for the snv-indels module
sample_name	R1	R2
MO1-RNAseq-1-16714	test/data/fastq/NOMO1-RNAseq-1-16714_R1.fastq.gz	test/data/fastq/NOMO1-RNAseq-1-16714_R2.fastq.gz
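As an illustration of the "single pair per sample" requirement, the sketch below reads a sample table with the three columns shown above and rejects duplicate sample names. It assumes a tab-separated layout, which may differ from the actual PEP sample table, and ``read_samples`` is a hypothetical helper rather than part of the module.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def read_samples(handle: TextIO, delimiter: str = "\t") -> Dict[str, Tuple[str, str]]:
    """Map each sample_name to its (R1, R2) FASTQ pair."""
    samples: Dict[str, Tuple[str, str]] = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        name = row["sample_name"]
        # The module expects exactly one FASTQ pair per sample.
        if name in samples:
            raise ValueError(f"Sample {name!r} is listed more than once")
        samples[name] = (row["R1"], row["R2"])
    return samples


# Usage with an in-memory table; the file names are placeholders.
table = io.StringIO(
    "sample_name\tR1\tR2\n"
    "sample1\tsample1_R1.fastq.gz\tsample1_R2.fastq.gz\n"
)
samples = read_samples(table)
```

Passing ``delimiter=","`` covers the case where the sample table is stored as CSV instead.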

Output

The output of this module is a JSON file with an overview of the most important results, as well as a number of other output files:

A .bam and .bai per sample, which contain the aligned reads.
A VEP output file (vep_high), which contains the final set of filtered variants.
A VEP output file (vep_target), which contains the variants on the transcripts of interest. These variants have not been filtered on vep_include_consequence terms.
A VCF file that only contains those variants that fall in one of the bed_variant_hotspots regions.

Configuration

Configuration options
Option	Description	Required
forward_adapter	The forward adapter sequence	yes
reverse_adapter	The reverse adapter sequence	yes
genome_fasta	Reference genome, in FASTA format	yes
genome_fai	.fai index file for the reference fasta	yes
genome_dict	.dict index file for the reference fasta	yes
star_index	STAR index database	yes
ref_id_mapping	File of transcripts of interest	yes
rrna_refflat	File of rRNA transcripts	yes
bed_variant_hotspots	BED file of hotspot regions	yes
bed_variant_call_regions	BED file of regions to call variants	yes
gtf	GTF file with transcripts, used by STAR	yes
annotation_refflat	File used to determine exon coverage	yes
blacklist	File of blacklisted variants	yes
vep_include_consequence	List of `VEP consequences <http://www.ensembl.org/info/genome/variation/prediction/predicted_data.html>`_ to report	yes
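Since every option in the table is required, a configuration can be checked for completeness before the module is run. The sketch below assumes the configuration has already been loaded into a Python dictionary (the on-disk format is not specified here), and ``missing_options`` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Option names copied from the configuration table above.
REQUIRED_OPTIONS = (
    "forward_adapter",
    "reverse_adapter",
    "genome_fasta",
    "genome_fai",
    "genome_dict",
    "star_index",
    "ref_id_mapping",
    "rrna_refflat",
    "bed_variant_hotspots",
    "bed_variant_call_regions",
    "gtf",
    "annotation_refflat",
    "blacklist",
    "vep_include_consequence",
)


def missing_options(config: Dict[str, Any]) -> List[str]:
    """Return the required options that are absent from the configuration."""
    return [option for option in REQUIRED_OPTIONS if option not in config]
```

Reporting all missing options at once, rather than failing on the first one, lets an incomplete configuration be fixed in a single pass.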