snv-indels
The snv-indels module is responsible for aligning the reads to the reference and calling SNVs and insertions/deletions (indels).
Tools
This module uses STAR to align the reads to the reference in two-pass mode. `VarDict <https://github.com/AstraZeneca-NGS/VarDictJava>`_ is used to call variants, which are then annotated using VEP. For each variant, the module determines whether it is located inside one of the regions defined in bed_variant_hotspots.
The variants annotated by VEP are then filtered on the following criteria:

- Variants that are present on the blacklist are excluded.
- Only variants that are located on one of the transcripts specified in ref_id_mapping are included.
- Only variants that match one of the consequences defined in vep_include_consequence are included.
- Variants that have a population frequency of more than 1% in the gnomADe population are excluded.

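The filter criteria above can be illustrated with a minimal Python sketch. This is not the module's actual implementation: the ``Variant`` class, its field names, the variant identifier format and the ``passes_filters`` function are all hypothetical, and a missing gnomADe frequency is assumed to be represented as 0.0.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical, simplified view of one VEP-annotated variant."""
    key: str            # variant identifier; the format used here is an assumption
    transcript: str     # transcript the consequence was reported on
    consequences: set   # VEP consequence terms for that transcript
    gnomad_af: float    # gnomADe allele frequency, 0.0 if not reported (assumption)


def passes_filters(variant, blacklist, transcripts, include_consequences, max_af=0.01):
    """Return True if the variant survives all four filter criteria."""
    if variant.key in blacklist:
        return False  # present on the blacklist
    if variant.transcript not in transcripts:
        return False  # not on a transcript of interest
    if not (variant.consequences & include_consequences):
        return False  # no matching consequence term
    if variant.gnomad_af > max_af:
        return False  # population frequency of more than 1%
    return True


# Example values are made up for illustration only.
blacklist = {"chr1:100:A>T"}
transcripts = {"ENST00000000001"}
include = {"missense_variant", "stop_gained"}

kept = Variant("chr2:200:G>C", "ENST00000000001", {"missense_variant"}, 0.001)
common = Variant("chr2:300:G>C", "ENST00000000001", {"missense_variant"}, 0.05)
listed = Variant("chr1:100:A>T", "ENST00000000001", {"stop_gained"}, 0.0)
```

With these example values, only ``kept`` passes: ``common`` exceeds the 1% frequency threshold and ``listed`` is on the blacklist.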
Picard is used to generate various alignment statistics.
Input
The input for this module is a single pair of FastQ files per sample, specified in a PEP configuration file, as shown below.

.. list-table::
   :header-rows: 1

   * - sample_name
     - R1
     - R2
   * - MO1-RNAseq-1-16714
     - test/data/fastq/NOMO1-RNAseq-1-16714_R1.fastq.gz
     - test/data/fastq/NOMO1-RNAseq-1-16714_R2.fastq.gz

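As an illustration of the "single pair of FastQ files per sample" requirement, the sketch below reads a sample table and rejects samples that occur more than once. It assumes the sample table is stored as a CSV file with the three columns shown above. The ``read_samples`` function is hypothetical and is not part of the module.

```python
import csv
import io


def read_samples(handle):
    """Map each sample_name to its (R1, R2) FastQ pair."""
    samples = {}
    for row in csv.DictReader(handle):
        name = row["sample_name"]
        if name in samples:
            # The module expects exactly one pair of FastQ files per sample.
            raise ValueError(f"more than one FastQ pair for sample {name!r}")
        samples[name] = (row["R1"], row["R2"])
    return samples


# In-memory stand-in for the sample table (CSV layout is an assumption).
example = io.StringIO(
    "sample_name,R1,R2\n"
    "MO1-RNAseq-1-16714,"
    "test/data/fastq/NOMO1-RNAseq-1-16714_R1.fastq.gz,"
    "test/data/fastq/NOMO1-RNAseq-1-16714_R2.fastq.gz\n"
)
samples = read_samples(example)
```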
Output
The output of this module is a JSON file with an overview of the most important results, as well as a number of other output files:

- A .bam and .bai file per sample, which contain the aligned reads.
- A VEP output file (vep_high), which contains the final set of filtered variants.
- A VEP output file (vep_target), which contains the variants on the transcripts of interest. These variants have not been filtered on vep_include_consequence terms.
- A VCF file that only contains those variants that fall in one of the bed_variant_hotspots regions.

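The hotspot check behind the last output file can be sketched as follows. BED coordinates are 0-based and half-open, while VCF positions are 1-based, so the comparison has to account for the offset. The function names are hypothetical, and the sketch only tests the variant's POS; how the module handles indels that span a region boundary is not covered here.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples (0-based, half-open)."""
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def in_hotspot(chrom, pos, regions):
    """Return True if a 1-based VCF position lies inside any BED region."""
    # 1-based pos p is 0-based p - 1, so start <= p - 1 < end becomes start < p <= end.
    return any(c == chrom and start < pos <= end for c, start, end in regions)


# Made-up region for illustration: covers 1-based positions 101 to 200.
hotspots = parse_bed(["chr13\t100\t200\texample_hotspot"])
```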
Configuration

.. list-table::
   :header-rows: 1

   * - Option
     - Description
     - Required
   * - forward_adapter
     - The forward adapter sequence
     - yes
   * - reverse_adapter
     - The reverse adapter sequence
     - yes
   * - genome_fasta
     - Reference genome, in FASTA format
     - yes
   * - genome_fai
     - .fai index file for the reference fasta
     - yes
   * - genome_dict
     - .dict index file for the reference fasta
     - yes
   * - star_index
     - STAR index database
     - yes
   * - ref_id_mapping
     - File of transcripts of interest
     - yes
   * - rrna_refflat
     - File of rRNA transcripts
     - yes
   * - bed_variant_hotspots
     - BED file of hotspot regions
     - yes
   * - bed_variant_call_regions
     - BED file of regions to call variants
     - yes
   * - gtf
     - GTF file with transcripts, used by STAR
     - yes
   * - annotation_refflat
     - File used to determine exon coverage
     - yes
   * - blacklist
     - File of blacklisted variants
     - yes
   * - vep_include_consequence
     - List of `VEP consequences <http://www.ensembl.org/info/genome/variation/prediction/predicted_data.html>`_ to report
     - yes
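
Since every option in the table above is required, a configuration can be checked for completeness before the module is run. The sketch below assumes the configuration has already been loaded into a Python dictionary (for example from a JSON file). The ``missing_options`` helper is hypothetical and not part of the module.

```python
# Option names taken from the configuration table above; all are required.
REQUIRED_OPTIONS = [
    "forward_adapter",
    "reverse_adapter",
    "genome_fasta",
    "genome_fai",
    "genome_dict",
    "star_index",
    "ref_id_mapping",
    "rrna_refflat",
    "bed_variant_hotspots",
    "bed_variant_call_regions",
    "gtf",
    "annotation_refflat",
    "blacklist",
    "vep_include_consequence",
]


def missing_options(config):
    """Return the required options that are absent from the config dict."""
    return [option for option in REQUIRED_OPTIONS if option not in config]


# Deliberately incomplete example configuration with placeholder values.
partial_config = {
    "forward_adapter": "ACGT",
    "reverse_adapter": "TGCA",
    "genome_fasta": "reference.fasta",
}
missing = missing_options(partial_config)
```

An empty list means that all required options are present.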