snv-indels
==========

The `snv-indels` module is responsible for aligning the reads to the reference, and for calling SNVs and insertions/deletions.

Tools
-----

This module uses STAR to align the reads to the reference in two-pass mode. VarDict is used to call variants, which are then annotated using VEP. For each variant, this module determines whether it is located inside one of the defined `bed_variant_hotspots`.

The variants annotated by VEP are then filtered on the following criteria:

1. Variants that are present on the `blacklist` are excluded.
2. Only variants that are present on one of the specified transcripts in `ref_id_mapping` are included.
3. Only variants that match one of the consequences defined in `vep_include_consequence` are included.
4. Variants that have a population frequency of more than 1% in the `gnomADe` population are excluded.

Picard is used to generate various alignment statistics.

Input
-----

The input for this module is a single pair of FastQ files per sample, specified in a PEP configuration file, as shown below.

.. csv-table:: Example input for the snv-indels module
   :delim: ,
   :file: ../../test/pep/targetted.csv

Output
------

The output of this module is a JSON file with an overview of the most important results, as well as a number of other output files:

* A .bam and .bai file per sample, which contain the aligned reads.
* A VEP output file (`vep_high`), which contains the final set of filtered variants.
* A VEP output file (`vep_target`), which contains the variants on the transcripts of interest. These variants have not been filtered on `vep_include_consequence` terms.
* A VCF file that only contains those variants that fall in one of the `bed_variant_hotspots` regions.

Configuration
-------------

.. list-table:: Configuration options
   :header-rows: 1

   * - Option
     - Description
     - Required
   * - forward_adapter
     - The forward adapter sequence
     - yes
   * - reverse_adapter
     - The reverse adapter sequence
     - yes
   * - genome_fasta
     - Reference genome, in FASTA format
     - yes
   * - genome_fai
     - .fai index file for the reference fasta
     - yes
   * - genome_dict
     - .dict index file for the reference fasta
     - yes
   * - star_index
     - STAR index database
     - yes
   * - ref_id_mapping
     - File of transcripts of interest
     - yes
   * - rrna_refflat
     - File of rRNA transcripts
     - yes
   * - bed_variant_hotspots
     - BED file of hotspot regions
     - yes
   * - bed_variant_call_regions
     - BED file of regions in which to call variants
     - yes
   * - gtf
     - GTF file with transcripts, used by STAR
     - yes
   * - annotation_refflat
     - File used to determine exon coverage
     - yes
   * - blacklist
     - File of blacklisted variants
     - yes
   * - vep_include_consequence
     - List of VEP consequences to report
     - yes
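
The four filtering criteria listed under Tools can be sketched in Python as shown below. This is only an illustration of the logic, not the module's actual implementation. The `Variant` class, its field names, the `passes_filters` function and the example values are all hypothetical, the blacklist and transcript list are modelled as in-memory sets, and keeping variants without a known `gnomADe` frequency is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Variant:
    """Hypothetical, simplified view of one VEP-annotated variant."""

    identifier: str              # assumed key used to look up the blacklist
    transcript: str              # transcript the annotation refers to
    consequences: Set[str]       # VEP consequence terms for this variant
    gnomade_af: Optional[float]  # gnomADe frequency, None when unknown


def passes_filters(
    variant: Variant,
    blacklist: Set[str],
    transcripts: Set[str],
    include_consequences: Set[str],
    max_af: float = 0.01,
) -> bool:
    """Return True when the variant survives all four criteria."""
    # 1. Blacklisted variants are excluded.
    if variant.identifier in blacklist:
        return False
    # 2. The variant must be on one of the transcripts of interest.
    if variant.transcript not in transcripts:
        return False
    # 3. At least one consequence must be in the include list.
    if not variant.consequences & include_consequences:
        return False
    # 4. Variants with a population frequency above 1% are excluded.
    #    Assumption: variants without a known frequency are kept.
    if variant.gnomade_af is not None and variant.gnomade_af > max_af:
        return False
    return True


# Hypothetical example values, for illustration only.
rare = Variant("variant-1", "transcript-A", {"missense_variant"}, 0.001)
common = Variant("variant-2", "transcript-A", {"missense_variant"}, 0.05)

kept = [
    v.identifier
    for v in (rare, common)
    if passes_filters(
        v,
        blacklist=set(),
        transcripts={"transcript-A"},
        include_consequences={"missense_variant"},
    )
]
print(kept)
```

In this sketch only `variant-1` is kept, because `variant-2` exceeds the 1% frequency threshold.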