expression

The expression module determines gene expression levels from STAR BAM and count files. Although the strandedness of the library preparation matters when determining expression levels, the module itself is strand agnostic. Instead, we take inspiration from STAR and produce output files for unstranded, forward-stranded and reverse-stranded libraries, and leave it to the user to select the relevant output for their samples.
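To illustrate the three strandedness outputs, the sketch below reads a STAR ReadsPerGene.out.tab file into one count table per strandedness. It relies on the STAR column layout (gene ID, unstranded counts, counts for reads on the first-read strand, counts for reads on the second-read strand). The function name, the "forward"/"reverse" labels and the example counts are assumptions made for illustration, not the module's actual code.

```python
import csv
import io

# Column index in ReadsPerGene.out.tab for each strandedness (column 0 is the gene ID)
STRAND_COLUMNS = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_star_counts(handle):
    """Return {strandedness: {gene_id: count}} from a ReadsPerGene.out.tab file."""
    counts = {name: {} for name in STRAND_COLUMNS}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip empty lines and the summary rows (N_unmapped, N_multimapping, ...)
        if not row or row[0].startswith("N_"):
            continue
        for name, column in STRAND_COLUMNS.items():
            counts[name][row[0]] = int(row[column])
    return counts


# Tiny in-memory example; the gene ID and all counts are made up
example = io.StringIO(
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t20\t4\n"
    "N_ambiguous\t1\t0\t1\n"
    "GENE_A\t100\t2\t98\n"
)
star_counts = read_star_counts(example)
```

In this made-up example almost all reads for GENE_A fall in the last column, which is the pattern one would expect from a reverse-stranded library.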

Tools

This module relies on the STAR count files in combination with a set of housekeeping genes to normalize gene expression levels.
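The exact normalization formula is not spelled out here. As a minimal sketch, assuming each gene count is divided by the summed count of the housekeeping genes, the idea could look as follows. The function name, gene names and counts are hypothetical.

```python
def normalize(counts, housekeeping_ids):
    """Divide every gene count by the summed count of the housekeeping genes.

    This ratio is an assumed normalization, chosen only for illustration.
    """
    reference = sum(counts[gene] for gene in housekeeping_ids)
    if reference == 0:
        raise ValueError("housekeeping genes have no reads; cannot normalize")
    return {gene: count / reference for gene, count in counts.items()}


# Made-up counts: two housekeeping genes and one gene of interest
counts = {"HK1": 50, "HK2": 150, "GENE_A": 100}
normalized = normalize(counts, ["HK1", "HK2"])
```

Under this assumption, the same function would be applied separately to the unstranded, forward-stranded and reverse-stranded counts.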

Input

The input for this module is one BAM file and one STAR count table per sample, specified in a PEP configuration file, as shown below.

Example input for the expression module

sample_name  bam                                  count
SRR8615409   test/data/expression/SRR8615409.bam  test/data/expression/SRR8615409.ReadsPerGene.out.tab
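As a rough sketch of how such a sample table could be read and checked, assuming it is stored as a CSV file (a common choice for PEP sample tables), the snippet below uses only the Python standard library. The function name and the validation rules are assumptions; a real pipeline would more likely rely on a PEP-aware library.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_name", "bam", "count")


def read_samples(handle):
    """Return {sample_name: {"bam": ..., "count": ...}} from a CSV sample table."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample table is missing columns: {', '.join(missing)}")
    return {
        row["sample_name"]: {"bam": row["bam"], "count": row["count"]}
        for row in reader
    }


# The example sample from this page, written as CSV
table = io.StringIO(
    "sample_name,bam,count\n"
    "SRR8615409,test/data/expression/SRR8615409.bam,"
    "test/data/expression/SRR8615409.ReadsPerGene.out.tab\n"
)
samples = read_samples(table)
```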

Output

  • Three files with the normalized gene expression levels, one for each strandedness.

  • A single MultiQC report which contains the same data.

Configuration

The following options are available for the expression module.

Example

{
  "housekeeping": [
    "MT-CO2"
  ],
  "gtf": "test/data/reference/hamlet-ref.gtf",
  "bed": "test/data/reference/transcripts_chrM.bed"
}
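A configuration like the one above can be checked before running the module. The sketch below assumes that all three options are required and that housekeeping must be a non-empty list; the function name and error messages are hypothetical and not part of the module.

```python
import json

REQUIRED_OPTIONS = ("housekeeping", "gtf", "bed")


def check_config(config):
    """Raise ValueError if a required option is missing or malformed."""
    missing = [option for option in REQUIRED_OPTIONS if option not in config]
    if missing:
        raise ValueError(f"missing required options: {', '.join(missing)}")
    housekeeping = config["housekeeping"]
    if not isinstance(housekeeping, list) or not housekeeping:
        raise ValueError("housekeeping must be a non-empty list of gene names")
    return config


config = check_config(json.loads("""
{
  "housekeeping": ["MT-CO2"],
  "gtf": "test/data/reference/hamlet-ref.gtf",
  "bed": "test/data/reference/transcripts_chrM.bed"
}
"""))
```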

Configuration options

Option        Description                                                 Required
housekeeping  A list of genes to use for normalizing the expression       yes
gtf           A GTF file, to look up the ENSG for the housekeeping genes  yes
bed           A BED file with genomic regions (genes) to quantify         yes
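To show what looking up the ENSG for a housekeeping gene in the GTF file can involve, the sketch below scans the attribute column of a GTF file for gene_name and gene_id pairs. It assumes the GTF carries both attributes in the usual `key "value";` form. The function name is hypothetical, and the gene ID in the example line is a placeholder, not a real identifier.

```python
import io
import re

# Matches GTF attributes of the form: key "value"
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')


def gene_ids_by_name(handle, names):
    """Map each requested gene name to the gene_id found in the GTF."""
    wanted = set(names)
    found = {}
    for line in handle:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        name = attributes.get("gene_name")
        if name in wanted and "gene_id" in attributes:
            found[name] = attributes["gene_id"]
    missing = wanted - found.keys()
    if missing:
        raise KeyError(f"genes not found in GTF: {', '.join(sorted(missing))}")
    return found


# One illustrative GTF record; coordinates and gene_id are placeholders
gtf = io.StringIO(
    "#!genome-build example\n"
    'chrM\tsource\tgene\t1\t100\t.\t+\t.\t'
    'gene_id "ENSG_PLACEHOLDER"; gene_name "MT-CO2";\n'
)
ids = gene_ids_by_name(gtf, ["MT-CO2"])
```

Raising an error for names that are absent from the GTF is a design choice in this sketch: a housekeeping gene that is silently dropped would otherwise change the normalization without warning.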