Usage

Input files

HAMLET requires two separate input files. The first is a JSON file that contains the settings and reference files for the pipeline; it can be generated with the utilities/create-config.py script.

The second is a Portable Encapsulated Project (PEP) configuration that specifies the samples and their associated gzipped, paired-end mRNA-seq files. For simple use cases, this can be a CSV file with one line per read pair, as can be seen below.

Example sample specification for HAMLET

sample_name | R1                                         | R2
------------+--------------------------------------------+-------------------------------------------
TestSample1 | test/data/fastq/R1.fq.gz                   | test/data/fastq/R2.fq.gz
TestSample2 | test/data/fastq/R1.fq.gz                   | test/data/fastq/R2.fq.gz
TestSample2 | test/data/fastq/SRR8615409 chrM_1.fastq.gz | test/data/fastq/SRR8615409 chrM_2.fastq.gz
TestSample3 | test/data/fastq/R1.fq.gz                   | test/data/fastq/R2.fq.gz
TestSample3 | test/data/fastq/SRR8615409 chrM_1.fastq.gz | test/data/fastq/SRR8615409 chrM_2.fastq.gz
TestSample3 | test/data/fastq/SRR8615687_flt3_1.fastq.gz | test/data/fastq/SRR8615687_flt3_2.fastq.gz

Any number of samples can be processed in a single execution. Each sample may have any number of read pairs: add one line per read pair with the same sample name, as for TestSample2 and TestSample3 above, and HAMLET will handle them properly.

Note that spaces are supported in the file paths, but not in sample names.
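The rules above can be checked before starting a run. The following is a minimal sketch, not part of HAMLET: the helper name read_sample_sheet is hypothetical, and it assumes the sample sheet is a plain CSV file with the sample_name, R1 and R2 columns shown in the example.

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = ("sample_name", "R1", "R2")


def read_sample_sheet(handle):
    """Group read pairs per sample; reject sample names containing whitespace.

    Hypothetical helper for illustration, not part of HAMLET.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")

    samples = defaultdict(list)
    for row in reader:
        name = row["sample_name"]
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid sample name: {name!r}")
        # Spaces are allowed in file paths, so R1 and R2 are kept verbatim.
        samples[name].append((row["R1"], row["R2"]))
    return dict(samples)


# A CSV rendering of the first rows of the example table.
example = io.StringIO(
    "sample_name,R1,R2\n"
    "TestSample1,test/data/fastq/R1.fq.gz,test/data/fastq/R2.fq.gz\n"
    "TestSample2,test/data/fastq/R1.fq.gz,test/data/fastq/R2.fq.gz\n"
    "TestSample2,test/data/fastq/SRR8615409 chrM_1.fastq.gz,"
    "test/data/fastq/SRR8615409 chrM_2.fastq.gz\n"
)
sheet = read_sample_sheet(example)
for sample, pairs in sheet.items():
    print(sample, len(pairs))
```

Rows that share a sample name end up as several read pairs under that one sample, which mirrors how the example table lists TestSample2 twice.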

Execution

To run the HAMLET pipeline, you need to supply the input files as well as a Snakemake profile, which configures how Snakemake runs the pipeline. The example profile, located in cfg/config.v8+.yaml, is shown below.

Snakemake profile

# Cluster configuration settings
executor: slurm
jobs: 1000
retries: 0
latency-wait: 120
max-jobs-per-second: 30

# Singularity settings
use-singularity: true
singularity-args: '--containall --cleanenv --bind /home,/tmp'
singularity-prefix: '~/.singularity/cache/snakemake'

# Other settings
printshellcmds: true
rerun-incomplete: true

# Resource requirements
default-resources:
  cpus_per_task: 1
  mem: 8G
  runtime: 1h

set-resources:
  qc_seq_cutadapt:
    cpus_per_task: 8
    runtime: 4h

  align_STAR:
    mem: 100G
    cpus_per_task: 8
    runtime: 4h

  align_exon_cov:
    mem: 120G
    cpus_per_task: 1
    runtime: 4h

  align_vardict:
    mem: 120G
    cpus_per_task: 11
    runtime: 4h

  align_picard_metrics:
    mem: 8G
    runtime: 4h

  align_VEP:
    cpus_per_task: 8
    runtime: 4h

  fusion_arriba:
    mem: 80G
    cpus_per_task: 1

  itd_align_reads:
    cpus_per_task: 3
    runtime: 4h

  create_star_index:
    mem: 60G
    cpus_per_task: 8
    runtime: 4h

Please consult the Snakemake documentation for an explanation of all settings.

Make sure to modify the Singularity settings to your specific situation. In particular, the --bind directive determines which parts of the file system will be visible to HAMLET. In the example, only /home and /tmp will be visible. Make sure that the locations of HAMLET itself, the HAMLET-data, and the samples are included here, or HAMLET will not be able to find the required files.
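As an illustration of the bind requirement, the sketch below checks whether a set of required locations falls inside the bound directories and builds a matching singularity-args value. Both helper names and the example locations are hypothetical, and the check is purely lexical: it does not resolve symlinks or relative paths.

```python
from pathlib import PurePosixPath


def unbound_paths(bind_dirs, required_paths):
    """Return the required paths that are not inside any bound directory."""
    binds = [PurePosixPath(b) for b in bind_dirs]
    missing = []
    for path in required_paths:
        p = PurePosixPath(path)
        # A path is visible if it is a bound directory or lies below one.
        if not any(p == b or b in p.parents for b in binds):
            missing.append(path)
    return missing


def singularity_args(bind_dirs):
    """Build a singularity-args value in the style of the example profile."""
    return "--containall --cleanenv --bind " + ",".join(bind_dirs)


bind_dirs = ["/home", "/tmp"]
# Hypothetical locations of HAMLET, the HAMLET-data and the samples.
required = ["/home/user/HAMLET", "/data/HAMLET-data", "/home/user/samples"]

print(unbound_paths(bind_dirs, required))
print(singularity_args(bind_dirs + ["/data"]))
```

Any path reported by unbound_paths needs its parent directory added to the --bind list before HAMLET can see it.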

Since HAMLET includes many tools, the singularity cache will grow to multiple gigabytes. If you have limited space in your home folder, modify singularity-prefix to a location with more available space.

The resource requirements will depend on the characteristics of your samples. The example configuration is based on poly-A captured RNAseq, with up to 200 million reads per sample.

Running HAMLET

Since all settings can be set in the Snakemake profile, the actual command to run HAMLET is quite simple.

$ snakemake \
    --snakefile Snakefile \
    --profile cfg \
    --configfile config.json \
    --config pepfile=sample_sheet.csv

Output files

HAMLET will create a separate folder for every sample in the current directory. Files that are shared across samples will be created once in the current directory. You can run HAMLET from anywhere, but preferably from outside the HAMLET folder. This way, the temporary Snakemake files are written elsewhere and do not pollute the repository.

Inside each sample directory, there will be a PDF report called hamlet_report.{sample_name}.pdf which contains the overview of the essential results. The same data is also present in the JSON file called {sample_name}.summary.json.
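A quick way to confirm that a run finished for every sample is to look for these two files. The sketch below is not part of HAMLET; the helper names are hypothetical, and it assumes each sample folder is named after the sample.

```python
from pathlib import Path


def expected_outputs(run_dir, sample_name):
    """Paths of the per-sample report and summary described above.

    Assumes the sample folder is named after the sample.
    """
    sample_dir = Path(run_dir) / sample_name
    return [
        sample_dir / f"hamlet_report.{sample_name}.pdf",
        sample_dir / f"{sample_name}.summary.json",
    ]


def missing_outputs(run_dir, sample_names):
    """Return every expected output file that does not exist yet."""
    return [
        path
        for name in sample_names
        for path in expected_outputs(run_dir, name)
        if not path.is_file()
    ]


if __name__ == "__main__":
    # Sample names taken from the example sample specification.
    for path in missing_outputs(".", ["TestSample1", "TestSample2", "TestSample3"]):
        print("missing:", path)
```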

HAMLET will also run MultiQC and generate a single HTML output file that contains quality control metrics for every sample. This can be used to assess the quality of each individual sample and to find outliers in your sample set.

Grouping results from multiple samples

If you analysed multiple samples using HAMLET, you can generate an overview of multiple samples using the utilities/hamlet_table.py script, rather than relying on individual PDF files. This script uses the {sample_name}.summary.json files which are generated as part of the default HAMLET output. Simply specify the results you are interested in to generate the appropriate table. It is also possible to generate all output tables in a single go:

$ python3 utilities/hamlet_table.py all \
    --output tables \
    /path/to/sample1/sample1.summary.json \
    /path/to/sample2/sample2.summary.json  # add further summary files as needed
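With many samples, listing every summary file by hand becomes tedious. The sketch below collects them from a run directory and assembles the command shown above. The helper names are hypothetical, and it assumes the summaries live at {sample_name}/{sample_name}.summary.json under the run directory and that the command is executed from the HAMLET folder.

```python
import shlex
from pathlib import Path


def find_summaries(run_dir):
    """Collect the per-sample summary files, sorted for a stable order."""
    return sorted(Path(run_dir).glob("*/*.summary.json"))


def hamlet_table_command(summaries, output="tables", table="all"):
    """Assemble the hamlet_table.py call as an argument list."""
    return [
        "python3", "utilities/hamlet_table.py", table,
        "--output", output,
        *(str(path) for path in summaries),
    ]


if __name__ == "__main__":
    summaries = find_summaries(".")
    if summaries:
        # Print the command for inspection; shlex.join quotes paths with spaces.
        print(shlex.join(hamlet_table_command(summaries)))
    else:
        print("no summary files found")
```

Returning an argument list rather than a single string means the result can be passed straight to subprocess.run without further quoting.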