Usage
Input files
HAMLET requires two separate input files. Firstly, a JSON file that specifies
the settings and reference files for the pipeline. This file can be generated
with the utilities/create-config.py script.
Secondly, HAMLET requires a Portable Encapsulated Project (PEP) configuration
that specifies the samples and their associated gzipped, paired-end mRNA-seq
files. For simple use cases, this can be a CSV file with one line per read
pair, as shown below.
sample_name | R1                                         | R2
----------- | ------------------------------------------ | ------------------------------------------
TestSample1 | test/data/fastq/R1.fq.gz                   | test/data/fastq/R2.fq.gz
TestSample2 | test/data/fastq/R1.fq.gz                   | test/data/fastq/R2.fq.gz
TestSample2 | test/data/fastq/SRR8615409 chrM_1.fastq.gz | test/data/fastq/SRR8615409 chrM_2.fastq.gz
TestSample3 | test/data/fastq/R1.fq.gz                   | test/data/fastq/R2.fq.gz
TestSample3 | test/data/fastq/SRR8615409 chrM_1.fastq.gz | test/data/fastq/SRR8615409 chrM_2.fastq.gz
TestSample3 | test/data/fastq/SRR8615687_flt3_1.fastq.gz | test/data/fastq/SRR8615687_flt3_2.fastq.gz
Any number of samples can be processed in a single run, and each sample may
have any number of read pairs: to add another read pair to a sample, add a new
line with the same sample name, as done for TestSample2 and TestSample3 above.
Note that spaces are supported in file paths, but not in sample names.
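When there are many samples, the sample sheet can also be generated from a script. The sketch below uses only the Python standard library to produce CSV text with the sample_name, R1 and R2 columns from the table above, and rejects sample names that contain spaces. The function name sample_sheet_csv is hypothetical and not part of HAMLET.

```python
import csv
import io

# Column names as used in the example sample sheet.
FIELDS = ["sample_name", "R1", "R2"]


def sample_sheet_csv(read_pairs):
    """Return a PEP sample sheet as CSV text (hypothetical helper, not part of HAMLET).

    read_pairs is an iterable of (sample_name, r1_path, r2_path) tuples.
    A sample with several read pairs simply appears on several rows.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    for sample, r1, r2 in read_pairs:
        # Spaces are supported in file paths, but not in sample names.
        if " " in sample:
            raise ValueError(f"sample name contains a space: {sample!r}")
        writer.writerow([sample, r1, r2])
    return buffer.getvalue()
```

For example, writing the result of sample_sheet_csv([("TestSample1", "test/data/fastq/R1.fq.gz", "test/data/fastq/R2.fq.gz")]) to sample_sheet.csv gives a one-sample sheet that can be passed to HAMLET as the pepfile. Paths containing spaces are written unquoted, since a space is not a special character in CSV.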
Execution
In addition to the input files, running HAMLET requires a
Snakemake profile,
which configures how Snakemake executes the HAMLET pipeline. The example
profile, located in cfg/config.v8+.yaml, is shown below.
Snakemake profile
# Cluster configuration settings
executor: slurm
jobs: 1000
retries: 0
latency-wait: 120
max-jobs-per-second: 30

# Singularity settings
use-singularity: true
singularity-args: '--containall --cleanenv --bind /home,/tmp'
singularity-prefix: '~/.singularity/cache/snakemake'

# Other settings
printshellcmds: true
rerun-incomplete: true

# Resource requirements
default-resources:
  cpus_per_task: 1
  mem: 8G
  runtime: 480 # Runtime in minutes

set-resources:
  qc_seq_cutadapt:
    cpus_per_task: 8
  align_STAR:
    mem: 100G
    cpus_per_task: 8
    runtime: 2880
  align_exon_cov:
    mem: 120G
    cpus_per_task: 1
  align_vardict:
    mem: 120G
    cpus_per_task: 11
    runtime: 2880
  align_VEP:
    cpus_per_task: 8
  fusion_arriba:
    mem: 80G
    cpus_per_task: 1
    runtime: 60
  itd_align_reads:
    cpus_per_task: 3
    runtime: 1440
  create_star_index:
    mem: 60G
    cpus_per_task: 8
Please consult the Snakemake documentation for an explanation of all settings.
Make sure to adapt the Singularity settings to your specific situation. In
particular, the --bind directive determines which parts of the file system
are visible to HAMLET. In the example, only /home and /tmp are visible. Make
sure that the locations of HAMLET itself, the HAMLET-data, and the samples are
included here, or HAMLET will not be able to find the required files.
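Before starting a long run, it can help to check that every directory HAMLET needs lies under one of the bound paths. The sketch below is a hypothetical helper, not part of HAMLET or Singularity: it extracts the host-side source of each --bind (or -B) entry from a singularity-args string, using the src[:dest[:opts]] form, and lists the required paths that fall outside all of them. It only compares path prefixes; it does not resolve symlinks or expand "~", so pass absolute paths.

```python
import shlex
from pathlib import PurePosixPath


def bind_sources(singularity_args):
    """Return the host paths passed via --bind/-B in a singularity-args string."""
    tokens = shlex.split(singularity_args)
    sources = []
    for i, token in enumerate(tokens):
        if token in ("--bind", "-B") and i + 1 < len(tokens):
            spec = tokens[i + 1]
        elif token.startswith("--bind="):
            spec = token[len("--bind="):]
        else:
            continue
        # Each comma-separated entry is src[:dest[:opts]]; keep only src.
        for entry in spec.split(","):
            sources.append(PurePosixPath(entry.split(":")[0]))
    return sources


def unbound_paths(singularity_args, required):
    """List required paths that are not inside any bound directory (prefix check only)."""
    sources = bind_sources(singularity_args)
    missing = []
    for path in map(PurePosixPath, required):
        # is_relative_to (Python 3.9+) is also True when path equals the source.
        if not any(path.is_relative_to(src) for src in sources):
            missing.append(str(path))
    return missing
```

With the example profile, unbound_paths('--containall --cleanenv --bind /home,/tmp', ["/home/user/HAMLET", "/data/hamlet-data"]) reports "/data/hamlet-data" as not visible inside the container; both paths are placeholders for your own locations.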
Since HAMLET includes many tools, the Singularity cache will grow to several
gigabytes. If you have limited space in your home folder, set
singularity-prefix to a location with more available space.
The resource requirements depend on the characteristics of your samples. The example configuration is based on poly(A)-captured RNA-seq with up to 200 million reads per sample.
Running HAMLET
Since all settings are defined in the Snakemake profile, the actual command to run HAMLET is quite simple:
$ snakemake \
    --snakefile Snakefile \
    --profile cfg \
    --configfile config.json \
    --config pepfile=sample_sheet.csv
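If HAMLET is launched from a Python script, for example as one step of a larger workflow, the same command can be passed to subprocess.run as an argument list, which avoids shell quoting problems with file paths. The sketch below is a hypothetical wrapper; its default arguments mirror the command above, and the optional --dry-run flag is a standard Snakemake option that lists the jobs that would run without executing them.

```python
def hamlet_command(config="config.json", pepfile="sample_sheet.csv",
                   profile="cfg", snakefile="Snakefile", dry_run=False):
    """Build the snakemake invocation for HAMLET as an argument list (hypothetical helper)."""
    cmd = [
        "snakemake",
        "--snakefile", snakefile,
        "--profile", profile,
        "--configfile", config,
        "--config", f"pepfile={pepfile}",
    ]
    if dry_run:
        # Only list the jobs Snakemake would run; nothing is executed.
        cmd.append("--dry-run")
    return cmd
```

A cautious pattern is to call subprocess.run(hamlet_command(dry_run=True), check=True) first, inspect the planned jobs, and then repeat the call without dry_run.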
Output files
HAMLET creates a separate folder for each sample in the current working directory. Files that are shared across samples are created once, directly in the current directory. You can run HAMLET from anywhere, but preferably outside of the HAMLET folder, so that the temporary Snakemake files are written elsewhere and do not pollute the repository.
Inside each sample directory, there is a PDF report called
hamlet_report.{sample_name}.pdf, which contains an overview of the essential
results. The same data is also present in the JSON file
{sample_name}.summary.json.
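The file names above make it straightforward to locate a sample's results from a script. The sketch below is a hypothetical helper, assuming that each sample folder is named after the sample (as the /path/to/sample1/sample1.summary.json example further down suggests); the structure of the summary JSON is not described here, so load_summary simply returns the parsed object.

```python
import json
from pathlib import Path


def sample_outputs(sample_name, run_dir="."):
    """Return the expected report paths for one sample (assumed folder layout)."""
    sample_dir = Path(run_dir) / sample_name
    return {
        "pdf": sample_dir / f"hamlet_report.{sample_name}.pdf",
        "json": sample_dir / f"{sample_name}.summary.json",
    }


def load_summary(sample_name, run_dir="."):
    """Parse the sample's summary JSON and return it unchanged."""
    with open(sample_outputs(sample_name, run_dir)["json"]) as handle:
        return json.load(handle)
```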
HAMLET also runs MultiQC, which generates a single HTML file containing quality control metrics for every sample. You can use it to assess the quality of each individual sample and to find outliers in your sample set.
Grouping results from multiple samples
If you analysed multiple samples using HAMLET, you can generate an overview
across samples with the utilities/hamlet_table.py script, rather than relying
on the individual PDF reports. This script reads the
{sample_name}.summary.json files that are generated as part of the default
HAMLET output. Specify the type of results you are interested in to generate
the appropriate table.
It is also possible to generate all output tables in one go:
python3 utilities/hamlet_table.py all \
    --output tables \
    /path/to/sample1/sample1.summary.json \
    /path/to/sample2/sample2.summary.json etc
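With many samples, listing every summary file by hand becomes tedious. The sketch below is a hypothetical helper, not part of HAMLET: it collects the summary files from a run directory, assuming the <sample>/<sample>.summary.json layout used in the example command, and builds the same hamlet_table.py call as an argument list. The script path defaults to utilities/hamlet_table.py relative to the working directory; adjust it if you run from outside the HAMLET folder.

```python
import sys
from pathlib import Path


def summary_files(run_dir="."):
    """Find <sample>/<sample>.summary.json files directly below run_dir."""
    return sorted(
        p for p in Path(run_dir).glob("*/*.summary.json")
        # Keep only files whose name matches their sample folder.
        if p.name == f"{p.parent.name}.summary.json"
    )


def table_command(run_dir=".", output="tables",
                  script="utilities/hamlet_table.py"):
    """Build the 'hamlet_table.py all' call for every sample found in run_dir."""
    files = summary_files(run_dir)
    if not files:
        raise FileNotFoundError(f"no summary JSON files found under {run_dir}")
    return [sys.executable, script, "all", "--output", output,
            *map(str, files)]
```

The resulting list can be passed to subprocess.run(..., check=True); using sys.executable runs the script with the same Python interpreter as the calling code.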