***** Usage ***** Input files =========== HAMLET requires two separate input files. Firstly, a ``json`` file that contains the settings and reference files for the pipeline, which can be generated with the ``utilities/create-config.py`` script. Secondly, HAMLET requires a `Portable Encapsulated Project `_ configuration that specifies the samples and their associated gzipped, paired-end mRNA-seq files. For simple use cases, this can be a ``CSV`` file with one line per read-pair, as can be seen below. .. csv-table:: Example sample specification for HAMLET :delim: , :file: ../../test/pep/chrM-trio-subsamples.csv Any number of samples can be processed in a single execution, and each sample may have any number of read pairs, and HAMLET will handle those properly. **Note that spaces in the file paths are supported, but not in sample names** Execution ========= To run the HAMLET pipeline, you need to supply the input files, as well as a `Snakemake profile `_, which configures Snakemake to run the HAMLET pipeline. The example profile, located in ``HAMLET/cfg/config.v8+.yaml`` is shown below. Snakemake profile ----------------- .. literalinclude:: ../../cfg/config.v8+.yaml :language: yaml Please consult the `Snakemake documentation `_ for an explanation of all settings. Make sure to modify the Singularity settings to your specific situation. In particular, the `--bind` directive determines which parts of the file system will be visible to HAMLET. In the example, only ``/home`` and ``/tmp`` will be visible. Make sure that the locations of HAMLET itself, the HAMLET-data as well as the samples are included here, or HAMLET will not be able to find the required files. Since HAMLET includes many tools, the singularity cache will grow to multiple gigabytes. If you have limited space in your home folder, modify ``singularity-prefix`` to a location with more available space. The resource requirements will depend on the characteristics of your samples. The example configuration is based on poly-A captured RNAseq, with up to 200 million reads per sample. Running HAMLET -------------- Since all settings can be set in the Snakemake profile, the actual command to run HAMLET is quite simple. .. code:: bash $ snakemake \ --snakefile HAMLET/Snakefile \ --profile HAMLET/cfg \ --configfile config.json \ --config pepfile=sample_sheet.csv Output files ============ HAMLET will create a separate folder for every sample in the current directory. Files which are shared across samples will be created once in the current folder. You can run HAMLET from anywhere, but preferably this is done outside of the HAMLET folder. This way, the temporary Snakemake files are written elsewhere and does not pollute the repository. Inside each sample directory, there will be a PDF report called ``hamlet_report.{sample_name}.pdf`` which contains the overview of the essential results. The same data is also present in the JSON file called ``{sample_name}.summary.json``. HAMLET will also run `MultiQC `_ and generate a single html output file which contains quality control metrics for every sample. This can be used to assess the quality of each individual sample and find outliers in your sample set. Grouping results from multiple samples -------------------------------------- If you analysed multiple samples using HAMLET, you can generate an overview of multiple samples using the ``utilities/hamlet_table.py`` script, rather than relying on individual PDF files. This script uses the ``{sample_name}.summary.json`` files which are generated as part of the default HAMLET output. Simply specify the results you are interested in (``variant``, ``fusion`` or ``itd``) as shown below. .. code:: bash usage: hamlet_table.py [-h] [--itd-gene ITD_GENE] {variant,fusion,itd} json_files [json_files ...] positional arguments: {variant,fusion,itd} Table to output json_files options: -h, --help show this help message and exit --itd-gene ITD_GENE