An Introduction to Snakemake for Workflow Management
Workflow management systems help to automate analyses and make them easier to maintain, reproduce, and share with others. In this tutorial, we will walk through the basic steps for creating a variant calling workflow with the Snakemake workflow management system.
This may not be the variant calling workflow you would use in practice, but it serves as a good example for teaching Snakemake. Many people do indeed use samtools, but for particularly large or complex genomes, the guidelines provided by GATK would serve better. Additionally, various parameters associated with mapping, visualization, etc. may require tuning.
These materials were adapted from the Data Intensive Biology lab course materials (course materials, lab materials).
Est. Time | Lesson name | Description |
---|---|---|
5 mins | Introduction | What is a workflow? What is Snakemake? |
30 mins | Set Up | Set up tutorial computing environment |
30 mins | The Snakefile | What is the Snakefile? What are Snakemake rules? |
45 mins | Decorating the Snakefile | How do the rules link together? |
30 mins | Continue Decorating | More on rule linking |
20 mins | Run the Whole Pipeline | The final Snakemake rule. What are the results? |
Learning Objectives
The objectives of this tutorial are to:
- learn how to write basic workflows with Snakemake rules
- learn variable substitution for Snakemake rules
- learn wildcard matching for Snakemake rules
- understand why workflow systems can help you do your computing more easily
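To give a feel for two of these ideas before the lessons begin, here is a minimal plain-Python sketch of wildcard matching and variable substitution. It is only an analogy and not Snakemake's own implementation: the helper names (`pattern_to_regex`, `match_wildcards`, `substitute`), the file paths, and the choice to let each wildcard match `.+` are all assumptions made for illustration.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a pattern such as 'mapped/{sample}.bam' into a compiled regex.

    Hypothetical helper: each {name} becomes a named group matching one or
    more characters. Each wildcard name may appear only once in this sketch.
    """
    # With one capture group, re.split alternates literal text and names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal text, escaped
        else:
            regex += f"(?P<{part}>.+)"      # wildcard name
    return re.compile(regex)


def match_wildcards(pattern: str, filename: str):
    """Return a dict of wildcard values if filename fits pattern, else None."""
    m = pattern_to_regex(pattern).fullmatch(filename)
    return m.groupdict() if m else None


def substitute(template: str, **values: str) -> str:
    """Fill {input}/{output}-style placeholders in a command template."""
    return template.format(**values)


# Wildcard matching: which sample does this (made-up) file belong to?
wildcards = match_wildcards("mapped/{sample}.bam", "mapped/A.bam")
print(wildcards)  # {'sample': 'A'}

# Variable substitution: build the command string (it is not executed here).
command = substitute(
    "samtools sort -o {output} {input}",
    input="mapped/A.bam",
    output="sorted/A.bam",
)
print(command)  # samtools sort -o sorted/A.bam mapped/A.bam
```

In the lessons that follow, Snakemake handles both steps for you inside its rules, so you will not need to write helpers like these yourself.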
This tutorial is written for a Unix or Linux compute environment (e.g., macOS, a Linux-based HPC system, or a pre-configured Binder). It assumes basic knowledge of navigating the file system, editing files, and executing scripts from the command line. Some knowledge of Python is useful but not required.