
An Introduction to Snakemake for Workflow Management

Workflow management systems help to automate analyses and make them easier to maintain, reproduce, and share with others. In this tutorial, we will walk through the basic steps for creating a variant calling workflow with the Snakemake workflow management system.

This is not necessarily the variant calling workflow you would use in practice, but it serves as a good example for teaching Snakemake. Many people do use samtools for variant calling, but for particularly large or complex genomes, following the guidelines provided by GATK may be a better choice. In addition, parameters for steps such as mapping and visualization may require tuning.

These materials were adapted from the Data Intensive Biology lab course materials (course materials, lab materials).

Est. time | Lesson name              | Description
5 mins    | Introduction             | What is a workflow? What is Snakemake?
30 mins   | Set Up                   | Set up the tutorial computing environment
30 mins   | The Snakefile            | What is the Snakefile? What are Snakemake rules?
45 mins   | Decorating the Snakefile | How do the rules link together?
30 mins   | Continue Decorating      | More on rule linking
20 mins   | Run the Whole Pipeline   | Final Snakemake rule; what are the results?

Learning Objectives

The objectives of this tutorial are to:

  • learn how to write basic workflows with Snakemake rules

  • learn variable substitution for Snakemake rules

  • learn wildcard matching for Snakemake rules

  • understand why workflow systems can help you do your computing more easily

This tutorial is written for a Unix or Linux compute environment (e.g., macOS, a Linux-based HPC system, or a preconfigured Binder environment). It assumes basic knowledge of navigating directories, editing files, and executing scripts from the command line. Some knowledge of Python is useful but not required.

Last update: April 2, 2021