Snakemake uses a file called "Snakefile" to configure the steps, or rules, of your pipeline. The basic Snakefile consists of several rules defining the inputs, outputs, and rule commands.
The written contents of the tutorial are demonstrated in three short videos. We recommend watching the videos first and referring to the tutorial text for additional help.
Part 1: Introducing the Snakefile and Snakemake
Step 1: Editing the Snakefile¶
Make sure you're in the
(snaketest) conda environment. The remainder of this tutorial will be in the
Let's take a look at the Snakefile using the nano text editor:
nano -ET4 Snakefile
This is the skeleton of our Snakefile for calling variants. The command above tells nano to open the Snakefile and to create 4 spaces when you hit the Tab key. The Snakefile is written in the Python programming language, which uses specific indentation formatting to interpret the code. Incorrect indentation will result in syntax errors.
There are several rules defined with commands to run, but we'll need to add a few more details by editing with nano.
Add a comment to the Snakefile:
open Snakefile in
on a new line, add a comment with
# This is my first edit!
save the change by hitting
nanowill ask if you want to save the file as "Snakefile". We want to keep the same file name, so hit
view the Snakefile in Terminal with:
if you can see your comment, it worked! Exit the
lessview by hitting
Please refer to the bash command cheat sheet for commonly used
nano commands and other shortcuts for navigating your Terminal!
Ok, let's move on and take a look at the structure of the Snakefile rules.
Step 2: Looking at Snakefile rules¶
Each step in a pipeline is defined by a rule in the Snakefile. The components of each rule are indented 4 spaces. The most basic structure of a rule is:
rule rule_name: shell: # for single line commands # command must be enclosed in quotes "command"
There are several rules in the Snakefile. Let's do a search for all the rules in the file:
grep rule Snakefile
The output is a list of the lines in the Snakefile with the word 'rule' in them. There are 11 rules in this pipeline:
rule download_data: rule download_genome: rule uncompress_genome: rule index_genome_bwa: rule map_reads: rule index_genome_samtools: rule samtools_import: rule samtools_sort: rule samtools_index_sorted: rule samtools_mpileup: rule make_vcf:
Step 3: Running Snakemake & the Snakefile¶
Let's try running a Snakemake rule:
snakemake -p map_reads
-p means show the command that you're running.
Oops, this will fail! Why?
As the error message in red states, the rule failed because we don't have any of the input files required to run this rule yet! For the mapping rule to work, we need the raw read (
.fastq.gz) file and reference genome (
.fa.gz) that should be indexed.
The placement of snakemake flags must follow the
snakemake command, but otherwise the location does not matter. Thus,
snakemake -p map_reads will run the same as
snakemake map_reads -p.
Let's try again, starting with the first rule in the Snakefile:
Snakemake runs the shell command listed under the
download_data rule. In this case, the shell command downloads the raw read file from a public repository on osf.io.
snakemake -p download_data
Check the working directory. There should now be a
This command shows you the file permissions, number of links, owner name, owner group, file size in bytes, time of last modification, and file/directory name.
-rw-r--r-- 1 jovyan root 8.4M Aug 26 00:35 SRR2584857_1.fastq.gz -rw-r--r-- 1 jovyan jovyan 1.2K Jul 23 00:01 Snakefile
Next run some more rules sequentially – one at a time:
snakemake -p download_data
snakemake -p download_genome
snakemake -p uncompress_genome
snakemake -p index_genome_bwa
snakemake -p map_reads
Check the working directory again. The directory is populated by many output files including reference genome (
.fa), genome index (
.fa.amb etc) and mapped reads (
.sam) files. The
map_reads rule ran without any error!.
Please refer to the Snakemake command cheat sheet for commonly used Snakemake commands!
In this section we explored the basic template of a Snakefile which contains rules with all the necessary commands for variant calling.
- each rule can be run individually using
snakemake -p <rule_name>.
- each rule encompasses shell commands, with a bit of “decoration”. You could run them yourself directly in the terminal if you wanted!
- while the written order of the rules in Snakefile doesn’t matter, the order in which the rules are run on the terminal does matter! More on this in the next section!
- by default, if you don't specify a rule, Snakemake executes the first rule in the Snakefile (this is actually the only case where the order of rules in the Snakefile matters!)
- output Snakemake message is in red if it fails
- the code is case-sensitive
- tabs and spacing matters
In the next section, we'll cover how to connect the rules so Snakemake can recognize rules that depend on each other and run them in the correct order.