Running Command-Line BLAST

BLAST is the Basic Local Alignment Search Tool, a program for rapidly searching large sequence databases. It starts by finding small matches between the two sequences and then extends those matches. For in-depth information on how BLAST works and the different BLAST functionality, check out the resources page.
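The "find small matches, then extend them" idea can be sketched in a few lines of Python. This is a toy illustration only, not how BLAST is actually implemented: real BLAST scores matches with substitution matrices, tolerates mismatches and gaps, and uses score thresholds to decide when to stop extending. The function names, the word size `k=3`, and the two short sequences in the usage note are all made up for this sketch.

```python
def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    # Index every k-letter word of the subject by its start position.
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    # Look up every k-letter word of the query in that index.
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, k=3):
    """Grow an exact seed left and right while the letters keep matching (no gaps)."""
    start_q, start_s = i, j
    while start_q > 0 and start_s > 0 and query[start_q - 1] == subject[start_s - 1]:
        start_q -= 1
        start_s -= 1
    end_q, end_s = i + k, j + k
    while end_q < len(query) and end_s < len(subject) and query[end_q] == subject[end_s]:
        end_q += 1
        end_s += 1
    # Report where the match starts in each sequence and how long it is.
    return start_q, start_s, end_q - start_q


def best_local_match(query, subject, k=3):
    """Return (query_start, subject_start, length) of the longest extended seed, or None."""
    best = None
    for i, j in find_seeds(query, subject, k):
        hit = extend_seed(query, subject, i, j, k)
        if best is None or hit[2] > best[2]:
            best = hit
    return best
```

For example, `best_local_match("MKTAYIAKQR", "GGTAYIAKWW")` returns `(2, 2, 6)`: the shared stretch `TAYIAK` starts at position 2 in both toy sequences and is 6 letters long.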

BLAST can be helpful for identifying the source of a sequence or finding a similar sequence in another organism. In this lesson, we will use BLAST to find zebrafish proteins that are similar to a small set of mouse proteins.

Why use the command line?

BLAST has a very nice graphical interface for searching sequences in NCBI's database. However, running BLAST through the command line has many benefits:

  • It's much easier to run many BLAST queries using the command line than the GUI

  • Running BLAST with the command line is reproducible and can be documented in a script

  • The results can be saved in a machine-readable format that can be analyzed later

  • You can create your own databases to search rather than using NCBI's pre-built databases

  • It allows the queries to be automated

  • It allows you to use a remote computer to run the BLAST queries
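As a rough sketch of what "scripted" and "machine-readable" can look like in practice, the Python below builds a `blastp` command that requests tabular output (`-outfmt 6`) and parses that format into dictionaries. The file and database names (`mouse.faa`, `zebrafish_db`, `results.tsv`) are hypothetical placeholders, and the sample result line is invented purely to show the parsing; it is not real BLAST output. The column names are the default fields of BLAST+ tabular output.

```python
import csv
import io

# Default columns of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastp_command(query_fasta, db_name, out_path):
    """Build the argument list for a protein-protein BLAST search with tabular output."""
    return [
        "blastp",
        "-query", query_fasta,   # FASTA file of query proteins
        "-db", db_name,          # database built beforehand with makeblastdb
        "-out", out_path,        # where to write the results
        "-outfmt", "6",          # tab-separated, one hit per line
    ]


def parse_tabular(handle):
    """Parse -outfmt 6 lines from an open file-like object into a list of dicts."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        # Convert the numeric fields most often used for filtering.
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits


# Hypothetical file names; pass this list to subprocess.run() on a machine
# where BLAST+ is installed and the database exists.
command = blastp_command("mouse.faa", "zebrafish_db", "results.tsv")

# Invented example line, only to demonstrate the parser.
sample = "query1\tsubject1\t85.00\t100\t15\t0\t1\t100\t1\t100\t1e-50\t180\n"
hits = parse_tabular(io.StringIO(sample))
strong_hits = [h for h in hits if h["evalue"] < 1e-10]
```

Because the command is built in code and the results are plain tab-separated text, the same search can be rerun, looped over many query files, or filtered later without touching a web form.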

Est. time   Lesson name         Description
30 mins     Install BLAST       Set up local BLAST installation on your computer
15 mins     How to Run BLAST+   Run BLAST analysis with AWS

Learning Objectives

  • Gain hands-on exposure to the Linux command line

  • Understand how data is turned into results by programs run at the command line

Prerequisites

  • Some expertise in biology and genetics.

  • This tutorial was written to be run from an AWS remote instance. You need an AWS account. Please see our tutorial on setting up an AWS instance for help.

  • Basic shell scripting knowledge. Users must be comfortable with finding and opening a terminal window.

Last update: October 10, 2020