BLAST
Developed by: Myers E.W., Altschul S.F., Gish W., Miller W., Lipman D.J., NCBI
OS: UNIX, Linux, Mac, MS-Windows
Genre: Bioinformatics tool
License: Public Domain
Website: ftp://ftp.ncbi.nlm.nih.gov/blast/

In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on sequence similarity. The BLAST program was designed by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman and Webb Miller and was published in the Journal of Molecular Biology in 1990[1].


Background

BLAST is one of the most widely used bioinformatics programs[2], both because it addresses a fundamental problem and because its algorithm emphasizes speed over sensitivity. This emphasis on speed is vital to making the algorithm practical on the huge genome databases currently available, although subsequent algorithms can be even faster.


Examples of other questions that researchers use BLAST to answer include:

  • Which bacterial species have a protein that is related in lineage to a certain protein with known amino-acid sequence?
  • Where does a certain sequence of DNA originate?
  • What other genes encode proteins that exhibit structures or motifs similar to ones that have just been determined?

BLAST is also often used as part of other algorithms that require approximate sequence matching.


The BLAST algorithm and the computer program that implements it were developed by Stephen Altschul, Warren Gish, and David Lipman at the U.S. National Center for Biotechnology Information (NCBI), Webb Miller at the Pennsylvania State University, and Gene Myers at the University of Arizona. It is available on the NCBI website. Alternative implementations include WU-BLAST and FSA-BLAST.


The original paper by Altschul et al.[1] was the most highly cited paper published in the 1990s.[3]


Input/Output

Input and output conform to the FASTA format.
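A FASTA file is a series of records, each made of a header line starting with ">" followed by one or more lines of sequence. The sketch below is a minimal Python reader, assuming well-formed plain-text input; the function name read_fasta and the example records are hypothetical illustrations and are not part of BLAST itself.

```python
from __future__ import annotations

import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text (hypothetical helper)."""
    header = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            # A new record begins, so emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical example input: two short records, the first wrapped over two lines.
example = """>query1 hypothetical mouse fragment
AGTTAC
GGAT
>query2
ACTTAG
"""
records = list(read_fasta(io.StringIO(example)))
print(records)
```

Wrapped sequence lines are joined, so the first record's sequence is read back as a single string, AGTTACGGAT.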


Algorithm

To run, BLAST requires a query sequence (also called the target sequence) to search for, and a sequence to search against (or a sequence database containing multiple such sequences). BLAST will find subsequences in the database which are similar to subsequences in the query. In typical usage, the query sequence is much smaller than the database, e.g., the query may be one thousand nucleotides while the database is several billion nucleotides.


BLAST searches for high-scoring sequence alignments between the query sequence and sequences in the database using a heuristic approach that approximates the Smith-Waterman algorithm. The exhaustive Smith-Waterman approach is too slow for searching large genomic databases such as GenBank. The BLAST heuristic is therefore less accurate than Smith-Waterman, but over 500 times faster. This combination of speed and relatively good accuracy is among the key technical innovations of the BLAST programs.
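To see what BLAST is approximating, here is a minimal sketch of the exhaustive Smith-Waterman dynamic program, which scores every pair of positions and therefore takes time proportional to the product of the two sequence lengths. The scoring values (match = 2, mismatch = -1, gap = -2) are illustrative assumptions rather than BLAST defaults, and the sketch uses a simple linear gap penalty, whereas practical aligners typically use affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b (linear gap penalty).

    Scoring parameters are illustrative assumptions, not BLAST's defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap     # a[i-1] aligned against a gap
            left = H[i][j - 1] + gap   # b[j-1] aligned against a gap
            # Flooring at 0 lets a local alignment start anywhere.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman("AGTTAC", "ACTTAG"))
```

For the sequences AGTTAC and ACTTAG used in the article's example, the best local alignment under these assumed scores is AGTTA against ACTTA (four matches, one mismatch), for a score of 7. Filling the full table is exactly the cost that becomes prohibitive against a database of billions of nucleotides.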


The original BLAST algorithm can be conceptually divided into three stages.

  1. In the first stage, BLAST searches for exact matches of a small fixed length W between the query and sequences in the database. For example, given the sequences AGTTAC and ACTTAG and a word length W = 3, BLAST would identify the matching substring TTA that is common to both sequences. These exact matches are known as seeds. By default, W = 11 is used for nucleotide seeds. Present versions of BLAST require two such hits lying close together on the same diagonal (that is, separated by an ungapped region) before attempting an extension; the ungapped alignments produced by extension are called high-scoring segment pairs (HSPs).
  2. In the second stage, BLAST tries to extend the match in both directions, starting at the seed. The ungapped alignment process extends the initial seed match of length W in each direction in an attempt to boost the alignment score. Insertions and deletions are not considered during this stage. For our example, the ungapped alignment between the sequences AGTTAC and ACTTAG centered around the common word TTA would be:
     ..AGTTAC..
       | |||
     ..ACTTAG..
     If a high-scoring ungapped alignment is found, the database sequence passes on to the third stage.
  3. In the third stage, BLAST performs a gapped alignment between the query sequence and the database sequence using a variation of the Smith-Waterman algorithm. Statistically significant alignments are then displayed to the user.
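The first two stages can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not NCBI's implementation: seeds are exact word matches only (protein searches also accept similar, high-scoring neighborhood words), the two-hit refinement and the gapped third stage are omitted, and the scores and the X-drop style stopping rule (stop extending once the running score falls more than x_drop below the best seen) use hypothetical parameter values. The function names find_seeds, _extend and extend_ungapped are also hypothetical.

```python
from __future__ import annotations


def find_seeds(query: str, subject: str, w: int = 3) -> list[tuple[int, int]]:
    """Stage 1: return (query_pos, subject_pos) for every exact word match of length w."""
    # Index every length-w word of the query in a lookup table.
    index: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    # Scan the subject and look each of its words up in the table.
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def _extend(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Walk from (qi, sj) in direction step (+1 or -1) without gaps.

    Returns (best score gain, number of positions that achieve it).
    """
    best, score, best_len, k = 0, 0, 0, 0
    while 0 <= qi + step * k < len(query) and 0 <= sj + step * k < len(subject):
        score += match if query[qi + step * k] == subject[sj + step * k] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score > x_drop:
            break  # score fell too far below the best seen: give up
    return best, best_len


def extend_ungapped(query, subject, qi, sj, w, match=2, mismatch=-1, x_drop=2):
    """Stage 2: extend an exact seed left and right; no insertions or deletions."""
    right_gain, right_len = _extend(query, subject, qi + w, sj + w, +1, match, mismatch, x_drop)
    left_gain, left_len = _extend(query, subject, qi - 1, sj - 1, -1, match, mismatch, x_drop)
    score = w * match + left_gain + right_gain  # exact seed contributes w matches
    q_start, s_start = qi - left_len, sj - left_len
    length = left_len + w + right_len
    return score, query[q_start:q_start + length], subject[s_start:s_start + length]


seeds = find_seeds("AGTTAC", "ACTTAG", w=3)
print(seeds)
for qi, sj in seeds:
    print(extend_ungapped("AGTTAC", "ACTTAG", qi, sj, w=3))
```

On the article's example the only seed is TTA at position 2 of both sequences. Extension keeps only the best-scoring stretch, so under these assumed scores the trailing C/G mismatch is trimmed and the result is AGTTA against ACTTA with a score of 7.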


Parallel BLAST

Parallel BLAST versions are implemented using MPI and Pthreads, and have been ported to various platforms including Windows, Linux, Solaris, Mac OS X, and AIX. Popular approaches to parallelize BLAST include query distribution, hash table segmentation, computation parallelization, and database segmentation (partition)[citation needed].
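Database segmentation can be illustrated with a short Python sketch: the database is split into partitions, each partition is searched independently, and the partial hit lists are merged and ranked. Everything here is an assumption for illustration. The scorer kmer_hits (a count of shared length-w words) is a hypothetical stand-in for a real BLAST search, the interleaved partitioning scheme is arbitrary, and Python threads stand in for the MPI processes or Pthreads used by real parallel BLAST versions (CPython threads do not give true CPU parallelism for this kind of work).

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def kmer_hits(query: str, subject: str, w: int = 3) -> int:
    """Hypothetical stand-in scorer: number of subject words of length w found in the query."""
    words = {query[i:i + w] for i in range(len(query) - w + 1)}
    return sum(subject[j:j + w] in words for j in range(len(subject) - w + 1))


def search_segment(query: str, segment: list[tuple[str, str]]) -> list[tuple[str, int]]:
    # Each worker scans only its own slice of the database.
    return [(name, kmer_hits(query, seq)) for name, seq in segment]


def parallel_search(query: str, database: list[tuple[str, str]], n_segments: int = 2):
    """Split the database into n_segments partitions, search them concurrently,
    then merge the partial results and rank them by score."""
    segments = [database[i::n_segments] for i in range(n_segments)]
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partial = list(pool.map(lambda seg: search_segment(query, seg), segments))
    merged = [hit for part in partial for hit in part if hit[1] > 0]
    return sorted(merged, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical toy database of (name, sequence) pairs.
database = [("seq1", "ACTTAG"), ("seq2", "GGGGGG"), ("seq3", "AGTTACGG"), ("seq4", "TTAC")]
print(parallel_search("AGTTAC", database))
```

Because each partition is searched without reference to the others, the merge step only has to concatenate and re-sort. In real tools the statistics also need care, since significance depends on the size of the whole database rather than of one partition.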


Program

The BLAST program can either be downloaded and run as a command-line utility "blastall" or accessed for free over the web. The BLAST web server, hosted by the NCBI, allows anyone with a web browser to perform similarity searches against constantly updated databases of proteins and DNA that include most of the newly sequenced organisms.


BLAST is actually a family of programs (all included in the blastall executable). These include:

Nucleotide-nucleotide BLAST (blastn)
This program, given a DNA query, returns the most similar DNA sequences from the DNA database that the user specifies.
Protein-protein BLAST (blastp)
This program, given a protein query, returns the most similar protein sequences from the protein database that the user specifies.
Position-Specific Iterative BLAST (PSI-BLAST)
This program is used to find distant relatives of a protein. First, a list of all closely related proteins is created. These proteins are combined into a general "profile" sequence, which summarises significant features present in these sequences. A query against the protein database is then run using this profile, and a larger group of proteins is found. This larger group is used to construct another profile, and the process is repeated.
By including related proteins in the search, PSI-BLAST is much more sensitive in picking up distant evolutionary relationships than a standard protein-protein BLAST.
Nucleotide 6-frame translation-protein (blastx)
This program compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.
Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx)
This program is the slowest of the BLAST family. It translates the query nucleotide sequence in all six possible frames and compares it against the six-frame translations of a nucleotide sequence database. The purpose of tblastx is to find very distant relationships between nucleotide sequences.
Protein-nucleotide 6-frame translation (tblastn)
This program compares a protein query against all six-frame translations of a nucleotide sequence database.
Large numbers of query sequences (megablast)
When comparing large numbers of input sequences via the command-line BLAST, "megablast" is much faster than running BLAST multiple times. It concatenates many input sequences together to form a large sequence before searching the BLAST database, then post-analyzes the search results to glean individual alignments and statistical values.
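blastx, tblastn and tblastx all depend on six-frame conceptual translation: a DNA sequence is read in three reading frames on the given strand and three on its reverse complement. A minimal Python sketch follows, assuming the standard genetic code and ignoring alternative codes. The function names and the frame labels (+1 to +3 for the given strand, -1 to -3 for reading frames of the reverse complement) are a hypothetical convention chosen for this illustration; "*" marks a stop codon and "X" marks a codon containing a non-ACGT character.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code, listed with codons in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Return the six conceptual translations, keyed by a hypothetical frame label."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG").items():
    print(frame, protein)
```

In frame +1 this example sequence translates to MAIVMGR*KGAR*. A blastx search would then compare all six protein strings against the protein database, which is why the translated searches are slower than blastn or blastp.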


Alternative versions

An extremely fast but considerably less sensitive alternative to BLAST that compares nucleotide sequences to the genome is BLAT (BLAST-Like Alignment Tool). A version designed for comparing multiple large genomes or chromosomes is BLASTZ.


Accelerated versions

  • There are two main field-programmable gate array (FPGA) implementations of the BLAST algorithm. Progeniq is up to 100x faster than a software implementation running on the same processor[citation needed]. TimeLogic offers an FPGA BLAST package called Tera-BLAST.
  • The Mitrion-C Open Bio Project is an ongoing effort to port BLAST to run on Mitrion FPGAs. It is available on SourceForge.


References

  1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). "Basic local alignment search tool". J Mol Biol 215 (3): 403-410. doi:10.1006/jmbi.1990.9999. PMID 2231712.
  2. Casey RM (2005). "BLAST Sequences Aid in Genomics and Proteomics". Business Intelligence Network.
  3. "Sense from Sequences: Stephen F. Altschul on Bettering BLAST". ScienceWatch, July/August 2000.


See also

  • Needleman–Wunsch algorithm
  • Smith-Waterman algorithm
  • Sequence alignment
  • List of sequence alignment software
  • SEQUEROME
  • eTBLAST

External links

  • NCBI-BLAST website
  • NCBI-BLAST Tutorial
  • WU-BLAST - The original gapped BLAST with statistics, developed and maintained by Warren Gish at Washington University in St. Louis
  • FSA-BLAST - A new, faster but still accurate version of NCBI BLAST based on recently published algorithmic improvements
  • NBIC mpiBLAST - Netherlands Bioinformatics Centre, running mpiBLAST
  • PatternHunter - An alternative software which provides similar functionality to BLAST while claiming increased speed and sensitivity
  • Parallel BLAST - A dual scheduling BLAST tested on the Blue Gene/L
  • BLAST HOWTO at the Wikiomics bioinformatics wiki
  • A/G BLAST - Implementation for PowerPC G4/G5 processors and Mac OS X, from Apple Computer's Advanced Computation Group and Genentech.
  • STRAP - The protein workbench STRAP contains a convenient BLAST front-end with a cache for BLAST results
  • KoriBlast is a reliable graphical environment dedicated to sequence data mining. KoriBlast combines Blast searches with advanced data management capabilities and a state-of-the-art graphical user interface.
  • Using the Basic Local Alignment Search Tool (BLAST)
Databases supported by Bioinformatic Harvester
NCBI-BLAST | CDD | Ensembl | Entrez | Flybase | Flymine | GFP-cDNA | Genome_browser | GeneCard | Google_Scholar | GoPubMed | HomoloGene | iHOP | IPI | OMIM | Mitocheck | PSORT | PolyMeta | UniProt | SOURCE | SOSUI | RZPD | Sciencenet | STRING | SMART | ZFIN |

The Wikipedia article included on this page is licensed under the GFDL.