In bioinformatics, sequence assembly refers to aligning and merging fragments of a DNA sequence to reconstruct the original sequence. The fragments are typically pieces of a genome produced by shotgun sequencing, or pieces of gene transcripts (expressed sequence tags, ESTs).

Popular first-generation sequence assemblers were Phrap, TIGR Assembler, and CAP3. Phrap was the earliest of the three and the most popular, but the latter two were more accurate. Faced with the challenge of assembling the much larger genomes of the fruit fly (Drosophila melanogaster) in 2000 and the human genome just a year later, scientists developed a new generation of assemblers. The first of these was the Celera Assembler, developed by Gene Myers and colleagues, followed by Arachne, developed at MIT by Serafim Batzoglou and later enhanced by David Jaffe and colleagues. These modern assemblers can handle genomes of 100-300 million base pairs, such as those of the fruit fly and other insects, as well as the 3 billion base pairs of the human genome and other mammalian genomes. Following these efforts, several other groups, mostly at the major genome sequencing centers, built large-scale assemblers, and an open-source effort known as AMOS was launched to bring together the innovations in genome assembly technology under one open-source framework.

Greedy algorithm

This algorithm is an example of how to solve a sequence assembly problem.

Given a set of sequence fragments, the objective is to find the shortest common superstring, that is, the shortest sequence that contains every fragment as a substring.

1. Find the two fragments that have the largest overlap.

2. Merge these two fragments.

3. Repeat steps 1 and 2 until only one fragment is left.

4. This fragment is an approximate solution to the problem; it contains every input fragment, but it is not guaranteed to be the shortest such sequence.

The algorithm in pseudocode:

Let T be a set of fragments

while |T| > 1 {

 o* = 0
 For i = 1 to |T| {
  For j = 1 to |T| where i != j {
   If o(i,j) >= o* then i*=i, j*=j, o*=o(i,j)
  }
 }
 merge fragments i* and j* (this reduces |T| by one)

}

o(i,j) = length of the overlap between the end of fragment i and the start of fragment j
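
The pseudocode above can be sketched in Python as follows. This is a minimal illustration rather than how any of the assemblers named earlier work: it assumes error-free fragments that all come from the same strand, measures overlap by exact suffix/prefix matching, and discards fragments that are wholly contained in another fragment before merging. The function names overlap and greedy_assemble are hypothetical and chosen only for this example.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments):
    """Greedily merge the best-overlapping pair until one fragment is left."""
    # Assumption: exact duplicates and fully contained fragments add no
    # information, so they are dropped before merging starts.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_len, best_i, best_j = -1, 0, 1
        # Examine every ordered pair (i, j) with i != j.
        for i, j in permutations(range(len(frags)), 2):
            k = overlap(frags[i], frags[j])
            if k > best_len:
                best_len, best_i, best_j = k, i, j
        # Merge: keep fragment i whole, append the non-overlapping tail of j.
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)

    return frags[0] if frags else ""


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # ATTAGACCTGCCGGAATAC
```

Each pass recomputes all pairwise overlaps, which keeps the sketch short but makes it practical only for small inputs. Real sequencing reads also contain errors and may come from either strand, so exact matching as used here would not be sufficient in practice.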



http://amos.sourceforge.net (AMOS)

Huang, X. and Madan, A. (1999). CAP3: A DNA sequence assembly program. Genome Research, 9:868-877.




