Shotgun sequencing

Shotgun sequencing is a method used in genetics for sequencing long DNA strands. Because the chain termination method of DNA sequencing can only be used for fairly short strands, longer sequences must be divided into pieces and the results reassembled to give the overall sequence. In chromosome walking, this division is done by progressing through the entire strand piece by piece; shotgun sequencing uses a faster, but more complex, process that assembles random pieces of the sequence.


In shotgun sequencing, DNA is broken up randomly into numerous small segments, which are sequenced using the chain termination method to obtain reads. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a contiguous sequence.
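The fragmentation step can be sketched as a toy simulation. This is an idealized model, not a description of any laboratory protocol: each round cuts the strand at a few random positions, every fragment is assumed to be read in full without errors, and the reads from all rounds are pooled and shuffled because the assembler does not know where each one came from. The helper names `shotgun_round` and `shotgun_reads` are hypothetical.

```python
import random

def shotgun_round(strand: str, n_cuts: int, rng: random.Random) -> list[str]:
    """One idealized round: cut the strand at n_cuts distinct random positions."""
    cut_points = sorted(rng.sample(range(1, len(strand)), n_cuts))
    bounds = [0] + cut_points + [len(strand)]
    # Consecutive boundaries delimit the fragments, so together they tile the strand.
    return [strand[a:b] for a, b in zip(bounds, bounds[1:])]

def shotgun_reads(strand: str, rounds: int, n_cuts: int, seed: int = 0) -> list[str]:
    """Pool the fragments from several independent rounds and shuffle them,
    since the original positions of the reads are lost."""
    rng = random.Random(seed)
    reads = []
    for _ in range(rounds):
        reads.extend(shotgun_round(strand, n_cuts, rng))
    rng.shuffle(reads)
    return reads

reads = shotgun_reads("AGCATGCTGCAGTCATGCTTAGGCTA", rounds=2, n_cuts=1, seed=42)
print(reads)  # four reads whose ends overlap because the two rounds cut in different places
```

Because the cut positions differ from round to round, a junction between two fragments in one round usually falls inside a fragment from another round, which is what makes overlap-based assembly possible.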


For example, consider the following two rounds of shotgun reads:

 Original strand : AGCATGCTGCAGTCATGCTTAGGCTA 
 First round of shotgun reads : AGCATGCTGCAG TCATGCTTAGGCTA 
 Second round of shotgun reads : TTAGGCTA AGCATGCTGCAGTCATGC 

In this extremely simplified example, the four reads can be assembled into the original sequence by using the overlaps between their ends to align and order them. In reality, this process works with enormous amounts of data that are rife with ambiguities and sequencing errors. Assembly of complex genomes is further complicated by the great abundance of repetitive sequence, which means that similar short reads could come from completely different parts of the sequence.
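The assembly of the four reads above can be sketched as a naive greedy merge: drop reads that are contained in other reads, then repeatedly join the pair with the longest suffix-prefix overlap. This is only an illustration with hypothetical helper names (`overlap`, `greedy_assemble`) and an arbitrarily chosen minimum overlap of 3 bases. It assumes error-free reads from a single strand and does nothing about repeats; production assemblers are considerably more elaborate.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads into contigs by their longest end overlaps."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # A read wholly inside another read adds no new sequence.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best = (0, -1, -1)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more: leave the remaining contigs separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

shotgun = ["AGCATGCTGCAG", "TCATGCTTAGGCTA", "TTAGGCTA", "AGCATGCTGCAGTCATGC"]
print(greedy_assemble(shotgun))  # ['AGCATGCTGCAGTCATGCTTAGGCTA']
```

Here two reads are discarded as contained, and the remaining two share a 6-base overlap (`TCATGC`), which reproduces the original strand.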


Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence. For example, to complete the Human Genome Project, most of the human genome was sequenced at 12X or greater coverage; that is, each base in the final sequence was present, on average, in 12 or more reads. Even so, current methods have failed to isolate or assemble reliable sequence for approximately 1% of the (euchromatic) human genome.


Whole genome shotgun sequencing

Whole genome shotgun sequencing is an application of pairwise end sequencing, known colloquially as double-barrel shotgun sequencing. As sequencing projects began to tackle longer and more complicated targets, multiple groups realized that useful information could be obtained by sequencing both ends of a fragment of DNA. Although sequencing both ends of the same fragment and keeping track of the paired data was more cumbersome than sequencing a single end of two distinct fragments, knowing that the two sequences were oriented in opposite directions and lay roughly one fragment length apart was valuable in reconstructing the sequence of the original target.


The first published description of the use of paired ends was by Edwards et al. in 1990 as part of the sequencing of the human hypoxanthine-guanine phosphoribosyltransferase (HPRT) locus, although there the paired ends were used only to close gaps after a traditional shotgun sequencing approach had been applied. The first theoretical description of a pure pairwise end sequencing strategy, assuming fragments of constant length, was by Edwards and Caskey in 1991. At the time, the community consensus was that the optimal fragment length for pairwise end sequencing would be three times the sequence read length. In 1995 Roach et al. introduced the innovation of using fragments of varying sizes and demonstrated that a pure pairwise end sequencing strategy would be possible on large targets. The strategy was subsequently adopted by The Institute for Genomic Research (TIGR) to sequence the genome of the bacterium Haemophilus influenzae in 1995, then by Celera Genomics to sequence the fruit fly genome in 2000, and subsequently the human genome.


To apply the strategy, high-molecular-weight DNA is sheared into random fragments, size-selected (usually 2, 10, 50, and 150 kb), and cloned into an appropriate vector. The clones are then sequenced from both ends using the chain termination method, yielding two short sequences. Each sequence is called an end-read or read, and two reads from the same clone are referred to as mate pairs. Because the chain termination method usually produces reads only 500 to 1000 bases long, mate pairs rarely overlap except in the smallest clones.
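A single mate pair can be simulated to show what the two end-reads look like. This is a sketch under simplifying assumptions: the genome is a random string, the sizes are scaled far down from the 2 to 150 kb clones and 500 to 1000 base reads described above, reads are error-free, and the function name `simulate_mate_pair` is hypothetical. The second read is reverse-complemented because it is sequenced from the opposite strand, which is why mates point toward each other.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Read the opposite strand 5'->3': complement each base, then reverse."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_mate_pair(genome: str, insert_size: int, read_len: int, rng: random.Random):
    """Pick a random clone of insert_size bases and read read_len bases from each end.
    Returns (clone_start, forward_read, reverse_read)."""
    start = rng.randrange(len(genome) - insert_size + 1)
    clone = genome[start:start + insert_size]
    forward_read = clone[:read_len]
    # The far end is sequenced from the other strand, pointing back toward the first read.
    reverse_read = reverse_complement(clone[-read_len:])
    return start, forward_read, reverse_read

rng = random.Random(7)
genome = "".join(rng.choice("ACGT") for _ in range(1000))
start, fwd, rev = simulate_mate_pair(genome, insert_size=200, read_len=50, rng=rng)
```

With `insert_size` much larger than twice `read_len`, the two reads never overlap, mirroring the situation described above; the known clone size is what still ties them together.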


The original sequence is reconstructed from the reads using sequence assembly software. First, overlapping reads are collected into longer composite sequences known as contigs. Contigs can then be linked together into scaffolds by following the connections between mate pairs. The distance between contigs can be inferred from the positions of the mate pairs if the library's fragment size is known and varies only within a narrow range.


Coverage is the average number of reads representing a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome ($G$), the number of reads ($N$), and the average read length ($L$) as $C = NL/G$. For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2x coverage, since $8 \times 500 / 2000 = 2$.
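The formula and the worked example can be checked directly, and a small simulation shows that $C$ is an average rather than a guarantee for every base. The sketch assumes reads of a fixed length placed uniformly at random and lying entirely inside the genome; the function names are hypothetical.

```python
import random

def coverage(n_reads: int, avg_read_len: float, genome_len: int) -> float:
    """C = N * L / G."""
    return n_reads * avg_read_len / genome_len

def depth_profile(genome_len: int, read_starts: list[int], read_len: int) -> list[int]:
    """Count how many reads cover each base position."""
    depth = [0] * genome_len
    for s in read_starts:
        for i in range(s, min(s + read_len, genome_len)):
            depth[i] += 1
    return depth

print(coverage(8, 500, 2000))  # 2.0, matching the example above

# Place the same 8 reads at random positions, each fully inside the genome.
rng = random.Random(1)
starts = [rng.randrange(2000 - 500 + 1) for _ in range(8)]
depth = depth_profile(2000, starts, 500)
print(sum(depth) / len(depth))  # 2.0: the mean depth equals C
print(min(depth), max(depth))   # individual bases can sit well above or below C
```

Since every read lies inside the genome here, the total depth is exactly $NL$, so the mean per-base depth always equals $C$; the spread between the minimum and maximum is why projects aim for coverage well above 1x.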


Proponents of this approach argue that it is possible to sequence the whole genome at once using large arrays of sequencers, which makes the whole process much more efficient than more traditional approaches. Detractors argue that although the technique quickly sequences large regions of DNA, its ability to correctly link these regions is suspect, particularly for genomes with repeated regions. As sequence assembly programs become more sophisticated and computing power becomes cheaper, it may be possible to overcome this limitation.
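The repeat problem can be made concrete with a toy case. Below, a repeat `R` occurs three times, and swapping the two unique segments between the copies yields a different genome. If the reads are shorter than the repeat, even an idealized read set containing every possible error-free read is identical for both genomes, so no overlap-based method can tell them apart from those reads alone. All sequences are invented for illustration and `all_reads` is a hypothetical helper; the long-range constraint from mate pairs that span a repeat is one way such ambiguity can be reduced.

```python
def all_reads(genome: str, k: int) -> list[str]:
    """Every error-free read of length k, sorted so that read order carries no position information."""
    return sorted(genome[i:i + k] for i in range(len(genome) - k + 1))

R = "GATTACA"                          # 7-base repeat
x, y, z, w = "CC", "TTT", "AAG", "CGC"  # unique segments
g1 = x + R + y + R + z + R + w
g2 = x + R + z + R + y + R + w         # y and z swapped between the repeat copies

print(g1 == g2)                            # False: the genomes differ
print(all_reads(g1, 6) == all_reads(g2, 6))  # True: 6-base reads cannot distinguish them
```

No 6-base read can span a whole 7-base repeat copy plus a base on each side, so every read that touches `y` or `z` looks the same in both arrangements.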


References

  • Shotgun sequencing comes of age. The Scientist. Retrieved on December 31, 2002.
  • Shotgun sequencing finds nanoorganisms - Probe of acid mine drainage turns up unsuspected virus-sized Archaea. SpaceRef.com. Retrieved on December 23, 2006.
  • Fleischmann, RD; et al. (1995). "Whole-genome random sequencing and assembly of Haemophilus influenzae Rd". Science 269 (5223): 496-512.
  • Adams, MD; et al. (2000). "The genome sequence of Drosophila melanogaster". Science 287 (5461): 2185-95.
  • Edwards, A; Voss, H; Rice, P; Civitello, A; Stegemann, J; Schwager, C; Zimmerman, J; Erfle, H; Caskey, T; Ansorge, W (1990). "Automated DNA sequencing of the human HPRT locus". Genomics 6: 593-608.
  • Edwards, A; Caskey, T (1991). "Closure strategies for random DNA sequencing". Methods: A Companion to Methods in Enzymology 3 (1): 41-47.
  • Roach, JC; Boysen, C; Wang, K; Hood, L (1995). "Pairwise end sequencing: a unified approach to genomic mapping and sequencing". Genomics 26: 345-353.


This article contains text from the NCBI Handbook published by the National Center for Biotechnology Information, which, as a US government publication, is in the public domain.


The Wikipedia article included on this page is licensed under the GFDL.