Superscalar
Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed.
Processor board of a CRAY T3e parallel computer with four superscalar Alpha processors

A superscalar CPU architecture implements a form of parallelism called instruction-level parallelism within a single processor. It thereby allows faster CPU throughput than would otherwise be possible at the same clock rate. A superscalar architecture executes more than one instruction during a single clock cycle by pre-fetching multiple instructions and simultaneously dispatching them to redundant functional units on the processor.

History

Seymour Cray's CDC 6600 from 1965 is often mentioned as the first superscalar design. The Intel i960CA (1988) and the AMD 29000-series 29050 (1990) microprocessors were the first commercial single-chip superscalar microprocessors. RISC CPUs like these brought the superscalar concept to microcomputers because the RISC design results in a simple core, allowing straightforward instruction dispatch and the inclusion of multiple functional units (such as ALUs) on a single CPU within the constrained design rules of the time. This was the reason that RISC designs were faster than CISC designs through the 1980s and into the 1990s.


Except for CPUs used in some battery-powered devices, essentially all general-purpose CPUs developed since about 1998 are superscalar. Beginning with the "P6" (Pentium Pro and Pentium II) implementation, Intel's x86 architecture microprocessors have implemented a CISC instruction set on a superscalar RISC micro-architecture. Complex instructions are internally translated to a RISC-like "micro-ops" instruction set, allowing the processor to take advantage of the higher-performance underlying design while remaining compatible with earlier Intel processors.


From scalar to superscalar

The simplest processors are scalar processors. Each instruction executed by a scalar processor typically manipulates one or two data items at a time. By contrast, each instruction executed by a vector processor operates simultaneously on many data items. An analogy is the difference between scalar and vector arithmetic. A superscalar processor combines aspects of the two: each instruction processes one data item, but there are multiple redundant functional units within each CPU, so multiple instructions can process separate data items concurrently.


Superscalar CPU design emphasizes improving the accuracy of the instruction dispatcher and allowing it to keep the multiple functional units in use at all times. This has become increasingly important as the number of units has increased. While early superscalar CPUs would have two ALUs and a single FPU, a modern design like the PowerPC 970 includes four ALUs, two FPUs, and two SIMD units. If the dispatcher is ineffective at keeping all of these units fed with instructions, the performance of the system as a whole will suffer.


A superscalar processor usually sustains an execution rate in excess of one instruction per machine cycle. But merely processing multiple instructions concurrently does not make an architecture superscalar, since pipelined and multicore CPUs also achieve that, though by different methods.


In a superscalar CPU, the dispatcher reads instructions from memory and decides which ones can be run in parallel, dispatching them to redundant functional units contained inside a single CPU. A superscalar processor can therefore be envisioned as having multiple parallel pipelines, each of which is processing instructions simultaneously from a single instruction thread.
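The dispatcher's job can be illustrated with a toy model. The sketch below is not how any real CPU is built; it assumes a hypothetical in-order machine in which every instruction takes one cycle, each instruction is described only by the register it writes and the registers it reads, and a group of instructions issued together must not contain an instruction that reads or rewrites a register written earlier in the same group. The names `Instr` and `issue_cycles` are made up for this example.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def issue_cycles(program, width=2):
    """Count the cycles needed to issue `program` on a toy in-order machine.

    Up to `width` instructions are issued per cycle. A group is cut short at
    the first instruction that reads or rewrites a register written by an
    instruction already in the same group (assumption: unit latency, so the
    result is available in the next cycle).
    """
    cycles = 0
    i = 0
    while i < len(program):
        written = set()   # registers written by the current issue group
        issued = 0
        while i < len(program) and issued < width:
            ins = program[i]
            if written & set(ins.srcs) or ins.dest in written:
                break     # dependency inside the group: wait for next cycle
            written.add(ins.dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


program = [
    Instr("a", ("b", "c")),
    Instr("d", ("e", "f")),   # independent of the first instruction
    Instr("g", ("a", "d")),   # needs both earlier results
    Instr("h", ("g", "b")),   # needs g
]
print(issue_cycles(program, width=1))  # scalar issue
print(issue_cycles(program, width=2))  # two-way superscalar issue
```

With these four instructions the two-way machine saves only one cycle over the scalar one, because the last two instructions each depend on a result produced just before them.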


Limitations

Available performance improvement from superscalar techniques is limited by two key factors:

  1. The degree of intrinsic parallelism in the instruction stream, i.e. the limited amount of instruction-level parallelism available, and
  2. The complexity and time cost of the dispatcher and associated dependency checking logic.

Existing binary executable programs have varying degrees of intrinsic parallelism. In some cases instructions are not dependent on each other and can be executed simultaneously. In other cases they are inter-dependent: one instruction impacts either resources or results of the other. The instructions a = b + c; d = e + f can be run in parallel because none of the results depend on other calculations. However, the instructions a = b + c; d = a + f might not be runnable in parallel: the second instruction uses the result a of the first, so the outcome depends on the order in which the instructions complete while they move through the units.
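The two instruction pairs above can be checked mechanically. The following is a minimal sketch, assuming each instruction is written as a hypothetical (destination, sources) pair; the labels RAW, WAR and WAW (read-after-write, write-after-read, write-after-write) are standard hazard terminology rather than terms used in this article, and the function name `hazards` is made up for the example.

```python
def hazards(first, second):
    """Return the data hazards between two instructions in program order.

    Each instruction is a (dest, sources) pair, e.g. ("a", ("b", "c"))
    for a = b + c.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = set()
    if dest1 in srcs2:
        found.add("RAW")   # second reads what first writes (true dependency)
    if dest2 in srcs1:
        found.add("WAR")   # second overwrites something first still reads
    if dest1 == dest2:
        found.add("WAW")   # both write the same register
    return found


# a = b + c; d = e + f  -> no shared registers, can run in parallel
print(hazards(("a", ("b", "c")), ("d", ("e", "f"))))
# a = b + c; d = a + f  -> second instruction needs the result a
print(hazards(("a", ("b", "c")), ("d", ("a", "f"))))
```

An empty result means the pair can be dispatched together; a non-empty one means the hardware has to order, delay, or otherwise handle the second instruction.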


As the number of simultaneously issued instructions increases, the cost of dependency checking increases extremely rapidly. This is exacerbated by the need to check dependencies at run time and at the CPU's clock rate. The cost includes the additional logic gates required to implement the checks and the time delays through those gates. Research shows that the gate cost in some cases may be n^k gates and the delay cost k^2 log n, where n is the number of instructions in the processor's instruction set and k is the number of simultaneously dispatched instructions. In mathematics, this is called a combinatoric problem involving permutations.
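To get a feel for how quickly these expressions grow, the sketch below simply evaluates n^k and k^2 log n for a few issue widths. It is arithmetic only, not a hardware model: the instruction-set size n = 32 is an arbitrary illustrative value, the logarithm is assumed to be base 2 (the text does not say), and the results are relative figures rather than real gate counts or delays.

```python
import math


def gate_cost(n, k):
    """Gate-count expression from the text: n to the power k."""
    return n ** k


def delay_cost(n, k):
    """Delay expression from the text: k squared times log n (base 2 assumed)."""
    return k ** 2 * math.log2(n)


n = 32  # hypothetical instruction-set size, chosen only for illustration
for k in range(1, 7):
    print(f"k={k}  gates~{gate_cost(n, k):>12}  delay~{delay_cost(n, k):6.1f}")
```

Under these assumptions, each additional simultaneously dispatched instruction multiplies the gate expression by n, while the delay expression grows with the square of k.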


Even though the instruction stream may contain no inter-instruction dependencies, a superscalar CPU must nonetheless check for that possibility, since there is no assurance otherwise and failure to detect a dependency would produce incorrect results.


No matter how advanced the semiconductor process or how fast the switching speed, this places a practical limit on how many instructions can be simultaneously dispatched. While process advances will allow ever greater numbers of functional units (e.g., ALUs), the burden of checking instruction dependencies grows so rapidly that the achievable superscalar dispatch limit is fairly small, likely on the order of five to six simultaneously dispatched instructions.


However, even given infinitely fast dependency checking logic on an otherwise conventional superscalar CPU, an instruction stream that itself has many dependencies would also limit the possible speedup. Thus the degree of intrinsic parallelism in the code stream forms a second limitation.
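This second limit can be estimated for a small instruction sequence by finding its longest chain of dependent instructions. The sketch below is an idealised calculation, not a description of real hardware: it assumes every instruction takes one cycle, unlimited functional units, and that only true (read-after-write) dependencies matter. The function name `ideal_ilp` and the (destination, sources) encoding are made up for the example.

```python
def ideal_ilp(program):
    """Upper bound on instructions per cycle for a straight-line program.

    `program` is a list of (dest, sources) pairs in program order. The bound
    is the instruction count divided by the length of the longest dependency
    chain (assumption: unit latency, unlimited functional units).
    """
    depth_of_writer = {}   # register -> chain depth of its latest writer
    longest = 0
    for dest, srcs in program:
        # An instruction sits one step below its deepest producer.
        depth = 1 + max((depth_of_writer.get(r, 0) for r in srcs), default=0)
        depth_of_writer[dest] = depth
        longest = max(longest, depth)
    return len(program) / longest if program else 0.0


independent = [("a", ("b", "c")), ("d", ("e", "f"))]
chained = [("a", ("b", "c")), ("d", ("a", "f")), ("g", ("d", "a"))]
print(ideal_ilp(independent))  # both instructions can issue together
print(ideal_ilp(chained))      # each instruction waits for the previous one
```

For the chained sequence the bound is one instruction per cycle, so no amount of extra dispatch width or faster dependency checking would help.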


Alternatives

Collectively, these two limits drive investigation into alternative architectural approaches to increasing performance, such as Very Long Instruction Word (VLIW), Explicitly Parallel Instruction Computing (EPIC), simultaneous multithreading (SMT), and multi-core processors.


With VLIW, the burdensome task of dependency checking by hardware logic at run time is removed and delegated to the compiler. The checks, which in a superscalar design must be completed within nanoseconds, can instead take seconds at compile time, and on a multi-core machine with a multithreaded compiler they can be performed by multiple processors in parallel. Explicitly Parallel Instruction Computing (EPIC) is like VLIW, with extra cache prefetching instructions.


Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar CPUs. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.


Superscalar processors differ from multi-core processors in that the redundant functional units are not entire processors. A single processor is composed of finer-grained functional units such as the ALU, integer multiplier, integer shifter, floating point unit, etc. There may be multiple copies of each functional unit to enable execution of many instructions in parallel. This differs from a multicore CPU, which concurrently processes instructions from multiple threads, one thread per core. It also differs from a pipelined CPU, where multiple instructions can concurrently be in various stages of execution, assembly-line fashion.


The various alternative techniques are not mutually exclusive; they can be (and frequently are) combined in a single processor. Thus a multicore CPU is possible where each core is an independent superscalar processor containing multiple parallel pipelines. Some processors also include vector capability.


See also

  • Super-threading
  • Simultaneous multithreading
  • Speculative execution
  • Eager evaluation
  • Software lockout

References

  • Mike Johnson, Superscalar Microprocessor Design, Prentice-Hall, 1991, ISBN 0-13-875634-1
  • Sorin Cotofana, Stamatis Vassiliadis, "On the Design Complexity of the Issue Logic of Superscalar Machines", EUROMICRO 1998: 10277-10284
  • Steven McGeady, "The i960CA SuperScalar Implementation of the 80960 Architecture", IEEE 1990, pp. 232-240
  • Steven McGeady, et al., "Performance Enhancements in the Superscalar i960MM Embedded Microprocessor," ACM Proceedings of the 1991 Conference on Computer Architecture (Compcon), 1991, pp. 4-7


External links

  • http://www.cs.clemson.edu/~mark/eager.html

The Wikipedia article included on this page is licensed under the GFDL.