Vector processor
Processor board of a Cray Y-MP vector computer

A vector processor, or array processor, is a CPU design that is able to run mathematical operations on multiple data elements simultaneously. This is in contrast to a scalar processor, which handles one element at a time. The vast majority of CPUs are scalar (or close to it). Vector processors were common in scientific computing, where they formed the basis of most supercomputers through the 1980s and into the 1990s, but general increases in the performance of conventional processor designs led to the near disappearance of the vector processor as a general-purpose CPU.


Today most commodity CPU designs include some vector processing instructions, typically known as SIMD (Single Instruction, Multiple Data); common examples include SSE and AltiVec. Modern video game consoles and consumer computer-graphics hardware rely heavily on vector processing in their architecture. In 2000, IBM, Toshiba and Sony collaborated to create the Cell processor, consisting of one scalar processor and eight vector processors, for the Sony PlayStation 3.
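
To give a sense of what these SIMD extensions look like to a programmer, the following is a minimal sketch in C using Intel's SSE intrinsics (the intrinsics and header are the standard x86 ones; the array contents are made up for illustration). A single _mm_add_ps call performs four single-precision additions at once:

  #include <xmmintrin.h>   /* SSE intrinsics */
  #include <stdio.h>

  int main(void) {
      float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
      float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
      float c[4];

      __m128 va = _mm_loadu_ps(a);      /* load four floats from a */
      __m128 vb = _mm_loadu_ps(b);      /* load four floats from b */
      __m128 vc = _mm_add_ps(va, vb);   /* four additions in one instruction */
      _mm_storeu_ps(c, vc);             /* store the four sums into c */

      printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
      return 0;
  }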


History

Vector processing was first worked on in the early 1960s at Westinghouse in their Solomon project. Solomon's goal was to dramatically increase math performance by using a large number of simple math co-processors (or ALUs) under the control of a single master CPU. The CPU fed a single common instruction to all of the ALUs, one per "cycle", but with a different data point for each one to work on. This allowed the Solomon machine to apply a single algorithm to a large data set, fed in the form of an array. In 1962 Westinghouse cancelled the project, but the effort was restarted at the University of Illinois as the ILLIAC IV. Their version of the design originally called for a 1 GFLOPS machine with 256 ALUs, but when it was finally delivered in 1972 it had only 64 ALUs and could reach only 100 to 150 MFLOPS. Nevertheless, it showed that the basic concept was sound, and when used on data-intensive applications, such as computational fluid dynamics, the "failed" ILLIAC was the fastest machine in the world. The ILLIAC approach of using separate ALUs for each data element is not common to later designs, however, and is often placed in a separate category, massively parallel computing.


The first successful implementations of vector processing appear to be the CDC STAR-100 and the Texas Instruments Advanced Scientific Computer (ASC). The basic ASC (i.e., "one pipe") ALU used a pipeline architecture that supported both scalar and vector computations, with peak performance reaching approximately 20 MFLOPS, readily achieved when processing long vectors. Expanded ALU configurations supported "two pipes" or "four pipes" with a corresponding 2X or 4X performance gain, and memory bandwidth was sufficient to support these expanded modes. The STAR was otherwise slower than CDC's own supercomputers like the CDC 7600, but at data-related tasks it could keep up while being much smaller and less expensive. However, the machine also took considerable time decoding the vector instructions and getting ready to run the process, so it required very specific data sets to work on before it actually sped anything up.


The vector technique was first fully exploited in the famous Cray-1. Instead of leaving the data in memory like the STAR and ASC, the Cray design had eight "vector registers" which held sixty-four 64-bit words each. The vector instructions were applied between registers, which is much faster than talking to main memory. In addition, the design had completely separate pipelines for different instructions; for example, addition/subtraction was implemented in different hardware than multiplication. This allowed a batch of vector instructions themselves to be pipelined, a technique they called vector chaining. The Cray-1 normally had a performance of about 80 MFLOPS, but with up to three chains running it could peak at 240 MFLOPS, a respectable figure even as of 2002.


Other examples followed. CDC tried to re-enter the high-end market with its ETA-10 machine, but it sold poorly, and CDC took that as an opportunity to leave the supercomputing field entirely. Various Japanese companies (Fujitsu, Hitachi and NEC) introduced register-based vector machines similar to the Cray-1, typically slightly faster and much smaller. Oregon-based Floating Point Systems (FPS) built add-on array processors for minicomputers, later building their own minisupercomputers. However, Cray continued to be the performance leader, continually beating the competition with a series of machines that led to the Cray-2, Cray X-MP and Cray Y-MP. Since then the supercomputer market has focused much more on massively parallel processing rather than better implementations of vector processors. Still, recognizing the benefits of vector processing, IBM developed the Virtual Vector Architecture (ViVA) for use in supercomputers, coupling several scalar processors to act as a single vector processor.


Today the average computer at home crunches as much data watching a short QuickTime video as did all of the supercomputers of the 1970s. Vector processor elements have since been added to almost all modern CPU designs, although they are typically referred to as SIMD. In these implementations the vector unit runs beside the main scalar CPU and is fed data by programs that know it is there.


Description

In general terms, CPUs are able to manipulate one or two pieces of data at a time. For instance, many CPUs have an instruction that essentially says "add A to B and put the result in C," while others, such as the MOS 6502, require two or three instructions to perform these types of operations.


The data for A, B and C could, in theory at least, be encoded directly into the instruction. However, things are rarely that simple. In general the data is rarely sent in raw form; instead it is "pointed to" by passing in the address of a memory location that holds the data. Decoding this address and getting the data out of memory takes some time. As CPU speeds have increased, this memory latency has historically become a large impediment to performance.


To reduce the time this takes, most modern CPUs use a technique known as instruction pipelining, in which the instructions pass through several sub-units in turn. The first sub-unit reads the address and decodes it, the next "fetches" the values at those addresses, and the next does the math itself. With pipelining the "trick" is to start decoding the next instruction even before the first has left the CPU, in the fashion of an assembly line, so the address decoder is constantly in use. Any particular instruction takes the same amount of time to complete, a time known as the latency, but the CPU can process an entire batch of operations much faster than if it did so one at a time.


Vector processors take this concept one step further. Instead of pipelining just the instructions, they also pipeline the data itself. They are fed instructions that say not just to add A to B, but to add all of the numbers "from here to here" to all of the numbers "from there to there". Instead of constantly having to decode instructions and then fetch the data needed to complete them, the processor reads a single instruction from memory and "knows" that the next address will be one larger than the last. This allows for significant savings in decoding time.


To illustrate what a difference this can make, consider the simple task of adding two groups of 10 numbers together. In a normal programming language you would write a "loop" that picked up each of the pairs of numbers in turn, and then added them. To the CPU, this would look something like this:

  read the next instruction and decode it
  fetch this number
  fetch that number
  add them
  put the result here
  read the next instruction and decode it
  fetch this number
  fetch that number
  add them
  put the result there

and so on, repeating the base command 10 times over.
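
In C, that scalar version might look like the sketch below (the arrays and their contents are hypothetical; each trip through the loop repeats the full fetch-add-store sequence):

  #include <stdio.h>

  int main(void) {
      float a[10], b[10], c[10];
      for (int i = 0; i < 10; i++) {   /* fill with sample values */
          a[i] = (float)i;
          b[i] = (float)(10 * i);
      }

      for (int i = 0; i < 10; i++)     /* one add per iteration */
          c[i] = a[i] + b[i];

      for (int i = 0; i < 10; i++)
          printf("%g\n", c[i]);
      return 0;
  }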


But to a vector processor, this task looks considerably different:

  read instruction and decode it
  fetch these 10 numbers
  fetch those 10 numbers
  add them
  put the results here

There are several savings inherent in this approach. For one, only two address translations are needed. Depending on the architecture, this can represent a significant savings in and of itself. Another saving comes from fetching and decoding the instruction itself, which has to be done only once instead of ten times. The code itself is also smaller, which can lead to more efficient memory use.
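
As a rough modern analogue, the same ten additions can be written with SSE intrinsics so that each vector instruction covers four elements. This is only a sketch of the idea: two vector adds handle eight of the elements, and the last two are done with scalar code.

  #include <xmmintrin.h>

  void add10(const float *a, const float *b, float *c) {
      int i;
      for (i = 0; i + 4 <= 10; i += 4) {   /* elements 0..7, four at a time */
          __m128 va = _mm_loadu_ps(a + i);
          __m128 vb = _mm_loadu_ps(b + i);
          _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
      }
      for (; i < 10; i++)                  /* scalar remainder: elements 8..9 */
          c[i] = a[i] + b[i];
  }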


But more than that, the vector processor typically has some form of superscalar implementation, meaning there is not one part of the CPU adding up those 10 numbers, but perhaps two or four of them. Since the output of a vector command does not rely on the input from any other, those two (for instance) units can each add five of the numbers, thereby completing the whole operation in half the time.
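
The software analogue of having two adders is loop unrolling. In the hypothetical sketch below, each iteration contains two additions with no dependency between them, so a machine with two adders (or two pipelines) can execute them at the same time:

  /* Unrolled by two; the two adds in each iteration are independent,
     so two ALUs can work on them simultaneously. */
  void add10_unrolled(const float *a, const float *b, float *c) {
      for (int i = 0; i < 10; i += 2) {
          c[i]     = a[i]     + b[i];
          c[i + 1] = a[i + 1] + b[i + 1];
      }
  }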


As mentioned earlier, the Cray implementations took this a step further, allowing several different types of operations to be carried out at the same time. Consider code that adds two numbers and then multiplies by a third; in the Cray these would all be fetched at once, and both added and multiplied in a single operation. Using the pseudocode above, the Cray essentially did:

  read instruction and decode it
  fetch these 10 numbers
  fetch those 10 numbers
  fetch another 10 numbers
  add and multiply them
  put the results here

The math operations thus completed much faster, the limiting factor being the memory accesses.
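
In C the chained operation corresponds to a single pass that both adds and multiplies each element (the array names here are hypothetical). On the Cray-1 the output of the add pipeline fed the multiply pipeline directly, so both units ran at once:

  /* Each element is added and the sum immediately multiplied,
     mirroring how chaining streams results from one pipeline
     into the next. */
  void add_then_multiply(const float *a, const float *b,
                         const float *m, float *out, int n) {
      for (int i = 0; i < n; i++)
          out[i] = (a[i] + b[i]) * m[i];
  }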


Not all problems can be attacked with this sort of solution. Adding these sorts of instructions adds complexity to the core CPU, and that complexity typically makes other instructions slower; that is, it costs performance whenever the machine is not adding up ten numbers in a row. The more complex instructions also add to the complexity of the decoders, which might slow down the decoding of the more common instructions, such as normal adding.


In fact, vector processors work best only when there are large amounts of data to be worked on. This is why these sorts of CPUs were found primarily in supercomputers, as the supercomputers themselves were found in places like weather prediction and physics labs, where huge amounts of data exactly like this are "crunched".

