Streaming SIMD Extensions

SSE (Streaming SIMD Extensions, originally called ISSE, Internet Streaming SIMD Extensions) is a SIMD (Single Instruction, Multiple Data) instruction set designed by Intel and introduced in 1999 in its Pentium III series processors as a reply to AMD's 3DNow! (which had debuted a year earlier). The fully expanded abbreviation stands for "Streaming Single Instruction, Multiple Data Extensions".


SSE contains 70 new instructions.


It was originally known as KNI, for Katmai New Instructions (Katmai was the code name for the first Pentium III core revision). During the Katmai project, Intel wanted to distinguish the new processor from its earlier product line, particularly its flagship Pentium II. AMD eventually added support for SSE instructions, starting with its Athlon XP processor.


Intel was generally disappointed with its first IA-32 SIMD effort, MMX. MMX had two main problems. First, it reused the existing x87 floating-point registers, which made the CPU unable to work on floating-point and SIMD data at the same time. Second, it worked only on integers.


SSE originally added eight new 128-bit registers, known as XMM0 through XMM7. The x86-64 extensions from both AMD and Intel add a further eight registers, XMM8 through XMM15, which are available only in 64-bit mode. There is also a new 32-bit control/status register, MXCSR.
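Rounding mode is one MXCSR setting that programs commonly change. The following is a minimal sketch that assumes GCC or Clang on x86-64. It uses `_mm_getcsr` (STMXCSR), `_mm_setcsr` (LDMXCSR) and the `_MM_ROUND_*` constants from `<xmmintrin.h>`. The example sets the rounding-control field in bits 13-14 and then converts a float with CVTSS2SI, which honours that field. The truncating CVTTSS2SI ignores it. The helper names are illustrative only.

```c
#include <assert.h>
#include <xmmintrin.h>   /* SSE intrinsics, _mm_getcsr/_mm_setcsr and _MM_ROUND_* */

/* MXCSR bits 13-14 hold the rounding-control (RC) field:
   0 = round to nearest (even), 1 = round down, 2 = round up, 3 = toward zero.
   The _MM_ROUND_* constants are this field already shifted into position. */
static unsigned mxcsr_rounding_field(unsigned csr) {
    return (csr & _MM_ROUND_MASK) >> 13;
}

/* Convert x with CVTSS2SI while `mode` is the active rounding mode.
   STMXCSR saves the thread's MXCSR, LDMXCSR installs the new RC field,
   and the saved value is written back afterwards. */
static int convert_with_rounding(float x, unsigned mode) {
    unsigned saved = _mm_getcsr();                      /* STMXCSR */
    _mm_setcsr((saved & ~_MM_ROUND_MASK) | mode);       /* LDMXCSR */
    /* volatile input/output keep the compiler from folding the conversion
       at compile time or moving it outside the window where `mode` is set */
    volatile float in = x;
    volatile int out = _mm_cvtss_si32(_mm_set_ss(in));  /* CVTSS2SI uses MXCSR.RC */
    _mm_setcsr(saved);                                  /* restore the old mode */
    return out;
}

/* CVTTSS2SI always truncates toward zero, whatever MXCSR says. */
static int convert_truncating(float x) {
    return _mm_cvttss_si32(_mm_set_ss(x));
}
```

With these helpers, 2.5f converts to 2 under round-to-nearest (ties go to even) and to 3 under round-up. Restoring the saved value matters because MXCSR is per-thread state that other code relies on.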

Each register packs together four 32-bit single-precision floating point numbers. Integer SIMD operations may still be performed with the eight 64-bit MMX registers.
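From C, the contents of an XMM register are usually handled through the `__m128` type in `<xmmintrin.h>`. The following is a small sketch that assumes GCC or Clang on x86-64 and a C11 compiler for `_Static_assert`/`_Alignof`. It checks the four-float, 16-byte layout. It also shows lane order: `_mm_set_ps` lists lanes from highest to lowest, matching the "w | z | y | x" notation used later in this article. `store_lanes` is a hypothetical helper.

```c
#include <assert.h>
#include <xmmintrin.h>

/* __m128 models one XMM register: 16 bytes, 16-byte aligned,
   holding four single-precision floats. */
_Static_assert(sizeof(__m128) == 4 * sizeof(float), "four floats per XMM register");
_Static_assert(_Alignof(__m128) == 16, "__m128 is 16-byte aligned");

/* Write the four lanes of v to out[0..3]; out[0] is lane 0 (bits 0-31). */
static void store_lanes(__m128 v, float out[4]) {
    _mm_storeu_ps(out, v);   /* MOVUPS: unaligned 128-bit store */
}
```

For example, `store_lanes(_mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f), f)` leaves `f[0] == 1.0f`, because the last argument of `_mm_set_ps` goes to lane 0. `_mm_setr_ps` takes the lanes in the reverse (memory) order.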


Because these 128-bit registers are additional program state that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, the extended pair of instructions that save and restore the complete x87 and SSE register state at once. This support was quickly added to all major IA-32 operating systems.
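A program can at least confirm that the processor advertises SSE and FXSAVE/FXRSTOR. The following is a minimal sketch that assumes GCC or Clang on x86, which provide `<cpuid.h>` and `__get_cpuid`. The bit positions (CPUID leaf 1, EDX bit 24 = FXSR, bit 25 = SSE, bit 26 = SSE2) come from Intel's CPUID documentation. The `CPUID_EDX_*` names and `cpu_reports_edx_feature` are hypothetical.

CPUID describes only the hardware. Whether the OS has enabled the XMM state (the CR4.OSFXSR bit) is not visible from user mode. If it has not, SSE instructions raise an invalid-opcode exception.

```c
#include <assert.h>
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction (x86 only) */

/* EDX feature bits reported by CPUID leaf 1 (hypothetical names). */
#define CPUID_EDX_FXSR (1u << 24)   /* FXSAVE/FXRSTOR */
#define CPUID_EDX_SSE  (1u << 25)
#define CPUID_EDX_SSE2 (1u << 26)

/* Returns nonzero if CPUID leaf 1 sets every bit of `mask` in EDX. */
static int cpu_reports_edx_feature(unsigned mask) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                   /* CPUID leaf 1 unavailable */
    return (edx & mask) == mask;
}
```

On any x86-64 processor both checks succeed, since SSE and SSE2 are part of the 64-bit baseline.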


Because SSE adds floating-point support, it sees much more use than MMX. The addition of SSE2's integer support makes SSE even more flexible. Although MMX is largely redundant, MMX operations can execute in parallel with SSE operations, which can further increase performance in some situations.


The first CPU to support SSE, the Pentium III, shared execution resources between SSE and the FPU. A compiled application can interleave FPU and SSE instructions side by side, but the Pentium III will not issue an FPU instruction and an SSE instruction in the same clock cycle. This limitation reduces the effectiveness of pipelining. Even so, the separate XMM registers allow SIMD and scalar floating-point operations to be mixed without the performance hit of explicit MMX/floating-point mode switching.


SSE Instructions

  • SSE introduced both scalar and packed floating point instructions.

Floating point instructions

  • Memory-to-Register / Register-to-Memory / Register-to-Register data movement
      * Scalar - MOVSS
      * Packed - MOVAPS, MOVUPS, MOVLPS, MOVHPS, MOVLHPS, MOVHLPS
  • Arithmetic
      * Scalar - ADDSS, SUBSS, MULSS, DIVSS, RCPSS, SQRTSS, MAXSS, MINSS, RSQRTSS
      * Packed - ADDPS, SUBPS, MULPS, DIVPS, RCPPS, SQRTPS, MAXPS, MINPS, RSQRTPS
  • Compare
      * Scalar - CMPSS, COMISS, UCOMISS
      * Packed - CMPPS
  • Data shuffle and unpacking
      * Packed - SHUFPS, UNPCKHPS, UNPCKLPS
  • Data-type conversion
      * Scalar - CVTSI2SS, CVTSS2SI, CVTTSS2SI
      * Packed - CVTPI2PS, CVTPS2PI, CVTTPS2PI
  • Bitwise logical operations
      * Packed - ANDPS, ORPS, XORPS, ANDNPS
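The scalar/packed split and the compare and logical groups above can be illustrated with a short sketch. It assumes GCC or Clang on x86-64 and the standard `<xmmintrin.h>` intrinsics, and the function names are illustrative. ADDSS touches only lane 0, whereas ADDPS works on all four lanes. CMPPS produces an all-ones or all-zeros mask per lane, which ANDPS, ANDNPS and ORPS can use to select values without branches.

```c
#include <assert.h>
#include <xmmintrin.h>

/* ADDSS adds lane 0 only and passes lanes 1-3 of the first operand through;
   ADDPS adds all four lanes. */
static void add_scalar_and_packed(__m128 a, __m128 b, float ss[4], float ps[4]) {
    _mm_storeu_ps(ss, _mm_add_ss(a, b));   /* ADDSS */
    _mm_storeu_ps(ps, _mm_add_ps(a, b));   /* ADDPS */
}

/* Per-lane select without branches: CMPPS (less-than predicate) yields an
   all-ones or all-zeros mask in each lane, and ANDPS/ANDNPS/ORPS blend the
   inputs with it. This particular select matches MINPS, but the same mask
   idiom works for any condition CMPPS can express. */
static __m128 select_where_less(__m128 a, __m128 b) {
    __m128 mask = _mm_cmplt_ps(a, b);          /* CMPPS, predicate LT */
    return _mm_or_ps(_mm_and_ps(mask, a),      /* ANDPS: a where a < b */
                     _mm_andnot_ps(mask, b));  /* ANDNPS: b elsewhere */
}
```

Adding (1, 2, 3, 4) and (10, 20, 30, 40) this way gives (11, 2, 3, 4) with ADDSS and (11, 22, 33, 44) with ADDPS.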

Integer instructions

  • Arithmetic
      * PMULHUW, PSADBW, PAVGB, PAVGW, PMAXUB, PMINUB, PMAXSW, PMINSW
  • Data movement
      * PEXTRW, PINSRW
  • Other
      * PMOVMSKB, PSHUFW
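These integer additions operate on the 64-bit MMX registers. The following sketch assumes GCC or Clang on x86-64, where `<xmmintrin.h>` exposes them through the `__m64` type. It uses PSADBW (`_mm_sad_pu8`), a sum of absolute differences over eight bytes, and PAVGB (`_mm_avg_pu8`), a per-byte rounded average. A plain C reference is included for comparison. The function names are illustrative, and `_mm_empty` (EMMS) is called because MMX code must release the shared register state before x87 code runs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <xmmintrin.h>   /* SSE intrinsics, including the __m64 (MMX register) forms */

/* PSADBW: sum of absolute differences of eight unsigned bytes (e.g. for
   motion estimation). The 16-bit total lands in the low word of the result. */
static unsigned sad8(const uint8_t a[8], const uint8_t b[8]) {
    __m64 va, vb;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    unsigned total = (unsigned)_mm_cvtsi64_si32(_mm_sad_pu8(va, vb)) & 0xFFFFu;
    _mm_empty();   /* EMMS: leave MMX state so later x87 code works */
    return total;
}

/* PAVGB: per-byte rounded average, (a + b + 1) >> 1. */
static void avg8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8]) {
    __m64 va, vb;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    __m64 avg = _mm_avg_pu8(va, vb);
    memcpy(out, &avg, 8);
    _mm_empty();
}

/* Plain C reference for PSADBW. */
static unsigned sad8_reference(const uint8_t a[8], const uint8_t b[8]) {
    unsigned total = 0;
    for (int i = 0; i < 8; i++)
        total += (unsigned)abs((int)a[i] - (int)b[i]);
    return total;
}
```

For the byte rows 0..7 and 7..0, both SAD versions give 32 and every rounded average is 4.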

Other instructions

  • MXCSR management
      * LDMXCSR, STMXCSR
  • Cache and Memory management
      * MOVNTQ, MOVNTPS, MASKMOVQ, PREFETCHT0, PREFETCHT1, PREFETCHT2, PREFETCHNTA, SFENCE
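MOVNTPS and SFENCE are typically used together. The following is a minimal sketch that assumes GCC or Clang on x86-64 and uses `_mm_stream_ps` and `_mm_sfence` from `<xmmintrin.h>`. It fills a large buffer that will not be read again soon, so the non-temporal stores can avoid polluting the caches. `stream_fill` is a hypothetical helper, and the alignment and length requirements are this sketch's own preconditions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Fill dst[0..n) with value using MOVNTPS non-temporal stores, then SFENCE.
   Non-temporal stores are weakly ordered; SFENCE orders them ahead of any
   later store. dst must be 16-byte aligned and n a multiple of 4. */
static void stream_fill(float *dst, size_t n, float value) {
    assert((uintptr_t)dst % 16 == 0 && n % 4 == 0);
    const __m128 v = _mm_set1_ps(value);   /* broadcast value to all four lanes */
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);          /* MOVNTPS */
    _mm_sfence();                           /* SFENCE */
}
```

For small buffers that are read back immediately, ordinary stores are usually the better choice, because the non-temporal hint bypasses data the program is about to use.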

Example

The following simple example demonstrates the advantage of using SSE. Consider an operation like vector addition, which is used very often in computer graphics applications. Adding two single-precision, four-component vectors with x87 requires four floating-point addition instructions:

    vec_res.x = v1.x + v2.x;
    vec_res.y = v1.y + v2.y;
    vec_res.z = v1.z + v2.z;
    vec_res.w = v1.w + v2.w;

This would correspond to four x87 FADD instructions in the object code. By contrast, as the following assembly shows, a single 128-bit packed-add instruction (ADDPS) can replace the four scalar addition instructions.

    movaps xmm0, [v1]        ; xmm0 = v1.w | v1.z | v1.y | v1.x
    addps  xmm0, [v2]        ; xmm0 = v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
    movaps [vec_res], xmm0   ; store the four sums to vec_res
                             ; v1, v2 and vec_res must be 16-byte aligned for MOVAPS/ADDPS
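The same packed addition can be written in C with compiler intrinsics. The following is a sketch that assumes GCC or Clang on x86-64, `<xmmintrin.h>` and C11 `_Alignas`. `_mm_load_ps`, `_mm_add_ps` and `_mm_store_ps` map to the MOVAPS load, ADDPS and MOVAPS store shown above. The `vec4` type and `vec4_add` are illustrative names that mirror the article's `v1`, `v2` and `vec_res`.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Four-component single-precision vector. _Alignas on the first member makes
   the struct 16-byte aligned, as MOVAPS requires; y, z and w follow x
   contiguously, so the struct is exactly one XMM register wide. */
typedef struct {
    _Alignas(16) float x;
    float y, z, w;
} vec4;
_Static_assert(sizeof(vec4) == 16, "vec4 fills one XMM register");

/* vec_res = v1 + v2 with one packed add: MOVAPS load, ADDPS, MOVAPS store. */
static void vec4_add(vec4 *vec_res, const vec4 *v1, const vec4 *v2) {
    assert((uintptr_t)v1 % 16 == 0 && (uintptr_t)v2 % 16 == 0 &&
           (uintptr_t)vec_res % 16 == 0);
    __m128 a = _mm_load_ps((const float *)v1);   /* lanes: w | z | y | x */
    __m128 b = _mm_load_ps((const float *)v2);
    _mm_store_ps((float *)vec_res, _mm_add_ps(a, b));
}
```

Adding (1, 2, 3, 4) and (10, 20, 30, 40) yields (11, 22, 33, 44) in a single ADDPS. With unaligned data, `_mm_loadu_ps`/`_mm_storeu_ps` (MOVUPS) would be used instead.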

Later versions

  • SSE2, introduced with the Pentium 4, is a major enhancement to SSE (which some programmers renamed "SSE1"). SSE2 adds new math instructions for double-precision (64-bit) floating point and also extends MMX instructions to operate on 128-bit XMM registers. Up to SSSE3, the integer instructions added by the SSE extensions generally also had forms that operate on the 64-bit MMX registers; SSE4 provides XMM forms only. SSE2 enables the programmer to perform SIMD math of virtually any type (from 8-bit integer to 64-bit float) entirely within the XMM vector-register file, without touching the legacy MMX/FPU registers. Many programmers consider SSE2 to be "everything SSE should have been", as SSE2 offers an orthogonal set of instructions for dealing with common data types.
  • SSE3, also called Prescott New Instructions (PNI), is an incremental upgrade to SSE2. It adds a handful of DSP-oriented mathematics instructions, such as the horizontal add HADDPS, and some process (thread) management instructions (MONITOR and MWAIT).
  • SSSE3 (Supplemental SSE3) is an incremental upgrade to SSE3. It adds 16 new opcodes, including instructions for permuting the bytes in a register (PSHUFB), multiplying 16-bit fixed-point numbers with correct rounding (PMULHRSW), and within-word accumulation. SSSE3 is often mistaken for SSE4 because that name was used for it during development of the Core microarchitecture.
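The two SSE2 additions described in the first item can be sketched briefly. The following assumes GCC or Clang on x86-64, where SSE2 is always available, and uses `<emmintrin.h>`. One function adds two doubles with ADDPD. The other uses PADDUSB, originally an 8-byte MMX instruction, on all 16 bytes of an XMM register. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics (also pulls in the SSE ones) */

/* ADDPD: under SSE2 an XMM register holds two 64-bit doubles. */
static void add2_double(const double a[2], const double b[2], double out[2]) {
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}

/* PADDUSB on an XMM register: sixteen unsigned bytes at once, twice the width
   of the MMX form. Sums above 255 saturate to 255 instead of wrapping. */
static void add16_u8_saturating(const uint8_t a[16], const uint8_t b[16], uint8_t out[16]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu8(va, vb));
}
```

For example, 200 + 50 stays 250, while 200 + 60 saturates to 255.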



The Wikipedia article included on this page is licensed under the GFDL.