Multithreading (computer hardware)

Multithreading computers have hardware support to efficiently execute multiple threads. For the form of code consisting entirely of subroutine calls, see Threaded code.


Overview

The multithreading paradigm has become more popular as efforts to further exploit instruction-level parallelism (ILP) have stalled since the late 1990s. This allowed the concept of throughput computing to re-emerge to prominence from the more specialized field of transaction processing:

  • Even though it is very difficult to further speed up a single thread or single program, most computer systems are actually multi-tasking among multiple threads or programs.
  • Techniques that speed up the overall system throughput of all tasks therefore result in meaningful performance gains.

The two major techniques for throughput computing are multiprocessing and multithreading.


Some criticisms of multithreading include:

  • Multiple threads can interfere with each other when sharing hardware resources such as caches or translation lookaside buffers (TLBs); a toy illustration follows this list.
  • Execution times of a single thread are not improved and can even be degraded.
  • Hardware support for multithreading is more visible to software, and thus requires more changes to both application programs and operating systems than multiprocessing does.
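
The cache-interference point can be made concrete with a toy model. The C sketch below simulates a tiny direct-mapped cache shared by two hardware threads; the cache geometry, the address streams, and the thread behaviour are all invented for the illustration and do not describe any real processor.

    #include <stdio.h>
    #include <string.h>

    /* Toy direct-mapped cache shared by two hardware threads.  The point
     * is only to show that interleaving two access streams can raise the
     * miss rate of each stream; geometry and addresses are invented. */

    #define NUM_SETS   8        /* tiny cache: 8 sets, direct-mapped */
    #define LINE_BYTES 64

    typedef struct { int valid; unsigned long tag; } line_t;

    static line_t cache[NUM_SETS];
    static int misses;

    static void access_addr(unsigned long addr) {
        unsigned long block = addr / LINE_BYTES;
        unsigned long set   = block % NUM_SETS;
        unsigned long tag   = block / NUM_SETS;
        if (!cache[set].valid || cache[set].tag != tag) {
            misses++;                 /* line not present: count a miss */
            cache[set].valid = 1;
            cache[set].tag   = tag;   /* and install the new line */
        }
    }

    /* Each "thread" repeatedly walks its own 8-line buffer. */
    static void walk(unsigned long base) {
        for (unsigned long a = base; a < base + 8 * LINE_BYTES; a += LINE_BYTES)
            access_addr(a);
    }

    int main(void) {
        /* Thread A alone: its 8 lines fit, so only the 8 cold misses. */
        memset(cache, 0, sizeof cache);
        misses = 0;
        for (int i = 0; i < 100; i++)
            walk(0x0000);
        printf("thread A alone     : %d misses\n", misses);

        /* A and B sharing the cache: B's lines map to the same sets and
         * evict A's lines on every pass, and vice versa. */
        memset(cache, 0, sizeof cache);
        misses = 0;
        for (int i = 0; i < 100; i++) {
            walk(0x0000);   /* thread A's working set */
            walk(0x8000);   /* thread B's working set */
        }
        printf("threads A and B    : %d misses\n", misses);
        return 0;
    }

Run alone, thread A takes only its eight cold misses; once it shares the eight sets with thread B, almost every access misses, which is the thrashing behaviour the criticism refers to.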

Hardware techniques used to support multithreading often parallel the software techniques used for computer multitasking.


Block Multi-threading

Concept

The simplest type of multi-threading is where one thread runs until it is blocked by an event that normally would create a long latency stall. Such a stall might be a cache-miss that has to access off-chip memory, which might take hundreds of CPU cycles for the data to return. Instead of waiting for the stall to resolve, a threaded processor would switch execution to another thread that was ready to run. Only when the data for the previous thread had arrived, would the previous thread be placed back on the list of ready-to-run threads.


For example:

  1. Cycle i  : instruction j from thread A is issued
  2. Cycle i+1: instruction j+1 from thread A is issued
  3. Cycle i+2: instruction j+2 from thread A is issued, load instruction which misses in all caches
  4. Cycle i+3: thread scheduler invoked, switches to thread B
  5. Cycle i+4: instruction k from thread B is issued
  6. Cycle i+5: instruction k+1 from thread B is issued

Conceptually, it is similar to cooperative multi-tasking used in real-time operating systems, in which tasks voluntarily give up execution time when they need to wait upon some type of event.


Terminology

This type of multithreading is known as block, cooperative, or coarse-grained multithreading.


Hardware Cost

The goal of multithreading hardware support is to allow quick switching between a blocked thread and another thread ready to run. To achieve this goal, the hardware cost is to replicate the program-visible registers as well as some processor control registers (such as the program counter). Switching from one thread to another means the hardware switches from using one register set to another.


Such additional hardware has these benefits:

  • The thread switch can be done in one CPU cycle.
  • Each thread appears to execute alone, not sharing any hardware resources with other threads. This minimizes the amount of software change needed within the application and the operating system to support multithreading.

In order to switch efficiently between active threads, each active thread needs to have its own register set. For example, to quickly switch between two threads, the register hardware needs to be instantiated twice.
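
A minimal sketch of the idea, assuming two hardware threads and a 32-entry register file (both numbers are illustrative, not taken from any real design): because every thread owns a full copy of the program-visible state, a thread switch is just a change of which copy the pipeline uses, with nothing copied to or from memory.

    #include <stdint.h>
    #include <stdio.h>

    #define HW_THREADS 2
    #define NUM_REGS   32

    /* Replicate the program-visible state once per hardware thread. */
    typedef struct {
        uint32_t regs[NUM_REGS];  /* program-visible registers */
        uint32_t pc;              /* program counter           */
    } hw_context_t;

    static hw_context_t ctx[HW_THREADS];  /* one full copy per thread     */
    static int active = 0;                /* which copy the pipeline uses */

    /* The "one-cycle" switch: just select another register set. */
    static void switch_thread(int next) { active = next; }

    int main(void) {
        ctx[0].pc = 0x1000;   /* assumed start addresses for the demo */
        ctx[1].pc = 0x2000;

        printf("running thread %d at pc=0x%x\n", active, (unsigned)ctx[active].pc);
        switch_thread(1);     /* e.g. thread 0 just blocked on a cache miss */
        printf("running thread %d at pc=0x%x\n", active, (unsigned)ctx[active].pc);
        return 0;
    }

Contrast this with a software context switch, which must spill and reload every register through memory before the other thread can run.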


Examples

  • Many families of microcontrollers and embedded processors have multiple register banks to allow quick context switching for interrupts. Such schemes can be considered a type of block multithreading between the user-program thread and the interrupt threads.


Interleaved Multi-threading

See article: barrel processor.


Concept

A higher-performance type of multithreading is one in which the processor switches threads every CPU cycle. For example:

  1. Cycle i  : an instruction from thread A is issued
  2. Cycle i+1: an instruction from thread B is issued
  3. Cycle i+2: an instruction from thread C is issued

The purpose of this type of multithreading is to remove all data-dependency stalls from the execution pipeline. Since one thread is relatively independent of other threads, there is less chance of one instruction in one pipe stage needing an output from an older instruction in the pipeline.
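
The following C sketch makes the timing argument explicit, under the assumption of four hardware threads and a four-stage issue-to-writeback delay (both numbers invented for the example): with a fixed rotation, a thread's previous instruction has written back before that thread gets its next issue slot, so no interlock is needed.

    #include <stdio.h>

    #define NUM_THREADS    4
    #define PIPELINE_DEPTH 4   /* issue at cycle c, write back at c+3 (assumed) */

    int main(void) {
        int next_instr[NUM_THREADS] = { 0, 0, 0, 0 };

        for (int cycle = 0; cycle < 12; cycle++) {
            int thread = cycle % NUM_THREADS;   /* fixed round-robin rotation */
            int instr  = next_instr[thread]++;
            printf("cycle %2d: issue instr %d of thread %c "
                   "(writes back in cycle %d; thread %c issues again in cycle %d)\n",
                   cycle, instr, 'A' + thread,
                   cycle + PIPELINE_DEPTH - 1, 'A' + thread, cycle + NUM_THREADS);
        }
        return 0;
    }

If there are fewer resident threads than pipeline stages, this guarantee no longer holds and dependency checks are needed again.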


Conceptually, it is similar to pre-emptive multi-tasking used in operating systems. One can make the analogy that the time-slice given to each active thread is one CPU cycle.


Terminology

This type of multithreading was first called barrel processing, in which the staves of a barrel represent the pipeline stages and their executing threads. Interleaved, pre-emptive, fine-grained, or time-sliced multithreading are more modern terms.


Hardware Costs

In addition to the hardware costs discussed for the block type of multithreading, interleaved multithreading has the additional cost of each pipeline stage tracking the thread ID of the instruction it is processing. Also, since more threads are executed concurrently in the pipeline, shared resources such as caches and TLBs need to be larger to avoid thrashing between the different threads.
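
As a rough illustration of the per-stage cost, the struct below shows a pipeline register extended with a thread-ID field; the other fields and their widths are invented and merely stand in for whatever a real pipeline latch would carry.

    #include <stdint.h>

    /* Illustrative pipeline latch: alongside the usual decoded-instruction
     * fields, every stage now records which hardware thread owns the
     * instruction, so results, exceptions, and register writes can be
     * steered to that thread's own register set. */
    typedef struct {
        uint8_t  valid;      /* stage holds a live instruction              */
        uint8_t  thread_id;  /* NEW for multithreading: owning thread       */
        uint32_t pc;         /* instruction address                         */
        uint32_t op;         /* decoded operation                           */
        uint8_t  dest_reg;   /* destination register in that thread's file  */
    } pipeline_latch_t;

    /* One latch per stage of an assumed five-stage pipeline. */
    static pipeline_latch_t pipeline[5];

    int main(void) { (void)pipeline; return 0; }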


Examples

  • The Heterogeneous Element Processor (HEP), introduced by Denelcor in 1982.
  • Intel's Super-threading technique.
  • Sun Microsystems' UltraSPARC T1.
  • Multithreaded MIPS-compatible cores developed by Lexra.

Simultaneous Multi-threading

See main article: Simultaneous multithreading.


Concept

The most advanced type of multi-threading applies to superscalar processors. A normal superscalar processor issues multiple instructions from a single thread every CPU cycle. In simultaneous multi-threading (SMT), the superscalar processor can issue instructions from multiple threads every CPU cycle. Recognizing that any single thread has a limited amount of instruction-level parallelism, this type of multithreading tries to exploit parallelism available across multiple threads to decrease the waste associated with unused issue slots.


For example:

  1. Cycle i  : instructions j and j+1 from thread A; instruction k from thread B all simultaneously issued
  2. Cycle i+1: instruction j+2 from thread A; instruction k+1 from thread B; instruction m from thread C all simultaneously issued
  3. Cycle i+2: instruction j+3 from thread A; instructions m+1 and m+2 from thread C all simultaneously issued
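
The slot-filling idea in the example above can be sketched as follows. The C program assumes a four-wide issue stage, three threads, and made-up counts of "ready" (independent) instructions per thread per cycle; the point is only that slots one thread cannot fill are given to another instead of being wasted.

    #include <stdio.h>

    #define ISSUE_WIDTH 4
    #define NUM_THREADS 3
    #define NUM_CYCLES  4

    int main(void) {
        /* ready[t][c]: independent instructions thread t has ready in
         * cycle c (invented numbers standing in for each thread's ILP). */
        const int ready[NUM_THREADS][NUM_CYCLES] = {
            { 2, 1, 3, 1 },   /* thread A */
            { 1, 2, 0, 2 },   /* thread B */
            { 3, 1, 2, 1 },   /* thread C */
        };

        for (int cycle = 0; cycle < NUM_CYCLES; cycle++) {
            int slots = ISSUE_WIDTH;
            printf("cycle %d:", cycle);
            for (int t = 0; t < NUM_THREADS && slots > 0; t++) {
                int take = ready[t][cycle] < slots ? ready[t][cycle] : slots;
                if (take > 0)
                    printf(" %d instr from thread %c;", take, 'A' + t);
                slots -= take;
            }
            printf(" %d slot(s) left empty\n", slots);
            /* A single-threaded superscalar of the same width would have
             * issued only ready[0][cycle] instructions this cycle. */
        }
        return 0;
    }

Real SMT cores use more elaborate fetch and issue policies (for example, favouring the thread with the fewest instructions in flight), but the waste-reduction effect is the same.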

Terminology

To distinguish the other flavors of multithreading from SMT, the term temporal multithreading is used to denote designs in which instructions from only one thread can be issued at a time.


Hardware Costs

In addition to the hardware costs discussed for interleaved multithreading, SMT has the additional cost of each pipeline stage tracking the thread ID of each instruction being processed. Again, shared resources such as caches and TLBs have to be sized for the large number of active threads.


Examples

  • The DEC Alpha 21464 (EV8), an SMT design that was cancelled before release.
  • Intel's Hyper-Threading Technology, introduced on the Pentium 4.
  • The IBM POWER5.
  • The Power Processing Element of the Cell microprocessor, developed by Sony, Toshiba, and IBM.
  • Sun Microsystems' UltraSPARC T2.

Implementation Specifics

A major area of research is the thread scheduler, which must quickly choose among the list of ready-to-run threads to execute next, as well as maintain the ready-to-run and stalled thread lists. An important sub-topic is the different thread-priority schemes that can be used by the scheduler. The thread scheduler might be implemented totally in software, totally in hardware, or as a hardware/software combination.
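
As an illustration of one possible (and deliberately simple) policy, the C sketch below keeps a ready/stalled flag and a static priority per hardware thread and picks the highest-priority ready thread each time it is called; the thread count, the priority values, and the policy itself are assumptions for the example, not a description of any shipped scheduler.

    #include <stdio.h>

    #define NUM_THREADS 4

    typedef struct {
        int ready;      /* 1 = ready to run, 0 = stalled                 */
        int priority;   /* higher value = preferred (assumed static)     */
    } hw_thread_t;

    /* Return the highest-priority ready thread, or -1 if all are stalled.
     * (Round-robin among equal priorities is omitted for brevity.) */
    static int pick_next(const hw_thread_t *t, int n) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (t[i].ready && (best < 0 || t[i].priority > t[best].priority))
                best = i;
        return best;
    }

    int main(void) {
        hw_thread_t threads[NUM_THREADS] = {
            { 1, 1 },  /* thread 0: ready, low priority      */
            { 0, 3 },  /* thread 1: stalled on a cache miss  */
            { 1, 2 },  /* thread 2: ready, medium priority   */
            { 1, 2 },  /* thread 3: ready, medium priority   */
        };

        printf("scheduler picks thread %d\n", pick_next(threads, NUM_THREADS));

        threads[1].ready = 1;   /* thread 1's data returned from memory */
        printf("scheduler picks thread %d\n", pick_next(threads, NUM_THREADS));
        return 0;
    }

In hardware this selection has to complete within a cycle, which pushes real designs toward very simple priority and rotation logic.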


Another area of research is which types of events should cause a thread switch: cache misses, inter-thread communication, DMA completion, etc.


If the multithreading scheme replicates all of the software-visible state, including privileged control registers, TLBs, etc., then it enables virtual machines to be created for each thread. This allows each thread to run its own operating system on the same processor. On the other hand, if only user-mode state is saved, less hardware is required, which allows more threads to be active at one time for the same die area and cost.


 
 
