Speech Application Programming Interface

The Speech Application Programming Interface or SAPI is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications. To date, a number of versions of the API have been released, which have shipped either as part of a Speech SDK or as part of the Windows OS itself. Applications that use SAPI include Microsoft Office, Microsoft Agent and Microsoft Speech Server.


In general, all versions of the API have been designed so that a software developer can write an application to perform speech recognition and synthesis by using a standard set of interfaces, accessible from a variety of programming languages. In addition, a third-party company can produce its own speech recognition and text-to-speech engines, or adapt existing engines, to work with SAPI. In principle, as long as these engines conform to the defined interfaces, they can be used instead of the Microsoft-supplied engines.


In general, the Speech API is a freely redistributable component which can be shipped with any Windows application that wishes to use speech technology. Many versions (although not all) of the speech recognition and synthesis engines are also freely redistributable.


There have been two main 'families' of the Microsoft Speech API. SAPI versions 1 through 4 are all similar to each other, with extra features in each newer version. SAPI 5, however, was a completely new interface, released in 2000. Since then, several sub-versions of this API have been released.

Versions

SAPI 1-4 API family

SAPI 1

The first version of SAPI was released in 1995, and was supported on Windows 95 and Windows NT 3.51. This version included low-level Direct Speech Recognition and Direct Text To Speech APIs, which applications could use to control engines directly, as well as simplified 'higher-level' Voice Command and Voice Talk APIs.


SAPI 2

SAPI 2.0 was released in 1996.


SAPI 3

SAPI 3.0 was released in 1997. It added limited support for dictation speech recognition (discrete speech, not continuous), and additional sample apps and audio sources.


SAPI 4

SAPI 4.0 was released in 1998. This version of SAPI included the core COM API, C++ wrapper classes to make programming from C++ easier, and ActiveX controls to allow drag-and-drop Visual Basic development. This was shipped as part of an SDK that included recognition and synthesis engines. It also shipped (with synthesis engines only) in Windows 2000.


The main components of the SAPI 4 API (which were all available in C++, COM, and ActiveX flavors) were:

  • Voice Command - high-level objects for command & control speech recognition
  • Voice Dictation - high-level objects for continuous dictation speech recognition
  • Voice Talk - high-level objects for speech synthesis
  • Voice Telephony - objects for writing telephone speech applications
  • Direct Speech Recognition - objects for direct control of recognition engine
  • Direct Text To Speech - objects for direct control of synthesis engine
  • Audio objects - for reading to and from an audio device or file


SAPI 5 API family

The Speech SDK version 5.0, incorporating the SAPI 5.0 runtime, was released in 2000. This was a complete redesign from previous versions, and neither engines nor applications which used older versions of SAPI could use the new version without considerable modification.


The design of the new API included the concept of strictly separating the application and engine so all calls were routed through the runtime sapi.dll. This change was intended to make the API more 'engine-independent', preventing applications from inadvertently depending on features of a specific engine. In addition this change was aimed at making it much easier to incorporate speech technology into an application by moving some management and initialization code into the runtime.
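The separation described above can be illustrated with a toy model. The following Python sketch is not SAPI code. The class names (Runtime, UpperCaseEngine, ReversingEngine) and the synthesize method are hypothetical stand-ins, and the "audio" is just encoded text. It only shows the idea that an application talks to a runtime, which in turn selects and drives an engine that conforms to a common interface.

```python
from typing import Callable, Dict, List, Optional, Protocol


class SynthesisEngine(Protocol):
    """Interface every engine must satisfy (hypothetical, for illustration)."""

    def synthesize(self, text: str) -> bytes: ...


class UpperCaseEngine:
    """Stand-in engine: the 'audio' is simply upper-cased UTF-8 text."""

    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")


class ReversingEngine:
    """A second stand-in engine with different behaviour."""

    def synthesize(self, text: str) -> bytes:
        return text[::-1].encode("utf-8")


class Runtime:
    """Toy runtime: applications call this object, never an engine directly."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], SynthesisEngine]] = {}
        self._default: Optional[str] = None

    def register(self, engine_id: str, factory: Callable[[], SynthesisEngine]) -> None:
        # The first engine registered becomes the default.
        self._factories[engine_id] = factory
        if self._default is None:
            self._default = engine_id

    def engine_ids(self) -> List[str]:
        return sorted(self._factories)

    def speak(self, text: str, engine_id: Optional[str] = None) -> bytes:
        chosen = engine_id or self._default
        if chosen is None or chosen not in self._factories:
            raise LookupError(f"no engine registered as {chosen!r}")
        # Engine creation and initialization live in the runtime, not the app.
        engine = self._factories[chosen]()
        return engine.synthesize(text)
```

An application written against Runtime.speak keeps working when a different engine is registered, which is the kind of engine independence the redesign was aiming for.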


The new API was initially a pure COM API and could be used easily only from C/C++. Support for Visual Basic and scripting languages was added later. Operating systems from Windows 98 and NT 4.0 upwards were supported.


Major features of the API include:

  • Shared Recognizer. For desktop speech recognition applications, a recognizer object can be used that runs in a separate process (sapisvr.exe). All applications using the shared recognizer communicate with this single instance. This allows sharing of resources, removes contention for the microphone and allows for a global UI for control of all speech apps.
  • In-proc recognizer. For apps that require explicit control of the recognition process the in-proc recognizer object can be used instead of the shared one.
  • Grammar objects. Speech grammars are used to specify the words that the recognizer is listening for. SAPI 5 defines an XML markup for specifying a grammar, as well as mechanisms to create them dynamically in code. Methods also exist for instructing the recognizer to load a built-in dictation language model.
  • Voice object. This performs speech synthesis, producing an audio stream from text. A markup language (similar to XML, but not strictly XML) can be used for controlling the synthesis process.
  • Audio interfaces. The runtime includes objects for performing speech input from the microphone or speech output to speakers (or any sound device); as well as to and from wave files. It is also possible to write a custom audio object to stream audio to or from a non-standard location.
  • User lexicon object. This allows custom words and pronunciations to be added by a user or application. These are added to the recognition or synthesis engine's built-in lexicons.
  • Object tokens. This is a concept allowing recognition and TTS engines, audio objects, lexicons and other categories of object to be registered, enumerated and instantiated in a common way.
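As an illustration of the grammar objects described above, the following Python sketch generates a small command-and-control grammar in the SAPI 5 XML grammar format. The element and attribute names used here (GRAMMAR, RULE, L, P, LANGID, TOPLEVEL) are assumed from the SAPI 5 text grammar format and should be checked against the SDK documentation. The function name build_command_grammar is hypothetical, and the sketch only produces the XML text. It does not load the grammar into a recognizer.

```python
import xml.etree.ElementTree as ET


def build_command_grammar(rule_name, phrases, langid="409"):
    """Return XML text for a one-rule grammar listing alternative phrases.

    langid "409" is the hexadecimal language identifier for U.S. English.
    """
    grammar = ET.Element("GRAMMAR", {"LANGID": langid})
    # A top-level rule marked ACTIVE is listened for as soon as the
    # grammar is loaded and enabled.
    rule = ET.SubElement(grammar, "RULE", {"NAME": rule_name, "TOPLEVEL": "ACTIVE"})
    # <L> is a list of alternatives; each <P> is one phrase.
    alternatives = ET.SubElement(rule, "L")
    for phrase in phrases:
        ET.SubElement(alternatives, "P").text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_command_grammar("colour", ["red", "green", "blue"]))
```

Building the grammar as text like this corresponds to the XML markup route. The same rules could instead be created dynamically in code through the grammar object's interfaces.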

SAPI 5.0

This version shipped in late 2000 as part of the Speech SDK version 5.0, together with version 5.0 recognition and synthesis engines. The recognition engines supported continuous dictation and command & control, and were released in U.S. English, Japanese and Simplified Chinese versions. In the U.S. English system, special acoustic models were available for children's speech and telephony speech. The synthesis engine was available in English and Chinese. This version of the API and recognition engines also shipped in Microsoft Office XP in 2001.


SAPI 5.1

This version shipped in late 2001 as part of the Speech SDK version 5.1. Automation-compliant interfaces were added to the API to allow use from Visual Basic, scripting languages such as JScript, and managed code. This version of the API and TTS engines shipped in Windows XP. The API also shipped, together with a substantially improved version 6 recognition engine, in Office 2003 and Windows XP Tablet PC Edition.
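The Automation interfaces can be reached from any language with COM Automation support. The following Python sketch assumes a Windows machine with the third-party pywin32 package installed. It uses the SAPI.SpVoice Automation object and its Speak method with the SVSFIsXML flag (value 8). The helper names sapi_markup and speak are hypothetical, and on other platforms speak simply returns False.

```python
import sys
from xml.sax.saxutils import escape

SVSF_IS_XML = 8  # SpeechVoiceSpeakFlags.SVSFIsXML: treat the input as markup


def sapi_markup(text, rate=0, volume=100):
    """Wrap text in SAPI 5 TTS markup for speaking rate and volume."""
    if not -10 <= rate <= 10:
        raise ValueError("rate must be between -10 and 10")
    if not 0 <= volume <= 100:
        raise ValueError("volume must be between 0 and 100")
    # Escape &, < and > so the text cannot be mistaken for markup.
    return f'<rate speed="{rate}"><volume level="{volume}">{escape(text)}</volume></rate>'


def speak(text, rate=0, volume=100):
    """Speak text through SAPI if available; return True when it was spoken."""
    if sys.platform != "win32":
        return False
    try:
        import win32com.client  # provided by pywin32 (an assumption of this sketch)
    except ImportError:
        return False
    voice = win32com.client.Dispatch("SAPI.SpVoice")
    voice.Speak(sapi_markup(text, rate, volume), SVSF_IS_XML)
    return True


if __name__ == "__main__":
    if not speak("Hello from the Speech API", rate=-2, volume=80):
        print("SAPI is not available on this system")
```

The import is deferred into speak so that the markup helper can be used and tested on systems without SAPI.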


SAPI 5.2

This was a special version of the API for use only in the Microsoft Speech Server, which shipped in 2004. It added support for the SRGS and SSML mark-up languages, as well as additional server features and performance improvements. The Speech Server also shipped with the version 6 desktop recognition engine and the version 7 server recognition engine.


SAPI 5.3

This is the version of the API that ships in Windows Vista together with new recognition and synthesis engines. As Windows Speech Recognition is now integrated into the operating system, the Speech SDK and APIs are a part of the Windows SDK. SAPI 5.3 includes the following new features:

  • Support for W3C XML speech grammars for recognition and synthesis. The Speech Synthesis Markup Language (SSML) version 1.0 provides the ability to mark up voice characteristics, speed, volume, pitch, emphasis, and pronunciation.
  • The Speech Recognition Grammar Specification (SRGS) supports the definition of context-free grammars, with two limitations:
    • It does not support the use of SRGS to specify dual-tone multi-frequency (DTMF, touch-tone) grammars.
    • It does not support Augmented Backus–Naur form (ABNF).
  • Support for semantic interpretation script within grammars. SAPI 5.3 enables an SRGS grammar to be annotated with JavaScript for semantic interpretation to supplement the recognized text.
  • User-specified shortcuts in lexicons, which is the ability to add a string to the lexicon and associate it with a shortcut word. When dictating, the user can say the shortcut word and the recognizer will return the expanded string.
  • Additional functionality and ease-of-programming provided by new types.
  • Performance improvements, improved reliability and security.
  • Version 8 of the speech recognition engine ("recognizer")
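To make the two W3C formats more concrete, the following Python sketch generates a minimal SRGS XML grammar with semantic interpretation tags and a minimal SSML 1.0 document. The namespaces and element names come from the W3C specifications. The tag-format value "semantics/1.0" is the W3C one, and whether a given SAPI release expects that exact value is an assumption to verify against its documentation. The function names are hypothetical, and the sketch only builds the XML text.

```python
import json
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_srgs_grammar(rule_id, choices, lang="en-US"):
    """Return SRGS XML in which each phrase carries a semantic value.

    choices maps a spoken phrase to the value the script should return.
    """
    grammar = ET.Element("grammar", {
        "version": "1.0",
        "xmlns": SRGS_NS,
        "xml:lang": lang,
        "root": rule_id,
        "tag-format": "semantics/1.0",
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for phrase, value in choices.items():
        item = ET.SubElement(one_of, "item")
        item.text = phrase
        # The <tag> body is script; json.dumps yields a valid string literal.
        ET.SubElement(item, "tag").text = f"out = {json.dumps(value)};"
    return ET.tostring(grammar, encoding="unicode")


def build_ssml(text, rate="medium", pitch="medium", volume="medium", lang="en-US"):
    """Return an SSML 1.0 document that speaks text with the given prosody."""
    speak = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang})
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "pitch": pitch, "volume": volume})
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_srgs_grammar("colour", {"red": "#FF0000", "green": "#00FF00"}))
    print(build_ssml("Welcome back", rate="slow", volume="loud"))
```

With such a grammar, a recognizer that supports semantic interpretation could return "#FF0000" alongside the recognized text "red", so the application does not need to map phrases to values itself.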

SAPI 5 Voices

Microsoft Sam is a commonly shipped SAPI 5 voice. In addition, Microsoft Office XP and Office 2003 installed the L&H Michael and Michelle voices. The SAPI 5.1 SDK installs two more voices, Mike and Mary. Windows Vista includes Microsoft Anna, which replaces Microsoft Sam. Anna is designed to sound more natural and offer greater intelligibility. Several multilingual voices are also included in localized versions of Windows Vista. Microsoft Anna is also installed on Windows XP by Microsoft Streets & Trips 2006 and later versions.


Managed code Speech API

A managed code API (codenamed SpeechFX) ships as part of the .NET Framework 3.0. [1] It has similar functionality to SAPI 5 but is better suited to managed code applications. The new API is available on Windows XP, Windows Server 2003 and Windows Vista.


The existing SAPI 5 API can also be used from managed code to a limited extent by creating COM Interop code (helper code designed to assist in accessing COM interfaces and classes). This works well in some scenarios; however, the new API should provide a more seamless experience, equivalent to using any other managed code library.


Speech functionality in Windows Vista

Windows Vista includes a number of new speech-related features, including:

  • Speech control of the full Windows GUI and applications
  • New tutorial, microphone wizard, and UI for controlling speech recognition
  • New version of the Speech API runtime: SAPI 5.3
  • Built-in updated Speech Recognition engine (Version 8)
  • New Speech Synthesis engine and SAPI voice Microsoft Anna
  • Managed code speech API (codenamed SpeechFX)
  • Speech recognition support for 8 languages at release time: U.S. English, U.K. English, traditional Chinese, simplified Chinese, Japanese, German, French and Spanish, with more languages to be released later.
  • Microsoft Agent, most notably, and all other Microsoft speech applications use SAPI 5.

Compatibility

The Speech API is compatible with the following operating systems: [2]

  • Windows Vista
  • Windows XP
  • Windows 2000
  • Windows Millennium Edition
  • Windows 98
  • Windows NT

Major applications using SAPI

  • Windows XP
  • Windows Speech Recognition in Windows Vista
  • Narrator
  • Microsoft Office
  • Microsoft Agent
  • Microsoft Speech Server
  • Microsoft Voice Command
  • Microsoft Plus!
  • Dragon NaturallySpeaking
  • Bonzi Buddy
  • Text2Speech
  • Adobe Acrobat Reader
  • JAWS

See also

  • Microsoft Speech Application SDK (SASDK)
  • Speech recognition
  • Speech synthesis
  • Windows Vista
  • Telephony Application Programming Interface (TAPI)
  • Messaging Application Programming Interface (MAPI)
  • Cryptographic Application Programming Interface (CAPI)
  • Data Protection Application Programming Interface (DPAPI)

External links

  • Microsoft site for SAPI
  • Microsoft download site for SAPI 5
  • Microsoft Systems Journal Whitepaper by Mike Rozak on the first version of SAPI
  • Microsoft Speech Team blog

References

  1. ^ Speech synthesis and recognition in .NET - Give applications a voice: Redmond Developer News
  2. ^ Microsoft Corporation. SAPI System Requirements. MSDN. Retrieved on 2006-04-12.

 
 

The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.