Speech Recognition Systems: Chinese-English Parallel Translation of Foreign Literature.docx
Chinese-English parallel translation of foreign-language material

Speech Recognition

1 Defining the Problem

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final results, as for applications such as commands & control, data entry, and document preparation. They can also serve as the input to further linguistic processing in order to achieve speech understanding, a subject covered in a separate section.

Speech recognition systems can be characterized by many parameters, some of the more important of which are shown in the table below. An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies, and is much more difficult to recognize than speech read from script. Some systems require speaker enrollment, in which a user must provide samples of his or her speech before using them, whereas other systems are said to be speaker-independent, in that no enrollment is necessary. Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or have many similar-sounding words. When speech is produced in a sequence of words, language models or artificial grammars are used to restrict the combination of words.

The simplest language model can be specified as a finite-state network, where the permissible words following each word are given explicitly. More general language models approximating natural language are specified in terms of a context-sensitive grammar.

One popular measure of the difficulty of the task, combining the vocabulary size and the language model, is perplexity, loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applied (see the section on language modeling for a general discussion and for perplexity in particular). Finally, there are some external parameters that can affect speech recognition system performance, including the characteristics of the environmental noise and the type and the placement of the microphone.

| Parameters | Range |
| --- | --- |
| Speaking Mode | Isolated words to continuous speech |
| Speaking Style | Read speech to spontaneous speech |
| Enrollment | Speaker-dependent to speaker-independent |
| Vocabulary | Small (< 20 words) to large (> 20,000 words) |
| Language Model | Finite-state to context-sensitive |
| Perplexity | Small (< 10) to large (> 100) |
| SNR | High (> 30 dB) to low (< 10 dB) |
| Transducer | Voice-cancelling microphone to telephone |

Table: Typical parameters used to characterize the capability of speech recognition systems.

Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear. These phonetic variabilities are exemplified by the acoustic differences between realizations of the same phoneme in different words. At word boundaries, contextual variations can be quite dramatic, making gas shortage sound like gash shortage in American English, and devo andare sound like devandare in Italian.

Second, acoustic variabilities can result from changes in the environment as well as in the position and characteristics of the transducer. Third, within-speaker variabilities can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract size and shape can contribute to across-speaker variabilities.

The figure below shows the major components of a typical speech recognition system. The digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, typically once every 10-20 msec (see the section on signal representation and section 11.3 on digital signal processing). These measurements are then used to search for the most likely word candidate, making use of constraints imposed by the acoustic, lexical, and language models. Throughout this process, training data are used to determine the values of the model parameters.

Figure: Components of a typical speech recognition system.

Speech recognition systems attempt to model the sources of variability described above in several ways. At the level of signal representation, researchers have developed representations that emphasize perceptually important speaker-independent features of the signal, and de-emphasize speaker-dependent characteristics. At the acoustic phonetic level, speaker variability is typically modeled using statistical techniques applied to large amounts of data. Speaker adaptation algorithms have also been developed that adapt speaker-independent models to those of the current speaker during system use (discussed in a separate section). Effects of linguistic context at the acoustic phonetic level are typically handled by training separate models for phonemes in different contexts; this is called context-dependent acoustic modeling.

Word-level variability can be handled by allowing alternate pronunciations of words in representations known as pronunciation networks. Common alternate pronunciations of words, as well as effects of dialect and accent, are handled by allowing search algorithms to find alternate paths of phonemes through these networks. Statistical language models, based on estimates of the frequency of occurrence of word sequences, are often used to guide the search through the most probable sequence of words.

The dominant recognition paradigm in the past fifteen years is known as hidden Markov models (HMM). An HMM is a doubly stochastic model, in which the generation of the underlying phoneme string and the frame-by-frame, surface acoustic realizations are both represented probabilistically as Markov processes, as discussed in other sections, including section 11.2. Neural networks have also been used to estimate the frame-based scores; these scores are then integrated into HMM-based system architectures, in what has come to be known as hybrid systems, as described in section 11.5.

An interesting feature of frame-based HMM systems is that speech segments are identified during the search process, rather than explicitly. An alternate approach is to first identify speech segments, then classify the segments and use the segment scores to recognize words. This approach has produced competitive recognition performance in several tasks.

2 State of the Art

Comments about the state of the art need to be made in the context of specific applications, which reflect the constraints on the task. Moreover, different technologies are sometimes appropriate for different tasks. For example, when the vocabulary is small, the entire word can be modeled as a single unit. Such an approach is not practical for large vocabularies, where word models must be built up from subword units.

Performance of speech recognition systems is typically described in terms of word error rate, E, defined as:

$$E = \frac{S + I + D}{N} \times 100\%$$

where N is the total number of words in the test set, and S, I, and D are the total number of substitutions, insertions, and deletions, respectively.

The past decade has witnessed significant progress in speech recognition technology. Word error rates continue to drop by a factor of 2 every two years. Substantial progress has been made in the basic technology, leading to the lowering of barriers to speaker independence, continuous speech, and large vocabularies. There are several factors that have contributed to this rapid progress. First, there is the coming of age of the HMM. The HMM is powerful in that, with the availability of training data, the parameters of the model can be trained automatically to give optimal performance.
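
The finite-state language model and the loose definition of perplexity given in section 1 can be illustrated with a small sketch. The word network, its vocabulary, and the function names below are invented for illustration and do not come from the text. The calculation follows only the loose "geometric mean of the number of permitted successors" reading, not the probabilistic definition used with statistical language models.

```python
import math

# Hypothetical finite-state word network: each word maps to the words
# that may follow it. Vocabulary and arcs are invented for illustration.
NETWORK = {
    "<s>": ["call", "dial"],
    "call": ["home", "office", "mom", "dad"],
    "dial": ["home", "office"],
    "home": ["</s>"],
    "office": ["</s>"],
    "mom": ["</s>"],
    "dad": ["</s>"],
}


def is_allowed(words, network=NETWORK):
    """Return True if the word sequence is a path through the network."""
    path = ["<s>", *words, "</s>"]
    return all(nxt in network.get(cur, []) for cur, nxt in zip(path, path[1:]))


def branching_perplexity(words, network=NETWORK):
    """Geometric mean of the number of permitted successors along a path."""
    if not is_allowed(words, network):
        raise ValueError("sequence is not permitted by the network")
    path = ["<s>", *words, "</s>"]
    # Average the log of the successor count at every step, then exponentiate.
    log_sum = sum(math.log(len(network[cur])) for cur in path[:-1])
    return math.exp(log_sum / (len(path) - 1))
```

For the path "call home" the successor counts are 2, 4, and 1, so the geometric mean is the cube root of 8, that is, 2. A recognizer constrained by such a network only has to choose among a few words at each step, which is why low perplexity makes the task easier.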
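
The front end described in section 1, which turns the digitized signal into measurements at a fixed rate of once every 10-20 msec, can be sketched as follows. This is a minimal illustration under stated assumptions: the 20 msec window length, the 10 msec step, and the choice of log energy as the only feature are assumptions made here, and real systems use richer spectral representations.

```python
import math


def frame_log_energies(samples, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Cut a signal into overlapping frames and return one log energy per frame.

    frame_ms (window length) and hop_ms (step between frames) are
    illustrative defaults, not values prescribed by the text.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must span at least one sample")
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # A small floor keeps the logarithm finite for silent frames.
        features.append(math.log(energy + 1e-10))
    return features


# Usage: one second of a synthetic 440 Hz tone sampled at 8 kHz.
RATE = 8000
tone = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]
tone_features = frame_log_energies(tone, RATE)
```

With these settings each frame covers 160 samples and a new frame starts every 80 samples, so one second of audio yields 99 feature values. The search stage then works on this sequence rather than on raw samples.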
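
The remark that frame-based HMM systems identify speech segments during the search can be made concrete with a toy Viterbi decoder. Everything in this sketch is hypothetical: the two states, the discrete observation symbols, and all probabilities are invented, and a real recognizer would use continuous acoustic features and far larger models. Viterbi decoding is used here as one common search procedure; the text does not name a specific algorithm.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete-observation HMM."""
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[s] = (log score of the best path ending in state s, that path)
    best = {s: (lg(start_p[s]) + lg(emit_p[s][obs[0]]), [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for s in states:
            # Pick the best predecessor state for s at this frame.
            score, path = max(
                ((best[p][0] + lg(trans_p[p][s]), best[p][1]) for p in states),
                key=lambda t: t[0],
            )
            new_best[s] = (score + lg(emit_p[s][o]), path + [s])
        best = new_best
    score, path = max(best.values(), key=lambda t: t[0])
    return path, score


# Hypothetical left-to-right model with two states and two frame symbols.
STATES = ["A", "B"]
START = {"A": 1.0, "B": 0.0}
TRANS = {"A": {"A": 0.6, "B": 0.4}, "B": {"A": 0.0, "B": 1.0}}
EMIT = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

best_path, best_log_score = viterbi(["x", "x", "y", "y"], STATES, START, TRANS, EMIT)
```

For the frame sequence x, x, y, y the decoder returns the state path A, A, B, B. The boundary between the two segments falls out of the search itself, with no separate segmentation step, which is the property the text describes.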
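
The word error rate defined in section 2 depends on counts of substitutions, insertions, and deletions, which are usually obtained by aligning the recognized word string against a reference transcript. The sketch below uses a standard minimum-edit-distance alignment to obtain those counts. The function names and the example sentences are illustrative assumptions, and when several alignments have the same total cost the split between the three error types can depend on tie-breaking.

```python
def error_counts(ref_words, hyp_words):
    """Return (S, I, D) for a minimum-edit-distance word alignment."""
    rows, cols = len(ref_words) + 1, len(hyp_words) + 1
    # Each cell holds (total edits, substitutions, insertions, deletions).
    table = [[None] * cols for _ in range(rows)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        table[i][0] = (i, 0, 0, i)  # every reference word deleted
    for j in range(1, cols):
        table[0][j] = (j, 0, j, 0)  # every hypothesis word inserted
    for i in range(1, rows):
        for j in range(1, cols):
            t, s, ins, d = table[i - 1][j - 1]
            if ref_words[i - 1] == hyp_words[j - 1]:
                diagonal = (t, s, ins, d)
            else:
                diagonal = (t + 1, s + 1, ins, d)
            t, s, ins, d = table[i - 1][j]
            deletion = (t + 1, s, ins, d + 1)
            t, s, ins, d = table[i][j - 1]
            insertion = (t + 1, s, ins + 1, d)
            table[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])
    return table[-1][-1][1:]


def word_error_rate(reference, hypothesis):
    """Word error rate in percent: 100 * (S + I + D) / N."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    s, i, d = error_counts(ref_words, hyp_words)
    return 100.0 * (s + i + d) / len(ref_words)
```

With the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", the alignment finds one substitution and one deletion against six reference words, giving an error rate of about 33.3%. Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.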