CS 260: Lecture 10
Professor John Canny
4/23/2023

Speech: the Ultimate Interface?
• In the early days of HCI, people assumed that speech/natural language would be the ultimate UI.
• Use of speech interfaces has grown, but speech is still rarely used in the office. Why?

Speech: the Ultimate Interface?
• Why speech hasn't succeeded in the office:
• Affordances of text:
  * Visual scanning (for email or docs)
  * Unambiguity of text
  * Editing of text
• Disadvantages of speech:
  * Noise: call-center ambience
  * Lack of privacy

Speech: the Ultimate Interface?
• Use of speech interfaces has grown, but speech is still rarely used in the office.

Computing is Moving
Where are computers these days? Intel's breakdown (based on PC sales):
• Office
• Home
• Mobile (laptops)
• Medical
And as we noted earlier, programmable smartphones will soon outnumber total PCs. Then there are game boxes, cable boxes, smart TVs, etc.

What is a good interface for:
• Mobile computing (walking or driving)?
• Home computing?
• Medical computing?

Where is the industry now?
• After a big slump around 2002, the speech technology/voice interface industry seems to be growing briskly, at about 30-40% per year. One current estimate put it at about $2.5 billion.
• It would probably be more visible, except that several related industries have overtaken it: outsourced call centers and VoIP (Voice over IP).
• The biggest growth has been in the new markets:
  * Cell phones (as a local UI)
  * Medical (e.g. order entry)
  * Voice services over the phone

Industry movement
In January this year, Yahoo acquired a large team of speech engineers from Nuance, the largest speech company (which owns Dragon NaturallySpeaking). Google already had some leading speech researchers. So there is much interest in speech for the portal market.
Aside: there is a division of Nuance devoted to medical speech recognition, and one to call centers.

Industry movement
Heyanita: Voice-based email and messaging
Bevocal: Hosted IVR (Interactive Voice Response) for customers, e.g. MetroPCS
Tellme: Find a business service (including restaurants) using ASR.

Speech: Some background
A speech recognizer consists of 3 stages:
Raw sound -> [Acoustic Front End] -> Acoustic features -> [Acoustic Model] -> Phonetic features -> [Language/phonetic model] -> Words
A state-of-the-art recognizer requires 50-100 Mflops for continuous speech (no pauses between words).
PC continuous speech recognizers appeared in the 1990s and saved many victims of RSI.
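
The three-stage structure above can be pictured as a chain of functions, each consuming the previous stage's output. The sketch below is a toy illustration only, not a working recognizer. The function names, the frame size, the energy threshold, the "H"/"L" labels, and the two-word lexicon are all assumptions invented for this example; real systems use spectral features and statistical models at every stage.

```python
from typing import Dict, List, Tuple

FRAME = 4  # samples per frame (hypothetical value chosen for the toy input)


def acoustic_front_end(raw: List[float]) -> List[float]:
    """Stage 1: raw sound -> acoustic features (here, mean energy per frame)."""
    frames = [raw[i:i + FRAME] for i in range(0, len(raw), FRAME)]
    return [sum(s * s for s in frame) / len(frame) for frame in frames]


def acoustic_model(features: List[float]) -> List[str]:
    """Stage 2: acoustic features -> phonetic labels (toy threshold classifier)."""
    # "H" (high energy) and "L" (low energy) are made-up stand-ins for phones.
    return ["H" if energy > 0.25 else "L" for energy in features]


# Stage 3 knowledge: which label sequences count as words (toy lexicon).
LEXICON: Dict[Tuple[str, str], str] = {("H", "L"): "yes", ("L", "H"): "no"}


def language_model(phones: List[str]) -> List[str]:
    """Stage 3: phonetic labels -> words, reading two labels at a time."""
    words = []
    for i in range(0, len(phones) - 1, 2):
        word = LEXICON.get((phones[i], phones[i + 1]))
        if word is not None:  # unknown label pairs are silently skipped
            words.append(word)
    return words


def recognize(raw: List[float]) -> List[str]:
    """Run the three stages in order: front end, acoustic model, language model."""
    return language_model(acoustic_model(acoustic_front_end(raw)))


if __name__ == "__main__":
    # Loud frame, two quiet frames, loud frame.
    print(recognize([1.0] * 4 + [0.0] * 8 + [1.0] * 4))
```

Only the last stage holds any knowledge about which words exist, so changing the vocabulary means editing `LEXICON` and leaving the first two stages alone.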

Speech: Some background
The first two stages are standard. The last is not, and has a big impact on performance.
The last box encodes knowledge of what users might say, either as a grammar or as a statistical language model (LM).
Grammars are suitable for small recognition tasks with well-known command languages.
Raw sound -> [Acoustic Front End] -> Acoustic features -> [Acoustic Model] -> Phonetic features -> [Language/phonetic model] -> Words

Speech UIs
• Most implement a finite-state machine.
• At each state, the system can recognize various speech segments to take you to the next state(s).
• A segment may be anything from a single word to a complete utterance.
• The system can also make utterances of its own at various states.
• You can specify them using regular expressions, or using VoiceXML.

Speech on phones
Speech recognition is faster and more accurate if you limit the vocabulary to a few dozen words.
Small-vocabulary speech recognition has been common on phones for the last few years:
• Call a number
• Call a name (from your contacts)
What about large-vocabulary, continuous speech?

This year's smartphone (free with service contract)
• 150-200 MHz ARM processor
• 32 MB RAM
• 2 GB flash (not included)
Windows 98 PC that boots quickly!
Plus:
• Camera
• AGPS (Qualcomm/Snaptrack)
• DSP cores, OpenGL GPU
• EV-DO (300 kb/s), Bluetooth
[Figure: this year's smartphone, 200 MIPS]

Speech on phones
This is just the right power for high-performance speech recognition.
Large-vocabulary speech recognition (not continuous) appeared on phones last year: Samsung P207.
LVCSR (Large-Vocabulary Continuous Speech Recognition) should be available this year.

Speech in the home
Good speech recognition used to require careful microphone placement and a worn headset.

Speech in the home
New microphones: array mics with built-in DSPs allow recognition at greater range (several feet).
Users don't have to wear microphones any more to use speech.

Speech in the home
Apart from CPU and memory (which are shrinking), speech recognition requires only a microphone and perhaps a speaker.
It is power- and size-efficient.
In a few years, it will probably be possible to build speech recognition into Bluetooth microphones or other small devices.
Compare with other interfaces.

Ten Guidelines for Speech Interfaces
1. You can't design what you can't define
2. Use user-centered design techniques
3. Use the right technology, and use technology right
4. Leverage the language instinct
5. Establish success criteria and test against them
6. Branding in VUI is more than just a pretty voice
7. How you say it is as important as what you say
8. Don't block the exit
9. Take care with error handling
10. Establish a change process

1. You can't design what you can't define
• Consider the task(s) that your users want to do, i.e. start with standard task analysis.
• What conceptual model do they have (use contextual inquiry)?
• What language do they use to refer to it?
• Use recordings during contextual inquiry/task analysis.

2. Use user-centered design techniques
• Great to see this advice in a trade publication. You know a lot about this:
• Study the real use context: especially important for mobile devices, medical, home, etc.
• Perform a needs analysis: what kinds of service might the system provide, and how valuable are they?
• Develop personae to guide your design.
• Once again, study users' conceptual models.

3. Use the right technology, and use technology right
• In a speech interface, you have a choice between synthesized and recorded speech for output.
• In designing the recognizer, language models will generally give better results for routing a broad range of