RupalPatel_2013W[3][和指纹一样独特的合成声音].pdf
www.XiYuS (XiYu Software)

I'd like to talk today about a powerful and fundamental aspect of who we are: our voice. [00:12] Each one of us has a unique voiceprint that reflects our age, our size, even our lifestyle and personality. [00:20] In the words of the poet Longfellow, "the human voice is the organ of the soul." [00:29] As a speech scientist, I'm fascinated by how the voice is produced, and I have an idea for how it can be engineered. [00:35] That's what I'd like to share with you. [00:43]

I'm going to start by playing you a sample of a voice that you may recognize. [00:45]

(Recording) Stephen Hawking: "I would have thought it was fairly obvious what I meant." [00:49]

Rupal Patel: That was the voice of Professor Stephen Hawking. [00:53] What you may not know is that same voice may also be used by this little girl who is unable to speak because of a neurological condition. [00:56] In fact, all of these individuals may be using the same voice, and that's because there's only a few options available. [01:07] In the U.S. alone, there are 2.5 million Americans who are unable to speak, and many of whom use computerized devices to communicate. [01:14] Now that's millions of people worldwide who are using generic voices, including Professor Hawking, who uses an American-accented voice. [01:24]

This lack of individuation of the synthetic voice really hit home when I was at an assistive technology conference a few years ago, and I recall walking into an exhibit hall and seeing a little girl and a
grown man having a conversation using their devices, different devices, but the same voice. [01:36] And I looked around and I saw this happening all around me, literally hundreds of individuals using a handful of voices, voices that didn't fit their bodies or their personalities. [01:59] We wouldn't dream of fitting a little girl with the prosthetic limb of a grown man. [02:13] So why then the same prosthetic voice? [02:19] It really struck me, and I wanted to do something about this. [02:22]

I'm going to play you now a sample of someone who has, two people actually, who have severe speech disorders. [02:27] I want you to take a listen to how they sound. [02:34] They're saying the same utterance. [02:37]

TED speaker: Rupal Patel
Talk title: Synthetic voices, as unique as fingerprints
Summary: Many of those with severe speech disorders use a computerized device to communicate. Yet they choose between only a few voice options. That's why Stephen Hawking has an American accent, and why many people end up with the same voice, often to incongruous effect. Speech scientist Rupal Patel wanted to do something about this, and in this wonderful talk she shares her work to engineer unique voices for the voiceless.

(First voice) [02:39]

(Second voice) You probably didn't understand what they said, but I hope that you heard their unique vocal identities.
[02:42] So what I wanted to do next is, I wanted to find out how we could harness these residual vocal abilities and build a technology that could be customized for them, voices that could be customized for them. [02:54] So I reached out to my collaborator, Tim Bunnell. [03:07] Dr. Bunnell is an expert in speech synthesis, and what he'd been doing is building personalized voices for people by putting together pre-recorded samples of their voice and reconstructing a voice for them. [03:10] These are people who had lost their voice later in life. [03:24] We didn't have the luxury of pre-recorded samples of speech for those born with speech disorder. [03:28] But I thought, there had to be a way to reverse engineer a voice from whatever little is left over. [03:33]

So we decided to do exactly that. [03:40] We set out with a little bit of funding from the National Science Foundation, to create custom-crafted voices that captured their unique vocal identities. [03:43] We call this project VocaliD, or vocal I.D., for vocal identity. [03:51]

Now before I get into the details of how the voice is made and let you listen to it, I need to give you a real quick speech science lesson. Okay? [03:56] So first, we know that the voice is changing dramatically over the course of
development. [04:05] Children sound different from teens who sound different from adults. [04:11] We've all experienced this. [04:14]

Fact number two is that speech is a combination of the source, which is the vibrations generated by your voice box, which are then pushed through the rest of the vocal tract. [04:17] These are the chambers of your head and neck that vibrate, and they actually filter that source sound to produce consonants and vowels. [04:31] So the combination of source and filter is how we produce speech. [04:39] And that happens in one individual. [04:45]

Now I told you earlier that I'd spent a good part of my career understanding and studying the source characteristics of people with severe speech disorder, and what I've found is that even though their filters were impaired, they were able to modulate their source: the pitch, the loudness, the tempo of their voice. [04:48] These are called prosody, and I've been documenting for years that the prosodic abilities of these individuals are preserved. [05:11]

So when I realized that those same cues are also important for speaker identity, I had this idea. [05:18] Why don't we take the source from the person we want the voice to sound
like, because it's preserved, and borrow the filter from someone about the same age and size, because they can articulate speech, and then mix them? [05:27] Because when we mix them, we can get a voice that's as clear as our surrogate talker (that's the person we borrowed the filter from) and is similar in identity to our target talker. [05:43] It's that simple. [05:55] That's the science behind what we're doing. [05:57]

So once you have that in mind, how do you go about building this voice? [06:00] Well, you have to find someone who is willing to be a surrogate. [06:05] It's not such an ominous thing. [06:09] Being a surrogate donor only requires you to say a few hundred to a few thousand utterances. [06:11] The process goes something like this. [06:18]

(Video) Voice: Things happen in pairs. [06:20] I love to sleep. [06:22] The sky is blue without clouds. [06:24]

RP: Now she's going to go on like this for about three to four hours, and the idea is not for her to say everything that the target is going to want to say, [06:28] but the idea is to cover all the different combinations of the sounds that occur in the language. [06:37] The more speech you have, the better sounding voice you're going to have. [06:44]

Once you have those recordings, what we need to do is we have to parse these recordings into little snippets of speech, one- or two-sound combinations, sometimes even
whole words that start populating a dataset or a database. [06:48] We're going to call this database a voice bank. [07:05] Now the power of the voice bank is that from this voice bank, we can now say any new utterance, like, "I love chocolate" (everyone needs to be able to say that), fish through that database and find all the segments necessary to say that utterance. [07:08]

(Video) Voice: I love chocolate. [07:23]

RP: So that's speech synthesis. [07:25] It's called concatenative synthesis, and that's what we're using. [07:26] That's not the novel part. [07:29] What's novel is how we make it sound like this young woman. [07:31]

This is Samantha. [07:34] I met her when she was nine, and since then, my team and I have been trying to build her a personalized voice. [07:36] We first had to find a surrogate donor, and then we had to have Samantha produce some utterances. [07:43] What she can produce are mostly vowel-like sounds, but that's enough for us to extract her source characteristics. [07:50]

What happens next is best described by my daughter's analogy. She's six. [07:57] She calls it mixing colors to paint voices. [08:03] It's beautiful. It's exactly that. [08:08] Samantha's voice is like a concentrated sample of red food dye which we can infuse into the recordings of her surrogate to get a pink voice just like this. [08:11]
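Editor's note: the voice-bank lookup described in the talk (fishing through the database for the segments needed to say a new utterance) can be illustrated with a minimal Python sketch. This is not the VocaliD implementation. The sound labels, the toy bank contents, and the function names `select_units` and `concatenate` are all assumptions made for illustration, and the greedy longest-match rule simply stands in for whatever selection method the project actually uses.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical types: a unit is one or more sound labels, and the voice bank
# maps each recorded unit to its audio samples (toy floats here).
Unit = Tuple[str, ...]
VoiceBank = Dict[Unit, List[float]]


def select_units(sounds: Sequence[str], bank: VoiceBank, max_len: int = 3) -> List[Unit]:
    """Cover the requested sound sequence with snippets from the bank,
    preferring the longest recorded snippet at each position (greedy)."""
    chosen: List[Unit] = []
    i = 0
    while i < len(sounds):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(sounds) - i), 0, -1):
            unit = tuple(sounds[i:i + n])
            if unit in bank:
                chosen.append(unit)
                i += n
                break
        else:
            # No snippet of any length starts here: the bank lacks coverage.
            raise KeyError(f"no recorded snippet covers sound {sounds[i]!r}")
    return chosen


def concatenate(units: Sequence[Unit], bank: VoiceBank) -> List[float]:
    """Join the audio of the selected snippets end to end (no smoothing)."""
    audio: List[float] = []
    for unit in units:
        audio.extend(bank[unit])
    return audio


# Toy example with made-up labels and samples for "I love chocolate".
TOY_BANK: VoiceBank = {
    ("ay",): [0.1],
    ("l", "ah"): [0.2, 0.3],
    ("v",): [0.4],
    ("ch", "aa"): [0.5, 0.6],
    ("k",): [0.7],
    ("t",): [0.8],
}
TOY_SOUNDS = ["ay", "l", "ah", "v", "ch", "aa", "k", "l", "ah", "t"]

toy_units = select_units(TOY_SOUNDS, TOY_BANK)
toy_audio = concatenate(toy_units, TOY_BANK)
print(toy_units)
print(len(toy_audio), "samples")
```

A real system would also smooth the joins between snippets and, as the talk describes, blend in the target talker's source characteristics; neither step is attempted in this sketch.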
(Video) Samantha: Aaaaaah. [08:23]

RP: So now, Samantha can say this. [08:28]

(Video) Samantha: This voice is only for me. [08:30] I can't wait to use my new voice with my friends. [08:34]

RP: Thank you. (Applause) I'll never forget the gentle smile that spread across her face when she heard that voice for the first time. [08:40] Now there's millions of people around the world like Samantha, millions, and we've only begun to scratch the surface. [08:54] What we've done so far is we have a few surrogate talkers from around the U.S. [09:02] who have donated their voices, and we have been using those to build our first few personalized voices. [09:08] But there's so much more work to be done. [09:16] For Samantha, her surrogate came from somewhere in the Midwest, a stranger who gave her the gift of voice. [09:17] And as a scientist, I'm so excited to take this work out of the laboratory and finally into the real world so it can have real-world impact. [09:27]

What I want to share with you next is how I envision taking this work to that next level. [09:36] I imagine a whole world of surrogate donors from all walks of life, different sizes, different ages, coming together in this voice drive to give people voices that are as colorful as their personalities. [09:42] To do that as a first step, we've put together this website, VocaliD.org, as a way to bring together those who want to join us as
voice donors, as expertise donors, in whatever way to make this vision a reality. [09:58]

They say that giving blood can save lives. [10:15] Well, giving your voice can change lives. [10:19] All we need is a few hours of speech from our surrogate talker, and as little as a vowel from our target talker, to create a unique vocal identity. [10:24] So that's the science behind what we're doing. [10:37]

I want to end by circling back to the human side that is really the inspiration for this work. [10:40] About five years ago, we built our very first voice for a little boy named William. [10:49] When his mom first heard this voice, she said, This is what William would