Speech acoustics and phonetics
Louis C.W. Pols
Institute of Phonetic Sciences (IFA)
Amsterdam Center for Language and Communication (ACLC)
NATO-ASI "Dynamics of Speech Production and Perception"
Il Ciocco, Tuscany, Italy, July 1, 2002

Overview
- Dynamics in speech acoustics
- Contour modeling (mainly formants)
- Aspects of spectral undershoot
- Modeling V and C reduction
- Phonetic knowledge from speech corpora
  - IFA, CGN, TIMIT, found speech
- Conclusions

July 1st, 2002, Speech acoustics and phonetics, Il Ciocco

Dynamics in speech acoustics
- Dynamics is the norm, not stationarity
  - articulatory efficiency
- Dynamics is everywhere
  - generally no word boundaries in speech
  - deletion of words, syllables, phonemes; insertion
  - within/between word coarticulation/assimilation
  - vowel and consonant reduction
- Acoustic manifestations
  - segment duration, F0, loudness, spectral quality

Dynamics is the norm
- The speaker speaks as sloppily as the listeners allow him to do in communication
  - communicative efficiency
- Articulatory vs. perceptual efficiency
  - do spectral transitions facilitate or hamper perception? see other presentation
- Speaker flexibility; speaking style (clear vs. sloppy); speaking rate

Dynamics is everywhere
- Deletion
  - bread and butter /brEmbY3/
  - Amsterdam (Du) /AmstrdAm/AmsdAm/
  - koninklijke (Du) /konIklk/kolk/
- Insertion
  - homorganic glide insertion: die een (Du) /dijn/
- Degemination
  - is zichtbaar (Du) /Is zIxtbar/IsIxbar/
- Reduction, coarticulation, assimilation

Acoustic manifestations
- pitch, loudness, formant, component contours
- contour stylization (e.g., pitch in Praat)
- contour modeling
  - n-th degree curve fitting (D. van Bergem)
  - Legendre polynomials (R. van Son)
  - 16 points per segment (R. van Son)
- (phoneme) segmentation
  - by hand (time consuming; non-consistent)
  - automatically (via forced phoneme recognition and a pronunciation lexicon with alternatives; systematic errors)

Contour modeling
- allows modeling of specific phenomena
  - pitch accentuation (vs. vowel onset)
  - reduction, centralization, undershoot
- allows generation of stimuli for perception experiments
  - phoneme identification in extending context
  - 2-alternatives forced choice identification of continua
  - discrimination, RT
- allows statistics on large speech corpora
  - TIMIT, CGN, IFA-corpus, Switchboard

Static vs. dynamic V recognition
- see Weenink (2001)
  - "Vowel normalizations with the TIMIT acoustic phonetic speech corpus", IFA Proc. 24, 117-123
- 438 males, both train & test sentences of TIMIT
- 35,385 vowel segments, hand segmented
- 13 monophthongal vowel categories
- 1-Bark bandfilter analysis (18), intensity normalization
- 3 frames per segment: central and 25 ms L/R

Some results
- Vowel classification (%) with discriminant functions

  Condition                # Items   Static, 1 frame   Dynamic, 3 frames
  Original                 35,385    59.3              66.9
  speaker normalized       35,385    62.2              69.2
  V centers per speaker     5,374    78.9              90.1
  speaker normalized        5,374    87.9              94.5

  Item counts as annotated on the slide: 438x13x(125) for 35,385; 438x13 for 5,374.

Formant tracks/speaking rate
- Ph.D. thesis Rob van Son (1993)
  - "Spectro-temporal features of vowel segments"
  - see also Speech Comm. 13, 135-148 (Pols & vSon)
- 850-words text, read at normal and fast rate
- hand segmentation of 7 most freq. V + schwa
- formant tracks
  - via 16 points per segm. or 5 Legendre polynomials
- influence of rate, V-dur., context, sent. acc.
- evidence for duration-controlled undershoot?

Some results
- no differences for F1/F2 in vowel center for normal- or fast-rate speech; only some over-all rise in F1 for fast rate (irrespective of V)
- same formant track shape (normalized to 16 points) for normal- or fast-rate speech
- same results when using the more elaborate Legendre polynomials
- Concl.: changes in V-duration do not change the amount of undershoot -> active control of articulation speed

Formant representations
[Figure: Legendre polynomial coefficients. Panels: zeroth order (mean Fi in vowel segment); second order polynomials (axes reversed)]
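
The Legendre representation of formant tracks mentioned in these slides can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the procedure used in the cited thesis: the function name `fit_formant_track`, the synthetic F2 values, and the reading of "5 Legendre polynomials" as orders 0 to 4 are all assumptions made here. The track is sampled at 16 equally spaced points on a normalized time axis from -1 to 1 and fitted by least squares with NumPy.

```python
import numpy as np
from numpy.polynomial import legendre as L


def fit_formant_track(track_hz, degree=4):
    """Least-squares Legendre fit of one formant track.

    The track is assumed to be sampled at equally spaced points; time is
    normalized to [-1, 1], the interval on which Legendre polynomials
    are defined. degree=4 gives 5 coefficients (orders 0..4), which is
    an assumed reading of "5 Legendre polynomials".
    """
    track_hz = np.asarray(track_hz, dtype=float)
    t = np.linspace(-1.0, 1.0, track_hz.size)
    return L.legfit(t, track_hz, degree)


# Hypothetical 16-point F2 track, built from known Legendre coefficients
# (level 1500 Hz, rising tilt of 100 Hz, downward curvature of 200 Hz).
t16 = np.linspace(-1.0, 1.0, 16)
f2_track = L.legval(t16, [1500.0, 100.0, -200.0])

coeffs = fit_formant_track(f2_track)
# coeffs[0]: overall level, coeffs[1]: tilt, coeffs[2]: curvature
print(np.round(coeffs, 3))
```

Because the synthetic track is itself a Legendre series of order 2, the fit recovers the three input coefficients and leaves the two higher orders at zero. With measured tracks the low-order coefficients give a compact description of level, slope, and curvature that can be compared across speaking rates.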
Modeling vowel reduction
- Ph.D. thesis Dick van Bergem (1995)
  - "Acoustic and lexical vowel reduction"
  - see also Speech Communication 16, 329-358
- lexical V reduction: Fr /bet/ vs. Du /btOn/
- acoustic V reduction: /banan, bAnan, bnan/
- f(sent. acc., w. str., w. class): can - candy - canteen
- coarticulatory effects on the schwa
  - C1C2V- and VC1C2-type nonsense words
- perceptual effects (full V or schwa, e.g. ananas)

Some results
The schwa is not just a centralized vowel but something that is completely assimilated with its phonemic context.
[Figure panels: t-n, w-l]

Modeling consonant reduction
- Sp. Comm. (1999) 28, 125-140 (vSon & Pols)
- 20 min. speech, both spontaneous and read
- 2 x 791 similar VCV; hand segmented
- 5 aspects of V and C reduction
  - related to coarticulation: F2 slope differences at CV- vs. VC-boundaries; F2 locus equations (F2 onset vs. F2 target)
  - related to speaking effort: duration; spectral COG (mean freq.); V-C sound energy differences

Some results
- V markedly reduced in spontaneous speech
- lower F2-slope diff. in spontaneous speech -> decrease in articulation speed
- no systematic effect on F2 locus equation; V onsets and targets change in concert -> any V reduction mirrored by comparable change in C
- spont. sp.: V and C shorter; lower COG -> decrease in vocal and articulatory effort

Access to large corpora
- more, and more realistic, data
- phonetic knowledge via statistical analyses
- e.g. highly accessible IFA-corpus (free, SQL)
  - see "Structure and access of the open source IFA-corpus", IFA Proc. 24, 15-26 (vSon & Pols)
  - on-line http://www.fon.hum.uva.nl/IFAcorpus/
- 4 M / 4 F speakers, 5.5 hrs of speech
- from informal to read + sent., words, syllables
- 50K words segm. and labeled at phoneme level

Some results
- speech + annot. + meta data: relational DB
- realization of final n, e.g. Du geven /xev(n)/

  Style          # wrds   /n/    no /n/   All     % /n/   LF   HF
  Informal        5,250     1      304      305    0.3
  Retelling       6,229    13      236      249    5.2
  Narr. story    14,453   180      372      552   33      42   30
  Sentences      14,970   203      340      543   37
  Pseudo-sent     2,554    62       19       81   77
  All            43,456   459    1,271    1,730   36

  [Slide label: Read]

Spoken Dutch Corpus (CGN)
- 10 M words, 1,000 hrs of speech
- variety of styles, incl. telephone speech
- adult Dutch and Flemish speakers
- for linguistic and technological research
- see various LREC and ICSLP papers (2002)
- see also http://lands.let.kun.nl/cgn/home.htm
- fully transcribed: orthogr., POS, lemmas
- partly transcr.: phonemic, prosodic, syntactic

TIMIT
- popular DB in acoustic phonetics and ASR
- also telephone version (NTIMIT)
- hand segmented & labeled at phoneme level
- 438 males, 192 females (8 dialect regions)
- 10 sent./sp. (2 fixed, 1 pact, 7 diverse)
  - sa1: "She had your dark suit in greasy wash water all year"
- includes separate test data (112 M, 56 F)
- e.g. Ph.D. thesis X. Wang (1997) "Incorporating knowledge on segmental duration in HMM-based continuous speech recognition"

Useful info: durational variability
Adopted from Wang (1998)
- normal rate = 95
- primary stress = 104
- word final = 136
- utterance final = 186
- overall average = 95 ms

[Figure: normalized phone duration vs. speaking rate; all 3,696 training sent. (sx+si) of TIMIT training set]

found speech
- DARPA-LVSR community rather ambitious
- Broadcast News (BN), Sp. Comm. 37 (2002)

  Year   Task
  95     WSJ NAB read sp.
  1995   Market place
  1996   F0-F5, FX partitioned
  1997   3 hrs test unpartit.
  1998   + non Engl. speech also 900 M

  best % WER on test set (in slide order): 27.0%; 27.1% (1:46 hrs); 16.2% (3 hrs); 13.5 / 16.1% (3 hrs, 10 xRT)

- For Proc. DARPA Workshops, see http://www.nist.gov/speech/proc/darpa99/index.htm

Articul.-acoustic features in ASR
- "A Dutch treatment of an elitist approach to articulatory-acoustic feature classification", Proc. Eurospeech-2001, 1729-1732 (M. Wester et al.)
- "Integrating articulatory features into acoustic models for speech recognition", Phonus 5, 73-86 (K. Kirchhoff, 2000)
- "An overlapping-feature-based phonological model incorporating linguistic constraints: Applications to speech recognition", JASA 111(2), 1086-1101 (J. Sun & L. Deng, 2002)

Conclusions
- examples of dynamics in speech acoustics
- going from formal to informal speech:
  - less dynamics, more reduction (artic. guided)
- undershoot vs. speaking style
  - sloppiness or articulatory limits?
- functionality of dynamics? other paper
- systematicity of dynamics?
  - easing ASR, rules for TTS, acquiring knowledge?
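
Two of the measures listed under "Modeling consonant reduction", the F2 locus equation (F2 onset vs. F2 target) and the spectral COG (mean frequency), can be sketched as follows. This is a minimal illustration under assumptions, not the analysis from the cited paper: the function names `locus_equation` and `spectral_cog`, the synthetic F2 values, and the choice of a power-weighted mean frequency for the COG are all assumptions made here.

```python
import numpy as np


def locus_equation(f2_target_hz, f2_onset_hz):
    """Fit F2_onset = slope * F2_target + intercept by linear regression."""
    slope, intercept = np.polyfit(f2_target_hz, f2_onset_hz, 1)
    return slope, intercept


def spectral_cog(signal, sample_rate_hz):
    """Spectral center of gravity: power-weighted mean frequency in Hz.

    Weighting by the power spectrum is an assumption; other weightings
    (for example by magnitude) are also in use.
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    return float(np.sum(freqs * power) / np.sum(power))


# Hypothetical F2 targets (vowel nucleus) and onsets (CV boundary), in Hz,
# constructed to lie exactly on a line with slope 0.6 and intercept 700 Hz.
targets = np.array([900.0, 1300.0, 1700.0, 2100.0])
onsets = 0.6 * targets + 700.0
slope, intercept = locus_equation(targets, onsets)

# A 1000 Hz tone with a whole number of cycles in the analysis window,
# so its center of gravity should sit at 1000 Hz.
fs = 16000
n = np.arange(1600)
tone = np.sin(2 * np.pi * 1000.0 * n / fs)
cog_hz = spectral_cog(tone, fs)

print(slope, intercept, cog_hz)
```

In a locus equation, a slope near 1 means the F2 onset follows the vowel target closely (strong coarticulation), while a slope near 0 means a fixed consonant locus. A lower COG for the same consonant indicates less high-frequency energy, which the slides interpret as reduced vocal and articulatory effort.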