Multimedia Technology (latest edition), Lecture 2 teaching slides [最新多媒体技术第二讲教学课件.ppt]
Why Digital?
  Universal storage and transmission format: CD, internet
  Precision (range of values, number of bits, floating point)
  Lossless transmission/storage
  BUT:
    sampling at a finite rate distorts information
    size requirements may be large compared to analog

Text
  ASCII, Unicode
  Formatted text, rich text
  Document formats:
    Structured: TeX, HTML
    Page descriptions: PostScript, PDF

Graphics
  Objects: circles, splines, rectangles, lines
  Editable: resize, reshape, move, colorize
  Synthetic

Images (Pictures)
  Fixed digitized representation: bitmap, colors per pixel
  Editable in limited ways: retouch, cut and paste, remap colors, filter (Photoshop tools)
  No model of the thing depicted
  Captured, but not just from real life: clip art, screen dump

Audio
  Sounds: humans hear 15 Hz to 20 kHz
  Speech is 50 Hz to 10 kHz
  Speech recognition is ambiguous:
    "It is hard to wreck a nice beach" (sounds like "It is hard to recognize speech")
    "Ice cream" / "I scream"
  Synthesis: speech, music
  General MIDI: 128 instruments (program numbers 0 to 127), 47 percussion sounds; notes, timing

Speech Recognition Issues
  Continuous vs discrete
  Vocabulary size
  Channel (microphone)
  Environment (location of microphone and speaker)
  Speaker dependent / speaker independent
  Context (language model)
  Interactivity (dialog model)

Speech Recognition Knowledge Sources
  Acoustic modeling: describes the sounds that make up speech
  Lexicon: describes which sequences of speech sounds make up valid words
  Language model: describes the likelihood of various sequences of words being spoken
  (diagram: all three sources feed speech recognition)

Speech Variations
  Style variations: careful, clear, articulated, formal, casual, spontaneous, normal, read, dictated, intimate
  Voice quality: breathy, creaky, whispery, tense, lax, modal
  Context: sport, professional, interview, free conversation, man-machine dialogue
  Speaking rate: normal, slow, fast, very fast
  Stress: in noise, with increased vocal effort (Lombard reflex), emotional factors (e.g.
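    angry), under cognitive load

Video
  Frames comprise the video
  Frame rate = number of frames shown per second (the delay between successive frames is 1 / frame rate)
  Minimal change between frames
  Sequencing creates the illusion of movement
  16 fps is “smooth”
  Standards: 29.97 fps is NTSC, 25 fps is PAL, 60 fps is HDTV
  Interlacing
  Display scan rate is different: monitor refresh rate is 60 to 70 Hz (Hz = 1/s)

Orthogonal Transforms
  In theory, an orthogonal transform does not change the signal itself. It changes the domain or form in which the signal is represented, which gives some signal processing and analysis tasks, such as compression, another and possibly more convenient tool.

Discrete Fourier Transform (DFT)
  Continuous transform pair:

$$F(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\exp[-j2\pi(ux+vy)]\,dx\,dy$$

$$f(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(u,v)\,\exp[\,j2\pi(ux+vy)]\,du\,dv$$

  Discrete transform pair for an N x N image:

$$F(i,k) = \frac{1}{N}\sum_{m=0}^{N-1}\sum_{n=0}^{N-1} f(m,n)\,\exp[-j2\pi(im+kn)/N]$$

$$f(m,n) = \frac{1}{N}\sum_{i=0}^{N-1}\sum_{k=0}^{N-1} F(i,k)\,\exp[\,j2\pi(im+kn)/N]$$

Discrete Cosine Transform (DCT)
  In the Fourier transform:

$$\exp[-j2\pi(ux+vy)] = \cos 2\pi(ux+vy) - j\,\sin 2\pi(ux+vy)$$

$$\exp[\,j2\pi(ux+vy)] = \cos 2\pi(ux+vy) + j\,\sin 2\pi(ux+vy)$$

  If the image f(x,y) is symmetric about the origin, only the cosine terms remain in the transform domain, and the DCT can be derived from this.
  For an N x N two-dimensional image, fold it along its horizontal and vertical boundaries to form 4 blocks. If the fold overlaps one pixel, the extension is called odd symmetric; otherwise it is called even symmetric. The even symmetric extension is symmetric about the virtual point (-1/2, -1/2):

$$f_s(i,k) = \begin{cases} f(i,k), & i \ge 0,\ k \ge 0 \\ f(-1-i,\,k), & i < 0,\ k \ge 0 \\ f(i,\,-1-k), & i \ge 0,\ k < 0 \\ f(-1-i,\,-1-k), & i < 0,\ k < 0 \end{cases} \qquad -N \le i,k \le N-1$$

  Even symmetric cosine transform. The Fourier transform of f_s(i,k), taken about the symmetry point, is

$$F_s(u,v) = \frac{1}{2N}\sum_{i=-N}^{N-1}\sum_{k=-N}^{N-1} f_s(i,k)\,\exp\{-j2\pi[u(i+1/2)+v(k+1/2)]/2N\}$$

  Because f_s(i,k) is a real, symmetric function, F_s(u,v) is conjugate symmetric and the sine terms cancel. The four quadrants contribute equally, so

$$F(u,v) = \frac{2}{N}\sum_{i=0}^{N-1}\sum_{k=0}^{N-1} f(i,k)\,\cos\frac{(i+1/2)\,u\pi}{N}\,\cos\frac{(k+1/2)\,v\pi}{N}$$

$$f(i,k) = \frac{2}{N}\sum_{u=0}^{N-1}\sum_{v=0}^{N-1} c(u)\,c(v)\,F(u,v)\,\cos\frac{(i+1/2)\,u\pi}{N}\,\cos\frac{(k+1/2)\,v\pi}{N}$$

  where c(0) = 1/2 and c(u) = 1 for u > 0; with the 2/N normalization above, these weights are what makes the second formula the exact inverse of the first.

Wavelet Transform (WT)
  Continuous wavelet transform:

$$(W_\psi f)(a,b) = \int_{-\infty}^{\infty} f(t)\,\overline{\psi_{a,b}(t)}\,dt, \qquad \psi_{a,b}(t) = |a|^{-1/2}\,\psi\!\left(\frac{t-b}{a}\right), \qquad \int_{-\infty}^{\infty}\psi(t)\,dt = 0$$

  Sampled at a = 1/2^j, b = k/2^j:

$$(W_\psi f)\!\left(\frac{1}{2^j},\frac{k}{2^j}\right) = 2^{j/2}\int_{-\infty}^{\infty} f(x)\,\overline{\psi(2^j x - k)}\,dx$$

  Two-dimensional WT, built from a one-dimensional scaling function φ and wavelet ψ:

$$\varphi(x,y) = \varphi(x)\,\varphi(y), \quad \psi^1(x,y) = \varphi(x)\,\psi(y), \quad \psi^2(x,y) = \psi(x)\,\varphi(y), \quad \psi^3(x,y) = \psi(x)\,\psi(y)$$

Coding
  From the information-theoretic point of view: the data describing a source consist of useful information plus redundancy. Removing the redundancy saves storage and transmission cost without losing any of the source's useful information.
  From the physiological point of view: a limited amount of distortion is acceptable, for example because the human eye has limited gray-level resolution and monitors have limited display resolution. An image source can therefore be compressed to some degree, even heavily.
  Classes of compression coding:
  1) Redundancy compression: based on a statistical model, reduces or completely removes the redundancy in the data stream while keeping the information unchanged (Statistical Coding).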
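  2) Entropy compression: shortens the average code length at the cost of sacrificing part of the information, i.e. lossy compression.

Statistical Coding
  Statistical coding assigns variable-length, uniquely decodable codewords according to the probability distribution of the source, which lowers the average codeword length.
  Shannon Coding
  Huffman Coding
  Arithmetic Coding

Shannon Coding
  log2(1/p_i) is the amount of information contained in symbol s_i, i.e. the number of bits needed to encode it. When symbols occur with different probabilities, representing them with codewords of different lengths is more efficient than using the same length for all of them. The entropy of the source is

$$H(s) = \sum_i p_i \log_2(1/p_i)$$

  Shannon and Fano proposed a top-down coding method:
  1) Sort the image gray levels x_i by decreasing probability.
  2) Split the x_i into 2 groups whose probability sums are equal or as close as possible; assign the code "0" to the first group and "1" to the second.
  3) If a group still contains 2 or more gray levels after step 2, repeat the steps above on it, until every group contains a single gray level.

Huffman Coding
  In contrast to Shannon coding, this is a bottom-up coding method:
  1) Sort the gray levels by probability, from largest to smallest.
  2) Replace the two smallest probabilities by their sum. The probabilities then form a new set (with one element fewer than before), in which the new element takes its position according to the same largest-to-smallest ordering.
  3) The gray levels belonging to the two smallest probabilities that were added become leaf nodes of the Huffman tree, and these two nodes are joined under a parent node.
  4) Repeat steps 2 and 3 until only 2 probabilities remain; at that point the Huffman tree has reached its root.
  5) Label the left child of every node "0" and the right child "1". The sequence of labels on the path from the root through the intermediate nodes to a leaf is that leaf's Huffman code.

  Among codes that assign one codeword per symbol, Huffman coding is the most efficient statistical code: it is an optimal variable-length code.
  When the data have many different components, the code table is hard to build and encoding is slow (the sorting is costly).
  Huffman codes provide no error protection (error propagation).
  Huffman and Shannon codes are both self-delimiting, so no marker symbols need to be inserted between codewords.

Arithmetic Coding
  Encodes a message by mapping it to a real number in [0,1). The known parameters are the probability of each symbol and its coding interval.
  Let the source symbols be {a, b, c, d} with probabilities {0.1, 0.4, 0.2, 0.3}. The ranges assigned to the symbols within the half-open real interval [0,1) are then:
    a: [0.0, 0.1)
    b: [0.1, 0.5)
    c: [0.5, 0.7)
    d: [0.7, 1.0)
  For convenience, define the update relations

$$N_s = F_s + C_l \times L, \qquad N_e = F_s + C_r \times L$$

  where:
    N_s is the start of the new subinterval
    N_e is the end of the new subinterval
    F_s is the start of the previous subinterval
    C_l is the left end of the current symbol's range
    C_r is the right end of the current symbol's range
    L is the length of the previous subinterval

  If the source message is cadacdb:
  1) The first symbol to be compressed is c, so the code value lies in [0.5, 0.7).
  2) The second symbol is a. Because the previous symbol c has already restricted the interval to [0.5, 0.7), the value for a is restricted to the part [0.0, 0.1) of [0.5, 0.7):
       N_s = 0.5 + 0.0 x 0.2 = 0.5
       N_e = 0.5 + 0.1 x 0.2 = 0.52
     so ca lies in [0.5, 0.52).
  3) d takes the part [0.7, 1.0) of [0.5, 0.52):
       N_s = 0.5 + 0.7 x 0.02 = 0.514
       N_e = 0.5 + 1.0 x 0.02 = 0.52
  4) a takes the part [0.0, 0.1) of [0.514, 0.52):
       N_s = 0.514 + 0.0 x 0.006 = 0.514
       N_e = 0.514 + 0.1 x 0.006 = 0.5146
  Continuing in the same way for c, d and b, the string cadacdb is represented by the range [0.5143876, 0.514402).
  As long as the parameters are fixed, this correspondence is unique. Decoding uses the parameters and the code value range to determine the corresponding source symbols.

  Problems with arithmetic coding:
    Precision is finite, but this can be handled by scaling up the number of digits or by coding in segments.
    The decoder can only decode after it has received all digits of the real number.
    Sensitive to errors.
    A terminator symbol must be added.
    Dynamic modeling: knowing the source probabilities precisely in real time is difficult.

The characteristics of multimedia
  High data volume
  Content-based retrieval
  Quality of service
  Synchronization
  Device management
  Data modeling primitives
  Interactivity of multimedia applications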
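
As an illustration of the Discrete Cosine Transform slide, the following is a minimal sketch of the 2-D DCT pair with the 2/N normalization, written directly from the cosine sums rather than with a fast algorithm. The function names `dct2` and `idct2` and the 4 x 4 sample block are assumptions made for this example. The inverse uses weights c(0) = 1/2 and c(u) = 1 otherwise, which this normalization needs for an exact round trip.

```python
import math


def dct2(f):
    """Forward 2-D DCT of an N x N block: F(u,v) = (2/N) * sum f(i,k) cos cos."""
    n = len(f)
    return [[(2.0 / n) * sum(f[i][k]
                             * math.cos(math.pi * u * (i + 0.5) / n)
                             * math.cos(math.pi * v * (k + 0.5) / n)
                             for i in range(n) for k in range(n))
             for v in range(n)]
            for u in range(n)]


def idct2(F):
    """Inverse 2-D DCT; the zero-frequency terms carry the weight 1/2."""
    n = len(F)

    def c(u):
        return 0.5 if u == 0 else 1.0

    return [[(2.0 / n) * sum(c(u) * c(v) * F[u][v]
                             * math.cos(math.pi * u * (i + 0.5) / n)
                             * math.cos(math.pi * v * (k + 0.5) / n)
                             for u in range(n) for v in range(n))
             for k in range(n)]
            for i in range(n)]


# Arbitrary sample gray levels, chosen only for illustration.
img = [[52, 55, 61, 66],
       [70, 61, 64, 73],
       [63, 59, 55, 90],
       [67, 61, 68, 104]]

coeffs = dct2(img)
restored = idct2(coeffs)
max_error = max(abs(restored[i][k] - img[i][k])
                for i in range(4) for k in range(4))
print(max_error)  # round-trip error, expected to be at floating-point noise level
```

For a constant block, all the energy lands in F(0,0), which is the property that compression schemes based on the DCT exploit for smooth image regions.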
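
The bottom-up Huffman procedure from the Huffman Coding slide can be sketched as below. This is an illustration under stated assumptions, not the lecture's own implementation: the names `huffman_code`, `entropy` and `average_length` are hypothetical, the probabilities are borrowed from the arithmetic coding example (a = 0.1, b = 0.4, c = 0.2, d = 0.3), and a heap replaces the explicit re-sorting of the probability set at each step. The node merged first receives "0" and the other "1".

```python
import heapq
import math
from itertools import count


def huffman_code(probs):
    """Build a Huffman code for a {symbol: probability} mapping."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in probs}
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)    # smallest probability
        p1, _, right = heapq.heappop(heap)   # second smallest probability
        # Join the two nodes under a parent: prefix "0" on one side, "1" on the other.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def entropy(probs):
    """H(s) = sum p_i * log2(1 / p_i), in bits per symbol."""
    return sum(p * math.log2(1.0 / p) for p in probs.values() if p > 0)


def average_length(probs, code):
    """Average codeword length in bits per symbol."""
    return sum(probs[s] * len(code[s]) for s in probs)


probs = {"a": 0.1, "b": 0.4, "c": 0.2, "d": 0.3}
code = huffman_code(probs)
print(code)
print(entropy(probs), average_length(probs, code))
```

With these probabilities the sketch gives codeword lengths of 1, 2, 3 and 3 bits for b, d, a and c, an average of 1.9 bits per symbol against an entropy of about 1.85 bits.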
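
The interval narrowing in the Arithmetic Coding example can be checked with a short sketch. It applies N_s = F_s + C_l x L and N_e = F_s + C_r x L symbol by symbol, using exact fractions to sidestep the finite-precision problem the slides mention. The names `RANGES`, `encode_interval` and `decode` are assumptions made for this example, and the decoder is told the message length instead of reading a terminator symbol.

```python
from fractions import Fraction

# Symbol ranges from the slide: a=[0,0.1), b=[0.1,0.5), c=[0.5,0.7), d=[0.7,1.0)
RANGES = {
    "a": (Fraction(0), Fraction(1, 10)),
    "b": (Fraction(1, 10), Fraction(5, 10)),
    "c": (Fraction(5, 10), Fraction(7, 10)),
    "d": (Fraction(7, 10), Fraction(1)),
}


def encode_interval(message, ranges=RANGES):
    """Return the final [start, end) interval that represents the message."""
    start, length = Fraction(0), Fraction(1)
    for sym in message:
        c_left, c_right = ranges[sym]
        new_start = start + c_left * length   # N_s = F_s + C_l * L
        new_end = start + c_right * length    # N_e = F_s + C_r * L
        start, length = new_start, new_end - new_start
    return start, start + length


def decode(value, n_symbols, ranges=RANGES):
    """Recover n_symbols symbols from any value inside the final interval."""
    out = []
    for _ in range(n_symbols):
        for sym, (c_left, c_right) in ranges.items():
            if c_left <= value < c_right:
                out.append(sym)
                # Rescale the value back to [0, 1) relative to this symbol's range.
                value = (value - c_left) / (c_right - c_left)
                break
    return "".join(out)


low, high = encode_interval("cadacdb")
print(float(low), float(high))
print(decode(low, 7))
```

Any number inside the final interval identifies the message, so a practical coder would transmit the shortest binary fraction that falls inside it.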