CCF Young Computer Scientists & Engineers Forum
9:00 Registration
9:30 Talks begin
Invited speaker: Dr. Bruce Denby, Professor, Pierre and Marie Curie University (Paris 6), France
Title: The OUISPER Project: How to Talk without Speaking
The talk introduces the concept of the Silent Speech Interface (SSI) for medical and telecommunications applications, surveys technologies for implementing one, and presents the latest results on using medical ultrasound for an SSI.
Professor Bruce Denby received a BS degree from the California Institute of Technology, an MS from Rutgers University, and a PhD from the University of California at Santa Barbara, all in physics. Since 1995 he has been a full professor of electronics, signal processing, and telecommunications in France, most recently at Pierre and Marie Curie University in Paris. His current interests are in radiocommunication and speech processing.
Invited Talk 2 (10:45-11:45)
Invited speaker: Dr. Frank K. Soong, Principal Researcher and Research Manager, Speech Group, Microsoft Research Asia
Title: High Quality Speech Synthesis and Talking Head for Language Learning and Cross-lingual Voice Conversion
A universal algorithm called HMM Trajectory Tiling (HTT) is presented for synthesizing high-quality speech and a photo-realistic talking head, and its applications in foreign language learning and cross-lingual voice conversion will be covered. A database is first collected to train a statistical, generative Hidden Markov Model (HMM), whose parameters are estimated from that data. The trained HMMs can then predict acoustic or articulator trajectories for input text. The HMM-predicted trajectory in turn guides sample-based synthesis, where the samples are speech segments or articulator images. Appropriate samples (the N nearest neighbors) in the original database are selected to form a “sausage” network, and the Viterbi algorithm searches this network for the optimal path, which yields the final high-quality speech or articulator-movement output. Different training criteria are compared, and demos of natural user interfaces will be presented. The HTT algorithm is also used to train a cross-lingual voice conversion TTS system in a target language, using speech/articulator data from a monolingual speaker of a source language. The talking head and TTS have been deployed successfully as a dynamic dictionary that helps learners of English as a Second Language (ESL); the website is accessed by more than one million users per day.
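The abstract does not specify HTT's cost functions, so the following is only a minimal sketch of the "sausage network + Viterbi" idea under stated assumptions: each database sample (speech segment or image) is summarized by one feature vector, the target cost is the Euclidean distance to the HMM-predicted vector for that slot, and the concatenation cost is the distance between consecutive chosen samples. HMM training and trajectory prediction are not modeled; the predicted targets are given directly. The function names `build_sausage`, `viterbi_tiling` and the parameter `concat_weight` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_sausage(targets: List[Vector], database: List[Vector], n: int) -> List[List[Vector]]:
    """For each predicted target (one slot), keep its n nearest database samples."""
    return [sorted(database, key=lambda s: distance(s, t))[:n] for t in targets]


def viterbi_tiling(targets: List[Vector], lattice: List[List[Vector]],
                   concat_weight: float = 1.0) -> Tuple[List[Vector], float]:
    """Pick one candidate per slot, minimizing target cost + weighted concatenation cost."""
    # Cumulative cost of the best path ending at each candidate of the current slot.
    prev_cost = [distance(c, targets[0]) for c in lattice[0]]
    backpointers: List[List[int]] = []
    for i in range(1, len(targets)):
        cur_cost, cur_back = [], []
        for cand in lattice[i]:
            # Path cost so far plus the (weighted) jump from each previous candidate.
            scores = [prev_cost[k] + concat_weight * distance(prev, cand)
                      for k, prev in enumerate(lattice[i - 1])]
            best_k = min(range(len(scores)), key=scores.__getitem__)
            cur_cost.append(scores[best_k] + distance(cand, targets[i]))
            cur_back.append(best_k)
        backpointers.append(cur_back)
        prev_cost = cur_cost
    # Trace back from the cheapest candidate in the final slot.
    j = min(range(len(prev_cost)), key=prev_cost.__getitem__)
    best_cost = prev_cost[j]
    indices = [j]
    for back in reversed(backpointers):
        j = back[j]
        indices.append(j)
    indices.reverse()
    return [lattice[i][k] for i, k in enumerate(indices)], best_cost


# Toy 1-D example: predicted targets and a small sample database.
targets = [(0.0,), (1.0,), (2.0,)]
database = [(0.1,), (0.9,), (2.2,), (1.5,), (-0.5,), (3.0,)]
path, cost = viterbi_tiling(targets, build_sausage(targets, database, n=2))
print(path)  # → [(0.1,), (0.9,), (1.5,)]
```

In this toy run the last slot picks `(1.5,)` rather than `(2.2,)`, even though `(2.2,)` is closer to the target `2.0`: the concatenation term favors the smoother transition from `(0.9,)`. Raising or lowering `concat_weight` trades trajectory fidelity against smoothness.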
Frank Soong is a Principal Researcher and Manager of the Speech Group at Microsoft Research Asia (MSRA), Beijing, China, where he works on fundamental speech research and its practical applications. His professional research career spans over 30 years, first at Bell Labs in the US and then at ATR in Japan, before he joined MSRA in 2004. At Bell Labs, he was responsible for developing speech recognition algorithms and for the voice-activated dialing mobile phone product that Mobile Office Magazine (Apr. 1993) rated “outstandingly the best”. He is a co-recipient of the Bell Labs President Gold Award for developing the Bell Labs Automatic Speech Recognition (BLASR) software package.
He is an IEEE Fellow and serves on the Speech and Language Technical Committee of the IEEE Signal Processing Society and in other society roles, including Associate Editor of the IEEE Transactions on Speech and Audio Processing and chair of an IEEE international workshop.
He has published more than 200 papers and co-edited a widely used reference book, Automatic Speech and Speaker Recognition: Advanced Topics (Kluwer, 1996).
He is co-director of the MSRA-CUHK Joint Research Lab at the Chinese University of Hong Kong and a visiting professor at Tianjin University, CUHK, Zhejiang University, and several other top universities in China. He received his BS, MS, and PhD from National Taiwan University, the University of Rhode Island, and Stanford University, respectively, all in electrical engineering.