Programme

The workshop is now over. Videos and slides for the talks and keynotes are available through the links in the schedule below. There is also a YouTube Playlist for all talks.

11:00	Welcome
11:10	Oral session 1
12:35	Break
12:45	Keynote 1: Leibny Paola Garcia Perera (Johns Hopkins University)
13:45	Break
13:55	Oral session 2
15:20	Break
15:30	Keynote 2: Dong Yu (Tencent AI Lab)
16:30	Break
16:40	Oral session 3
18:45	Closing

Oral Session 1

11:10	Overview of the 6th CHiME Challenge [YouTube] [Slides] Shinji Watanabe¹, Michael Mandel², Jon Barker³, Emmanuel Vincent⁴ (¹Center for Language and Speech Processing, Johns Hopkins University; ²Brooklyn College, City University of New York; ³University of Sheffield, UK; ⁴Inria, France)
11:35	The IOA Systems for CHiME-6 Challenge [Paper] [YouTube] [Slides] Hangting Chen¹,², Pengyuan Zhang¹,², Qian Shi¹,², Zuozhen Liu¹,² (¹Key Laboratory of Speech Acoustics & Content Understanding, Institute of Acoustics, CAS, China; ²University of Chinese Academy of Sciences, Beijing, China)
11:55	The OPPO System for CHiME-6 Challenge [Paper] [YouTube] [Slides] Xiaoming Ren, Huifeng Zhu, Liuwei Wei, Linju Yang, Ming Yu, Chenxing Li, Dong Wei, Jie Hao (Beijing OPPO telecommunications corp., ltd., Beijing, China)
12:15	The Qdreamer Systems for CHiME-6 Challenge [Paper] [YouTube] [Slides] Haoyuan Tang¹, Huanliang Wang¹, Jiajun Wang¹, Li Zhang¹, JiaBin Xue², Zhi Li¹ (¹Qdreamer Research, Suzhou, JiangSu, P.R. China; ²School of Computer Science and Technology, Harbin Institute of Technology, Harbin, P.R. China)

Oral Session 2

13:55	The USTC-NELSLIP Systems for CHiME-6 Challenge [Paper] [YouTube] [Slides] Jun Du¹, Yan-Hui Tu¹, Lei Sun¹, Li Chai¹, Xin Tang¹, Mao-Kui He¹, Feng Ma¹, Jia Pan¹, Jian-Qing Gao¹, Dan Liu¹, Chin-Hui Lee², Jing-Dong Chen³ (¹University of Science and Technology of China, Hefei, Anhui, P. R. China; ²Georgia Institute of Technology, Atlanta, Georgia, USA; ³Northwestern Polytechnical University, Shanxi, P. R. China)
14:20	The CW-XMU System For CHiME-6 Challenge [Paper] [YouTube] [Slides] Xuerui Yang¹, Yongyu Gao¹, Shi Qiu¹, Song Li², Qingyang Hong², Xuesong Liu¹, Lin Li², Dexin Liao², Hao Lu², Feng Tong², Qiuhan Guo², Huixiang Huang², Jiwei Li¹ (¹CloudWalk Technology Co., Ltd.; ²Xiamen University)
14:40	The Academia Sinica Systems of Speech Recognition and Speaker Diarization for the CHiME-6 Challenge [Paper] [YouTube] [Slides] Hung-Shin Lee¹, Yu-Huai Peng¹, Pin-Tuan Huang¹, Ying-Chun Tseng², Chia-Hua Wu¹, Yu Tsao², Hsin-Min Wang¹ (¹Institute of Information Science, Academia Sinica, Taiwan; ²Research Center for Information Technology Innovation, Academia Sinica, Taiwan)
14:55	LEAP Submission to CHiME-6 ASR Challenge [Abstract] [YouTube] [Slides] Anirudh Sreeram, Anurenjan Purushothaman, Rohit Kumar, Sriram Ganapathy (Learning and Extraction of Acoustic Patterns (LEAP) lab, Indian Institute of Science, Bangalore, 560012)

Oral Session 3

16:40	The STC System for the CHiME-6 Challenge [Paper] [YouTube] [Slides] Ivan Medennikov¹,², Maxim Korenevsky¹, Tatiana Prisyach¹, Yuri Khokhlov¹, Mariya Korenevskaya¹, Ivan Sorokin¹, Tatiana Timofeeva¹, Anton Mitrofanov¹, Andrei Andrusenko¹,², Ivan Podluzhny¹, Aleksandr Laptev¹,², Aleksei Romanenko¹,² (¹STC-innovations Ltd; ²ITMO University, Saint Petersburg, Russia)
17:05	Towards a speaker diarization system for the CHiME 2020 dinner party transcription [Paper] [YouTube] [Slides] Christoph Boeddeker¹, Tobias Cord-Landwehr¹, Jens Heitkaemper¹, Cătălin Zorilă², Daichi Hayakawa³, Mohan Li², Min Liu⁴, Rama Doddipatla², Reinhold Haeb-Umbach¹ (¹Paderborn University, Department of Communications Engineering, Paderborn, Germany; ²Toshiba Cambridge Research Laboratory, Cambridge, United Kingdom; ³Toshiba Corporation Corporate R&D Center, Kawasaki, Japan; ⁴Toshiba China R&D Center, Beijing, China)
17:25	The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge [Paper] [YouTube] [Slides] Ashish Arora, Desh Raj, Aswin Shanmugam Subramanian, Ke Li, Bar Ben-Yair, Matthew Maciejewski, Piotr Zelasko, Paola Garcia, Shinji Watanabe, Sanjeev Khudanpur (Center for Language and Speech Processing, Johns Hopkins University)
17:45	CUNY Speech Diarization System for the CHiME-6 Challenge [Abstract] [YouTube] [Slides] Zhaoheng Ni¹, Michael I Mandel² (¹The Graduate Center, City University of New York; ²Brooklyn College, City University of New York)
18:05	BUT System for CHiME-6 Challenge [Paper] [YouTube] [Slides] Katerina Zmolikova, Martin Kocour, Federico Landini, Karel Beneš, Martin Karafiát, Hari Krishna Vydana, Alicia Lozano-Diez, Oldřich Plchot, Murali Karthick Baskar, Ján Svec, Ladislav Mošner, Vladimir Malenovský, Lukáš Burget, Bolaji Yusuf, Ondřej Novotný, František Grézl, Igor Szöke, Jan “Honza” Černocký (Brno University of Technology, Faculty of Information Technology, IT4I Centre of Excellence, Czechia)
18:25	Toshiba’s Speech Recognition System for the CHiME 2020 Challenge [Paper] [YouTube] [Slides] Cătălin Zorilă¹, Mohan Li¹, Daichi Hayakawa², Min Liu³, Ning Ding² and Rama Doddipatla¹ (¹Toshiba Cambridge Research Laboratory, Cambridge, United Kingdom; ²Toshiba Corporation Corporate R&D Center, Kawasaki, Japan; ³Toshiba China R&D Center, Beijing, China)

Keynote Talks

Dr. Leibny Paola Garcia Perera
Johns Hopkins University

Diarization, the missing link in Speech Technologies

[YouTube] [Slides]

Abstract

The amount of unlabeled speech data that is available enormously outweighs the labeled data, and there is great potential in using this data to improve the performance of current speech recognition systems and related technologies. A primary goal of research in this domain is to automatically compute labels for the unlabeled data with an acceptable level of accuracy for downstream applications. One such task is to answer the question "who spoke when" in a recording, identifying regions containing speech and assigning speaker identity labels to each utterance. This labeling, called speaker diarization, is not typically the final task for applications, but is often a missing link in a pipeline that can boost the performance of automatic speech recognition and speaker and language identification systems. In this talk, I will guide you through a journey of this missing link. We will start with a brief discussion of the key components of state-of-the-art systems: the use of a voice activity detector, speaker embeddings, and scoring and clustering techniques. Next, we will demonstrate the aspects in which current systems fail and propose new alternatives to attain better performance. We will address overlap detection, resegmentation, and speaker turn detection, among others. In addition, we will give some insights into the newest solutions, such as end-to-end approaches. Then, we will go beyond diarization and explore the positive impact of including a diarization stage in speech and speaker recognition systems. Finally, we will discuss the influence of diarization in other fields such as cognitive science and linguistics.

Bio

Dr. Leibny Paola Garcia Perera (PhD 2014, University of Zaragoza, Spain) joined Johns Hopkins University after extensive research experience in academia and industry, including highly regarded laboratories at Agnitio and Nuance Communications. She led a team of 20+ researchers from four of the best laboratories worldwide in far-field speech diarization and speaker recognition, under the auspices of the JHU summer workshop 2019 in Montreal, Canada. She was also a researcher at Tec de Monterrey, Campus Monterrey, Mexico for 10 years. She was a Marie Curie researcher for the Iris project during 2015, exploring assistive technology for children with autism in Zaragoza, Spain. She was a visiting scholar at Georgia Institute of Technology (2009) and Carnegie Mellon (2011). Recently, she has been working on children’s speech, including child speech recognition and diarization in day-long recordings. She is also part of the JHU CHiME-5, CHiME-6, SRE18 and SRE19 teams. Her interests include diarization, speech recognition, speaker recognition, machine learning and language processing.

Dr. Dong Yu
Distinguished Scientist, Tencent AI Lab

Solving Cocktail Party Problem – From Single Modality to Multi-Modality

[YouTube] [Slides]

Abstract

The cocktail party problem is one of the difficult problems that must still be solved to enable high-accuracy speech recognition in everyday environments. In this talk, I will introduce our recent attempts to attack this problem, with a focus on multi-channel multi-modal approaches.

Bio

Dr. Dong Yu, IEEE Fellow, is a distinguished scientist and vice general manager at Tencent AI Lab. Prior to joining Tencent in 2017, he was a principal researcher in the Microsoft speech and dialog research group. His research, which focuses on statistical speech recognition and processing, has been recognized by the prestigious IEEE Signal Processing Society 2013 and 2016 best paper awards and has been widely cited.

Dr. Dong Yu is currently serving as the vice chair of the IEEE Speech and Language Processing Technical Committee (SLPTC). He has served as a member of the IEEE SLPTC (2013-2018), a distinguished lecturer of APSIPA (2017-2018), an associate editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing (2011-2015), an associate editor of the IEEE Signal Processing Magazine (2008-2011), and a member of the organizing and technical committees of many conferences and workshops.
