Junhyeok Lee (from JHU), "Differentiable Phase Augmentation for Speech Synthesis"
Abstract
One of the most important goals of generative models is achieving a one-to-many mapping. In audio signals, phase is a key component that contributes to this one-to-many mapping, particularly in vocoder tasks. However, current audio generative models often overlook this aspect. In this seminar, we will discuss the application of domain knowledge from digital signal processing to deep generative modeling, including concepts such as the Fourier transform, Nyquist-Shannon sampling, the Nyquist frequency, and quantization.
Bio
Junhyeok Lee is a Ph.D. student at Johns Hopkins University’s Center for Language and Speech Processing, under the supervision of Professor Najim Dehak. His research focuses on generative modeling for speech, particularly text-to-speech (TTS) and voice conversion (VC). He graduated from the Korea Science Academy (KSA) in 2014 and earned both his Bachelor’s and Master’s degrees in Mechanical Engineering from KAIST in 2018 and 2020, respectively. Junhyeok has industry experience as the Chief AI Scientist at maum.ai (formerly MINDsLab) and as a Research Scientist at Supertone Inc. of HYBE Corp., where he researched and developed AI-driven speech synthesis.
Zoom
https://stanford.zoom.us/j/7733389381?pwd=Z2xkVUEvNTA3dTRSclBHUlNRZGFZdz09
Meeting ID: 773 338 9381
Passcode: 601462