Audio Understanding and Room Acoustics in the Era of AI
Date:
Fri, 10/14/2022 - 3:30pm - 4:20pm
Location:
CCRMA Classroom [Knoll 217]
Event Type:
DSP Seminar

In the second part of the talk, we will turn to neural architectures without any convolution or recurrence and discuss how Transformer architectures have revolutionized machine listening. We will show how classic signal processing ideas, such as wavelets, can be combined with powerful Transformer architectures to get significant gains that neither could achieve individually. We will also discuss learning time-frequency representations that differ from classic Fourier representations.
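As a rough illustration of the kind of hybrid alluded to above, here is a minimal sketch in PyTorch of a learnable filterbank front end (standing in for a fixed Fourier or wavelet analysis stage) feeding a Transformer encoder. All module names and hyperparameters are hypothetical and are not the speaker's actual architecture.

```python
# Illustrative sketch only, not the talk's model: a learned
# time-frequency front end followed by a Transformer encoder.
import torch
import torch.nn as nn

class AudioTransformer(nn.Module):
    def __init__(self, n_filters=64, n_heads=4, n_layers=4, n_classes=10):
        super().__init__()
        # Learnable filterbank: a strided 1-D convolution over raw audio,
        # replacing a fixed spectrogram/wavelet analysis.
        self.frontend = nn.Conv1d(1, n_filters, kernel_size=400, stride=160)
        layer = nn.TransformerEncoderLayer(d_model=n_filters, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(n_filters, n_classes)

    def forward(self, wav):                      # wav: (batch, samples)
        feats = self.frontend(wav.unsqueeze(1))  # (batch, filters, frames)
        feats = feats.transpose(1, 2)            # (batch, frames, filters)
        z = self.encoder(feats)                  # self-attention over frames
        return self.head(z.mean(dim=1))          # pool frames, classify

model = AudioTransformer()
logits = model(torch.randn(2, 16000))  # two 1-second clips at 16 kHz
```

Positional encodings are omitted here for brevity; a real system would add them before the encoder.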
Finally, if time permits, we will go back in time and explore how one could build state-of-the-art architectures without access to the tools at our disposal today. Can we still do machine listening without advancements like attention, Transformers, convolutions, and recurrence? We show that one can still do a decent job with simple neural architectures developed back in 2006 and simple counting statistics, beating all previous architectures even as late as 2019! :)
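For flavor, here is an equally hypothetical sketch of that simple end of the spectrum: fixed summary (counting-style) statistics computed over spectrogram frames, fed to a plain feed-forward network. Nothing here is taken from the actual 2006 systems; it only illustrates the general recipe.

```python
# Illustrative sketch only: collapse a spectrogram into fixed-size
# summary statistics, then classify with a small feed-forward net.
import torch
import torch.nn as nn

def clip_statistics(spec):
    """spec: (batch, frames, freq_bins) log-spectrogram.
    Collapse the time axis into per-bin mean and standard deviation."""
    return torch.cat([spec.mean(dim=1), spec.std(dim=1)], dim=-1)

mlp = nn.Sequential(        # plain feed-forward network
    nn.Linear(2 * 128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),     # e.g. 10 sound classes
)

spec = torch.randn(4, 100, 128)      # 4 clips, 100 frames, 128 bins
logits = mlp(clip_statistics(spec))  # (4, 10)
```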
The talk will be self-contained. Having taken a first course in audio/signal processing (knowing what spectrograms are) and knowing a small amount of machine learning (what feed-forward neural networks are) should be sufficient. That said, the talk will cover novel, state-of-the-art concepts and methods, and there will be something even for experienced graduate researchers and faculty from the music/NLP/signal processing/audio/AI fields.
This work was done with Chris Chafe and Jonathan Berger at CCRMA. In addition, we thank the Stanford Institute for Human-Centered AI (Stanford HAI) for supporting this work.
Bio: Prateek Verma is currently a researcher at Stanford working at the intersection of music/signal processing/audio/optimization/neural architectures. He got his Master's degree from Stanford, and before that, he was at IIT Bombay. He loves biking, hiking, and playing sports.
FREE
Open to the Public