AI-based Digital Synthesizer Preset Programming: Parameter Estimation for Sound Matching
Presenter: Soohyun Kim
Abstract:
In this seminar, we will start by reviewing the historical background of digital synthesizer parameter estimation, from FM synthesis (Chowning, 1973; Schottstaedt, 1977; Chowning, 1980) to discrete Hilbert transform analysis (Justice, 1979), time-variant spectral centroid tracking (Beauchamp, 1982), and genetic algorithms (Horner et al., 1993).
Parameter estimation for digital synthesizers, also known as the "sound matching" problem, aims to determine the parameter values that allow the synthesizer's output to best imitate a given target sound, especially a real-world sound (e.g., strings, brass, woodwinds, drums, or the human voice).
The recent advent of DDSP (Differentiable Digital Signal Processing; Engel et al., 2020) in this AI era has brought renewed attention to the digital synthesizer parameter estimation problem. DDSP integrates neural networks into traditional DSP systems such as digital synthesizers, so that the networks, once trained in a data-driven manner, can serve as parameter estimators for the DSP system.
We will review the most recent papers applying DDSP to digital synthesizer parameter estimation (Caspe et al., 2022; Masuda and Saito, 2023; Ye et al., 2023).
Finally, we will have a discussion session on future research: how we can improve the parameter estimation of these DDSP neural networks, and how we can use them (inversely!) as timbre controllers to explore new musical expressions on digital synthesizers.
References:
[1] Chowning, John M. "The synthesis of complex audio spectra by means of frequency modulation." Journal of the Audio Engineering Society 21.7 (1973): 526-534.
[2] Schottstaedt, Bill. "The simulation of natural instrument tones using frequency modulation with a complex modulating wave." Computer Music Journal (1977): 46-50.
[3] Chowning, John M. "Computer Synthesis of the Singing Voice." In J. Sundberg, ed. Sound Generation in Winds, Strings, Computers. Stockholm: Royal Swedish Academy of Music (1980).
[4] Justice, James. "Analytic signal processing in music computation." IEEE Transactions on Acoustics, Speech, and Signal Processing 27.6 (1979): 670-684.
[5] Beauchamp, James W. "Synthesis by spectral amplitude and 'brightness' matching of analyzed musical instrument tones." Journal of the Audio Engineering Society 30.6 (1982): 396-406.
[6] Horner, Andrew, James Beauchamp, and Lippold Haken. "Machine tongues XVI: Genetic algorithms and their application to FM matching synthesis." Computer Music Journal 17.4 (1993): 17-29.
[7] Yee-King, Matthew John, Leon Fedden, and Mark d'Inverno. "Automatic programming of VST sound synthesizers using deep networks and other techniques." IEEE Transactions on Emerging Topics in Computational Intelligence 2.2 (2018): 150-159.
[8] Barkan, Oren, et al. "InverSynth: Deep estimation of synthesizer parameter configurations from audio signals." IEEE/ACM Transactions on Audio, Speech, and Language Processing 27.12 (2019): 2385-2396.
[9] Engel, Jesse, et al. "DDSP: Differentiable digital signal processing." International Conference on Learning Representations (2020).
[10] Turian, Joseph, and Max Henry. "I'm sorry for your loss: Spectrally-based audio distances are bad at pitch." arXiv preprint arXiv:2012.04572 (2020).
[11] Schwär, Simon, and Meinard Müller. "Multi-Scale Spectral Loss Revisited." IEEE Signal Processing Letters 30 (2023): 1712-1716.
[12] Caspe, Franco, Andrew McPherson, and Mark Sandler. "DDX7: Differentiable FM synthesis of musical instrument sounds." arXiv preprint arXiv:2208.06169 (2022).
[13] Masuda, Naotake, and Daisuke Saito. "Improving semi-supervised differentiable synthesizer sound matching for practical applications." IEEE/ACM Transactions on Audio, Speech, and Language Processing 31 (2023): 863-875.
[14] Ye, Zhen, et al. "NAS-FM: neural architecture search for tunable and interpretable sound synthesis based on frequency modulation." arXiv preprint arXiv:2305.12868 (2023).
Bio:
Soohyun Kim is an incoming PhD student at CCRMA, Stanford University, advised by Prof. Chris Chafe. He is also completing his master's degree at CCRMA, advised by Prof. Julius Smith. His primary research interests lie in (1) creative sound synthesis through neural networks and (2) human-AI interaction design for new music performance.
He holds a bachelor's degree in electrical engineering with a minor in physics from KAIST, where he conducted research on neural sound synthesis with Prof. Juhan Nam (CCRMA PhD 2013).
In addition to his academic pursuits, Soohyun is a music producer and recording/mixing engineer trained in South Korea, having participated in multiple popular music production projects. As a musician, he is a guitarist and singer.
Zoom:
https://stanford.zoom.us/j/7733389381?pwd=Z2xkVUEvNTA3dTRSclBHUlNRZGFZdz09
Meeting ID: 773 338 9381
Passcode: 601462