Talks & slides
Here are some slides.
In most cases, I cannot share audio samples directly in the PDF files. Some samples can be found through the links in the PDFs.
Tutorial on neural statistical parametric speech synthesis (recent sequence-to-sequence TTS models). 2020 Oct, for Odyssey 2020. PDF and PPT slides are available. Audio samples are collected from the official websites of the referenced papers or from open-domain data repositories.
Neural vocoders for speech and music signals. 2020 Nov, invited talk at YAMAHA, with Prof. Yamagishi.
Neural auto-regressive, source-filter, and glottal vocoders for speech and music signals. 2020 Jul, tutorial at ISCA 2020 Speech Processing Courses in Crete, with Prof. Yamagishi. Hands-on materials are available on GitHub.
Neural waveform models for text-to-speech synthesis. 2019 Sep, Fraunhofer IIS, invited talk, Erlangen, Germany. Slides are here [1].
Tutorial on recent neural waveform models. 2019 Jan, IEICE Technical Committee on Speech (SP), invited tutorial, Kanazawa, Japan. Slides are here [2].
Autoregressive neural networks for parametric speech synthesis. 2018 Jan, Nagoya Institute of Technology, Tokuda lab; 2018 Jun, Aalto University, Paavo Alku lab. Slides are here [3].
NSF model (2nd ver.): SSW 2019 slides for paper Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis.
NSF model (1st ver.): ICASSP 2019 slides for paper Neural Source-Filter-Based Waveform Model for Statistical Parametric Speech Synthesis.
Speech synthesis comparison: ICASSP 2018 slides for paper A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis.
Deep AR F0 model: Interspeech 2017 slides for paper An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis.
Shallow AR model: ICASSP 2017 slides for paper An Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis.
Speech synthesis: SSW 2016 slides for paper A Comparative Study of the Performance of HMM, DNN, and RNN Based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora.
Prosody embedding: Interspeech 2016 slides for paper Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System.
HMM-based singing voice synthesis: ICASSP 2016 slides for paper A Full Training Framework of Cross-Stream Dependence Modelling for HMM-Based Singing Voice Synthesis.