Talk & slides

Here are some slides.

In most cases, I cannot share audio samples directly through a PDF. Some samples can be found through the links in the PDF.


Advancement in Neural Vocoders. 2021 July, tutorial at the ISCA 2021 Speech Processing Courses in Crete, with Prof. Yamagishi. Hands-on materials are on GitHub. Slides are here.

Tutorial on neural statistical parametric speech synthesis (recent sequence-to-sequence TTS models). 2020 Oct, for Odyssey 2020. PDF and PPT slides are available. Audio samples are collected from the reference papers' official websites or from open-domain data repositories.

Neural vocoders for speech and music signals. 2020 Nov, invited talk at YAMAHA, with Prof. Yamagishi.

Neural auto-regressive, source-filter and glottal vocoders for speech and music signals. 2020 Jul, tutorial at the ISCA 2020 Speech Processing Courses in Crete, with Prof. Yamagishi. Hands-on materials are on GitHub.

Neural waveform models for text-to-speech synthesis. 2019 Sep, invited talk at Fraunhofer IIS, Erlangen, Germany. Slides are here.

Tutorial on recent neural waveform models. 2019 Jan, invited tutorial for the IEICE Technical Committee on Speech (SP), Kanazawa, Japan. Slides are here.

Autoregressive neural networks for parametric speech synthesis. 2018 Jan, Nagoya Institute of Technology, Tokuda lab, and 2018 Jun, Aalto University, Paavo Alku lab. Slides are here.

Conference presentations

Anti-spoofing: Interspeech 2021 presentation for the comparative study on ASVspoof 2019 LA, PPT. Code is available at the git repo, project/03-asvspoof-mega.

NSF model (latest ver.): Interspeech 2020 presentation for cyclic-noise-NSF. PPT and PDF slides are available. Natural samples are from CMU ARCTIC.

NSF model (2nd ver.): SSW 2019 for paper Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis

NSF model (1st ver.): ICASSP 2019 for paper Neural Source-Filter-Based Waveform Model for Statistical Parametric Speech Synthesis

Speech synthesis comparison: ICASSP 2018 for paper A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis

Deep AR F0 model: Interspeech 2017 slide for paper An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis.

Shallow AR model: ICASSP 2017 slide for paper An Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis.

Speech synthesis: SSW 2016 slide for paper A Comparative Study of the Performance of HMM, DNN, and RNN Based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora.

Prosody embedding: Interspeech 2016 slide for paper Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System.

HMM-based speech synthesis: ICASSP 2016 slide. For paper A Full Training Framework of Cross-Stream Dependence Modelling for HMM-Based Singing Voice Synthesis.


On the CURRENNT toolkit. These slides were made a long time ago during weekends, so they may be sloppy :)

CURRENNT WaveNet is also explained in another set of slides with more figures.