Talk & slides¶
In most cases, I cannot directly share audio samples. Some samples can be found through the link in the PDF.
Talk¶
SEP-24-2024¶
APSIPA China-Japan Joint Symposium: Introduction to the research of NII Yamagishi Lab
SEP-2024¶
Interspeech 2024 presentation: Revisiting score fusion for spoofing-aware speaker verification
Interspeech 2024 survey talk: Current trend in speech privacy and security
Slides (privacy part): PDF
APR-2024¶
ICASSP 2024 presentation: Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?
Uses large-scale spoofed data to update the SSL front end of a speech anti-spoofing model.
NOV-2023¶
VoicePersonae workshop talk 2: Harnessing data to improve speech spoofing countermeasures
A high-level summary of using vocoded data to train speech anti-spoofing models.
Slides can be downloaded from Dropbox.
VoicePersonae workshop talk 1: DNN+DSP waveform model
An overview talk given at the VoicePersonae workshop. The title is From DSP and DNN to DNN/DSP: Neural speech waveform models and its applications in speech and music audio waveform modelling.
Slides can be downloaded from Dropbox.
OCT-2023¶
Shonan Seminar: casual presentation
During the No. 182 Shonan Seminar (https://shonan.nii.ac.jp/seminars/182/), I had the chance to introduce voice privacy.
Slides are available on Dropbox.
AUG-2023¶
Interspeech Tutorial: anti-spoofing
Interspeech 2023 tutorial Advances in audio anti-spoofing and deepfake detection using graph neural networks and self-supervised learning.
Slides and notebook are available on github.
MAR-2023¶
SPSC Webinar: using vocoders to create spoofed data for speech spoofing countermeasures
This webinar was for the ICASSP 2023 paper “Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders”.
SEP-2022¶
SPSC Symposium: tutorial on speaker anonymization (software part)
This short tutorial shows the basic process of speaker anonymization, using baselines in Voice Privacy Challenge 2022.
The hands-on notebook is available on Google Colab.
MAY-2022¶
ICASSP 2022 short course: neural vocoder
This talk briefly summarizes a few representative neural vocoders. For a more detailed talk, please check the slides for Advancement in Neural Vocoders.
The hands-on materials used for this short course cover several recent neural vocoders. They include step-by-step instructions on implementation, demonstrations with pre-trained models, and detailed explanations of some common DSP and deep learning techniques. Please check Google Colab.
DEC-2021¶
Two Speech Security Issues after Speech Synthesis Boom
This talk briefly introduces anti-spoofing (audio deepfake detection) and voice privacy. It is mainly for newcomers to these fields.
The slides can be found on Dropbox (PPTX, PDF).
OCT-2021¶
DeepFake: high-tech illusions to deceive human brains
This is a talk given at the JST Science Agora with Dr. Erica Cooper.
It is an introduction to anti-spoofing (audio deepfake detection).
JUL-2021¶
Advancement in Neural Vocoders
This is the tutorial on neural vocoders, at ISCA 2021 Speech Processing Courses in Crete, with Prof. Yamagishi.
It was a very long tutorial (>3 hours). Slides are on SlideShare (I only own part of them).
The hands-on materials were re-edited and uploaded to Google Colab. See ICASSP 2022 short course: neural vocoder.
DEC-2020¶
Tutorial on Neural statistical parametric speech synthesis
This is a tutorial on text-to-speech synthesis, given at ISCA Speaker Odyssey 2020.
It is mainly on sequence-to-sequence TTS acoustic models (both soft- and hard-attention based approaches), but it also covers some basic ideas from the classical HMM-based approaches.
PDF and PPT slides are available.
The video is on YouTube.
There are many audio samples collected from the reference papers’ official websites or from open-domain data repositories.
NOV-2020¶
Neural vocoders for speech and music signals
This is an invited talk given at YAMAHA, with Prof. Yamagishi.
Nothing can be disclosed.
JUL-2020¶
Neural auto-regressive, source-filter and glottal vocoders for speech and music signals
This is the early version of the tutorial on neural vocoders, given at ISCA 2020 Speech Processing Courses in Crete, with Prof. Yamagishi.
The hands-on materials were re-edited and uploaded to Google Colab. See ICASSP 2022 short course: neural vocoder.
SEP-2019¶
Neural waveform models for text-to-speech synthesis
Invited talk given at Fraunhofer IIS, Erlangen, Germany.
This is about the neural source-filter vocoders and related experiments done by 2019.
Slides are here 1
JAN-2019¶
Tutorial on recent neural waveform models
This is a talk on neural vocoders, but the contents and explanations reflect my knowledge at the time. It is out of date. Please check the tutorials above for my latest understanding.
IEICE Technical Committee on Speech (SP), invited tutorial, Kanazawa, Japan. Slides are here 2
JAN-2018¶
Autoregressive neural networks for parametric speech synthesis
This is a talk on the previous-generation TTS system. It talks about autoregressive models for F0 prediction.
It was given at the Nagoya Institute of Technology (Tokuda lab) and Aalto University (Paavo Alku lab). Slides are here 3
Conference presentation¶
Anti-spoofing: Interspeech 2021 presentation for Comparative study on ASVspoof 2019 LA, PPT. Code is available in the git repo project/03-asvspoof-mega
NSF model (latest ver.): Interspeech 2020 presentation for cyclic-noise-NSF – PPT and PDF slides. Natural samples are from CMU-arctic
NSF model (2nd ver.): SSW 2019 for paper Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis
NSF model (1st ver.): ICASSP 2019 for paper Neural Source-Filter-Based Waveform Model for Statistical Parametric Speech Synthesis
Speech synthesis comparison: ICASSP 2018 for paper A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis
Deep AR F0 model: Interspeech 2017 slide for paper An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis.
Shallow AR model: ICASSP 2017 slide for paper An Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis.
Speech synthesis: SSW 2016 slide for paper A Comparative Study of the Performance of HMM, DNN, and RNN Based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora.
Prosody embedding: Interspeech 2016 slide for paper Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System.
HMM-based speech synthesis: ICASSP 2016 slide for paper A Full Training Framework of Cross-Stream Dependence Modelling for HMM-Based Singing Voice Synthesis.
MISC¶
On CURRENNT toolkit. These slides were made a long time ago during weekends, and they may be sloppy :)
CURRENNT basics
CURRENNT LSTM explanation
CURRENNT CNN implementation
CURRENNT mixture density network
CURRENNT WaveNet
CURRENNT WaveNet is also explained in another slide with more figures.