Talk & slides

In most cases, I cannot directly share audio samples. Some samples can be found through the links in the PDF slides.

Talk

APR-2024

ICASSP 2024 presentation: Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?

Using large-scale spoofed data to update the SSL front end of a speech anti-spoofing model.

NOV-2023

VoicePersonae workshop talk 2: Harnessing data to improve speech spoofing countermeasures

A high-level summary of how to use vocoded data to train speech anti-spoofing models.

Slides can be downloaded here (Dropbox).

VoicePersonae workshop talk 1: DNN+DSP waveform model

An overview talk given at the VoicePersonae workshop. The title is From DSP and DNN to DNN/DSP: Neural speech waveform models and its applications in speech and music audio waveform modelling.

Slides can be downloaded here (Dropbox).

OCT-2023

Shonan Seminar: casual presentation

During the No. 182 Shonan Seminar (https://shonan.nii.ac.jp/seminars/182/), I had the chance to introduce voice privacy.

Slides are available on Dropbox.

AUG-2023

Interspeech Tutorial: anti-spoofing

Interspeech 2023 tutorial: Advances in audio anti-spoofing and deepfake detection using graph neural networks and self-supervised learning.

Slides and notebook are available on GitHub.

MAR-2023

SPSC Webinar: using vocoders to create spoofed data for speech spoofing countermeasures

A webinar talk for the ICASSP 2023 paper “Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders”.

Slides in PDF and PPTX.

SEP-2022

SPSC Symposium: tutorial on speaker anonymization (software part)

This short tutorial shows the basic process of speaker anonymization, using the baselines of the Voice Privacy Challenge 2022.

The hands-on notebook is available on Google Colab.

MAY-2022

ICASSP 2022 short course: neural vocoder

This talk briefly summarizes a few representative neural vocoders. For a more detailed talk, please check the slides for Advancement in Neural Vocoders.

The hands-on materials used for this short course cover a few of the latest neural vocoders. There are step-by-step instructions on implementation, demonstrations with pre-trained models, and detailed explanations of some common DSP and deep learning techniques. Please check Google Colab.

DEC-2021

Two Speech Security Issues after Speech Synthesis Boom

This talk briefly introduces anti-spoofing (audio deepfake detection) and voice privacy. It is mainly for newcomers to these fields.

The slides can be found on Dropbox here: (PPTX), (PDF).

OCT-2021

DeepFake: high-tech illusions to deceive human brains

This is a talk given at JST Science Agora with Dr. Erica Cooper.

It is an introduction to anti-spoofing (audio deepfake detection).

Here is the part presented by me: Agora PDF and Agora PPT.

JUL-2021

Advancement in Neural Vocoders

This is a tutorial on neural vocoders, given at the ISCA 2021 Speech Processing Courses in Crete, with Prof. Yamagishi.

It was a very long tutorial (>3 hours). Slides are on SlideShare (I own only part of them).

The hands-on materials were re-edited and uploaded to Google Colab. See ICASSP 2022 short course: neural vocoder.

DEC-2020

Tutorial on Neural statistical parametric speech synthesis

This is a tutorial on text-to-speech synthesis, given at ISCA Speaker Odyssey 2020.

It is mainly on sequence-to-sequence TTS acoustic models (both soft- and hard-attention based approaches), but it also covers some basic ideas from the classical HMM-based approaches.

PDF and PPT slides are available.

The video is on YouTube.

There are many audio samples, collected from the official websites of the referenced papers or from open-domain data repositories.

NOV-2020

Neural vocoders for speech and music signals

This is an invited talk at YAMAHA, with Prof. Yamagishi.

The content cannot be disclosed.

JUL-2020

Neural auto-regressive, source-filter and glottal vocoders for speech and music signals

This is the early version of the tutorial on neural vocoders, given at the ISCA 2020 Speech Processing Courses in Crete, with Prof. Yamagishi.

The hands-on materials were re-edited and uploaded to Google Colab. See ICASSP 2022 short course: neural vocoder.

SEP-2019

Neural waveform models for text-to-speech synthesis

Invited talk given at Fraunhofer IIS, Erlangen, Germany.

This is about the neural source-filter vocoders and related experiments conducted up to 2019.

Slide is here 1

JAN-2019

Tutorial on recent neural waveform models

This is a talk on neural vocoders, but the contents and explanations are based on my knowledge at the time. It is out of date. Please check the tutorials above for my latest understanding.

IEICE Technical Committee on Speech (SP), invited tutorial, Kanazawa, Japan. Slide is here 2

JAN-2018

Autoregressive neural networks for parametric speech synthesis

This is a talk on the previous-generation TTS system. It covers autoregressive models for F0 prediction.

It was given at Nagoya Institute of Technology, Tokuda lab, and Aalto University, Paavo Alku lab. Slide is here 3

Conference presentation

Anti-spoofing: Interspeech 2021 presentation for Comparative study on ASVspoof 2019 LA, PPT. Code is available in the git repo at project/03-asvspoof-mega.

NSF model (latest ver.): Interspeech 2020 presentation for cyclic-noise-NSF, PPT and PDF slides. Natural samples are from CMU-arctic.

NSF model (2nd ver.): SSW 2019 for paper Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis

NSF model (1st ver.): ICASSP 2019 for paper Neural Source-Filter-Based Waveform Model for Statistical Parametric Speech Synthesis

Speech synthesis comparison: ICASSP 2018 for paper A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis

Deep AR F0 model: Interspeech 2017 slide for paper An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis.

Shallow AR model: ICASSP 2017 slide for paper An Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis.

Speech synthesis: SSW 2016 slide for paper A Comparative Study of the Performance of HMM, DNN, and RNN Based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora.

Prosody embedding: Interspeech 2016 slide for paper Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System.

HMM-based speech synthesis: ICASSP 2016 slide for paper A Full Training Framework of Cross-Stream Dependence Modelling for HMM-Based Singing Voice Synthesis.

MISC

On the CURRENNT toolkit. These slides were made a long time ago over weekends, so they may be sloppy :)

CURRENNT WaveNet is also explained in another slide with more figures.