Talk & slides¶

In most cases, I cannot directly share audio samples. Some samples can be found through the link in the PDF.

Talk¶

SEP-24-2024¶

APSIPA China-Japan Joint Symposium: Introduction to the research of NII Yamagishi Lab

Slides (gdrive): PDF and PPT

SEP-2024¶

Interspeech 2024 survey talk: Current trend in speech privacy and security

Slides (privacy part): PDF

NOV-2023¶

VoicePersonae workshop talk 2: Harnessing data to improve speech spoofing countermeasures

A high-level summary of using vocoded data to train speech anti-spoofing models. Slides can be downloaded from gdrive.

VoicePersonae workshop talk 1: DNN+DSP waveform model

An overview talk given at the VoicePersonae workshop. The title is "From DSP and DNN to DNN/DSP: Neural speech waveform models and its applications in speech and music audio waveform modelling". Slides can be downloaded from gdrive.

OCT-2023¶

Shonan Seminar

During Shonan Seminar No. 182 (https://shonan.nii.ac.jp/seminars/182/), I had the chance to introduce voice privacy. Slides are available on gdrive.

AUG-2023¶

Interspeech Tutorial: anti-spoofing

Interspeech 2023 tutorial: Advances in audio anti-spoofing and deepfake detection using graph neural networks and self-supervised learning.

Slides and a notebook are available on GitHub.

MAR-2023¶

SPSC Webinar: using vocoders to create spoofed data for speech spoofing countermeasures

Based on the ICASSP 2023 paper Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders.

Slides in PDF and PPTX

SEP-2022¶

SPSC Symposium: tutorial on speaker anonymization (software part)

This short tutorial shows the basic process of speaker anonymization, using the baselines from the Voice Privacy Challenge 2022.

The hands-on notebook is available on Google Colab.

MAY-2022¶

ICASSP 2022 short course: neural vocoder

This talk summarizes a few representative neural vocoders. For a more detailed talk, please check the slides for Advancement in Neural Vocoders.

The hands-on materials cover a few recent neural vocoders. They include step-by-step instructions on implementation, demonstrations with pre-trained models, and detailed explanations of some common DSP and deep learning techniques. Please check Google Colab.

DEC-2021¶

Two Speech Security Issues after Speech Synthesis Boom

This talk introduces anti-spoofing (audio deepfake detection) and voice privacy. It is mainly for newcomers to these fields.

The slides can be found on gdrive (PPT).

OCT-2021¶

DeepFake: high-tech illusions to deceive human brains

This is a talk given at JST Science Agora with Dr. Erica Cooper. It is an introduction to anti-spoofing (audio deepfake detection).

Slides: Agora PDF and PPT.

JUL-2021¶

Advancement in Neural Vocoders

This is the tutorial on neural vocoders given at the ISCA 2021 Speech Processing Courses in Crete, with Prof. Yamagishi. It was a very long tutorial (>3 hours). Slides here.

The hands-on materials were re-edited and uploaded to Google Colab. See ICASSP 2022 short course: neural vocoder.

DEC-2020¶

Tutorial on Neural statistical parametric speech synthesis

This is a tutorial on text-to-speech synthesis given at ISCA Speaker Odyssey 2020. It is mainly on sequence-to-sequence TTS acoustic models (both soft- and hard-attention-based approaches), but it also covers some basic ideas from the classical HMM-based approaches.

PDF and PPT slides are available.

The video is on YouTube. There are many audio samples, collected from the reference papers’ official websites or from open-domain data repositories.

NOV-2020¶

Neural vocoders for speech and music signals

This is an invited talk at YAMAHA, with Prof. Yamagishi. No slides available.

JUL-2020¶

Neural auto-regressive, source-filter and glottal vocoders for speech and music signals

This is an early version of the tutorial on neural vocoders, given at the ISCA 2020 Speech Processing Courses in Crete, with Prof. Yamagishi.

The hands-on materials were re-edited and uploaded to Google Colab. See ICASSP 2022 short course: neural vocoder.

SEP-2019¶

Neural waveform models for text-to-speech synthesis

Invited talk given at Fraunhofer IIS, Erlangen, Germany.

This is about the neural source-filter vocoders and related experiments done up to 2019.

Slides are here and here.

JAN-2019¶

Tutorial on recent neural waveform models

This is a talk on neural vocoders, but the contents and explanations are based on my knowledge at the time. It is out of date. Please check the tutorials above for my latest understanding.

IEICE Technical Committee on Speech (SP), invited tutorial, Kanazawa, Japan. Slides not available.

JAN-2018¶

Autoregressive neural networks for parametric speech synthesis

This is a talk on the previous-generation TTS system. It talks about autoregressive models for F0 prediction.

It was given at Nagoya Institute of Technology (Tokuda lab) and Aalto University (Paavo Alku lab). Slides are here.

Conference presentation¶

ASVSPOOF-2024¶

Summary of ASVspoof 5: PDF

IS-2024¶

Revisiting score fusion for spoofing-aware speaker verification

Paper: https://www.isca-archive.org/interspeech_2024/wang24l_interspeech.html
Slides: PDF and PDF
Github: https://github.com/nii-yamagishilab/SpeechSPC-mini

ICASSP-2024¶

Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?

Paper: https://ieeexplore.ieee.org/document/10446331
Slides: PDF and PPT

ICASSP-2023¶

Anti-spoofing using vocoded data: PDF

SLT-2022¶

Anti-spoofing using active learning: PDF

ODYSSEY-2022¶

Anti-spoofing using SSL features: PDF.

IS-2021¶

Anti-spoofing: Interspeech 2021 presentation for paper Comparative study on ASVspoof 2019 LA (PDF). Code is available in the git repo under project/03-asvspoof-mega.

IS-2020¶

NSF model (latest ver.): Interspeech 2020 presentation for cyclic-noise-NSF (PPT and PDF slides). Natural samples are from CMU-arctic.

SSW-2019¶

NSF model (2nd ver.): SSW 2019 for paper Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis

ICASSP-2019¶

NSF model (1st ver.): ICASSP 2019 for paper Neural Source-Filter-Based Waveform Model for Statistical Parametric Speech Synthesis

ICASSP-2018¶

Speech synthesis comparison: ICASSP 2018 for paper A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis

IS-2017¶

Deep AR F0 model: Interspeech 2017 slide for paper An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis.

ICASSP-2017¶

Shallow AR model: ICASSP 2017 slide for paper An Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis.

SSW-2016¶

Speech synthesis: SSW 2016 slide for paper A Comparative Study of the Performance of HMM, DNN, and RNN Based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora.

IS-2016¶

Prosody embedding: Interspeech 2016 slide for paper Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System.

ICASSP-2016¶

HMM-based speech synthesis: ICASSP 2016 slide for paper A Full Training Framework of Cross-Stream Dependence Modelling for HMM-Based Singing Voice Synthesis.

MISC¶

On the CURRENNT toolkit. These slides were made a long time ago during weekends, and they may be sloppy :)

CURRENNT basics

CURRENNT LSTM explanation

CURRENNT CNN implementation

CURRENNT mixture density network

CURRENNT CUDA implementation of WaveNet

CURRENNT WaveNet is also explained in another slide with more figures.