.. temp documentation master file, created by
   sphinx-quickstart on Sun Aug 30 20:18:34 2020.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. _label-slide:

Talk & slides
*************

In most cases, I cannot directly share audio samples. Some samples can be found through the link in the PDF.

Talk
============

.. _label-slide-2024-apr-1:

APR-2024
--------

**ICASSP 2024 presentation**: Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?

Using large-scale spoofed data to update the SSL front end of a speech anti-spoofing model.

* Paper: https://ieeexplore.ieee.org/document/10446331
* Slides: `PPT <https://www.dropbox.com/sh/gf3zp00qvdp3row/AABJVYX7cWN0h6ocsBRDvnoVa/web/20240418-ICASSP24-SLP.L20.2.pptx>`__ and `PDF <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAAzSUKPDVpO-R2NpE8TIIqYa/web/20240418-ICASSP24-SLP.L20.2.pdf>`__

.. _label-slide-2023-nov-1:

NOV-2023
--------

**VoicePersonae workshop talk 2: Harnessing data to improve speech spoofing countermeasures**

A high-level summary of how to use vocoded data to train speech anti-spoofing models.

Slides can be downloaded from `dropbox <https://www.dropbox.com/sh/gf3zp00qvdp3row/AACIULzpbAQNmP6GjGSwnjAIa/web/20231122_ASVspoof_data.pdf>`__.

**VoicePersonae workshop talk 1: DNN+DSP waveform model**

An overview talk given at the VoicePersonae workshop. The title is "From DSP and DNN to DNN/DSP: Neural speech waveform models and its applications in speech and music audio waveform modelling".

Slides can be downloaded from `dropbox <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAD7uQRQpnJsbverEVlrl2OBa/web?dl=0&preview=20231121_VoicePersonae_DSP-NDSP.pdf>`__.

.. _label-slide-2023-oct-31:

OCT-2023
--------
**Shonan Seminar: casual presentation**

During the No. 182 Shonan Seminar (https://shonan.nii.ac.jp/seminars/182/), I had the chance to introduce voice privacy.

Slides are available on `dropbox <https://www.dropbox.com/sh/gf3zp00qvdp3row/AABwpFTp7e7E7T7O8QKinQYRa/web/20231102_shonan-seminar-v1.pdf>`__.

.. _label-slide-2023-aug-1:

AUG-2023
--------
**Interspeech Tutorial: anti-spoofing**

Interspeech 2023 tutorial: "Advances in audio anti-spoofing and deepfake detection using graph neural networks and self-supervised learning".

Slides and notebook are available on `github <https://github.com/Jungjee/INTERSPEECH2023_T6>`__.

.. _label-slide-2023-mar-1:

MAR-2023
--------
**SPSC Webinar: using vocoders to create spoofed data for speech spoofing countermeasures**

This webinar is for the `ICASSP 2023 paper <https://arxiv.org/abs/2210.10570>`__ "Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders".

Slides are available in `PDF <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAA8o9fpoJV27JL2y02_p46Ea/web/20230306_spsc_webinar_xinwang.pdf>`__ and `PPTX <https://www.dropbox.com/sh/gf3zp00qvdp3row/AABdRnr6WPKr0cI4DU32FPN2a/web/20230306_spsc_webinar_xinwang.pptx>`__.


.. _label-slide-2022-sep-1:


SEP-2022
--------
**SPSC Symposium: tutorial on speaker anonymization (software part)**

This short tutorial shows the basic process of speaker anonymization, using the baselines of the Voice Privacy Challenge 2022.

The hands-on notebook is available on `Google Colab <https://colab.research.google.com/drive/1_zRL_f9iyDvl_5Y2Rdakg0hYAl_5Rgyq?usp=sharing>`__.


.. _label-slide-2022-may-1:


MAY-2022
--------

**ICASSP 2022 short course: neural vocoder**

This talk briefly summarizes a few representative neural vocoders. For a more detailed talk, please check :ref:`the slide for Advancement in Neural Vocoders <label-slide-2021-jul-1>`.

The hands-on materials used for this short course cover a few recent neural vocoders. They include step-by-step instructions on implementation, demonstrations with pre-trained models, and detailed explanations of some common DSP and deep learning techniques. Please check `Google Colab <https://colab.research.google.com/drive/1EO-ggi1U9f2zXwTiqg7AEljVx11JKta7>`_.


.. _label-slide-2021-dec-1:

DEC-2021
--------

**Two Speech Security Issues after Speech Synthesis Boom**

This talk briefly introduces anti-spoofing (audio deepfake detection) and voice privacy. It is mainly for newcomers to these fields.

The slides are available on dropbox as `PPTX <https://www.dropbox.com/sh/gf3zp00qvdp3row/AADDhVJGzMbXEquzf2Z1Y8YHa/web/CCF-talk-2021.pptx>`__ and `PDF <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAANoSBvdc4y16CteakUcF9Ia/web/CCF-talk-2021.pdf>`__.


.. _label-slide-2021-oct-1:

OCT-2021
--------

**DeepFake: high-tech illusions to deceive human brains**

This is a talk given at JST Science Agora with Dr. Erica Cooper.

It is an introduction to anti-spoofing (audio deepfake detection).

Here is the part presented by me: `Agora PDF <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAC3cXcoPNA7M8MHB2CAXnY5a/web/Science-Agora-2021_part2.pdf>`_ and `Agora PPT <https://www.dropbox.com/sh/gf3zp00qvdp3row/AADLL5SEUSZ-fRPGSl_eiYRba/web/Science-Agora-2021_part2.pptx>`_.

.. _label-slide-2021-jul-1:

JUL-2021
--------

**Advancement in Neural Vocoders**

This is a tutorial on neural vocoders, given at the ISCA 2021 Speech Processing Courses in Crete, with Prof. Yamagishi.

It was a very long tutorial (>3 hours). Slides are `on slideshare <https://www.slideshare.net/jyamagis/advancements-in-neural-vocoders>`_ (I only own part of it).

The hands-on materials were re-edited and uploaded to Google Colab. See :ref:`ICASSP 2022 short course: neural vocoder <label-slide-2022-may-1>`.


.. _label-slide-2020-dec-1:

DEC-2020
--------

**Tutorial on Neural statistical parametric speech synthesis**

This is a tutorial on text-to-speech synthesis, given at ISCA Speaker Odyssey 2020.

It is mainly on sequence-to-sequence TTS acoustic models (both soft- and hard-attention based approaches), but it also covers some basic ideas from the classical HMM-based approaches.

`PDF <https://www.dropbox.com/sh/gf3zp00qvdp3row/AABFY0RiorILzSuX1YuQXyA7a/web/Odyssesy2020_Tutorial_TTS_XINWANG.pdf?raw=1>`_ and `PPT slides <https://www.dropbox.com/sh/gf3zp00qvdp3row/AABn3DyzRuZeBJwEGPV1ouFSa/web/Odyssesy2020_Tutorial_TTS_XINWANG.pptx?raw=1>`_ are available.

The video is on `youtube <https://youtu.be/WCe7SYcDzAI>`_


There are many audio samples collected from the reference papers' official websites or from open-domain data repositories.


.. _label-slide-2020-nov-1:

NOV-2020
--------
**Neural vocoders for speech and music signals**

This is an invited talk at YAMAHA, with Prof. Yamagishi.

Nothing can be disclosed.


.. _label-slide-2020-jul-1:

JUL-2020
--------

**Neural auto-regressive, source-filter and glottal vocoders for speech and music signals**

This is the early version of the tutorial on neural vocoders, given at ISCA 2020 Speech Processing Courses in Crete, with Prof. Yamagishi.

The hands-on materials were re-edited and uploaded to Google Colab. See :ref:`ICASSP 2022 short course: neural vocoder <label-slide-2022-may-1>`.


.. _label-slide-2019-sep-1:

SEP-2019
--------
**Neural waveform models for text-to-speech synthesis**


Invited talk given at Fraunhofer IIS, Erlangen, Germany.

This is about the neural source-filter vocoders and related experiments done up to 2019.

The slides are `here 1 <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAByUSX6u4O51bGHpIFlgy-ba/web/201909-FraunhoderIIS-neural-waveform-models.pdf?raw=1>`_.


.. _label-slide-2019-jan-1:

JAN-2019
--------
**Tutorial on recent neural waveform models**

This is a talk on neural vocoders, but the contents and explanations are based on my knowledge at that time. It is out of date. Please check the tutorials above for my latest understanding.

IEICE Technical Committee on Speech (SP), invited tutorial, Kanazawa, Japan. Slide is `here 2 <https://www.slideshare.net/jyamagis/tutorial-on-endtoend-texttospeech-synthesis-part-1-neural-waveform-modeling>`_

.. _label-slide-2018-jan-1:

JAN-2018
--------
**Autoregressive neural networks for parametric speech synthesis**

This is a talk on the previous-generation TTS system. It talks about autoregressive models for F0 prediction.

It was given at Nagoya Institute of Technology, Tokuda lab, and Aalto University, Paavo Alku lab. Slide is `here 3 <https://www.dropbox.com/sh/gf3zp00qvdp3row/AACZVX1Tf9Qw1MUc2YHQKf4Ia/web/20180111-Nagoya-ARmodels.pdf?raw=1>`_


Conference presentation
=======================

Anti-spoofing: Interspeech 2021 presentation for `Comparative study on ASVspoof 2019 LA, PPT <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAAbQM0rKGea4t5i5m6rn_F_a/web/2021-interspeech-Fri-M-V-7-1.pdf?raw=1>`_. Code is available at `git repo project/03-asvspoof-mega <https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts>`_.

NSF model (latest ver.): Interspeech 2020 presentation for cyclic-noise-NSF -- `PPT <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAAMoAEj77_oy4FmG0rkCTWwa/web/2020-interspech.pptx?raw=1>`_ and `PDF slides <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAD0BZlZh4TexeLs3VQVY0kJa/web/2020-interspech.pdf?raw=1>`_. Natural samples are from `CMU-arctic <http://www.festvox.org/cmu_arctic/>`_.


NSF model (2nd ver.): `SSW 2019 <https://www.dropbox.com/sh/gf3zp00qvdp3row/AABEVzUUqnJ4QbkxiQcjOhM5a/web/2019-ssw.pdf?raw=1>`_ for paper Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis


NSF model (1st ver.): `ICASSP 2019 <https://www.dropbox.com/sh/gf3zp00qvdp3row/AACIlTwfcTeJYNlMBlnZLE52a/web/2019-ICASSP.pdf?raw=1>`_ for paper Neural Source-Filter-Based Waveform Model for Statistical Parametric Speech Synthesis

Speech synthesis comparison: `ICASSP 2018 <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAC8XgykCv9hSChQMgtzAmVSa/web/2018-ICASSP.pdf?raw=1>`_ for paper A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis

Deep AR F0 model: `Interspeech 2017 slide <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAA0rZJEq6lQYU98mamyterka/web/2017-interspeech.pdf?raw=1>`_ for paper An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis.
 
Shallow AR model: `ICASSP 2017 slide <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAA5syHnVZvJrljcOILi5U4ga/web/2017-ICASSP.pdf?raw=1>`_ for paper An Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis.

Speech synthesis: `SSW 2016 slide <https://www.dropbox.com/sh/gf3zp00qvdp3row/AACozQp08QjxkmyFEDQlMDZha/web/2016_JVoice.pdf?raw=1>`_ for paper A Comparative Study of the Performance of HMM, DNN, and RNN Based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora.

Prosody embedding: `Interspeech 2016 slide <https://www.dropbox.com/sh/gf3zp00qvdp3row/AADDYHrpFe6b8AbjWjqpRuqTa/web/2016-interspeech.pdf?raw=1>`_ for paper Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System.

HMM-based speech synthesis: `ICASSP 2016 slide <https://www.dropbox.com/sh/gf3zp00qvdp3row/AADzOxHtpW9V6SpRAGEZMLXTa/web/2016-ICASSP.pdf?raw=1>`_ for paper A Full Training Framework of Cross-Stream Dependence Modelling for HMM-Based Singing Voice Synthesis.


MISC
====

On the CURRENNT toolkit. These slides were made a long time ago during weekends, and they may be sloppy :)


 * CURRENNT `basics <https://www.dropbox.com/sh/gf3zp00qvdp3row/AABQBuX7Sepgt-1zK49wUTH2a/web/misc-CURRENNT_BASIC.pdf?raw=1>`_ 

 * CURRENNT `LSTM explanation <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAASRaMvZkSc29CyZ_WMXWRIa/web/misc-CURRENNT_LSTM.pdf?raw=1>`_

 * CURRENNT `CNN implementation <https://www.dropbox.com/sh/gf3zp00qvdp3row/AACH1seKkkLfLjEhOsWFr3gSa/web/misc-CURRENNT_CNN.pdf?raw=1>`_

 * CURRENNT `mixture density network <https://www.dropbox.com/sh/gf3zp00qvdp3row/AABz4QF9IN5Fa1NlwCrNghJKa/web/misc-CURRENNT_MDN.pdf?raw=1>`_

 * CURRENNT `WaveNet <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAB5Q1Hdm9WBW8IZ6nepSH9xa/web/misc-CURRENNT_WaveNet.pdf?raw=1>`_


CURRENNT WaveNet is also explained in `another slide <https://www.dropbox.com/sh/gf3zp00qvdp3row/AAAxWSo8bmFTTEi0mmJOPPQ_a/web/2018-SLP-tsukuba.pdf?raw=1>`_ with more figures.


.. toctree::
   :hidden:
   :maxdepth: 1