Resume¶

Xin Wang (王鑫)

Project Associate Professor & JST PRESTO researcher

@National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan

Education¶

Ph.D: 2015 - 2018, SOKENDAI/NII, Tokyo, Japan.

Fundamental frequency modeling for neural-betwork-based statistical parametric speech synthesis

Supervisor: Prof. Junichi Yamagishi

Thesis committee: Prof. Keiichi Tokuda, Prof. Nobuaki Minematsu, Prof. Isao Echizen, Prof. Yusuke Miyao

Thesis (submitted 2018-06-29): PDF

Slides of pubic defense

Appendix: highway network, SAR, DAR, VQ-VAE

M.Sc.: 2012 - 2015, University of Science and Technology of China, Hefei, China.

Bi-directional optimization for concept-to-speech synthesis (in Chinese)

Supervisor: Prof. Zhen-Hua Ling

Related English paper

B.Sc.: 2008 - 2012, University of Electronic Science and Technology of China, Chengdu, China.

Cross-stream dependency modeling for HMM-based statistical parametric speech synthesis (in Chinese)

Supervisor: Prof. Zhen-Hua Ling (done at USTC)

Related English paper

Academic activity¶

Organizer

ASVspoof5, ASVspoof challenges 2024, 2021, 2019

Voice Privacy Challenge 2024, 2022, 2020

ASRU 2025 special session on Deepfake detection

Interspeech 2025 special session on source tracing of synthetic speech

APSIPA ASC 2019 special session on Deep Generative Models for Media Clones and Its Detection

ISCA Interspeech 2019 special session on ASVspoof

IEEE ASRU 2019 special session on ASVspoof

Guest editor

Computer Speech and Language Special issue on Advances in Automatic Speaker Verification Anti-spoofing

Session chair

Interspeech 2024, ICASSP 2023, ACM MM 2022 DDAM Workshop, ASVspoof workshop 2021, Interspeech 2021, SSW 2019.

Reviewer

IEEE TASLP, TBIOM, TIFS, SPL, ICASSP, ASRU, SLT

ISCA Interspeech, Speech synthesis workshop, Odyssey workshop, Computer speech & language, Speech Communication

IEICE Trans on Information and Systems

EUSIPCO, BIOSIG, COLING-LREC 2024 (meta-reviewer)

Grants¶

2023 - 2027, JST, PRESTO (JPMJPR23P9): Unified framework for speech privacy protection and anti-spoofing. PI: Xin Wang.
2021 - 2023, JSPS, Wakate (21K17775): Speech privacy protection by high-quality, invertible, and extendable speech anonymization and de-anonymization. PI: Xin Wang.
2020 - 2021, KAWAI: Deep-learning-based neural source-filtering models for fast and high-quality music signal generation. PI: Xin Wang.
2021 - 2022, JST AIP Challenge Enhanced End-to-End Multi-Instrument MIDI/sheet-to-Music Synthesis with Timber and Style Transfer. PI: Xin Wang.
2019 - 2021, JSPS, grant-for-startup (19K24371): One model for all sounds: fast and high-quality neural source-filter model for speech and non-speech waveform modeling. PI: Xin Wang.
2021 - 2022, Google Research Grant: Optimizing a Speech Anti-spoofing Database. PI:: Junichi Yamagishi. Collaborator: Xin Wang, Eric Cooper.
2019 - 2020, Google AI Focused Research Awards Program in Japan: Robust and all-purpose neural source-filter models. PI: Junichi Yamagishi. Collaborator: Xin Wang, Eric Cooper.

Publication¶

Journal papers¶

Speech synthesis

Xin Wang, Shinji Takaki, Junichi Yamagishi, Simon King and Keiichi Tokuda. A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol: 28, pages 157-170. doi: 10.1109/TASLP.2019.2950099. 2020.
Xin Wang, Shinji Takaki and Junichi Yamagishi. Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol: 28, pages 402-415. doi: 10.1109/TASLP.2019.2956145. 2020.
Xin Wang, Shinji Takaki and Junichi Yamagishi. Investigating very deep highway networks for parametric speech synthesis. Speech Communication, vol: 96, pages 1-9. doi: 10.1016/j.specom.2017.11.002. 2018.
Xin Wang, Shinji Takaki and Junichi Yamagishi. Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol: 26, pages 1406-1419. doi: 10.1109/TASLP.2018.2828650. 2018.
Xin Wang, Zhen-Hua Ling and Li-Rong Dai. Concept-to-Speech generation with knowledge sharing for acoustic modelling and utterance filtering. Computer Speech & Language, vol: 38, pages 46-67. doi: 10.1016/j.csl.2015.12.003. 2016.
Xin Wang, Shinji Takaki and Junichi Yamagishi. Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis. IEICE Transactions on Information and Systems, vol: E99.D, pages 2471-2480. doi: 10.1587/transinf.2016SLP0011. 2016.
Cheng Gong, Xin Wang, Erica Cooper, Dan Wells, Longbiao Wang, Jianwu Dang, Korin Richmond and Junichi Yamagishi. ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol: 32, pages 4036-4051. doi: 10.1109/TASLP.2024.3451951. 2024.
Yusuke Yasuda, Xin Wang and Junichi Yamagishi. Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis. Computer Speech & Language, vol: 67, pages 101183. doi: https://doi.org/10.1016/j.csl.2020.101183. 2021.
Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki and Junichi Yamagishi. Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences. IEEE Access, vol: 8, pages 138149-138161. doi: 10.1109/ACCESS.2020.3011975. 2020.

Antispoofing & deepfake detection
Xin Wang, Héctor Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi, Myeonghun Jeong, Ge Zhu, Yongyi Zang, Soumi Maiti, Florian Lux, Nicolas Müller, Wangyou Zhang, Chengzhe Sun, Shuwei Hou, Siwei Lyu, Sébastien Le Maguer, Cheng Gong, Hanjie Guo, Liping Chen, and Vishwanath Singh. 2025. ASVspoof 5: Design, collection and validation of public resources for spoofing, deepfake, and adversarial attack detection using crowdsourced speech. Computer Speech & Language (2025). doi.org/doi.org/10.1016/j.csl.2025.101825
Xin Wang, Junichi Yamagishi, Massimiliano Todisco, H{'{e}}ctor Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, S{'{e}}bastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Fran{c{c}}ois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang and Zhen-Hua Ling. ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech. Computer Speech & Language, vol: 64, pages 101114. doi: 10.1016/j.csl.2020.101114. 2020.
Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, and Shinji Watanabe. 2025. SpoofCeleb: Speech Deepfake Detection and SASV In The Wild. IEEE Open Journal of Signal Processing (2025).
Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper and Junichi Yamagishi. Joint Speaker Encoder and Neural Back-End Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances. Computer Speech & Language, vol: 86, doi: 10.1016/j.csl.2024.101619. 2024.
Xuechen Liu, Xin Wang, Md Sahidullah, Jose Patino, H{'{e}}ctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas Evans, Andreas Nautsch and Kong Aik Lee. ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol: 31, pages 2507-2522. doi: 10.1109/TASLP.2023.3285283. 2023.
Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans and Junichi Yamagishi. The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance. IEEE/ACM Transactions on Audio, Speech, and Language Processing, pages 1-13. doi: 10.1109/TASLP.2022.3233236. 2022.
Andreas Nautsch, Xin Wang, Nicholas Evans, Tomi H. Kinnunen, Ville Vestman, Massimiliano Todisco, Hector Delgado, Md Sahidullah, Junichi Yamagishi and Kong Aik Lee. ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech. IEEE Transactions on Biometrics, Behavior, and Identity Science, vol: 3, pages 252-265. doi: 10.1109/TBIOM.2021.3059479. 2021.
Tomi Kinnunen, Hector Delgado, Nicholas Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang, Md Sahidullah, Junichi Yamagishi and Douglas A Reynolds. Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol: 28, pages 2195-2210. doi: 10.1109/TASLP.2020.3009494. 2020.

Speech anonymization
Xiaoxiao Miao, Yuxiang Zhang, Xin Wang, Natalia Tomashenko, Donny Cheng Lock Soh, and Ian Mcloughlin. 2025. Adapting general disentanglement-based speaker anonymization for enhanced emotion preservation. Computer Speech & Language 94, (November 2025), 101810. doi.org/10.1016/j.csl.2025.101810
Xiaoxiao Miao, Ruijie Tao, Chang Zeng, and Xin Wang. A Benchmark for Multi-speaker Anonymization. IEEE Trans.Inform.Forensic Secur. doi: 10.1109/TIFS.2025.3556345. 2025
Michele Panariello, Natalia Tomashenko, Xin Wang, Xiaoxiao Miao, Pierre Champion, Hubert Nourtel, Massimiliano Todisco, Nicholas Evans, Emmanuel Vincent and Junichi Yamagishi. The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, pages 1-14. doi: 10.1109/TASLP.2024.3430530. 2024.
Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi and Natalia Tomashenko. Speaker Anonymization Using Orthogonal Householder Neural Network. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol: 31, pages 3681-3695. doi: 10.1109/TASLP.2023.3313429. 2023.
Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Jose Patino, Brij Mohan Lal Srivastava, Paul-Gauthier No{'{e}}, Andreas Nautsch, Nicholas Evans, Junichi Yamagishi, Benjamin O’Brien, Ana{"{i}}s Chanclu, Jean-Fran{c{c}}ois Bonastre, Massimiliano Todisco and Mohamed Maouche. The VoicePrivacy 2020 Challenge: Results and findings. Computer Speech & Language, pages 101362. doi: https://doi.org/10.1016/j.csl.2022.101362. 2022.
Brij Mohan Lal Srivastava, Mohamed Maouche, Md Sahidullah, Emmanuel Vincent, Aurelien Bellet, Marc Tommasi, Natalia Tomashenko, Xin Wang and Junichi Yamagishi. Privacy and Utility of X-Vector Based Speaker Anonymization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol: 30, pages 2383-2395. doi: 10.1109/TASLP.2022.3190741. 2022.

Book chapters¶

antispoofing

Xin Wang and Junichi Yamagishi. A Practical Guide to Logical Access Voice Presentation Attack Detection. Frontiers in Fake Media Generation and Detection, pages 169-214. doi: 10.1007/978-981-19-1524-6_8. 2022.
Md Sahidullah, H{'{e}}ctor Delgado, Massimiliano Todisco, Andreas Nautsch, Xin Wang, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi and Kong-Aik Lee. Introduction to Voice Presentation Attack Detection and Recent Advances. Handbook of Biometric Anti-Spoofing, pages 339-385. doi: 10.1007/978-981-19-5288-3_13. 2023.

Conference papers¶

Speech synthesis

Xin Wang and Junichi Yamagishi. Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model. Proc. Interspeech, pages 1992-1996. doi: 10.21437/Interspeech.2020-1018. 2020.
Xin Wang, Shinji Takaki and Junichi Yamagishi. Neural Source-filter-based Waveform Model for Statistical Parametric Speech Synthesis. Proc. ICASSP, pages 5916-5920. doi: 10.1109/ICASSP.2019.8682298. 2019.
Xin Wang and Junichi Yamagishi. Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis. Proc. SSW, pages 1-6. doi: 10.21437/SSW.2019-1. 2019.
Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela and Junichi Yamagishi. A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis. Proc. ICASSP, pages 4804-4808. 2018.
Xin Wang, Shinji Takaki and Junichi Yamagishi. An autoregressive recurrent mixture density network for parametric speech synthesis. Proc. ICASSP, pages 4895-4899. 2017.
Xin Wang, Shinji Takaki and Junichi Yamagishi. An RNN-based quantized F0 model with multi-tier feedback links for text-to-speech synthesis. Proc. Interspeech, pages 1059-1063. 2017.
Xin Wang, Minghui Dong and Zhenhua Ling. A full training framework of cross-stream dependence modelling for HMM-based singing voice synthesis. Proc. ICASSP, pages 5165-5169. doi: 10.1109/ICASSP.2016.7472662. 2016.
Xin Wang, Shinji Takaki and Junichi Yamagishi. A comparative study of the performance of HMM, DNN, and RNN based speech synthesis systems trained on very large speaker-dependent corpora. Proc. SSW, pages 125-128. 2016.
Xin Wang, Shinji Takaki and Junichi Yamagishi. Investigating very deep highway networks for parametric speech synthesis. Proc. SSW, pages 181-186. 2016.
Xin Wang, Shinji Takaki and Junichi Yamagishi. Enhance the word vector with prosodic information for the recurrent neural network based TTS system. Proc. Interspeech, pages 2856-2860. 2016.
Xin Wang, Zhen-Hua Ling and Li-Rong Dai. Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis. Proc. Interspeech, pages 2942-2946. 2014.
Xin Wang, Zhen-Hua Ling and Li-Rong Dai. Cross-stream dependency modeling using continuous F0 model for HMM-based speech synthesis. Proc. ISCSLP, pages 84-87. 2012.
Cheng Gong, Erica Cooper, Xin Wang, Chunyu Qiang, Mengzhe Geng, Dan Wells, Longbiao Wang, Jianwu Dang, Marc Tessier, Aidan Pine, Korin Richmond and Junichi Yamagishi. An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios. Proc. Interspeech, pages 4963-4967. 2024.
Xuan Shi, Erica Cooper, Xin Wang, Junichi Yamagishi and Shrikanth Narayanan. Can Knowledge of End-to-End Text-to-Speech Models Improve Neural Midi-to-Audio Synthesis Systems?. Proc. ICASSP, pages . doi: 10.1109/ICASSP49357.2023.10095848. 2023.
Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper and Junichi Yamagishi. How Similar or Different is Rakugo Speech Synthesizer to Professional Performers?. Proc. ICASSP, pages 6488-6492. doi: 10.1109/ICASSP39728.2021.9414175. 2021.
Yang Ai, Haoyu Li, Xin Wang, Junichi Yamagishi and Zhenhua Ling. Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation. Proc. SLT, pages 477-484. doi: 10.1109/SLT48900.2021.9383611. 2021.
Yusuke Yasuda, Xin Wang and Junichi Yamagishd. End-to-End Text-to-Speech Using Latent Duration Based on VQ-VAE. Proc. ICASSP, pages 5694-5698. 2021.
Erica Cooper, Xin Wang and Junichi Yamagishi. Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis. Proc. SSW, pages 130-135. doi: 10.21437/SSW.2021-23. 2021.
Yi Zhao, Xin Wang, Lauri Juvela and Junichi Yamagishi. Transferring neural speech waveform synthesizers to musical instrument sounds generation. Proc. ICASSP, pages 6269-6273. doi: 10.1109/ICASSP40776.2020.9053047. 2020.
Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen and Junichi Yamagishi. Zero-shot multi-speaker text-to-speech with state-of-the-art neural speaker embeddings. Proc. ICASSP, pages 6184-6188. 2020.
Yusuke Yasuda, Xin Wang and Junichi Yamagishi. Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment. Proc. ICASSP, pages 6724-6728. 2020.
Yang Ai, Xin Wang, Junichi Yamagishi and Zhen-Hua Ling. Reverberation Modeling for Source-Filter-Based Neural Vocoder. Proc. Interspeech, pages 3560-3564. 2020.
Yusuke Yasuda, Xin Wang, Shinji Takaki and Junichi Yamagishi. Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language. Proc. ICASSP, pages 6905-6909. 2019.
Fuming Fang, Xin Wang, Junichi Yamagishi and Isao Echizen. Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics. Proc. ICASSP, pages 6795-6799. 2019.
Shinji Takaki, Toru Nakashika, Xin Wang and Junichi Yamagishi. STFT spectral loss for training a neural speech waveform model. Proc. ICASSP, pages 7065-7069. 2019.
Hieu-Thi Luong, Xin Wang, Junichi Yamagishi and Nobuyuki Nishizawa. Training multi-speaker neural text-to-speech systems using speaker-imbalanced speech corpora. Proc. Interspeech, pages 1303-1307. doi: 10.21437/Interspeech.2019-1311. 2019.
Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li and Junichi Yamagishi. Joint training framework for text-to-speech and voice conversion using multi-source tacotron and WaveNet. Proc. Interspeech, pages 1298-1302. doi: 10.21437/Interspeech.2019-1357. 2019.
Yusuke Yasuda, Xin Wang and Junichi Yamagishi. Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments. Proc. SSW, pages 211-216. doi: 10.21437/SSW.2019-38. 2019.
Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki and Junichi Yamagishi. Rakugo speech synthesis using segment-to-segment neural transduction and style tokens - toward speech synthesis for entertaining audiences. Proc. SSW, pages 111-116. doi: 10.21437/SSW.2019-20. 2019.
Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Mariko Kondo and Junichi Yamagishi. Cyborg speech: Deep multilingual speech synthesis for generating segmental foreign accent with natural prosody. Proc. ICASSP, pages 4799-4803. 2018.
Lauri Juvela, Bajibabu Bollepalli, Xin Wang, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi and Paavo Alku. Speech waveform synthesis from MFCC sequences with generative adversarial networks. Proc. ICASSP, pages 5679-5683. 2018.
Hieu-Thi Luong, Xin Wang, Junichi Yamagishi and Nobuyuki Nishizawa. Investigating accuracy of pitch-accent annotations in neural-network-based speech synthesis and denoising effects. Proc. Interspeech, pages 37-41. 2018.
Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi Yamagishi and Tomi Kinnunen. Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama’s voice using GAN, WaveNet and low-quality found data. Proc. Odyssey, pages 240-247. doi: 10.21437/Odyssey.2018-34. 2018.
Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang and Junichi Yamagishi. Principles for learning controllable TTS from annotated and latent variation. Proc. Interspeech, pages 3956-3960. doi: 10.21437/Interspeech.2017-171. 2017.
Lauri Juvela, Xin Wang, Shinji Takaki, Manu Airaksinen, Junichi Yamagishi and Paavo Alku. Using text and acoustic features in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks. Proc. Interspeech, pages 2283-2287. 2016.

Speech antispoofing & deepfake detection
Xin Wang, H{'e}ctor Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee and Junichi Yamagishi. ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale. ASVspoof Workshop 2024, pages 1-8. 2024.
Xin Wang, Tomi Kinnunen, Lee Kong Aik, Paul-Gauthier Noe and Junichi Yamagishi. Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis. Proc. Interspeech, pages 1110-1114. 2024.
Xin Wang and Junichi Yamagishi. Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?. Proc. ICASSP, pages 10311-10315. 2024.
Xin Wang and Junichi Yamagishi. Investigating Active-learning-based Training Data Selection for Speech Spoofing Countermeasure. Proc. SLT, pages 585-592. 2023.
Xin Wang and Junichi Yamagishi. Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders. Proc. ICASSP, pages . 2023.
Xin Wang and Junichi Yamagishi. Estimating the Confidence of Speech Spoofing Countermeasure. Proc. ICASSP, pages 6372-6376. doi: 10.1109/ICASSP43922.2022.9746204. 2022.
Xin Wang and Junichi Yamagishi. Investigating Self-Supervised Front Ends for Speech Spoofing Countermeasures. Proc. Odyssey, pages 100-106. doi: 10.21437/Odyssey.2022-14. 2022.
Xin Wang and Junichi Yamagishi. A comparative study on recent neural spoofing countermeasures for synthetic speech detection. Proc. Interspeech, pages 4259-4263. doi: 10.21437/Interspeech.2021-702. 2021.
Wanying Ge, Xin Wang, and Junichi Yamagishi. 2025. Proactive detection of speaker identity manipulation with neural watermarking. In The 1st workshop on GenAI watermarking, 2025.
Lauri Juvela and Xin Wang. Audio codec augmentation for robust collaborative watermarking of speech synthesis. In Proc. ICASSP, 2025. https://doi.org/10.1109/ICASSP49660.2025.10888976
Massimiliano Todisco, H{'e}ctor Delgado, Lee Kong Aik, Nicholas Evans, Michele Panariello and Xin Wang. Malacopula: Adversarial Automatic Speaker Verification Attacks Using a Neural-Based Generalised Hammerstein Model. ASVspoof Workshop 2024, pages 94-100. 2024.
Tomi Kinnunen, Rosa Gonzalez Hautamaki, Xin Wang and Junichi Yamagishi. Speaker Detection by the Individual Listener and the Crowd: Parametric Models Applicable to Bonafide and Deepfake Speech. Proc. Interspeech, pages 3654-3658. doi: 10.21437/Interspeech.2024-1704. 2024.
Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Sidhhant Arora, Junichi Yamagishi and Joon Son Chung. To What Extent Can ASV Systems Naturally Defend against Spoofing Attacks?. Proc. Interspeech, pages 3240-3244. 2024.
Lin Zhang, Xin Wang, Erica Cooper, Mireia Diez, L, Federico ini, Nicholas Evans and Junichi Yamagishi. Spoof Diarization: “What Spoofed When” in Partially Spoofed Audio. Proc. Interspeech, pages 502-506. 2024.
Wanying Ge, Xin Wang, Junichi Yamagishi, Massimiliano Todisco and Nicholas Evans. Spoofing Attack Augmentation: Can Differently-Trained Attack Models Improve Generalisation?. Proc. ICASSP, pages 12531-12535. 2024.
Lauri Juvela and Xin Wang. Collaborative Watermarking for Adversarial Speech Synthesis. Proc. ICASSP, pages 11231-11235. 2024.
Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim and Jee-weon Jung. Towards Single Integrated Spoofing-aware Speaker Verification Embeddings. Proc. Interspeech, pages 3989-3993. doi: 10.21437/Interspeech.2023-1402. 2023.
Chang Zeng, Xin Wang, Xiaoxiao Miao, Erica Cooper and Junichi Yamagishi. Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms. Proc. Interspeech, pages 1998-2002. doi: 10.21437/Interspeech.2023-125. 2023.
Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans and Junichi Yamagishi. Range-Based Equal Error Rate for Spoof Localization. Proc. Interspeech, pages 3212-3216. doi: 10.21437/Interspeech.2023-1214. 2023.
Hemlata Tak, Massimiliano Todisco, Xin Wang, Jee-weon Jung, Junichi Yamagishi and Nicholas Evans. Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation. Proc. Odyssey, pages 112-119. 2022.
Lin Zhang, Xin Wang, Erica Cooper, Junichi Yamagishi, Jose Patino and Nicholas Evans. An Initial Investigation for Detecting Partially Spoofed Audio. Proc. Interspeech, pages 4264-4268. doi: 10.21437/Interspeech.2021-738. 2021.
Lin Zhang, Xin Wang, Erica Cooper and Junichi Yamagishi. Multi-task Learning in Utterance-level and Segmental-level Spoof Detection. Proc. ASVspoof Challenge workshop, pages 9-15. doi: 10.21437/ASVSPOOF.2021-2. 2021.
Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas Evans, Xin Wang, Massimiliano Todisco, H{'{e}}ctor Delgado, Junichi Yamagishi and Kong Aik Lee. Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing. Proc. Interspeech, pages 4299-4303. doi: 10.21437/Interspeech.2021-1522. 2021.
Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans and H{'{e}}ctor Delgado. ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection. Proc. ASVspoof Challenge workshop, pages 47-54. doi: 10.21437/ASVSPOOF.2021-8. 2021.
Massimiliano Todisco, Xin Wang, Ville Vestman, Md. Sahidullah, H{'{e}}ctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Tomi H Kinnunen and Kong Aik Lee. ASVspoof 2019: future horizons in spoofed and fake audio detection. Proc. Interspeech, pages 1008-1012. doi: 10.21437/Interspeech.2019-2249. 2019.

Speech anonymization
Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Nicholas Evans, Massimiliano Todisco, Jean-Fran{c c}ois Bonastre and Mickael Rouvier. SynVox2: Towards a Privacy-Friendly VoxCeleb2 Dataset. Proc. ICASSP, pages 11421-11425. 2024.
Paul-Gauthier Noe, Xiaoxiao Miao, Xin Wang, Junichi Yamagishi, Jean-Francois Bonastre and Driss Matrouf. Hiding Speaker’s Sex in Speech Using Zero-Evidence Speaker Representation in an Analysis/Synthesis Pipeline. Proc. ICASSP, pages . 2023.
Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi and Natalia Tomashenko. Language-Independent Speaker Anonymization Approach Using Self-Supervised Pre-Trained Models. , pages 279-286. doi: 10.21437/Odyssey.2022-39. 2022.
Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi and Natalia Tomashenko. Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions. Proc. Interspeech, pages 4426-4430. doi: 10.21437/Interspeech.2022-11065. 2022.
Jean-Fran{c{c}}ois Bonastre, H{'{e}}ctor Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Paul-Gauthier Noe, Jose Patino, Md Sahidullah, Brij Mohan Lal Srivastava, Massimiliano Todisco, Natalia Tomashenko, Emmanuel Vincent, Xin Wang and Junichi Yamagishi. Benchmarking and challenges in security and privacy for voice biometrics. Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication, pages 52-56. doi: 10.21437/SPSC.2021-11. 2021.
Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-Fran{c{c}}ois Bonastre, Paul-Gauthier No{'{e}} and Massimiliano Todisco. Introducing the VoicePrivacy Initiative. Proc. Interspeech, pages 1693-1697. doi: 10.21437/Interspeech.2020-1333. 2020.
Brij Mohan Lal Srivastava, Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aur{'{e}}lien Bellet and Marc Tommasi. Design Choices for X-Vector Based Speaker Anonymization. Proc. Interspeech, pages 1713-1717. doi: 10.21437/Interspeech.2020-2692. 2020.
Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen, Massimiliano Todisco, Nicholas Evans and Jean-Francois Bonastre. Speaker anonymization using X-vector and neural waveform models. Proc. SSW, pages 155-160. doi: 10.21437/SSW.2019-28. 2019.

Other topics
Jingjing Tang, Xin Wang, Zhe Zhang, Junichi Yamagishd, and Fazekas George. 2025. MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling. In Proc. ISMIR, 2025.
Jingjing Tang, Erica Cooper, Xin Wang, Junichi Yamagishi, and George Fazekas. 2025. Towards an integrated approach for expressive piano performance synthesis from music scores. In Proc. ICASSP, 2025.
Nicolas Jonason, Xin Wang, Erica Cooper, Lauri Juvela, Bob L. T. Sturm, and Junichi Yamagishi. 2024. DDSP-based neural waveform synthesis of polyphonic guitar performance from string-wise MIDI input. In Proc. DAFX, 2024.
Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao and Junichi Yamagishi. Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances. Proc. ICASSP, pages (accepted). 2022.
Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao and Hsin-Min Wang. MOSnet: deep learning-based objective assessment for voice conversion. Proc. Interspeech, pages 1541-1545. doi: 10.21437/Interspeech.2019-2003. 2019.
Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki and Junichi Yamagishi. Investigating RNN-based speech enhancement methods for noise-robust text-to-speech. Proc. SSW, pages 146-152. 2016.
Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki and Junichi Yamagishi. Speech enhancement for a noise-robust text-to-speech synthesis system using deep recurrent neural networks. Proc. Interspeech, pages 352-356. 2016.

Talk¶

(Slides are available in Talk & slides)

2024 Sep, Survey talk on Voice Privacy (link SEP-2024)
2023 Nov, VoicePersonae and ASVspoof workshop (link NOV-2023):
- Talk 1: From DSP and DNN to DNN+DSP for waveform model
- Talk 2: Harnessing data to improving speech spoofing countermeasure
2023 Aug, Interspeech 2023 tutorial Advances in audio anti-spoofing and deepfake detection using graph neural networks and self-supervised learning. Materials are available on github.
2023 Mar, SPSC webinar: using vocoders to create training data for speech spoofing countermeasure (link MAR-2023).
2022 Sep, SPSC Symposium: tutorial on speaker anonymization (software part) (link :ref: label-slide-2022-sep-1).
2022 May, ICASSP 2022 short course: inclusive Neural Speech Synthesis - neural vocoder part (link MAY-2022).
2021 Dec, Speech Synthesis Forum, China Computer Federation: Two speech security issues after the speech synthesis boom (link OCT-2021).
2021 Oct, JST Science Agora 2021, pre-Agora event: Deepfakes: High-tech Illusions to Trick the Human Brain., with Sascha Frühholz (University of Zurich), Erica Cooper, Florence Steiner (University of Zurich). Video is here
2021 July, Tutorial at ISCA 2021 Speech Processing Courses in Crete: Advancement in Neural Vocoders, with Prof. Yamagishi (link JUL-2021).
2020 Nov., Tutorial as ISCA 2020 Speaker Odyssey: Neural statistical parametric speech synthesis (link DEC-2020).
2020 July, Tutorial at ISCA 2020 Speech Processing Courses in Crete: Neural auto-regressive, source-filter and glottal vocoders for speech and music signals, with Prof. Yamagishi (link JUL-2020).
2019 Sep, Fraunhofer IIS, invited talk: Neural waveform models for text-to-speech synthesis (link SEP-2019).
2019 Jan, IEICE Technical Committee on Speech (SP), invited tutorial, Kanazawa, Japan: Tutorial on recent neural waveform models (link JAN-2019).
2018 Nov, Nagoya Institute of Technology, Tokuda lab: Autoregressive neural networks for parametric speech synthesis (link JAN-2018).
2018 Jun, University of Eastern Finland and Aalto University Autoregressive neural networks for parametric speech synthesis (same content as above).

Awards & scholarship¶

Best paper award for SSW 2016, ISCA SynSig
SOKENDAI Award, SOKENDAI, Japan
Young Researcher’s Award in Speech Field, IEICE ISS, Japan
11th IEEE Signal Processing Society Japan Student Best Paper Award, IEEE Japan
MEXT Scholarship (Ph.D 2015 - 2018), Japan

Language¶

Mandarin
English (Toefl 2015, 112/120)
Japanese (N1, 2021 Dec, 169/180)