Spoken Language Communication Laboratory

What is the Spoken Language Communication Laboratory?
The Spoken Language Communication (SLC) Laboratory conducts research on problems in human communication, with a focus on speech and language, paralanguage, and nonverbal information. By applying various artificial intelligence technologies, including deep learning, our lab tackles tasks that were previously unsolvable. We also draw on knowledge of human cognitive functions, together with new findings obtained through brain measurement, and apply them in our research. In our research activities, we focus not only on theoretical aspects but also on the applicability of technology, aiming to build prototype systems and validate them.

Learn more


Who is Professor Satoshi Nakamura?
Professor Satoshi Nakamura was the director and a full professor of the Augmented Human Communication Laboratory, Information Science Division, until his formal retirement in March 2024, and has directed the Spoken Language Communication Laboratory as a Specially Appointed Professor and Professor Emeritus since April 2024.
Learn more

Research Projects

JSPS KAKEN-S Project

Project Title: A Study on Multi-modal Automatic Simultaneous Interpretation System and Evaluation Method

Grants-in-Aid for Scientific Research, Japan Society for the Promotion of Science
Period: 2021–2025, Project ID: 21H05054
PI: Satoshi NAKAMURA, Professor, Graduate School of Science and Technology,
Nara Institute of Science and Technology




JST CREST Project

Project Title: TAPAS: Training Adapted Personalised Affective Social Skills with Cultural Virtual Agents

Japan Science and Technology Agency, JST-ANR CREST
Period: 2019–2024 (project completed)
PI: Satoshi NAKAMURA, Professor, Graduate School of Science and Technology,
Nara Institute of Science and Technology



Important links:

Augmented Human Communication Laboratory, Information Science, Lab Page (2011.4-2024.3):

The laboratory was closed following the retirement of Professor Satoshi Nakamura.
These pages provide access to archived information about the AHC Lab, including its members, research topics, projects, and publications.
http://ahcweb01.naist.jp/en/ (English)
http://ahclab.naist.jp/ (Japanese)

Augmented Human Communication Laboratory, Information Science, Official Page (2011.4-2024.3):

The laboratory was closed due to the retirement of Professor Satoshi Nakamura.
https://isw3.naist.jp/Research/mi-ahc-en.html (English)
https://isw3.naist.jp/Research/mi-ahc-ja.html (Japanese)

JSPS Grants-in-Aid for Scientific Research, Japan Society for the Promotion of Science, KAKEN-S:
Period: 2021–2025, Project ID: 21H05054

Automatic speech-to-speech translation has long been a dream technology for humanity. Through numerous breakthroughs from years of research, we have now reached the stage where this service can be used on smartphones. However, there are still many challenges remaining before we can achieve translation quality comparable to that of a trained simultaneous interpreter. Key issues include how to translate across languages with different word orders, such as between English and Japanese, without waiting for the end of a sentence or utterance; how to balance latency and content fidelity in translation; and how to extract the speaker’s intent from their intonation.
Prof. Satoshi Nakamura has been working on automatic speech-to-speech translation/interpretation research with JSPS funding since 2014.
https://kaken.nii.ac.jp/en/grant/KAKENHI-PROJECT-21H05054
AHC Lab Website: http://ahclab.naist.jp/page-4794/kakenhi-ngst/ (Japanese)

Japan Science and Technology Agency, JST-ANR CREST Project: TAPAS: Training Adapted Personalised Affective Social Skills with Cultural Virtual Agents
Period: 2019–2024 (project completed)

This project aims to develop tools and methods for Social Skill Training (SST) for a diverse range of populations. Social skills refer to the ability to manage verbal and nonverbal behaviors during interactions with one or more persons. People who face difficulties interacting with others often struggle to use their own social behaviors appropriately and to interpret those of others. Therapists use SST to help individuals practice social interaction and overcome their social fears. It relies on role play as a means of placing participants in a controlled social situation.
https://www.jst.go.jp/kisoken/aip/colab/image/researchers/pdf/1111098_30263429_en.pdf


JSPS KAKEN-S Project

A Study on Multi-modal Automatic Simultaneous Interpretation System and Evaluation Method
Official link: https://kaken.nii.ac.jp/en/grant/KAKENHI-PROJECT-21H05054
AHC Lab Website: http://ahclab.naist.jp/page-4794/kakenhi-ngst/ (Japanese)

Outline of Research

In this study, we address the following three key challenges:
Challenge 1: Multimodal Simultaneous Interpretation Methods
We aim to advance multimodal simultaneous interpretation by incorporating paralinguistic speech translation, video, and pre-existing and external knowledge sources. This includes optimizing interpretation output and enhancing incremental speech translation techniques.
Challenge 2: Evaluation Methods and Real-Time Assessment Technologies for Interpretation Quality
We will analyze the interpretation process and develop technologies to support human interpreters. Furthermore, we aim to establish unified evaluation methods applicable to both human and machine interpretation, including objective and automatic evaluation techniques based on sensing data such as brain activity (a sketch of one standard latency metric follows this outline).
Challenge 3: Corpus Construction and System Development
This includes alignment of interpretation timing and quality annotations, corpus augmentation, construction of practical interpretation systems, and the development of an ecosystem for data collection and refinement using active learning. We also aim to establish lifelong learning methods.
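
To make the kind of objective, automatic evaluation mentioned in Challenge 2 concrete, the following minimal Python sketch computes Average Lagging, a latency metric widely used in the simultaneous translation literature. It is an illustration only: the read/write schedule in the toy example is hypothetical, and this is not necessarily the metric adopted in the project.

def average_lagging(g, src_len, tgt_len):
    """Average Lagging for one sentence.

    g[t-1] is the number of source tokens that had been read when target
    token t was emitted (1-indexed positions, as in the usual definition).
    """
    gamma = tgt_len / src_len
    # tau: first target position at which the full source had been read
    tau = next((t for t, read in enumerate(g, start=1) if read >= src_len), len(g))
    return sum(g[t - 1] - (t - 1) / gamma for t in range(1, tau + 1)) / tau


# Toy usage: a wait-3 policy on a 6-token source producing 6 target tokens
# reads 3 tokens up front and then one more per emitted token, giving AL = 3.
if __name__ == "__main__":
    schedule = [3, 4, 5, 6, 6, 6]
    print("AL =", average_lagging(schedule, src_len=6, tgt_len=6))
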

Research Outcomes

Challenge 1: Multimodal Simultaneous Interpretation Methods
A) For the issue of "emphasis," we worked on optimal output that combines prosodic speech features and linguistic expressions, with a focus on focalization. We conducted a fundamental study of speech conversion and synthesis technologies with paralinguistic control capabilities. Regarding expressive speech translation, we investigated speaker-specific prosodic synchronization and the individuality of facial expressions in emotional speech. We also explored methods for identity preservation during keyframe interpolation in video generation.
B) As a case study in subtitle translation, we attempted pre-adaptation by explicitly providing information such as domain and character attributes.
C) To optimize interpretation output, we investigated interpreting strategies based on the Local Agreement and AlignAtt methods (a minimal sketch of the Local Agreement idea follows below) and worked on making the language processing module of the speech synthesizer incremental.
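
To illustrate the Local Agreement idea mentioned in item C), here is a minimal Python sketch: after each new input chunk the growing source prefix is re-translated, and only the tokens on which the two most recent hypotheses agree are committed. The translate function and the toy chunks are hypothetical placeholders, not the project's actual system.

def longest_common_prefix(a, b):
    """Return the leading tokens shared by token lists a and b."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


def local_agreement(chunks, translate):
    """Yield newly committed target tokens after each incoming source chunk."""
    source_so_far = []
    previous_hyp = []
    committed = 0  # number of target tokens already emitted
    for chunk in chunks:
        source_so_far.extend(chunk)
        current_hyp = translate(source_so_far)  # re-translate the whole prefix
        stable = longest_common_prefix(previous_hyp, current_hyp)
        if len(stable) > committed:
            yield stable[committed:]  # emit only the newly stable tokens
            committed = len(stable)
        previous_hyp = current_hyp


# Toy usage with an identity "translator" that simply echoes the source tokens.
if __name__ == "__main__":
    def echo_translate(source_tokens):
        return list(source_tokens)

    chunks = [["kyou", "wa"], ["ame", "ga"], ["futte", "imasu"]]
    for new_tokens in local_agreement(chunks, echo_translate):
        print("emit:", new_tokens)

In a real system, any remaining uncommitted tokens are flushed when the utterance ends; the AlignAtt strategy mentioned above instead relies on the model's attention to the most recent input frames to decide which tokens are safe to emit.
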
Challenge 2: Evaluation Methods and Real-Time Assessment Technologies for Interpretation Quality
A) We further analyzed interpretation strategies such as "consecutive delivery" and "omission." In combination with incremental translation techniques, we explored their practical applications and identified technologies useful for interpreter assistance.
B) We investigated automatic interpretation quality metrics that consider interpreters' perspectives, such as the degree of consecutive delivery.
C) Using EEG, we advanced research on syntactic structures that induce high cognitive load and on the relationship between word order variations and cognitive load, employing phase-amplitude coupling (PAC) to analyze such load (an illustrative PAC sketch follows below).
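
As a pointer to how PAC is commonly quantified, the following Python sketch computes the mean-vector-length modulation index between a low-frequency phase and a high-frequency amplitude envelope. The frequency bands, sampling rate, and synthetic signals are assumptions made for this example only; it is not the project's analysis pipeline.

import numpy as np
from scipy.signal import butter, filtfilt, hilbert


def bandpass(x, low, high, fs, order=4):
    """Zero-phase Butterworth band-pass filter."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x)


def pac_mvl(x, fs, phase_band=(4, 8), amp_band=(30, 80)):
    """Mean-vector-length PAC between a low-frequency phase and a high-frequency amplitude."""
    phase = np.angle(hilbert(bandpass(x, phase_band[0], phase_band[1], fs)))
    amplitude = np.abs(hilbert(bandpass(x, amp_band[0], amp_band[1], fs)))
    return np.abs(np.mean(amplitude * np.exp(1j * phase)))


# Toy usage: a 40 Hz component whose amplitude is modulated by the phase of a
# 6 Hz rhythm should yield a clearly larger PAC value than an unmodulated control.
if __name__ == "__main__":
    fs = 500
    t = np.arange(0, 10, 1 / fs)
    rng = np.random.default_rng(0)
    theta = np.sin(2 * np.pi * 6 * t)
    coupled = theta + (1 + theta) * 0.3 * np.sin(2 * np.pi * 40 * t) + 0.1 * rng.standard_normal(t.size)
    control = theta + 0.3 * np.sin(2 * np.pi * 40 * t) + 0.1 * rng.standard_normal(t.size)
    print("coupled PAC:", pac_mvl(coupled, fs))
    print("control PAC:", pac_mvl(control, fs))
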
Challenge 3: Corpus Construction and System Development
A) We enhanced an interpretation parallel corpus using automatic alignment and explored its application in simultaneous interpretation systems and quality evaluation.
B) We developed policies for constructing a 50-hour multimodal paralinguistic annotated corpus and a 50-hour prior knowledge corpus.
C) We integrated and evaluated modules, and continued to improve system performance through participation in IWSLT evaluation tasks, aiming to design and implement a sustainable development ecosystem.

Member list

Principal Investigator
Prof. Satoshi Nakamura, NAIST
Co-Investigators
Prof. Tatsuya Kawahara, Kyoto University
Prof. Tomoki Toda, Nagoya University
Prof. Shigeo Morishima, Waseda University
Prof. Hiroshi Saruwatari, The University of Tokyo
Prof. Kayo Matsushita, Rikkyo University
Prof. Katsuhito Sudoh, Nara Women's University
Prof. Shinnosuke Takamichi, Keio University
Prof. Taro Watanabe, NAIST
Prof. Sakriani Sakti, NAIST
Prof. Hiroki Tanaka, International Christian University
Dr. Seitaro Shinagawa, NAIST (-2024.3)
Prof. Yu Yamada, Rikkyo University (-2024.3)

JST CREST Project
(finished in December 2024)

ANR-CREST: TAPAS
Training Adapted Personalised Affective Social Skills with Cultural Virtual Agents
Official link: https://www.jst.go.jp/kisoken/aip/colab/image/researchers/pdf/1111098_30263429_en.pdf

Outline of Research

Social skills are the abilities needed to manage verbal and nonverbal behaviors when interacting with one or more persons. People with social skill deficits have difficulty controlling their own social behaviors and interpreting those of others. Social Skill Training (SST) is a well-grounded method that lets such people experience social interaction, reduce social stress, and practice these skills. SST is provided by clinical psychologists and psychiatrists and includes role-play that simulates actual situations. The final goal of SST is to prepare participants for real social situations. In this research, we aim to create virtual agents that mimic human SST specialists.

We analyze human social skills by dividing them into several steps and develop training methods for each. The objective is to develop methods and tools that reduce social stress in everyday situations, including public speaking at school and in the workplace. Target populations include people with Social Anxiety Disorder (SAD), schizophrenia, Autism Spectrum Disorder (ASD), and various other social-pathological conditions. In addition to SST, a Cognitive Behavioral Therapy (CBT) platform, which addresses cognitive distortions, will also be integrated.

Members

Professor Satoshi Nakamura

Specially Appointed Professor
Professor Emeritus
IEEE Life Fellow, ISCA Fellow, IPSJ Fellow, ATR Fellow.

Google Scholar:
https://scholar.google.com/citations?user=ckdfXawAAAAJ&hl=ja&oi=ao
Personal Page1:
http://ahclab.naist.jp/Prof.Nakamura/index_e.html
IEEE Signal Processing Article:
https://signalprocessingsociety.org/newsletter/2021/09/member-highlights-satoshi-nakamura

Manami Matsuda

Administrative Assistant