Transformers of multiple modalities for more natural spoken dialog

The goal of the project is to research more natural spoken dialog systems based on the Transformer framework. Because Transformers handle sequence-to-sequence tasks well, they are already widely used for natural language understanding and natural language generation; we focus instead on the cases where the input or output of the neural network is speech. To convert speech into a semantic representation or dialog intents, we will use a speech recognizer as a black box, and we plan to develop methods and approaches for processing speech lattices in general Transformer or recurrent neural networks. The inverse process of generating speech from intents will employ pre-trained Transformer models for language generation together with recent DNN-based speech synthesis architectures. The dialog manager will use attention mechanisms to keep track of the dialog state and to generate consistent prompts in an informal, conversational style. The challenging task of synthesizing speech in a given speaking style will be supported by a recorded corpus of conversational speech.
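The attention mechanism mentioned above for dialog-state tracking can be illustrated with a minimal, framework-free sketch of scaled dot-product attention. The toy vectors and the attend-over-dialog-history framing are illustrative assumptions only, not the project's actual model, which would be built in a deep-learning framework:

```python
import math

def scaled_dot_product_attention(query, keys, values):
    """Attend over `keys` with `query`; return weights and the
    weighted sum of `values` (the "context" vector).

    Pure-Python illustration of the standard attention formula
    softmax(q.k / sqrt(d)) used inside Transformers.
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax over the scores to obtain attention weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted combination of the value vectors.
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context

# Toy "dialog history": the current turn attends over two past turns
# (hypothetical 2-dimensional embeddings, chosen for readability).
history_keys = [[1.0, 0.0], [0.0, 1.0]]
history_vals = [[1.0, 0.0], [0.0, 1.0]]
current_turn = [1.0, 0.0]
weights, context = scaled_dot_product_attention(
    current_turn, history_keys, history_vals)
```

In this sketch the first history turn matches the current turn more closely, so it receives the larger attention weight and dominates the resulting context vector.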

Coordinator: Doc. Ing. Matoušek Jindřich, Ph.D.
Investor: GAČR GA22-27800S
Years: 2022–2024