Speech processing

Formerly headed by Dr. Beat Pfister, the speech processing group focused on text-to-speech (TTS) synthesis and speech recognition. These two topics share a peculiarity that sets them apart from most other topics in speech processing, such as speech coding or speaker recognition: both involve the two surface structures of natural language, namely text and speech. The aim is to transform one surface structure into the other, i.e., text into speech or vice versa.

It is commonly acknowledged that this cannot be achieved without linguistic knowledge of the language(s) concerned. To recognize speech in a specific language, we must know the set of phonemes of that language, the words that belong to it and their pronunciation, which words can form expressions and sentences, etc. Our intention is always to use the type of representation that matches the type of knowledge to be represented.

Major research directions were:

Rule-Based Language Model for Speech Recognition,
Improving Speech Recognition Through Linguistics,
Prosody Control in Polyglot TTS Synthesis,
Speaker Verification.

Further information can be found on the home page of the speech processing group.

Following Dr. Beat Pfister's retirement in January 2016, this research area has been discontinued at TEC.

Additional Information

Contact

Prof. em. Dr. Lothar Thiele

Professor Emeritus at the Department of Information Technology and Electrical Engineering

Inst. f. Techn. Informatik u. K.
Gloriastrasse 35
8092 Zürich
Switzerland