Speech Technology to Provide Access to Digital
Grupo TLATOA (CENTIA), Universidad de las Américas-Puebla
Sta. Catarina Mártir, Cholula, Pue., MEXICO
Abstract. Thanks to advances in processing speed, storage capacity and the handling of sound and video devices, speech technology is now feasible in almost any kind of computerized system. Speech applications run on personal computers, cellular phones, etc., which makes this technology accessible to almost anyone. Among its most useful applications are telephone-based information services, banking and computer-assisted language learning systems. A large number of commercial products with speech interfaces already exist, developed mainly for English, German and Japanese. That is why a strong effort is being made to make this technology available to the Spanish-speaking community as well. Alongside advances in fundamental research on speech technologies, human-computer interaction factors specific to each user’s culture and language need to be considered and included in information systems. Our work has been to perform basic research on speech processing techniques that can improve the performance of speech recognition and synthesis (artificial neural networks, hidden Markov models (HMMs), unit selection, etc.), as well as on the Spanish language, dialogue structure, perception and human-computer interaction approaches, for the development of speech applications.
1. Introduction
The rapid growth of the Internet has produced an extremely large network of information, interconnecting countries and cultures. This global connectivity opens many opportunities for people to access data and exchange information in order to solve different problems. However, it is not yet available to everybody. Current technology provides ever-improving human-computer interfaces, but large communities of citizens in different countries still cannot use a computer, mainly because of the economic, educational and technological disadvantages found in those areas. Our effort therefore needs to focus on research and development that overcomes these barriers and provides access to digital information and communication to broader communities of users, helping to bridge the digital divide [1].
Some of the fundamental problems faced by researchers in this area are the representation, sharing, access, retrieval and translation of information already present on the Internet in other languages. In addition, heterogeneous information networks, which differ in design, reliability and performance, can make communication difficult.
However, once the digital data is available, the important issue is the channels through which access is provided. Our work is oriented towards researching, developing and adapting speech technology to provide digital services in Mexican Spanish. In this way we can improve access to digital services through voice interfaces (for example, via the telephone).
2. Development of Speech Technology for Mexican Spanish
Several projects sponsored by NSF and CONACyT have supported the creation of a group dedicated to the research and development of speech technology in Spanish. The group is growing with undergraduate and graduate students, each working on a thesis linked to one of the projects carried out with other institutions or sponsored by CONACyT.
Specifically, we have been working in the following areas of automatic speech processing: speech recognition, synthesis, conversational systems, application development, speech technology in education and corpus development. The following sections briefly describe each of these areas.
3. Speech Recognition
Research focuses on improving the methods for training and developing speech recognizers, as well as on testing, so that their performance can be compared with that of other recognizers [2, 3]. We have recently trained a general-purpose recognizer based on neural networks and another based on HMMs, using a larger speech corpus. Their performance has been tested on specific-purpose corpora with excellent recognition rates [4]. However, the corpora used to train these recognizers do not reflect all the dialectal zones of Mexico, so there is still room for improvement once more data becomes available.
4. Speech Synthesis
Until now, the speech applications we have developed have used our Mexican Spanish voice, provided with the CSLU Toolkit for use with Festival. Although this synthesis was later improved by adding a duration estimation module [5], the voice still sounds robotic. We have now started to develop a new, higher-quality voice using a unit selection approach. In this approach a speech corpus, recorded in high quality by a professional speaker, is searched for large units of speech that are then concatenated. These units can be entire phrases, parts of phrases, words or, if none can be found, syllables. The result is much more natural-sounding speech. The program developed can use any pre-recorded corpus, provided it is completely transcribed and labeled at the word and phoneme level [6].
5. Conversational Interfaces
The term “conversational” can have different meanings depending on the context, but in general it refers to an interactive system that works in a restricted domain. Although many speech interfaces are considered conversational, they differ largely in one main aspect: the degree to which the system takes an active role in the conversation [7].
Conversational systems allow users to interact with an automated system to retrieve information, perform transactions or carry out other tasks to solve a certain problem.
The architecture of this kind of system varies significantly according to its capabilities and the degree of flexibility it offers. Some systems are only capable of responding to specific commands, while others offer a flexible and continuous dialogue.
5.1 Architecture of Conversational Interfaces
The elements required in a conversational system (see figure 1) are:
• A speech recognition module to convert speech to text.
• The Natural Language Understanding (NLU) module that obtains the semantic
representation of the recognized speech.
• The dialogue manager, which keeps the control of the interaction, and can decide who is
to have a more active role in the dialogue, the speaker or the system.
• A language generating module which puts the retrieved information or the questions of
the system in understandable and correct sentences / phrases.
• A mechanism to transmit the information to the user, which can include, apart from
speech synthesis, other devices to visually display the data as well.
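The five elements listed above can be read as a chain of stages. The following is only a minimal sketch of that chain, not the architecture of any system described in this paper: every function name, the `Frame` class and the toy `SCHEDULE` data are assumptions made for illustration, and the recognizer and synthesizer are replaced by text stand-ins.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical toy database of course schedules (illustration only).
SCHEDULE = {"algebra": "Monday 9:00", "physics": "Tuesday 11:00"}


@dataclass
class Frame:
    """Semantic representation produced by the NLU stage."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def recognize(audio: str) -> str:
    """Stand-in for speech recognition: the 'audio' is already text."""
    return audio.lower().strip()


def understand(text: str) -> Frame:
    """Keyword-spotting NLU: look for a known course name in the text."""
    for course in SCHEDULE:
        if course in text:
            return Frame("ask_schedule", {"course": course})
    return Frame("unknown")


def manage_dialogue(frame: Frame) -> Dict[str, str]:
    """Dialogue manager: answer if possible, otherwise ask for the course."""
    course = frame.slots.get("course")
    if frame.intent == "ask_schedule" and course:
        return {"act": "inform", "course": course, "time": SCHEDULE[course]}
    return {"act": "request", "slot": "course"}


def generate(act: Dict[str, str]) -> str:
    """Language generation: turn the system act into a sentence."""
    if act["act"] == "inform":
        return f"{act['course'].capitalize()} meets on {act['time']}."
    return "Which course are you asking about?"


def synthesize(sentence: str) -> str:
    """Stand-in for speech synthesis: return the text that would be spoken."""
    return sentence


def respond(audio: str) -> str:
    """Run one user turn through all five stages in order."""
    return synthesize(generate(manage_dialogue(understand(recognize(audio)))))


if __name__ == "__main__":
    print(respond("When is ALGEBRA?"))
    print(respond("Hello"))
```

Keeping each stage behind its own function means a real recognizer or synthesizer could replace a stand-in without touching the dialogue manager.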
5.2 Conversational Interfaces for Spanish Language
Based on CSLU’s robust parser, we initially integrated natural language processing into query systems that allowed students to obtain information about their courses via a speech interface [8]. This is a good approach for developing NLU interfaces in a fairly short time using the CSLU Toolkit. There are, however, much more powerful architectures that provide better capabilities for mixed-initiative conversational interfaces.
In order to provide access to information we are beginning to work with the CU-Communicator
Figure 1. Typical architecture of a conversational interface.
It is our objective to develop conversational applications that can carry on a dialogue in Spanish, detect the user’s degree of expertise with the system and allow flexible communication in which the system can take or relinquish the initiative. The aim is to obtain enough information from the user to perform the required task while giving the user the chance to volunteer information beyond what was specifically asked for, speeding up the process of completing the information.
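The behaviour described above, where the system asks only for what is still missing and accepts more than it asked for, is often implemented as slot filling. The sketch below is a hedged illustration under assumed names: `REQUIRED_SLOTS`, `KNOWN_VALUES` and all functions are hypothetical, the matching is plain word lookup, and the example uses English words for readability rather than the Mexican Spanish modules discussed in this paper.

```python
from typing import Dict, List, Optional

# Hypothetical task: a course schedule query needs these two slots.
REQUIRED_SLOTS = ["course", "day"]
KNOWN_VALUES = {
    "course": ["algebra", "physics"],
    "day": ["monday", "tuesday"],
}


def extract_slots(utterance: str) -> Dict[str, str]:
    """Pick up every known slot value mentioned, not only the one asked for."""
    words = utterance.lower().replace(",", " ").replace(".", " ").split()
    found = {}
    for slot, values in KNOWN_VALUES.items():
        for value in values:
            if value in words:
                found[slot] = value
    return found


def next_prompt(filled: Dict[str, str]) -> Optional[str]:
    """System initiative: ask for the first missing slot, if any."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled:
            return f"Which {slot}?"
    return None


def run_dialogue(user_turns: List[str]) -> Dict[str, object]:
    """Consume user turns until all required slots are filled."""
    filled: Dict[str, str] = {}
    prompts: List[str] = []
    for utterance in user_turns:
        filled.update(extract_slots(utterance))
        prompt = next_prompt(filled)
        if prompt is None:
            break  # enough information to form a database query
        prompts.append(prompt)
    return {
        "slots": filled,
        "prompts": prompts,
        "complete": next_prompt(filled) is None,
    }


if __name__ == "__main__":
    # An expert user gives everything at once and is never prompted.
    print(run_dialogue(["physics on tuesday"]))
    # A newcomer is guided one slot at a time.
    print(run_dialogue(["I need a schedule", "algebra", "monday"]))
```

In this sketch the expert finishes in one turn, while the newcomer receives one prompt per missing slot; both end with the same filled slots.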
Mixed-initiative techniques try to give more flexibility to an expert user and guidance to the newcomer, so that both can reach the same goal in a user-friendly environment. This implies monitoring the turns taken in the dialogue between the user and the system, both of which are trying to reach the same goal (depending on the task, this can be the retrieval of information). The implementation of the modules needed to maintain a mixed-initiative interaction in Mexican Spanish includes natural language analysis and processing [10] and a user agent that monitors the dialogue and registers the elements relevant in the context (filling the slots of required information) and the data given by the user, until there is enough to form a query to a database. The development of mechanisms to make digital data accessible to multiple users, together with query languages to access databases, implies the definition of task-specific vocabularies, dialogue structures and language models.
6. Speech Application Development
Several small applications have been developed as demonstration systems, such as voice mail and a system to access e-mail via the telephone, among others [11, 12]. Additionally, we collaborated with SpeechWorks Intl. in the development of an auto-attendant for the university, as well as a system that allows students to check their account status with the university, also via the telephone.
One very interesting application field for speech technology is education. As part of a larger project still to be developed, we created a couple of tools for a computer-assisted language-learning environment. These tools are a first prototype for pronunciation verification and a bilingual dictionary, both for students of Spanish whose native language is American English [13, 14]. Another very interesting aspect is the use of speech-technology-based systems to support language acquisition for deaf children. We are developing a system based on the CSLU Toolkit as a first prototype for the Jean Piaget Special Education School in Puebla. The main challenge here is to create a tool that can also be used by children who do not yet speak (modulating correctly), and to design the interface and the content of the lessons so that they are easy for the students to use and easy for the teachers to manage.
7. Corpus Development
None of the previous projects could have been carried out without a sufficiently complete speech corpus. Our group has recorded a corpus of 550 speakers, mainly from the central area and a few from the north of Mexico, speaking a large variety of words (names, numbers, digits, letters) as well as spontaneous speech. This has been one of the most time-consuming tasks, but the corpus, completely transcribed and labeled, is now available at no cost for educational and research purposes. Recording speech corpora is a continuing effort, aiming to cover all the dialectal zones of the country in the near future, as well as different ages, including children.
8. Conclusions
Our main interest is, first, to produce students with the knowledge and capability to work with speech technology and with an awareness of the CHI factors specific to their environment. Second, we focus on developing technology that is accessible to Mexican people, as well as applications that use speech technology to address the real needs of Mexican society.
References
[1] B. Shneiderman, CUU: Bridging the Digital Divide with Universal Usability. Interactions, Special Issue 2001, Vol. VIII.2, (2001), pp. 11-15.
[2] M. Espinosa and B. Serridge, Comparación entre redes neuronales y modelos ocultos de Markov para el
reconocimiento de voz, utilizando el CSLU Toolkit, in Proceedings of ENC'99, Pachuca, Mexico, September 1999.
[3] M.A. Oliver and I. Kirschning, Evaluación de métodos de determinación automáticos de una transcripción
fonética, in Proceedings of ENC'99, Pachuca, Mexico, September 1999.
[4] E. Clemente, Entrenamiento y Evaluación de reconocedores de Voz de Propósito General basados en Redes
Neuronales feed-forward y Modelos Ocultos de Markov, Graduate Thesis, Dept. Computer Systems Engineering, UDLAP, June 2001.
[5] H. Meza, Modelos Estadísticos de Duración de los Fonemas en un Corpus de Español Mexicano, in Proceedings of CONIELECOM 2000, Cholula, Mexico, March 2000.
[6] L. Flores, Síntesis de Voz con Unit Selection, Graduate Thesis, Dept. Computer Systems Engineering, UDLAP,
[7] J. Glass, Challenges for Spoken Dialogue Systems, in Proceedings of 1999 IEEE ASRU Workshop, Keystone,
[8] O. Rosas, Sistema de Consultas utilizando Reconocimiento de Voz y Procesamiento de Lenguaje Natural.
Master Thesis, Dept. Computer Systems Engineering, UDLAP, June 1999.
[9] B. Pellom, W. Ward and S. Pradhan, The CU Communicator: An Architecture for Dialogue Systems.
International Conference on Spoken Language Processing (ICSLP), Beijing China, November 2000.
[10] R.D. Navarrete, R. Davila and A. Sánchez, SAVIA Traductor de un dominio restringido en una biblioteca
digital, in Proceedings of CONIELECOM 2000, Cholula, Puebla, March 2000.
[11] N. Munive and O. Cervantes, Un Sistema de Correo Electrónico y de Voz usando Reconocimiento de Voz,
Soluciones Avanzadas, 7, 69, (May 1999), 44-48.
[12] N. Munive, A. Vargas, B. Serridge, O. Cervantes and I. Kirschning, Entrenamiento de un reconocedor Fonético
de Dígitos para el Español de México usando el CSLU Toolkit, Computación y Sistemas, 3, 2, (1999), 98-104.
[13] I. Kirschning and N. Aguas, Verification of Correct Pronunciation of Mexican Spanish using Speech
Technology, in Proceedings of MICAI 2000: Advances in Artificial Intelligence, Mexican International Conference on Artificial Intelligence, Springer Verlag, 493-502, México, April 2000.
[14] I. Kirschning, N. Aguas, and A. Ahuactzin, Aplicación de Tecnología de Voz en la Enseñanza del Español, in
Proceedings of the 1er. Taller Internacional de Tratamiento del Habla, Procesamiento de Voz y el Lenguaje HAVOL 2000, Mexico, July 2000.