Speech Technology to Provide Access to Digital
Grupo TLATOA (CENTIA), Universidad de las Américas-Puebla
Sta. Catarina Mártir, Cholula, Pue., MEXICO
Thanks to advances in today’s technology in terms of the processing
speed of computers, storage space and the handling of sound and video devices,
speech technology is a reality in almost any kind of computerized system. Speech
applications are being used in personal computers, cellular phones, etc., which makes
this technology accessible to almost anyone. Among its most useful
applications are telephone-based information services, banking and
computer-assisted language learning systems. There already exist a large number of
commercial products that use speech interfaces, developed mainly for English,
German and Japanese. That is why a strong effort is being made to make this
technology available to the Spanish-speaking community as well. Alongside the
advances in fundamental research on improving speech technologies comes
the consideration of human-computer interaction factors, specific to every user’s
culture and language, that need to be included in information systems. Our work has
been to perform basic research on the different speech processing techniques, trying
to improve the performance of speech recognition and synthesis (artificial neural
networks, hidden Markov models (HMMs), unit selection, etc.), as well as on the
Spanish language, dialogue structure, perception and human-computer interaction
approaches, for the development of speech applications.
1. Introduction
The rapid growth of the Internet has produced an extremely large network of information,
interconnecting countries and cultures. This global connectivity opens many opportunities for people to access data and exchange information in order to solve different problems. However, it is still not available to everybody. Current technology provides ever-improving human-computer interfaces, but there are large communities of citizens in different countries that cannot use a computer. This is caused mainly by the economic, educational and technological lag found in those areas. Thus our effort needs to focus on research and development to overcome these barriers and provide access to digital information and communication to broader communities of users, trying to bridge the digital divide.
Some of the fundamental problems faced by researchers in this area are the representation,
sharing, access, retrieval and translation of information already present on the Internet in other languages. In addition, heterogeneous information networks, which differ in design, reliability and performance, can make communication difficult.
However, once the digital data is available, the important issue is the channels through which
access should be provided. Our work is oriented towards researching, developing and adapting
speech technology to provide digital services in Mexican Spanish. In this way we can improve
access to digital services through voice interfaces (for example, via the telephone).
2. Development of Speech Technology for Mexican Spanish
Several projects sponsored by NSF and CONACyT have provided the support for the creation
of a group dedicated to the research and development of speech technology in Spanish. The group is growing with undergraduate and graduate students, each working on a specific thesis project linked to one of the projects carried out with other institutions or sponsored by CONACyT.
We have been working in the following areas of automatic speech processing: speech
recognition, synthesis, conversational systems, application development, speech technology in
education and corpus development. The following sections briefly describe each of these areas.
3. Speech Recognition
Research focuses on improving the methods for training and developing speech
recognizers, as well as on testing them so that their performance can be compared with that of other
recognizers [2, 3]. We have recently trained a general-purpose recognizer based on neural networks
and another based on HMMs, using a larger speech corpus. Their performance has been tested on
specific-purpose corpora with excellent recognition rates. However, the corpora used to train
these recognizers do not reflect all the different dialectal zones of Mexico, so there is still
room for improvement once more data becomes available.
4. Speech Synthesis
Until now, the speech applications we have developed have used our Mexican Spanish voice
provided with the CSLU Toolkit for use with Festival. Although this speech synthesis was later
improved by adding a duration estimation module, the voice still sounds robotic. We
have now started to develop a new, better-quality voice using a Unit Selection approach to
synthesis. In this approach, a speech corpus recorded with high quality by a professional speaker
is searched to extract large units of speech, which are then concatenated. These units can be
entire phrases, parts of phrases, words or, if none of these can be found, syllables. The result is
much more natural-sounding speech. The program developed can use any pre-recorded corpus, which
needs to be completely transcribed and labeled (at word and phoneme level).
5. Conversational Interfaces
The term “conversational” can have different meanings depending on the context, but in
general it refers to an interactive system that works in a restricted domain. Although many speech interfaces are considered conversational, they differ largely in one main aspect: the degree to which the system takes an active role in the conversation.
Conversational systems allow users to interact with an automated system to retrieve
information, perform transactions or carry out other tasks to solve a certain problem.
The architecture of this kind of system varies significantly according to its capabilities and the
degree of flexibility it offers. Some systems are only capable of responding to specific commands,
while others offer a flexible and continuous dialogue.
5.1 Architecture of Conversational Interfaces
The elements required in a conversational system (see figure 1) are:
• A speech recognition module to convert speech to text.
• A natural language understanding (NLU) module that obtains the semantic
representation of the recognized speech.
• A dialogue manager, which keeps control of the interaction and can decide who is
to have the more active role in the dialogue, the speaker or the system.
• A language generation module, which puts the retrieved information or the questions of
the system into understandable and correct sentences or phrases.
• A mechanism to transmit the information to the user, which can include, apart from
speech synthesis, other devices to display the data visually as well.
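The chain of modules listed above can be illustrated with a minimal Python sketch. It is only an illustration of how the five elements connect, not the implementation used in our systems: every class and function name here is hypothetical, the "recognizer" and "synthesizer" are stand-ins that simply treat text as audio, and the NLU step is plain keyword spotting for an assumed course-schedule task.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Frame = Dict[str, str]  # simple semantic representation (hypothetical)


@dataclass
class ConversationalPipeline:
    recognize: Callable[[bytes], str]   # speech recognition: audio -> text
    understand: Callable[[str], Frame]  # NLU: text -> semantic frame
    manage: Callable[[Frame], Frame]    # dialogue manager: frame -> system act
    generate: Callable[[Frame], str]    # language generation: act -> sentence
    render: Callable[[str], bytes]      # output: sentence -> audio

    def turn(self, audio: bytes) -> bytes:
        """Process one user turn through the five modules in order."""
        text = self.recognize(audio)
        frame = self.understand(text)
        act = self.manage(frame)
        sentence = self.generate(act)
        return self.render(sentence)


def fake_recognizer(audio: bytes) -> str:
    # Stand-in: pretend the audio bytes already contain the transcription.
    return audio.decode("utf-8")


def keyword_nlu(text: str) -> Frame:
    # Assumed toy task: asking for the schedule of a numbered course.
    frame: Frame = {}
    for word in text.lower().split():
        if word == "schedule":
            frame["intent"] = "get_schedule"
        elif word.isdigit():
            frame["course"] = word
    return frame


def simple_manager(frame: Frame) -> Frame:
    # Ask for whatever is still missing, otherwise answer.
    if frame.get("intent") != "get_schedule":
        return {"act": "ask", "slot": "intent"}
    if "course" not in frame:
        return {"act": "ask", "slot": "course"}
    return {"act": "inform", "course": frame["course"]}


QUESTIONS = {
    "intent": "What would you like to know?",
    "course": "Which course number do you mean?",
}


def template_generator(act: Frame) -> str:
    if act["act"] == "ask":
        return QUESTIONS[act["slot"]]
    return f"Looking up the schedule for course {act['course']}."


def fake_synthesizer(sentence: str) -> bytes:
    # Stand-in for speech synthesis: return the sentence as bytes.
    return sentence.encode("utf-8")


pipeline = ConversationalPipeline(
    fake_recognizer, keyword_nlu, simple_manager,
    template_generator, fake_synthesizer,
)
```

Because each module is passed in as a plain function, any stand-in could in principle be swapped for a real recognizer, parser or synthesizer without changing the `turn` method.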
5.2 Conversational Interfaces for Spanish Language
Based on CSLU’s robust parser, we initially integrated natural language processing into
query systems that allowed students to obtain information about their courses via a speech interface. This is a good approach for developing some NLU interfaces in a fairly short time using the CSLU Toolkit. There are, however, much more powerful architectures that provide better capabilities for mixed-initiative conversational interfaces.
In order to provide access to information we are beginning to work with the CU-Communicator
Figure 1. Typical architecture of a conversational interface.
It is our objective to develop conversational applications that are able to carry out a dialogue in
Spanish, detecting the user’s degree of expertise with the system and allowing flexible communication in which the system can take or relinquish the initiative. The aim is to obtain enough information from the user to perform the required task while giving the user the chance to provide information that he or she already knows but that was not specifically asked for, which speeds up the process of completing the information.
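The behaviour described above, in which the system asks for what is missing but also accepts information the user volunteers, can be sketched as a small slot-filling loop in Python. This is a hedged illustration under our own assumptions, not the CU-Communicator API or the implementation of our applications: the class name, its methods and the three slots (`student_id`, `course`, `semester`) are all hypothetical.

```python
from typing import Dict, List, Optional


class SlotFillingDialogue:
    """Toy mixed-initiative state: the system prompts, the user may over-answer."""

    def __init__(self, required_slots: List[str]) -> None:
        self.required = list(required_slots)
        self.filled: Dict[str, str] = {}

    def update(self, user_slots: Dict[str, str]) -> None:
        # User initiative: accept every required slot the user supplied,
        # whether or not the system had asked for it in this turn.
        for name, value in user_slots.items():
            if name in self.required:
                self.filled[name] = value

    def next_prompt(self) -> Optional[str]:
        # System initiative: ask for the first slot that is still missing,
        # or return None when nothing is missing.
        for name in self.required:
            if name not in self.filled:
                return name
        return None

    def query(self) -> Optional[Dict[str, str]]:
        # A database query can be formed only once all slots are filled.
        return dict(self.filled) if self.next_prompt() is None else None


# Hypothetical task with three required pieces of information.
dialogue = SlotFillingDialogue(["student_id", "course", "semester"])
first_prompt = dialogue.next_prompt()

# The user answers the question and volunteers the semester as well.
dialogue.update({"student_id": "12345", "semester": "fall"})
second_prompt = dialogue.next_prompt()

dialogue.update({"course": "speech processing"})
final_query = dialogue.query()
```

In this sketch an expert user who supplies several slots at once finishes in fewer turns, while a newcomer is simply guided through the prompts one at a time.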
Mixed-initiative techniques try to give more flexibility to an expert user and guidance to the
newcomer, so that both can reach the same goal in a user-friendly environment. This implies
monitoring the turns taken in the dialogue between the user and the system, both trying to
reach the same goal (depending on the task, this can be the retrieval of information). The
implementation of the modules necessary to maintain a mixed-initiative interaction in Mexican
Spanish includes natural language analysis and processing, and a user agent that monitors
the dialogue and registers the elements relevant in the context (filling the slots of required
information) and the data given by the user, until there is enough to form a query to consult a
database. The development of mechanisms to make digital data accessible to multiple users,
together with query languages to access databases, implies the definition of task-specific
vocabularies, dialogue structures and language models.
6. Speech Application Development
Several small applications have been developed as demonstration systems, such as a voice mail system and a
system to access e-mail via the telephone, among others [11, 12]. Additionally, we collaborated with SpeechWorks Intl. in the development of an auto-attendant for the university, as well as a system that allows students to check their account status with the university, also via the telephone.
One very interesting application field for speech technology is education. As part of a larger
project to be developed, we created a couple of tools for a computer-assisted language-learning
environment. These tools are a first prototype for pronunciation verification and a bilingual
dictionary, both for students of Spanish whose native language is American English [13, 14].
Another very interesting aspect is the use of speech-technology-based systems to support
language acquisition for deaf children. We are developing a system based on the CSLU Toolkit
as a first prototype for the Jean Piaget Special Education School in Puebla. The main challenge
here is to create a tool that can also be used by children who do not yet speak (modulating
correctly), and to design the interface and the content of the lessons in a way that is easy to use for
the students and easy to manage for the teachers.
7. Corpus Development
None of the previous projects could have been carried out without a
sufficiently complete speech corpus. Our group has recorded a corpus of 550 speakers,
mainly from the central area and a few from the north of Mexico, speaking a large variety of
words (names, numbers, digits, letters) as well as spontaneous speech. This has been one of the
most time-consuming tasks, but this corpus, completely transcribed and labeled, is now available
to anyone free of charge for educational and research purposes. The recording of speech corpora is a
continuing effort, aiming to cover in the near future all the dialectal zones of the country, as
well as different ages, including children.
8. Conclusions
Our main interest is, first, to train students with the knowledge and capability to work with
speech technology and an awareness of the human-computer interaction issues specific to their environment. Second, we focus on the development of technology that is accessible to Mexican people and of applications that use speech technology to address the real needs of Mexican society.
References
[1] B. Shneiderman, CUU: Bridging the Digital Divide with Universal Usability. Interactions, Special Issue 2001, (2001), pp. 11-15.
[2] M. Espinosa and B. Serridge, Comparación entre redes neuronales y modelos ocultos de Markov para el reconocimiento de voz, utilizando el CSLU Toolkit, in Proceedings of ENC'99, Pachuca, Mexico, September 1999.
[3] M.A. Oliver and I. Kirschning, Evaluación de métodos de determinación automáticos de una transcripción fonética, in Proceedings of ENC'99, Pachuca, Mexico, September 1999.
[4] E. Clemente, Entrenamiento y Evaluación de reconocedores de Voz de Propósito General basados en Redes y Modelos Ocultos de Markov, Graduate Thesis, Dept. Computer Systems Engineering, UDLAP, June 2001.
[5] H. Meza, Modelos Estadísticos de Duración de los Fonemas en un Corpus de Español Mexicano, in Proceedings of CONIELECOM 2000, Cholula, Mexico, March 2000.
[6] L. Flores, Síntesis de Voz con Unit Selection, Graduate Thesis, Dept. Computer Systems Engineering, UDLAP.
[7] J. Glass, Challenges for Spoken Dialogue Systems, in Proceedings of 1999 IEEE ASRU Workshop.
[8] O. Rosas, Sistema de Consultas utilizando Reconocimiento de Voz y Procesamiento de Lenguaje Natural. Master Thesis, Dept. Computer Systems Engineering, UDLAP, June 1999.
[9] B. Pellom, W. Ward and S. Pradhan, The CU Communicator: An Architecture for Dialogue Systems. International Conference on Spoken Language Processing (ICSLP), Beijing, China, November 2000.
[10] R.D. Navarrete, R. Davila and A. Sánchez, SAVIA Traductor de un dominio restringido en una biblioteca digital, in Proceedings of CONIELECOM 2000, Cholula, Puebla, March 2000.
[11] N. Munive and O. Cervantes, Un Sistema de Correo Electrónico y de Voz usando Reconocimiento de Voz, 69, (May 1999), 44-48.
[12] N. Munive, A. Vargas, B. Serridge, O. Cervantes and I. Kirschning, Entrenamiento de un reconocedor Fonético de Dígitos para el Español de México usando el CSLU Toolkit, Computación y Sistemas, 2, (1999), 98-104.
[13] I. Kirschning and N. Aguas, Verification of Correct Pronunciation of Mexican Spanish using Speech Technology, in Proceedings of MICAI 2000: Advances in Artificial Intelligence, Mexican International Conference on Artificial Intelligence, Springer Verlag, 493-502, México, April 2000.
[14] I. Kirschning, N. Aguas, and A. Ahuactzin, Aplicación de Tecnología de Voz en la Enseñanza del Español, in Proceedings of the 1er. Taller Internacional de Tratamiento del Habla, Procesamiento de Voz y el Lenguaje HAVOL 2000, Mexico, July 2000.