260 results on '"Mariño Acebal, José Bernardo"'
Search Results
2. The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation
- Author
-
Formiga Fanals, Lluís, Henríquez Quintana, Carlos Alberto, Hernández Huerta, Adolfo, Mariño Acebal, José Bernardo, Monte Moreno, Enrique, Rodríguez Fonollosa, José Adrián, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Senyal, Teoria del (Telecomunicació); Natural language processing (Computer science); Ensenyament i aprenentatge::Aprenentatge de llengües [Àrees temàtiques de la UPC]; Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic [Àrees temàtiques de la UPC]; Signal theory (Telecommunication); Tractament del llenguatge natural (Informàtica)
- Abstract
This paper describes the UPC participation in the WMT 12 evaluation campaign. All systems presented are based on standard phrase-based Moses systems. Variations adopted several improvement techniques such as morphology simplification and generation and domain adaptation. The morphology simplification overcomes the data sparsity problem when translating into morphologically-rich languages such as Spanish by translating first to a morphology-simplified language and then leaving the morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from MT-output and reference alignment. Results show an improvement on TER, METEOR, NIST and BLEU scores compared to our baseline system, obtaining on the official test set more benefits from the domain adaptation approach than from the morphological generalization method.
- Published
- 2012
3. Automatic and human evaluation study of a rule-based and a statistical Catalan-Spanish machine translation systems
- Author
-
Ruiz Costa-Jussà, Marta, Farrús Cabeceran, Mireia, Mariño Acebal, José Bernardo, Rodríguez Fonollosa, José Adrián, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Statistical machine translation; Signal theory (Telecommunication); Enginyeria de la telecomunicació [Àrees temàtiques de la UPC]; Machine translation; Rule-based machine translation
- Abstract
Machine translation systems can be classified into rule-based and corpus-based approaches, in terms of their core technology. Since both paradigms have been widely used in recent years, one of the aims of the research community is to know how these systems differ in terms of translation quality. To this end, this paper reports a study and comparison of a rule-based and a corpus-based (specifically, statistical) Catalan-Spanish machine translation system, both of them freely available on the web. The translation quality analysis is performed in two different domains: journalistic and medical. The systems are evaluated using standard automatic measures, as well as by native human evaluators. Automatic results show that the statistical system performs better than the rule-based system. Human judgements show that in the Spanish-to-Catalan direction the statistical system also performs better than the rule-based system, while in the Catalan-to-Spanish direction it is the other way round. Although the statistical system obtains the best automatic scores, its errors tend to be penalized more by human judgements than the errors of the rule-based system. This can be explained by the fact that statistical errors are usually unexpected and do not follow any pattern.
- Published
- 2011
4. Leveraging online user feedback to improve statistical machine translation
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Formiga, Lluís, Barrón-Cedeño, Alberto, Marquez, Lluis, Henriquez, Carlos A, and Mariño Acebal, José Bernardo
- Abstract
In this article we present a three-step methodology for dynamically improving a statistical machine translation (SMT) system by incorporating human feedback in the form of free edits on the system translations. We target feedback provided by casual users, which is typically error-prone. Thus, we first propose a filtering step to automatically identify the better user-edited translations and discard the useless ones. A second step produces a pivot-based alignment between source and user-edited sentences, focusing on the errors made by the system. Finally, a third step produces a new translation model and combines it linearly with the one from the original system. We perform a thorough evaluation on a real-world dataset collected from the Reverso.net translation service and show that every step in our methodology contributes significantly to improving a general purpose SMT system. Interestingly, the quality improvement is due not only to the increase in lexical coverage, but also to better lexical selection, reordering, and morphology. Finally, we show the robustness of the methodology by applying it to a different scenario, in which the new examples come from an automatically Web-crawled parallel corpus. Using exactly the same architecture and models again provides a significant improvement in the translation quality of a general purpose baseline SMT system. Peer Reviewed. Postprint (author's final draft).
- Published
- 2015
5. Examen Final
- Author
-
Hernando, J., Mariño Acebal, José Bernardo, Oliveras Vergés, Albert, and Villares Piera, Nemesio Javier
- Abstract
Resolved
- Published
- 2015
6. Linguistic-based evaluation criteria to identify statistical machine translation errors
- Author
-
Farrús Cabeceran, Mireia, Ruiz Costa-Jussà, Marta, Mariño Acebal, José Bernardo, Rodríguez Fonollosa, José Adrián, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Senyal, Teoria del (Telecomunicació); Translating machines; Traducció; Enginyeria de la telecomunicació [Àrees temàtiques de la UPC]; Linguistic analysis (Linguistics)
- Abstract
Machine translation evaluation methods are necessary in order to analyze the performance of translation systems. Up to now, the most traditional methods have been automatic measures such as BLEU and quality perception judgements made by native human evaluators. In order to complement these traditional procedures, the current paper presents a new human evaluation based on expert knowledge about the errors encountered at several linguistic levels: orthographic, morphological, lexical, semantic and syntactic. The results obtained in these experiments show that some linguistic errors could have more influence than others when performing a perceptual evaluation.
- Published
- 2010
7. The TALP on-line Spanish-Catalan machine-translation system
- Author
-
Poch, M, Farrús Cabeceran, Mireia, Ruiz Costa-Jussà, Marta, Mariño Acebal, José Bernardo, Hernández, Adolfo, Henríquez Quintana, Carlos Alberto, Rodríguez Fonollosa, José Adrián, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Senyal, Teoria del (Telecomunicació); Natural language processing; Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic [Àrees temàtiques de la UPC]; Signal theory (Telecommunication); Llenguatge natural (Informàtica) -- Processament -- Congressos
- Abstract
In this paper the statistical machine translator (SMT) between Catalan and Spanish developed at the TALP research center (UPC) and its web demonstration are described.
- Published
- 2009
8. The TALP & I2R SMT Systems for IWSLT 2008
- Author
-
Li, H., Aw, A., Zhang, Ming, Khalilov, Maxim, Ruiz Costa-Jussà, Marta, Henríquez Quintana, Carlos Alberto, Rodríguez Fonollosa, José Adrián, Hernández, A., Mariño Acebal, José Bernardo, Banchs Martínez, Rafael Enrique, Chen, B., Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Senyal, Teoria del (Telecomunicació); Processament de la parla; Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic [Àrees temàtiques de la UPC]; Signal theory (Telecommunication); Speech processing systems; Machine translation
- Abstract
This paper gives a description of the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT’08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the 2008 systems’ architecture and outlines the translation schemes we have used, mainly focusing on the new techniques intended to improve speech-to-speech translation quality. The novelties we have introduced are: an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation tasks.
- Published
- 2008
9. Segmentación lingüística de tuplas para el modelado de la traducción estocástica mediante n-gramas
- Author
-
Gispert Ramis, Adrià de and Mariño Acebal, José Bernardo
- Subjects
Segmentación en tuplas; Ngram-based statistical machine translation; Modelo de traducción; Tuple segmentation; Traducción estocástica mediante n-gramas; Translation model
- Abstract
Ngram-based Statistical Machine Translation relies on a standard Ngram language model of bilingual units (tuples) to model the translation process. In training, this translation model requires a segmentation of each parallel sentence pair, which involves taking a hard decision on tuple segmentation when a word is not linked during word alignment. This is especially critical when this word appears in the target language, as this hard decision is compulsory. In this paper we present a thorough study of this situation, comparing for the first time each of the proposed techniques in two independent tasks, namely the English–Spanish European Parliament Proceedings large-vocabulary task and the Arabic–English Basic Travel Expressions small-data task. In light of this comparison, we present a novel segmentation technique which incorporates linguistic information. Results obtained in both tasks outperform all previous techniques.
This work was co-funded by the TC-STAR project (European Union, FP6-506738), the Generalitat de Catalunya and the European Social Fund.
- Published
- 2006
10. Joint training of codebooks and acoustic models in automatic speech recognition using semi-continuous HMMs
- Author
-
Nogueiras Rodríguez, Albino, Caballero Galeote, Mónica, Mariño Acebal, José Bernardo, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Automatic speech recognition; Reconeixement automàtic de la parla; Informàtica::Intel·ligència artificial::Llenguatge natural [Àrees temàtiques de la UPC]
- Abstract
In this paper, three different techniques for building semicontinuous HMM-based speech recognisers are compared: the classical one, using Euclidean generated codebooks and independently trained acoustic models; jointly reestimating the codebooks and models obtained with the classical method; and jointly creating codebooks and models growing their size from one centroid to the desired number of them. The way this growth may be done is carefully addressed, focusing on the selection of the splitting direction and the way splitting is implemented. Results in a large vocabulary task show the efficiency of the approach, with noticeable improvements both in accuracy and CPU consumption. Moreover, this scheme enables the use of the concatenation of features, avoiding the independence assumption usually needed in semi-continuous HMM modelling, and leading to further improvements in accuracy and CPU.
- Published
- 2006
11. Integración de reordenamientos en el algoritmo de decodificación en traducción automática estocástica
- Author
-
Crego Clemente, Josep María and Mariño Acebal, José Bernardo
- Subjects
Etiquetado POS; Reordenamiento; POS tagging; Reordering; Stochastic machine translation; Traducción automática estocástica; Algoritmos de decodificación; Decoding algorithms
- Abstract
This paper presents a reordering framework for statistical machine translation (SMT) where source-side reorderings are integrated into SMT decoding, allowing for a highly constrained reordered search graph. The monotone search is extended by means of a set of reordering patterns (linguistically motivated rewrite patterns). Patterns are automatically learnt in training from word-to-word alignments and source-side Part-Of-Speech (POS) tags. Traversing the extended search graph, the decoder evaluates every hypothesis making use of a group of widely used SMT models and helped by an additional Ngram language model of source-side POS tags. Experiments are reported on the Euparl task (Spanish-to-English and English-to-Spanish). Results are presented regarding translation accuracy (using human and automatic evaluations) and computational efficiency, showing significant improvements in translation quality for both translation directions at a very low computational cost.
This work was partially funded by the Spanish government, TIC-2002-04447-C02 (Aliado project), the European Union, FP6-506738 (TC-STAR project) and the Universitat Politècnica de Catalunya (UPC-RECERCA grant).
- Published
- 2006
12. Clasificación y generalización de formas verbales en sistemas de traducción estocástica
- Author
-
Gispert Ramis, Adrià de, Mariño Acebal, José Bernardo, and Crego Clemente, Josep María
- Subjects
Morphology; Linguistic knowledge; Stochastic machine translation; Morfología; Verb forms; Traducción estocástica; Conocimiento lingüístico; Formas verbales
- Abstract
This paper introduces a method to incorporate linguistic knowledge regarding verb forms into a stochastic machine translation model. By means of a rule-based classification of these forms, and by substituting them with the base form of the head verb during the training stage, we achieve a better statistical word alignment, which leads to a better estimate of the translation model. Furthermore, a successful generalization strategy can be devised to produce a new translation for unseen verb forms from the translations of seen verb forms. An evaluation of this method in an English to Spanish limited-domain translation task is presented, producing a significant performance improvement.
This work was partially funded by the CICYT through project TIC2002-04447-C02 (ALIADO), the European Union through project FP6-506738 (TC-STAR), and the "Departament de Universitats, Recerca i Societat de la Informació" of the Generalitat de Catalunya.
- Published
- 2005
13. Algoritmo de decodificación de traducción automática estocástica basado en n-gramas
- Author
-
Crego Clemente, Josep María, Mariño Acebal, José Bernardo, and Gispert Ramis, Adrià de
- Subjects
Stochastic machine translation; N-gram-based translation models; Modelos de traducción basados en N-gramas; Traducción automática estocástica; Algoritmos de decodificación; Decoding algorithms
- Abstract
In this paper we describe MARIE, an N-gram-based stochastic machine translation decoder. It is implemented using a beam search strategy, with distortion (or reordering) capabilities. The underlying translation model is based on bilingual N-grams, extended to introduce reordering in the word strings. The search graph structure is designed to perform very accurate comparisons, which allows for a high level of pruning, improving the decoder efficiency.
This work was partially funded by the Spanish government, TIC-2002-04447-C02 (Aliado project), the European Union, FP6-506738 (TC-STAR project) and the Universitat Politècnica de Catalunya (UPC-RECERCA grant).
- Published
- 2005
14. Examen Final
- Author
-
Oliveras Vergés, Albert, Mariño Acebal, José Bernardo, Hernando Pericás, Francisco Javier, and Villares Piera, Nemesio Javier
- Abstract
Resolved
- Published
- 2014
15. Improving statistical machine translation through adaptation and learning
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Mariño Acebal, José Bernardo, Banchs, Rafael E., and Henriquez Q., Carlos A.
- Abstract
With the arrival of free on-line machine translation (MT) systems came the possibility of improving automatic translations with the help of daily users. One of the methods to achieve such improvements is to ask users themselves for a better translation. The system may have made a mistake, and if the user is able to detect it, it would be valuable to let the user teach the system where it made the mistake so that it does not make it again in a similar situation. Most of the translation systems available on-line provide a text area for users to suggest a better translation (like Google translator) or a ranking system for them to use (like Microsoft's). In 2009, as part of the Seventh Framework Programme of the European Commission, the FAUST project started with the goal of developing "machine translation (MT) systems which respond rapidly and intelligently to user feedback". Specifically, one of the project objectives was to "develop mechanisms for instantaneously incorporating user feedback into the MT engines that are used in production environments, ...". As part of the FAUST project, this thesis focused on developing one such mechanism. Formally, the general objective of this work was to design and implement a strategy to improve the translation quality of an already trained Statistical Machine Translation (SMT) system, using translations of input sentences that are corrections of the system's attempt to translate them. To address this problem we divided it into three specific objectives: 1. Define a relation between the words of a correction sentence and the words in the system's translation, in order to detect the errors that the former is aiming to solve. 2. Include the error corrections in the original system, so it learns how to solve them in case a similar situation occurs. 3. Test the strategy in different scenarios and with different data, in order to validate the applications of the proposed methodology.
The main contributio
This thesis proposes a new method for improving a Statistical Machine Translation (SMT) system using post-edits of its automatic translations. The strategy can be associated with domain adaptation, treating the post-edits obtained from real users of the translation system as the material of the domain to adapt to. The method compares the post-edits with the automatic translations in order to automatically detect the places where the translator made an error, so as to learn from it. Once the errors have been detected, a word-level alignment is performed between the original sentences and the post-edits to extract translation units, which are then incorporated into the baseline system so that the errors are corrected in future translations. Our results show statistically significant improvements from a data set whose size represents 0.5% of the material used during training. Together with the automatic quality measures, we also present a qualitative analysis of the system to validate the results. The improvements in translation are observed mostly in the lexicon and in word reordering, followed by morphological corrections. The strategy, which introduces the concepts of augmented corpus, similarity function and derived translation units, is tested with two SMT paradigms (N-gram-based and phrase-based translation), with two language pairs (Catalan-Spanish and English-Spanish) and in different domain adaptation scenarios, including an open domain in which the system was adapted through requests collected from real users over the internet, obtaining similar results in all tests.
The results of this research are part of the FAUST project (Feedback Analysis for User adaptive Statistical Translation), a project of the Seventh Pr
Postprint (published version)
- Published
- 2014
16. Proyecto ALIADO : tecnologías del habla y el lenguaje para un asistente personal
- Author
-
Mariño Acebal, José Bernardo and Rodríguez Hontoria, Horacio
- Subjects
Conversión texto-voz; Búsqueda de la respuesta; Agent technology; Reconocimiento de voz; Question answering; Robustez; Speech recognition; Agente personal; Traducción estocástica; Robustness; Stochastic speech to speech translation; Text-to-speech conversion
- Abstract
ALIADO undertakes the development of spoken and written language technologies for the design of personal assistants in a multilingual environment. Special attention is paid to the design of the oral interface between the user and the assistant. We consider two examples of language-centred help that can be provided by the assistant: “question answering” and text or speech machine translation. The technologies developed will be used to implement two showcases.
ALIADO is funded by the Ministerio de Ciencia y Tecnología as a coordinated project (TIC2002-04447-C02).
- Published
- 2003
17. Examen Final
- Author
-
Bellot Pujalte, Pau, Bosio, Mattia, Hernando, J., Casas Pla, Josep Ramon, Salembier Clairon, Philippe Jean, Monte Moreno, Enrique, and Mariño Acebal, José Bernardo
- Abstract
Resolved
- Published
- 2013
18. The TALP-UPC phrase-based translation systems for WMT13: system combination with morphology generation, domain adaptation and corpus filtering
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural, Formiga Fanals, Lluís, Ruiz Costa-Jussà, Marta, Mariño Acebal, José Bernardo, Rodríguez Fonollosa, José Adrián, Barrón-Cedeño, Alberto, and Màrquez Villodre, Lluís
- Abstract
This paper describes the TALP participation in the WMT13 evaluation campaign. Our participation is based on the combination of several statistical machine translation systems based on standard phrase-based Moses systems. Variations include techniques such as morphology generation, training sentence filtering, and domain adaptation through unit derivation. The results show a coherent improvement on TER, METEOR, NIST, and BLEU scores when compared to our baseline system. Postprint (published version).
- Published
- 2013
19. Speech emotion recognition using hidden Markov models
- Author
-
Nogueiras Rodríguez, Albino (ORCID 0000-0002-3159-1718), Mariño Acebal, José Bernardo (ORCID 0000-0002-9471-8675), Bonafonte Cávez, Antonio (ORCID 0000-0002-6240-9915), Moreno Bilbao, M. Asunción (ORCID 0000-0002-1823-5970), Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Automatic speech recognition; Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic [Àrees temàtiques de la UPC]; Processament de la parla
- Abstract
This paper introduces a first approach to emotion recognition using RAMSES, the UPC’s speech recognition system. The approach is based on standard speech recognition technology using hidden semi-continuous Markov models. Both the selection of low level features and the design of the recognition system are addressed. Results are given on speaker dependent emotion recognition using the Spanish corpus of INTERFACE Emotional Speech Synthesis Database. The accuracy recognising seven different emotions—the six ones defined in MPEG-4 plus neutral style—exceeds 80% using the best combination of low level features and HMM structure. This result is very similar to that obtained with the same database in subjective evaluation by human judges.
- Published
- 2001
20. Disseny d’una activitat transversal basada en la realització d’un programa de televisió
- Author
-
Universitat Politècnica de Catalunya. Institut de Ciències de l'Educació, Mariño Acebal, José Bernardo, and Traverso Ferrà, Xavier
- Abstract
This master's thesis (TFM) is set within the Image and Sound family of higher vocational training cycles (CFGS), which comprises four cycles: Directing, Production, Image and Sound. These programmes are directly related to the audiovisual world. Students in these cycles receive specific training to carry out the tasks of their own area but are often unaware of the competences of the other specialities. The only point in the training at which cross-disciplinary work takes place is the Synthesis Credits (Crèdits de Síntesi), carried out at the end of the second year. Since in the workplace these specialities have to work together as a team, I propose an activity that fosters cross-disciplinary work, based on the making of a television programme, to be carried out at the end of the first year. Members of all four cycles of the professional family take part in this exercise, forming randomly assigned multidisciplinary groups. The resulting team must coordinate and organise itself to produce and direct a television programme. The characteristics of the programme are constrained, but it is the group's job to design the script and do the pre-production. The schedule is tightly defined in order to simulate a professional commission. The main implementation proposal is to turn the activity into a kind of first-year Synthesis Credit by creating a new credit from the centre's discretionary hours. Considerable effort is needed to fit the exercise into the school calendar, and the centre must have the necessary material resources. The activity simulates a professional working environment in which people work as a team but each member takes on the role assigned by their speciality. In this way key abilities are strengthened while competences from the other areas are learned, fostering cross-disciplinary work. It is a very powerful proposal that improves students' training and favours the creation of professional profiles in tune with the requirements of the audiovisual sector.
- Published
- 2012
21. Improving English to Spanish out-of-domain translations by morphology generalization and generation
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Formiga Fanals, Lluís, Hernández Huerta, Adolfo, Mariño Acebal, José Bernardo, and Monte Moreno, Enrique
- Abstract
This paper presents a detailed study of a method for morphology generalization and generation to address out-of-domain translations in English-to-Spanish phrase-based MT. The paper studies whether the morphological richness of the target language causes poor translation quality when translating out of domain. In detail, this approach first translates into simplified Spanish forms and then predicts the final inflected forms through a morphology generation step based on shallow and deep-projected linguistic information available from both the source- and target-language sentences. The results highlight the importance of generalization, and therefore generation, for dealing with out-of-domain data. Peer Reviewed, Postprint (published version)
- Published
- 2012
22. The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Formiga Fanals, Lluís, Henríquez Quintana, Carlos Alberto, Hernández Huerta, Adolfo, Mariño Acebal, José Bernardo, Monte Moreno, Enrique, and Rodríguez Fonollosa, José Adrián
- Abstract
This paper describes the UPC participation in the WMT 12 evaluation campaign. All systems presented are based on standard phrase-based Moses systems. Variations adopted several improvement techniques such as morphology simplification and generation and domain adaptation. The morphology simplification overcomes the data sparsity problem when translating into morphologically rich languages such as Spanish by first translating into a morphology-simplified language and then leaving the morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from MT-output and reference alignment. Results show an improvement in TER, METEOR, NIST and BLEU scores compared to our baseline system, with the official test set benefiting more from the domain adaptation approach than from the morphological generalization method. Peer Reviewed, Postprint (published version)
- Published
- 2012
23. Monolingual and bilingual spanish-catalan speech recognizers developed from SpeechDat databases
- Author
-
Mariño Acebal, José Bernardo, Padrell, J, Moreno Bilbao, M. Asunción, Nadeu Camprubí, Climent, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Telecomunicació, Telecommunication, Enginyeria de la telecomunicació [Àrees temàtiques de la UPC]
- Abstract
Under the SpeechDat specifications, the Spanish member of the SpeechDat consortium has recorded a Catalan database that includes one thousand speakers. This communication describes some experimental work that has been carried out using both the Spanish and the Catalan speech material. A speech recognition system has been trained for the Spanish language using a selection of the phonetically balanced utterances from the 4500 SpeechDat training sessions. Utterances with mispronounced or incomplete words and with intermittent noise were discarded. A set of 26 allophones was selected to account for the Spanish sounds, and clustered demiphones have been used as context-dependent sub-lexical units. Following the same methodology, a recognition system was trained from the Catalan SpeechDat database. Catalan sounds were described with 32 allophones. Additionally, a bilingual recognition system was built for both the Spanish and Catalan languages. By means of clustering techniques, a suitable set of allophones to cover both languages simultaneously was determined. Thus, 33 allophones were selected. The training material consisted of the whole Catalan training material and the Spanish material coming from the Eastern region of Spain (the region where Catalan is spoken). The performance of the Spanish, Catalan and bilingual systems was assessed under the same framework. The Spanish system exhibits significantly better performance than the other systems due to its better training. The bilingual system provides performance equivalent to that afforded by both language-specific systems trained with the Eastern Spanish material or the Catalan SpeechDat corpus.
- Published
- 2000
24. Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Mariño Acebal, José Bernardo, Farrús Cabeceran, Mireia, Ruiz Costa-Jussà, Marta, Poch, Marc, Hernández Huerta, Adolfo, Herníquez, Carlos, and Rodríguez Fonollosa, José Adrián
- Abstract
This work aims to improve an N-gram-based statistical machine translation system between the Catalan and Spanish languages, trained with an aligned Spanish–Catalan parallel corpus consisting of 1.7 million sentences taken from El Periódico newspaper. Starting from a linguistic error analysis of this baseline system, orthographic, morphological, lexical, semantic and syntactic problems are approached using a set of techniques. The proposed solutions include the development and application of additional statistical techniques, text pre- and post-processing tasks, and rules based on the use of grammatical categories, as well as lexical categorization. The performance of the improved system clearly increases, as shown in both human and automatic evaluations, with a gain of about 1.1 BLEU points observed in the Spanish-to-Catalan direction of translation, and a gain of about 0.5 points in the reverse direction. The final system is freely available online as a linguistic resource. Postprint (published version)
- Published
- 2011
25. Pla d'Entorn (Pla d'Autonomia de Centre) d'un IES d'extraradi de BCN
- Author
-
Universitat Politècnica de Catalunya. Institut de Ciències de l'Educació, Mariño Acebal, José Bernardo, and Arriasol Sabartés, Joan
- Published
- 2011
26. Automatic and human evaluation study of a rule-based and a statistical Catalan-Spanish machine translation systems
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Ruiz Costa-Jussà, Marta, Farrús Cabeceran, Mireia, Mariño Acebal, José Bernardo, and Rodríguez Fonollosa, José Adrián
- Abstract
Machine translation systems can be classified into rule-based and corpus-based approaches, in terms of their core technology. Since both paradigms have been widely used in recent years, one of the aims of the research community is to know how these systems differ in terms of translation quality. To this end, this paper reports a study and comparison of a rule-based and a corpus-based (specifically, statistical) Catalan-Spanish machine translation system, both of them freely available on the web. The translation quality analysis is performed in two different domains: journalistic and medical. The systems are evaluated using standard automatic measures, as well as by native human evaluators. Automatic results show that the statistical system performs better than the rule-based system. Human judgements show that in the Spanish-to-Catalan direction the statistical system also performs better than the rule-based system, while in the Catalan-to-Spanish direction it is the other way round. Although the statistical system obtains the best automatic scores, its errors tend to be penalized more by human judgements than the errors of the rule-based system. This can be explained by the fact that statistical errors are usually unexpected and do not follow any pattern. Postprint (published version)
- Published
- 2011
27. L’aula i l’accés a Internet
- Author
-
Universitat Politècnica de Catalunya. Institut de Ciències de l'Educació, Mariño Acebal, José Bernardo, and Domingo Alonso, Laura
- Abstract
The technological revolution has significantly influenced our society. The new generations have grown up immersed in information technologies, which has led schools to review the methodologies used so far and to propose new ones. ICT, and more specifically the Internet, can be a good resource to use in the classroom: to motivate students, to design activities that cater for diversity, etc. However, using the network entails a series of problems: the lack of teacher training, the need to guarantee security and content filtering, etc. In this project we discuss the need to gradually incorporate ICT and the Internet into classrooms, while using tools that provide security and help us keep control of what happens in the classroom. We propose the use of proxies to filter Internet access. We also propose the use of classroom control and management programs to support teachers. These tools let teachers know what is happening in class in real time, which makes it possible to enjoy the advantages of using the Internet while reducing the associated problems.
- Published
- 2011
28. La programació d’ordinadors com a eina pedagògica a l’ESO
- Author
-
Universitat Politècnica de Catalunya. Institut de Ciències de l'Educació, Mariño Acebal, José Bernardo, and López Rodríguez, Pau
- Published
- 2010
29. Linguistic-based evaluation criteria to identify statistical machine translation errors
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Farrús Cabeceran, Mireia, Ruiz Costa-Jussà, Marta, Mariño Acebal, José Bernardo, and Rodríguez Fonollosa, José Adrián
- Abstract
Machine translation evaluation methods are necessary in order to analyze the performance of translation systems. Up to now, the most traditional methods have been automatic measures such as BLEU and quality perception assessed by native human evaluators. In order to complement these traditional procedures, this paper presents a new human evaluation based on expert knowledge about the errors encountered at several linguistic levels: orthographic, morphological, lexical, semantic and syntactic. The results obtained in these experiments show that some linguistic errors could have more influence than others when performing a perceptual evaluation. Postprint (published version)
- Published
- 2010
30. GILABVIR: Virtual laboratories and remote laboratories in engineering. A teaching innovation group of interest
- Author
-
Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. Departament d'Enginyeria Elèctrica, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació, Universitat Politècnica de Catalunya. Departament d'Enginyeria Hidràulica, Marítima i Ambiental, Universitat Politècnica de Catalunya. LIM/UPC - Laboratori d'Enginyeria Marítima, Universitat Politècnica de Catalunya. IEB - Instrumentació Electrònica i Biomèdica, Universitat Politècnica de Catalunya. CITCEA - Centre d'Innovació Tecnològica en Convertidors Estàtics i Accionaments, Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering, Universitat Politècnica de Catalunya. SPCOM - Grup de Recerca de Processament del Senyal i Comunicacions, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Universitat Politècnica de Catalunya. ANTENNALAB - Grup d'Antenes i Sistemes Radio, Cabrera-Bean, Margarita, Bragós Bardia, Ramon, Pérez, Marimar, Mariño Acebal, José Bernardo, Rius Casals, Juan Manuel, Gomis Bellmunt, Oriol, Casany Guerrero, María José, and Gironella Cobos, Xavier
- Abstract
GILABVIR (Grup d’Interès en Laboratoris Virtuals i Remots) is a recently created Virtual and Remote Laboratory Group of Interest of UPC (Universitat Politècnica de Catalunya), integrated in a more general teaching innovation project, RIMA [1], [2]. RIMA has been developed to promote research on the use of innovative learning methodologies applied to engineering education, and it was specially created to assist in the new European higher education adaptation process. Postprint (published version)
- Published
- 2010
31. The TALP on-line Spanish-Catalan machine-translation system
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Poch, M, Farrús Cabeceran, Mireia, Ruiz Costa-Jussà, Marta, Mariño Acebal, José Bernardo, Hernández, Adolfo, Henríquez Quintana, Carlos Alberto, and Rodríguez Fonollosa, José Adrián
- Abstract
This paper describes the statistical machine translator (SMT) between Catalan and Spanish developed at the TALP research center (UPC) and its web demonstration. Postprint (published version)
- Published
- 2009
32. Desarrollo de un traductor automático estadístico catalán/castellano
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Mariño Acebal, José Bernardo, and Muñoz Sánchez, Roberto
- Published
- 2009
33. A second opinion approach for speech recognition verification
- Author
-
Hernández-Ábrego, G, Mariño Acebal, José Bernardo, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Telecomunicació, Telecommunication, Enginyeria de la telecomunicació [Àrees temàtiques de la UPC]
- Abstract
In order to improve the reliability of speech recognition results, a verifying system that takes advantage of the information given by an alternative recognition step is proposed. The alternative results are considered as a second opinion about the nature of the speech recognition process. Some features are extracted from both opinion sources and compiled, through a fuzzy inference system, into a more discriminant confidence measure able to verify correct results and disregard wrong ones. This approach is tested in a keyword spotting task taken from the Spanish SpeechDat database. Results show a considerable reduction of false rejections at a fixed false alarm rate compared to baseline systems.
- Published
- 1999
34. Fuzzy reasoning in confidence evaluation of speech recognition
- Author
-
Hernández-Abrego, G, Mariño Acebal, José Bernardo, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Telecomunicació, Telecommunication, Enginyeria de la telecomunicació [Àrees temàtiques de la UPC]
- Abstract
Confidence measures represent a systematic way to express the reliability of speech recognition results. A common approach to confidence measuring is to take advantage of the information that several recognition-related features offer and to combine them, through a given compilation mechanism, into a more effective way to distinguish between correct and incorrect recognition results. We propose to use a fuzzy reasoning scheme to perform the information compilation step. Our approach differs from the previously proposed ones because ours treats the uncertainty of recognition hypotheses in terms of
- Published
- 1999
35. Minimum confusibility training of context dependent demiphones
- Author
-
Nogueiras Rodríguez, Albino (ORCID 0000-0002-3159-1718), Mariño Acebal, José Bernardo (ORCID 0000-0002-9471-8675), Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Telecomunicació, Telecommunication, Enginyeria de la telecomunicació [Àrees temàtiques de la UPC]
- Abstract
In recent years, two different approaches have been widely used to improve acoustic modeling in continuous speech recognition systems: discriminative training algorithms and context-dependent subword units. However, while the use of each of these techniques leads to much better results than standard maximum-likelihood-trained phone models, their combination, i.e. discriminative training of context-dependent units, has proved to be a much more difficult task. In this paper we deal with minimum confusibility training of demiphones using the TIMIT database. By applying this approach, recently introduced by the authors, the string error rate in the recognition of TIDIGITS using demiphones is reduced by some 24% with respect to maximum likelihood training. This improvement is added to the 8% reduction already provided by demiphones with respect to minimum-confusibility-trained phones.
- Published
- 1999
36. The TALP & I2R SMT Systems for IWSLT 2008
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Li, H., Aw, A., Zhang, M., Khalilov, Maxim, Ruiz Costa-Jussà, Marta, Henríquez Quintana, Carlos Alberto, Rodríguez Fonollosa, José Adrián, Hernández, A., Mariño Acebal, José Bernardo, Banchs Martínez, Rafael Enrique, and Chen, B.
- Abstract
This paper describes the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT’08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the architecture of the 2008 systems and outlines the translation schemes we have used, focusing mainly on the new techniques intended to improve speech-to-speech translation quality. The novelties we have introduced are an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation tasks. Postprint (published version)
- Published
- 2008
37. System combination for machine translation for spoken and written language
- Author
-
Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Matusov, Evgeny, Leusch, Gregor, Federico, Marcello, Mariño Acebal, José Bernardo, Ney, Hermann, and Bertoldi, Nicola
- Abstract
This paper describes a recently developed method for computing a consensus translation from the outputs of multiple machine translation (MT) systems. A possibly new translation hypothesis can be produced as a result of this system combination algorithm. The consensus translation is computed by creating a confusion network and performing weighted majority voting, similarly to the well-established ROVER approach of (Fiscus 1997) for combining speech recognition hypotheses. To create the confusion network, pairwise word alignments of the original machine translation hypotheses are learned using an enhanced statistical alignment algorithm that explicitly models word reordering. This is the first known application of this algorithm in the context of system combination. The context of a whole document of translations, rather than a single sentence, is taken into account to improve the alignment quality. The proposed alignment and voting approach was evaluated on several machine translation tasks, including a large-vocabulary task. The method was also tested in the framework of multi-source and speech translation. Significant improvements in translation quality were achieved on all tasks. Here, we report experimental results for combining MT systems participating in the TC-STAR (speech translation) Project. Peer Reviewed, Postprint (published version)
- Published
- 2008
38. On the impact of morphology in English to Spanish statistical MT
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Gispert Ramis, Adrià de, and Mariño Acebal, José Bernardo
- Abstract
This paper presents a thorough study of the impact of morphology derivation on N-gram-based Statistical Machine Translation (SMT) models from English into a morphology-rich language such as Spanish. For this purpose, we define a framework under the assumption that a certain degree of morphology-related information is not only being ignored by current statistical translation models, but also has a negative impact on their estimation due to the data sparseness it causes. Moreover, we describe how this information can be decoupled from the standard bilingual N-gram models and introduced separately by means of a well-defined and better-informed feature-based classification task. Results are presented for the European Parliament Plenary Sessions (EPPS) English-to-Spanish task, showing oracle scores that indicate to what extent SMT models can benefit from simplifying Spanish morphological surface forms for each Part-Of-Speech category. We show that verb form morphological richness greatly weakens the standard statistical models, and we carry out a posterior morphology classification by defining a simple set of features and applying machine learning techniques. In addition, we propose a simple technique to deal with Spanish enclitic pronouns. Both techniques are empirically evaluated, and final translation results show improvements over the baseline by just dealing with Spanish morphology. In principle, the study is also valid for translation from English into any other Romance language (Portuguese, Catalan, French, Galician, Italian, etc.). The proposed method can be applied to both monotonic and non-monotonic decoding scenarios, thus revealing the interaction between word-order decoding and the proposed morphology simplification techniques. Overall results achieve statistically significant improvement over baseline performance in this demanding task. Peer Reviewed, Postprint (published version)
- Published
- 2008
39. An adaptive gradient-search based algorithm for discriminative training of hmm's
- Author
-
Nogueiras Rodríguez, Albino, Mariño Acebal, José Bernardo, Monte Moreno, Enrique, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Natural language processing (Computer science), Informàtica::Intel·ligència artificial::Llenguatge natural [Àrees temàtiques de la UPC], Processament en llenguatge natural (Informàtica)
- Abstract
Although it has proved to be a very powerful tool in acoustic modelling, discriminative training has a major drawback: the lack of a formulation that guarantees convergence regardless of the initial conditions, as the Baum-Welch algorithm does in maximum likelihood training. For this reason, a gradient descent search is usually used in this kind of problem. Unfortunately, standard gradient descent algorithms rely heavily on the choice of the learning rates. This dependence is especially cumbersome because it means that, at each run of the discriminative training procedure, a search must be carried out over the parameters governing the algorithm. In this paper we describe an adaptive procedure for determining the optimal value of the step size at each iteration. While the computation and memory overhead of the algorithm is negligible, results show less dependence on the initial learning rate than standard gradient descent and, using the same idea in order to apply self-scaling, it clearly outperforms it.
- Published
- 1998
40. Using x-gram for efficient speech recognition
- Author
-
Bonafonte Cávez, Antonio, Mariño Acebal, José Bernardo, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Telecomunicació, Telecommunication, Enginyeria de la telecomunicació [Àrees temàtiques de la UPC]
- Abstract
X-grams are a generalization of n-grams in which the number of previous conditioning words differs from case to case and is decided from the training data. X-grams reduce perplexity with respect to trigrams and need fewer parameters. In this paper, the representation of x-grams using finite state automata is considered. This representation leads to a new model, the non-deterministic x-grams, an approximation that is much more efficient while suffering only a small degradation in modeling capability. Empirical experiments on a continuous speech recognition task show how, for each ending word, the number of transitions is reduced from 1222 (the size of the lexicon) to around 66.
- Published
- 1998
41. Spanish dialects: phonetic transcription
- Author
-
Moreno Bilbao, M. Asunción (ORCID 0000-0002-1823-5970), Mariño Acebal, José Bernardo (ORCID 0000-0002-9471-8675), Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Telecomunicació, Telecommunication, Enginyeria de la telecomunicació [Àrees temàtiques de la UPC]
- Abstract
It is well known that canonical Spanish, the `central' dialectal variant of Spain known as Castilian, can be transcribed by rules. This paper deals with automatic grapheme-to-phoneme transcription rules for several Spanish dialects from Latin America. Spanish is spoken by more than 300 million people, has a wide geographical dispersion compared with other languages and has been historically influenced by many native languages. In this paper the authors extend the Castilian transcription rules to a set of different dialectal variants of Latin America. Transcriptions are based on SAMPA symbols. The paper identifies sounds that do not appear in Castilian, extends the accepted SAMPA symbols for Spanish (Castilian) to different dialectal variants, describes the rules needed to implement automatic orthographic-to-phonetic transcription in several dialectal variants of Spanish, and shows some quantitative results on dialectal differences.
- Published
- 1998
42. Low delay phone recognition
- Author
-
Rodríguez Fonollosa, José Adrián, Mariño Acebal, José Bernardo, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Telecomunicació, education, Telecommunication, Enginyeria de la telecomunicació [Àrees temàtiques de la UPC]
- Published
- 1998
43. Introducing linguistic knowledge into statistical machine translation.
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Mariño Acebal, José Bernardo, and Gispert Ramis, Adrià
- Abstract
This Ph.D. thesis addresses the use of morphosyntactic information to improve the performance of Statistical Machine Translation (SMT) systems, providing them with linguistic information beyond the surface level of the words in parallel corpora. The statistical machine translation system used in this work follows a tuple-based approach, modelling joint-probability translation models via a log-linear combination of bilingual n-grams with additional feature functions. A detailed study of the approach is conducted. This includes its development from a speech-oriented Finite-State Transducer architecture implementing X-grams towards a large-vocabulary, text-oriented n-gram implementation, the particularities of training and decoding, portability across language pairs and tasks, and the main difficulties revealed by error analyses. The use of linguistic knowledge to improve word alignment quality is also studied. A cooccurrence-based one-to-one word alignment algorithm is extended with verb form classification, with successful results. Additionally, we evaluate the impact on word alignment and translation quality of Part-Of-Speech tagging, base forms, verb form classification and stemming with state-of-the-art word alignment tools. Furthermore, the thesis proposes a translation model that tackles verb form generation through an additional verb instance model, reporting experiments on English-to-Spanish tasks. Agreement errors are addressed by incorporating a target Part-Of-Speech language model. Finally, we study the impact of morphology derivation on the Ngram-based SMT formulation, empirically evaluating the quality gain obtainable via morphology reduction., Postprint (published version)
- Published
- 2007
44. Técnicas robustas de reconocimiento del habla en ambientes adversos
- Author
-
Hernando Pericás, Francisco Javier, Nadeu Camprubí, Climent, Mariño Acebal, José Bernardo, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Etiquetado múltiple ,Predicción lineal de la parte causal de la autocorrelación ,Processament de la parla ,Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic [Àrees temàtiques de la UPC] ,Filtrado de parámetros espectrales ,Speech processing systems ,Enginyeria de la telecomunicació [Àrees temàtiques de la UPC] ,Reconocimiento del habla - Abstract
The performance of existing speech recognition systems degrades rapidly in the presence of background noise. A novel representation of the speech signal, based on Linear Prediction of the One-Sided Autocorrelation sequence (OSALPC), has proven attractive for noisy speech recognition because of both its high recognition performance relative to standard LPC in severe conditions of additive white noise and its computational simplicity. The aim of this work is twofold: 1) to show that OSALPC also achieves good performance on real noisy speech (in a car environment), and 2) to explore its combination with several robust similarity measuring techniques, showing that its performance improves further when the spectral parameters are suitably filtered and multilabeled.
- Published
- 1997
45. Formació Integral dels tècnics i continguts sociohumanístics
- Author
-
Nadeu Camprubí, Climent (ORCID 0000-0002-5863-0983) and Mariño Acebal, José Bernardo (ORCID 0000-0002-9471-8675)
- Subjects
Educació social ,education ,Telecommunication ,Telecomunicació -- Revistes ,Enginyeria de la telecomunicació [Àrees temàtiques de la UPC] - Published
- 1997
46. Joint training of codebooks and acoustic models in automatic speech recognition using semi-continuous HMMs
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Nogueiras Rodríguez, Albino, Caballero Galeote, Mónica, and Mariño Acebal, José Bernardo
- Abstract
In this paper, three different techniques for building semi-continuous HMM-based speech recognisers are compared: the classical one, using Euclidean generated codebooks and independently trained acoustic models; jointly reestimating the codebooks and models obtained with the classical method; and jointly creating codebooks and models, growing their size from one centroid to the desired number. The way this growth may be done is carefully addressed, focusing on the selection of the splitting direction and the way splitting is implemented. Results in a large vocabulary task show the efficiency of the approach, with noticeable improvements in both accuracy and CPU consumption. Moreover, this scheme enables the use of concatenated features, avoiding the independence assumption usually needed in semi-continuous HMM modelling, and leading to further improvements in accuracy and CPU consumption., Peer Reviewed, Postprint (published version)
- Published
- 2006
47. N-gram-based Machine Translation
- Author
-
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Mariño Acebal, José Bernardo, Banchs Martínez, Rafael Enrique, Crego Clemente, Josep Maria, Gispert Brosa, Adrian de, Lambert, Patrik, Rodríguez Fonollosa, José Adrián, and Ruiz Costa-Jussà, Marta
- Abstract
This article describes in detail an n-gram approach to statistical machine translation. This approach consists of a log-linear combination of a translation model based on n-grams of bilingual units, which are referred to as tuples, along with four specific feature functions. Translation performance, which happens to be in the state of the art, is demonstrated with Spanish-to-English and English-to-Spanish translations of the European Parliament Plenary Sessions (EPPS)., Peer Reviewed
- Published
- 2006
48. Modelado de la señal en reconocimiento de habla ruidosa
- Author
-
Pascual, E, Hernando Pericás, Francisco Javier (ORCID 0000-0002-1730-8154), Mariño Acebal, José Bernardo (ORCID 0000-0002-9471-8675), Gustavo, H, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic [Àrees temàtiques de la UPC] ,Processament de la parla ,Speech processing systems ,Enginyeria de la telecomunicació [Àrees temàtiques de la UPC] - Abstract
Conventional speech modelling techniques suffer a severe performance degradation in adverse noisy environments, so more robust representations of the speech signal are needed. This paper presents new models that have succeeded in adverse environments. They are hybrids of the classical parametrization techniques used so far, and they have proven very useful for obtaining good results in different noisy environments. To demonstrate their performance, we used white noise and machine noise in our experiments.
- Published
- 1996
49. Modelo estocástico de traducción basado en N-gramas de tuplas bilingües y combinación log-lineal de características
- Author
-
Mariño Acebal, José Bernardo, Banchs Martínez, Rafael Enrique, Crego Clemente, Josep María, Gispert Ramis, Adrià de, Lambert, Patrik, Rodríguez Fonollosa, José Adrián, and Ruiz Costa-Jussà, Marta
- Abstract
This communication introduces a stochastic machine translation system based on N-gram modelling of the joint probability of bilingual texts. The basic unit of this model is called a tuple and consists of a pair of word strings, one in the source language (to be translated) and one in the target language (the translation). Translation is driven by a log-linear combination of the N-gram model probability and other features, following the maximum entropy approach. Translation performance is evaluated on a speech-to-speech translation task: translation from Spanish to English (and vice versa) of European Parliament speeches. The system reaches state-of-the-art performance.
- Published
- 2005
50. Maximum likelihood based discriminative training of acoustic models
- Author
-
Nogueiras Rodríguez, Albino (ORCID 0000-0002-3159-1718), Mariño Acebal, José Bernardo, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Telecomunicació ,ComputingMethodologies_PATTERNRECOGNITION ,Computer Science::Sound ,Telecommunication ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Enginyeria de la telecomunicació [Àrees temàtiques de la UPC] - Abstract
In this paper, a framework for discriminative training of acoustic models based on the Generalised Probabilistic Descent (GPD) method is presented. The key feature of our proposal, Maximum Likelihood based Discriminative Training of Acoustic Models (MLDT), is the use of maximum likelihood trained HMMs instead of the original speech signal. We focus on discriminative training applied to a continuous speech recogniser based on discrete hidden Markov models, achieving a 4.6% error rate reduction on a Spanish speaker-independent phoneme recognition task.
- Published
- 1995