20 results on '"Màrquez Villodre, Lluís"'
Search Results
2. Exploring the construction of semantic class classifiers for WSD
- Author
-
Villarejo Muñoz, Luis, Màrquez Villodre, Lluís, and Rigau Claramunt, German
- Subjects
SUMO, Machine learning, Ontologies, WSD, Semantic Fields, Ontologías, Aprendizaje automático
- Abstract
This paper presents the methodology, results, and future lines of research of a novel approach to the Word Sense Disambiguation (WSD) task. Broadly, the approach consists of learning classifiers for several semantic classes and then combining them. In this way we expect to obtain not only WSD systems with a coarser granularity than that usually offered by systems based on WordNet senses, but also systems that offer different views of the task, so that their combination improves the overall results. This research has been supported by the European project MEANING (IST-2001-34460).
- Published
- 2005
3. 3LB-LEX : léxico verbal con frames sintáctico-semánticos
- Author
-
Civit Torruella, Montserrat, Aldezabal Roteta, Izaskun, Pociello Irigoyen, Elisabete, Taulé Delor, Mariona, Aparicio Mera, Juan José, Màrquez Villodre, Lluís, Navarro Colorado, Francisco de Borja, Castellví Vives, Joan, and Martí Antonín, Maria Antònia
- Subjects
Semantic annotation, Thematic roles, Verbal lexicon, Papeles temáticos, Léxico verbal, Anotación semántica
- Abstract
The creation of computational (verbal) lexicons is a long and costly task. From the corpora created in the 3LB project, a verbal lexicon with syntactic and semantic information (synsets from EWN) is being built. From this information, the correspondence between syntactic functions and thematic roles is established for each sense of each verb. The last step will be the tagging of the corpora with thematic roles. As a result, the 3LB corpora will be enriched with thematic role tags and the verbal lexicon with semantic frames. This work has been partially funded by the projects XTRACT-2 (BFF2002-04226-C03-03), CESS-CE (HUM2004-21127-E) and R2D2 (TIC-2003-07158-C04-01).
- Published
- 2005
4. MT techniques in a retrieval system of semantically enriched patents
- Author
-
González Bermúdez, Meritxell, Mateva, Maria, Enache, Ramona, España Bonet, Cristina, Màrquez Villodre, Lluís, Popov, Borislav, Ranta, Aarne, Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Traducció automàtica, Informàtica::Intel·ligència artificial::Llenguatge natural [Àrees temàtiques de la UPC], Machine translating
- Abstract
This paper focuses on how automatic translation techniques integrated in a patent retrieval system increase its capabilities and make extended features and functionalities possible. We describe 1) a novel methodology for natural language to SPARQL translation based on automating grammar-ontology interoperability and on a query grammar for the patents domain; 2) a strategy for statistical-based translation of patents that allows semantic annotations to be transferred to the target language; 3) a built-in knowledge representation infrastructure that uses multilingual semantic annotations; and 4) an online application that offers a multilingual search interface over structural knowledge databases (domain ontologies) and multilingual documents (biomedical patents) that have been automatically translated.
5. A graphical interface for MT evaluation and error analysis
- Author
-
González Bermúdez, Meritxell, Giménez, J., Màrquez Villodre, Lluís, Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Automatic translations, Traducció automàtica, Informàtica::Intel·ligència artificial [Àrees temàtiques de la UPC], ASIYA (Toolkit)
- Abstract
Error analysis in machine translation is a necessary step in order to investigate the strengths and weaknesses of the MT systems under development and allow fair comparisons among them. This work presents an application that shows how a set of heterogeneous automatic metrics can be used to evaluate a test bed of automatic translations. To do so, we have set up an online graphical interface for the ASIYA toolkit, a rich repository of evaluation measures working at different linguistic levels. The current implementation of the interface shows constituency and dependency trees as well as shallow syntactic and semantic annotations, and word alignments. The intelligent visualization of the linguistic structures used by the metrics, as well as a set of navigational functionalities, may lead towards advanced methods for automatic error analysis.
6. A joint model for parsing syntactic and semantic dependencies
- Author
-
Lluis Martorell, Xavier, Màrquez Villodre, Lluís, Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Parsing (Computer grammar), Informàtica::Intel·ligència artificial::Llenguatge natural [Àrees temàtiques de la UPC], Analitzadors sintàctics
- Abstract
This paper describes a system that jointly parses syntactic and semantic dependencies, presented at the CoNLL-2008 shared task (Surdeanu et al., 2008). It combines online Perceptron learning (Collins, 2002) with a parsing model based on the Eisner algorithm (Eisner, 1996), extended so as to jointly assign syntactic and semantic labels. Overall results are 78.11 global F1, 85.84 LAS, 70.35 semantic F1. Official results for the shared task (63.29 global F1; 71.95 LAS; 54.52 semantic F1) were significantly lower due to bugs present at submission time.
7. A second-order joint Eisner model for syntactic and semantic dependency parsing
- Author
-
Lluis Martorell, Xavier, Bott, Stefan Markus, Màrquez Villodre, Lluís, Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Informàtica::Aplicacions de la informàtica [Àrees temàtiques de la UPC], Natural language processing (Computer science), Computational linguistics -- Research, Lingüística computacional, Llenguatge natural (Informàtica) -- Processament
- Abstract
We present a system developed for the CoNLL-2009 Shared Task (Hajic et al., 2009). We extend the Carreras (2007) parser to jointly annotate syntactic and semantic dependencies. This state-of-the-art parser factorizes the built tree in second-order factors. We include semantic dependencies in the factors and extend their score function to combine syntactic and semantic scores. The parser is coupled with an on-line averaged perceptron (Collins, 2002) as the learning method. Our averaged results for all seven languages are 71.49 macro F1, 79.11 LAS and 63.06 semantic F1.
8. The TALP-UPC approach to system selection: ASIYA features and pairwise classification using random forests
- Author
-
Formiga Fanals, Lluís, González Bermúdez, Meritxell, Barrón-Cedeño, Alberto, Rodríguez Fonollosa, José Adrián (ORCID 0000-0001-9513-7939), Màrquez Villodre, Lluís, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural, and Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
- Subjects
Education--Experimental methods, Translators, Ensenyament -- Mètodes experimentals, Traductors (Programes d'ordinador), Ensenyament i aprenentatge [Àrees temàtiques de la UPC], Ensenyament i aprenentatge::Metodologies docents [Àrees temàtiques de la UPC]
- Abstract
This paper describes the TALP-UPC participation in the WMT’13 Shared Task on Quality Estimation (QE). Our participation is limited to task 1.2 on System Selection. We used a broad set of features (86 for German-to-English and 97 for English-to-Spanish) ranging from standard QE features to features based on pseudo-references and semantic similarity. We approached system selection by means of pairwise ranking decisions. To that end, we learned Random Forest classifiers specifically tailored to the problem. Evaluation at development time showed considerably good results in a cross-validation experiment, with Kendall’s τ values around 0.30. The results on the test set dropped significantly, raising several points for discussion.
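The abstract above reports agreement between predicted and reference rankings using Kendall's rank correlation (tau). As a rough illustration only, the Python sketch below computes tau as (concordant - discordant) / (concordant + discordant) over all pairs of ranked items. The function name `kendall_tau` and the choice to skip tied pairs are assumptions of this sketch; the exact tie handling used in the shared task is not reproduced here.

```python
from itertools import combinations


def kendall_tau(predicted, gold):
    """Kendall's tau between two rankings given as lists of rank values.

    Pairs tied in either ranking are skipped (an assumption of this sketch).
    """
    if len(predicted) != len(gold):
        raise ValueError("rankings must have the same length")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(gold)), 2):
        p = predicted[i] - predicted[j]
        g = gold[i] - gold[j]
        if p == 0 or g == 0:
            continue  # tie in one of the rankings: ignored here
        if (p > 0) == (g > 0):
            concordant += 1  # both rankings order the pair the same way
        else:
            discordant += 1  # the rankings disagree on this pair
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0
```

For example, `kendall_tau([1, 2, 3], [1, 3, 2])` compares a predicted ranking of three translations against a reference ranking in which the last two are swapped.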
9. On the portability and tuning of supervised word sense disambiguation systems
- Author
-
Escudero Bakx, Gerard (ORCID 0000-0002-4914-1686), Màrquez Villodre, Lluís, Rigau Claramunt, German, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
ComputingMethodologies_PATTERNRECOGNITION, Informàtica [Àrees temàtiques de la UPC], LazyBoosting algorithm, Natural language processing, Machine learning, Word sense disambiguation, Portability and tuning of NLP systems, WSD
- Abstract
This report describes a set of experiments carried out to explore the portability of alternative supervised Word Sense Disambiguation algorithms. The aim of the work is threefold: firstly, studying the performance of these algorithms when tested on a corpus different from the one they were trained on; secondly, exploring their ability to tune to new domains; and thirdly, demonstrating empirically that the LazyBoosting algorithm outperforms state-of-the-art supervised WSD algorithms in both of the preceding situations.
10. Naive Bayes and exemplar-based approaches to word sense disambiguation revisited
- Author
-
Escudero Bakx, Gerard, Màrquez Villodre, Lluís, Rigau Claramunt, German, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Naive Bayes, Exemplar-based classification, Word sense disambiguation, Informàtica::Intel·ligència artificial [Àrees temàtiques de la UPC], WSD
- Abstract
This paper describes an experimental comparison between two standard supervised learning methods, namely Naive Bayes and Exemplar-based classification, on the Word Sense Disambiguation (WSD) problem. The aim of the work is twofold. Firstly, it attempts to help clarify some confusing information in the related literature about the comparison between the two methods. In doing so, several directions have been explored, including testing several modifications of the basic learning algorithms and varying the feature space. Secondly, an improvement of both algorithms is proposed in order to deal with large attribute sets. This modification, which basically consists of using only the positive information appearing in the examples, greatly improves the efficiency of the methods with no loss in accuracy. The experiments have been performed on the largest sense-tagged corpus available, containing the most frequent and ambiguous English words. Results show that the Exemplar-based approach to WSD is generally superior to the Bayesian approach, especially when a specific metric for dealing with symbolic attributes is used.
11. Machine learning and natural language processing
- Author
-
Màrquez Villodre, Lluís, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Informàtica [Àrees temàtiques de la UPC], Natural language processing, Machine learning, Word sense disambiguation, WSD, ML, NLP, Supervised learning
- Abstract
In this report, some collaborative work between the fields of Machine Learning (ML) and Natural Language Processing (NLP) is presented. The document is structured in two parts. The first part includes a superficial but comprehensive survey covering the state-of-the-art of machine learning techniques applied to natural language learning tasks. In the second part, a particular problem, namely Word Sense Disambiguation (WSD), is studied in more detail. In doing so, four algorithms for supervised learning, which belong to different families, are compared in a benchmark corpus for the WSD task. Both qualitative and quantitative conclusions are drawn.
12. Real-life translation quality estimation for MT system selection
- Author
-
Formiga Fanals, Lluís, Màrquez Villodre, Lluís, Pujantell Traserra, Jaume, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Ensenyament i aprenentatge::Aprenentatge de llengües [Àrees temàtiques de la UPC], Translations, Traducció -- Investigació
- Abstract
Research on translation quality annotation and estimation usually makes use of standard language, sometimes related to a specific language genre or domain. However, real-life machine translation (MT), performed for instance by on-line translation services, has to cope with some extra difficulties related to the usage of open, non-standard and noisy language. In this paper we study the learning of quality estimation (QE) models able to rank translations from real-life input according to their goodness without the need for translation references. To that end, we work with a corpus collected from the 24/7 Reverso.net MT service, translated by 5 different MT systems, and manually annotated with quality scores. We define several families of features and train QE predictors in the form of regressors or direct rankers. The predictors show a remarkable correlation with gold standard rankings and prove to be useful in a system combination scenario, obtaining better results than any individual translation system.
13. A hybrid system for patent translation
- Author
-
Enache, Ramona, España Bonet, Cristina, Ranta, Aarne, Màrquez Villodre, Lluís, Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Patents d'invenció -- Traducció, Patents--Translations, Traducció automàtica, Informàtica::Intel·ligència artificial::Llenguatge natural [Àrees temàtiques de la UPC]
- Abstract
This work presents an HMT system for patent translation. The system exploits the high coverage of SMT and the high precision of an RBMT system based on GF to deal with specific issues of the language. The translator is specifically developed to translate patents and is evaluated on the English-French language pair. Although the number of issues tackled by the grammar is not yet large, both manual and automatic evaluations consistently prefer the hybrid system over the two individual translators.
14. Patent translation within the MOLTO project
- Author
-
España Bonet, Cristina, Enache, Ramona, Slaski, Adam, Ranta, Aarne, Màrquez Villodre, Lluís, González Bermúdez, Meritxell, Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Traducció automàtica -- Congressos, Informàtica::Intel·ligència artificial::Llenguatge natural [Àrees temàtiques de la UPC], MOLTO (European project), Patents, Patents -- Machine translating
- Abstract
MOLTO is an FP7 European project whose goal is to translate texts between multiple languages in real time with high quality. Patent translation is a case study in which research focuses on obtaining large coverage without losing translation quality. This is achieved by hybridising a grammar-based multilingual translation system, GF, with a specialised statistical machine translation system. Moreover, each individual system by itself already represents a step forward in the translation of patents in the biomedical domain, for which the systems have been trained.
15. Deep evaluation of hybrid architectures: simple metrics correlated with human judgments
- Author
-
Labaka, Gorka, Díaz de Ilarraza Sánchez, Arantza, Sarasola Gabiola, Kepa, España Bonet, Cristina, Màrquez Villodre, Lluís, Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Traducció automàtica, Machine translation, Informàtica::Intel·ligència artificial::Llenguatge natural [Àrees temàtiques de la UPC], Rule-based machine translation
- Abstract
The process of developing hybrid MT systems is guided by the evaluation method used to compare different combinations of basic subsystems. This work presents a deep evaluation experiment of a hybrid architecture that tries to get the best of both worlds, rule-based and statistical. In a first evaluation, human assessments were used to compare only the single statistical system and the hybrid one; the rule-based system was not compared by hand because the results of automatic evaluation showed a clear disadvantage. A second and wider evaluation experiment, however, surprisingly showed that according to human evaluation the best system was the rule-based one, which had achieved the worst results under automatic evaluation. An examination of sentences with controversial results suggested that linguistic well-formedness in the output should be considered in evaluation. After experimenting with 6 possible metrics, we conclude that a simple arithmetic mean of BLEU and BLEU calculated on the parts of speech of words conforms more closely to human judgments than lexical metrics alone.
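The combined metric named at the end of this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mean_bleu` is hypothetical, both inputs are assumed to be BLEU scores already computed elsewhere and scaled to [0, 1], and the way the authors obtained the part-of-speech sequences is not reproduced.

```python
def mean_bleu(bleu_words: float, bleu_pos: float) -> float:
    """Arithmetic mean of word-level BLEU and BLEU computed over POS tags.

    Both scores are assumed (for this sketch) to lie in the range [0, 1].
    """
    for score in (bleu_words, bleu_pos):
        if not 0.0 <= score <= 1.0:
            raise ValueError("BLEU scores are expected in the range [0, 1]")
    return (bleu_words + bleu_pos) / 2.0
```

A call such as `mean_bleu(0.2, 0.4)` would combine a lexical BLEU of 0.2 with a POS-level BLEU of 0.4 into a single score.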
16. Boosting trees for anti-spam email filtering (Extended version)
- Author
-
Carreras Pérez, Xavier, Màrquez Villodre, Lluís, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Informàtica::Aplicacions de la informàtica [Àrees temàtiques de la UPC], Unwanted electronic mail messages, Informàtica [Àrees temàtiques de la UPC], Boosting trees, Anti-spam email filtering
- Abstract
In this work, a set of comparative experiments for the problem of automatically filtering unwanted electronic mail messages is performed on two public corpora: PU1 and LingSpam. Several variants of the AdaBoost algorithm with confidence-rated predictions (Schapire et al., 99) have been applied, which differ in the complexity of the base learners considered. Two main conclusions can be drawn from our experiments: a) the boosting-based methods clearly outperform the results of the other learning algorithms published on the two evaluation corpora, achieving very high levels of the F_1 measure; b) increasing the complexity of the base learners yields better high-precision classifiers, which is very important when misclassification costs are considered.
17. A Proposal for wide-coverage Spanish named entity recognition
- Author
-
Arévalo, M., Carreras Pérez, Xavier, Màrquez Villodre, Lluís, Martí Antonin, Maria Antònia, Padró, Lluís, Simon, Maria José, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Informàtica [Àrees temàtiques de la UPC], Machine learning, AdaBoost, Named entity recognition for Spanish
- Abstract
This paper presents a proposal for wide-coverage Named Entity Recognition for Spanish. First, a linguistic description of the typology of Named Entities is proposed. Following this definition, an architecture of sequential processes is described for addressing the recognition and classification of strong and weak Named Entities. The former are treated using Machine Learning techniques (AdaBoost) and simple attributes requiring untagged corpora, complemented with external information sources (a list of trigger words and a gazetteer). The latter are approached through a context-free grammar for recognizing syntactic patterns. A deep evaluation of the first task on real corpora is presented to validate the appropriateness of the approach. A preliminary version of the context-free grammar is qualitatively evaluated, also with good results, on a small hand-tagged corpus.
18. Anotación semiautomática con papeles temáticos de los corpus CESS-ECE
- Author
-
Martí Antonín, Maria Antònia, Taulé Delor, Mariona, Màrquez Villodre, Lluís, Bertran Ibarz, Manuel, and Universitat de Barcelona
- Subjects
Anotación semántica automática, Natural language processing (Computer science), Corpora (Linguistics), Corpus CESS-ECE, Tractament del llenguatge natural (Informàtica), Papeles temáticos, Corpus (Lingüística)
- Abstract
In this paper we present the methodology followed in the automatic semantic annotation (argument structure and thematic roles of the verbal predicates) of the CESS-ECE-CAT/ESP corpus, together with an evaluation of the results obtained. Starting from a verbal lexicon (1,482 verbs) with information about the syntactic functions of each verb and their projection to arguments and thematic roles, the CESS-ECE treebank has been annotated automatically by applying a set of simple rules to the syntactic trees. This procedure automatically annotates 60% of the arguments and thematic roles with a very low error rate (below 2%). Given the high quality of the results, this methodology can be used to semi-automate the semantic annotation of the corpus, with a corresponding saving in manual annotation time. Once the annotation is completed, the CESS-ECE corpus can serve as a source of information for automatic Semantic Role Labeling systems for Catalan and Spanish.
19. SVMTool: a general POS tagger generator based on support vector machines
- Author
-
Giménez Linares, Jesús Ángel, Màrquez Villodre, Lluís, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
POS tagger generator, SVMTool, Support Vector Machines, SVM, Natural language processing, Informàtica::Intel·ligència artificial::Llenguatge natural [Àrees temàtiques de la UPC], NLP
- Abstract
This paper presents the SVMTool, a simple, flexible, effective and efficient part-of-speech tagger based on Support Vector Machines. The SVMTool offers a fairly good balance among these properties, which makes it really practical for current NLP applications. It is very easy to use and easily configurable, so it can be fitted to the needs of a number of different applications.
20. Robust Part of Speech Tagging
- Author
-
Martínez Garcia, Eva and Màrquez Villodre, Lluís
- Subjects
Natural language processing (Computer science), Traducció automàtica, Tractament del llenguatge natural (Informàtica), Informàtica::Intel·ligència artificial::Llenguatge natural [Àrees temàtiques de la UPC], Machine translating
- Abstract
Generally, NLP tools learn patterns from well-formed, annotated data using machine learning techniques. In this work, however, we focus on the language used in an on-line platform for machine translation. A typical setting in this area is a web page that offers a translation service between pairs of languages. The problem is that casual users employ the service to translate any type of text (cut-and-paste fragments, single words, badly formatted text, snippets, informal language, pre-translations, etc.). In this situation we therefore very often find words with mistakes that cause the system to produce a bad translation because it is not able to understand the input. Having identified the problem of dealing with non-standard input, the main goal of our work is to develop a robust PoS tagger from the SVMTagger.