Author: "Díaz de Ilarraza, Arantza" / Topic: natural language processing - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Díaz de Ilarraza, Arantza"' showing total 6 results



1. Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification.

Author: Inurrieta U, Aduriz I, Díaz de Ilarraza A, Labaka G, and Sarasola K
Subjects: Natural Language Processing, Semantics, Vocabulary
Abstract: Multiword Expressions (MWEs) are idiosyncratic combinations of words which pose important challenges to Natural Language Processing. Some kinds of MWEs, such as verbal ones, are particularly hard to identify in corpora, due to their high degree of morphosyntactic flexibility. This paper describes a linguistically motivated method to gather detailed information about verb+noun MWEs (VNMWEs) from corpora. Although the main focus of this study is Spanish, the method is easily adaptable to other languages. Monolingual and parallel corpora are used as input, and data about the morphosyntactic variability of VNMWEs is extracted. This information is then tested in an identification task, obtaining an F score of 0.52, which is considerably higher than that of related work. Competing Interests: The authors have declared that no competing interests exist.
Published: 2020
Full Text: View/download PDF

2. EUSKOR: End-to-end coreference resolution system for Basque.

Author: Soraluze A, Arregi O, Arregi X, and Díaz de Ilarraza A
Subjects: Humans, Pattern Recognition, Automated, Semantics, Spain, Language, Natural Language Processing
Abstract: This paper describes the process of adapting the Stanford Coreference resolution module to the Basque language, taking into account the characteristics of the language. The module has been integrated in a linguistic analysis pipeline, obtaining an end-to-end coreference resolution system for the Basque language. The adaptation process described can facilitate the implementation of coreference resolution systems for other languages with similar characteristics. During the experimentation phase, we have demonstrated that language-specific features have a noteworthy effect on coreference resolution, obtaining a gain in CoNLL score of 7.07 with respect to the baseline system. We have also analysed the effect that preprocessing has on coreference resolution, comparing the results obtained with automatic mentions versus gold mentions. When gold mentions are provided, the results increase by 11.5 points in CoNLL score in comparison with the results obtained when automatic mentions are used. The contribution of each sieve is analysed, concluding that morphology is essential for agglutinative languages to obtain good performance in coreference resolution. Finally, an error analysis of the coreference resolution system is presented, which has revealed our system's weak points and helped to determine the improvements needed. As a result of the error analysis, we have enriched the Basque coreference resolution system by adding two new sieves, obtaining an improvement of 0.24 points in CoNLL F1 when automatic mentions are used and of 0.39 points when gold mentions are provided. Competing Interests: The authors have declared that no competing interests exist.
Published: 2019
Full Text: View/download PDF

3. Coreferential Relations in Basque: The Annotation Process.

Author: Ceberio K, Aduriz I, Díaz de Ilarraza A, and Garcia-Azkoaga I
Subjects: Humans, Language, Spain, Linguistics, Natural Language Processing
Abstract: In this paper we present the coreferential tagging of part of the EPEC Corpus of Basque. Although coreference is a pragmatic linguistic phenomenon highly dependent on the situational context, it shows some language-specific patterns that vary according to the features of each language. Due to the fact that Basque is not an Indo-European language, it differs considerably in grammar from the languages spoken in surrounding areas. We will explain these features and the decisions made in each case. After describing the criteria defined for coreferential tagging in Basque, the annotation process will be explained. Our annotation is based on a morphologically and syntactically annotated corpus that provides us with a manageable environment, in which the specific structures that are part of a reference chain can be more easily identified. A part of the corpus was tagged by two annotators who marked up the same text independently, and by another annotator that acted as judge, solving problems in case of disagreement. All this process has been automatized as a result of previous studies carried out in this field. The automatic detection of mentions (Soraluze et al., in: Proceedings of Konvens, 2012) has provided us with a better working environment, and given us the possibility to build a first significant corpus for a later computational treatment of automatic coreferential resolution.
Published: 2018
Full Text: View/download PDF

4. Ebaluatoia: crowd evaluation for English-Basque machine translation.

Author: Aranberri, Nora, Labaka, Gorka, Díaz de Ilarraza, Arantza, and Sarasola, Kepa
Subjects: LANGUAGE & languages, ENGLISH language, BASQUE language, LINGUISTICS, NATURAL language processing, PAIRED comparisons (Mathematics)
Abstract: This work explores the feasibility of a crowd-based pair-wise comparison evaluation to get feedback on machine translation progress for under-resourced languages. Specifically, we propose a task based on simple work units to compare the outputs of five English-to-Basque systems, which we implement in a web application. In our design, we put forward two key aspects that we believe community collaboration initiatives should consider in order to attract and maintain participants, that is, providing both a community challenge and a personal challenge. We describe how these aspects can comply with a strict methodology to ensure research validity. In particular, we consider the evaluation set size and the characteristics of the test sentences, the number of evaluators per comparison pair, and a mechanism to identify dishonest participation (or participants with insufficient linguistic knowledge). We also describe our dissemination effort, which targeted both general users and interest groups. Over 500 people participated actively in the Ebaluatoia campaign and we were able to collect over 35,000 evaluations in a short period of 10 days. From the results, we complete the ranking of the systems under evaluation and establish whether the difference in quality between the systems is significant. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

5. Improving mention detection for Basque based on a deep error analysis.

Author: Soraluze, Ander, Arregi, Olatz, Arregi, Xabier, and Díaz de Ilarraza, Arantza
Subjects: BASQUE language, ERROR detection (Information theory), NATURAL language processing, COMPUTER network protocols, MATCHING theory
Abstract: This paper presents the improvement process of a mention detector for Basque. The system is rule-based and takes into account the characteristics of mentions in Basque. A classification of error types is proposed based on the errors that occur during mention detection. A deep error analysis distinguishing error types and causes is presented and improvements are proposed. At the final stage, the system obtains an F-measure of 74.57% under the Exact Matching protocol and of 80.57% under Lenient Matching. We also show the performance of the mention detector with gold standard data as input, in order to omit errors caused by the previous stages of linguistic processing. In this scenario, we obtain an F-measure of 85.89% with Strict Matching and of 89.06% with Lenient Matching, i.e., a difference of 11.32 and 8.49 percentage points, respectively. Finally, how improvements in mention detection affect coreference resolution is analysed. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

6. Computer aided classification of diagnostic terms in Spanish.

Author: Pérez, Alicia, Gojenola, Koldo, Casillas, Arantza, Oronoz, Maite, and Díaz de Ilarraza, Arantza
Subjects: *COMPUTER-aided design, *MEDICAL records, *SPANISH language, *FINITE state machines, INTERNATIONAL Statistical Classification of Diseases & Related Health Problems
Abstract: The goal of this paper is to classify Medical Records (MRs) by their diagnostic terms (DTs) according to the International Classification of Diseases Clinical Modification (ICD-9-CM). The challenge we face is twofold: (i) to treat the natural and non-standard language in which doctors express their diagnostics and (ii) to perform a large-scale classification problem. We propose the use of Finite-State Transducers (FSTs) that, through their underlying topology, constrain the allowed input DT string while synchronously producing the output ICD-9-CM class. Their versatility is outstanding: they efficiently implement soft-matching operations from terms expressed in natural language to standard terms and, hence, to the final ICD-9-CM code. The FSTs were built from a corpus and standard resources such as the ICD-9-CM and SNOMED CT, amongst others. Our corpus comprises more than 20,000 DTs from MRs from the Basque Hospital System, so as to model natural language in this domain. An F1-measure of 91.2 was achieved on a test set of 2850 randomly selected DTs, and a random 5-fold cross validation on a training set served to double-check the stability of the provided results. Real MRs were of much help to adapt the system to natural language. Misspellings, colloquial and specific language, and abbreviations made the classification process difficult. The FSTs were proven efficient in this large-scale classification task. Moreover, the composition operation for FSTs made it easy to add new features to the system. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF
