17 results on '"Reut Tsarfaty"'
Search Results
2. Textual geolocation in Hebrew: mapping challenges via natural place description analysis
- Author
Tal Bauman, Tzuf Paz-Argaman, Itai Mondshine, Reut Tsarfaty, Itzhak Omer, and Sagi Dalyot
- Subjects
textual geolocation, geographic information retrieval, Hebrew, natural language processing, spatial cognition and reasoning, Geography (General), G1-922
- Abstract
Describing where a place is situated is an innate communication skill that relies on spatial cognition, spatial reasoning, and linguistic systems. Accordingly, textual geolocation, the task of retrieving the coordinates of a place from linguistic descriptions, requires computerized spatial inference and natural language understanding. Yet machine-based textual geolocation is currently limited, mainly due to the lack of rich geo-textual datasets needed to train natural language models, which in turn cannot adequately interpret language-based expressions. These limitations are intensified in morphologically rich and resource-poor languages such as Hebrew. This study aims to analyze and understand the linguistic systems used for place descriptions in Hebrew, which will later be used to train machine learning natural language models. A novel crowdsourced geo-textual dataset is developed, composed of 5,695 written place descriptions provided by 1,554 native Hebrew speakers. All place descriptions rely on memory only, which increases spatial vagueness and requires referring expression resolution. Qualitative linguistic analysis of the place descriptions shows that geospatial reasoning is widely used in Hebrew, while empirical analysis with textual geolocation engines indicates that literal descriptions pose challenges for existing methods: they require real understanding of space and geospatial references and cannot simply be geolocated by matching gazetteer entries against extracted textual geo-entities. The findings offer an improved understanding of the challenges entailed in natural language processing of Hebrew geolocation, contributing to the formalization of computerized systems for future machine learning models in complex geographic information retrieval tasks.
- Published
- 2024
- Full Text
- View/download PDF
3. 'Um…, It’s Really Difficult to… Um… Speak Fluently': Neural Tracking of Spontaneous Speech
- Author
Galit Agmon, Manuela Jaeger, Reut Tsarfaty, Martin G. Bleichner, and Elana Zion Golumbic
- Subjects
Language. Linguistic theory. Comparative grammar, P101-410, Neurophysiology and neuropsychology, QP351-495
- Published
- 2023
- Full Text
- View/download PDF
4. Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design
- Author
Valentina Pyatkin, Frances Yung, Merel C. J. Scholman, Reut Tsarfaty, Ido Dagan, and Vera Demberg
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Published
- 2023
- Full Text
- View/download PDF
5. Morphology Without Borders: Clause-Level Morphology
- Author
Omer Goldman and Reut Tsarfaty
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Published
- 2023
- Full Text
- View/download PDF
6. Draw Me a Flower: Processing and Grounding Abstraction in Natural Language
- Author
Royi Lachmy, Valentina Pyatkin, Avshalom Manevich, and Reut Tsarfaty
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Published
- 2022
- Full Text
- View/download PDF
7. Text-based NP Enrichment
- Author
Yanai Elazar, Victoria Basmov, Yoav Goldberg, and Reut Tsarfaty
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Published
- 2022
- Full Text
- View/download PDF
8. Neural Modeling for Named Entities and Morphology (NEMO²)
- Author
Dan Bareket and Reut Tsarfaty
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Abstract
Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens. Morphologically rich languages (MRLs) pose a challenge to this basic formulation, as the boundaries of named entities do not necessarily coincide with token boundaries; rather, they respect morphological boundaries. To address NER in MRLs we then need to answer two fundamental questions, namely, what are the basic units to be labeled, and how can these units be detected and classified in realistic settings (i.e., where no gold morphology is available). We empirically investigate these questions on a novel NER benchmark, with parallel token-level and morpheme-level NER annotations, which we develop for Modern Hebrew, a morphologically rich-and-ambiguous language. Our results show that explicitly modeling morphological boundaries leads to improved NER performance, and that a novel hybrid architecture, in which NER precedes and prunes morphological decomposition, greatly outperforms the standard pipeline, where morphological decomposition strictly precedes NER, setting a new performance bar for both Hebrew NER and Hebrew morphological decomposition tasks.
- Published
- 2021
- Full Text
- View/download PDF
9. Design Patterns in Fluid Construction Grammar. Luc Steels (editor), Universitat Pompeu Fabra and Sony Computer Science Laboratory, Paris. Amsterdam: John Benjamins Publishing Company (Constructional Approaches to Language series, edited by Mirjam Fried and Jan-Ola Östman, volume 11), 2012, xi+332 pp; hardbound, ISBN 978-90-272-0433-2, €99.00, $149.00
- Author
Nathan Schneider and Reut Tsarfaty
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Published
- 2021
- Full Text
- View/download PDF
10. Parsing Morphologically Rich Languages: Introduction to the Special Issue
- Author
Reut Tsarfaty, Djamé Seddah, Sandra Kübler, and Joakim Nivre
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Published
- 2021
- Full Text
- View/download PDF
11. Neural Modeling for Named Entities and Morphology (NEMO²)
- Author
Dan Bareket and Reut Tsarfaty
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL), Linguistics and Language, Named-entity recognition, Hebrew, Benchmark (computing), Artificial Intelligence, Computer Science Applications, Human-Computer Interaction, Natural language processing
- Abstract
Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens. Morphologically-Rich Languages (MRLs) pose a challenge to this basic formulation, as the boundaries of Named Entities do not necessarily coincide with token boundaries; rather, they respect morphological boundaries. To address NER in MRLs we then need to answer two fundamental questions, namely, what are the basic units to be labeled, and how can these units be detected and classified in realistic settings, i.e., where no gold morphology is available. We empirically investigate these questions on a novel NER benchmark, with parallel token-level and morpheme-level NER annotations, which we develop for Modern Hebrew, a morphologically rich-and-ambiguous language. Our results show that explicitly modeling morphological boundaries leads to improved NER performance, and that a novel hybrid architecture, in which NER precedes and prunes morphological decomposition, greatly outperforms the standard pipeline, where morphological decomposition strictly precedes NER, setting a new performance bar for both Hebrew NER and Hebrew morphological decomposition tasks. Comment: Accepted to TACL. This is a pre-MIT Press publication version.
- Published
- 2020
12. RUN through the Streets: A New Dataset and Baseline Models for Realistic Urban Navigation
- Author
Tzuf Paz-Argaman and Reut Tsarfaty
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL), Machine learning, Natural language, Abstraction (linguistics), Artificial intelligence
- Abstract
Following navigation instructions in natural language requires a composition of language, action, and knowledge of the environment. Knowledge of the environment may be provided via visual sensors or as a symbolic world representation referred to as a map. Here we introduce the Realistic Urban Navigation (RUN) task, aimed at interpreting navigation instructions based on a real, dense, urban map. Using Amazon Mechanical Turk, we collected a dataset of 2,515 instructions aligned with actual routes over three regions of Manhattan. We propose a strong baseline for the task and empirically investigate which aspects of the neural architecture are important for RUN success. Our results empirically show that entity abstraction, attention over words and worlds, and a constantly updating world-state significantly contribute to task accuracy. Comment: Accepted to appear at the EMNLP 2019 conference.
- Published
- 2019
13. Parsing Morphologically Rich Languages: Introduction to the Special Issue
- Author
Reut Tsarfaty, Djamé Seddah, Sandra Kübler, and Joakim Nivre (affiliations as catalogued: Uppsala University; Analyse Linguistique Profonde à Grande Echelle / Large-scale deep linguistic processing (ALPAGE), Inria Paris-Rocquencourt and Université Paris Diderot - Paris 7 (UPD7); Université Paris-Sorbonne (UP4); Indiana University, Bloomington; funding: ANR-08-EMER-0013, SEQUOIA, Analyse syntaxique probabiliste à large couverture du français (2008))
- Subjects
Linguistics and Language, Language and Linguistics, Parsing, Bottom-up parsing, Word order, Machine translation, Sentiment analysis, Automatic summarization, Computational linguistics, Artificial Intelligence, Computer Science Applications, Computer Science/Document and Text Processing, Natural language processing
- Abstract
Parsing is a key task in natural language processing. It involves predicting, for each natural language sentence, an abstract representation of the grammatical entities in the sentence and the relations between these entities. This representation provides an interface to compositional semantics and to the notions of "who did what to whom." The last two decades have seen great advances in parsing English, leading to major leaps also in the performance of applications that use parsers as part of their backbone, such as systems for information extraction, sentiment analysis, text summarization, and machine translation. Attempts to replicate the success of parsing English for other languages have often yielded unsatisfactory results. In particular, parsing languages with complex word structure and flexible word order has been shown to require non-trivial adaptation. This special issue reports on methods that successfully address the challenges involved in parsing a range of morphologically rich languages (MRLs). This introduction characterizes MRLs, describes the challenges in parsing MRLs, and outlines the contributions of the articles in the special issue. These contributions present up-to-date research efforts that address parsing in varied, cross-lingual settings. They show that parsing MRLs addresses challenges that transcend particular representational and algorithmic choices.
- Published
- 2013
- Full Text
- View/download PDF
14. pyBART: Evidence-based Syntactic Transformations for IE
- Author
Yoav Goldberg, Reut Tsarfaty, and Aryeh Tiktinsky
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL), Syntax, Information extraction, Relationship extraction, Python (programming language), Artificial intelligence, Natural language processing
- Abstract
Syntactic dependencies can be predicted with high accuracy, and are useful for both machine-learned and pattern-based information extraction tasks. However, their utility can be improved. These syntactic dependencies are designed to accurately reflect syntactic relations, and they do not make semantic relations explicit. Therefore, these representations lack many explicit connections between content words that would be useful for downstream applications. Proposals like English Enhanced UD improve the situation by extending universal dependency trees with additional explicit arcs. However, they are not available to Python users, and are also limited in coverage. We introduce a broad-coverage, data-driven and linguistically sound set of transformations that makes event-structure and many lexical relations explicit. We present pyBART, an easy-to-use open-source Python library for converting English UD trees either to Enhanced UD graphs or to our representation. The library can work as a standalone package or be integrated within a spaCy NLP pipeline. When evaluated in a pattern-based relation extraction scenario, our representation results in higher extraction scores than Enhanced UD, while requiring fewer patterns. Comment: Accepted as an ACL 2020 system demonstration paper.
- Full Text
- View/download PDF
15. ZEST: Zero-shot Learning from Text Descriptions using Textual Similarity and Visual Summarization
- Author
Gal Chechik, Tzuf Paz-Argaman, Yuval Atzmon, and Reut Tsarfaty
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL), Zero shot learning, Automatic summarization, Artificial intelligence, Natural language processing
- Abstract
We study the problem of recognizing visual entities from the textual descriptions of their classes. Specifically, given birds' images with free-text descriptions of their species, we learn to classify images of previously-unseen species based on species descriptions. This setup has been studied in the vision community under the name zero-shot learning from text, focusing on learning to transfer knowledge about visual aspects of birds from seen classes to previously-unseen ones. Here, we suggest focusing on the textual description and distilling from the description the most relevant information to effectively match visual features to the parts of the text that discuss them. Specifically, (1) we propose to leverage the similarity between species, reflected in the similarity between text descriptions of the species; and (2) we derive visual summaries of the texts, i.e., extractive summaries that focus on the visual features that tend to be reflected in images. We propose a simple attention-based model augmented with the similarity and visual summaries components. Our empirical results consistently and significantly outperform the state-of-the-art on the largest benchmarks for text-based zero-shot learning, illustrating the critical importance of texts for zero-shot image recognition. Comment: 11 pages, Findings of EMNLP 2020.
- Full Text
- View/download PDF
16. Evaluating Models’ Local Decision Boundaries via Contrast Sets
- Author
Qiang Ning, Ben Bogin, Sihao Chen, Hannaneh Hajishirzi, Ben Zhou, Eric Wallace, Phoebe Mulcaire, Dheeru Dua, Kevin Lin, Ananth Gottumukkala, Jonathan Berant, Sanjay Subramanian, Ally Zhang, Victoria Basmov, Noah A. Smith, Pradeep Dasigi, Nitish Gupta, Jiangming Liu, Daniel Khashabi, Matt Gardner, Sameer Singh, Gabriel Ilharco, Reut Tsarfaty, Yoav Artzi, Nelson F. Liu, and Yanai Elazar
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL), Supervised learning, Decision boundary, Decision rule, Test set, Test data, Parsing, Sentiment analysis, Machine learning, Artificial intelligence
- Abstract
Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture the abilities a dataset is intended to test. We propose a more rigorous annotation paradigm for NLP that helps to close systematic gaps in the test data. In particular, after a dataset is constructed, we recommend that the dataset authors manually perturb the test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets. Contrast sets provide a local view of a model’s decision boundary, which can be used to more accurately evaluate a model’s true linguistic capabilities. We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets (e.g., DROP reading comprehension, UD parsing, and IMDb sentiment analysis). Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets—up to 25% in some cases. We release our contrast sets as new evaluation benchmarks and encourage future dataset construction efforts to follow similar annotation processes.
- Full Text
- View/download PDF
17. Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages
- Author
Marianna Apidianaki, Ido Dagan, Jennifer Foster, Yuval Marton, Djamé Seddah, and Reut Tsarfaty (Djamé Seddah's affiliations as catalogued: Analyse Linguistique Profonde à Grande Echelle / Large-scale deep linguistic processing (ALPAGE), Inria Paris-Rocquencourt and Université Paris Diderot - Paris 7 (UPD7); Université Paris-Sorbonne (UP4))
- Subjects
Computer Science/Document and Text Processing
- Published
- 2012