Author: "Haffari, Gholamreza" / Language: undetermined - Searchworks@Jio Institute Digital Library Search Results

1. Generate, Annotate, and Learn: NLP with Synthetic Text

Author: He, Xuanli, Nassar, Islam, Kiros, Jamie, Haffari, Gholamreza, and Norouzi, Mohammad
Subjects: FOS: Computer and information sciences, Human-Computer Interaction, Computer Science - Machine Learning, Linguistics and Language, ComputingMethodologies_PATTERNRECOGNITION, Artificial Intelligence, Communication, Machine Learning (cs.LG), Computer Science Applications
Abstract: This paper studies the use of language models as a source of synthetic unlabeled text for NLP. We formulate a general framework called "generate, annotate, and learn (GAL)" to take advantage of synthetic text within knowledge distillation, self-training, and few-shot learning applications. To generate high-quality task-specific text, we either fine-tune LMs on inputs from the task of interest, or prompt large LMs with few examples. We use the best available classifier to annotate synthetic text with soft pseudo labels for knowledge distillation and self-training, and use LMs to obtain hard labels for few-shot learning. We train new supervised models on the combination of labeled and pseudo-labeled data, which results in significant gains across several applications. We investigate key components of GAL and present theoretical and empirical arguments against the use of class-conditional LMs to generate synthetic labeled text instead of unlabeled text. GAL achieves new state-of-the-art knowledge distillation results for 6-layer transformers on the GLUE leaderboard.
Comment: accepted to TACL2022
Published: 2022

2. Document Flattening: Beyond Concatenating Context for Document-Level Neural Machine Translation

Author: Wu, Minghao, Foster, George, Qu, Lizhen, and Haffari, Gholamreza
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Existing work in document-level neural machine translation commonly concatenates several consecutive sentences as a pseudo-document, and then learns inter-sentential dependencies. This strategy limits the model's ability to leverage information from distant context. We overcome this limitation with a novel Document Flattening (DocFlat) technique that integrates Flat-Batch Attention (FBA) and Neural Context Gate (NCG) into the Transformer model to utilize information beyond the pseudo-document boundaries. FBA allows the model to attend to all the positions in the batch and learns the relationships between positions explicitly, and NCG identifies the useful information from the distant context. We conduct comprehensive experiments and analyses on three benchmark datasets for English-German translation, and validate the effectiveness of two variants of DocFlat. Empirical results show that our approach outperforms strong baselines with statistical significance on BLEU, COMET and accuracy on the contrastive test set. The analyses highlight that DocFlat is highly effective in capturing the long-range information.
Comment: 15 pages, 8 figures, accepted by EACL 2023
Published: 2023
Full Text: View/download PDF

3. ProtoCon: Pseudo-label Refinement via Online Clustering and Prototypical Consistency for Efficient Semi-supervised Learning

Author: Nassar, Islam, Hayat, Munawar, Abbasnejad, Ehsan, Rezatofighi, Hamid, and Haffari, Gholamreza
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Confidence-based pseudo-labeling is among the dominant approaches in semi-supervised learning (SSL). It relies on including high-confidence predictions made on unlabeled data as additional targets to train the model. We propose ProtoCon, a novel SSL method aimed at the less-explored label-scarce SSL where such methods usually underperform. ProtoCon refines the pseudo-labels by leveraging their nearest neighbours' information. The neighbours are identified as the training proceeds using an online clustering approach operating in an embedding space trained via a prototypical loss to encourage well-formed clusters. The online nature of ProtoCon allows it to utilise the label history of the entire dataset in one training cycle to refine labels in the following cycle without the need to store image embeddings. Hence, it can seamlessly scale to larger datasets at a low cost. Finally, ProtoCon addresses the poor training signal in the initial phase of training (due to fewer confident predictions) by introducing an auxiliary self-supervised loss. It delivers significant gains and faster convergence over state-of-the-art across 5 datasets, including CIFARs, ImageNet and DomainNet.
Comment: Accepted in CVPR2023 (highlight)
Published: 2023
Full Text: View/download PDF

4. Investigating Pre-trained Audio Encoders in the Low-Resource Condition

Author: Yang, Hao, Zhao, Jinming, Haffari, Gholamreza, and Shareghi, Ehsan
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Pre-trained speech encoders have been central to pushing state-of-the-art results across various speech understanding and generation tasks. Nonetheless, the capabilities of these encoders in low-resource settings are yet to be thoroughly explored. To address this, we conduct a comprehensive set of experiments using a representative set of 3 state-of-the-art encoders (Wav2vec2, WavLM, Whisper) in the low-resource setting across 7 speech understanding and generation tasks. We provide various quantitative and qualitative analyses on task performance, convergence speed, and representational properties of the encoders. We observe a connection between the pre-training protocols of these encoders and the way in which they capture information in their internal layers. In particular, we observe the Whisper encoder exhibits the greatest low-resource capabilities on content-driven tasks in terms of performance and convergence speed.
Comment: INTERSPEECH 2023
Published: 2023
Full Text: View/download PDF

5. A Minimal Approach for Natural Language Action Space in Text-based Games

Author: Ryu, Dongwon Kelvin, Fang, Meng, Pan, Shirui, Haffari, Gholamreza, and Shareghi, Ehsan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Text-based games (TGs) are language-based interactive environments for reinforcement learning. While language models (LMs) and knowledge graphs (KGs) are commonly used for handling large action spaces in TGs, it is unclear whether these techniques are necessary or overused. In this paper, we revisit the challenge of exploring the action space in TGs and propose $\epsilon$-admissible exploration, a minimal approach of utilizing admissible actions, for the training phase. Additionally, we present a text-based actor-critic (TAC) agent that produces textual commands for the game, solely from game observations, without requiring any KG or LM. Our method, on average across 10 games from Jericho, outperforms strong baselines and state-of-the-art agents that use LM and KG. Our approach highlights that a much lighter model design, with a fresh perspective on utilizing the information within the environments, suffices for an effective exploration of exponentially large action spaces.
Published: 2023
Full Text: View/download PDF

6. Turning Flowchart into Dialog: Plan-based Data Augmentation for Low-Resource Flowchart-grounded Troubleshooting Dialogs

Author: Zhan, Haolan, Maruf, Sameen, Qu, Lizhen, Wang, Yufei, Zukerman, Ingrid, and Haffari, Gholamreza
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Flowchart-grounded troubleshooting dialogue (FTD) systems, which follow the instructions of a flowchart to diagnose users' problems in specific domains (e.g., vehicle, laptop), have been gaining research interest in recent years. However, collecting sufficient dialogues that are naturally grounded on flowcharts is costly, thus FTD systems are impeded by scarce training data. To mitigate the data sparsity issue, we propose a plan-based data augmentation (PlanDA) approach that generates diverse synthetic dialog data at scale by transforming concise flowcharts into dialogues. Specifically, its generative model employs a variational-based framework with a hierarchical planning strategy that includes global and local latent planning variables. Experiments on the FloDial dataset show that synthetic dialogue produced by PlanDA improves the performance of downstream tasks, including flowchart path retrieval and response generation, in particular on the Out-of-Flowchart settings. In addition, further analysis demonstrates the quality of synthetic data generated by PlanDA in paths that are covered by current sample dialogues and paths that are not covered.
Published: 2023
Full Text: View/download PDF

7. On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex

Author: Zhuo, Terry Yue, Li, Zhuang, Huang, Yujin, Shiri, Fatemeh, Wang, Weiqing, Haffari, Gholamreza, and Li, Yuan-Fang
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Semantic parsing is a technique aimed at constructing a structured representation of the meaning of a natural-language question. Recent advancements in few-shot language models trained on code have demonstrated superior performance in generating these representations compared to traditional unimodal language models, which are trained on downstream tasks. Despite these advancements, existing fine-tuned neural semantic parsers are susceptible to adversarial attacks on natural-language inputs. While it has been established that the robustness of smaller semantic parsers can be enhanced through adversarial training, this approach is not feasible for large language models in real-world scenarios, as it requires both substantial computational resources and expensive human annotation on in-domain semantic parsing data. This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex. Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples. To address this challenge, we propose methods for improving robustness without the need for significant amounts of labeled data or heavy computational resources.
Comment: Accepted at EACL2023 (main)
Published: 2023
Full Text: View/download PDF

8. The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning

Author: Li, Zhuang, Qu, Lizhen, Cohen, Philip R., Tumuluri, Raj V., and Haffari, Gholamreza
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Multilingual semantic parsing aims to leverage the knowledge from the high-resource languages to improve low-resource semantic parsing, yet commonly suffers from the data imbalance problem. Prior works propose to utilize the translations by either humans or machines to alleviate such issues. However, human translations are expensive, while machine translations are cheap but prone to error and bias. In this work, we propose an active learning approach that exploits the strengths of both human and machine translations by iteratively adding small batches of human translations into the machine-translated training set. Besides, we propose novel aggregated acquisition criteria that help our active learning method select utterances to be manually translated. Our experiments demonstrate that an ideal utterance selection can significantly reduce the error and bias in the translated data, resulting in higher parser accuracies than the parsers merely trained on the machine-translated data.
Comment: ACL 2023
Published: 2023
Full Text: View/download PDF

9. Active Learning for Multilingual Semantic Parser

Author: Li, Zhuang and Haffari, Gholamreza
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Current multilingual semantic parsing (MSP) datasets are almost all collected by translating the utterances in the existing datasets from the resource-rich language to the target language. However, manual translation is costly. To reduce the translation effort, this paper proposes the first active learning procedure for MSP (AL-MSP). AL-MSP selects only a subset from the existing datasets to be translated. We also propose a novel selection method that prioritizes the examples diversifying the logical form structures with more lexical choices, and a novel hyperparameter tuning method that needs no extra annotation cost. Our experiments show that AL-MSP significantly reduces translation costs with ideal selection methods. Our selection method with proper hyperparameters yields better parsing performance than the other baselines on two multilingual datasets.
Comment: EACL 2023 (findings), updated Fig. 1
Published: 2023
Full Text: View/download PDF

10. RedApt: An Adaptor for wav2vec 2 Encoding Faster and Smaller Speech Translation without Quality Compromise

Author: Zhao, Jinming, Yang, Hao, Haffari, Gholamreza, and Shareghi, Ehsan
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Pre-trained speech Transformers in speech translation (ST) have facilitated state-of-the-art (SotA) results; yet, using such encoders is computationally expensive. To improve this, we present a novel Reducer Adaptor block, RedApt, that could be seamlessly integrated within any Transformer-based speech encoding architecture. Integrating the pretrained wav2vec 2 speech encoder with RedApt brings 41% speedup, 33% memory reduction with 24% fewer FLOPs at inference. To our positive surprise, our ST model with RedApt outperforms the SotA architecture by an average of 0.68 BLEU score on 8 language pairs from Must-C.
Comment: EMNLP 2022 Finding
Published: 2022
Full Text: View/download PDF

11. LAVA: Label-efficient Visual Learning and Adaptation

Author: Nassar, Islam, Hayat, Munawar, Abbasnejad, Ehsan, Rezatofighi, Hamid, Harandi, Mehrtash, and Haffari, Gholamreza
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: We present LAVA, a simple yet effective method for multi-domain visual transfer learning with limited data. LAVA builds on a few recent innovations to enable adapting to partially labelled datasets with class and domain shifts. First, LAVA learns self-supervised visual representations on the source dataset and grounds them using class label semantics to overcome transfer collapse problems associated with supervised pretraining. Secondly, LAVA maximises the gains from unlabelled target data via a novel method which uses multi-crop augmentations to obtain highly robust pseudo-labels. By combining these ingredients, LAVA achieves a new state-of-the-art on ImageNet semi-supervised protocol, as well as on 7 out of 10 datasets in multi-domain few-shot learning on the Meta-dataset. Code and models are made available.
Comment: Accepted in WACV2023
Published: 2022
Full Text: View/download PDF

12. Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing

Author: Yang, Hao, Zhao, Jinming, Haffari, Gholamreza, and Shareghi, Ehsan
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Pre-trained speech Transformers have facilitated great success across various speech processing tasks. However, fine-tuning these encoders for downstream tasks requires sufficiently large training data to converge or to achieve state-of-the-art. In the text domain this has been partly attributed to sub-optimality of the representation space in pre-trained Transformers. In this work, we take a sober look into pre-trained speech encoders and rewire their representation space without requiring any task-specific labels. Our method utilises neutrally synthesised version of audio inputs along with frame masking to construct positive pairs for contrastive self-supervised learning. When used for augmenting the wav2vec 2 encoder, we observe consistent improvement of isotropy in the representation space. Our experiments on 6 speech processing tasks exhibit a significant convergence speedup during task fine-tuning as well as consistent task improvement, especially in low-resource settings.
Comment: 8 pages, 3 figures
Published: 2022
Full Text: View/download PDF

13. An Additive Instance-Wise Approach to Multi-class Model Interpretation

Author: Vo, Vy, Nguyen, Van, Le, Trung, Tran, Quan Hung, Haffari, Gholamreza, Camtepe, Seyit, and Phung, Dinh
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system. A large number of interpreting methods focus on identifying explanatory input features, which generally fall into two main categories: attribution and selection. A popular attribution-based approach is to exploit local neighborhoods for learning instance-specific explainers in an additive manner. The process is thus inefficient and susceptible to poorly-conditioned samples. Meanwhile, many selection-based methods directly optimize local feature distributions in an instance-wise training framework, thereby being capable of leveraging global information from other inputs. However, they can only interpret single-class predictions and many suffer from inconsistency across different settings, due to a strict reliance on a pre-defined number of features selected. This work exploits the strengths of both methods and proposes a framework for learning local explanations simultaneously for multiple target classes. Our model explainer significantly outperforms additive and instance-wise counterparts on faithfulness with more compact and comprehensible explanations. We also demonstrate the capacity to select stable and important features through extensive experiments on various data sets and black-box model architectures.
Published: 2022
Full Text: View/download PDF

14. Learning Object-Language Alignments for Open-Vocabulary Object Detection

Author: Lin, Chuang, Sun, Peize, Jiang, Yi, Luo, Ping, Qu, Lizhen, Haffari, Gholamreza, Yuan, Zehuan, and Cai, Jianfei
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Existing object detection methods are bounded in a fixed-set vocabulary by costly labeled data. When dealing with novel categories, the model has to be retrained with more bounding box annotations. Natural language supervision is an attractive alternative for its annotation-free attributes and broader object concepts. However, learning open-vocabulary object detection from language is challenging since image-text pairs do not contain fine-grained object-language alignments. Previous solutions rely on either expensive grounding annotations or distilling classification-oriented vision models. In this paper, we propose a novel open-vocabulary object detection framework directly learning from image-text pair data. We formulate object-language alignment as a set matching problem between a set of image region features and a set of word embeddings. It enables us to train an open-vocabulary object detector on image-text pairs in a much simpler and more effective way. Extensive experiments on two benchmark datasets, COCO and LVIS, demonstrate our superior performance over the competing approaches on novel categories, e.g. achieving 32.0% mAP on COCO and 21.7% mask mAP on LVIS. Code is available at: https://github.com/clin1223/VLDet
Comment: Technical Report
Published: 2022
Full Text: View/download PDF

15. Complex Reading Comprehension Through Question Decomposition

Author: Guo, Xiao-Yu, Li, Yuan-Fang, and Haffari, Gholamreza
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Multi-hop reading comprehension requires not only the ability to reason over raw text but also the ability to combine multiple pieces of evidence. We propose a novel learning approach that helps language models better understand difficult multi-hop questions and perform "complex, compositional" reasoning. Our model first learns to decompose each multi-hop question into several sub-questions by a trainable question decomposer. Instead of answering these sub-questions, we directly concatenate them with the original question and context, and leverage a reading comprehension model to predict the answer in a sequence-to-sequence manner. By using the same language model for these two components, our best separate/unified t5-base variants outperform the baseline by 7.2/6.1 absolute F1 points on a hard subset of DROP dataset.
Comment: 10 pages, 1 figure, accepted at ALTA 2022
Published: 2022
Full Text: View/download PDF

16. It Is Not As Good As You Think! Evaluating Simultaneous Machine Translation on Interpretation Data

Author: Zhao, Jinming, Arthur, Philip, Haffari, Gholamreza, Cohn, Trevor, and Shareghi, Ehsan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Most existing simultaneous machine translation (SiMT) systems are trained and evaluated on offline translation corpora. We argue that SiMT systems should be trained and tested on real interpretation data. To illustrate this argument, we propose an interpretation test set and conduct a realistic evaluation of SiMT trained on offline translations. Our results, on our test set along with 3 existing smaller scale language pairs, highlight the difference of up to 13.83 BLEU score when SiMT models are evaluated on translation vs interpretation data. In the absence of interpretation training data, we propose a translation-to-interpretation (T2I) style transfer method which allows converting existing offline translations into interpretation-style data, leading to up to 2.8 BLEU improvement. However, the evaluation gap remains notable, calling for constructing large-scale interpretation corpora better suited for evaluating and developing SiMT systems.
Comment: EMNLP2021
Published: 2021

17. Student Surpasses Teacher: Imitation Attack for Black-Box NLP APIs

Author: Xu, Qiongkai, He, Xuanli, Lyu, Lingjuan, Qu, Lizhen, and Haffari, Gholamreza
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Computation and Language, Cryptography and Security (cs.CR), Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Machine-learning-as-a-service (MLaaS) has attracted millions of users to their splendid large-scale models. Although published as black-box APIs, the valuable models behind these services are still vulnerable to imitation attacks. Recently, a series of works have demonstrated that attackers manage to steal or extract the victim models. Nonetheless, none of the previous stolen models can outperform the original black-box APIs. In this work, we conduct unsupervised domain adaptation and multi-victim ensemble to show that attackers could potentially surpass victims, which is beyond previous understanding of model extraction. Extensive experiments on both benchmark datasets and real-world APIs validate that the imitators can succeed in outperforming the original black-box models on transferred domains. We consider our work as a milestone in the research of imitation attack, especially on NLP APIs, as the superior performance could influence the defense or even publishing strategy of API providers.
Comment: COLING 2022 (oral)
Published: 2021
Full Text: View/download PDF

18. Multilingual Neural Machine Translation: Can Linguistic Hierarchies Help?

Author: Saleh, Fahimeh, Buntine, Wray, Haffari, Gholamreza, and Du, Lan
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Multilingual Neural Machine Translation (MNMT) trains a single NMT model that supports translation between multiple languages, rather than training separate models for different languages. Learning a single model can enhance the low-resource translation by leveraging data from multiple languages. However, the performance of an MNMT model is highly dependent on the type of languages used in training, as transferring knowledge from a diverse set of languages degrades the translation performance due to negative transfer. In this paper, we propose a Hierarchical Knowledge Distillation (HKD) approach for MNMT which capitalises on language groups generated according to typological features and phylogeny of languages to overcome the issue of negative transfer. HKD generates a set of multilingual teacher-assistant models via a selective knowledge distillation mechanism based on the language groups, and then distils the ultimate multilingual model from those assistants in an adaptive way. Experimental results derived from the TED dataset with 53 languages demonstrate the effectiveness of our approach in avoiding the negative transfer effect in MNMT, leading to an improved translation performance (about 1 BLEU score on average) compared to strong baselines.
Published: 2021
Full Text: View/download PDF

19. Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation

Author: Lin, Chuang, Jiang, Yi, Cai, Jianfei, Qu, Lizhen, Haffari, Gholamreza, and Yuan, Zehuan
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision-and-Language Navigation (VLN) is a task in which an agent is required to follow a language instruction to navigate to the goal position, which relies on the ongoing interactions with the environment during moving. Recent Transformer-based VLN methods have made great progress benefiting from the direct connections between visual observations and the language instruction via the multimodal cross-attention mechanism. However, these methods usually represent temporal context as a fixed-length vector by using an LSTM decoder or using manually designed hidden states to build a recurrent Transformer. Considering a single fixed-length vector is often insufficient to capture long-term temporal context, in this paper, we introduce Multimodal Transformer with Variable-length Memory (MTVM) for visually-grounded natural language navigation by modelling the temporal context explicitly. Specifically, MTVM enables the agent to keep track of the navigation trajectory by directly storing previous activations in a memory bank. To further boost the performance, we propose a memory-aware consistency loss to help learn a better joint representation of temporal context with random masked instructions. We evaluate MTVM on popular R2R and CVDN datasets, and our model improves Success Rate on R2R unseen validation and test set by 2% each, and reduces Goal Process by 1.6m on CVDN test set.
Comment: ECCV 2022
Published: 2021
Full Text: View/download PDF

20. Learning to Multi-Task Learn for Better Neural Machine Translation

Author: Zaremoodi, Poorya and Haffari, Gholamreza
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Scarcity of parallel sentence pairs is a major challenge for training high quality neural machine translation (NMT) models in bilingually low-resource scenarios, as NMT is data-hungry. Multi-task learning is an elegant approach to inject linguistic-related inductive biases into NMT, using auxiliary syntactic and semantic tasks, to improve generalisation. The challenge, however, is to devise effective training schedules, prescribing when to make use of the auxiliary tasks during the training process to fill the knowledge gaps of the main translation task, a setting referred to as biased-MTL. Current approaches for the training schedule are based on hand-engineering heuristics, whose effectiveness varies in different MTL settings. We propose a novel framework for learning the training schedule, i.e., learning to multi-task learn, for the MTL setting of interest. We formulate the training schedule as a Markov decision process which paves the way to employ policy learning methods to learn the scheduling policy. We effectively and efficiently learn the training schedule policy within the imitation learning framework using an oracle policy algorithm that dynamically sets the importance weights of auxiliary tasks based on their contributions to the generalisability of the main NMT task. Experiments on low-resource NMT settings show the resulting automatically learned training schedulers are competitive with the best heuristics, and lead to up to +1.1 BLEU score improvements.
Published: 2020
Full Text: View/download PDF

21. Domain Adaptative Causality Encoder

Author: Moghimifar, Farhad, Haffari, Gholamreza, and Baktashmotlagh, Mahsa
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Current approaches which are mainly based on the extraction of low-level relations among individual events are limited by the shortage of publicly available labelled data. Therefore, the resulting models perform poorly when applied to a distributionally different domain for which labelled data did not exist at the time of training. To overcome this limitation, in this paper, we leverage the characteristics of dependency trees and adversarial learning to address the tasks of adaptive causality identification and localisation. The term adaptive is used since the training and test data come from two distributionally different datasets, which to the best of our knowledge, this work is the first to address. Moreover, we present a new causality dataset, namely MedCaus, which integrates all types of causality in the text. Our experiments on four different benchmark causality datasets demonstrate the superiority of our approach over the existing baselines, by up to 7% improvement, on the tasks of identification and localisation of the causal relations from the text.
Comment: ALTA2020
Published: 2020

22. Question Generation from Paragraphs: A Tale of Two Hierarchical Models

Author: Kumar, Vishwajeet, Chaki, Raktim, Talluri, Sai Teja, Ramakrishnan, Ganesh, Li, Yuan-Fang, and Haffari, Gholamreza
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Automatic question generation from paragraphs is an important and challenging problem, particularly due to the long context from paragraphs. In this paper, we propose and study two hierarchical models for the task of question generation from paragraphs. Specifically, we propose (a) a novel hierarchical BiLSTM model with selective attention and (b) a novel hierarchical Transformer architecture, both of which learn hierarchical representations of paragraphs. We model a paragraph in terms of its constituent sentences, and a sentence in terms of its constituent words. While the introduction of the attention mechanism benefits the hierarchical BiLSTM model, the hierarchical Transformer, with its inherent attention and positional encoding mechanisms, also performs better than the flat Transformer model. We conducted empirical evaluation on the widely used SQuAD and MS MARCO datasets using standard metrics. The results demonstrate the overall effectiveness of the hierarchical models over their flat counterparts. Qualitatively, our hierarchical models are able to generate fluent and relevant questions.
Published: 2019

23. HetFHMM: A novel approach to infer tumor heterogeneity using factorial Hidden Markov model

Author: Haffari, Gholamreza, Cai, Zhaoxiang, Rahman, Mohammad S., and Nicholson, Ann E.
Subjects: Genomics (q-bio.GN), FOS: Biological sciences, Quantitative Biology - Genomics
Abstract: Cancer arises from successive rounds of mutations which generate tumor cells with different genomic variation, i.e. clones. For drug responsiveness and therapeutics, it is necessary to identify the clones in a tumor sample accurately. Many methods have been developed to infer tumor heterogeneity by either computing cellular prevalence and tumor phylogeny or predicting the genotype of mutations. All methods suffer from some problems, e.g. inaccurate computation of clonal frequencies, discarding clone-specific genotypes, etc. In the paper, we propose a method, called HetFHMM, to infer tumor heterogeneity by predicting clone-specific genotypes and cellular prevalence. To infer the clone-specific genotype, we consider the presence of multiple mutations at any genomic location. We also tested our model on different simulated data. The results show that HetFHMM outperforms recent methods which infer tumor heterogeneity. Therefore, HetFHMM is a novel approach in the tumor heterogeneity research area., Comment: 9 pages
Published: 2015

24. Novel Bernstein-like Concentration Inequalities for the Missing Mass

Author: Khanloo, Bahman Yari Saeed and Haffari, Gholamreza
Subjects: FOS: Computer and information sciences, Statistics - Machine Learning, Machine Learning (stat.ML)
Abstract: We are concerned with obtaining novel concentration inequalities for the missing mass, i.e. the total probability mass of the outcomes not observed in the sample. We not only derive - for the first time - distribution-free Bernstein-like deviation bounds with sublinear exponents in deviation size for missing mass, but also improve the results of McAllester and Ortiz (2003) and Berend and Kontorovich (2013, 2012) for small deviations, which is the most interesting case in learning theory. It is known that the majority of standard inequalities cannot be directly used to analyze heterogeneous sums, i.e. sums whose terms have large difference in magnitude. Our generic and intuitive approach shows that the heterogeneity issue introduced in McAllester and Ortiz (2003) is resolvable at least in the case of missing mass via regulating the terms using our novel thresholding technique., Comment: arXiv admin note: text overlap with arXiv:1402.6262. Appears in 31st Conference on Uncertainty in Artificial Intelligence (UAI), 2015
Published: 2015

25. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups

Author: Curtis, Christina, Shah, Sohrab P, Chin, Suet-Feung, Turashvili, Gulisa, Rueda, Oscar M, Dunning, Mark J, Speed, Doug, Lynch, Andy G, Samarajiwa, Shamith, Yuan, Yinyin, Gräf, Stefan, Ha, Gavin, Haffari, Gholamreza, Bashashati, Ali, Russell, Roslin, McKinney, Steven, METABRIC Group, Langerød, Anita, Green, Andrew, Provenzano, Elena, Wishart, Gordon, Pinder, Sarah, Watson, Peter, Markowetz, Florian, Murphy, Leigh, Ellis, Ian, Purushotham, Arnie, Børresen-Dale, Anne-Lise, Brenton, James D, Tavaré, Simon, Caldas, Carlos, Aparicio, Samuel, Chin, Suet-Feung [0000-0001-5697-1082], Rueda Palacio, Oscar [0000-0003-0008-4884], Dunning, Mark [0000-0002-8853-9435], Lynch, Andy [0000-0002-7876-7338], Samarajiwa, Shamith [0000-0003-1046-0601], Graf, Stefan [0000-0002-1315-8873], Markowetz, Florian [0000-0002-2784-5308], Brenton, James [0000-0002-5738-6683], and Apollo - University of Cambridge Repository
Subjects: DNA Copy Number Variations, Genome, Human, MAP Kinase Kinase 4, Gene Expression Profiling, Breast Neoplasms, Genomics, Kaplan-Meier Estimate, Prognosis, Polymorphism, Single Nucleotide, Gene Expression Regulation, Neoplastic, Treatment Outcome, Humans, Female, Gene Regulatory Networks, Protein Phosphatase 2, Genes, Neoplasm
Abstract: The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in ~40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA–RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the ‘CNA-devoid’ subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome.
Published: 2012

26. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups

Author: Curtis, Christina, Shah, Sohrab P, Chin, Suet-Feung, Turashvili, Gulisa, Rueda, Oscar M, Dunning, Mark J, Speed, Doug, Lynch, Andy G, Samarajiwa, Shamith, Yuan, Yinyin, Gräf, Stefan, Ha, Gavin, Haffari, Gholamreza, Bashashati, Ali, Russell, Roslin, McKinney, Steven, METABRIC Group, Langerød, Anita, Green, Andrew, Provenzano, Elena, Wishart, Gordon, Pinder, Sarah, Watson, Peter, Markowetz, Florian, Murphy, Leigh, Ellis, Ian, Purushotham, Arnie, Børresen-Dale, Anne-Lise, Brenton, James D, Tavaré, Simon, Caldas, Carlos, and Aparicio, Samuel
Subjects: DNA Copy Number Variations, Genome, Human, MAP Kinase Kinase 4, Gene Expression Profiling, Breast Neoplasms, Genomics, Kaplan-Meier Estimate, Prognosis, Polymorphism, Single Nucleotide, 3. Good health, Gene Expression Regulation, Neoplastic, Treatment Outcome, Humans, Female, Gene Regulatory Networks, Protein Phosphatase 2, Genes, Neoplasm
Abstract: The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in ~40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA–RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the ‘CNA-devoid’ subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome.

27. HIT'nDRIVE: patient-specific multidriver gene prioritization for precision oncology

Author: Shrestha, Raunak, Hodzic, Ermin, Sauerwald, Thomas, Dao, Phuong, Wang, Kendric, Yeung, Jake, Anderson, Shawn, Vandin, Fabio, Haffari, Gholamreza, Collins, Colin C, and Sahinalp, S Cenk
Subjects: DNA Copy Number Variations, Mutation, Computational Biology, Humans, Breast Neoplasms, Female, Genomics, Protein Interaction Maps, Transcriptome, Software, 3. Good health
Abstract: Prioritizing molecular alterations that act as drivers of cancer remains a crucial bottleneck in therapeutic development. Here we introduce HIT'nDRIVE, a computational method that integrates genomic and transcriptomic data to identify a set of patient-specific, sequence-altered genes, with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the "random walk facility location" (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by its use of an alternative distance measure: "multihitting time," the expected length of the shortest random walk from any one of the set of sequence-altered genes to an expression-altered target gene. When applied to 2200 tumors from four major cancer types, HIT'nDRIVE revealed many potentially clinically actionable driver genes. We also demonstrated that it is possible to perform accurate phenotype prediction for tumor samples by only using HIT'nDRIVE-seeded driver gene modules from gene interaction networks. In addition, we identified a number of breast cancer subtype-specific driver modules that are associated with patients' survival outcome. Furthermore, HIT'nDRIVE, when applied to a large panel of pan-cancer cell lines, accurately predicted drug efficacy using the driver genes and their seeded gene modules. Overall, HIT'nDRIVE may help clinicians contextualize massive multiomics data in therapeutic decision making, enabling widespread implementation of precision oncology.

28. Exploring diversity in back translation for low-resource machine translation

Author: Burchell, Laurie, Birch-Mayne, Alexandra, Heafield, Kenneth, Cherry, Colin, Fan, Angela, Foster, George, Haffari, Gholamreza (Reza), Khadivi, Shahram, Peng, Nanyun (Violet), Ren, Xiang, Shareghi, Ehsan, and Swayamdipta, Swabha
Abstract: Back translation is one of the most widely used methods for improving the performance of neural machine translation systems. Recent research has sought to enhance the effectiveness of this method by increasing the ‘diversity’ of the generated translations. We argue that the definitions and metrics used to quantify ‘diversity’ in previous work have been insufficient. This work puts forward a more nuanced framework for understanding diversity in training data, splitting it into lexical diversity and syntactic diversity. We present novel metrics for measuring these different aspects of diversity and carry out empirical analysis into the effect of these types of diversity on final neural machine translation model performance for low-resource English↔Turkish and mid-resource English↔Icelandic. Our findings show that generating back translation using nucleus sampling results in higher final model performance, and that this method of generation has high levels of both lexical and syntactic diversity. We also find evidence that lexical diversity is more important than syntactic for back translation performance.
Published: 2022

28 results on '"Haffari, Gholamreza"'