Author: "Piontkovskaya, Irina" - Searchworks@Jio Institute Digital Library Search Results

Search results for author "Piontkovskaya, Irina": 31 results

1. Robust AI-Generated Text Detection by Restricted Embeddings

Author: Kuznetsov, Kristian, Tulchinskii, Eduard, Kushnareva, Laida, Magai, German, Barannikov, Serguei, Nikolenko, Sergey, and Piontkovskaya, Irina
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The growing amount and quality of AI-generated texts make detecting such content more difficult. In most real-world scenarios, the domain (style and topic) of generated data and the generator model are not known in advance. In this work, we focus on the robustness of classifier-based detectors of AI-generated text, namely their ability to transfer to unseen generators or semantic domains. We investigate the geometry of the embedding space of Transformer-based text encoders and show that clearing out harmful linear subspaces helps to train a robust classifier, ignoring domain-specific spurious features. We investigate several subspace decomposition and feature selection strategies and achieve significant improvements over state-of-the-art methods in cross-domain and cross-generator transfer. Our best approaches for head-wise and coordinate-based subspace removal increase the mean out-of-distribution (OOD) classification score by up to 9% and 14% in particular setups for RoBERTa and BERT embeddings respectively. We release our code and data: https://github.com/SilverSolver/RobustATD
Comment: Accepted to Findings of EMNLP 2024
Published: 2024

2. Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA

Author: Tulchinskii, Eduard, Kushnareva, Laida, Kuznetsov, Kristian, Voznyuk, Anastasia, Andriiainen, Andrei, Piontkovskaya, Irina, Burnaev, Evgeny, and Barannikov, Serguei
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: A standard way to evaluate the abilities of LLMs involves presenting a multiple-choice question and selecting the option with the highest logit as the model's predicted answer. However, such a format for evaluating LLMs has limitations, since even if the model knows the correct answer, it may struggle to select the corresponding letter simply due to difficulties in following this rigid format. To address this, we introduce new scores that better capture and reveal the model's underlying knowledge: the Query-Key Score (QK-score), derived from the interaction between query and key representations in attention heads, and the Attention Score, based on attention weights. These scores are extracted from specific select-and-copy heads, which show consistent performance across popular Multi-Choice Question Answering (MCQA) datasets. Based on these scores, our method improves knowledge extraction, yielding up to 16% gain for LLaMA2-7B and up to 10% for larger models on popular MCQA benchmarks. At the same time, the accuracy on a simple synthetic dataset, where the model explicitly knows the right answer, increases by almost 60%, achieving nearly perfect accuracy, therefore demonstrating the method's efficiency in mitigating MCQA format limitations. To support our claims, we conduct experiments on models ranging from 7 billion to 70 billion parameters in both zero- and few-shot setups.
Published: 2024

3. Improving Interpretability and Robustness for the Detection of AI-Generated Images

Author: Gaintseva, Tatiana, Kushnareva, Laida, Magai, German, Piontkovskaya, Irina, Nikolenko, Sergey, Benning, Martin, Barannikov, Serguei, and Slabaugh, Gregory
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: With growing abilities of generative models, artificial content detection becomes an increasingly important and difficult task. However, all popular approaches to this problem suffer from poor generalization across domains and generative models. In this work, we focus on the robustness of AI-generated image (AIGI) detectors. We analyze existing state-of-the-art AIGI detection methods based on frozen CLIP embeddings and show how to interpret them, shedding light on how images produced by various AI generators differ from real ones. Next we propose two ways to improve robustness: based on removing harmful components of the embedding vector and based on selecting the best performing attention heads in the image encoder model. Our methods increase the mean out-of-distribution (OOD) classification score by up to 6% for cross-model transfer. We also propose a new dataset for AIGI detection and use it in our evaluation; we believe this dataset will help boost further research. The dataset and code are provided as a supplement.
Published: 2024

4. Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule

Author: Bout, Andrey, Podolskiy, Alexander, Nikolenko, Sergey, and Piontkovskaya, Irina
Subjects: Computer Science - Computation and Language
Abstract: Progress in neural grammatical error correction (GEC) is hindered by the lack of annotated training data. Sufficient amounts of high-quality manually annotated data are not available, so recent research has relied on generating synthetic data, pretraining on it, and then fine-tuning on real datasets; performance gains have been achieved either by ensembling or by using huge pretrained models such as XXL-T5 as the backbone. In this work, we explore an orthogonal direction: how to use available data more efficiently. First, we propose auxiliary tasks that exploit the alignment between the original and corrected sentences, such as predicting a sequence of corrections. We formulate each task as a sequence-to-sequence problem and perform multi-task training. Second, we discover that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance, so we set out to find the best training schedule. Together, these two ideas lead to significant improvements, producing results that improve state of the art with much smaller models; in particular, we outperform the best models based on T5-XXL (11B parameters) with a BART-based model (400M parameters).
Comment: EMNLP 2023
Published: 2023

5. AI-generated text boundary detection with RoFT

Author: Kushnareva, Laida, Gaintseva, Tatiana, Magai, German, Barannikov, Serguei, Abulkhanov, Dmitry, Kuznetsov, Kristian, Tulchinskii, Eduard, Piontkovskaya, Irina, and Nikolenko, Sergey
Subjects: Computer Science - Computation and Language
Abstract: Due to the rapid development of large language models, people increasingly often encounter texts that may start as written by a human but continue as machine-generated. Detecting the boundary between human-written and machine-generated parts of such texts is a challenging problem that has not received much attention in the literature. We attempt to bridge this gap and examine several ways to adapt state-of-the-art artificial text detection classifiers to the boundary detection setting. We push all detectors to their limits, using the Real or Fake text benchmark that contains short texts on several topics and includes generations of various language models. We use this diversity to deeply examine the robustness of all detectors in cross-domain and cross-model settings to provide baselines and insights for future research. In particular, we find that perplexity-based approaches to boundary detection tend to be more robust to peculiarities of domain-specific data than supervised fine-tuning of the RoBERTa model; we also find which features of the text confuse boundary detection algorithms and negatively influence their performance in cross-domain settings.
Comment: Our official repository: https://github.com/SilverSolver/ai_boundary_detection
Published: 2023

6. GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding

Author: Yakovlev, Konstantin, Podolskiy, Alexander, Bout, Andrey, Nikolenko, Sergey, and Piontkovskaya, Irina
Subjects: Computer Science - Computation and Language
Abstract: Grammatical error correction (GEC) is an important NLP task that is currently usually solved with autoregressive sequence-to-sequence models. However, approaches of this class are inherently slow due to one-by-one token generation, so non-autoregressive alternatives are needed. In this work, we propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network that outputs a self-attention weight matrix that can be used in beam search to find the best permutation of input tokens (with auxiliary {ins} tokens) and a decoder network based on a step-unrolled denoising autoencoder that fills in specific tokens. This allows us to find the token permutation after only one forward pass of the permutation network, avoiding autoregressive constructions. We show that the resulting network improves over previously known non-autoregressive methods for GEC and reaches the level of autoregressive methods that do not use language-specific synthetic data generation methods. Our results are supported by a comprehensive experimental validation on the CoNLL-2014 and Write&Improve+LOCNESS datasets and an extensive ablation study that supports our architectural and algorithmic choices.
Comment: ACL 2023
Published: 2023

7. Sinkhorn Transformations for Single-Query Postprocessing in Text-Video Retrieval

Author: Yakovlev, Konstantin, Polyakov, Gregory, Alimova, Ilseyar, Podolskiy, Alexander, Bout, Andrey, Nikolenko, Sergey, and Piontkovskaya, Irina
Subjects: Computer Science - Computation and Language
Abstract: A recent trend in multimodal retrieval is related to postprocessing test set results via the dual-softmax loss (DSL). While this approach can bring significant improvements, it usually presumes that an entire matrix of test samples is available as DSL input. This work introduces a new postprocessing approach based on Sinkhorn transformations that outperforms DSL. Further, we propose a new postprocessing setting that does not require access to multiple test queries. We show that our approach can significantly improve the results of state-of-the-art models such as CLIP4Clip, BLIP, X-CLIP, and DRL, thus achieving a new state of the art on several standard text-video retrieval datasets both with access to the entire test set and in the single-query setting.
Comment: SIGIR 2023
Published: 2023
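
Illustrative note (not part of the catalog record): the abstract above does not spell out the procedure, so the following is only a minimal sketch of the textbook Sinkhorn-Knopp normalization applied to a query-by-video similarity matrix. The function name `sinkhorn_normalize`, the `temperature` parameter, and the toy scores are assumptions made for illustration; this is not the authors' single-query method.

```python
import math


def sinkhorn_normalize(similarity, n_iters=50, temperature=1.0):
    """Alternately normalize rows and columns of exp(similarity / temperature).

    Generic Sinkhorn-Knopp iteration (an assumption for illustration),
    not the postprocessing pipeline described in the paper.
    """
    # Exponentiate so that every entry is strictly positive.
    kernel = [[math.exp(s / temperature) for s in row] for row in similarity]
    n_cols = len(kernel[0])
    for _ in range(n_iters):
        # Row step: rescale each query's scores so they sum to 1.
        normalized = []
        for row in kernel:
            row_sum = sum(row)
            normalized.append([v / row_sum for v in row])
        kernel = normalized
        # Column step: rescale each video's scores so they sum to 1.
        col_sums = [sum(row[j] for row in kernel) for j in range(n_cols)]
        kernel = [[v / col_sums[j] for j, v in enumerate(row)] for row in kernel]
    return kernel


# Hypothetical 3 queries x 3 videos similarity scores.
toy_similarity = [
    [0.9, 0.8, 0.1],
    [0.2, 0.85, 0.3],
    [0.1, 0.7, 0.6],
]
balanced = sinkhorn_normalize(toy_similarity)
```

For a square matrix the result is approximately doubly stochastic, so a video that scores highly against every query no longer dominates each row.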

8. Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts

Author: Tulchinskii, Eduard, Kuznetsov, Kristian, Kushnareva, Laida, Cherniavskii, Daniil, Barannikov, Serguei, Piontkovskaya, Irina, Nikolenko, Sergey, and Burnaev, Evgeny
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Mathematics - Algebraic Topology, 68T50
Abstract: Rapidly increasing quality of AI-generated content makes it difficult to distinguish between human and AI-generated texts, which may lead to undesirable consequences for society. Therefore, it becomes increasingly important to study the properties of human texts that are invariant over different text domains and varying proficiency of human writers, can be easily calculated for any language, and can robustly separate natural and AI-generated texts regardless of the generation model and sampling method. In this work, we propose such an invariant for human-written texts, namely the intrinsic dimensionality of the manifold underlying the set of embeddings for a given text sample. We show that the average intrinsic dimensionality of fluent texts in a natural language hovers around 9 for several alphabet-based languages and around 7 for Chinese, while the average intrinsic dimensionality of AI-generated texts for each language is approximately 1.5 lower, with a clear statistical separation between human-generated and AI-generated distributions. This property allows us to build a score-based artificial text detector. The proposed detector's accuracy is stable over text domains, generator models, and human writer proficiency levels, outperforming SOTA detectors in model-agnostic and cross-domain scenarios by a significant margin.
Published: 2023

9. Can BERT eat RuCoLA? Topological Data Analysis to Explain

Author: Proskurina, Irina, Piontkovskaya, Irina, and Artemova, Ekaterina
Subjects: Computer Science - Computation and Language
Abstract: This paper investigates how Transformer language models (LMs) fine-tuned for acceptability classification capture linguistic features. Our approach uses the best practices of topological data analysis (TDA) in NLP: we construct directed attention graphs from attention matrices, derive topological features from them, and feed them to linear classifiers. We introduce two novel features, chordality and the matching number, and show that TDA-based classifiers outperform fine-tuning baselines. We experiment with two datasets, CoLA and RuCoLA, in English and Russian, two typologically different languages. On top of that, we propose several black-box introspection techniques aimed at detecting changes in the attention mode of the LMs during fine-tuning, defining the LM's prediction confidences, and associating individual heads with fine-grained grammar phenomena. Our results contribute to understanding the behavior of monolingual LMs in the acceptability classification task, provide insights into the functional roles of attention heads, and highlight the advantages of TDA-based approaches for analyzing LMs. We release the code and the experimental results for further uptake.
Comment: Accepted to the Workshop on Slavic NLP @ EACL 2023
Published: 2023
Full Text: View/download PDF

10. PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

Author: Ren, Xiaozhe, Zhou, Pingyi, Meng, Xinfan, Huang, Xinjing, Wang, Yadao, Wang, Weichao, Li, Pengfei, Zhang, Xiaoda, Podolskiy, Alexander, Arshinov, Grigory, Bout, Andrey, Piontkovskaya, Irina, Wei, Jiansheng, Jiang, Xin, Su, Teng, Liu, Qun, and Yao, Jun
Subjects: Computer Science - Computation and Language
Abstract: The scaling of large language models has greatly improved natural language understanding, generation, and reasoning. In this work, we develop a system that trained a trillion-parameter language model on a cluster of Ascend 910 AI processors and the MindSpore framework, and present the language model with 1.085T parameters named PanGu-Σ. With parameters inherited from PanGu-α, we extend the dense Transformer model to a sparse one with Random Routed Experts (RRE), and efficiently train the model over 329B tokens by using Expert Computation and Storage Separation (ECSS). This resulted in a 6.3x increase in training throughput through heterogeneous computing. Our experimental findings show that PanGu-Σ provides state-of-the-art performance in zero-shot learning of various Chinese NLP downstream tasks. Moreover, it demonstrates strong abilities when fine-tuned on application data of open-domain dialogue, question answering, machine translation and code generation.
Published: 2023

11. Topological Data Analysis for Speech Processing

Author: Tulchinskii, Eduard, Kuznetsov, Kristian, Kushnareva, Laida, Cherniavskii, Daniil, Barannikov, Serguei, Piontkovskaya, Irina, Nikolenko, Sergey, and Burnaev, Evgeny
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing, Mathematics - Algebraic Topology
Abstract: We apply topological data analysis (TDA) to speech classification problems and to the introspection of a pretrained speech model, HuBERT. To this end, we introduce a number of topological and algebraic features derived from Transformer attention maps and embeddings. We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head. In particular, we achieve an improvement of about 9% accuracy and 5% ERR on four common datasets; on CREMA-D, the proposed feature set reaches a new state-of-the-art performance with accuracy 80.155. We also show that topological features are able to reveal functional roles of speech Transformer heads; e.g., we find the heads capable of distinguishing between pairs of sample sources (natural/synthetic) or voices without any downstream fine-tuning. Our results demonstrate that TDA is a promising new approach for speech analysis, especially for tasks that require structural prediction. Appendices, an introduction to TDA, and other additional materials are available at https://topohubert.github.io/speech-topology-webpages/
Comment: Accepted to INTERSPEECH 2023 conference
Published: 2022
Full Text: View/download PDF

12. Betti numbers of attention graphs is all you really need

Author: Kushnareva, Laida, Piontkovski, Dmitri, and Piontkovskaya, Irina
Subjects: Computer Science - Computation and Language
Abstract: We apply methods of topological analysis to the attention graphs calculated on the attention heads of the BERT model (arXiv:1810.04805v2). Our research shows that a classifier built upon basic persistent topological features (namely, Betti numbers) of the trained neural network can achieve classification results on par with the conventional classification method. We show the relevance of such topological text representation on three text classification benchmarks. To the best of our knowledge, it is the first attempt to analyze the topology of an attention-based neural network widely used for Natural Language Processing.
Comment: This short paper was submitted to the "Topological Data Analysis and Beyond" Workshop at NeurIPS 2020 in July 2020, but wasn't accepted. Later the ideas from this short paper found a rich development in arXiv:2109.04825 and arXiv:2205.09630
Published: 2022
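
Illustrative note (not part of the catalog record): the abstract above names Betti numbers of attention graphs as features without giving the construction. The sketch below is one plausible reading under stated assumptions: an attention matrix is thresholded into an undirected graph, b0 is the number of connected components, and b1 = E - V + b0 counts independent cycles. The function name `betti_numbers`, the symmetrization rule, and the toy matrix are hypothetical and are not taken from the paper.

```python
def betti_numbers(attention, threshold):
    """Return (b0, b1) of the undirected graph obtained by thresholding.

    Assumption for illustration: tokens i and j are joined by an edge
    when the attention weight in either direction reaches the threshold.
    """
    n = len(attention)
    parent = list(range(n))  # union-find forest over token indices

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = 0
    for i in range(n):
        for j in range(i + 1, n):
            if max(attention[i][j], attention[j][i]) >= threshold:
                edges += 1
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j
    b0 = len({find(i) for i in range(n)})  # connected components
    b1 = edges - n + b0  # independent cycles of a graph
    return b0, b1


# Hypothetical attention weights for a 4-token input.
toy_attention = [
    [0.1, 0.6, 0.3, 0.0],
    [0.5, 0.1, 0.4, 0.0],
    [0.4, 0.5, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.9],
]
print(betti_numbers(toy_attention, 0.3))   # (2, 1): a triangle plus an isolated token
print(betti_numbers(toy_attention, 0.55))  # (3, 0): a single edge, no cycles
```

Sweeping the threshold and recording (b0, b1) at each value gives a simple feature vector per attention head, which is the kind of input a linear classifier could consume.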

13. Template-based Approach to Zero-shot Intent Recognition

Author: Lamanov, Dmitry, Burnyshev, Pavel, Artemova, Ekaterina, Malykh, Valentin, Bout, Andrey, and Piontkovskaya, Irina
Subjects: Computer Science - Computation and Language
Abstract: The recent advances in transfer learning techniques and pre-training of large contextualized encoders foster innovation in real-life applications, including dialog assistants. Practical needs of intent recognition require effective data usage and the ability to constantly update supported intents, adopting new ones, and abandoning outdated ones. In particular, the generalized zero-shot paradigm, in which the model is trained on the seen intents and tested on both seen and unseen intents, is taking on new importance. In this paper, we explore the generalized zero-shot setup for intent recognition. Following best practices for zero-shot text classification, we treat the task with a sentence pair modeling approach. We outperform the previous state-of-the-art F1-measure by up to 16% for unseen intents, using intent labels and user utterances and without accessing external sources (such as knowledge bases). Further enhancement includes lexicalization of intent labels, which improves performance by up to 7%. By using task transferring from other sentence pair tasks, such as Natural Language Inference, we gain additional improvements.
Comment: Accepted to INLG 2022
Published: 2022

14. Acceptability Judgements via Examining the Topology of Attention Maps

Author: Cherniavskii, Daniil, Tulchinskii, Eduard, Mikhailov, Vladislav, Proskurina, Irina, Kushnareva, Laida, Artemova, Ekaterina, Barannikov, Serguei, Piontkovskaya, Irina, Piontkovski, Dmitri, and Burnaev, Evgeny
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Mathematics - Algebraic Topology
Abstract: The role of the attention mechanism in encoding linguistic knowledge has received special interest in NLP. However, the ability of the attention heads to judge the grammatical acceptability of a sentence has been underexplored. This paper approaches the paradigm of acceptability judgments with topological data analysis (TDA), showing that the geometric properties of the attention graph can be efficiently exploited for two standard practices in linguistics: binary judgments and linguistic minimal pairs. Topological features enhance the BERT-based acceptability classifier scores by 8%-24% on CoLA in three languages (English, Italian, and Swedish). By revealing the topological discrepancy between attention maps of minimal pairs, we achieve human-level performance on the BLiMP benchmark, outperforming nine statistical and Transformer LM baselines. At the same time, TDA provides the foundation for analyzing the linguistic functions of attention heads and interpreting the correspondence between the graph features and grammatical phenomena.
Comment: Accepted to EMNLP 2022 Findings
Published: 2022
Full Text: View/download PDF

15. Artificial Text Detection via Examining the Topology of Attention Maps

Author: Kushnareva, Laida, Cherniavskii, Daniil, Mikhailov, Vladislav, Artemova, Ekaterina, Barannikov, Serguei, Bernstein, Alexander, Piontkovskaya, Irina, Piontkovski, Dmitri, and Burnaev, Evgeny
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Mathematics - Algebraic Topology
Abstract: The impressive capabilities of recent generative models to create texts that are challenging to distinguish from human-written ones can be misused for generating fake news, product reviews, and even abusive content. Despite the prominent performance of existing methods for artificial text detection, they still lack interpretability and robustness towards unseen models. To this end, we propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA), which is currently understudied in the field of NLP. We empirically show that the features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets, and tend to be the most robust towards unseen GPT-style generation models as opposed to existing methods. The probing analysis of the features reveals their sensitivity to the surface and syntactic properties. The results demonstrate that TDA is a promising line with respect to NLP tasks, specifically the ones that incorporate surface and structural information.
Comment: Accepted to EMNLP 2021
Published: 2021
Full Text: View/download PDF

16. A Single Example Can Improve Zero-Shot Data Generation

Author: Burnyshev, Pavel, Malykh, Valentin, Bout, Andrey, Artemova, Ekaterina, and Piontkovskaya, Irina
Subjects: Computer Science - Computation and Language
Abstract: Sub-tasks of intent classification, such as robustness to distribution shift, adaptation to specific user groups and personalization, and out-of-domain detection, require extensive and flexible datasets for experiments and evaluation. As collecting such datasets is time- and labor-consuming, we propose to use text generation methods to gather datasets. The generator should be trained to generate utterances that belong to the given intent. We explore two approaches to generating task-oriented utterances. In the zero-shot approach, the model is trained to generate utterances from seen intents and is further used to generate utterances for intents unseen during training. In the one-shot approach, the model is presented with a single utterance from a test intent. We perform a thorough automatic and human evaluation of the datasets generated using the two proposed approaches. Our results reveal that the attributes of the generated data are close to the original test sets, collected via crowd-sourcing.
Comment: To appear in INLG 2021 proceedings
Published: 2021

17. Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection

Author: Podolskiy, Alexander, Lipin, Dmitry, Bout, Andrey, Artemova, Ekaterina, and Piontkovskaya, Irina
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Real-life applications that rely heavily on machine learning, such as dialog systems, demand out-of-domain detection methods. Intent classification models should be equipped with a mechanism to distinguish seen intents from unseen ones so that the dialog agent is capable of rejecting the latter and avoiding undesired behavior. However, despite increasing attention paid to the task, the best practices for out-of-domain intent detection have not yet been fully established. This paper conducts a thorough comparison of out-of-domain intent detection methods. We prioritize methods that do not require access to out-of-domain data during training, gathering of which is extremely time- and labor-consuming due to lexical and stylistic variation of user utterances. We evaluate multiple contextual encoders and methods, proven to be efficient, on three standard datasets for intent classification, expanded with out-of-domain utterances. Our main findings show that fine-tuning Transformer-based encoders on in-domain data leads to superior results. Mahalanobis distance, together with utterance representations derived from Transformer-based encoders, outperforms other methods by a wide margin and establishes new state-of-the-art results for all datasets. The broader analysis shows that the reason for success lies in the fact that the fine-tuned Transformer is capable of constructing homogeneous representations of in-domain utterances, revealing geometrical disparity to out-of-domain utterances. In turn, the Mahalanobis distance captures this disparity easily. The code is available in our GitHub repo: https://github.com/huawei-noah/noah-research/tree/master/Maha_OOD
Comment: AAAI 2021
Published: 2021

18. Differentially Private Distributed Learning for Language Modeling Tasks

Author: Popov, Vadim, Kudinov, Mikhail, Piontkovskaya, Irina, Vytovtov, Petr, and Nevidomsky, Alex
Subjects: Computer Science - Computation and Language, Computer Science - Cryptography and Security, Computer Science - Learning
Abstract: One of the big challenges in machine learning applications is that training data can be different from the real-world data faced by the algorithm. In language modeling, users' language (e.g. in private messaging) could change in a year and be completely different from what we observe in publicly available data. At the same time, public data can be used for obtaining general knowledge (i.e. general model of English). We study approaches to distributed fine-tuning of a general model on user private data with the additional requirements of maintaining the quality on the general data and minimization of communication costs. We propose a novel technique that significantly improves prediction quality on users' language compared to a general model and outperforms gradient compression methods in terms of communication efficiency. The proposed procedure is fast and leads to an almost 70% perplexity reduction and 8.7 percentage point improvement in keystroke saving rate on informal English texts. We also show that the range of tasks our approach is applicable to is not limited by language modeling only. Finally, we propose an experimental framework for evaluating differential privacy of distributed training of language models and show that our approach has good privacy guarantees.
Published: 2017

19. Binary Autoencoder for Text Modeling

Author: Baynazarov, Ruslan, and Piontkovskaya, Irina
Editors: Ustalov, Dmitry, Filchenkov, Andrey, and Pivovarova, Lidia
Editorial Board Members: Barbosa, Simone Diniz Junqueira, Filipe, Joaquim, Ghosh, Ashish, Kotenko, Igor, and Zhou, Lizhu
Published: 2019
Full Text: View/download PDF

20. Topological Data Analysis for Speech Processing

Author: Tulchinskii, Eduard, primary, Kuznetsov, Kristian, additional, Kushnareva, Laida, additional, Cherniavskii, Daniil, additional, Barannikov, Serguei, additional, Piontkovskaya, Irina, additional, Nikolenko, Sergey, additional, and Burnaev, Evgeny, additional
Published: 2023
Full Text: View/download PDF

21. Sinkhorn Transformations for Single-Query Postprocessing in Text-Video Retrieval

Author: Yakovlev, Konstantin, primary, Polyakov, Gregory, additional, Alimova, Ilseyar, additional, Podolskiy, Alexander, additional, Bout, Andrey, additional, Nikolenko, Sergey, additional, and Piontkovskaya, Irina, additional
Published: 2023
Full Text: View/download PDF

22. Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule

Author: Bout, Andrey, primary, Podolskiy, Alexander, additional, Nikolenko, Sergey, additional, and Piontkovskaya, Irina, additional
Published: 2023
Full Text: View/download PDF

23. GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding

Author: Yakovlev, Konstantin, primary, Podolskiy, Alexander, additional, Bout, Andrey, additional, Nikolenko, Sergey, additional, and Piontkovskaya, Irina, additional
Published: 2023
Full Text: View/download PDF

24. PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

Author: Ren, Xiaozhe, Zhou, Pingyi, Meng, Xinfan, Huang, Xinjing, Wang, Yadao, Wang, Weichao, Li, Pengfei, Zhang, Xiaoda, Podolskiy, Alexander, Arshinov, Grigory, Bout, Andrey, Piontkovskaya, Irina, Wei, Jiansheng, Jiang, Xin, Su, Teng, Liu, Qun, and Yao, Jun
Subjects: FOS: Computer and information sciences, Computation and Language (cs.CL)
Abstract: The scaling of large language models has greatly improved natural language understanding, generation, and reasoning. In this work, we develop a system that trained a trillion-parameter language model on a cluster of Ascend 910 AI processors and the MindSpore framework, and present the language model with 1.085T parameters named PanGu-Σ. With parameters inherited from PanGu-α, we extend the dense Transformer model to a sparse one with Random Routed Experts (RRE), and efficiently train the model over 329B tokens by using Expert Computation and Storage Separation (ECSS). This resulted in a 6.3x increase in training throughput through heterogeneous computing. Our experimental findings show that PanGu-Σ provides state-of-the-art performance in zero-shot learning of various Chinese NLP downstream tasks. Moreover, it demonstrates strong abilities when fine-tuned on application data of open-domain dialogue, question answering, machine translation and code generation.
Published: 2023
Full Text: View/download PDF

25. Template-based Approach to Zero-shot Intent Recognition

Author: Lamanov, Dmitry, primary, Burnyshev, Pavel, additional, Artemova, Katya, additional, Malykh, Valentin, additional, Bout, Andrey, additional, and Piontkovskaya, Irina, additional
Published: 2022
Full Text: View/download PDF

26. Ask Me Anything in Your Native Language

Author: Sorokin, Nikita, primary, Abulkhanov, Dmitry, additional, Piontkovskaya, Irina, additional, and Malykh, Valentin, additional
Published: 2022
Full Text: View/download PDF

27. Acceptability Judgements via Examining the Topology of Attention Maps

Author: Cherniavskii, Daniil, primary, Tulchinskii, Eduard, additional, Mikhailov, Vladislav, additional, Proskurina, Irina, additional, Kushnareva, Laida, additional, Artemova, Ekaterina, additional, Barannikov, Serguei, additional, Piontkovskaya, Irina, additional, Piontkovski, Dmitri, additional, and Burnaev, Evgeny, additional
Published: 2022
Full Text: View/download PDF

28. Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection

Author: Podolskiy, Alexander, primary, Lipin, Dmitry, additional, Bout, Andrey, additional, Artemova, Ekaterina, additional, and Piontkovskaya, Irina, additional
Published: 2021

29. Artificial Text Detection via Examining the Topology of Attention Maps

Author: Kushnareva, Laida, Cherniavskii, Daniil, Mikhailov, Vladislav, Artemova, Ekaterina, Barannikov, Serguei, Bernstein, Alexander, Piontkovskaya, Irina, Piontkovski, Dmitri, and Burnaev, Evgeny
Published: 2021

30. Single Example Can Improve Zero-Shot Data Generation

Author: Burnyshev, Pavel, Malykh, Valentin, Bout, Andrey, Artemova, Ekaterina, and Piontkovskaya, Irina
Published: 2021

31. SumTitles: a Summarization Dataset with Low Extractiveness

Author: Malykh, Valentin, Chernis, Konstantin, Artemova, Ekaterina, and Piontkovskaya, Irina
Published: 2020
