Author: "Phukan A" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Phukan A"' showing total 5,137 results

1. Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks

Author: Phukan, Orchid Chetia, Koshal, Devyani, Behera, Swarup Ranjan, Buduru, Arun Balaji, and Sharma, Rajesh
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound, 68T45, I.2.7
Abstract: Speech forensic tasks (SFTs), such as automatic speaker recognition (ASR), speech emotion recognition (SER), gender recognition (GR), and age estimation (AE), find use in different security and biometric applications. Previous works have applied various techniques, with recent studies focusing on applying speech foundation models (SFMs) for improved performance. However, most prior efforts have centered on building individual models for each task separately, despite the inherent similarities among these tasks. This isolated approach results in higher computational resource requirements, increased costs, time consumption, and maintenance challenges. In this study, we address these challenges by employing a multi-task learning strategy. Firstly, we explore the various state-of-the-art (SOTA) SFMs by extracting their representations for learning these SFTs and investigating their effectiveness at each task specifically. Secondly, we analyze the performance of the extracted representations on the SFTs in a multi-task learning framework. We observe a decline in performance when SFTs are modeled together compared to individual task-specific models, and as a remedy, we propose multi-view learning (MVL). Views are representations from different SFMs transformed into distinct abstract spaces by characteristics unique to each SFM. By leveraging MVL, we integrate these diverse representations to capture complementary information across tasks, enhancing the shared learning process. We introduce a new framework called TANGO (Task Alignment with iNter-view Gated Optimal transport) to implement this approach. With TANGO, we achieve the topmost performance in comparison to individual SFM representations as well as baseline fusion techniques across benchmark datasets such as CREMA-D, emo-DB, and BAVED.
Published: 2024

2. Beyond Speech and More: Investigating the Emergent Ability of Speech Foundation Models for Classifying Physiological Time-Series Signals

Author: Phukan, Orchid Chetia, Behera, Swarup Ranjan, Girish, Akhtar, Mohd Mujtaba, Buduru, Arun Balaji, and Sharma, Rajesh
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Signal Processing, 68T45, I.2.7
Abstract: Despite being trained exclusively on speech data, speech foundation models (SFMs) like Whisper have shown impressive performance in non-speech tasks such as audio classification. This is partly because speech shares some common traits with audio, enabling SFMs to transfer effectively. In this study, we push the boundaries by evaluating SFMs on a more challenging out-of-domain (OOD) task: classifying physiological time-series signals. We test two key hypotheses: first, that SFMs can generalize to physiological signals by capturing shared temporal patterns; second, that multilingual SFMs will outperform others due to their exposure to greater variability during pre-training, leading to more robust, generalized representations. Our experiments, conducted for stress recognition using ECG (Electrocardiogram), EMG (Electromyography), and EDA (Electrodermal Activity) signals, reveal that models trained on SFM-derived representations outperform those trained on raw physiological signals. Among all models, multilingual SFMs achieve the highest accuracy, supporting our hypothesis and demonstrating their OOD capabilities. This work positions SFMs as promising tools for new uncharted domains beyond speech.
Published: 2024

3. SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning

Author: Jain, Sarthak, Phukan, Orchid Chetia, Behera, Swarup Ranjan, Buduru, Arun Balaji, and Sharma, Rajesh
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound, 68T45, I.2.7
Abstract: In this work, we introduce SeQuiFi, a novel approach for mitigating catastrophic forgetting (CF) in speech emotion recognition (SER). SeQuiFi adopts a sequential class-finetuning strategy, where the model is fine-tuned incrementally on one emotion class at a time, preserving and enhancing retention for each class. While various state-of-the-art (SOTA) methods, such as regularization-based, memory-based, and weight-averaging techniques, have been proposed to address CF, it still remains a challenge, particularly with diverse and multilingual datasets. Through extensive experiments, we demonstrate that SeQuiFi significantly outperforms both vanilla fine-tuning and SOTA continual learning techniques in terms of accuracy and F1 scores on multiple benchmark SER datasets, including CREMA-D, RAVDESS, Emo-DB, MESD, and SHEMO, covering different languages.
Published: 2024

4. ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos

Author: Phukan, Arpan, Gupta, Manish, and Ekbal, Asif
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Previous studies on question generation from videos have mostly focused on generating questions about common objects and attributes and hence are not entity-centric. In this work, we focus on the generation of entity-centric information-seeking questions from videos. Such a system could be useful for video-based learning, recommending "People Also Ask" questions, video-based chatbots, and fact-checking. Our work addresses three key challenges: identifying question-worthy information, linking it to entities, and effectively utilizing multimodal signals. Further, to the best of our knowledge, there does not exist a large-scale dataset for this task. Most video question generation datasets are on TV shows, movies, or human activities or lack entity-centric information-seeking questions. Hence, we contribute a diverse dataset of YouTube videos, VideoQuestions, consisting of 411 videos with 2265 manually annotated questions. We further propose a model architecture combining Transformers, rich context signals (titles, transcripts, captions, embeddings), and a combination of cross-entropy and contrastive loss function to encourage entity-centric question generation. Our best method yields BLEU, ROUGE, CIDEr, and METEOR scores of 71.3, 78.6, 7.31, and 81.9, respectively, demonstrating practical usability. We make the code and dataset publicly available. https://github.com/thePhukan/ECIS-VQG, Comment: Accepted in EMNLP 2024, https://openreview.net/forum?id=CriKOn01dI
Published: 2024

5. Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection

Author: Phukan, Orchid Chetia, Girish, Akhtar, Mohd Mujtaba, Behera, Swarup Ranjan, Choudhury, Nitin, Buduru, Arun Balaji, Sharma, Rajesh, and Prasanna, S. R. Mahadeva
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound, 68T45, I.2.7
Abstract: The adaptation of foundation models has significantly advanced environmental audio deepfake detection (EADD), a rapidly growing area of research. These models are typically fine-tuned or utilized in their frozen states for downstream tasks. However, the high dimensionality of their representations can substantially increase the parameter count of downstream models, leading to higher computational demands. A common remedy is to compress these representations by leveraging state-of-the-art (SOTA) unsupervised dimensionality reduction techniques (PCA, SVD, KPCA, GRP) for efficient EADD. However, with the application of such techniques, we observe a drop in performance. In this paper, we show that representation vectors contain redundant information, and that randomly selecting 40-50% of representation values and building downstream models on them preserves or sometimes even improves performance. We show that such random selection preserves more performance than the SOTA dimensionality reduction techniques while reducing model parameters and inference time by about half., Comment: Submitted to ICASSP 2025
Published: 2024

6. Avengers Assemble: Amalgamation of Non-Semantic Features for Depression Detection

Author: Phukan, Orchid Chetia, Behera, Swarup Ranjan, Singh, Shubham, Singh, Muskaan, Rajan, Vandana, Buduru, Arun Balaji, Sharma, Rajesh, and Prasanna, S. R. Mahadeva
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound, 68T45, I.2.7
Abstract: In this study, we address the challenge of depression detection from speech, focusing on the potential of non-semantic features (NSFs) to capture subtle markers of depression. While prior research has leveraged various features for this task, NSFs extracted from pre-trained models (PTMs) designed for non-semantic tasks such as paralinguistic speech processing (TRILLsson), speaker recognition (x-vector), and emotion recognition (emoHuBERT) have shown significant promise. However, the potential of combining these diverse features has not been fully explored. In this work, we demonstrate that the amalgamation of NSFs results in complementary behavior, leading to enhanced depression detection performance. To this end, we introduce a simple, novel framework, FuSeR, designed to effectively combine these features. Our results show that FuSeR outperforms models utilizing individual NSFs as well as baseline fusion techniques and obtains state-of-the-art (SOTA) performance on the E-DAIC benchmark with an RMSE of 5.51 and an MAE of 4.48, establishing it as a robust approach for depression detection., Comment: Submitted to ICASSP 2025
Published: 2024

7. Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition

Author: Phukan, Orchid Chetia, Akhtar, Mohd Mujtaba, Girish, Behera, Swarup Ranjan, Kalita, Sishir, Buduru, Arun Balaji, Sharma, Rajesh, and Prasanna, S. R. Mahadeva
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound, 68T45, I.2.7
Abstract: In this study, we investigate multimodal foundation models (MFMs) for emotion recognition from non-verbal sounds. We hypothesize that MFMs, with their joint pre-training across multiple modalities, will be more effective in non-verbal sounds emotion recognition (NVER) by better interpreting and differentiating subtle emotional cues that may be ambiguous in audio-only foundation models (AFMs). To validate our hypothesis, we extract representations from state-of-the-art (SOTA) MFMs and AFMs and evaluate them on benchmark NVER datasets. Inspired by research in speech recognition and audio deepfake detection, we also investigate the potential of combining selected foundation model representations to enhance NVER further. To achieve this, we propose a framework called MATA (Intra-Modality Alignment through Transport Attention). Through MATA coupled with the combination of the MFMs LanguageBind and ImageBind, we report the topmost performance against individual FMs and baseline fusion techniques, with accuracies of 76.47%, 77.40%, and 75.12% and F1-scores of 70.35%, 76.19%, and 74.63% on the ASVP-ESD, JNV, and VIVAE datasets, respectively, and report SOTA on the benchmark datasets., Comment: Submitted to ICASSP 2025
Published: 2024

8. Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models

Author: Phukan, Orchid Chetia, Jain, Sarthak, Behera, Swarup Ranjan, Buduru, Arun Balaji, Sharma, Rajesh, and Prasanna, S. R. Mahadeva
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Machine Learning, Computer Science - Sound, 68T45, I.2.7
Abstract: In this study, for the first time, we extensively investigate whether music foundation models (MFMs) or speech foundation models (SFMs) work better for singing voice deepfake detection (SVDD), which has recently attracted attention in the research community. For this, we perform a comprehensive comparative study of state-of-the-art (SOTA) MFMs (MERT variants and music2vec) and SFMs (pre-trained for general speech representation learning as well as speaker recognition). We show that speaker recognition SFM representations perform the best amongst all the foundation models (FMs), and this performance can be attributed to their higher efficacy in capturing characteristics such as pitch, tone, and intensity present in singing voices. We also explore the fusion of FMs to exploit their complementary behavior for improved SVDD, and we propose a novel framework, FIONA, for this purpose. With FIONA, through the synchronization of x-vector (speaker recognition SFM) and MERT-v1-330M (MFM), we report the best performance with the lowest Equal Error Rate (EER) of 13.74%, beating all the individual FMs as well as baseline FM fusions and achieving SOTA results., Comment: Submitted to ICASSP 2025
Published: 2024

9. Modality-Order Matters! A Novel Hierarchical Feature Fusion Method for CoSAm: A Code-Switched Autism Corpus

Author: Akhtar, Mohd Mujtaba, Girish, Singh, Muskaan, and Phukan, Orchid Chetia
Subjects: Computer Science - Machine Learning
Abstract: Autism Spectrum Disorder (ASD) is a complex neuro-developmental challenge, presenting a spectrum of difficulties in social interaction, communication, and the expression of repetitive behaviors in different situations. This increasing prevalence underscores the importance of ASD as a major public health concern and the need for comprehensive research initiatives to advance our understanding of the disorder and its early detection methods. This study introduces a novel hierarchical feature fusion method aimed at enhancing the early detection of ASD in children through the analysis of code-switched speech (English and Hindi). Employing advanced audio processing techniques, the research integrates acoustic, paralinguistic, and linguistic information using Transformer Encoders. This innovative fusion strategy is designed to improve classification robustness and accuracy, crucial for early and precise ASD identification. The methodology involves collecting a code-switched speech corpus, CoSAm, from children diagnosed with ASD and a matched control group. The dataset comprises 61 voice recordings from 30 children diagnosed with ASD and 31 from neurotypical children, aged between 3 and 13 years, resulting in a total of 159.75 minutes of voice recordings. The feature analysis focuses on MFCCs and extensive statistical attributes to capture speech pattern variability and complexity. The best model performance is achieved using a hierarchical fusion technique with an accuracy of 98.75% using a combination of acoustic and linguistic features first, followed by paralinguistic features in a hierarchical manner.
Published: 2024

10. ASGIR: Audio Spectrogram Transformer Guided Classification And Information Retrieval For Birds

Author: Chaudhuri, Yashwardhan, Mundra, Paridhi, Batra, Arnesh, Phukan, Orchid Chetia, and Buduru, Arun Balaji
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Recognition and interpretation of bird vocalizations are pivotal in ornithological research and ecological conservation efforts due to their significance in understanding avian behaviour, performing habitat assessment and judging ecological health. This paper presents an audio spectrogram-guided classification framework called ASGIR for improved bird sound recognition and information retrieval. Our work is accompanied by a simple-to-use, two-step information retrieval system that uses geographical location and bird sounds to localize and retrieve relevant bird information by scraping Wikipedia page information of recognized birds. ASGIR achieves strong performance on a random subset of 51 classes of bird sounds from European countries in the Xeno-Canto dataset, with a median of 100% on the F1, Precision and Sensitivity metrics. Our code is available at: https://github.com/MainSample1234/AS-GIR ., Comment: Accepted to INTERSPEECH'24
Published: 2024

11. VoxMed: One-Step Respiratory Disease Classifier using Digital Stethoscope Sounds

Author: Mundra, Paridhi, Sharma, Manik, Chaudhuri, Yashwardhan, Phukan, Orchid Chetia, and Buduru, Arun Balaji
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: As respiratory illnesses become more common, it is crucial to quickly and accurately detect them to improve patient care. There is a need for improved diagnostic methods for immediate medical assessments for optimal patient outcomes. This paper introduces VoxMed, a UI-assisted one-step classifier that uses digital stethoscope recordings to diagnose respiratory diseases. It employs an Audio Spectrogram Transformer (AST) for feature extraction and a 1-D CNN-based architecture to classify respiratory diseases, offering professionals information regarding their patients' respiratory health in seconds. We use the ICBHI dataset, which includes stethoscope recordings collected from patients in Greece and Portugal, to classify respiratory diseases. GitHub repository: https://github.com/Sample-User131001/VoxMed, Comment: Accepted to INTERSPEECH'24
Published: 2024

12. AVR: Synergizing Foundation Models for Audio-Visual Humor Detection

Author: Sharma, Sarthak, Phukan, Orchid Chetia, Singh, Drishti, Buduru, Arun Balaji, and Sharma, Rajesh
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: In this work, we present AVR, an application for audio-visual humor detection. While humor detection has traditionally centered around textual analysis, recent advancements have spotlighted multimodal approaches. However, these methods lean on textual cues as a modality, necessitating the use of ASR systems for transcribing the audio data. This heavy reliance on ASR accuracy can pose challenges in real-world applications. To address this bottleneck, we propose an innovative audio-visual humor detection system that circumvents textual reliance, eliminating the need for ASR models. Instead, the proposed approach hinges on the intricate interplay between audio and visual content for effective humor detection., Comment: Accepted to INTERSPEECH 2024 Show & Tell Demonstrations
Published: 2024

13. Towards Multilingual Audio-Visual Question Answering

Author: Phukan, Orchid Chetia, Mallick, Priyabrata, Behera, Swarup Ranjan, Narayani, Aalekhya Satya, Buduru, Arun Balaji, and Sharma, Rajesh
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, 68T45
Abstract: In this paper, we work towards extending Audio-Visual Question Answering (AVQA) to multilingual settings. Existing AVQA research has predominantly revolved around English, and replicating it for other languages requires a substantial allocation of resources. As a scalable solution, we leverage machine translation and present two multilingual AVQA datasets for eight languages created from existing benchmark AVQA datasets. This avoids the extra human annotation effort of collecting questions and answers manually. To this end, we propose the MERA framework, which leverages state-of-the-art (SOTA) video, audio, and textual foundation models for AVQA in multiple languages. We introduce a suite of models, namely MERA-L, MERA-C, and MERA-T, with varied model architectures to benchmark the proposed datasets. We believe our work will open new research directions and act as a reference benchmark for future works in multilingual AVQA., Comment: Accepted to Interspeech 2024
Published: 2024

14. The Reasonable Effectiveness of Speaker Embeddings for Violence Detection

Author: Jain, Sarthak, Phukan, Orchid Chetia, Buduru, Arun Balaji, and Sharma, Rajesh
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: In this paper, we focus on audio violence detection (AVD). AVD is necessary for several reasons, especially in the context of maintaining safety, preventing harm, and ensuring security in various environments. This calls for accurate AVD systems. As in many related applications in audio processing, the most common approach for improving performance is to leverage self-supervised (SSL) pre-trained models (PTMs). However, these SSL models are very large, with millions of parameters, which can hinder real-world deployment, especially in compute-constrained environments. To resolve this, we propose the use of speaker recognition models, which are much smaller than the SSL models. Through experiments with speaker recognition model embeddings and SVM and Random Forest classifiers, we show that speaker recognition model embeddings perform the best in comparison to state-of-the-art (SOTA) SSL models and achieve SOTA results., Comment: Accepted to INTERSPEECH 24 Show & Tell Demonstrations
Published: 2024

15. PERSONA: An Application for Emotion Recognition, Gender Recognition and Age Estimation

Author: Koshal, Devyani, Phukan, Orchid Chetia, Jain, Sarthak, Buduru, Arun Balaji, and Sharma, Rajesh
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Emotion Recognition (ER), Gender Recognition (GR), and Age Estimation (AE) constitute paralinguistic tasks that rely not on the spoken content but primarily on speech characteristics such as pitch and tone. While previous research has made significant strides in developing models for each task individually, there has been comparatively less emphasis on concurrently learning these tasks, despite their inherent interconnectedness. In this demonstration, we present PERSONA, an application for predicting ER, GR, and AE with a single model in the backend. Notably, through a comparative study, we show that representations from a speaker recognition pre-trained model (PTM) are better suited for such a multi-task learning format than those from the state-of-the-art (SOTA) self-supervised (SSL) PTM. Our methodology obviates the need for deploying separate models for each task and can potentially conserve resources and time during the training and deployment phases., Comment: Accepted to INTERSPEECH 2024 Show & Tell Demonstrations
Published: 2024

16. ComFeAT: Combination of Neural and Spectral Features for Improved Depression Detection

Author: Phukan, Orchid Chetia, Jain, Sarthak, Singh, Shubham, Singh, Muskaan, Buduru, Arun Balaji, and Sharma, Rajesh
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: In this work, we focus on the detection of depression through speech analysis. Previous research has widely explored features extracted from pre-trained models (PTMs) primarily trained for paralinguistic tasks. Although these features have led to sufficient advances in speech-based depression detection, their performance declines in real-world settings. To address this, we introduce ComFeAT, an application that employs a CNN model trained on a combination of features extracted from PTMs (a.k.a. neural features) and spectral features to enhance depression detection. Spectral features are robust to domain variations but do not perform as well as neural features. Surprisingly, combining them shows complementary behavior and improves over both neural and spectral features individually. The proposed method also improves over previous state-of-the-art (SOTA) works on the E-DAIC benchmark., Comment: Accepted to INTERSPEECH 2024 Show & Tell Demonstrations
Published: 2024

17. NeuRO: An Application for Code-Switched Autism Detection in Children

Author: Akhtar, Mohd Mujtaba, Girish, Phukan, Orchid Chetia, and Singh, Muskaan
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Code-switching is a common communication phenomenon where individuals alternate between two or more languages or linguistic styles within a single conversation. Autism Spectrum Disorder (ASD) is a developmental disorder posing challenges in social interaction, communication, and repetitive behaviors. Detecting ASD in code-switched scenarios presents unique challenges. In this paper, we address this problem by building an application, NeuRO, which aims to detect potential signs of autism in code-switched conversations, facilitating early intervention and support for individuals with ASD., Comment: Accepted to INTERSPEECH 24 Show & Tell Demonstrations
Published: 2024

18. CoLLAB: A Collaborative Approach for Multilingual Abuse Detection

Author: Phukan, Orchid Chetia, Chaurasia, Yashasvi, Buduru, Arun Balaji, and Sharma, Rajesh
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In this study, we investigate representations from paralingual Pre-Trained model (PTM) for Audio Abuse Detection (AAD), which has not been explored for AAD. Our results demonstrate their superiority compared to other PTM representations on the ADIMA benchmark. Furthermore, combining PTM representations enhances AAD performance. Despite these improvements, challenges with cross-lingual generalizability still remain, and certain languages require training in the same language. This demands individual models for different languages, leading to scalability, maintenance, and resource allocation issues and hindering the practical deployment of AAD systems in linguistically diverse real-world environments. To address this, we introduce CoLLAB, a novel framework that doesn't require training and allows seamless merging of models trained in different languages through weight-averaging. This results in a unified model with competitive AAD performance across multiple languages.
Published: 2024

19. Educating the 'Adivasis': Understanding Ekal Vidyalayas inside the Tea Gardens of Assam

Author: Shilpi Shikha Phukan
Abstract: Based on fieldwork in Tinsukia district of Assam, this article examines the interplay of Ekal, ideology and education among the Adivasis in the tea gardens. The Ekal schools are one-teacher, informal schools operated by the Sangh Parivar in the underprivileged regions of India and Nepal. Considering the backward status of Adivasis in these tea gardens, the official narrative of the school is to reduce the literacy gap among the community, impart "sanskar" and empower them with entrepreneurial skills. I argue that Ekal's organisational structure and funding are corporate, and its conception of 'empowerment' is neoliberal. This works as an effective model for Hindutva as the Sangh seeks to bring popularly Christian-dominated tea gardens into its Hindutva fold. This is done by practices both inside and outside the school.
Published: 2024

20. Peering into the Mind of Language Models: An Approach for Attribution in Contextual Question Answering

Author: Phukan, Anirudh, Somasundaram, Shwetha, Saxena, Apoorv, Goswami, Koustava, and Srinivasan, Balaji Vasan
Subjects: Computer Science - Computation and Language
Abstract: With the enhancement in the field of generative artificial intelligence (AI), contextual question answering has become extremely relevant. Attributing model generations to the input source document is essential to ensure trustworthiness and reliability. We observe that when large language models (LLMs) are used for contextual question answering, the output answer often consists of text copied verbatim from the input prompt which is linked together with "glue text" generated by the LLM. Motivated by this, we propose that LLMs have an inherent awareness from where the text was copied, likely captured in the hidden states of the LLM. We introduce a novel method for attribution in contextual question answering, leveraging the hidden state representations of LLMs. Our approach bypasses the need for extensive model retraining and retrieval model overhead, offering granular attributions and preserving the quality of generated answers. Our experimental results demonstrate that our method performs on par or better than GPT-4 at identifying verbatim copied segments in LLM generations and in attributing these segments to their source. Importantly, our method shows robust performance across various LLM architectures, highlighting its broad applicability. Additionally, we present Verifiability-granular, an attribution dataset which has token level annotations for LLM generations in the contextual question answering setup.
Published: 2024

21. SONIC: Synergizing VisiON Foundation Models for Stress RecogNItion from ECG signals

Author: Phukan, Orchid Chetia, Das, Ankita, Buduru, Arun Balaji, and Sharma, Rajesh
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: Stress recognition through physiological signals such as Electrocardiogram (ECG) signals has garnered significant attention. Traditionally, research in this field predominantly focused on utilizing handcrafted features or raw signals as inputs for learning algorithms. However, there is now a burgeoning interest within the community in leveraging large-scale vision foundation models (VFMs) like ResNet50, VGG19, and others. These VFMs are increasingly preferred due to their ability to capture complex features, enhancing the accuracy and effectiveness of stress recognition systems. However, little attention has been given to combining these VFMs. The combination of VFMs offers promising benefits by harnessing their collective knowledge to extract richer representations for improved stress recognition. To address this research gap, we focus on combining different VFMs for stress recognition from ECG and propose SONIC, a novel framework that combines VFMs through their logits and trains a fully connected network on the combined logits. Through extensive experimentation, SONIC showed the top performance against individual VFMs on the WESAD benchmark. With SONIC, we report state-of-the-art (SOTA) performance on WESAD with 99.36% and 99.24% (stress vs non-stress) and 97.66% and 97.10% (amusement vs stress vs baseline) in accuracy and F1 respectively.
Published: 2024

22. Heterogeneity over Homogeneity: Investigating Multilingual Speech Pre-Trained Models for Detecting Audio Deepfake

Author: Phukan, Orchid Chetia, Kashyap, Gautam Siddharth, Buduru, Arun Balaji, and Sharma, Rajesh
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In this work, we investigate multilingual speech Pre-Trained models (PTMs) for Audio deepfake detection (ADD). We hypothesize that multilingual PTMs trained on large-scale diverse multilingual data gain knowledge about diverse pitches, accents, and tones during their pre-training phase, making them more robust to variations. As a result, they will be more effective for detecting audio deepfakes. To validate our hypothesis, we extract representations from state-of-the-art (SOTA) PTMs including monolingual, multilingual as well as PTMs trained for speaker and emotion recognition, and evaluate them on ASVSpoof 2019 (ASV), In-the-Wild (ITW), and DECRO benchmark databases. We show that representations from multilingual PTMs, with simple downstream networks, attain the best performance for ADD compared to other PTM representations, which validates our hypothesis. We also explore the possibility of fusion of selected PTM representations for further improvements in ADD, and we propose a framework, MiO (Merge into One) for this purpose. With MiO, we achieve SOTA performance on ASV and ITW and comparable performance on DECRO with current SOTA works., Comment: Accepted to NAACL (Findings) 2024
Published: 2024

23. High levels of terrestrial gamma radiation exposure in the Kopili Fault Zone on the eastern wedge of the Shillong Plateau, India

Author: Gogoi, Pranjal Protim, Phukan, Sarat, and Barooah, Debajyoti
Published: 2024
Full Text: View/download PDF

24. Miscellaneous prospects of invasive Lantana camara biomass—a standpoint on bioenergy generation and value addition

Author: Chongloi, Vahshi, Phukan, Mayur Mausoom, and Bora, Plaban
Published: 2024
Full Text: View/download PDF

25. Bioenergy generation and value addition from processing plant-generated industrial tea waste: a thermochemical approach

Author: Haque, Mehseema, Bora, Plaban, Phukan, Mayur Mausoom, and Borah, Tapanjit
Published: 2024
Full Text: View/download PDF

26. Are Paralinguistic Representations all that is needed for Speech Emotion Recognition?

Author: Phukan, Orchid Chetia, Kashyap, Gautam Siddharth, Buduru, Arun Balaji, and Sharma, Rajesh
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: The availability of representations from pre-trained models (PTMs) has facilitated substantial progress in speech emotion recognition (SER). In particular, representations from PTMs trained for paralinguistic speech processing have shown state-of-the-art (SOTA) performance for SER. However, such paralinguistic PTM representations haven't been evaluated for SER in linguistic environments other than English. Also, paralinguistic PTM representations haven't been investigated in benchmarks such as SUPERB, EMO-SUPERB, and ML-SUPERB for SER. This makes it difficult to assess the efficacy of paralinguistic PTM representations for SER in multiple languages. To fill this gap, we perform a comprehensive comparative study of five SOTA PTM representations. Our results show that paralinguistic PTM (TRILLsson) representations perform the best, and this performance can be attributed to their capturing pitch, tone, and other speech characteristics more effectively than other PTM representations. Comment: Accepted to INTERSPEECH 24
Published: 2024

27. A Lightweight Feature Fusion Architecture For Resource-Constrained Crowd Counting

Author: Chaudhuri, Yashwardhan, Kumar, Ankit, Phukan, Orchid Chetia, and Buduru, Arun Balaji
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Crowd counting finds direct applications in real-world situations, making computational efficiency and performance crucial. However, most of the previous methods rely on a heavy backbone and a complex downstream architecture that restricts deployment. To address this challenge and enhance the versatility of crowd-counting models, we introduce two lightweight models. These models maintain the same downstream architecture while incorporating two distinct backbones: MobileNet and MobileViT. We leverage Adjacent Feature Fusion to extract diverse scale features from a Pre-Trained Model (PTM) and subsequently combine these features seamlessly. This approach empowers our models to achieve improved performance while maintaining a compact and efficient design. Compared with previously available state-of-the-art (SOTA) methods on the ShanghaiTech-A, ShanghaiTech-B, and UCF-CC-50 datasets, our proposed models achieve comparable results while being the most computationally efficient. Finally, we present a comparative study and an extensive ablation study, along with pruning, to show the effectiveness of our models.
Published: 2024

28. Physio-metabolic alterations in Labeo rohita (Hamilton, 1822) and native predator Chitala chitala (Hamilton, 1822) in presence of an invasive species Piractus brachypomus (G. Cuvier, 1818)

Author: Borah, Kankana, Phukan, Bipul, Talukdar, Avinash, Deka, Pankaj, Pokhrel, Hemanta, Kalita, Manoj, Kumar, Annam Pavan, Ali, Ayub, Bhuyan, Pradip Chandra, Patowary, Arnab Narayan, Kumar Sarma, Dipak, Ahmed, Mustafa, Kalita, Rinku, and Xavier, Martin
Published: 2024
Full Text: View/download PDF

29. A fractional order model for dynamics of HIV infection through various modes of transmission

Author: Phukan, Jyotiska and Dutta, Hemen
Published: 2024
Full Text: View/download PDF

30. Externally supplied ascorbic acid moderates detrimental effects of UV-C exposure in cyanobacteria

Author: Phukan, Tridip, Ryntathiang, Sukjailin, and Syiem, Mayashree B.
Published: 2024
Full Text: View/download PDF

31. Assessment of terrestrial gamma radiations and radiological risks in Makum Coalfield, India

Author: Paul, Susmita, Gogoi, Pranjal Protim, Phukan, Sarat, and Barooah, Debajyoti
Published: 2024
Full Text: View/download PDF

32. QuMIN: quantum multi-modal data fusion for humor detection

Author: Phukan, Arpan, Haq Khan, Anas Anwarul, and Ekbal, Asif
Published: 2024
Full Text: View/download PDF

33. A study on sheath structure in discharge and diffusion region of a double plasma device

Author: Mishra, Mrinal Kr., Phukan, Arindam, and Chakraborty, Monojit
Published: 2024
Full Text: View/download PDF

34. Quantification of 222Rn exhalation rates and effective 226Ra content from geological samples across the Kopili Fault Zone, India

Author: Gogoi, Pranjal Protim, Phukan, Sarat, and Barooah, Debajyoti
Published: 2024
Full Text: View/download PDF

35. Correlation of Serum Lymphocyte-Derived Biomarkers in Muscle Invasive and Non-Muscle Invasive Bladder Cancer: a Hospital Based Retrospective Study

Author: Singh, Avnish Kumar, Sarma, Debanga, Phukan, Mandeep, Bagchi, Pushkal Kumar, and Barua, Sasanka Kumar
Published: 2024
Full Text: View/download PDF

36. Reinforcement Learning-based Knowledge Graph Reasoning for Explainable Fact-checking

Author: Nikopensius, Gustav, Mayank, Mohit, Phukan, Orchid Chetia, and Sharma, Rajesh
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computers and Society
Abstract: Fact-checking is a crucial task as it helps prevent misinformation. However, manual fact-checking cannot keep up with the rate at which false information is generated and disseminated online. Automated fact-checking by machines is significantly quicker than by humans. But for better trust and transparency of these automated systems, explainability in the fact-checking process is necessary. Fact-checking often entails contrasting a factual assertion with a body of knowledge for such explanations. An effective way of representing knowledge is the Knowledge Graph (KG). Many works have been proposed on fact-checking with KGs, but not much focus has been given to the application of reinforcement learning (RL) in such cases. To mitigate this gap, we propose an RL-based KG reasoning approach for explainable fact-checking. Extensive experiments on the FB15K-237 and NELL-995 datasets reveal that reasoning over a KG is an effective way of producing human-readable explanations in the form of paths and classifications for fact claims. The RL reasoning agent computes a path that either proves or disproves a factual claim, but does not provide a verdict itself. A verdict is reached by a voting mechanism that utilizes paths produced by the agent. These paths can be presented to human readers so that they themselves can decide whether or not the provided evidence is convincing. This work will encourage further work in this direction on incorporating RL for explainable fact-checking, as it increases trustworthiness by providing a human-in-the-loop approach. Comment: Accepted to ASONAM 2023
Published: 2023

37. Seismic monitoring of 2020 Baghjan oil-well blowout incident in Assam, India

Author: Santanu Baruah, Shankho Niyogi, Abhijit Ghosh, Davide Piccinini, Gilberto Saccorotti, Alan L. Kafka, Danica Roth, Mahendra Kumar Yadava, Manoj K. Phukan, G. Narahari Sastry, Mohamed F. Abdelwahed, J. R. Kayal, Sausthov M. Bhattacharyya, Chandan Dey, Kimlina Gogoi, Timangshu Chetia, Prachurjya Borthakur, Sebastiano D’Amico, Nandita Dutta, and Sowrav Saikia
Subjects: Baghjan oil-blowout, Blowout quake, Rayleigh waves, Air-ground-coupled air waves (AGCA), Medicine, Science
Abstract: Characterization of a productive oil/gas well blowout through seismological methods is relatively uncommon. In this paper, we conduct an in-depth seismic evaluation of one of the world’s most significant onshore oil well blowout incidents, which occurred in 2020 at the Baghjan oil field in Assam, northeast India. We show that the blowout and related on-site activities generated distinct signals that can be distinguished by their spectral characteristics, temporal variation in geometric spreading, and sharp attenuation of daytime noise in comparison to the nighttime. A micro-earthquake potentially triggered by the blowout was also detected. Furthermore, we show how seismic data can be used to reasonably estimate blowout gas exit velocity and flame height. Our results demonstrate that a detailed characterization and spatiotemporal variation of blowout activity can be successfully captured through seismic monitoring, opening new opportunities for hazard mitigation and cost-effective disaster management for such catastrophic events.
Published: 2024
Full Text: View/download PDF

38. Indigenous antithymocyte globulin-equine to treat aplastic anaemia in adults: a case series from two centres in northeast India

Author: Asif Iqbal, Abhijit Phukan, and Chandana Sharma
Subjects: Aplastic anaemia, Immunosuppressive therapy, Antilymphocyte serum, Antithymocyte globulin, Cyclosporin a, Treatment outcome, Diseases of the blood and blood-forming organs, RC633-647.5
Abstract: Background: Immunosuppressive therapy is the standard management of adults with aplastic anaemia. Antithymocyte globulin is used as first-line treatment of patients not eligible for bone marrow transplantation. This being a rare disease, available evidence in India is scarce. This study aimed to present experience in treating adult aplastic anaemia patients by immunosuppressive therapy using antithymocyte globulin-equine (Thymogam) in two tertiary care centres of northeast India. Methods: This case series was conducted at the Health city hospital, Guwahati, and Excel Care Hospital, Guwahati from 2018 to 2020. Eighteen adult aplastic anaemia patients who were treated by immunosuppressive therapy with antithymocyte globulin-equine (Thymogam) and followed up for two years were included. Treatment response and relapse are described. Results: All 18 patients (14 severe, four very severe) were uniformly treated with immunosuppressive therapy (Thymogam 40 mg/kg/d for four days with oral Cyclosporine from Day-1). Cyclosporin A was used as a concomitant drug in 94.44 % of the patients. At two years of follow up, 66.7 % showed a response and the mortality rate was 11.1 %. Conclusion: The results of this case series substantiate the effectiveness of immunosuppressive therapy with a low-cost preparation of horse antithymocyte globulin (Thymogam) along with cyclosporin A in the management of aplastic anaemia patients not suitable for bone marrow transplantation.
Published: 2024
Full Text: View/download PDF

39. AN ENCRYPTION ALGORITHM EMPLOYING GRAPHS

Author: Bipanchy Buzarbarua, Parismita Phukan, Mridusmita Das, and Bikash Barman
Subjects: cryptography, decryption, encryption, star graph, Mathematics, QA1-939
Abstract: With the advancement of technology, maintaining secrecy is a crucial concern that requires a variety of skills. A scientific method for protecting communication against unauthenticated access is cryptography. In cryptography, there are several encryption techniques for data security. It has been suggested that new nonstandard encryption techniques are needed to shield communication from conventional threats. This work presents a method that uses graphs together with some algebraic features to provide some new encryption techniques for safe message transfer. The transmission of secret communications will be safer because of the suggested encryption techniques.
Published: 2024
Full Text: View/download PDF

40. Generation of entropy for MHD flow of Casson fluid past a vertical cone with Dufour effect

Author: Parismita Phukan, Hiren Deka, Puja Haloi, and Gopal Chandra Hazarika
Subjects: entropy, mhd, fdm, casson, dufour, Mechanical engineering and machinery, TJ1-1570
Abstract: The purpose of this study is to examine the entropy generation for a magnetohydrodynamic flow of a Casson fluid past a vertical cone. The impact of chemical reaction and diffusion-thermo effects is scrutinized. Physical aspects of radiative flux transverse to the surface are discussed. The governing non-linear PDEs and the expression for entropy generation are non-dimensionalized with the help of dimensionless quantities. A finite difference technique is implemented to obtain numerical and graphical results for the non-linear system. The Bejan number for the heat transfer is also examined. The results obtained show that entropy generation and the Bejan number are strongly influenced by the embedded flow parameters.
Published: 2024
Full Text: View/download PDF

41. Numerical Study of Convective Flow of Casson Fluid Through an Infinite Vertical Plate with Induced Magnetic Field

Author: Hiren Deka and Parismita Phukan
Subjects: mhd, casson, induced magnetic field, fdm, Physics, QC1-999
Abstract: The present objective is to numerically analyze the induced magnetic field (IMF) effect of an unsteady MHD flow of Casson fluid through two infinite vertical plates. The effect of radiative heat has been scrutinized. The governing non-dimensional PDEs of the flow are discretized by the finite difference method into an algebraic system of equations, which is then solved numerically subject to the boundary conditions. The effects of radiation, magnetic Prandtl number, Prandtl number, Hartmann number, and Casson parameter on the temperature profile, velocity profile, and induced magnetic field have been depicted through graphs. The radiative effect and Prandtl number have considerable influence on the surface drag force and also on the rate of heat transfer.
Published: 2024
Full Text: View/download PDF

42. Trauma lurking in the shadows: A Reddit case study of mental health issues in online posts about Childhood Sexual Abuse

Author: Phukan, Orchid Chetia, Sharma, Rajesh, and Buduru, Arun Balaji
Subjects: Computer Science - Computers and Society
Abstract: Childhood Sexual Abuse (CSA) is a menace to society and has long-lasting effects on the mental health of the survivors. From time to time CSA survivors are haunted by various mental health issues in their lifetime. Proper care and attention towards CSA survivors facing mental health issues can drastically improve their mental health conditions. Previous works leveraging online social media (OSM) data for understanding mental health issues haven't focused on mental health issues in individuals with CSA background. Our work fills this gap by studying Reddit posts related to CSA to understand their mental health issues. Mental health issues such as depression, anxiety, and Post-Traumatic Stress Disorder (PTSD) are most commonly observed in posts with CSA background. Observable differences exist between posts related to mental health issues with and without CSA background. Keeping this difference in mind, we develop a two-stage framework for identifying mental health issues in posts with CSA exposure. The first stage involves classifying posts with and without CSA background, and the second stage involves recognizing mental health issues in posts that are classified as belonging to CSA background. The top model in the first stage achieves an accuracy and F1-score (macro) of 96.26% and 96.24%, and in the second stage, the top model reports a Hamming score of 67.09%. Content Warning: Reader discretion is recommended as our study tackles topics such as child sexual abuse, molestation, etc.
Published: 2023

43. Roulette-Wheel Selection-Based PSO Algorithm for Solving the Vehicle Routing Problem with Time Windows

Author: Kashyap, Gautam Siddharth, Brownlee, Alexander E. I., Phukan, Orchid Chetia, Malik, Karan, and Wazir, Samar
Subjects: Computer Science - Neural and Evolutionary Computing
Abstract: The well-known Vehicle Routing Problem with Time Windows (VRPTW) aims to reduce the cost of moving goods between several destinations while accommodating constraints like set time windows for certain locations and vehicle capacity. Applications of the VRPTW in the real world include Supply Chain Management (SCM) and logistic dispatching, both of which are crucial to the economy and are expanding quickly as work habits change. Metaheuristic algorithms such as Particle Swarm Optimization (PSO) have been found to work effectively for solving the VRPTW; however, they can experience premature convergence. To lower the risk of PSO's premature convergence, the authors have solved VRPTW in this paper utilising a novel form of the PSO methodology that uses the Roulette Wheel Method (RWPSO). Computing experiments using the Solomon VRPTW benchmark datasets demonstrate that RWPSO is competitive with other state-of-the-art algorithms from the literature. Also, comparisons with two cutting-edge algorithms from the literature show how competitive the suggested algorithm is.
Published: 2023

44. Seismic monitoring of 2020 Baghjan oil-well blowout incident in Assam, India

Author: Baruah, Santanu, Niyogi, Shankho, Ghosh, Abhijit, Piccinini, Davide, Saccorotti, Gilberto, Kafka, Alan L., Roth, Danica, Yadava, Mahendra Kumar, Phukan, Manoj K., Sastry, G. Narahari, Abdelwahed, Mohamed F., Kayal, J. R., Bhattacharyya, Sausthov M., Dey, Chandan, Gogoi, Kimlina, Chetia, Timangshu, Borthakur, Prachurjya, D’Amico, Sebastiano, Dutta, Nandita, and Saikia, Sowrav
Published: 2024
Full Text: View/download PDF

45. Advancements in nanocomposite hydrogels: a comprehensive review of biomedical applications

Author: Baishya, Gargee, Parasar, Bandita, Limboo, Manisha, Kumar, Rupesh, Dutta, Anindita, Hussain, Anowar, Phukan, Mayur Mausoom, and Saikia, Devabrata
Published: 2024
Full Text: View/download PDF

46. Molecular identification and phylogenetic relationship of fishes belonging to the Family Danionidae from Brahmaputra Basin, Assam, Northeast India

Author: Barman, Manabjyoti, Bhushan, Shashi, Phukan, Bipul, Kumar, Annam Pavan, Jaiswar, Ashok Kumar, Talukdar, Avinash, Kalita, Rinku, and S., Silpa
Published: 2024
Full Text: View/download PDF

47. Geochemical and petrological studies of high sulfur coal and overburden from Makum coalfield (Northeast India) towards understanding and mitigation of acid mine drainage

Author: Mahanta, Angana, Sarmah, Debashis, Bhuyan, Nilotpol, Saikia, Monikankana, Phukan, Sarat, Subramanyam, K. S. V., Singh, Ajit, Saikia, Prasenjit, and Saikia, Binoy K.
Published: 2024
Full Text: View/download PDF

48. A Review on the Fate of Microplastics: Their Degradation and Advanced Analytical Characterization

Author: Bandaru, Shamili, Ravipati, Manaswini, Busi, Kumar Babu, Phukan, Plabana, Bag, Soumabha, Chandu, Basavaiah, Dalapati, Goutam Kumar, Biring, Sajal, and Chakrabortty, Sabyasachi
Published: 2024
Full Text: View/download PDF

49. Modelling and Optimization of “n–i–p” Structured CdS/MASnI3/CdTe Solar Cell with SCAPS-1D for Higher Efficiency

Author: Borah, Chandra Kamal, Borah, Lakhi Nath, Hazarika, Sudipta, and Phukan, Arindam
Published: 2024
Full Text: View/download PDF

50. A comparative analysis of basic and enhanced hole structures in photonic crystal fibers

Author: Talukdar, P. and Phukan, D.
Published: 2024
Full Text: View/download PDF
