Author: "Garg, Saurabh" - Searchworks@Jio Institute Digital Library Search Results

Your search for author "Garg, Saurabh" returned 1,046 results

1. The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models

Author: Jeong, Daniel P., Mani, Pranav, Garg, Saurabh, Lipton, Zachary C., and Oberst, Michael
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora. These works typically claim that such domain-adaptive pretraining (DAPT) improves performance on downstream medical tasks, such as answering medical licensing exam questions. In this paper, we compare ten public "medical" LLMs and two VLMs against their corresponding base models, arriving at a different conclusion: all medical VLMs and nearly all medical LLMs fail to consistently improve over their base models in the zero-/few-shot prompting and supervised fine-tuning regimes for medical question-answering (QA). For instance, across all tasks and model pairs we consider in the 3-shot setting, medical LLMs only outperform their base models in 22.7% of cases, reach a (statistical) tie in 36.8% of cases, and are significantly worse than their base models in the remaining 40.5% of cases. Our conclusions are based on (i) comparing each medical model head-to-head, directly against the corresponding base model; (ii) optimizing the prompts for each model separately in zero-/few-shot prompting; and (iii) accounting for statistical uncertainty in comparisons. While these basic practices are not consistently adopted in the literature, our ablations show that they substantially impact conclusions. Meanwhile, we find that after fine-tuning on specific QA tasks, medical LLMs can show performance improvements, but the benefits do not carry over to tasks based on clinical notes. Our findings suggest that state-of-the-art general-domain models may already exhibit strong medical knowledge and reasoning capabilities, and offer recommendations to strengthen the conclusions of future studies., Comment: Extended version of EMNLP 2024 paper arXiv:2411.04118. Includes additional results on clinical note QA tasks and supervised fine-tuning evaluations
Published: 2024

2. Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?

Author: Jeong, Daniel P., Garg, Saurabh, Lipton, Zachary C., and Oberst, Michael
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora. These works typically claim that such domain-adaptive pretraining (DAPT) improves performance on downstream medical tasks, such as answering medical licensing exam questions. In this paper, we compare seven public "medical" LLMs and two VLMs against their corresponding base models, arriving at a different conclusion: all medical VLMs and nearly all medical LLMs fail to consistently improve over their base models in the zero-/few-shot prompting regime for medical question-answering (QA) tasks. For instance, across the tasks and model pairs we consider in the 3-shot setting, medical LLMs only outperform their base models in 12.1% of cases, reach a (statistical) tie in 49.8% of cases, and are significantly worse than their base models in the remaining 38.2% of cases. Our conclusions are based on (i) comparing each medical model head-to-head, directly against the corresponding base model; (ii) optimizing the prompts for each model separately; and (iii) accounting for statistical uncertainty in comparisons. While these basic practices are not consistently adopted in the literature, our ablations show that they substantially impact conclusions. Our findings suggest that state-of-the-art general-domain models may already exhibit strong medical knowledge and reasoning capabilities, and offer recommendations to strengthen the conclusions of future studies., Comment: Accepted to EMNLP 2024 Main Conference as Long Paper (Oral)
Published: 2024

3. Pixtral 12B

Author: Agrawal, Pravesh, Antoniak, Szymon, Hanna, Emma Bou, Bout, Baptiste, Chaplot, Devendra, Chudnovsky, Jessica, Costa, Diogo, De Monicault, Baudouin, Garg, Saurabh, Gervet, Theophile, Ghosh, Soham, Héliou, Amélie, Jacob, Paul, Jiang, Albert Q., Khandelwal, Kartik, Lacroix, Timothée, Lample, Guillaume, Casas, Diego Las, Lavril, Thibaut, Scao, Teven Le, Lo, Andy, Marshall, William, Martin, Louis, Mensch, Arthur, Muddireddy, Pavankumar, Nemychnikova, Valera, Pellat, Marie, Von Platen, Patrick, Raghuraman, Nikhil, Rozière, Baptiste, Sablayrolles, Alexandre, Saulnier, Lucile, Sauvestre, Romain, Shang, Wendy, Soletskyi, Roman, Stewart, Lawrence, Stock, Pierre, Studnia, Joachim, Subramanian, Sandeep, Vaze, Sagar, Wang, Thomas, and Yang, Sophia
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: We introduce Pixtral-12B, a 12-billion-parameter multimodal language model. Pixtral-12B is trained to understand both natural images and documents, achieving leading performance on various multimodal benchmarks, surpassing a number of larger models. Unlike many open-source models, Pixtral is also a cutting-edge text model for its size, and does not compromise on natural language performance to excel in multimodal tasks. Pixtral uses a new vision encoder trained from scratch, which allows it to ingest images at their natural resolution and aspect ratio. This gives users flexibility on the number of tokens used to process an image. Pixtral is also able to process any number of images in its long context window of 128K tokens. Pixtral 12B substantially outperforms other open models of similar sizes (Llama-3.2 11B & Qwen-2-VL 7B). It also outperforms much larger open models like Llama-3.2 90B while being 7x smaller. We further contribute an open-source benchmark, MM-MT-Bench, for evaluating vision-language models in practical scenarios, and provide detailed analysis and code for standardized evaluation protocols for multimodal LLMs. Pixtral-12B is released under Apache 2.0 license.
Published: 2024

4. Wear your future: Paragon footwear's incredible journey into Indian Families

Author: Garg, Saurabh and Tyagi, Vikas Kumar
Published: 2020

5. RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

Author: Setlur, Amrith, Garg, Saurabh, Geng, Xinyang, Garg, Naman, Smith, Virginia, and Kumar, Aviral
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Training on model-generated synthetic data is a promising approach for finetuning LLMs, but it remains unclear when it helps or hurts. In this paper, we investigate this question for math reasoning via an empirical study, followed by building a conceptual understanding of our observations. First, we find that while the typical approach of finetuning a model on synthetic correct or positive problem-solution pairs generated by capable models offers modest performance gains, sampling more correct solutions from the finetuned learner itself followed by subsequent fine-tuning on this self-generated data doubles the efficiency of the same synthetic problems. At the same time, training on model-generated positives can amplify various spurious correlations, resulting in flat or even inverse scaling trends as the amount of data increases. Surprisingly, we find that several of these issues can be addressed if we also utilize negative responses, i.e., model-generated responses that are deemed incorrect by a final answer verifier. Crucially, these negatives must be constructed such that the training can appropriately recover the utility or advantage of each intermediate step in the negative response. With this per-step scheme, we are able to attain consistent gains over only positive data, attaining performance similar to amplifying the amount of synthetic data by 8×. We show that training on per-step negatives can help to unlearn spurious correlations in the positive data, and is equivalent to advantage-weighted reinforcement learning (RL), implying that it inherits robustness benefits of RL over imitating positive data alone.
Published: 2024

6. DataComp-LM: In search of the next generation of training sets for language models

Author: Li, Jeffrey, Fang, Alex, Smyrnis, Georgios, Ivgi, Maor, Jordan, Matt, Gadre, Samir, Bansal, Hritik, Guha, Etash, Keh, Sedrick, Arora, Kushal, Garg, Saurabh, Xin, Rui, Muennighoff, Niklas, Heckel, Reinhard, Mercat, Jean, Chen, Mayee, Gururangan, Suchin, Wortsman, Mitchell, Albalak, Alon, Bitton, Yonatan, Nezhurina, Marianna, Abbas, Amro, Hsieh, Cheng-Yu, Ghosh, Dhruba, Gardner, Josh, Kilian, Maciej, Zhang, Hanlin, Shao, Rulin, Pratt, Sarah, Sanyal, Sunny, Ilharco, Gabriel, Daras, Giannis, Marathe, Kalyani, Gokaslan, Aaron, Zhang, Jieyu, Chandu, Khyathi, Nguyen, Thao, Vasiljevic, Igor, Kakade, Sham, Song, Shuran, Sanghavi, Sujay, Faghri, Fartash, Oh, Sewoong, Zettlemoyer, Luke, Lo, Kyle, El-Nouby, Alaaeldin, Pouransari, Hadi, Toshev, Alexander, Wang, Stephanie, Groeneveld, Dirk, Soldaini, Luca, Koh, Pang Wei, Jitsev, Jenia, Kollar, Thomas, Dimakis, Alexandros G., Carmon, Yair, Dave, Achal, Schmidt, Ludwig, and Shankar, Vaishaal
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline, enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation., Comment: Project page: https://www.datacomp.ai/dclm/
Published: 2024

7. Post-Hoc Reversal: Are We Selecting Models Prematurely?

Author: Ranjan, Rishabh, Garg, Saurabh, Raman, Mrigank, Guestrin, Carlos, and Lipton, Zachary
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Trained models are often composed with post-hoc transforms such as temperature scaling (TS), ensembling and stochastic weight averaging (SWA) to improve performance, robustness, uncertainty estimation, etc. However, such transforms are typically applied only after the base models have already been finalized by standard means. In this paper, we challenge this practice with an extensive empirical study. In particular, we demonstrate a phenomenon that we call post-hoc reversal, where performance trends are reversed after applying post-hoc transforms. This phenomenon is especially prominent in high-noise settings. For example, while base models overfit badly early in training, both ensembling and SWA favor base models trained for more epochs. Post-hoc reversal can also prevent the appearance of double descent and mitigate mismatches between test loss and test error seen in base models. Preliminary analyses suggest that these transforms induce reversal by suppressing the influence of mislabeled examples, exploiting differences in their learning dynamics from those of clean examples. Based on our findings, we propose post-hoc selection, a simple technique whereby post-hoc metrics inform model development decisions such as early stopping, checkpointing, and broader hyperparameter choices. Our experiments span real-world vision, language, tabular and graph datasets. On an LLM instruction tuning dataset, post-hoc selection results in >1.5x MMLU improvement compared to naive selection., Comment: accepted at NeurIPS 2024; v2 adds an intuitions section
Published: 2024

8. Continuous monitoring of outdoor natural gamma absorbed dose rate in air: a long-term study in Kolkata, West Bengal, India

Author: Mitra, Pratip, Srivastava, Saurabh, Reddy, G. Priyanka, Garg, Saurabh, and Kumar, A. Vinod
Published: 2024

9. Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift

Author: Garg, Saurabh, Setlur, Amrith, Lipton, Zachary Chase, Balakrishnan, Sivaraman, Smith, Virginia, and Raghunathan, Aditi
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: Self-training and contrastive learning have emerged as leading techniques for incorporating unlabeled data, both under distribution shift (unsupervised domain adaptation) and when it is absent (semi-supervised learning). However, despite the popularity and compatibility of these techniques, their efficacy in combination remains unexplored. In this paper, we undertake a systematic empirical investigation of this combination, finding that (i) in domain adaptation settings, self-training and contrastive learning offer significant complementary gains; and (ii) in semi-supervised learning settings, surprisingly, the benefits are not synergistic. Across eight distribution shift datasets (e.g., BREEDs, WILDS), we demonstrate that the combined method obtains 3-8% higher accuracy than either approach independently. We then theoretically analyze these techniques in a simplified model of distribution shift, demonstrating scenarios under which the features produced by contrastive learning can yield a good initialization for self-training to further amplify gains and achieve optimal performance, even when either method alone would fail., Comment: NeurIPS 2023
Published: 2023

10. Effect of integrated nutrient management on growth and establishment of banana cv Rasthali

Author: Kumar, Vivek, Tanwar, Babu Singh, Jat, Hari Ram, and Garg, Saurabh
Published: 2016

11. Deep Learning for Plant Identification and Disease Classification from Leaf Images: Multi-prediction Approaches

Author: Yao, Jianping, Tran, Son N., Garg, Saurabh, and Sawyer, Samantha
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep learning plays an important role in modern agriculture, especially in plant pathology using leaf images where convolutional neural networks (CNN) are attracting a lot of attention. While numerous reviews have explored the applications of deep learning within this research domain, there remains a notable absence of an empirical study to offer insightful comparisons due to the employment of varied datasets in the evaluation. Furthermore, a majority of these approaches tend to address the problem as a singular prediction task, overlooking the multifaceted nature of predicting various aspects of plant species and disease types. Lastly, there is an evident need for a more profound consideration of the semantic relationships that underlie plant species and disease types. In this paper, we start our study by surveying current deep learning approaches for plant identification and disease classification. We categorise the approaches into multi-model, multi-label, multi-output, and multi-task, in which different backbone CNNs can be employed. Furthermore, based on the survey of existing approaches in plant pathology and the study of available approaches in machine learning, we propose a new model named Generalised Stacking Multi-output CNN (GSMo-CNN). To investigate the effectiveness of different backbone CNNs and learning approaches, we conduct an intensive experiment on three benchmark datasets Plant Village, Plant Leaves, and PlantDoc. The experimental results demonstrate that InceptionV3 can be a good choice for a backbone CNN as its performance is better than AlexNet, VGG16, ResNet101, EfficientNet, MobileNet, and a custom CNN developed by us. Interestingly, empirical results support the hypothesis that using a single model can be comparable or better than using two models. Finally, we show that the proposed GSMo-CNN achieves state-of-the-art performance on three benchmark datasets., Comment: Jianping and Son are joint first authors (equal contribution)
Published: 2023

12. TiC-CLIP: Continual Training of CLIP Models

Author: Garg, Saurabh, Farajtabar, Mehrdad, Pouransari, Hadi, Vemulapalli, Raviteja, Mehta, Sachin, Tuzel, Oncel, Shankar, Vaishaal, and Faghri, Fartash
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Keeping large foundation models up to date on latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large scale continual learning benchmarks or baselines. We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps. TiC-DataComp, our largest dataset, contains over 12.7B timestamped image-text pairs spanning 9 years (2014-2022). We first use our benchmarks to curate various dynamic evaluations to measure temporal robustness of existing models. We show OpenAI's CLIP (trained on data up to 2020) loses ≈8% zero-shot accuracy on our curated retrieval task from 2021-2022 compared with more recently trained models in OpenCLIP repository. We then study how to efficiently train models on time-continuous data. We demonstrate that a simple rehearsal-based approach that continues training from the last checkpoint and replays old data reduces compute by 2.5× when compared to the standard practice of retraining from scratch. Code is available at https://github.com/apple/ml-tic-clip., Comment: ICLR 2024
Published: 2023

13. Machine Learning for Leaf Disease Classification: Data, Techniques and Applications

Author: Yao, Jianping, Tran, Son N., Sawyer, Samantha, and Garg, Saurabh
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The growing demand for sustainable development brings a series of information technologies to help agriculture production. Especially, the emergence of machine learning applications, a branch of artificial intelligence, has shown multiple breakthroughs which can enhance and revolutionize plant pathology approaches. In recent years, machine learning has been adopted for leaf disease classification in both academic research and industrial applications. Therefore, it is enormously beneficial for researchers, engineers, managers, and entrepreneurs to have a comprehensive view about the recent development of machine learning technologies and applications for leaf disease detection. This study will provide a survey in different aspects of the topic including data, techniques, and applications. The paper will start with publicly available datasets. After that, we summarize common machine learning techniques, including traditional (shallow) learning, deep learning, and augmented learning. Finally, we discuss related applications. This paper would provide useful resources for future study and application of machine learning for smart agriculture in general and leaf disease classification in particular.
Published: 2023

14. Online Label Shift: Optimal Dynamic Regret meets Practical Algorithms

Author: Baby, Dheeraj, Garg, Saurabh, Yen, Tzu-Ching, Balakrishnan, Sivaraman, Lipton, Zachary Chase, and Wang, Yu-Xiang
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: This paper focuses on supervised and unsupervised online label shift, where the class marginals $Q(y)$ vary but the class-conditionals $Q(x|y)$ remain invariant. In the unsupervised setting, our goal is to adapt a learner, trained on some offline labeled data, to changing label distributions given unlabeled online data. In the supervised setting, we must both learn a classifier and adapt to the dynamically evolving class marginals given only labeled online data. We develop novel algorithms that reduce the adaptation problem to online regression and guarantee optimal dynamic regret without any prior knowledge of the extent of drift in the label distribution. Our solution is based on bootstrapping the estimates of online regression oracles that track the drifting proportions. Experiments across numerous simulated and real-world online label shift scenarios demonstrate the superior performance of our proposed approaches, often achieving 1-3% improvement in accuracy while being sample and computationally efficient. Code is publicly available at https://github.com/acmi-lab/OnlineLabelShift., Comment: First three authors contributed equally
Published: 2023

15. (Almost) Provable Error Bounds Under Distribution Shift via Disagreement Discrepancy

Author: Rosenfeld, Elan and Garg, Saurabh
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We derive an (almost) guaranteed upper bound on the error of deep neural networks under distribution shift using unlabeled test data. Prior methods either give bounds that are vacuous in practice or give estimates that are accurate on average but heavily underestimate error for a sizeable fraction of shifts. In particular, the latter only give guarantees based on complex continuous measures such as test calibration, which cannot be identified without labels, and are therefore unreliable. Instead, our bound requires a simple, intuitive condition which is well justified by prior empirical works and holds in practice effectively 100% of the time. The bound is inspired by $\mathcal{H}\Delta\mathcal{H}$-divergence but is easier to evaluate and substantially tighter, consistently providing non-vacuous guarantees. Estimating the bound requires optimizing one multiclass classifier to disagree with another, for which some prior works have used sub-optimal proxy losses; we devise a "disagreement loss" which is theoretically justified and performs better in practice. We expect this loss can serve as a drop-in replacement for future methods which require maximizing multiclass disagreement. Across a wide range of benchmarks, our method gives valid error bounds while achieving average accuracy comparable to competitive estimation baselines. Code is publicly available at https://github.com/erosenfeld/disagree_discrep.
Published: 2023

16. Fog Device-as-a-Service (FDaaS): A Framework for Service Deployment in Public Fog Environments

Author: Battula, Sudheer Kumar, Garg, Saurabh, Montgomery, James, and Naha, Ranesh
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Meeting the requirements of future services with time sensitivity and handling sudden load spikes of the services in Fog computing environments are challenging tasks due to the lack of publicly available Fog nodes and their characteristics. Researchers have assumed that the traditional autoscaling techniques, with lightweight virtualisation technology (containers), can be used to provide autoscaling features in Fog computing environments; few researchers have built the platform by exploiting the default autoscaling techniques of the containerisation orchestration tools or systems. However, the adoption of these techniques alone, in a publicly available Fog infrastructure, does not guarantee Quality of Service (QoS) due to the heterogeneity of Fog devices and their characteristics, such as frequent resource changes and high mobility. To tackle this challenge, in this work we developed a Fog as a Service (FaaS) framework that can create, configure and manage the containers which are running on the Fog devices to deploy services. This work presents the key techniques and algorithms which are responsible for handling sudden load spikes of the services to meet the QoS of the application. This work provides an evaluation by comparing it with existing techniques under real scenarios. The experiment results show that our proposed approach maximises the satisfied service requests by an average of 1.9 times in different scenarios., Comment: 10 Pages, 13 Figures
Published: 2023

17. RLSbench: Domain Adaptation Under Relaxed Label Shift

Author: Garg, Saurabh, Erickson, Nick, Sharpnack, James, Smola, Alex, Balakrishnan, Sivaraman, and Lipton, Zachary C.
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: Despite the emergence of principled methods for domain adaptation under label shift, their sensitivity to shifts in class conditional distributions is precariously underexplored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with label proportions shifts. While several papers modify these heuristics in attempts to handle label proportions shifts, inconsistencies in evaluation standards, datasets, and baselines make it difficult to gauge the current best practices. In this paper, we introduce RLSbench, a large-scale benchmark for relaxed label shift, consisting of >500 distribution shift pairs spanning vision, tabular, and language modalities, with varying label proportions. Unlike existing benchmarks, which primarily focus on shifts in class-conditional $p(x|y)$, our benchmark also focuses on label marginal shifts. First, we assess 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known. Next, we develop an effective two-step meta-algorithm that is compatible with most domain adaptation heuristics: (i) pseudo-balance the data at each epoch; and (ii) adjust the final classifier with target label distribution estimate. The meta-algorithm improves existing domain adaptation heuristics under large label proportion shifts, often by 2-10% accuracy points, while conferring minimal effect (<0.5%) when label proportions do not shift. We hope that these findings and the availability of RLSbench will encourage researchers to rigorously evaluate proposed methods in relaxed label shift settings. Code is publicly available at https://github.com/acmi-lab/RLSbench., Comment: Accepted at ICML 2023. Paper website: https://sites.google.com/view/rlsbench/
Published: 2023
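
Note on entry 17: step (ii) of the meta-algorithm in the abstract above, adjusting the final classifier with a target label distribution estimate, can be illustrated with the standard label-shift re-weighting of predicted probabilities. The following is a minimal sketch under that assumption, not the RLSbench implementation; the function name and the example priors are hypothetical.

```python
def adjust_for_label_shift(probs, source_prior, target_prior):
    """Re-weight source-trained class probabilities by target/source prior ratios.

    Standard label-shift correction: p_t(y|x) is proportional to
    p_s(y|x) * q_t(y) / p_s(y). All three inputs are lists indexed by class.
    """
    weighted = [p * (t / s) for p, s, t in zip(probs, source_prior, target_prior)]
    total = sum(weighted)
    # Renormalize so the adjusted probabilities sum to one.
    return [w / total for w in weighted]


# Hypothetical example: a balanced source, and a target dominated by class 1.
adjusted = adjust_for_label_shift([0.6, 0.4], [0.5, 0.5], [0.1, 0.9])
```

In this example the prediction flips toward class 1, because the estimated target prior makes class 1 far more likely than it was in the source data.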

18. CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets

Author: Novack, Zachary, McAuley, Julian, Lipton, Zachary C., and Garg, Saurabh
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Open vocabulary models (e.g. CLIP) have shown strong performance on zero-shot classification through their ability to generate embeddings for each class based on their (natural language) names. Prior work has focused on improving the accuracy of these models through prompt engineering or by incorporating a small amount of labeled downstream data (via finetuning). However, there has been little focus on improving the richness of the class names themselves, which can pose issues when class labels are coarsely-defined and are uninformative. We propose Classification with Hierarchical Label Sets (or CHiLS), an alternative strategy for zero-shot classification specifically designed for datasets with implicit semantic hierarchies. CHiLS proceeds in three steps: (i) for each class, produce a set of subclasses, using either existing label hierarchies or by querying GPT-3; (ii) perform the standard zero-shot CLIP procedure as though these subclasses were the labels of interest; (iii) map the predicted subclass back to its parent to produce the final prediction. Across numerous datasets with underlying hierarchical structure, CHiLS leads to improved accuracy in situations both with and without ground-truth hierarchical information. CHiLS is simple to implement within existing zero-shot pipelines and requires no additional training cost. Code is available at: https://github.com/acmi-lab/CHILS., Comment: Accepted at ICML 2023
Published: 2023
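
Note on entry 18: the three steps listed in the abstract above can be sketched in a few lines. This is an illustrative sketch only, not the authors' code. The label hierarchy and the subclass scores are hypothetical stand-ins, and the scores are assumed to come from some zero-shot model that is not called here.

```python
def chils_predict(subclass_scores, subclass_to_parent):
    """Pick the best-scoring subclass and return its parent class."""
    best_subclass = max(subclass_scores, key=subclass_scores.get)
    return subclass_to_parent[best_subclass]


# Step (i): a hypothetical label hierarchy mapping each subclass to its parent.
subclass_to_parent = {"beagle": "dog", "poodle": "dog", "siamese": "cat", "tabby": "cat"}

# Step (ii): hypothetical zero-shot scores over the subclasses for one image.
subclass_scores = {"beagle": 0.10, "poodle": 0.15, "siamese": 0.55, "tabby": 0.20}

# Step (iii): map the predicted subclass back to its parent class.
prediction = chils_predict(subclass_scores, subclass_to_parent)
```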

19. Rapid-Motion-Track: Markerless Tracking of Fast Human Motion with Deeper Learning

Author: Li, Renjie, Lao, Chun Yu, George, Rebecca St., Lawler, Katherine, Garg, Saurabh, Tran, Son N., Bai, Quan, and Alty, Jane
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Objective: The coordination of human movement directly reflects function of the central nervous system. Small deficits in movement are often the first sign of an underlying neurological problem. The objective of this research is to develop a new end-to-end, deep learning-based system, Rapid-Motion-Track (RMT) that can track the fastest human movement accurately when webcams or laptop cameras are used. Materials and Methods: We applied RMT to finger tapping, a well-validated test of motor control that is one of the most challenging human motions to track with computer vision due to the small keypoints of digits and the high velocities that are generated. We recorded 160 finger tapping assessments simultaneously with a standard 2D laptop camera (30 frames/sec) and a high-speed wearable sensor-based 3D motion tracking system (250 frames/sec). RMT and a range of DLC models were applied to the video data with tapping frequencies up to 8Hz to extract movement features. Results: The movement features (e.g. speed, rhythm, variance) identified with the new RMT system exhibited very high concurrent validity with the gold-standard measurements (97.3% of RMT measures were within +/-0.5Hz of the Optotrak measures), and outperformed DLC and other advanced computer vision tools (around 88.2% of DLC measures were within +/-0.5Hz of the Optotrak measures). RMT also accurately tracked a range of other rapid human movements such as foot tapping, head turning and sit-to-stand movements. Conclusion: With the ubiquity of video technology in smart devices, the RMT method holds potential to transform access and accuracy of human movement assessment.
Published: 2023

20. Disentangling the Mechanisms Behind Implicit Regularization in SGD

Author: Novack, Zachary, Kaur, Simran, Marwah, Tanya, Garg, Saurabh, and Lipton, Zachary C.
Subjects: Computer Science - Machine Learning
Abstract: A number of competing hypotheses have been proposed to explain why small-batch Stochastic Gradient Descent (SGD) leads to improved generalization over the full-batch regime, with recent work crediting the implicit regularization of various quantities throughout training. However, to date, empirical evidence assessing the explanatory power of these hypotheses is lacking. In this paper, we conduct an extensive empirical evaluation, focusing on the ability of various theorized mechanisms to close the small-to-large batch generalization gap. Additionally, we characterize how the quantities that SGD has been claimed to (implicitly) regularize change over the course of training. By using micro-batches, i.e. disjoint smaller subsets of each mini-batch, we empirically show that explicitly penalizing the gradient norm or the Fisher Information Matrix trace, averaged over micro-batches, in the large-batch regime recovers small-batch SGD generalization, whereas Jacobian-based regularizations fail to do so. This generalization performance is shown to often be correlated with how well the regularized model's gradient norms resemble those of small-batch SGD. We additionally show that this behavior breaks down as the micro-batch size approaches the batch size. Finally, we note that in this line of inquiry, positive experimental findings on CIFAR10 are often reversed on other datasets like CIFAR100, highlighting the need to test hypotheses on a wider collection of datasets., Comment: Accepted as Spotlight at the NeurIPS 2022 Workshop for Higher Order Optimization in Machine Learning
Published: 2022
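
The micro-batch penalty described in this abstract can be illustrated with a small sketch. This is not the authors' code: it assumes a linear model with mean-squared-error loss so the gradient has a closed form, it penalizes the squared gradient norm (the abstract only says "gradient norm"), and the names `microbatch_grad_norm_penalty`, `regularized_loss`, and `lam` are hypothetical.

```python
import numpy as np

def microbatch_grad_norm_penalty(w, X, y, micro_batch_size):
    """Average squared gradient norm over disjoint micro-batches of one mini-batch.

    Assumes a linear model X @ w with mean-squared-error loss (an illustrative choice).
    """
    sq_norms = []
    for start in range(0, X.shape[0], micro_batch_size):
        Xm = X[start:start + micro_batch_size]
        ym = y[start:start + micro_batch_size]
        # Gradient of mean((Xm @ w - ym)^2) with respect to w.
        grad = 2.0 * Xm.T @ (Xm @ w - ym) / len(ym)
        sq_norms.append(float(grad @ grad))
    return float(np.mean(sq_norms))

def regularized_loss(w, X, y, micro_batch_size, lam):
    """Mini-batch loss plus lam times the micro-batch gradient-norm penalty."""
    mse = float(np.mean((X @ w - y) ** 2))
    return mse + lam * microbatch_grad_norm_penalty(w, X, y, micro_batch_size)

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 1.0])
w = np.zeros(2)
penalty_small = microbatch_grad_norm_penalty(w, X, y, micro_batch_size=1)
penalty_full = microbatch_grad_norm_penalty(w, X, y, micro_batch_size=2)
```

In this toy case the penalty with micro-batches of size 1 is larger than with a single micro-batch equal to the whole mini-batch, because averaging gradients before taking the norm cancels per-example variation.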

21. Characterizing Datapoints via Second-Split Forgetting

Author: Maini, Pratyush, Garg, Saurabh, Lipton, Zachary C., and Kolter, J. Zico
Subjects: Computer Science - Machine Learning
Abstract: Researchers investigating example hardness have increasingly focused on the dynamics by which neural networks learn and forget examples throughout training. Popular metrics derived from these dynamics include (i) the epoch at which examples are first correctly classified; (ii) the number of times their predictions flip during training; and (iii) whether their prediction flips if they are held out. However, these metrics do not distinguish among examples that are hard for distinct reasons, such as membership in a rare subpopulation, being mislabeled, or belonging to a complex subpopulation. In this paper, we propose $second$-$split$ $forgetting$ $time$ (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten as the network is fine-tuned on a randomly held out partition of the data. Across multiple benchmark datasets and modalities, we demonstrate that $mislabeled$ examples are forgotten quickly, and seemingly $rare$ examples are forgotten comparatively slowly. By contrast, metrics only considering the first split learning dynamics struggle to differentiate the two. At large learning rates, SSFT tends to be robust across architectures, optimizers, and random seeds. From a practical standpoint, the SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes. Through theoretical analysis addressing overparameterized linear models, we provide insights into how the observed phenomena may arise. Code for reproducing our experiments can be found here: https://github.com/pratyushmaini/ssft, Comment: Accepted at NeurIPS 2022
Published: 2022
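
As a rough illustration of the metric in this abstract, the sketch below computes a second-split forgetting time from a record of per-epoch correctness. It is not taken from the linked repository; it assumes a boolean array `correct[epoch, example]` logged on first-split examples while fine-tuning on the second split, uses 0-based epoch indices, and returns infinity for examples that are never forgotten. The function name is hypothetical.

```python
import numpy as np

def second_split_forgetting_time(correct):
    """Earliest epoch from which an example is misclassified for the rest of training.

    correct: array of shape (n_epochs, n_examples); True where the example is
    classified correctly at that second-split epoch (assumed logging format).
    Returns np.inf for examples still correct at the final epoch.
    """
    correct = np.asarray(correct, dtype=bool)
    n_epochs, n_examples = correct.shape
    ssft = np.full(n_examples, np.inf)
    for i in range(n_examples):
        hits = np.flatnonzero(correct[:, i])
        if hits.size == 0:
            ssft[i] = 0  # wrong from the first second-split epoch onward
        elif hits[-1] < n_epochs - 1:
            ssft[i] = hits[-1] + 1  # first epoch of the final all-wrong streak
    return ssft

history = np.array([
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
])
times = second_split_forgetting_time(history)
```

Under the abstract's findings, small values would tend to flag mislabeled examples and larger values rare ones, though any cutoff between the two is a modeling choice left out of this sketch.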

22. Downstream Datasets Make Surprisingly Good Pretraining Corpora

Author: Krishna, Kundan, Garg, Saurabh, Bigham, Jeffrey P., and Lipton, Zachary C.
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: For most natural language processing tasks, the dominant practice is to finetune large pretrained transformer models (e.g., BERT) using smaller downstream datasets. Despite the success of this approach, it remains unclear to what extent these gains are attributable to the massive background corpora employed for pretraining versus to the pretraining objectives themselves. This paper introduces a large-scale study of self-pretraining, where the same (downstream) training data is used for both pretraining and finetuning. In experiments addressing both ELECTRA and RoBERTa models and 10 distinct downstream classification datasets, we observe that self-pretraining rivals standard pretraining on the BookWiki corpus (despite using around $10\times$--$500\times$ less data), outperforming the latter on $7$ and $5$ datasets, respectively. Surprisingly, these task-specific pretrained models often perform well on other tasks, including the GLUE benchmark. Besides classification tasks, self-pretraining also provides benefits on structured output prediction tasks such as span based question answering and commonsense inference, often providing more than $50\%$ of the performance boosts provided by pretraining on the BookWiki corpus. Our results hint that in many scenarios, performance gains attributable to pretraining are driven primarily by the pretraining objective itself and are not always attributable to the use of external pretraining data in massive amounts. These findings are especially relevant in light of concerns about intellectual property and offensive content in web-scale pretraining data., Comment: ACL2023 Camera Ready
Published: 2022

23. A Blockchain-based Decentralised and Dynamic Authorisation Scheme for the Internet of Things

Author: Hameed, Khizar, Raza, Ali, Garg, Saurabh, and Amin, Muhammad Bilal
Subjects: Computer Science - Cryptography and Security
Abstract: Authorisation has been recognised as an important security measure for preventing unauthorised access to critical resources, such as devices and data, within Internet of Things (IoT) networks. Existing authorisation methods for IoT networks are based on traditional access control models, which have several drawbacks, including architecture centralisation, policy tampering, access rights validation, malicious third-party policy assignment and control, and network-related overheads. The increasing trend of integrating Blockchain technology with IoT networks demonstrates its importance and potential to address the shortcomings of traditional IoT network authorisation mechanisms. This paper proposes a decentralised, secure, dynamic, and flexible authorisation scheme for IoT networks based on attribute-based access control (ABAC) fine-grained policies stored on a distributed immutable ledger. We design a Blockchain-based ABAC policy management framework divided into Attribute Management Authority (AMA) and Policy Management Authority (PMA) frameworks that use smart contract features to initialise, store, and manage attributes and policies on the Blockchain. To achieve flexibility and dynamicity in the authorisation process, we capture and utilise environment-related attributes in conjunction with the subject and object attributes of the ABAC model to define the policies. Furthermore, we design the Blockchain-based Access Management Framework (AMF) to manage user requests to access IoT devices while maintaining the privacy and auditability of user requests and assigned policies. We implemented a prototype of our proposed scheme and executed it on a local Ethereum Blockchain. Finally, we demonstrated the applicability and flexibility of our proposed scheme for an IoT-based smart home scenario, taking into account deployment, execution and financial costs.
Published: 2022

24. Domain Adaptation under Open Set Label Shift

Author: Garg, Saurabh, Balakrishnan, Sivaraman, and Lipton, Zachary C.
Subjects: Computer Science - Machine Learning
Abstract: We introduce the problem of domain adaptation under Open Set Label Shift (OSLS) where the label distribution can change arbitrarily and a new class may arrive during deployment, but the class-conditional distributions p(x|y) are domain-invariant. OSLS subsumes domain adaptation under label shift and Positive-Unlabeled (PU) learning. The learner's goals here are two-fold: (a) estimate the target label distribution, including the novel class; and (b) learn a target classifier. First, we establish necessary and sufficient conditions for identifying these quantities. Second, motivated by advances in label shift and PU learning, we propose practical methods for both tasks that leverage black-box predictors. Unlike typical Open Set Domain Adaptation (OSDA) problems, which tend to be ill-posed and amenable only to heuristics, OSLS offers a well-posed problem amenable to more principled machinery. Experiments across numerous semi-synthetic benchmarks on vision, language, and medical datasets demonstrate that our methods consistently outperform OSDA baselines, achieving 10--25% improvements in target domain accuracy. Finally, we analyze the proposed methods, establishing finite-sample convergence to the true label marginal and convergence to optimal classifier for linear models in a Gaussian setup. Code is available at https://github.com/acmi-lab/Open-Set-Label-Shift., Comment: Accepted at NeurIPS 2022
Published: 2022

25. Unsupervised Learning under Latent Label Shift

Author: Roberts, Manley, Mani, Pranav, Garg, Saurabh, and Lipton, Zachary C.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: What sorts of structure might enable a learner to discover classes from unlabeled data? Traditional approaches rely on feature-space similarity and heroic assumptions on the data. In this paper, we introduce unsupervised learning under Latent Label Shift (LLS), where we have access to unlabeled data from multiple domains such that the label marginals $p_d(y)$ can shift across domains but the class conditionals $p(\mathbf{x}|y)$ do not. This work instantiates a new principle for identifying classes: elements that shift together group together. For finite input spaces, we establish an isomorphism between LLS and topic modeling: inputs correspond to words, domains to documents, and labels to topics. Addressing continuous data, we prove that when each label's support contains a separable region, analogous to an anchor word, oracle access to $p(d|\mathbf{x})$ suffices to identify $p_d(y)$ and $p_d(y|\mathbf{x})$ up to permutation. Thus motivated, we introduce a practical algorithm that leverages domain-discriminative models as follows: (i) push examples through domain discriminator $p(d|\mathbf{x})$; (ii) discretize the data by clustering examples in $p(d|\mathbf{x})$ space; (iii) perform non-negative matrix factorization on the discrete data; (iv) combine the recovered $p(y|d)$ with the discriminator outputs $p(d|\mathbf{x})$ to compute $p_d(y|x) \; \forall d$. With semi-synthetic experiments, we show that our algorithm can leverage domain information to improve upon competitive unsupervised classification methods. We reveal a failure mode of standard unsupervised classification methods when feature-space similarity does not indicate true groupings, and show empirically that our method better handles this case. Our results establish a deep connection between distribution shift and topic modeling, opening promising lines for future work., Comment: NeurIPS 2022. Manley Roberts and Pranav Mani contributed equally to this work
Published: 2022

26. A Comprehensive Review on Deep Supervision: Theories and Applications

Author: Li, Renjie, Wang, Xinyi, Huang, Guan, Yang, Wenli, Zhang, Kaining, Gu, Xiaotong, Tran, Son N., Garg, Saurabh, Alty, Jane, and Bai, Quan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Deep supervision, also known as 'intermediate supervision' or 'auxiliary supervision', adds supervision at the hidden layers of a neural network. This technique has recently been increasingly applied in deep neural network learning systems for various computer vision applications. There is a consensus that deep supervision helps improve neural network performance by alleviating the vanishing gradient problem, which is one of its many strengths. Moreover, deep supervision can be applied in different ways in different computer vision applications. How to make the best use of deep supervision to improve network performance in different applications has not been thoroughly investigated. In this paper, we provide a comprehensive, in-depth review of deep supervision in both theory and applications. We propose a new classification of different deep supervision networks, and discuss advantages and limitations of current deep supervision networks in computer vision applications.
Published: 2022

27. Revealing Impact Factors on Student Engagement: Learning Analytics Adoption in Online and Blended Courses in Higher Education

Author: Fan, Si, Chen, Lihua, Nair, Manoj, Garg, Saurabh, Yeom, Soonja, Kregor, Gerry, Yang, Yu, and Wang, Yanjun
Abstract: This study aimed to identify factors influencing student engagement in online and blended courses at one Australian regional university. It applied a data science approach to learning and teaching data gathered from the learning management system used at this university. Data were collected and analysed from 23 subjects, spanning over 5500 student enrolments and 406 lecturer and tutor roles, over a five-year period. Based on a theoretical framework adapted from Community of Inquiry (CoI) framework by Garrison et al. (2000), the data were segregated into three groups for analysis: Student Engagement, Course Content and Teacher Input. The data analysis revealed a positive correlation between Student Engagement and Teacher Input, and interestingly, a negative correlation between Student Engagement and Course Content when a certain threshold was exceeded. The findings of the study offer useful suggestions for future course design, and pedagogical approaches teachers can adopt to foster student engagement.
Published: 2021

28. Deconstructing Distributions: A Pointwise Framework of Learning

Author: Kaplun, Gal, Ghosh, Nikhil, Garg, Saurabh, Barak, Boaz, and Nakkiran, Preetum
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: In machine learning, we traditionally evaluate the performance of a single model, averaged over a collection of test inputs. In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a $\textit{single input point}$. Specifically, we study a point's $\textit{profile}$: the relationship between models' average performance on the test distribution and their pointwise performance on this individual point. We find that profiles can yield new insights into the structure of both models and data -- in and out-of-distribution. For example, we empirically show that real data distributions consist of points with qualitatively different profiles. On one hand, there are "compatible" points with strong correlation between the pointwise and average performance. On the other hand, there are points with weak and even $\textit{negative}$ correlation: cases where improving overall model accuracy actually $\textit{hurts}$ performance on these inputs. We prove that these experimental observations are inconsistent with the predictions of several simplified models of learning proposed in prior work. As an application, we use profiles to construct a dataset we call CIFAR-10-NEG: a subset of CINIC-10 such that for standard models, accuracy on CIFAR-10-NEG is $\textit{negatively correlated}$ with accuracy on CIFAR-10 test. This illustrates, for the first time, an OOD dataset that completely inverts "accuracy-on-the-line" (Miller, Taori, Raghunathan, Sagawa, Koh, Shankar, Liang, Carmon, and Schmidt 2021), Comment: GK and NG contributed equally. v2: Added Figures 4, 5
Published: 2022
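
A point's profile, as described in this abstract, relates each model's average test accuracy to its correctness on one input. The sketch below summarizes that relationship with a Pearson correlation, which is an assumption made here for illustration (the abstract does not fix a summary statistic); the function name and the toy numbers are hypothetical.

```python
import numpy as np

def profile_correlation(avg_accuracy, point_correct):
    """Pearson correlation between models' average accuracy and their
    correctness (0 or 1) on a single input point.

    avg_accuracy: one test accuracy per model.
    point_correct: 1 if that model classifies the point correctly, else 0.
    Returns nan when either sequence is constant, since correlation is undefined.
    """
    avg_accuracy = np.asarray(avg_accuracy, dtype=float)
    point_correct = np.asarray(point_correct, dtype=float)
    if avg_accuracy.std() == 0 or point_correct.std() == 0:
        return float("nan")
    return float(np.corrcoef(avg_accuracy, point_correct)[0, 1])

# Four hypothetical models ordered from weakest to strongest.
accuracies = [0.5, 0.6, 0.7, 0.8]
compatible_point = profile_correlation(accuracies, [0, 0, 1, 1])
negative_point = profile_correlation(accuracies, [1, 1, 0, 0])
```

A positive value corresponds to what the abstract calls a "compatible" point, while a negative value marks a point on which stronger models do worse.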

29. Leveraging Unlabeled Data to Predict Out-of-Distribution Performance

Author: Garg, Saurabh, Balakrishnan, Sivaraman, Lipton, Zachary C., Neyshabur, Behnam, and Sedghi, Hanie
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold. ATC outperforms previous methods across several model architectures, types of distribution shifts (e.g., due to synthetic corruptions, dataset reproduction, or novel subpopulations), and datasets (Wilds, ImageNet, Breeds, CIFAR, and MNIST). In our experiments, ATC estimates target performance $2$-$4\times$ more accurately than prior methods. We also explore the theoretical foundations of the problem, proving that, in general, identifying the accuracy is just as hard as identifying the optimal predictor and thus, the efficacy of any method rests upon (perhaps unstated) assumptions on the nature of the shift. Finally, analyzing our method on some toy distributions, we provide insights concerning when it works. Code is available at https://github.com/saurabhgarg1996/ATC_code/., Comment: Accepted at ICLR 2022
Published: 2022
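
The thresholding idea behind ATC can be sketched in a few lines. The abstract does not say how the threshold is learned, so this sketch assumes it is chosen on labeled source data such that the fraction of source examples above the threshold matches the source accuracy; the function names and toy confidences are hypothetical, and this is not the code from the linked repository.

```python
import numpy as np

def fit_atc_threshold(source_conf, source_correct):
    """Choose a threshold t so that the share of source confidences above t
    roughly equals the source accuracy (assumed fitting rule)."""
    source_conf = np.asarray(source_conf, dtype=float)
    accuracy = float(np.mean(source_correct))
    # The (1 - accuracy) quantile leaves about an `accuracy` share of points above it.
    return float(np.quantile(source_conf, 1.0 - accuracy))

def predict_target_accuracy(target_conf, threshold):
    """Estimated accuracy: fraction of unlabeled target examples whose
    confidence exceeds the threshold."""
    return float(np.mean(np.asarray(target_conf, dtype=float) > threshold))

source_conf = [0.9, 0.8, 0.6, 0.4]
source_correct = [1, 1, 1, 0]
threshold = fit_atc_threshold(source_conf, source_correct)
estimate = predict_target_accuracy([0.95, 0.5, 0.7, 0.3], threshold)
```

Only `predict_target_accuracy` needs target data, and it uses no target labels, which is the setting the abstract describes.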

30. Parallel Multi-Scale Networks with Deep Supervision for Hand Keypoint Detection

Author: Li, Renjie, Tran, Son, Garg, Saurabh, Lawler, Katherine, Alty, Jane, and Bai, Quan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Keypoint detection plays an important role in a wide range of applications. However, predicting keypoints of small objects such as human hands is a challenging problem. Recent works fuse feature maps of deep Convolutional Neural Networks (CNNs), either via multi-level feature integration or multi-resolution aggregation. Despite achieving some success, the feature fusion approaches increase the complexity and the opacity of CNNs. To address this issue, we propose a novel CNN model named Multi-Scale Deep Supervision Network (P-MSDSNet) that learns feature maps at different scales with deep supervision to produce attention maps for adaptive feature propagation from layer to layer. P-MSDSNet has a multi-stage architecture, which makes it scalable, while its deep supervision with spatial attention improves the transparency of feature learning at each stage. We show that P-MSDSNet outperforms the state-of-the-art approaches on benchmark datasets while requiring fewer parameters. We also show the application of P-MSDSNet to quantify finger tapping hand movements in a neuroscience study.
Published: 2021

31. Mixture Proportion Estimation and PU Learning: A Modern Approach

Author: Garg, Saurabh, Wu, Yifan, Smola, Alex, Balakrishnan, Sivaraman, and Lipton, Zachary C.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Given only positive examples and unlabeled examples (from both positive and negative classes), we might hope nevertheless to estimate an accurate positive-versus-negative classifier. Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE) -- determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning -- given such an estimate, learning the desired positive-versus-negative classifier. Unfortunately, classical methods for both problems break down in high-dimensional settings. Meanwhile, recently proposed heuristics lack theoretical coherence and depend precariously on hyperparameter tuning. In this paper, we propose two simple techniques: Best Bin Estimation (BBE) (for MPE); and Conditional Value Ignoring Risk (CVIR), a simple objective for PU-learning. Both methods dominate previous approaches empirically, and for BBE, we establish formal guarantees that hold whenever we can train a model to cleanly separate out a small subset of positive examples. Our final algorithm (TED)$^n$, alternates between the two procedures, significantly improving both our mixture proportion estimator and classifier, Comment: Spotlight at NeurIPS 2021
Published: 2021

32. SDP: Scalable Real-time Dynamic Graph Partitioner

Author: Patwary, Md Anwarul Kaium, Garg, Saurabh, Battula, Sudheer Kumar, and Kang, Byeong
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Time-evolving large graphs have received attention due to their role in real-world applications such as social networks and PageRank calculation. It is necessary to partition a large-scale dynamic graph in a streaming manner to overcome the memory bottleneck while partitioning the computational load. Reducing network communication and balancing the load between the partitions are the criteria to achieve effective run-time performance in graph partitioning. Moreover, an optimal resource allocation is needed to utilise the resources while assigning the graph streams to the partitions. A number of existing partitioning algorithms (ADP, LogGP and LEOPARD) have been proposed to address the above problem. However, these partitioning methods are incapable of scaling the resources and handling the stream of data in real time. In this study, we propose a dynamic graph partitioning method called Scalable Dynamic Graph Partitioner (SDP) using the streaming partitioning technique. SDP contributes a novel vertex assignment method, a communication-aware balancing method, and a scaling technique to produce an efficient dynamic graph partitioner. Experimental results show that the proposed method achieves up to a 90% reduction in communication cost and a 60%-70% improvement in dynamic load balancing, compared with previous algorithms. Moreover, the proposed algorithm significantly reduces the execution time during partitioning.
Published: 2021

33. Investigation of background radiation levels and environmental radioactivity around Bharati Station, Larsemann Hills in east Antarctica-an overview

Author: Bakshi, A.K., Pal, Rupali, Romal, Jis, Sahoo, B.K., Garg, Saurabh, and Sapra, B.K.
Published: 2024

34. Towards a Formal Modelling, Analysis, and Verification of a Clone Node Attack Detection Scheme in the Internet of Things

Author: Hameed, Khizar, Garg, Saurabh, Amin, Muhammad Bilal, and Kang, Byeong
Subjects: Computer Science - Cryptography and Security
Abstract: In a clone node attack, an attacker attempts to physically capture devices to gather sensitive information and conduct various insider attacks. Several solutions for detecting clone node attacks on IoT networks have been presented in the literature. These solutions are focused on specific system designs, processes, and feature sets and act as a high-level abstraction of underlying system architectures based on a few performance requirements. However, critical features like formal analysis, modelling, and verification, which verify the correctness and robustness of systems in order to ensure that no problematic scenarios or anomalies exist, are frequently overlooked in existing proposed solutions. This paper presents a formal analysis, modelling, and verification of our existing proposed clone node attack detection scheme in IoT. Firstly, we modelled the architectural components of the proposed scheme using High-Level Petri Nets (HLPNs) and then mapped them using their specified functionalities. Secondly, we defined and analysed the behavioural properties of the proposed scheme using Z specification language. Furthermore, we used the Satisfiability Modulo Theories Library (SMT-Lib) and the Z3 Solver to validate and demonstrate the overall functionality of the proposed scheme. Finally, in addition to modelling and analysis, this work employs Coloured Petri Nets (CPNs), which combine Petri Nets with a high-level programming language, making them more suitable for large-scale system modelling. To perform the simulations in CPN, we used both timed and untimed models, where timed models are used to evaluate performance, and untimed models are used to validate logical correctness.
Published: 2021

35. A Context-Aware Information-Based Clone Node Attack Detection Scheme in Internet of Things

Author: Hameed, Khizar, Garg, Saurabh, Amin, Muhammad Bilal, Kang, Byeong, and Khan, Abid
Subjects: Computer Science - Cryptography and Security
Abstract: The rapidly expanding nature of the Internet of Things (IoT) networks is beginning to attract interest across a range of applications, including smart homes, smart transportation, smart health, and industrial contexts. This cutting-edge technology enables individuals to track and control their integrated environment in real-time and remotely via a thousand IoT devices comprised of sensors and actuators that actively participate in sensing, processing, storing, and sharing information. Nonetheless, IoT devices are frequently deployed in hostile environments, wherein adversaries attempt to capture and breach them in order to seize control of the entire network. One such example of potentially malicious behaviour is the cloning of IoT devices, in which an attacker can physically capture the devices, obtain some sensitive information, duplicate the devices, and intelligently deploy them in desired locations to conduct various insider attacks. A device cloning attack on IoT networks is a significant security concern since it allows for selective forwarding, sink-hole, and black-hole attacks. To address this issue, this paper provides an efficient scheme for detecting clone node attacks on IoT networks that makes use of semantic information about IoT devices known as context information sensed from the deployed environment to locate them securely. We design a location proof mechanism by combining location proofs and batch verification of the extended elliptic curve digital signature technique to accelerate the verification process at selected trusted nodes. We demonstrate the security of our scheme and its resilience to secure clone node attack detection by conducting a comprehensive security analysis. The performance of our proposed scheme provides a high degree of detection accuracy with minimal detection time and significantly reduces the computation, communication and storage overhead.
Published: 2021

36. Mouth2Audio: intelligible audio synthesis from videos with distinctive vowel articulation

Author: Garg, Saurabh, Ruan, Haoyao, Hamarneh, Ghassan, Behne, Dawn M., Jongman, Allard, Sereno, Joan, and Wang, Yue
Published: 2023

37. Research allocation in mobile volunteer computing system: Taxonomy, challenges and future work

Author: Ma, Peizhe, Garg, Saurabh, and Barika, Mutaz
Published: 2024

38. A Taxonomy Study on Securing Blockchain-based Industrial Applications: An Overview, Application Perspectives, Requirements, Attacks, Countermeasures, and Open Issues

Author: Hameed, Khizar, Barika, Mutaz, Garg, Saurabh, Amin, Muhammad Bilal, and Kang, Byeong
Subjects: Computer Science - Cryptography and Security
Abstract: Blockchain technology has taken on a leading role in today's industrial applications by providing salient features and showing significant performance since its beginning. Blockchain began its journey from the concept of cryptocurrency and is now part of a range of core applications to achieve resilience and automation between various tasks. With the integration of Blockchain technology into different industrial applications, many application designs, security and privacy challenges present themselves, posing serious threats to users and their data. Although several approaches have been proposed to address the specific security and privacy needs of targeted applications with functional parameters, there is still a need for a research study on the application, security and privacy challenges, and requirements of Blockchain-based industrial applications, along with possible security threats and countermeasures. This study presents a state-of-the-art survey of Blockchain-based Industry 4.0 applications, focusing on crucial application and security and privacy requirements, as well as corresponding attacks on Blockchain systems with potential countermeasures. We also analyse and provide the classification of different security and privacy techniques used in these applications to enhance the advancement of security features. Furthermore, we highlight some open issues in industrial applications that help to design secure Blockchain-based applications as future directions.
Published: 2021

39. RATT: Leveraging Unlabeled Data to Guarantee Generalization

Author: Garg, Saurabh, Balakrishnan, Sivaraman, Kolter, J. Zico, and Lipton, Zachary C.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: To assess generalization, machine learning scientists typically either (i) bound the generalization gap and then (after training) plug in the empirical risk to obtain a bound on the true risk; or (ii) validate empirically on holdout data. However, (i) typically yields vacuous guarantees for overparameterized models. Furthermore, (ii) shrinks the training set and its guarantee erodes with each re-use of the holdout set. In this paper, we introduce a method that leverages unlabeled data to produce generalization bounds. After augmenting our (labeled) training set with randomly labeled fresh examples, we train in the standard fashion. Whenever classifiers achieve low error on clean data and high error on noisy data, our bound provides a tight upper bound on the true risk. We prove that our bound is valid for 0-1 empirical risk minimization and with linear classifiers trained by gradient descent. Our approach is especially useful in conjunction with deep learning due to the early learning phenomenon whereby networks fit true labels before noisy labels but requires one intuitive assumption. Empirically, on canonical computer vision and NLP tasks, our bound provides non-vacuous generalization guarantees that track actual performance closely. This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable and provides theoretical insights into the relationship between random label noise and generalization., Comment: ICML 2021 (Long Talk)
Published: 2021

40. Applications of Artificial Intelligence to aid detection of dementia: a narrative review on current capabilities and future directions

Author: Li, Renjie, Wang, Xinyi, Lawler, Katherine, Garg, Saurabh, Bai, Quan, and Alty, Jane
Subjects: Computer Science - Artificial Intelligence
Abstract: With populations ageing, the number of people with dementia worldwide is expected to triple to 152 million by 2050. Seventy percent of cases are due to Alzheimer's disease (AD) pathology and there is a 10-20 year 'pre-clinical' period before significant cognitive decline occurs. We urgently need cost-effective, objective methods to detect AD, and other dementias, at an early stage. Risk factor modification could prevent 40% of cases and drug trials would have greater chances of success if participants were recruited at an earlier stage. Currently, detection of dementia is largely by pen-and-paper cognitive tests, but these are time-consuming and insensitive to pre-clinical phases. Specialist brain scans and body fluid biomarkers can detect the earliest stages of dementia but are too invasive or expensive for widespread use. With the advancement of technology, Artificial Intelligence (AI) shows promising results in assisting with detection of early-stage dementia. Existing AI-aided methods and potential future research directions are reviewed and discussed., Comment: 11 pages
Published: 2021

41. Fuzzy Logic-based Robust Failure Handling Mechanism for Fog Computing

Author: Naha, Ranesh Kumar, Garg, Saurabh, Amin, Muhammad Bilal, and Ranjan, Rajiv
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Fog computing is an emerging computing paradigm which is mainly suitable for time-sensitive and real-time Internet of Things (IoT) applications. Academia and industry are focusing on the exploration of various aspects of Fog computing for market adoption. The key idea of the Fog computing paradigm is to use idle computation resources of various handheld, mobile, stationary and network devices around us, to serve the application requests in the Fog-IoT environment. The devices in the Fog environment are autonomous and not exclusively dedicated to Fog application processing. As a result, the probability of device failure in the Fog environment is high compared with other distributed computing paradigms. Solving failure issues in Fog is crucial because successful application execution can only be ensured if failures are handled carefully. To handle failure, there are several techniques available in the literature, such as checkpointing and task migration, each of which works well in cloud-based enterprise applications that mostly deal with static or transactional data. These failure handling methods are not applicable to the highly dynamic Fog environment. In contrast, this work focuses on solving the problem of managing application failure in the Fog environment by proposing a composite solution (combining fuzzy logic-based task checkpointing and task migration techniques with task replication) for failure handling and generating a robust schedule. We evaluated the proposed methods using real failure traces in terms of application execution time, delay and cost. On average, delay and total processing time improved by 56% and 48% respectively for the proposed solution, compared with the existing failure handling approaches., Comment: 12 Pages,12 Figures
Published: 2021

42. Multiple Linear Regression-Based Energy-Aware Resource Allocation in the Fog Computing Environment

Author: Naha, Ranesh Kumar, Garg, Saurabh, Battula, Sudheer Kumar, Amin, Muhammad Bilal, and Georgakopoulos, Dimitrios
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Fog computing is a promising computing paradigm for time-sensitive Internet of Things (IoT) applications. It helps to process data close to the users, in order to deliver faster processing outcomes than the Cloud; it also helps to reduce network traffic. The computation environment in Fog computing is highly dynamic and most of the Fog devices are battery powered; hence the chances of application failure are high, which leads to delays in the application outcome. On the other hand, if we rerun the application on other devices after the failure, it will not comply with time-sensitiveness. To solve this problem, we need to run applications in an energy-efficient manner, which is a challenging task due to the dynamic nature of the Fog computing environment. Applications must be scheduled in such a way that they do not fail due to the unavailability of energy. In this paper, we propose a multiple linear regression-based resource allocation mechanism to run applications in an energy-aware manner in the Fog computing environment to minimise failures due to energy constraints. Prior works lack energy-aware application execution that considers the dynamism of the Fog environment. Hence, we propose a multiple linear regression-based approach which can achieve such objectives. We present a sustainable energy-aware framework and algorithm which execute applications in the Fog environment in an energy-aware manner. The trade-off between energy-efficient allocation and application execution time has been investigated and shown to have a minimum negative impact on the system for energy-aware allocation. We compared our proposed method with existing approaches. Our proposed approach reduces the delay and processing by 20% and 17%, respectively, compared with the existing one. Furthermore, SLA violations decrease by 57% for the proposed energy-aware allocation., Comment: 8 Pages, 9 Figures
Published: 2021
Full Text: View/download PDF

43. On Proximal Policy Optimization's Heavy-tailed Gradients

Author: Garg, Saurabh, Zhanson, Joshua, Parisotto, Emilio, Prasad, Adarsh, Kolter, J. Zico, Lipton, Zachary C., Balakrishnan, Sivaraman, Salakhutdinov, Ruslan, and Ravikumar, Pradeep
Subjects: Computer Science - Machine Learning, Computer Science - Robotics, Statistics - Machine Learning
Abstract: Modern policy gradient algorithms such as Proximal Policy Optimization (PPO) rely on an arsenal of heuristics, including loss clipping and gradient clipping, to ensure successful learning. These heuristics are reminiscent of techniques from robust statistics, commonly used for estimation in outlier-rich ("heavy-tailed") regimes. In this paper, we present a detailed empirical study to characterize the heavy-tailed nature of the gradients of the PPO surrogate reward function. We demonstrate that the gradients, especially for the actor network, exhibit pronounced heavy-tailedness and that it increases as the agent's policy diverges from the behavioral policy (i.e., as the agent goes further off policy). Further examination implicates the likelihood ratios and advantages in the surrogate reward as the main sources of the observed heavy-tailedness. We then highlight issues arising due to the heavy-tailed nature of the gradients. In this light, we study the effects of the standard PPO clipping heuristics, demonstrating that these tricks primarily serve to offset heavy-tailedness in gradients. Thus motivated, we propose incorporating GMOM, a high-dimensional robust estimator, into PPO as a substitute for three clipping tricks. Despite requiring less hyperparameter tuning, our method matches the performance of PPO (with all heuristics enabled) on a battery of MuJoCo continuous control tasks., Comment: ICML 2021
Published: 2021

44. BoCB: Performance Benchmarking by Analysing Impacts of Cloud Platforms on Consortium Blockchain

Author: Huang, Zhiqiang, Garg, Saurabh, Yang, Wenli, Lohachab, Ankur, Amin, Muhammad Bilal, Kang, Byeong-Ho, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Wu, Shiqing, editor, Yang, Wenli, editor, Amin, Muhammad Bilal, editor, Kang, Byeong-Ho, editor, and Xu, Guandong, editor
Published: 2023
Full Text: View/download PDF

45. Rapid-Motion-Track: Markerless tracking of fast human motion with deep learning

Author: Li, Renjie, Lau, Chun-yu, St George, Rebecca J., Lawler, Katherine, Garg, Saurabh, Tran, Son N., Bai, Quan, and Alty, Jane
Published: 2024
Full Text: View/download PDF

46. Parallel scale de-blur net for sharpening video images for remote clinical assessment of hand movements

Author: Li, Renjie, Huang, Guan, Wang, Xinyi, Chen, Yanyu, Tran, Son N., Garg, Saurabh, St George, Rebecca J., Lawler, Katherine, Alty, Jane, and Bai, Quan
Published: 2024
Full Text: View/download PDF

47. A Unified View of Label Shift Estimation

Author: Garg, Saurabh, Wu, Yifan, Balakrishnan, Sivaraman, and Lipton, Zachary C.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Under label shift, the label distribution p(y) might change but the class-conditional distributions p(x|y) do not. There are two dominant approaches for estimating the label marginal. BBSE, a moment-matching approach based on confusion matrices, is provably consistent and provides interpretable error bounds. However, a maximum likelihood estimation approach, which we call MLLS, dominates empirically. In this paper, we present a unified view of the two methods and the first theoretical characterization of MLLS. Our contributions include (i) consistency conditions for MLLS, which include calibration of the classifier and a confusion matrix invertibility condition that BBSE also requires; (ii) a unified framework, casting BBSE as roughly equivalent to MLLS for a particular choice of calibration method; and (iii) a decomposition of MLLS's finite-sample error into terms reflecting miscalibration and estimation error. Our analysis attributes BBSE's statistical inefficiency to a loss of information due to coarse calibration. Experiments on synthetic data, MNIST, and CIFAR10 support our findings., Comment: Accepted at NeurIPS 2020
Published: 2020

48. Authentication, Access Control, Privacy, Threats and Trust Management Towards Securing Fog Computing Environments: A Review

Author: Patwary, Abdullah Al-Noman, Fu, Anmin, Naha, Ranesh Kumar, Battula, Sudheer Kumar, Garg, Saurabh, Patwary, Md Anwarul Kaium, and Aghasian, Erfan
Subjects: Computer Science - Cryptography and Security, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Fog computing is an emerging computing paradigm that has come into consideration for the deployment of IoT applications amongst researchers and technology industries over the last few years. Fog is highly distributed and consists of a wide number of autonomous end devices, which contribute to the processing. However, the variety of devices offered across different users are not audited. Hence, the security of Fog devices is a major concern in the Fog computing environment. Furthermore, mitigating and preventing those security measures is a research issue. Therefore, to provide the necessary security for Fog devices, we need to understand what the security concerns are with regard to Fog. All aspects of Fog security that have not been covered by other literature need to be identified, and all issues in Fog security need to be aggregated. It needs to be noted that computation devices consist of many ordinary users, and are not managed by any central entity or managing body. Therefore, trust and privacy are also key challenges to gaining market adoption for Fog. To provide the required trust and privacy, we also need to focus on authentication, threats and access control mechanisms as well as techniques in Fog computing. In this paper, we perform a survey and propose a taxonomy, which presents an overview of existing security concerns in the context of the Fog computing paradigm. We discuss the Blockchain-based solutions towards a secure Fog computing environment and present various research challenges and directions for future research., Comment: 34 pages, 9 figures
Published: 2020

49. Adaptive Scheduling for Efficient Execution of Dynamic Stream Workflows

Author: Barika, Mutaz, Garg, Saurabh, and Ranjan, Rajiv
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: A stream workflow application, such as online anomaly detection or online traffic monitoring, integrates multiple streaming big data applications into a data analysis pipeline. Such an application can be highly dynamic in nature, where the data velocity may change at runtime and therefore the resources should be managed over time. To manage these changes, the orchestration of this application requires a dynamic execution environment and a dynamic scheduling technique. For the former requirement, a Multicloud environment is a viable solution to cope with the dynamic aspects of this workflow application. For the latter requirement, the dynamic scheduling technique not only needs to adhere to the end user's requirements in terms of data processing, deadline for decision making, and data stream source location constraints, but also must adjust the provisioning and scheduling plan at runtime to cope with dynamic variations of stream data rates. Therefore, we propose a two-phase adaptive scheduling technique to efficiently schedule dynamic workflow applications in a Multicloud environment that can respond to changes in the velocity of data at runtime. The experimental results showed that the proposed technique is close to the lower bound and effective for different experiment scenarios., Comment: 17 pages, 11 figures
Published: 2019

50. Scheduling Algorithms for Efficient Execution of Stream Workflow Applications in Multicloud Environments

Author: Barika, Mutaz, Garg, Saurabh, Chan, Andrew, and Calheiros, Rodrigo N.
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Big data processing applications are becoming more and more complex. They are no longer monolithic in nature but are instead composed of decoupled analytical processes in the form of a workflow. One type of such workflow application is the stream workflow application, which integrates multiple streaming big data applications to support decision making. Each analytical component of these applications runs continuously and processes data streams whose velocity will depend on several factors such as network bandwidth and the processing rate of the parent analytical component. As a consequence, the execution of these applications on cloud environments requires advanced scheduling techniques that adhere to the end user's requirements in terms of data processing and deadline for decision making. In this paper, we propose two Multicloud scheduling and resource allocation techniques for efficient execution of stream workflow applications on Multicloud environments while adhering to workflow application and user performance requirements and reducing execution cost. Results showed that the proposed genetic algorithm is adequate and effective for all experiments., Comment: 17 pages, 15 figures
Published: 2019
