365 results for "Sen, Koushik"
Search Results
2. A study of sixty-one massive over-contact binary candidates from the OGLE survey of the Magellanic Clouds
- Author
Menon, Athira, Pawlak, Michal, Lennon, Daniel J., Sen, Koushik, and Langer, Norbert
- Subjects
Astrophysics - Solar and Stellar Astrophysics, Astrophysics - Astrophysics of Galaxies
- Abstract
Massive contact binaries are fundamental test-beds to study properties of close binaries en route to stellar mergers. Progress in their understanding has been largely impeded due to a dearth in their observed counterparts. We accumulate 61 over-contact binary candidates from the OGLE survey of the Magellanic Clouds and form the largest known sample of such systems. We mine the OGLE-IV catalogue for B-type and earlier systems which match our synthetic light curves, and thereafter examine their properties in comparison with our binary populations. Simultaneously, we also investigate the degeneracy in the light curves between over-contact and near-contact binaries and attempt to distinguish these populations. We construct new synthetic populations of such systems from detailed $\texttt{MESA}$ models with initial parameters of M$_\textrm{total,i}=14-80\,\textrm{M}_{\odot}$ and P$_\textrm{i}=0.6-23\,$days, and compute analogous $\texttt{PHOEBE}$ light curves. We then extract OGLE systems that match our synthetic light curves and compare their numbers, and distributions in periods, absolute V-band magnitudes and colours (V-I) with our population predictions. We expect massive over-contact binaries to have smooth equal-depth light curves due to their equal-mass ($q\approx1$) and consequently, equal-temperature components. We mine 47 and 14 predominantly B-type over-contact OGLE binaries from the Large and Small Magellanic Clouds respectively, with periods of $\approx0.6-1\,$d, along with a few wider O-type systems. This sample is consistent with our theoretically expected numbers and colour-magnitude and period distributions. Additionally, we find that binaries nearing contact have periods as large as $4.5\,$d and mass ratios of $q\approx0.4-0.95$, some of which may masquerade as over-contact binaries and thereby lead to potential observational misclassifications., Comment: 14 pages, 11 figures, submitted to A\&A
- Published
- 2024
3. BACKRUNNER: Mitigating Smart Contract Attacks in the Real World
- Author
Shou, Chaofan, Ke, Yuanyu, Yang, Yupeng, Su, Qi, Dadosh, Or, Eli, Assaf, Benchimol, David, Lu, Doudou, Tong, Daniel, Chen, Dex, Tan, Zoey, Chia, Jacob, Sen, Koushik, and Lee, Wenke
- Subjects
Computer Science - Cryptography and Security
- Abstract
Billions of dollars have been lost due to vulnerabilities in smart contracts. To counteract this, researchers have proposed attack frontrunning protections designed to preempt malicious transactions by inserting "whitehat" transactions ahead of them to protect the assets. In this paper, we demonstrate that existing frontrunning protections have become ineffective in real-world scenarios. Specifically, we collected 158 recent real-world attack transactions and discovered that 141 of them can bypass state-of-the-art frontrunning protections. We systematically analyze these attacks and show how inherent limitations of existing frontrunning techniques hinder them from protecting valuable assets in the real world. We then propose a new approach involving 1) preemptive hijack, and 2) attack backrunning, which circumvent the existing limitations and can help protect assets before and after an attack. Our approach adapts the exploit used in the attack to the same or similar contracts before and after the attack to safeguard the assets. We conceptualize adapting exploits as a program repair problem and apply established techniques to implement our approach into a full-fledged framework, BACKRUNNER. Running on previous attacks in 2023, BACKRUNNER can successfully rescue more than $410M. In the real world, it has helped rescue over $11.2M worth of assets in 28 separate incidents within two months.
- Published
- 2024
4. Whispering in the dark: Faint X-ray emission from black holes with OB star companions
- Author
Sen, Koushik, Mellah, Ileyk El, Langer, Norbert, Xu, Xiao-Tian, Quast, Martin, and Pauli, Daniel
- Subjects
Astrophysics - High Energy Astrophysical Phenomena, Astrophysics - Solar and Stellar Astrophysics
- Abstract
Context. Recent astrometric and spectroscopic surveys of OB stars have revealed a few stellar-mass black holes (BHs) with orbital periods as low as 10 days. No X-ray counterpart has been detected, due to the absence of a radiatively efficient accretion disk around the BH. Yet, dissipative processes in the hot, dilute and strongly magnetized plasma around the BH (so-called BH corona) can still lead to non-thermal X-ray emission (e.g. synchrotron). Aims. We determine the X-ray luminosity distribution from BH+OB star binaries up to orbital periods of a few thousand days. Methods. We use detailed binary evolution models computed with MESA for initial primary masses of 10-90 $M_{\odot}$ and orbital periods from 1-3000 d. The X-ray luminosity is computed for a broad range of radiative efficiencies. Results. We show that particle acceleration through magnetic reconnection can heat the BH corona. A substantial fraction of the gravitational potential energy from the accreted plasma is converted into non-thermal X-ray emission. Our population synthesis analysis predicts at least 28 (up to 72) BH+OB star binaries in the Large Magellanic Cloud (LMC) to produce X-ray luminosity above 10$^{31}$ erg$\,$s$^{-1}$, observable through focused Chandra observations. We identify a population of SB1 systems in the LMC and HD96670 in the Milky Way comprising O stars with unseen companions of masses above 2.3 $M_{\odot}$ that aligns well with our predictions. The predicted luminosities of the OB companions to these X-ray-emitting BHs are 10$^{4.5-5.5}$ $L_{\odot}$. Conclusions. These results make the case for long-time exposure in X-rays of the stellar-mass BH candidates identified around OB stars. It will constrain the underlying population of X-ray-faint BHs, the evolution from single to double degenerate binaries, and the progenitors of gravitational wave mergers. (Abridged), Comment: A&A, 690, A256. DOI: https://doi.org/10.1051/0004-6361/202450940
- Published
- 2024
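An illustrative order-of-magnitude sketch of the mechanism in entry 4 (not the paper's detailed corona model): a wind-fed black hole accretes at roughly the Bondi-Hoyle-Lyttleton rate, and only a fraction of the liberated energy emerges as X-rays. Here the wind density $\rho_{\rm w}$ and BH-wind relative speed $v_{\rm rel}$ are assumed inputs, and $\eta$ is the uncertain radiative efficiency the authors scan over:

```latex
% Wind accretion rate onto the BH (Bondi-Hoyle-Lyttleton estimate)
\dot{M}_{\rm acc} \simeq \frac{4 \pi G^{2} M_{\rm BH}^{2}\, \rho_{\rm w}}{v_{\rm rel}^{3}},
\qquad
L_X = \eta\, \dot{M}_{\rm acc}\, c^{2}
% A small, uncertain \eta is what keeps L_X near the ~10^{31} erg/s
% threshold quoted in the abstract, rather than at classical HMXB levels.
```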
5. Analytic approximations for massive close post-mass transfer binary systems
- Author
Schürmann, Christoph, Langer, Norbert, Kramer, Joana A., Marchant, Pablo, Wang, Chen, and Sen, Koushik
- Subjects
Astrophysics - Solar and Stellar Astrophysics, Astrophysics - Astrophysics of Galaxies, Astrophysics - High Energy Astrophysical Phenomena
- Abstract
Massive binary evolution models are needed to predict massive star populations in star forming galaxies, the supernova diversity, and the number and properties of gravitational wave sources. Such models are often computed using so-called rapid binary evolution codes, which approximate the evolution of the binary components based on detailed single star models. However, about one third of the interacting massive binary stars undergo mass transfer during core hydrogen burning (Case A mass transfer), whose outcome is difficult to derive from single star models. Here, we use a large grid of detailed binary evolution models for primaries in the initial mass range 10 to 40 Solar masses of LMC and SMC composition, to derive analytic fits for the key quantities needed in rapid binary evolution codes, i.e., the duration of core hydrogen burning, and the resulting donor star mass. Systems with shorter orbital periods produce up to 50% lighter stripped donors and have an up to 30% longer lifetime than wider systems. We find that both quantities depend strongly on the initial binary orbital period, but that the initial mass ratio and the mass transfer efficiency of the binary have little impact on the outcome. Our results are easily parameterisable and can be used to capture the effects of Case A mass transfer more accurately in rapid binary evolution codes., Comment: 10 pages
- Published
- 2024
6. LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
- Author
Jain, Naman, Han, King, Gu, Alex, Li, Wen-Ding, Yan, Fanjia, Zhang, Tianjun, Wang, Sida, Solar-Lezama, Armando, Sen, Koushik, and Stoica, Ion
- Subjects
Computer Science - Software Engineering, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Large Language Models (LLMs) applied to code-related applications have emerged as a prominent field, attracting significant interest from both academia and industry. However, as new and improved LLMs are developed, existing evaluation benchmarks (e.g., HumanEval, MBPP) are no longer sufficient for assessing their capabilities. In this work, we propose LiveCodeBench, a comprehensive and contamination-free evaluation of LLMs for code, which continuously collects new problems over time from contests across three competition platforms, namely LeetCode, AtCoder, and CodeForces. Notably, our benchmark also focuses on a broader range of code-related capabilities, such as self-repair, code execution, and test output prediction, beyond just code generation. Currently, LiveCodeBench hosts four hundred high-quality coding problems that were published between May 2023 and May 2024. We have evaluated 18 base LLMs and 34 instruction-tuned LLMs on LiveCodeBench. We present empirical findings on contamination, holistic performance comparisons, potential overfitting in existing benchmarks as well as individual model comparisons. We will release all prompts and model completions for further community analysis, along with a general toolkit for adding new scenarios and models., Comment: Website - https://livecodebench.github.io/
- Published
- 2024
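The contamination-free design in entry 6 reduces to a simple rule: score a model only on problems released after its training cutoff. A minimal sketch of that filter (the problem records and cutoff date below are hypothetical; LiveCodeBench's real toolkit is at the linked website):

```python
from datetime import date

# Hypothetical problem records: (problem_id, platform, release_date).
problems = [
    ("two-sum-variant", "LeetCode",   date(2023, 6, 1)),
    ("abc-312-d",       "AtCoder",    date(2023, 7, 15)),
    ("cf-1832-e",       "CodeForces", date(2024, 2, 3)),
]

def contamination_free(problems, model_cutoff):
    """Keep only problems released strictly after the model's training cutoff,
    so neither the problem nor its editorial can be in the training data."""
    return [p for p in problems if p[2] > model_cutoff]

# A model with a September 2023 cutoff is evaluated only on later problems.
eval_set = contamination_free(problems, model_cutoff=date(2023, 9, 30))
print([pid for pid, _, _ in eval_set])  # ['cf-1832-e']
```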
7. The Counterfeit Conundrum: Can Code Language Models Grasp the Nuances of Their Incorrect Generations?
- Author
Gu, Alex, Li, Wen-Ding, Jain, Naman, Olausson, Theo X., Lee, Celine, Sen, Koushik, and Solar-Lezama, Armando
- Subjects
Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
While language models are increasingly more proficient at code generation, they still frequently generate incorrect programs. Many of these programs are obviously wrong, but others are more subtle and pass weaker correctness checks such as being able to compile. In this work, we focus on these counterfeit samples: programs sampled from a language model that 1) have a high enough log-probability to be generated at a moderate temperature and 2) pass weak correctness checks. Overall, we discover that most models have a very shallow understanding of counterfeits through three clear failure modes. First, models mistakenly classify them as correct. Second, models are worse at reasoning about the execution behaviour of counterfeits and often predict their execution results as if they were correct. Third, when asking models to fix counterfeits, the likelihood of a model successfully repairing a counterfeit is often even lower than that of sampling a correct program from scratch. Counterfeits also have very unexpected properties: first, counterfeit programs for problems that are easier for a model to solve are not necessarily easier to detect and only slightly easier to execute and repair. Second, counterfeits from a given model are just as confusing to the model itself as they are to other models. Finally, both strong and weak models are able to generate counterfeit samples that equally challenge all models. In light of our findings, we recommend that care and caution be taken when relying on models to understand their own samples, especially when no external feedback is incorporated., Comment: 54 pages, 25 figures
- Published
- 2024
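Entry 7's "counterfeit" notion can be made concrete: a sample that passes a weak check (here, compiling) but fails the real tests. A minimal sketch on a toy task (the paper's additional log-probability condition is omitted; the candidates and tests are invented for illustration):

```python
def compiles(src):
    """Weak correctness check: the sample merely has to compile."""
    try:
        compile(src, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def passes_tests(src, tests):
    """Strong check: define f and run it against real unit tests."""
    env = {}
    try:
        exec(src, env)
        return all(env["f"](x) == y for x, y in tests)
    except Exception:
        return False

tests = [(2, 4), (3, 9)]        # f should square its input
candidates = [
    "def f(x): return x * x",   # correct
    "def f(x): return x + x",   # counterfeit: compiles, even right for x=2
    "def f(x) return x * x",    # obviously wrong: fails to compile
]
counterfeits = [c for c in candidates
                if compiles(c) and not passes_tests(c, tests)]
print(len(counterfeits))        # 1
```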
8. LLM4Fuzz: Guided Fuzzing of Smart Contracts with Large Language Models
- Author
Shou, Chaofan, Liu, Jing, Lu, Doudou, and Sen, Koushik
- Subjects
Computer Science - Cryptography and Security, Computer Science - Software Engineering
- Abstract
As blockchain platforms grow exponentially, millions of lines of smart contract code are being deployed to manage extensive digital assets. However, vulnerabilities in this mission-critical code have led to significant exploitations and asset losses. Thorough automated security analysis of smart contracts is thus imperative. This paper introduces LLM4Fuzz to optimize automated smart contract security analysis by leveraging large language models (LLMs) to intelligently guide and prioritize fuzzing campaigns. While traditional fuzzing suffers from low efficiency in exploring the vast state space, LLM4Fuzz employs LLMs to direct fuzzers towards high-value code regions and input sequences more likely to trigger vulnerabilities. Additionally, LLM4Fuzz can leverage LLMs to guide fuzzers based on user-defined invariants, reducing blind exploration overhead. Evaluations of LLM4Fuzz on real-world DeFi projects show substantial gains in efficiency, coverage, and vulnerability detection compared to baseline fuzzing. LLM4Fuzz also uncovered five critical vulnerabilities that can lead to a loss of more than $247k.
- Published
- 2024
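A minimal sketch of the prioritization idea in entry 8: have an LLM rate how promising each contract function is, then let the fuzzer draw targets in proportion to those ratings. `llm_score` below is a hypothetical stand-in for an actual model call; the weighted scheduling is the point:

```python
import random

def llm_score(name):
    """Hypothetical LLM call: rate 0-10 how likely this function is to hold a
    vulnerability (asset flows, external calls, access control...)."""
    return {"withdraw": 9.0, "set_owner": 6.0, "get_name": 0.5}.get(name, 1.0)

def make_scheduler(functions):
    """Bias fuzzing energy towards high-value code regions."""
    weights = [llm_score(f) + 1e-6 for f in functions]
    return lambda: random.choices(functions, weights=weights, k=1)[0]

pick_target = make_scheduler(["withdraw", "set_owner", "get_name"])
plan = [pick_target() for _ in range(12)]  # mostly 'withdraw', rarely 'get_name'
print(plan)
```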
9. Zoomie: A Software-like Debugging Tool for FPGAs
- Author
Wei, Tianrui, Laeufer, Kevin, Lim, Katie, Zhao, Jerry, Sen, Koushik, Balkind, Jonathan, and Asanovic, Krste
- Subjects
Information and Computing Sciences, Software Engineering
- Published
- 2024
10. CodeScholar: Growing Idiomatic Code Examples
- Author
Shetty, Manish, Sen, Koushik, and Stoica, Ion
- Subjects
Computer Science - Software Engineering, Computer Science - Machine Learning, Computer Science - Programming Languages
- Abstract
Programmers often search for usage examples for API methods. A tool that could generate realistic, idiomatic, and contextual usage examples for one or more APIs would be immensely beneficial to developers. Such a tool would relieve the need for a deep understanding of the API landscape, augment existing documentation, and help discover interactions among APIs. We present CodeScholar, a tool that generates idiomatic code examples demonstrating the common usage of API methods. It includes a novel neural-guided search technique over graphs that grows the query APIs into idiomatic code examples. Our user study demonstrates that in 70% of cases, developers prefer CodeScholar generated examples over state-of-the-art large language models (LLM) like GPT3.5. We quantitatively evaluate 60 single and 25 multi-API queries from 6 popular Python libraries and show that across-the-board CodeScholar generates more realistic, diverse, and concise examples. In addition, we show that CodeScholar not only helps developers but also LLM-powered programming assistants generate correct code in a program synthesis setting.
- Published
- 2023
11. DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
- Author
Singhvi, Arnav, Shetty, Manish, Tan, Shangyin, Potts, Christopher, Sen, Koushik, Zaharia, Matei, and Khattab, Omar
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Programming Languages
- Abstract
Chaining language model (LM) calls as composable modules is fueling a new way of programming, but ensuring LMs adhere to important constraints requires heuristic "prompt engineering". We introduce LM Assertions, a programming construct for expressing computational constraints that LMs should satisfy. We integrate our constructs into the recent DSPy programming model for LMs, and present new strategies that allow DSPy to compile programs with LM Assertions into more reliable and accurate systems. We also propose strategies to use assertions at inference time for automatic self-refinement with LMs. We report on four diverse case studies for text generation and find that LM Assertions improve not only compliance with imposed rules but also downstream task performance, passing constraints up to 164% more often and generating up to 37% more higher-quality responses. Our reference implementation of LM Assertions is integrated into DSPy at https://github.com/stanfordnlp/dspy, Comment: Arnav*, Manish*, Shangyin* contributed equally to this work
- Published
- 2023
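The construct in entry 11 can be sketched framework-free: check a computational constraint on the LM output and, on violation, re-prompt with the broken rule (the self-refinement strategy the abstract mentions). `call_lm` is a hypothetical model call; DSPy's actual LM Assertions live in the linked repository:

```python
def call_lm(prompt):
    """Hypothetical LM call; returns a text completion for the prompt."""
    return "TL;DR: " + prompt[:80]

def assert_and_refine(prompt, check, rule, max_retries=2):
    """Generate, test the constraint, and on failure feed the violated rule
    back to the model for another attempt."""
    out = call_lm(prompt)
    for _ in range(max_retries):
        if check(out):
            return out
        out = call_lm(f"{prompt}\n\nYour previous answer violated a constraint: "
                      f"{rule}\nPrevious answer: {out}\nPlease rewrite it.")
    return out

summary = assert_and_refine(
    "Summarize: LM pipelines need computational constraints.",
    check=lambda s: len(s) <= 100,
    rule="the summary must be at most 100 characters",
)
print(summary)
```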
12. The Emerging Role of Natural Products in Cancer Treatment
- Author
Ghosh, Sumit, Das, Sanjib Kumar, Sinha, Krishnendu, Ghosh, Biswatosh, Sen, Koushik, Ghosh, Nabanita, and Sil, Parames C.
- Published
- 2024
13. Status of geo-scientific research at Wadia Institute of Himalayan Geology, Dehradun during 2020–2023
- Author
Sain, Kalachand, Hazarika, Devajit, Sen, Koushik, and Perumal, R. Jayangonda
- Published
- 2024
14. Multiversion Hindsight Logging for Continuous Training
- Author
Garcia, Rolando, Dandamudi, Anusha, Matute, Gabriel, Wan, Lehan, Gonzalez, Joseph, Hellerstein, Joseph M., and Sen, Koushik
- Subjects
Computer Science - Software Engineering, Computer Science - Databases
- Abstract
Production Machine Learning involves continuous training: hosting multiple versions of models over time, often with many model versions running at once. When model performance does not meet expectations, Machine Learning Engineers (MLEs) debug issues by exploring and analyzing numerous prior versions of code and training data to identify root causes and mitigate problems. Traditional debugging and logging tools often fall short in managing this experimental, multi-version context. FlorDB introduces Multiversion Hindsight Logging, which allows engineers to use the most recent version's logging statements to query past versions, even when older versions logged different data. Log statement propagation enables consistent injection of logging statements into past code versions, regardless of changes to the codebase. Once log statements are propagated across code versions, the remaining challenge in Multiversion Hindsight Logging is to efficiently replay the new log statements based on checkpoints from previous runs. Finally, a coherent user experience is required to help MLEs debug across all versions of code and data. To this end, FlorDB presents a unified relational model for efficient handling of historical queries, offering a comprehensive view of the log history to simplify the exploration of past code iterations. We present a performance evaluation on diverse benchmarks confirming its scalability and the ability to deliver real-time query responses, leveraging query-based filtering and checkpoint-based parallelism for efficient replay.
- Published
- 2023
15. ItyFuzz: Snapshot-Based Fuzzer for Smart Contract
- Author
Shou, Chaofan, Tan, Shangyin, and Sen, Koushik
- Subjects
Computer Science - Cryptography and Security, Computer Science - Software Engineering
- Abstract
Smart contracts are critical financial instruments, and their security is of utmost importance. However, smart contract programs are difficult to fuzz due to the persistent blockchain state behind all transactions. Mutating sequences of transactions is complex and often leads to a suboptimal exploration for both input and program spaces. In this paper, we introduce a novel snapshot-based fuzzer ItyFuzz for testing smart contracts. In ItyFuzz, instead of storing sequences of transactions and mutating from them, we snapshot states and singleton transactions. To explore interesting states, ItyFuzz introduces a dataflow waypoint mechanism to identify states with more potential momentum. ItyFuzz also incorporates comparison waypoints to prune the space of states. By maintaining snapshots of the states, ItyFuzz can synthesize concrete exploits like reentrancy attacks quickly. Because ItyFuzz has second-level response time to test a smart contract, it can be used for on-chain testing, which has many benefits compared to local development testing. Finally, we evaluate ItyFuzz on real-world smart contracts and some hacked on-chain DeFi projects. ItyFuzz outperforms existing fuzzers in terms of instructional coverage and can find and generate realistic exploits for on-chain projects quickly., Comment: ISSTA 2023
- Published
- 2023
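A toy rendering of entry 15's core move: keep snapshots of program states in the corpus and mutate single transactions from a restored snapshot, instead of replaying and mutating whole transaction sequences. The target below is a stand-in counter, not an EVM contract, and the novelty check is a crude version of the paper's waypoints:

```python
import copy
import random

class Target:
    """Toy stateful target; the 'bug' only triggers from a deep state."""
    def __init__(self):
        self.x = 0
    def apply(self, tx):
        if tx == "inc":
            self.x += 1
        elif tx == "double":
            self.x *= 2
        assert self.x != 12, "bug reached"

corpus = [Target()]       # snapshots of interesting states
seen = {0}

for _ in range(10_000):
    state = copy.deepcopy(random.choice(corpus))  # restore a snapshot...
    tx = random.choice(["inc", "double"])         # ...mutate ONE transaction
    try:
        state.apply(tx)
    except AssertionError as e:
        print("exploit state found:", e)
        break
    if state.x not in seen:                       # keep states with new behaviour
        seen.add(state.x)
        corpus.append(state)
```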
16. SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics
- Author
Ardakani, Arash, Haan, Altan, Tan, Shangyin, Popovici, Doru Thom, Cheung, Alvin, Iancu, Costin, and Sen, Koushik
- Subjects
Computer Science - Computation and Language
- Abstract
Transformer-based models, such as BERT and ViT, have achieved state-of-the-art results across different natural language processing (NLP) and computer vision (CV) tasks. However, these models are extremely memory intensive during their fine-tuning process, making them difficult to deploy on GPUs with limited memory resources. To address this issue, we introduce a new tool called SlimFit that reduces the memory requirements of these models by dynamically analyzing their training dynamics and freezing less-contributory layers during fine-tuning. The layers to freeze are chosen using a runtime inter-layer scheduling algorithm. SlimFit adopts quantization and pruning for particular layers to balance the load of dynamic activations and to minimize the memory footprint of static activations, where static activations refer to those that cannot be discarded regardless of freezing. This allows SlimFit to freeze up to 95% of layers and reduce the overall on-device GPU memory usage of transformer-based models such as ViT and BERT by an average of 2.2x, across different NLP and CV benchmarks/datasets such as GLUE, SQuAD 2.0, CIFAR-10, CIFAR-100 and ImageNet with an average degradation of 0.2% in accuracy. For such NLP and CV tasks, SlimFit can reduce up to 3.1x the total on-device memory usage with an accuracy degradation of only up to 0.4%. As a result, while fine-tuning of ViT on ImageNet and BERT on SQuAD 2.0 with a batch size of 128 requires 3 and 2 32GB GPUs respectively, SlimFit enables their fine-tuning on a single 32GB GPU without any significant accuracy degradation.
- Published
- 2023
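A rough sketch of the freezing idea in entry 16, under a proxy criterion: freeze the layers whose parameters moved least in the last update, so they stop producing gradients and optimizer state. The real SlimFit uses a runtime inter-layer scheduling algorithm plus quantization and pruning, all omitted here:

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer: a stack of layers.
model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(6)])
opt = torch.optim.SGD(model.parameters(), lr=0.1)
prev = {n: p.detach().clone() for n, p in model.named_parameters()}

def freeze_least_active(model, prev, frac=0.5):
    """Freeze the fraction of layers whose weights changed least last step."""
    moved = []
    for i, layer in enumerate(model):
        delta = sum((p - prev[f"{i}.{n}"]).abs().sum().item()
                    for n, p in layer.named_parameters())
        moved.append((delta, layer))
    moved.sort(key=lambda t: t[0])
    for _, layer in moved[: int(len(moved) * frac)]:
        for p in layer.parameters():
            p.requires_grad_(False)  # no grads, no optimizer state, less memory

# One fine-tuning step, then freeze the least-contributory layers.
x, y = torch.randn(8, 16), torch.randn(8, 16)
nn.functional.mse_loss(model(x), y).backward()
opt.step()
freeze_least_active(model, prev)
print(sum(not p.requires_grad for p in model.parameters()), "tensors frozen")
```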
17. Response of Cellular Stress Toward the Hormetic Phytochemicals in Brain Aging
- Author
Saha, Aloke, Sanyal, Tanmay, Mukherjee, Pronoy, Sen, Koushik, Madhu, Nithar Ranjan, Pathak, Surajit (ed.), and Banerjee, Antara (ed.)
- Published
- 2024
18. Flavonoids: A Promising Neuroprotectant and Its Salutary Effects on Age-Related Neurodegenerative Disorders
- Author
Das, Sanjib Kumar, Sen, Koushik, Sanyal, Tanmay, Saha, Aloke, Madhu, Nithar Ranjan, Pathak, Surajit (ed.), and Banerjee, Antara (ed.)
- Published
- 2024
19. Benchmarking Language Models for Code Syntax Understanding
- Author
Shen, Da, Chen, Xinyun, Wang, Chenguang, Sen, Koushik, and Song, Dawn
- Subjects
Computer Science - Computation and Language
- Abstract
Pre-trained language models have demonstrated impressive performance in both natural language processing and program understanding, which represent the input as a token sequence without explicitly modeling its structure. Some prior works show that pre-trained language models can capture the syntactic rules of natural languages without finetuning on syntax understanding tasks. However, there is limited understanding of how well pre-trained models understand the code structure so far. In this work, we perform the first thorough benchmarking of the state-of-the-art pre-trained models for identifying the syntactic structures of programs. Specifically, we introduce CodeSyntax, a large-scale dataset of programs annotated with the syntactic relationships in their corresponding abstract syntax trees. Our key observation is that existing language models pretrained on code still lack the understanding of code syntax. In fact, these pre-trained programming language models fail to match the performance of simple baselines based on positional offsets and keywords. We also present a natural language benchmark to highlight the differences between natural languages and programming languages in terms of syntactic structure understanding. Our findings point out key limitations of existing pre-training methods for programming languages, and suggest the importance of modeling code syntactic structures., Comment: Findings of EMNLP 2022
- Published
- 2022
20. PALT: Parameter-Lite Transfer of Language Models for Knowledge Graph Completion
- Author
Shen, Jianhao, Wang, Chenguang, Yuan, Ye, Han, Jiawei, Ji, Heng, Sen, Koushik, Zhang, Ming, and Song, Dawn
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
This paper presents a parameter-lite transfer learning approach of pretrained language models (LM) for knowledge graph (KG) completion. Instead of finetuning, which modifies all LM parameters, we only tune a few new parameters while keeping the original LM parameters fixed. We establish this via reformulating KG completion as a "fill-in-the-blank" task, and introducing a parameter-lite encoder on top of the original LMs. We show that, by tuning far fewer parameters than finetuning, LMs transfer non-trivially to most tasks and reach competitiveness with prior state-of-the-art approaches. For instance, we outperform the fully finetuning approaches on a KG completion benchmark by tuning only 1% of the parameters. The code and datasets are available at \url{https://github.com/yuanyehome/PALT}., Comment: Findings of EMNLP 2022
- Published
- 2022
21. The Nature of Unseen Companions in Massive Single-Line Spectroscopic Binaries
- Author
Sana, Hugues, Abdul-Masih, Michael, Banyard, Gareth, Bodensteiner, Julia, Bowman, Dominic M., Dsilva, Karan, Eldridge, C., Fabry, Matthias, Frost, Abigail J., Hawcroft, Calum, Janssens, Soetkin, Mahy, Laurent, Marchant, Pablo, Langer, Norbert, Van Reeth, Timothy, Sen, Koushik, and Shenar, Tomer
- Subjects
Astrophysics - Solar and Stellar Astrophysics, Astrophysics - Astrophysics of Galaxies, Astrophysics - High Energy Astrophysical Phenomena
- Abstract
Massive stars are predominantly found in binaries and higher order multiples. While the period and eccentricity distributions of OB stars are now well established across different metallicity regimes, the determination of mass-ratios has been mostly limited to double-lined spectroscopic binaries. As a consequence, the mass-ratio distribution remains subject to significant uncertainties. Open questions include the shape and extent of the companion mass-function towards its low-mass end and the nature of undetected companions in single-lined spectroscopic binaries. In this contribution, we present the results of a large and systematic analysis of a sample of over 80 single-lined O-type spectroscopic binaries (SB1s) in the Milky Way and in the Large Magellanic Cloud (LMC). We report on the developed methodology, the constraints obtained on the nature of SB1 companions, the distribution of O star mass-ratios at LMC metallicity and the occurrence of quiescent OB+black hole binaries., Comment: To appear in the proceedings of IAUS361: Massive stars near and far; 6 pages, 1 figure
- Published
- 2022
22. LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity
- Author
Gubri, Martin, Cordy, Maxime, Papadakis, Mike, Traon, Yves Le, and Sen, Koushik
- Subjects
Computer Science - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
- Abstract
We propose transferability from Large Geometric Vicinity (LGV), a new technique to increase the transferability of black-box adversarial attacks. LGV starts from a pretrained surrogate model and collects multiple weight sets from a few additional training epochs with a constant and high learning rate. LGV exploits two geometric properties that we relate to transferability. First, models that belong to a wider weight optimum are better surrogates. Second, we identify a subspace able to generate an effective surrogate ensemble among this wider optimum. Through extensive experiments, we show that LGV alone outperforms all (combinations of) four established test-time transformations by 1.8 to 59.9 percentage points. Our findings shed new light on the importance of the geometry of the weight space to explain the transferability of adversarial examples., Comment: Accepted at ECCV 2022
- Published
- 2022
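Entry 22's recipe is mechanical enough to sketch: resume training from the pretrained surrogate for a few extra epochs at a constant, deliberately high learning rate, snapshot the weights along the way, and attack an ensemble of the snapshots. A toy version (the paper works at ImageNet scale; everything below is illustrative):

```python
import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))  # "pretrained"
opt = torch.optim.SGD(model.parameters(), lr=0.05)   # constant, high LR
X, y = torch.randn(256, 10), torch.randint(0, 2, (256,))

snapshots = []
for epoch in range(5):                               # a few ADDITIONAL epochs
    for i in range(0, len(X), 32):
        loss = nn.functional.cross_entropy(model(X[i:i+32]), y[i:i+32])
        opt.zero_grad(); loss.backward(); opt.step()
    snapshots.append(copy.deepcopy(model).eval())    # one weight set per epoch

def surrogate_logits(x):
    """Attack gradients are taken through the snapshot ensemble, which samples
    a wide, flat region of the loss surface -- the geometric property LGV
    ties to transferability."""
    return torch.stack([m(x) for m in snapshots]).mean(dim=0)

print(surrogate_logits(torch.randn(1, 10)))
```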
23. An X-ray quiet black hole born with a negligible kick in a massive binary within the Large Magellanic Cloud
- Author
Shenar, Tomer, Sana, Hugues, Mahy, Laurent, El-Badry, Kareem, Marchant, Pablo, Langer, Norbert, Hawcroft, Calum, Fabry, Matthias, Sen, Koushik, Almeida, Leonardo A., Abdul-Masih, Michael, Bodensteiner, Julia, Crowther, Paul A., Gieles, Mark, Gromadzki, Mariusz, Henault-Brunet, Vincent, Herrero, Artemio, de Koter, Alex, Iwanek, Patryk, Kozłowski, Szymon, Lennon, Daniel J., Apellaniz, Jesus Maíz, Mroz, Przemysław, Moffat, Anthony F. J., Picco, Annachiara, Pietrukowicz, Paweł, Poleski, Radosław, Rybicki, Krzysztof, Schneider, Fabian R. N., Skowron, Dorota M., Skowron, Jan, Soszynski, Igor, Szymanski, Michał K., Toonen, Silvia, Udalski, Andrzej, Ulaczyk, Krzysztof, Vink, Jorick S., and Wrona, Marcin
- Subjects
Astrophysics - High Energy Astrophysical Phenomena, Astrophysics - Astrophysics of Galaxies, Astrophysics - Solar and Stellar Astrophysics
- Abstract
Stellar-mass black holes are the final remnants of stars born with more than 15 solar masses. Billions are expected to reside in the Local Group, yet only a few are known, mostly detected through X-rays emitted as they accrete material from a companion star. Here, we report on VFTS 243: a massive X-ray faint binary in the Large Magellanic Cloud. With an orbital period of 10.4-d, it comprises an O-type star of 25 solar masses and an unseen companion of at least nine solar masses. Our spectral analysis excludes a non-degenerate companion at a 5-sigma confidence level. The minimum companion mass implies that it is a black hole. No other X-ray quiet black hole is unambiguously known outside our Galaxy. The (near-)circular orbit and kinematics of VFTS 243 imply that the collapse of the progenitor into a black hole was associated with little or no ejected material or black-hole kick. Identifying such unique binaries substantially impacts the predicted rates of gravitational-wave detections and properties of core-collapse supernovae across the Cosmos., Comment: Accepted to Nature Astronomy, 64 pages, 15 figures, 2 tables; ESO press release: https://www.eso.org/public/news/eso2210/; Nat Asr paper URL: https://www.nature.com/articles/s41550-022-01730-y
- Published
- 2022
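The "minimum companion mass" logic behind entry 23 is the classic single-lined spectroscopic binary argument; a sketch of the relation (with $P$ the period, $K_1$ the O star's radial-velocity semi-amplitude, $e$ the eccentricity, and $i$ the unknown inclination):

```latex
% Spectroscopic mass function: fully measurable from the orbit of star 1.
f(M_2) \;=\; \frac{(M_2 \sin i)^{3}}{(M_1 + M_2)^{2}}
       \;=\; \frac{P \, K_1^{3} \, (1 - e^{2})^{3/2}}{2 \pi G}
% Since \sin i \le 1, f(M_2) yields a hard lower bound on M_2 for any adopted
% primary mass M_1; for VFTS 243 that bound is roughly nine solar masses, too
% heavy for any plausible non-degenerate star the spectra could hide.
```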
24. The Tarantula Massive Binary Monitoring VI: Characterisation of hidden companions in 51 single-lined O-type binaries, a flat mass-ratio distribution, and black-hole binary candidates
- Author
Shenar, Tomer, Sana, Hugues, Mahy, Laurent, Apellaniz, Jesus Maiz, Crowther, Paul A., Gromadzki, Mariusz, Herrero, Artemio, Langer, Norbert, Marchant, Pablo, Schneider, Fabian R. N., Sen, Koushik, Soszynski, Igor, and Toonen, S.
- Subjects
Astrophysics - Solar and Stellar Astrophysics, Astrophysics - Astrophysics of Galaxies
- Abstract
We aim to hunt for massive binaries hosting a black hole companion (OB+BH) and establish the natal mass-ratio distribution of massive stars at the subsolar metallicity environment of the Large Magellanic Cloud (LMC). We use the shift-and-add grid disentangling technique to characterize the hidden companions in 51 SB1 O-type and evolved B-type binaries in the LMC monitored in the framework of the Tarantula Massive Binary Monitoring (TMBM). Out of the 51 SB1 systems, 43 (84%) are found to have non-degenerate stellar companions, of which 28 are confident detections, and 15 are less certain (SB1: or SB2:). Of these 43 targets, one is found to be a triple (VFTS 64), and two are found to be quadruples (VFTS 120, 702). The remaining eight targets (16%) retain an SB1 classification. Aside from the unambiguous case of VFTS 243, analysed in detail in a separate paper, we identify two additional OB+BH candidates: VFTS 514 and VFTS 779. Additional black holes may be present in the sample but at a lower probability. Our study firmly establishes a virtually flat natal mass-ratio distribution for O-type stars at LMC metallicity, covering the entire mass-ratio range (0.05 < q < 1) and periods in the range 0 < log P < 3 [d]. The nature of the OB+BH candidates should be verified through future monitoring, but the frequency of OB+BH candidates is generally in line with recent predictions at LMC metallicity., Comment: 41 pages (14 main article + 27 appendix), accepted to A&A
- Published
- 2022
25. Cancer immunotherapy-associated endocrine complications and treatment strategies
- Author
Sen, Koushik, Adhikari, Madhuchhanda, Biswas, Chayan, Maity, Sukhendu, Chatterjee, Ankit, and Pramanick, Kousik
- Published
- 2024
26. The Sky Above The Clouds
- Author
Chasins, Sarah, Cheung, Alvin, Crooks, Natacha, Ghodsi, Ali, Goldberg, Ken, Gonzalez, Joseph E., Hellerstein, Joseph M., Jordan, Michael I., Joseph, Anthony D., Mahoney, Michael W., Parameswaran, Aditya, Patterson, David, Popa, Raluca Ada, Sen, Koushik, Shenker, Scott, Song, Dawn, and Stoica, Ion
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
Technology ecosystems often undergo significant transformations as they mature. For example, telephony, the Internet, and PCs all started with a single provider, but in the United States each is now served by a competitive market that uses comprehensive and universal technology standards to provide compatibility. This white paper presents our view on how the cloud ecosystem, barely over fifteen years old, could evolve as it matures., Comment: 35 pages
- Published
- 2022
27. Lupeol: A dietary and medicinal triterpene with therapeutic potential
- Author
Sen, Koushik, Kumar Das, Sanjib, Ghosh, Nabanita, Sinha, Krishnendu, and Sil, Parames C.
- Published
- 2024
28. Learning Highly Recursive Input Grammars
- Author
Kulkarni, Neil, Lemieux, Caroline, and Sen, Koushik
- Subjects
Computer Science - Software Engineering, Computer Science - Programming Languages
- Abstract
This paper presents Arvada, an algorithm for learning context-free grammars from a set of positive examples and a Boolean-valued oracle. Arvada learns a context-free grammar by building parse trees from the positive examples. Starting from initially flat trees, Arvada builds structure to these trees with a key operation: it bubbles sequences of sibling nodes in the trees into a new node, adding a layer of indirection to the tree. Bubbling operations enable recursive generalization in the learned grammar. We evaluate Arvada against GLADE and find it achieves on average increases of 4.98x in recall and 3.13x in F1 score, while incurring only a 1.27x slowdown and requiring only 0.87x as many calls to the oracle. Arvada has a particularly marked improvement over GLADE on grammars with highly recursive structure, like those of programming languages.
- Published
- 2021
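The "bubble" operation in entry 28 is small enough to show directly. A minimal sketch on a flat parse tree (only the tree surgery; Arvada's oracle-driven accept/reject loop and label merging are omitted):

```python
def bubble(children, lo, hi, label):
    """Wrap the sibling slice children[lo:hi] in a new node, adding one layer
    of indirection to the tree -- Arvada's key generalization step."""
    return children[:lo] + [(label, children[lo:hi])] + children[hi:]

# Initially flat tree for the positive example "1+1".
label, kids = ("start", ["1", "+", "1"])

# Bubbling each "1" under a fresh nonterminal t0 makes the two occurrences
# interchangeable, which is what later licenses a recursive rule like
# t0 -> t0 '+' t0 in the learned grammar.
kids = bubble(kids, 0, 1, "t0")
kids = bubble(kids, 2, 3, "t0")
print((label, kids))   # ('start', [('t0', ['1']), '+', ('t0', ['1'])])
```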
29. Abscopal effect of radiation in metastatic esophageal carcinoma: fourth reported case
- Author
Biswas, Rituparna, Jindel, Rajesh, Halder, Anirban, Sen, Koushik, and Kabasi, Apurba
- Published
- 2023
30. QFAST: Conflating Search and Numerical Optimization for Scalable Quantum Circuit Synthesis
- Author
Younis, Ed, Sen, Koushik, Yelick, Katherine, and Iancu, Costin
- Subjects
Quantum Physics
- Abstract
We present a quantum synthesis algorithm designed to produce short circuits and to scale well in practice. The main contribution is a novel representation of circuits able to encode placement and topology using generic "gates", which allows the QFAST algorithm to replace expensive searches over circuit structures with few steps of numerical optimization. When compared against optimal-depth, search-based state-of-the-art techniques, QFAST produces comparable results: 1.19x longer circuits up to four qubits, with an increase in compilation speed of 3.6x. In addition, QFAST scales up to seven qubits. When compared with the state-of-the-art "rule" based decomposition techniques in Qiskit, QFAST produces circuits shorter by up to two orders of magnitude (331x), albeit 5.6x slower. We also demonstrate the composability with other techniques and the tunability of our formulation in terms of circuit depth and running time.
- Published
- 2021
31. Growing a Test Corpus with Bonsai Fuzzing
- Author
Vikram, Vasudev, Padhye, Rohan, and Sen, Koushik
- Subjects
Computer Science - Software Engineering, Computer Science - Programming Languages
- Abstract
This paper presents a coverage-guided grammar-based fuzzing technique for automatically generating a corpus of concise test inputs for programs such as compilers. We walk through a case study of a compiler designed for education and the corresponding problem of generating meaningful test cases to provide to students. The prior state-of-the-art solution is a combination of fuzzing and test-case reduction techniques such as variants of delta-debugging. Our key insight is that instead of attempting to minimize convoluted fuzzer-generated test inputs, we can instead grow concise test inputs by construction using a form of iterative deepening. We call this approach Bonsai Fuzzing. Experimental results show that Bonsai Fuzzing can generate test corpora having inputs that are 16--45% smaller in size on average as compared to a fuzz-then-reduce approach, while achieving approximately the same code coverage and fault-detection capability., Comment: Accepted at the 43rd International Conference on Software Engineering (ICSE 2021)
- Published
- 2021
32. Exhumation mechanisms of the Himalayan metamorphic core in the Bhagirathi Valley, NW India: Insights from an integrated structural and metamorphic analysis
- Author
Sen, Aranya, Phukon, Purbajyoti, Sen, Koushik, Bose, Subham, and Srivastava, Hari B.
- Published
- 2024
33. Detailed evolutionary models of massive contact binaries: I. Model grids and synthetic populations for the Magellanic Clouds
- Author
Menon, Athira, Langer, Norbert, de Mink, Selma E., Justham, Stephen, Sen, Koushik, Szécsi, Dorottya, de Koter, Alex, Abdul-Masih, Michael, Sana, Hugues, Mahy, Laurent, and Marchant, Pablo
- Subjects
Astrophysics - Solar and Stellar Astrophysics, Astrophysics - Astrophysics of Galaxies
- Abstract
The majority of close massive binary stars with initial periods of a few days experience a contact phase, in which both stars overflow their Roche lobes simultaneously. We perform the first dedicated study of the evolution of massive contact binaries and provide a comprehensive prediction of their observed properties. We compute 2790 detailed binary models for the Large and Small Magellanic Clouds each, assuming a conservative mass transfer. The initial parameter space for both grids spans total masses from 20 to 80$\,\textrm{M}_{\odot}$, orbital periods of 0.6 to 2 days and mass ratios of 0.6 to 1.0. We find that models that remain in contact over nuclear timescales evolve towards equal masses, echoing the mass ratios of their observed counterparts. Ultimately, the fate of our nuclear-timescale models is to merge on the main sequence. Our predicted period-mass ratio distributions of O-type contact binaries are similar for both galaxies, and we expect 10 such systems together in both Magellanic Clouds. While we can largely reproduce the observed distribution, we over-estimate the population of equal-mass contact binaries. This situation is somewhat remedied if we also account for binaries that are approaching contact. Our theoretical distributions work particularly well for contact binaries with periods $<$2 days and total masses $\lessapprox45\,\textrm{M}_{\odot}$. We expect stellar winds, non-conservative mass transfer and envelope inflation to have played a role in the formation of the more massive and longer-period contact binaries., Comment: 23 pages, 11 figures, accepted for publication in MNRAS
- Published
- 2020
34. Efficient and Transferable Adversarial Examples from Bayesian Neural Networks
- Author
Gubri, Martin, Cordy, Maxime, Papadakis, Mike, Traon, Yves Le, and Sen, Koushik
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
An established way to improve the transferability of black-box evasion attacks is to craft the adversarial examples on an ensemble-based surrogate to increase diversity. We argue that transferability is fundamentally related to uncertainty. Based on a state-of-the-art Bayesian Deep Learning technique, we propose a new method to efficiently build a surrogate by sampling approximately from the posterior distribution of neural network weights, which represents the belief about the value of each parameter. Our extensive experiments on ImageNet, CIFAR-10 and MNIST show that our approach improves the success rates of four state-of-the-art attacks significantly (up to 83.2 percentage points), in both intra-architecture and inter-architecture transferability. On ImageNet, our approach can reach 94% of success rate while reducing training computations from 11.6 to 2.4 exaflops, compared to an ensemble of independently trained DNNs. Our vanilla surrogate achieves 87.5% of the time higher transferability than three test-time techniques designed for this purpose. Our work demonstrates that the way to train a surrogate has been overlooked, although it is an important element of transfer-based attacks. We are, therefore, the first to review the effectiveness of several training methods in increasing transferability. We provide new directions to better understand the transferability phenomenon and offer a simple but strong baseline for future work., Comment: Accepted at UAI 2022
- Published
- 2020
35. Exempla Gratis (E.G.): Code Examples for Free
- Author
Barnaby, Celeste, Sen, Koushik, Zhang, Tianyi, Glassman, Elena, and Chandra, Satish
- Subjects
Computer Science - Software Engineering
- Abstract
Modern software engineering often involves using many existing APIs, both open source and, in industrial coding environments, proprietary. Programmers reference documentation and code search tools to remind themselves of proper common usage patterns of APIs. However, high-quality API usage examples are computationally expensive to curate and maintain, and API usage examples retrieved from company-wide code search can be tedious to review. We present a tool, EG, that mines codebases and shows the common, idiomatic usage examples for API methods. EG was integrated into Facebook's internal code search tool for the Hack language and evaluated on open-source GitHub projects written in Python. EG was also compared against code search results and hand-written examples from a popular programming website called ProgramCreek. Compared with these two baselines, examples generated by EG are more succinct and representative with fewer extraneous statements. In addition, a survey with Facebook developers shows that EG examples are preferred in 97 percent of cases.
- Published
- 2020
36. Hindsight Logging for Model Training
- Author
Garcia, Rolando, Liu, Eric, Sreekanti, Vikram, Yan, Bobby, Dandamudi, Anusha, Gonzalez, Joseph E., Hellerstein, Joseph M., and Sen, Koushik
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Databases, Computer Science - Software Engineering
- Abstract
In modern Machine Learning, model training is an iterative, experimental process that can consume enormous computation resources and developer time. To aid in that process, experienced model developers log and visualize program variables during training runs. Exhaustive logging of all variables is infeasible. Optimistic logging can be accompanied by program checkpoints; this allows developers to add log statements post-hoc, and "replay" desired log statements from checkpoint -- a process we refer to as hindsight logging. Unfortunately, hindsight logging raises tricky problems in data management and software engineering. Done poorly, hindsight logging can waste resources and generate technical debt embodied in multiple variants of training code. In this paper, we present methodologies for efficient and effective logging practices for model training, with a focus on techniques for hindsight logging. Our goal is for experienced model developers to learn and adopt these practices. To make this easier, we provide an open-source suite of tools for Fast Low-Overhead Recovery (flor) that embodies our design across three tasks: (i) efficient background logging in Python, (ii) adaptable periodic checkpointing, and (iii) an instrumentation library that codifies hindsight logging for efficient and automatic record-replay of model-training. Model developers can use each flor tool separately as they see fit, or they can use flor in hands-free mode, entrusting it to instrument their code end-to-end for efficient record-replay. Our solutions leverage techniques from physiological transaction logs and recovery in database systems. Evaluations on modern ML benchmarks demonstrate that flor can produce fast checkpointing with small user-specifiable overheads (e.g. 7%), and still provide hindsight log replay times orders of magnitude faster than restarting training from scratch.
- Published
- 2020
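A stripped-down rendering of the record-replay idea behind entries 14 and 36 (toy state and update rule; the real flor suite handles Python instrumentation, adaptive checkpointing, and parallel replay): checkpoint periodically during training, then serve a log statement added after the fact by resuming from the nearest checkpoint instead of retraining from scratch:

```python
import pickle

STEP = 0.5  # stand-in for the effect of one training step

def train(steps, ckpt_every=10):
    """Toy training loop that keeps cheap periodic checkpoints."""
    state, ckpts = 0.0, {0: pickle.dumps(0.0)}
    for step in range(1, steps + 1):
        state += STEP
        if step % ckpt_every == 0:
            ckpts[step] = pickle.dumps(state)
    return ckpts

def hindsight_log(ckpts, step, log):
    """Replay a post-hoc log statement: resume from the nearest checkpoint
    and re-execute only the missing steps."""
    start = max(s for s in ckpts if s <= step)
    state = pickle.loads(ckpts[start])
    for _ in range(start + 1, step + 1):
        state += STEP              # re-execute just the replayed region
    log(step, state)

ckpts = train(100)                 # the original run logged nothing
hindsight_log(ckpts, 57, lambda s, v: print(f"step={s} value={v:.1f}"))
```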
37. Ansor: Generating High-Performance Tensor Programs for Deep Learning
- Author
Zheng, Lianmin, Jia, Chengfan, Sun, Minmin, Wu, Zhao, Yu, Cody Hao, Haj-Ali, Ameer, Wang, Yida, Yang, Jun, Zhuo, Danyang, Sen, Koushik, Gonzalez, Joseph E., and Stoica, Ion
- Subjects
Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Performance, Computer Science - Programming Languages, Statistics - Machine Learning
- Abstract
High-performance tensor programs are crucial to guarantee efficient execution of deep neural networks. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously challenging. Currently, deep learning systems rely on vendor-provided kernel libraries or various search strategies to get performant tensor programs. These approaches either require significant engineering effort to develop platform-specific optimization code or fall short of finding high-performance programs due to restricted search space and ineffective exploration strategy. We present Ansor, a tensor program generation framework for deep learning applications. Compared with existing search strategies, Ansor explores many more optimization combinations by sampling programs from a hierarchical representation of the search space. Ansor then fine-tunes the sampled programs with evolutionary search and a learned cost model to identify the best programs. Ansor can find high-performance programs that are outside the search space of existing state-of-the-art approaches. In addition, Ansor utilizes a task scheduler to simultaneously optimize multiple subgraphs in deep neural networks. We show that Ansor improves the execution performance of deep neural networks relative to the state-of-the-art on the Intel CPU, ARM CPU, and NVIDIA GPU by up to $3.8\times$, $2.6\times$, and $1.7\times$, respectively., Comment: OSDI 2020
- Published
- 2020
38. QFAST: Quantum Synthesis Using a Hierarchical Continuous Circuit Space
- Author
Younis, Ed, Sen, Koushik, Yelick, Katherine, and Iancu, Costin
- Subjects
Quantum Physics
- Abstract
We present QFAST, a quantum synthesis tool designed to produce short circuits and to scale well in practice. Our contributions are: 1) a novel representation of circuits able to encode placement and topology; 2) a hierarchical approach with an iterative refinement formulation that combines "coarse-grained" fast optimization during circuit structure search with a good, but slower, optimization stage only in the final circuit instantiation stage. When compared against state-of-the-art techniques, although not optimal, QFAST can generate much shorter circuits for "time dependent evolution" algorithms used by domain scientists. We also show the composability and tunability of our formulation in terms of circuit depth and running time. For example, we show how to generate shorter circuits by plugging in the best available third party synthesis algorithm at a given hierarchy level. Composability enables portability across chip architectures, which is missing from the available approaches.
- Published
- 2020
39. Heuristics for Quantum Compiling with a Continuous Gate Set
- Author
Davis, Marc Grau, Smith, Ethan, Tudor, Ana, Sen, Koushik, Siddiqi, Irfan, and Iancu, Costin
- Subjects
Computer Science - Emerging Technologies, Quantum Physics
- Abstract
We present an algorithm for compiling arbitrary unitaries into a sequence of gates native to a quantum processor. As accurate CNOT gates are hard for the foreseeable Noisy Intermediate-Scale Quantum (NISQ) devices era, our A* inspired algorithm attempts to minimize their count, while accounting for connectivity. We discuss the search strategy together with metrics to expand the solution frontier. For a workload of circuits with complexity appropriate for the NISQ era, we produce solutions well within the best upper bounds published in literature and match or exceed hand tuned implementations, as well as other existing synthesis alternatives. In particular, when comparing against state-of-the-art available synthesis packages we show 2.4x average (up to 5.3x) reduction in CNOT count. We also show how to re-target the algorithm for a different chip topology and native gate set, while obtaining similar quality results. We believe that empirical tools like ours can facilitate algorithmic exploration, gate set discovery for quantum processor designers, as well as providing useful optimization blocks within the quantum compilation tool-chain., Comment: Presented at the 3rd International Workshop on Quantum Compilation as part of the International Conference On Computer Aided Design 2019
- Published
- 2019
40. Omphacite breakdown, symplectite formation and carbonate metasomatism in a retrograded continental eclogite: Implications for the exhumation of the Tso Morari Crystalline Complex (Trans-Himalaya, NW India)
- Author
Dey, Alosree, Sen, Koushik, Sen, Aranya, and Choudhary, Shubham
- Published
- 2023
41. When Deep Learning Met Code Search
- Author
Cambronero, Jose, Li, Hongyu, Kim, Seohyun, Sen, Koushik, and Chandra, Satish
- Subjects
Computer Science - Software Engineering, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
There have been multiple recent proposals on using deep neural networks for code search using natural language. Common across these proposals is the idea of $\mathit{embedding}$ code and natural language queries, into real vectors and then using vector distance to approximate semantic correlation between code and the query. Multiple approaches exist for learning these embeddings, including $\mathit{unsupervised}$ techniques, which rely only on a corpus of code examples, and $\mathit{supervised}$ techniques, which use an $\mathit{aligned}$ corpus of paired code and natural language descriptions. The goal of this supervision is to produce embeddings that are more similar for a query and the corresponding desired code snippet. Clearly, there are choices in whether to use supervised techniques at all, and if one does, what sort of network and training to use for supervision. This paper is the first to evaluate these choices systematically. To this end, we assembled implementations of state-of-the-art techniques to run on a common platform, training and evaluation corpora. To explore the design space in network complexity, we also introduced a new design point that is a $\mathit{minimal}$ supervision extension to an existing unsupervised technique. Our evaluation shows that: 1. adding supervision to an existing unsupervised technique can improve performance, though not necessarily by much; 2. simple networks for supervision can be more effective than more sophisticated sequence-based networks for code search; 3. while it is common to use docstrings to carry out supervision, there is a sizeable gap between the effectiveness of docstrings and a more query-appropriate supervision corpus. The evaluation dataset is now available at arXiv:1908.09804
- Published
- 2019
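The shared backbone of the proposals compared in entry 41 fits in a few lines: embed code and queries into one vector space, then rank by vector distance. A sketch with a hashed bag-of-words stand-in for the learned encoders (the paper's subject is precisely how supervision improves on baselines of this kind):

```python
import re
import numpy as np

def embed(text, dim=64):
    """Hypothetical embedder for both code and natural language (hashed
    bag-of-words; rare hash collisions are tolerable in a sketch)."""
    v = np.zeros(dim)
    for tok in re.findall(r"[a-z]+", text.lower()):
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

snippets = [
    "def read_file(path): return open(path).read()",
    "def add(a, b): return a + b",
]
index = np.stack([embed(s) for s in snippets])  # embed the corpus once

query = embed("read the contents of a file")    # embed the query...
scores = index @ query                          # ...rank by cosine similarity
print(snippets[int(scores.argmax())])           # the file-reading snippet wins
```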
42. Aroma: Code Recommendation via Structural Code Search
- Author
Luan, Sifei, Yang, Di, Barnaby, Celeste, Sen, Koushik, and Chandra, Satish
- Subjects
Computer Science - Software Engineering
- Abstract
Programmers often write code that has similarity to existing code written somewhere. A tool that could help programmers to search such similar code would be immensely useful. Such a tool could help programmers to extend partially written code snippets to completely implement necessary functionality, help to discover extensions to the partial code which are commonly included by other programmers, help to cross-check against similar code written by other programmers, or help to add extra code which would fix common mistakes and errors. We propose Aroma, a tool and technique for code recommendation via structural code search. Aroma indexes a huge code corpus including thousands of open-source projects, takes a partial code snippet as input, searches the corpus for method bodies containing the partial code snippet, and clusters and intersects the results of the search to recommend a small set of succinct code snippets which both contain the query snippet and appear as part of several methods in the corpus. We evaluated Aroma on 2000 randomly selected queries created from the corpus, as well as 64 queries derived from code snippets obtained from Stack Overflow, a popular website for discussing code. We implemented Aroma for 4 different languages, and developed an IDE plugin for Aroma. Furthermore, we conducted a study where we asked 12 programmers to complete programming tasks using Aroma, and collected their feedback. Our results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently.
- Published
- 2018
43. Strategies to Improve Delivery of Bioactive Agents
- Author
Biswas, Chayan, Adhikari, Madhuchhanda, Sen, Koushik, Maity, Sukhendu, Guchhait, Rajkumar, Pramanick, Kousik, Afaq, Sarah (ed.), Malik, Arshi (ed.), and Tarique, Mohammed (ed.)
- Published
- 2022
44. Semantic Fuzzing with Zest
- Author
Padhye, Rohan, Lemieux, Caroline, Sen, Koushik, Papadakis, Mike, and Traon, Yves Le
- Subjects
Computer Science - Software Engineering
- Abstract
Programs expecting structured inputs often consist of both a syntactic analysis stage, which parses raw input, and a semantic analysis stage, which conducts checks on the parsed input and executes the core logic of the program. Generator-based testing tools in the lineage of QuickCheck are a promising way to generate random syntactically valid test inputs for these programs. We present Zest, a technique which automatically guides QuickCheck-like random-input generators to better explore the semantic analysis stage of test programs. Zest converts random-input generators into deterministic parametric generators. We present the key insight that mutations in the untyped parameter domain map to structural mutations in the input domain. Zest leverages program feedback in the form of code coverage and input validity to perform feedback-directed parameter search. We evaluate Zest against AFL and QuickCheck on five Java programs: Maven, Ant, BCEL, Closure, and Rhino. Zest covers 1.03x-2.81x as many branches within the benchmarks' semantic analysis stages as baseline techniques. Further, we find 10 new bugs in the semantic analysis stages of these benchmarks. Zest is the most effective technique in finding these bugs reliably and quickly, requiring at most 10 minutes on average to find each bug., Comment: To appear in Proceedings of 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA'19)
- Published
- 2018
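Entry 44's key insight (mutations in an untyped parameter domain map to structural mutations in the input domain) is easy to demonstrate. A sketch of a deterministic parametric generator; the other half of Zest, coverage- and validity-guided parameter search, is omitted:

```python
import random

def gen_expr(params, depth=0):
    """A flat list of byte 'parameters' fully determines a syntactically valid
    arithmetic expression, so mutating raw bytes restructures the output."""
    b = params.pop(0) if params else 0
    if depth >= 3 or b < 128:
        return str(b % 10)                        # leaf: a digit
    op = "+-*"[b % 3]
    return f"({gen_expr(params, depth + 1)}{op}{gen_expr(params, depth + 1)})"

seed = [200, 7, 150, 3, 9]
print(gen_expr(list(seed)))                       # (7*(3+9))

mutated = [b ^ random.randrange(256) for b in seed]  # fuzzer flips raw bits
print(gen_expr(list(mutated)))                    # different shape, still valid
```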
45. Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts
- Author
-
Bavishi, Rohan, Pradel, Michael, and Sen, Koushik
- Subjects
Computer Science - Software Engineering ,Computer Science - Machine Learning ,Computer Science - Programming Languages ,Statistics - Machine Learning - Abstract
Most of the JavaScript code deployed in the wild has been minified, a process in which identifier names are replaced with short, arbitrary and meaningless names. Minified code occupies less space, but also makes the code extremely difficult to manually inspect and understand. This paper presents Context2Name, a deep learning-based technique that partially reverses the effect of minification by predicting natural identifier names for minified names. The core idea is to predict, from the usage context of a variable, a name that captures the meaning of the variable (a structural sketch of this pipeline follows this entry). The approach combines a lightweight, token-based static analysis with an auto-encoder neural network that summarizes usage contexts and a recurrent neural network that predicts natural names for a given usage context. We evaluate Context2Name with a large corpus of real-world JavaScript code and show that it successfully predicts 47.5% of all minified identifiers while taking only 2.9 milliseconds on average to predict a name. A comparison with the state-of-the-art tools JSNice and JSNaughty shows that our approach performs comparably in terms of accuracy while improving in terms of efficiency. Moreover, Context2Name complements the state-of-the-art by predicting 5.3% additional identifiers that are missed by both existing tools.
- Published
- 2018
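The pipeline in the abstract, extracting usage contexts with a lightweight token-based analysis, summarizing them, and predicting a name from the summary, can be sketched in a few lines. The neural components are replaced here by a bag-of-words summary and a nearest-neighbour lookup, so this is a structural illustration only; all names and the toy training data are hypothetical.

```python
import re
from collections import Counter

def usage_contexts(code, var, window=2):
    """Token windows around each occurrence of `var` (the lightweight,
    token-based static analysis step)."""
    toks = re.findall(r"\w+|\S", code)
    return [toks[max(0, i - window): i + window + 1]
            for i, t in enumerate(toks) if t == var]

def summarize(contexts):
    """Bag-of-words summary standing in for the auto-encoder embedding."""
    return Counter(t for ctx in contexts for t in ctx)

def predict_name(summary, training):
    """Nearest neighbour over known (summary, name) pairs, standing in
    for the recurrent name-prediction network."""
    def overlap(a, b):
        return sum((a & b).values())
    return max(training, key=lambda pair: overlap(summary, pair[0]))[1]

# Hypothetical "trained" data: context summaries paired with natural names.
training = [
    (Counter({"length": 3, "push": 2, "[": 2}), "items"),
    (Counter({"+": 4, "=": 2, "0": 1}), "count"),
]
minified = "var a=0;for(var i=0;i<xs.length;i++){a=a+xs[i];}"
print(predict_name(summarize(usage_contexts(minified, "a")), training))  # count
```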
46. DeepBugs: A Learning Approach to Name-based Bug Detection
- Author
-
Pradel, Michael and Sen, Koushik
- Subjects
Computer Science - Software Engineering ,Computer Science - Programming Languages - Abstract
Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug detection approaches reason about names on a syntactic level and rely on manually designed and tuned algorithms to detect bugs. This paper presents DeepBugs, a learning approach to name-based bug detection, which reasons about names based on a semantic representation and which automatically learns bug detectors instead of manually writing them. We formulate bug detection as a binary classification problem and train a classifier that distinguishes correct from incorrect code. To address the challenge that effectively learning a bug detector requires examples of both correct and incorrect code, we create likely incorrect code examples from an existing corpus of code through simple code transformations (a sketch of this bug-seeding step follows this entry). A novel insight learned from our work is that learning from artificially seeded bugs yields bug detectors that are effective at finding bugs in real-world code. We implement our idea into a framework for learning-based and name-based bug detection. Three bug detectors built on top of the framework detect accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations. Applying the approach to a corpus of 150,000 JavaScript files yields bug detectors that have a high accuracy (between 89% and 95%), are very efficient (less than 20 milliseconds per analyzed file), and reveal 102 programming mistakes (with a 68% true positive rate) in real-world code.
- Published
- 2018
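The training-data construction described above, deriving likely-incorrect examples from correct code through simple transformations, is the part most easily shown in code. The sketch below seeds swapped-argument bugs and trains a perceptron as a stand-in for the learned classifier; the real system works on JavaScript with learned identifier embeddings, and everything named here is hypothetical.

```python
import re

def swap_args(call):
    """Seed a likely bug by swapping the two arguments of a call:
    setTimeout(callback, delay) -> setTimeout(delay, callback)."""
    fn, a, b = re.match(r"([\w.]+)\((\w+),\s*(\w+)\)", call).groups()
    return f"{fn}({b}, {a})"

def embed(call):
    """Toy semantic representation: character histograms of the two
    argument names (DeepBugs learns real embeddings; this is a stand-in)."""
    a, b = re.match(r"[\w.]+\((\w+),\s*(\w+)\)", call).groups()
    v = [0.0] * 52
    for off, name in ((0, a), (26, b)):
        for ch in name.lower():
            if ch.isalpha():
                v[off + ord(ch) - 97] += 1.0
    return v

corpus = ["setTimeout(callback, delay)", "Math.pow(base, exponent)"]
# Correct code labeled +1; artificially seeded swapped-argument bugs -1.
data = [(embed(c), 1) for c in corpus] + [(embed(swap_args(c)), -1) for c in corpus]

# Perceptron standing in for the learned binary classifier.
w = [0.0] * 52
for _ in range(50):
    for x, y in data:
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            w = [wi + y * xi for wi, xi in zip(w, x)]

# A negative score flags the swapped call as likely buggy.
print(sum(wi * xi for wi, xi in zip(w, embed("setTimeout(delay, callback)"))))
```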
47. Selecting Fault Revealing Mutants
- Author
-
Chekam, Thierry Titcheu, Papadakis, Mike, Bissyandé, Tegawendé, Traon, Yves Le, and Sen, Koushik
- Subjects
Computer Science - Software Engineering - Abstract
Mutant selection refers to the problem of choosing, among a large number of mutants, the (few) ones that should be used by the testers. In view of this, we investigate the problem of selecting the fault revealing mutants, i.e., the mutants that are most likely to be killable and lead to test cases that uncover unknown program faults. We formulate two variants of this problem: the fault revealing mutant selection and the fault revealing mutant prioritization. We argue and show that these problems can be tackled through a set of 'static' program features and propose a machine learning approach, named FaRM, that learns to select and rank killable and fault revealing mutants (a sketch of feature-based mutant ranking follows this entry). Experimental results involving 1,692 real faults show the practical benefits of our approach in both examined problems. Our results show that FaRM achieves a good trade-off between application cost and effectiveness (measured in terms of faults revealed). We also show that FaRM outperforms all the existing mutant selection methods, i.e., random mutant sampling, selective mutation, and defect prediction (mutating the code areas pointed to by defect prediction). In particular, our results show that with respect to mutant selection, our approach reveals 23% to 34% more faults than any of the baseline methods, while, with respect to mutant prioritization, it achieves a higher average percentage of revealed faults, with a median improvement of 4% to 9% over random mutant orderings.
- Published
- 2018
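Prioritization of the kind FaRM performs can be sketched as scoring each mutant from static features and sorting by the score. The feature set, the weights, and the linear scorer below are assumptions made for illustration; the paper's actual features and learning algorithm may differ.

```python
from dataclasses import dataclass

@dataclass
class Mutant:
    operator: str        # mutation operator applied (e.g., relational op replacement)
    nesting_depth: int   # depth of the mutated statement
    in_loop: bool        # is the mutated code inside a loop?

def score(m, weights):
    """Linear stand-in for the learned model that ranks mutants by how
    likely they are to be killable and fault revealing."""
    return (weights.get(m.operator, 0.0)
            + 0.1 * m.nesting_depth
            + (0.3 if m.in_loop else 0.0))

weights = {"ROR": 0.8, "AOR": 0.5, "SDL": 0.2}  # hypothetical learned weights

mutants = [Mutant("SDL", 1, False), Mutant("ROR", 3, True), Mutant("AOR", 2, False)]
for m in sorted(mutants, key=lambda m: score(m, weights), reverse=True):
    print(m.operator, round(score(m, weights), 2))
```

Testers would then execute only the top-ranked mutants, spending test-design effort where a fault is most likely to be revealed.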
48. Sub-photospheric fluctuations in magnetized radiative envelopes: contribution from unstable magnetosonic waves
- Author
-
Sen, Koushik, Fernández, Rodrigo, and Socrates, Aristotle
- Subjects
Astrophysics - Solar and Stellar Astrophysics ,Astrophysics - High Energy Astrophysical Phenomena - Abstract
We examine the excitation of unstable magnetosonic waves in the radiative envelopes of intermediate- and high-mass stars with a magnetic field of ~kG strength. Wind clumping close to the star and microturbulence can often be accounted for by including small-scale, sub-photospheric density or velocity perturbations. Compressional waves - with wavelengths comparable to or shorter than the gas pressure scale height - can be destabilized by the radiative flux in optically-thick media when a magnetic field is present, in a process called the Radiation-Driven Magneto-Acoustic Instability (RMI). The instability does not require radiation or magnetic pressure to dominate over gas pressure, and acts independently of sub-surface convection zones. Here we evaluate the conditions for the RMI to operate on a grid of stellar models covering a mass range $3-40M_\odot$ at solar metallicity (the standard definitions of the fast and slow magnetosonic branches are recalled after this entry). For a uniform 1 kG magnetic field, fast magnetosonic modes are unstable down to an optical depth of a few tens, while unstable slow modes extend beyond the depth of the iron convection zone. The qualitative behavior is robust to magnetic field strength variations by a factor of a few. When combining our findings with previous results for the saturation amplitude of the RMI, we predict velocity fluctuations in the range ~0.1-10 km/s. These amplitudes are a monotonically increasing function of the ratio of radiation to gas pressure, or alternatively, of the zero-age main sequence mass., Comment: Accepted by MNRAS
- Published
- 2018
- Full Text
- View/download PDF
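For readers unfamiliar with the mode names used above: the fast and slow branches come from the standard ideal-MHD dispersion relation for magnetosonic waves, a textbook result recalled here for context rather than a formula from this paper,

$$v_{\pm}^{2} = \frac{1}{2}\left[\left(a^{2} + v_{A}^{2}\right) \pm \sqrt{\left(a^{2} + v_{A}^{2}\right)^{2} - 4\,a^{2} v_{A}^{2}\cos^{2}\theta}\,\right],$$

where $a$ is the adiabatic sound speed, $v_{A} = B/\sqrt{4\pi\rho}$ is the Alfvén speed, and $\theta$ is the angle between the wavevector and the magnetic field. The $+$ root is the fast mode and the $-$ root the slow mode, the two branches that the abstract reports as unstable at different envelope depths.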
49. Petrogenetic history and melt inclusion characteristics of mantle plume-derived ijolites from NE India: implications for multistage crystallization and occurrence of “nano-calciocarbonatites”
- Author
-
Choudhary, Shubham, Sen, Koushik, Rana, Shruti, and Kumar, Santosh
- Published
- 2022
- Full Text
- View/download PDF
50. FairFuzz: Targeting Rare Branches to Rapidly Increase Greybox Fuzz Testing Coverage
- Author
-
Lemieux, Caroline and Sen, Koushik
- Subjects
Computer Science - Software Engineering ,Computer Science - Cryptography and Security - Abstract
In recent years, fuzz testing has proven itself to be one of the most effective techniques for finding correctness bugs and security vulnerabilities in practice. One particular fuzz testing tool, American Fuzzy Lop or AFL, has become popular thanks to its ease-of-use and bug-finding power. However, AFL remains limited in the depth of program coverage it achieves, in particular because it does not consider which parts of program inputs should not be mutated in order to maintain deep program coverage. We propose an approach, FairFuzz, that helps alleviate this limitation in two key steps. First, FairFuzz automatically prioritizes inputs exercising rare parts of the program under test. Second, it automatically adjusts the mutation of inputs so that the mutated inputs are more likely to exercise these same rare parts of the program (a sketch of both steps follows this entry). We conduct an evaluation on real-world programs against state-of-the-art versions of AFL, thoroughly repeating experiments to get good measures of variability. We find that on certain benchmarks FairFuzz shows significant coverage increases after 24 hours compared to state-of-the-art versions of AFL, while on others it achieves high program coverage at a significantly faster rate.
- Published
- 2017
- Full Text
- View/download PDF
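The two steps named in the abstract, prioritizing inputs that exercise rare branches and constraining mutation so those branches stay covered, can be sketched as follows. FairFuzz itself is a modification of AFL written in C; this Python sketch and all of its names are hypothetical.

```python
import random

def rare_branches(branch_hits, threshold=5):
    """Step 1: branches exercised by few inputs so far count as 'rare'."""
    return {b for b, n in branch_hits.items() if n <= threshold}

def compute_mask(inp, hits_branch):
    """Step 2: find positions that can be mutated while still hitting the
    rare branch, by flipping each byte once and re-checking coverage."""
    ok = []
    for i in range(len(inp)):
        trial = inp[:i] + bytes([inp[i] ^ 0xFF]) + inp[i + 1:]
        if hits_branch(trial):
            ok.append(i)
    return ok

def mutate_masked(inp, mask):
    """Mutate only positions that keep the rare branch covered."""
    out = bytearray(inp)
    out[random.choice(mask)] = random.randrange(256)
    return bytes(out)

# Toy target: the 'rare branch' requires the input to start with b"GIF".
hits = lambda b: b.startswith(b"GIF")
seed = b"GIF89a"
mask = compute_mask(seed, hits)        # only the bytes after the magic header
print(mask, mutate_masked(seed, mask)) # [3, 4, 5] plus a mutated-but-valid input
```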