Key-based data augmentation with curriculum learning for few-shot code search.

Authors :
Zhang, Fan
Peng, Manman
Wu, Qiang
Shen, Yuanyuan
Source :
Neural Computing & Applications. Nov 2024, pp. 1-16.
Publication Year :
2024

Abstract

Given a natural language query, code search aims to find matching code snippets from a codebase. Recent works are mainly designed for mainstream programming languages with large amounts of training data. However, code search is also needed for domain-specific programming languages, which have fewer training data, and it is a heavy burden to label a large amount of training data for each domain-specific language. To this end, we propose DAFCS, a data augmentation framework with curriculum learning for few-shot code search tasks. Specifically, we first collect unlabeled codes in the same programming language as the original codes, which can provide additional semantic signals to the original codes. Second, we employ an occlusion-based method to identify key statements in code fragments. Third, we design a set of new key-based augmentation operations for the original codes. Finally, we use curriculum learning to reasonably schedule augmented samples for training well-performing models. We conduct retrieval experiments on a public dataset and find that DAFCS surpasses state-of-the-art methods by 5.42% and 5.05% in the Solidity and SQL domain-specific languages, respectively. Our study shows that DAFCS, which adopts data augmentation and curriculum learning strategies, can achieve promising performance in few-shot code search tasks. [ABSTRACT FROM AUTHOR]
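The occlusion and curriculum steps named in the abstract can be illustrated with a minimal sketch. This is not the DAFCS implementation: the abstract does not specify the scoring model, the augmentation operations, or the difficulty measure, so everything below is an assumption made for illustration. In particular, `score` is a toy token-overlap function standing in for a learned query-code matcher, and `statement_importance`, `key_statement`, and `curriculum_order` are hypothetical helper names.

```python
import re


def tokens(text):
    """Lowercased word tokens; a toy stand-in for a learned encoder."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def score(query, code):
    """Jaccard token overlap (assumption: replaces a neural matching model)."""
    q, c = tokens(query), tokens(code)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def statement_importance(query, statements):
    """Occlude each statement in turn and record how far the score drops."""
    base = score(query, "\n".join(statements))
    drops = []
    for i in range(len(statements)):
        occluded = statements[:i] + statements[i + 1:]
        drops.append(base - score(query, "\n".join(occluded)))
    return drops


def key_statement(query, statements):
    """Index of the statement whose removal hurts the match score most."""
    drops = statement_importance(query, statements)
    return max(range(len(statements)), key=drops.__getitem__)


def curriculum_order(samples, difficulty):
    """Schedule samples from easiest to hardest under a given difficulty function."""
    return sorted(samples, key=difficulty)


query = "count active users"
statements = [
    "SELECT COUNT(*) FROM users",
    "WHERE active = 1",
    "ORDER BY id",
]
drops = statement_importance(query, statements)
key_index = key_statement(query, statements)
```

In this toy setup, a statement counts as "key" when occluding it lowers the query-code score the most; a key-based augmentation would then preserve such statements while perturbing the rest. The easy-to-hard ordering in `curriculum_order` is one common curriculum choice and is assumed here, not taken from the paper.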

Details

Language :
English
ISSN :
0941-0643
Database :
Academic Search Index
Journal :
Neural Computing & Applications
Publication Type :
Academic Journal
Accession number :
180984064
Full Text :
https://doi.org/10.1007/s00521-024-10670-9