Summarising big data: public GitHub dataset for software engineering challenges

Authors :
Abdulkadir Şeker
Banu Diri
Halil Arslan
Fatih Amasyali
Source :
Cumhuriyet Science Journal, Vol 41, Iss 3, Pp 720-724 (2020)
Publication Year :
2020
Publisher :
Cumhuriyet University, 2020.

Abstract

In open-source software development environments, the textual, numerical, and relationship-based data that are generated are of interest to researchers. Various datasets are available for this data, which is frequently used in areas such as software engineering and natural language processing. However, because these datasets contain all the data in the environment, working with them means processing terabytes of data. For this reason, almost all studies using GitHub data work with data filtered according to certain criteria. As a result, each study uses a different dataset, which makes comparing the accuracy of the studies quite difficult. To solve this problem, a common dataset was created and shared with researchers, allowing work on many software engineering problems.

7 pages. The article was submitted to Cumhuriyet Science Journal.

Details

ISSN :
2587-2680 and 2587-246X
Volume :
41
Database :
OpenAIRE
Journal :
Cumhuriyet Science Journal
Accession number :
edsair.doi.dedup.....da876c27594ea327aa2ea475105836d7