1. A Fuzzy-based approach to programming language independent source-code plagiarism detection
- Author
- Georgina Cosma, Giovanni Acampora, Yazici A., Pal N.R., Ishibuchi H., Tutmez B., Lin C.-T., Sousa J.M.C., Kaymak U., Martin T.
- Subjects
Fuzzy clustering, Source code, Computer science, Prediction algorithm, Machine learning, Fuzzy logic, Semantics, Adaptive neuro-fuzzy inference system, Plagiarism, Clustering algorithm, Plagiarism detection, Computer software, Cluster analysis, Adaptive neuro fuzzy inference system, Measurement, Parsing, Programming language, Intellectual property, Computational linguistics, Identification of source, Codes (symbols), Computer programming language, Software algorithms, Algorithm, Fuzzy inference, Source code plagiarisms, C (programming language), Software algorithm, Fuzzy system, Language independent, Program compiler, Compiler, Artificial intelligence, Data mining, Java
- Abstract
Source-code plagiarism detection concerns the identification of source-code files that contain similar or identical source-code fragments. Fuzzy clustering approaches are well suited to detecting source-code plagiarism because they can capture the qualitative and semantic elements of similarity. This paper proposes a novel fuzzy-based approach to source-code plagiarism detection, based on Fuzzy C-Means and the Adaptive Neuro-Fuzzy Inference System (ANFIS). The performance of the proposed approach is compared with that of the Self-Organising Map (SOM) and the state-of-the-art plagiarism detection algorithm Running Karp-Rabin Greedy-String-Tiling (RKR-GST). The main advantage of the proposed approach is that it is programming language independent, so no parsers or compilers need to be developed for the fuzzy-based predictor to detect plagiarism in different programming languages. The results demonstrate that the proposed fuzzy-based approach outperforms all the other approaches on well-known source-code datasets, and show it to be a promising, efficient and reliable approach to source-code plagiarism detection. © 2015 IEEE.
- Published
- 2015
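
The abstract names Fuzzy C-Means as one building block but gives no details of the feature extraction or of the ANFIS stage. As a rough illustration only, the sketch below shows the standard Fuzzy C-Means update applied to language-independent token-frequency vectors. The whitespace tokenisation, the function names (`token_vector`, `memberships`, `update_centers`, `fuzzy_c_means`) and all parameter values are assumptions made for this example, not the authors' pipeline.

```python
import math
from collections import Counter


def token_vector(source, vocab):
    """Relative frequency of each vocabulary token in a whitespace-split file.

    Assumption: plain whitespace tokenisation, so no parser is needed.
    """
    counts = Counter(source.split())
    total = sum(counts.values()) or 1
    return [counts[t] / total for t in vocab]


def memberships(points, centers, m=2.0):
    """Standard FCM membership update; u[i][j] is point i's degree in cluster j."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if 0.0 in d:
            # The point coincides with a centre: assign crisp membership.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            u.append([h / sum(hits) for h in hits])
            continue
        u.append([1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d])
    return u


def update_centers(points, u, m=2.0):
    """Each centre is the mean of all points weighted by membership ** m."""
    dims = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(
            [sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(dims)]
        )
    return centers


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return memberships(points, centers, m), centers
```

In a sketch like this, two files whose membership vectors are nearly identical would be candidates for closer inspection. How the paper turns memberships into a plagiarism prediction is not stated in the abstract.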