1. File similarity evaluation scheme for multimedia data using partial hash information
- Author
- Young Woong Ko, Sung-Bong Jang, Su-Jin Oh, and Byung-Kwan Kim
- Subjects
Theoretical computer science, Indexed file, Computer Networks and Communications, Computer science, Digital forensics, Hash function, 020206 networking & telecommunications, 020207 software engineering, 02 engineering and technology, computer.file_format, Class implementation file, Torrent file, Self-certifying File System, Hardware and Architecture, Hash list, Journaling file system, Data file, Data_FILES, 0202 electrical engineering, electronic engineering, information engineering, Media Technology, Data deduplication, Disk storage, computer, Software, File system fragmentation
- Abstract
File similarity is a numerical indicator of how much duplicated data exists in the target files. With this information, we can reduce storage requirements through data deduplication; it can also be exploited in digital forensics to find malicious software. However, measuring similarity between files can incur high overhead in both processing time and disk storage capacity. For this reason, in this paper, we propose a novel file similarity evaluation algorithm called PHISA (Partial Hash Information String Algorithm). To evaluate the performance of the proposed system, we compare PHISA with well-known file similarity tools. The evaluation results show that PHISA reduces processing time and increases the accuracy of similarity evaluation.
- Published
- 2016
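
The abstract defines file similarity as a numerical measure of duplicated data between files. The sketch below illustrates that general idea only; it is not PHISA, whose details the abstract does not give. It assumes fixed-size chunks, SHA-1 digests, and a Jaccard ratio over the digest sets, and the names `chunk_hashes` and `similarity` are hypothetical.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int = 4096) -> set:
    """Split data into fixed-size chunks and return the set of chunk digests.

    Assumption: fixed-size chunking with SHA-1; this is an illustrative
    choice, not the scheme described in the paper.
    """
    return {
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def similarity(a: bytes, b: bytes, chunk_size: int = 4096) -> float:
    """Return the Jaccard similarity of the two chunk-digest sets (0.0 to 1.0)."""
    ha = chunk_hashes(a, chunk_size)
    hb = chunk_hashes(b, chunk_size)
    if not ha and not hb:
        # Two empty inputs are treated as identical.
        return 1.0
    return len(ha & hb) / len(ha | hb)


if __name__ == "__main__":
    # Two 16 KiB buffers that share three of their four 4 KiB chunks.
    shared = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    file_a = shared + b"D" * 4096
    file_b = shared + b"E" * 4096
    print(similarity(file_a, file_b))
```

Hashing every chunk of every file is the kind of processing and storage overhead the abstract mentions; a partial-hash approach would presumably keep only part of this information, but how PHISA does so is not stated here.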