URL-based Phishing Detection using the Entropy of Non-Alphanumeric Characters
- Source :
- iiWAS
- Publication Year :
- 2019
- Publisher :
- ACM, 2019.
Abstract
- Phishing is a type of personal information theft in which phishers lure users into revealing sensitive information. Phishing detection mechanisms based on various techniques have been developed. Our hypothesis is that phishers create fake websites with as little information as possible in a webpage, which makes detection difficult for content-based and visual similarity-based methods that analyze the webpage content. To overcome this, we focus on the use of Uniform Resource Locators (URLs) to detect phishing. Since previous work extracts specific special-character features, we assume that non-alphanumeric (NAN) character distributions strongly affect the performance of URL-based detection. We hence propose a new feature, the entropy of NAN characters, for URL-based phishing detection. Experimental evaluation shows 96% ROC AUC on a balanced dataset and 89% ROC AUC on an imbalanced dataset, an increase of 5 to 6% in ROC AUC over detection without the proposed feature.
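The abstract names the feature but does not give its formula. As a rough illustration only, the sketch below assumes the feature is the Shannon entropy (in bits) of the distribution of non-alphanumeric characters in a URL, and that "non-alphanumeric" means any character outside ASCII letters and digits. The function name `nan_entropy` is hypothetical, and the paper's actual definition may differ.

```python
import math
from collections import Counter


def nan_entropy(url: str) -> float:
    """Shannon entropy (bits) of the non-alphanumeric characters in a URL.

    Assumption: a character is "non-alphanumeric" (NAN) if it is not an
    ASCII letter or digit. This is an illustrative definition, not
    necessarily the one used in the paper.
    """
    # Keep only the NAN characters, e.g. ':', '/', '.', '-', '?', '='.
    nan_chars = [c for c in url if not (c.isascii() and c.isalnum())]
    if not nan_chars:
        return 0.0  # No NAN characters: define the entropy as zero.

    counts = Counter(nan_chars)
    total = len(nan_chars)
    # H = -sum(p * log2(p)) over the observed NAN characters.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical URLs, used only to show how the feature is computed.
    for url in ("https://example.com/login",
                "http://example.com/a-b_c/?id=1&x=2%20"):
        print(f"{nan_entropy(url):.3f}  {url}")
```

Under these assumptions, a URL whose NAN characters are all the same symbol scores 0, while a URL that mixes many different symbols evenly scores higher. In a detector, a value like this would be one numeric feature among others passed to a classifier.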
- Subjects :
- Alphanumeric
Computer science
business.industry
020206 networking & telecommunications
020302 automobile design & engineering
Pattern recognition
02 engineering and technology
Phishing detection
Phishing
Information sensitivity
ComputingMethodologies_PATTERNRECOGNITION
0203 mechanical engineering
Web page
0202 electrical engineering, electronic engineering, information engineering
Entropy (information theory)
Artificial intelligence
business
Personally identifiable information
Details
- Database :
- OpenAIRE
- Journal :
- Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
- Accession number :
- edsair.doi...........bed634b8538fa51f28254784c56043ab
- Full Text :
- https://doi.org/10.1145/3366030.3366064