A Spam Webpage Link Detection Method Based on Webpage Structure and Language Features.

Authors :
杨 望
江咏涵
张三峰
Source :
Journal of Northeastern University (Natural Science). Aug 2020, Vol. 41 Issue 8, p1091-1096. 6p.
Publication Year :
2020

Abstract

Existing spam website detection methods mainly target self-built spam websites and are poorly suited to injected spam websites because their link detection is inefficient. This paper proposes a detection framework based on multi-dimensional features of webpage structure and text. The framework first divides a webpage into blocks. Content features are then extracted by calculating odds ratios, and structural features based on tags, attribute keys, and attribute values are extracted using one-hot encoding. A detection model is trained with a suitable machine learning algorithm and used to detect spam links. Compared with algorithms based on content detection and on blacklist matching, the framework improves detection accuracy by up to 13%. [ABSTRACT FROM AUTHOR]
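
The two feature families named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the smoothing constant, the toy token lists, and the example HTML block are all assumptions made for illustration. The structural vector here is a binary presence vector over a vocabulary of tags, attribute keys, and attribute values, which is one common reading of one-hot features. Block segmentation and model training are left out.

```python
import math
from collections import Counter
from html.parser import HTMLParser


def odds_ratios(spam_docs, ham_docs, smoothing=0.5):
    """Log odds ratio of each term occurring in spam vs. non-spam blocks.

    Each document is a list of tokens. `smoothing` is an assumed
    Haldane-Anscombe style correction that avoids division by zero.
    """
    spam_df = Counter(t for doc in spam_docs for t in set(doc))
    ham_df = Counter(t for doc in ham_docs for t in set(doc))
    n_spam, n_ham = len(spam_docs), len(ham_docs)
    scores = {}
    for term in set(spam_df) | set(ham_df):
        a = spam_df[term] + smoothing           # spam blocks containing term
        b = n_spam - spam_df[term] + smoothing  # spam blocks without term
        c = ham_df[term] + smoothing            # non-spam blocks containing term
        d = n_ham - ham_df[term] + smoothing    # non-spam blocks without term
        scores[term] = math.log((a * d) / (b * c))
    return scores


class StructureCollector(HTMLParser):
    """Collects the tags, attribute keys, and attribute values of a block."""

    def __init__(self):
        super().__init__()
        self.items = set()

    def handle_starttag(self, tag, attrs):
        self.items.add(("tag", tag))
        for key, value in attrs:
            self.items.add(("key", key))
            if value is not None:
                self.items.add(("value", value))


def structure_items(block_html):
    """Return the set of (kind, name) structural items found in a block."""
    collector = StructureCollector()
    collector.feed(block_html)
    collector.close()
    return collector.items


def one_hot(items, vocabulary):
    """Binary presence vector of `items` over an ordered vocabulary."""
    return [1 if entry in items else 0 for entry in vocabulary]


# Toy data (hypothetical): tokenized text of labeled page blocks.
spam = [["cheap", "pills", "casino"], ["casino", "bonus"]]
ham = [["campus", "news"], ["library", "news", "bonus"]]
scores = odds_ratios(spam, ham)

block = '<div style="display:none"><a href="http://example.com">casino</a></div>'
items = structure_items(block)
vocab = sorted(items | {("tag", "p")})  # vocabulary would come from training data
vector = one_hot(items, vocab)
```

Terms with a positive score occur more often in the spam blocks and terms with a negative score occur more often in the non-spam blocks. The content scores and the structural vector could then be concatenated and passed to any standard classifier.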

Details

Language :
Chinese
ISSN :
1005-3026
Volume :
41
Issue :
8
Database :
Academic Search Index
Journal :
Journal of Northeastern University (Natural Science)
Publication Type :
Academic Journal
Accession number :
145522349
Full Text :
https://doi.org/10.12068/j.issn.1005-3026.2020.08.005