Spam webpage link detection method based on webpage structure and language features.
- Source :
- Journal of Northeastern University (Natural Science). Aug2020, Vol. 41 Issue 8, p1091-1096. 6p.
- Publication Year :
- 2020
Abstract
- Existing spam website detection methods mainly target self-built spam websites and are poorly suited to injected spam websites because their link detection is inefficient. This paper proposes a detection framework based on multi-dimensional features of webpage structure and text. The framework first divides the webpage into blocks. Content features are then extracted by calculating the odds ratio, and structural features based on tags, attribute keys, and attribute values are extracted using the one-hot rate. A detection model is trained with a suitable machine learning algorithm and used to detect spam links. Compared with algorithms based on content detection and on blacklist matching, the framework improves detection accuracy by up to 13%. [ABSTRACT FROM AUTHOR]
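The two feature types named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the additive smoothing constant, the toy token sets, and the use of a binary presence vector as a stand-in for the one-hot step are all assumptions made for illustration.

```python
import math


def log_odds_ratio(term, spam_blocks, benign_blocks, smoothing=0.5):
    """Log odds ratio of `term` occurring in spam blocks versus benign blocks.

    Each block is a set of tokens. `smoothing` (an assumed value) avoids
    division by zero when a term is missing from one class entirely.
    """
    spam_with = sum(term in block for block in spam_blocks)
    benign_with = sum(term in block for block in benign_blocks)
    a = spam_with + smoothing                        # spam blocks containing the term
    b = len(spam_blocks) - spam_with + smoothing     # spam blocks without it
    c = benign_with + smoothing                      # benign blocks containing the term
    d = len(benign_blocks) - benign_with + smoothing # benign blocks without it
    return math.log((a / b) / (c / d))


def presence_vector(values, vocabulary):
    """Binary encoding of which vocabulary entries (e.g. HTML tags) appear."""
    present = set(values)
    return [1 if entry in present else 0 for entry in vocabulary]


# Hypothetical toy data: token sets taken from page blocks.
spam_blocks = [{"casino", "bonus"}, {"casino", "win"}]
benign_blocks = [{"course", "campus"}, {"campus", "news"}]

casino_score = log_odds_ratio("casino", spam_blocks, benign_blocks)
campus_score = log_odds_ratio("campus", spam_blocks, benign_blocks)
tag_features = presence_vector(["a", "div"], ["a", "div", "iframe"])
```

A positive score marks a term as more typical of spam blocks and a negative score as more typical of benign blocks; the resulting content and structural features could then be concatenated and passed to a classifier of choice.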
- Subjects :
- *MACHINE learning
- *HYPERLINKS
- *WEBSITES
- *SPAM email
- *ALGORITHMS
Details
- Language :
- Chinese
- ISSN :
- 1005-3026
- Volume :
- 41
- Issue :
- 8
- Database :
- Academic Search Index
- Journal :
- Journal of Northeastern University (Natural Science)
- Publication Type :
- Academic Journal
- Accession number :
- 145522349
- Full Text :
- https://doi.org/10.12068/j.issn.1005-3026.2020.08.005