1. Detecção de Phishing no Twitter Baseada em Algoritmos de Aprendizagem Online [Phishing Detection on Twitter Based on Online Learning Algorithms]
- Author
-
Barbosa, Haline Pereira de Oliveira, esouto@icomp.ufam.edu.br, Souto, Eduardo James Pereira, Cristo, Marco Antônio Pinheiro de, and Martins, Gilbert Breves
- Subjects
Detecção de phishing, machine learning, Phishing detection, Twitter, online learning, Classificador online, Aprendizagem de máquina, CIÊNCIA DA COMPUTAÇÃO: SISTEMAS DE COMPUTAÇÃO [CIÊNCIAS EXATAS E DA TERRA]
- Abstract
Provenance: Submitted by Haline Barbosa (halinebarbosa@icomp.ufam.edu.br) on 2018-11-23; approved for entry into the archive and made available in DSpace on the same day. Previous issue date: 2018-04-03.
Twitter is one of the most widely used social networks in the world, with about 328 million users sharing images, videos, text, and links. Because of the limit on message size, tweets commonly share shortened links to websites, which makes it impossible to inspect the destination URL visually before following the link. As a result, Twitter has become one of the main channels for spreading phishing attacks through malicious links. Phishing is an attack that seeks to obtain personal information such as names, CPF numbers (the Brazilian individual taxpayer ID), passwords, bank account numbers, and credit card numbers.
Phishing detection systems for Twitter are usually built with offline supervised machine learning, in which a large volume of data is examined once to induce a single static prediction model. In these systems, incorporating new data requires rebuilding the prediction model by reprocessing the entire dataset, which is slow and inefficient.
To address this problem, this work proposes a framework for detecting phishing on Twitter. The framework uses supervised online learning: the classifier is updated with each processed tweet and, when it makes a wrong prediction, the model adapts quickly to changes at low computational and time cost while maintaining its effectiveness in the classification task. The study evaluates the performance of the online learning algorithms Adaptive Random Forest, Hoeffding Tree, Naive Bayes, Perceptron, and Stochastic Gradient Descent. The online Adaptive Random Forest classifier achieved 99.8% prequential accuracy in classifying phishing tweets.
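The test-then-train (prequential) evaluation named in the abstract can be illustrated with a small sketch. This is not the framework from the thesis: the `OnlinePerceptron` class, the `prequential_accuracy` helper, and the two-feature synthetic stream are all hypothetical stand-ins chosen for illustration, and the real system's features and classifiers are not described in this record. The sketch only shows the general idea that each example is first used to test the model and then to update it, so no separate retraining pass over the whole dataset is needed.

```python
import random


class OnlinePerceptron:
    """Binary perceptron that learns from one example at a time."""

    def __init__(self, n_features, learning_rate=1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        score = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if score > 0 else 0

    def learn_one(self, x, y):
        # Mistake-driven update: the model changes only on a wrong prediction.
        error = y - self.predict(x)
        if error != 0:
            self.weights = [
                w + self.learning_rate * error * xi
                for w, xi in zip(self.weights, x)
            ]
            self.bias += self.learning_rate * error


def prequential_accuracy(model, stream):
    """Test-then-train: predict each example first, then learn from it."""
    correct = 0
    total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn_one(x, y)
        total += 1
    return correct / total if total else 0.0


def synthetic_stream(n, seed=0):
    """Yield (features, label) pairs; label 1 stands in for 'phishing'.

    The two numeric features are placeholders (an assumption), not the
    features used in the thesis.
    """
    rng = random.Random(seed)
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        x = [center + rng.uniform(-0.5, 0.5),
             center + rng.uniform(-0.5, 0.5)]
        yield x, y


if __name__ == "__main__":
    model = OnlinePerceptron(n_features=2)
    acc = prequential_accuracy(model, synthetic_stream(1000))
    print(f"prequential accuracy: {acc:.3f}")
```

Because the synthetic classes are cleanly separable, the score here says nothing about real phishing data; the point is only the order of operations inside the loop.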
- Published
- 2018