Understanding flows in high-speed scientific networks: A Netflow data study

Authors :
Mariam Kiran
Anshuman Chhabra
Source :
Future Generation Computer Systems. 94:72-79
Publication Year :
2019
Publisher :
Elsevier BV, 2019.

Abstract

Complex science workflows involve very large data demands and resource-intensive computations. These demands need reliable high-speed networks that can optimize performance for application data flows. Classifying flows as large (elephant) or small (mice) can allow networks to optimize performance by detecting and handling demands in real time. However, predicting elephant versus mice flows is extremely difficult because their definition varies from network to network. Machine learning techniques can help classify flows into two distinct clusters to identify characteristics of transfers. In this paper, we investigate unsupervised and semi-supervised machine learning approaches to classify flows in real time. We combine a Gaussian Mixture Model with an initialization algorithm to obtain a novel general-purpose method that helps classification based on network sites (in terms of data transfers, flow rates and durations). Our results show that the proposed algorithm is able to cluster elephants and mice with an accuracy rate of 90%. We analyzed one month of NetFlow reports from three ESnet site routers to train the model and predict clusters.
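
To illustrate the general idea of separating elephant and mice flows with a two-component Gaussian mixture, here is a minimal Python sketch. It is not the authors' method: the range-midpoint initialization, the single feature (log10 of bytes transferred), the synthetic data, and all function names are assumptions made for illustration only. The paper's own initialization algorithm and feature set are not described in this record.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(values, iterations=50):
    """Fit a 1-D, two-component Gaussian mixture with plain EM.

    Initialization (an assumption, not the paper's algorithm): split the
    data at the midpoint between its minimum and maximum.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    mid = (lo + hi) / 2.0
    groups = [[x for x in values if x <= mid], [x for x in values if x > mid]]
    means = [sum(g) / len(g) for g in groups]
    variances = [
        max(sum((x - m) ** 2 for x in g) / len(g), 1e-6)
        for g, m in zip(groups, means)
    ]
    weights = [len(g) / len(values) for g in groups]

    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in values:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has no support; keep its old parameters
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk,
                1e-6,
            )
            weights[k] = nk / len(values)
    return weights, means, variances


def classify(log_bytes, model):
    """Label a flow 'elephant' if the higher-mean component explains it better."""
    weights, means, variances = model
    dens = [w * gaussian_pdf(log_bytes, m, v)
            for w, m, v in zip(weights, means, variances)]
    elephant = 0 if means[0] > means[1] else 1
    return "elephant" if dens[elephant] > dens[1 - elephant] else "mouse"


# Synthetic flow sizes (log10 of bytes); these numbers are made up.
random.seed(0)
flows = [random.gauss(3.5, 0.5) for _ in range(300)]   # many small flows
flows += [random.gauss(9.0, 0.5) for _ in range(30)]   # few large flows

model = fit_two_component_gmm(flows)
print(classify(math.log10(5e9), model))   # a multi-gigabyte transfer
print(classify(math.log10(2e3), model))   # a few kilobytes
```

Working on the logarithm of the flow size keeps the two groups roughly bell-shaped, which is what a Gaussian mixture assumes. A real study would likely use several features (for example size, rate and duration) and a library implementation rather than this hand-written EM loop.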

Details

ISSN :
0167739X
Volume :
94
Database :
OpenAIRE
Journal :
Future Generation Computer Systems
Accession number :
edsair.doi...........bc2aad4ddcc900481124b017b5c2713c
Full Text :
https://doi.org/10.1016/j.future.2018.11.006