Statistical analysis of network data flows and predictions using statistical and machine learning regression models
Abstract
This paper presents a statistical analysis of network data-flow measurements, together with predictions made using statistical and machine learning regression models. The study applies statistical methods and machine learning regression models to analyze, and make predictions on, a spatio-temporal traffic volume dataset obtained by Dr. Liang Zhao (Emory University) from sensors along two major highways in Northern Virginia and Washington, D.C. The work addresses several fundamental questions about the network: 1. What statistical inferences and descriptive analyses can be made about the network’s data flow? 2. How can the Routing Matrix of the network be obtained from its Adjacency Matrix? 3. How can techniques such as Regularization and Singular Value Decomposition (SVD) be used to overcome the singular, ill-posed nature of Traffic Matrix Estimation? 4. How can machine learning regression models, such as the Support Vector Regressor (SVR) and the XGBoost Regressor, be applied to predict the network’s flow volume? The concepts in this work can be applied to other real-world networks to analyze and predict their data flow.
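To make questions 2 and 3 concrete, the following is a minimal sketch under stated assumptions, not the paper’s own method. It assumes minimum-hop (shortest-path) routing on a hypothetical 4-node toy network rather than the Northern Virginia / Washington, D.C. sensor network. It builds a routing matrix R (links by origin-destination pairs) from an adjacency matrix and then estimates origin-destination flows x from observed link loads y = R x. Because there are more OD pairs than links, the system is underdetermined, so the sketch contrasts Tikhonov (ridge) regularization with a truncated-SVD pseudoinverse. The helpers `shortest_path`, `routing_matrix`, `tikhonov_estimate`, and `tsvd_estimate` are hypothetical names used only for illustration.

```python
from collections import deque

import numpy as np


def shortest_path(adj, src, dst):
    """Return a minimum-hop node path from src to dst via BFS, or None if unreachable.
    Assumption: ties are broken by node order; the paper's routing rule may differ."""
    n = adj.shape[0]
    prev = [-1] * n
    seen = [False] * n
    seen[src] = True
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in map(int, np.flatnonzero(adj[u])):
            if not seen[v]:
                seen[v] = True
                prev[v] = u
                queue.append(v)
    if not seen[dst]:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def routing_matrix(adj):
    """Build R (links x OD pairs) where R[l, k] = 1 if OD pair k's path uses directed link l."""
    n = adj.shape[0]
    links = [(int(u), int(v)) for u, v in zip(*np.nonzero(adj))]
    link_index = {e: i for i, e in enumerate(links)}
    od_pairs = [(s, d) for s in range(n) for d in range(n) if s != d]
    R = np.zeros((len(links), len(od_pairs)))
    for k, (s, d) in enumerate(od_pairs):
        path = shortest_path(adj, s, d)
        if path is None:
            continue  # unreachable OD pair contributes no link load
        for u, v in zip(path[:-1], path[1:]):
            R[link_index[(u, v)], k] = 1.0
    return R, links, od_pairs


def tikhonov_estimate(R, y, lam):
    """Tikhonov (ridge) solution of argmin ||R x - y||^2 + lam * ||x||^2."""
    k = R.shape[1]
    return np.linalg.solve(R.T @ R + lam * np.eye(k), R.T @ y)


def tsvd_estimate(R, y, rank):
    """Truncated-SVD pseudoinverse solution keeping the `rank` largest nonzero singular values."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    r = min(int(rank), int(np.sum(s > 1e-12)))
    return Vt[:r].T @ ((U[:, :r].T @ y) / s[:r])


if __name__ == "__main__":
    # Hypothetical bidirectional path graph 0-1-2-3 (not the paper's highway network).
    adj = np.zeros((4, 4), dtype=int)
    for u, v in [(0, 1), (1, 2), (2, 3)]:
        adj[u, v] = adj[v, u] = 1
    R, links, od_pairs = routing_matrix(adj)
    rng = np.random.default_rng(0)
    x_true = rng.uniform(10, 100, size=len(od_pairs))  # synthetic OD flows
    y = R @ x_true                                       # observed link loads
    rank = np.linalg.matrix_rank(R)
    print("R shape:", R.shape, "rank:", rank)            # 6 links vs 12 OD pairs: ill-posed
    x_ridge = tikhonov_estimate(R, y, lam=1e-2)
    x_tsvd = tsvd_estimate(R, y, rank=rank)
    print("link residual (ridge):", np.linalg.norm(R @ x_ridge - y))
    print("link residual (TSVD):", np.linalg.norm(R @ x_tsvd - y))
```

Both estimates reproduce the observed link loads (exactly for truncated SVD at full rank, approximately for ridge), yet neither generally recovers `x_true`, since many OD flow vectors map to the same link loads. This is the ill-posedness that regularization or SVD truncation only partially resolves, typically by favoring a minimum-norm solution.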