Federated ensemble-learning for transport mode detection in vehicular edge network
Abstract
Transport mode detection has become a crucial part of Intelligent Transportation
Systems (ITS) and Traffic Management Systems due to recent advancements in
Artificial Intelligence (AI) and the Internet of Things (IoT). Accurately predicting a
person's mode of transportation remained challenging for many years, until the computational power of smartphones and smartwatches expanded dramatically.
This progress stems from the numerous sensors built into smart devices, which allow a
global cloud server to acquire sensory data and predict a person's mode
of transport using multiple machine learning models. Currently, smart devices
and vehicular edge devices are interconnected through Vehicular Edge Networks (VENs).
However, because the data are shared globally, the security of an individual's data is
called into question, and a significant portion of the population remains unwilling to share
their sensory data with a global cloud server. The processing time for such a
massive amount of sensory data must also be considered. In this paper, we present
Federated Ensemble-Learning (FedEL), a distributed hybrid approach for VENs
in which a vast amount of data is used to train the model while the training data
is kept decentralized, enhancing the performance of federated strategies.
In addition, a majority-voting ensemble
method has been developed as part of the federated strategy to determine the mode
of transportation of local clients. Two machine learning algorithms, XGBoost
and Random Forest, and one deep learning model, a Multi-Layer Perceptron (MLP),
are trained on each local client's data. A prediction is then made by
majority vote among the three models: the class receiving the most votes is selected,
and the others are discarded. Extensive testing shows that FedEL
is highly effective on the TMD dataset, achieving 94-95% accuracy on the 5-
second window dataset and 98-99% on the half-second window dataset.
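The per-client majority-voting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: XGBoost is stood in for by scikit-learn's GradientBoostingClassifier to avoid an external dependency, and the synthetic dataset, model hyperparameters, and train/test split are all illustrative placeholders rather than the TMD setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for one local client's sensory data (4 transport modes).
X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Three local models, mirroring the RF + boosted-trees + MLP ensemble.
models = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    GradientBoostingClassifier(random_state=0),  # proxy for XGBoost
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
]
for m in models:
    m.fit(X_tr, y_tr)

# Stack per-model predictions, shape (3, n_samples), then take the
# majority class in each column; the minority votes are discarded.
preds = np.stack([m.predict(X_te) for m in models])
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, preds)
accuracy = float((majority == y_te).mean())
```

In a federated deployment, only the resulting model updates (not the raw sensory data) would leave the client, which is the privacy property the abstract emphasizes.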