An autoencoder-based decentralized clustering leveraging model aggregation fusion strategy
Abstract
Unsupervised clustering plays a crucial role in various real-life applications. It works
by grouping similar data points together based on certain features or characteristics,
without the use of predefined labels. The process generally starts with gathering the data to be clustered into a centralized system. This data could be in the
form of numerical features, text, images, or any other type of information. The
exponential expansion of digital transformation, the Internet of Things (IoT), social
media, and online platforms has precipitated an unprecedented surge in data
generation. This proliferation is characterized by an incessant stream of information
flowing from various sources, encompassing user interactions, sensor readings,
online transactions, and more. This deluge of data poses both challenges and opportunities for businesses, governments, and individuals alike. Gathering, managing, and processing such volumes of data in a centralized system is time-consuming and difficult. Additionally, concerns related to data privacy, security, and ethical
considerations become more prominent as data volumes continue to grow. Moreover,
it is important to respect individuals’ privacy rights and to adhere to relevant data
protection laws and regulations. Federated learning addresses concerns about data
volume and privacy by leaving user data on devices. Federated unsupervised representation
learning is an architecture that pre-trains deep neural networks on unlabeled data in a federated fashion. In
centralized settings, model-based clustering approaches demonstrate significant effectiveness.
These methods rely on statistical models to identify underlying patterns
and group data points accordingly. By leveraging sophisticated algorithms, model-based clustering can efficiently handle complex data structures and accurately partition
datasets into meaningful clusters. This approach enables centralized systems
to efficiently organize and analyze large volumes of data, facilitating insights and
decision-making processes across various domains. Moreover, model-based clustering
offers flexibility in accommodating different data distributions and can adapt
to diverse clustering requirements, making it a versatile tool for centralized data
analysis tasks. In contrast to the centralized setup, model-based clustering in federated settings remains relatively unexplored, possibly because training models on highly heterogeneous client data with the FedAvg method is more difficult.
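For context, FedAvg simply replaces the global model with a data-size-weighted average of the locally trained client models, which is where client heterogeneity causes trouble. A minimal sketch of that aggregation step (illustrative names only, assuming parameters are held as NumPy arrays):

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg: average client parameter lists, weighted by local dataset size.

    client_weights: list of parameter lists (one list of np.ndarray per client)
    client_sizes:   number of local samples on each client
    """
    total = float(sum(client_sizes))
    num_layers = len(client_weights[0])
    aggregated = []
    for layer in range(num_layers):
        # Weighted sum of this layer's parameters across all clients
        layer_sum = sum(
            (n / total) * w[layer] for w, n in zip(client_weights, client_sizes)
        )
        aggregated.append(layer_sum)
    return aggregated
```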
The recently proposed Unsupervised Iterative Federated Clustering Algorithm (UIFCA) uses a normalizing flow model to perform clustering on unlabeled datasets in federated environments. UIFCA builds on the IFCA framework, which tackles the problem of highly heterogeneous settings. In this work, a novel approach for decentralized clustering is introduced, combining a proposed model parameter aggregation strategy, FednadamN, with an autoencoder as the deep generative model. FednadamN combines the benefits of two state-of-the-art optimization methods for federated learning, Adam and Nadam. Adam offers quick convergence and resilience to noisy data by using adaptive learning rates based on the first and second moments of the gradients. Nadam extends Adam with Nesterov accelerated gradients, further improving the stability and speed of convergence.
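To make the optimizer combination concrete, the standard Adam and Nadam update rules referenced here are reproduced below in their well-known forms; the precise way FednadamN fuses or applies them in the federated aggregation step is not spelled out in this abstract.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, &
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, &
\hat m_t &= \frac{m_t}{1-\beta_1^{\,t}}, \quad
\hat v_t = \frac{v_t}{1-\beta_2^{\,t}} \\[4pt]
\text{Adam:}\quad
\theta_{t+1} &= \theta_t - \eta\, \frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon} \\[4pt]
\text{Nadam:}\quad
\theta_{t+1} &= \theta_t - \eta\, \frac{\beta_1 \hat m_t + \dfrac{(1-\beta_1)\, g_t}{1-\beta_1^{\,t}}}{\sqrt{\hat v_t}+\epsilon}
\end{aligned}
```

The Nadam update applies a Nesterov-style look-ahead to the first-moment estimate, which is the source of the stability and convergence-speed gains mentioned above.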
The proposed method addresses the challenge of clustering in decentralized settings by leveraging the collective intelligence of distributed nodes while preserving data privacy and minimizing communication overhead. By aggregating model parameters across the decentralized nodes and employing autoencoder-based representations, efficient clustering is enabled without the need for central data storage or coordination.
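A minimal sketch of this pipeline under stated assumptions: each node trains a small autoencoder on its private data, the server aggregates the parameters (shown here with a simple FedAvg-style weighted average as a placeholder for the FednadamN rule), and clustering is then run on the encoder's latent representations. All class, function, and parameter names are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn
from sklearn.cluster import MiniBatchKMeans

class TinyAutoencoder(nn.Module):
    """Illustrative autoencoder: the encoder output is the clustering representation."""
    def __init__(self, in_dim, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def local_train(model, data, epochs=1, lr=1e-3):
    """One client's local update: minimize reconstruction error on its own private data."""
    opt = torch.optim.NAdam(model.parameters(), lr=lr)  # Nadam-style local optimizer
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        recon, _ = model(data)
        loss_fn(recon, data).backward()
        opt.step()
    return [p.detach().clone() for p in model.parameters()]

def aggregate(param_lists, sizes):
    """Server step: size-weighted parameter average (placeholder for the FednadamN rule)."""
    total = float(sum(sizes))
    return [sum((n / total) * params[i] for params, n in zip(param_lists, sizes))
            for i in range(len(param_lists[0]))]

# One simulated federated run over a few clients, then clustering on latent codes.
clients = [torch.randn(100, 16) for _ in range(3)]        # each client's private data
global_model = TinyAutoencoder(in_dim=16)
for _ in range(5):                                         # federated rounds
    updates, sizes = [], []
    for data in clients:
        local = TinyAutoencoder(in_dim=16)
        local.load_state_dict(global_model.state_dict())
        updates.append(local_train(local, data))
        sizes.append(len(data))
    with torch.no_grad():
        for p, q in zip(global_model.parameters(), aggregate(updates, sizes)):
            p.copy_(q)

# Each node encodes its own data and clusters the representations locally.
with torch.no_grad():
    z = global_model.encoder(clients[0])
labels = MiniBatchKMeans(n_clusters=4, n_init=10).fit_predict(z.numpy())
```

In this sketch only model parameters ever leave a node; the raw data and the resulting cluster assignments stay local, which is the privacy property the approach relies on.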
This approach promises to enhance scalability, privacy, and performance in decentralized clustering tasks across various domains. Additionally, a comparison between the proposed approach and existing techniques on benchmark datasets is provided. Four benchmark datasets were used: image segmentation, protein localization, letter image recognition, and Deterding vowel recognition. The proposed technique for clustering
letter image recognition data produces the highest mutual information score of 1.192 and the highest V-measure score of 0.373 using the k-means algorithm; however, FedAvg with the fuzzy k-means algorithm yields the highest Rand index score of 0.925. For the Deterding vowel recognition data, the proposed approach achieves the highest V-measure score of 0.264 and the highest Rand index score of 0.850 using the k-means algorithm; however, it performs less well than FedAdam, which shows a V-measure score of 0.258 with the mini-batch k-means algorithm. For the protein localization data, the proposed approach yields the highest Rand score of 0.774, the highest mutual information score of 0.908, and the highest V-measure score of 0.527 using the mini-batch k-means algorithm. For the image segmentation data, the proposed method yields the highest mutual information score of 1.084, the highest Rand score of 0.849, and the highest V-measure score of 0.565 using the mini-batch k-means algorithm.
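The reported scores correspond to standard external clustering-evaluation metrics. A brief sketch of how such scores can be computed with scikit-learn, assuming ground-truth labels are used only for evaluation (whether the paper uses the raw or adjusted/normalized variants is not stated in this abstract):

```python
from sklearn.metrics import rand_score, mutual_info_score, v_measure_score

# true_labels: held-out ground truth used only for evaluation
# pred_labels: cluster assignments from k-means / mini-batch k-means / fuzzy k-means
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 0, 1, 2, 2, 2]

print("Rand index:        ", rand_score(true_labels, pred_labels))
print("Mutual information:", mutual_info_score(true_labels, pred_labels))
print("V-measure:         ", v_measure_score(true_labels, pred_labels))
```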
These results demonstrate the improved performance of the proposed approach and its potential applicability to diverse clustering tasks. Its robustness and adaptability suggest broader relevance across multiple domains.