Federated Geometric Monte Carlo Clustering to Counter Non-IID Datasets

Federico Lucchetti, Maria Fernandes, Lydia Y. Chen, J.E.A.P. Decouchant, Marcus Völp

Research output: Working paper/Preprint › Preprint › Scientific


Federated learning allows clients to collaboratively
train models on datasets that are acquired in different locations
and that cannot be exchanged because of their size or regulations.
Such collected data is increasingly non-independent and non-
identically distributed (non-IID), negatively affecting training
accuracy. Previous works tried to mitigate the effects of non-
IID datasets on training accuracy, focusing mainly on non-IID
labels; however, practical datasets often also contain non-IID
features. To address both non-IID labels and features, we propose
FedGMCC, a novel framework where a central server aggregates
client models that it can cluster together. FedGMCC clustering relies
on a Monte Carlo procedure that samples the output space of
client models, infers their position in the weight space on a loss
manifold and computes their geometric connection via an affine
curve parametrization. FedGMCC aggregates connected models
along their path connectivity to produce a richer global model,
incorporating knowledge of all connected client models. FedGMCC
outperforms FedAvg and FedProx in terms of convergence rates
on the EMNIST62 dataset and a genomic sequence classification dataset
(by up to +63%). FedGMCC yields an improved accuracy (+4%)
on the genomic dataset with respect to CFL, in settings with highly
non-IID feature spaces and label incongruency.
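
As a rough illustration of the pipeline the abstract describes (sample model outputs, group similar client models, then aggregate within each group), here is a minimal Python sketch. It is not the FedGMCC algorithm: the toy linear models, the uniform input sampling, the squared output gap, the greedy threshold grouping, and the plain mean used for aggregation are all assumptions made for illustration, and every function name is hypothetical.

```python
import random


def affine_path(theta_a, theta_b, t):
    """Point at parameter t on the straight (affine) segment between two weight vectors."""
    return [(1.0 - t) * a + t * b for a, b in zip(theta_a, theta_b)]


def predict(theta, x):
    """Toy linear model (assumption): dot product of weights and input."""
    return sum(w * xi for w, xi in zip(theta, x))


def mc_disagreement(theta_a, theta_b, n_samples=1000, seed=0):
    """Monte Carlo estimate of the mean squared output gap between two models.

    Inputs are drawn uniformly from [-1, 1]^d, which is an assumed sampling scheme.
    """
    rng = random.Random(seed)
    dim = len(theta_a)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        total += (predict(theta_a, x) - predict(theta_b, x)) ** 2
    return total / n_samples


def greedy_cluster(models, threshold):
    """Put each model into the first cluster holding a member it agrees with.

    Two models "agree" when their sampled disagreement is below the threshold.
    Returns a list of clusters, each a list of model indices.
    """
    clusters = []
    for i, theta in enumerate(models):
        for members in clusters:
            if any(mc_disagreement(theta, models[j]) < threshold for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def aggregate(models, members):
    """Coordinate-wise mean of the weight vectors in one cluster (assumed aggregation rule)."""
    dim = len(models[members[0]])
    return [sum(models[j][k] for j in members) / len(members) for k in range(dim)]


# Hypothetical client models: the first two are close, the third is far away.
client_models = [[1.0, 0.0], [1.1, 0.0], [-1.0, 2.0]]
clusters = greedy_cluster(client_models, threshold=0.1)
cluster_models = [aggregate(client_models, members) for members in clusters]
midpoint = affine_path(client_models[0], client_models[1], 0.5)
```

In this toy setup, `clusters` groups client models by output similarity and `cluster_models` holds one aggregated weight vector per group. The paper's actual clustering criterion and its aggregation along connecting paths are not specified in the abstract and may differ substantially from this sketch.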
Original language: English
Publication status: Published - 23 Apr 2022

