Abstract
In large-scale non-IID distributed settings, where statistical heterogeneity and task diversity among local data distributions are prevalent and each device seeks to maximize performance on the tasks it holds data for, vanilla federated algorithms frequently fail to converge and to generalize across all tasks. This study highlights the need for more sophisticated approaches in such settings by adopting a clustered and personalized federated learning framework. We empirically justify the design choices made in the framework, and our findings show a significant improvement in performance and training stability.
