ParallelSFL: A Novel Split Federated Learning Framework Tackling Heterogeneity Issues

October 02, 2024 · Declared Dead · 🏛 ACM/IEEE International Conference on Mobile Computing and Networking

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Yunming Liao, Yang Xu, Hongli Xu, Zhiwei Yao, Liusheng Huang, Chunming Qiao arXiv ID 2410.01256 Category cs.DC: Distributed Computing Citations 20 Venue ACM/IEEE International Conference on Mobile Computing and Networking Last Checked 3 months ago

Abstract

Mobile devices contribute more than half of the world's web traffic, providing massive and diverse data for powering various federated learning (FL) applications. In order to avoid the communication bottleneck on the parameter server (PS) and accelerate the training of large-scale models on resourceconstraint workers in edge computing (EC) system, we propose a novel split federated learning (SFL) framework, termed ParallelSFL. Concretely, we split an entire model into a bottom submodel and a top submodel, and divide participating workers into multiple clusters, each of which collaboratively performs the SFL training procedure and exchanges entire models with the PS. However, considering the statistical and system heterogeneity in edge systems, it is challenging to arrange suitable workers to specific clusters for efficient model training. To address these challenges, we carefully develop an effective clustering strategy by optimizing a utility function related to training efficiency and model accuracy. Specifically, ParallelSFL partitions workers into different clusters under the heterogeneity restrictions, thereby promoting model accuracy as well as training efficiency. Meanwhile, ParallelSFL assigns diverse and appropriate local updating frequencies for each cluster to further address system heterogeneity. Extensive experiments are conducted on a physical platform with 80 NVIDIA Jetson devices, and the experimental results show that ParallelSFL can reduce the traffic consumption by at least 21%, speed up the model training by at least 1.36x, and improve model accuracy by at least 5% in heterogeneous scenarios, compared to the baselines.