Federated learning (FL) is an emerging privacy preserving machine learning protocol that allows multiple devices to collaboratively train a shared global model without revealing their private local data. Nonparametric models like gradient boosting decision trees (GBDTs) have been commonly used in FL for vertically partitioned data. However, all these studies assume that all the data labels are stored on only one client, which may be unrealistic for real-world applications. Therefore, in this article, we propose a secure vertical FL framework, named privacy-preserving vertical federated learning system over distributed labels (PIVODL), to train GBDTs with data labels distributed on multiple devices. Both homomorphic encryption and differential privacy are adopted to prevent label information from being leaked through transmitted gradients and leaf values. Our experimental results show that both information leakage and model performance degradation of the proposed PIVODL are negligible. Impact Statement - Federated learning is a distributed machine learning framework proposed for privacy preservation. Most federated learning algorithms work on horizontally partitioned data, with only a few exceptions considering vertically partitioned data that is widely seen in the real world. However, existing vertical federated learning makes an unrealistic assumption that data labels are distributed on only one device and no research has been reported so far that considers data labels distributed on multiple client devices. The PIVODL framework reported in this article allows us to build a secure vertical federated XGBoost system, in which the labels may distributed either on one device or on multiple devices, making it possible to apply federated learning to a wider range of real-world problems.
Bibliographical noteGreen Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
- gradient boosting decision tree (GBDT)
- privacy preservation
- vertical federated learning (VFL)