Federated Learning

Federated learning (FL) is an approach to training machine learning models that does not require sharing datasets with a central entity. In federated learning, a model is trained collaboratively among multiple parties, which keep their training datasets to themselves but participate in a shared federated learning process. FL stands in contrast to established distributed training systems, where all data is transmitted to a central data center and subsequently partitioned among cluster nodes for parallel training. This clustered, non-federated approach benefits from knowing the characteristics of the entire training dataset and the computational capabilities of the cluster nodes, as well as from the freedom to split the dataset into convenient chunks across those nodes. These assumptions do not hold in a federated setting.

Challenges of FL

  • data heterogeneity
  • robustness of the federation process
  • selection of unbiased fusion operators
  • security and privacy (inference prevention)
  • operational and effective deployment in enterprise and multi-cloud settings

Concepts and Terminology

FL trains a model $\mathcal{M}$ over data $\mathcal{D}$. $\mathcal{M}$ can be a neural network or any non-neural model. In contrast to centralized machine learning, $\mathcal{D}$ is split over $n$ parties, where each party $P_{i}$ has its own private training dataset $D_{i}$. An FL process involves an aggregator $A$ and those $n$ parties $P_1$, $P_2$, $\ldots$, $P_n$ in such a way that no party has knowledge of any dataset other than its own. $A$ has no knowledge of any dataset.
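
In other words, the overall training data is the union of the parties' private datasets,

$$\mathcal{D} = \bigcup_{i=1}^{n} D_i,$$

and the $D_i$ need not follow the same distribution, which is exactly the data heterogeneity challenge listed above.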

The FL process is shown in Figure 1. To train a global machine learning model $\mathcal{M}_G$, the aggregator and the parties participate in a federated learning algorithm that is executed by exchanging messages between the aggregator and the parties. The overall process runs as follows (a short code sketch after the list makes the round structure concrete):

  1. To train $\mathcal{M}_G$, the aggregator uses a function $\mathcal{Q}$ that takes as input the current model or state of the training, $\mathcal{M}_t$, at round $t$ and generates the next query $q_{t+1}$.
  2. One such query, $q_t$, requests information about a local model or aggregated information about each party’s dataset. Example queries include requests for the gradients or model weights of a neural network, or counts for decision trees.
  3. The local training process applies a function $\mathcal{L}$ that takes the query $q_t$ and the local dataset $D_i$ and outputs a model update $r_{i,t}$. Usually the query $q_t$ contains information that the party can use to initialize the local training process, for example, model weights to start local training from, or candidate feature values and/or class labels to compute counts for.
  4. $r_{i,t}$ is sent back from party $P_i$ to the aggregator $A$, which collects all the $r_{i,t}$ from parties $P_i$.
  5. When the parties’ model updates $r_{1,t}$, $r_{2,t}$, $\ldots$, $r_{n,t}$, where $r_{i,t}$ denotes the model update of party $i$ at round $t$, are received by the aggregator, they form the set $R_t = \{r_{i,t}\}$ and are aggregated by applying a fusion function $\mathcal{F}$ that takes $R_t$ as input and returns $\mathcal{M}_t$.
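
This round structure can be condensed into a short sketch. The following Python snippet is an illustration only, not IBM FL code: it assumes that model updates are lists of NumPy weight arrays, that the query $q_t$ simply broadcasts the current global weights, and that $\mathcal{F}$ is plain unweighted averaging (the Iterative Average fusion described later); the hypothetical local_train stands in for real party-side training.

    import numpy as np

    rng = np.random.default_rng(0)

    def local_train(global_weights, local_data):
        # L: compute the party-side update r_{i,t}. In practice this would
        # initialize a local model from the broadcast weights and train it on
        # the private dataset D_i; a noisy step stands in for real training.
        return [w - 0.01 * rng.standard_normal(w.shape) for w in global_weights]

    def fuse(updates):
        # F: unweighted (Iterative Average) fusion of the parties' updates,
        # averaging each layer across parties.
        return [np.mean(np.stack(layers), axis=0) for layers in zip(*updates)]

    # Toy setup: n parties, each holding a private dataset D_i.
    n = 3
    party_data = ["D_%d" % i for i in range(n)]   # stand-ins for private data
    model = [np.zeros((4, 4)), np.zeros(4)]       # M_0: initial global weights

    for t in range(5):                            # federated training rounds
        query = model                             # q_t: broadcast current state
        replies = [local_train(query, d) for d in party_data]  # r_{i,t}
        model = fuse(replies)                     # M_t = F(R_t)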

IBM Federated Learning

Here, we take IBM Federated Learning as the framework and run it on the MNIST dataset.

Install Package

  1. Set up an environment locally
    conda create -n IBM_FL python=3.6 tensorflow==1.15
    
    conda activate IBM_FL
    
    pip install <IBM_federated_learning_whl_file>
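
To verify the installation, you can try importing the package. The import name ibmfl is an assumption based on IBM's federated-learning-lib releases; adjust it if your wheel differs:

    python -c "import ibmfl"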
    

Fusion Methods

  • Iterative Average: The simplest aggregation, used as a baseline; all parties’ model updates are weighted equally.
  • Weighted Average Fusion: Weights the average of updates by the number of samples at each party. Use it with training datasets of widely differing sizes.
  • Coordinate-Median Aggregation Fusion: Takes the coordinate-wise median of updates to guard against computing units that behave abnormally or even exhibit Byzantine failures, i.e., arbitrary and potentially adversarial behavior.
  • Federated Averaging Fusion: Leaves the training data distributed on the mobile devices and learns a shared model by aggregating locally computed updates.
  • Krum Fusion: An aggregation rule that satisfies a resilience property capturing the basic requirements to guarantee convergence despite $f$ Byzantine workers.
  • PFNM Aggregation Fusion: Probabilistic Federated Neural Matching, which matches neurons across the parties’ local neural networks before aggregating them.
  • SPAHM Aggregation Fusion: Statistical model aggregation via parameter matching, which extends matching-based aggregation beyond neural networks to other statistical models.
  • Zeno Fusion: Tolerant to an arbitrary number of faulty workers.
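
To make the difference between the first two entries concrete, here is a minimal sketch of weighted average fusion. The helper weighted_average is hypothetical, and each update is assumed to be a list of NumPy arrays accompanied by the owning party's sample count:

    import numpy as np

    def weighted_average(updates, sample_counts):
        # Weight each party's update by its share of the total number of
        # training samples, so parties with larger datasets contribute
        # proportionally more to the fused model.
        shares = np.asarray(sample_counts, dtype=float)
        shares /= shares.sum()
        return [sum(s * layer for s, layer in zip(shares, layers))
                for layers in zip(*updates)]

With equal sample counts, this reduces to the Iterative Average baseline.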