p2pfl.learning.dataset.partition_strategies module
Data partitioning strategies for P2PFL Datasets.
- class p2pfl.learning.dataset.partition_strategies.DataPartitionStrategy
Bases: object
Abstract class for defining data partitioning strategies in federated learning.
This class provides a common interface for generating partitions of a dataset, which can be used to simulate different data distributions across clients.
- abstract static generate_partitions(train_data, test_data, num_partitions, **kwargs)
Generate partitions of the dataset based on the specific strategy.
- Parameters:
train_data (Dataset) – The training Dataset object to partition.
test_data (Dataset) – The test Dataset object to partition.
num_partitions (int) – The number of partitions to create.
**kwargs – Additional keyword arguments that may be required by specific strategies.
- Returns:
The first list contains lists of indices for the training data partitions.
The second list contains lists of indices for the test data partitions.
- Return type:
A tuple containing two lists of lists
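The return structure described above can be illustrated with a small standalone sketch. It does not import p2pfl: `PartitionStrategySketch` and `RoundRobinSketch` are hypothetical stand-ins, and plain Python sequences replace the `Dataset` objects. The only part taken from the documentation is the signature and the shape of the result, a tuple of two lists of index lists.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence, Tuple

# (train partitions, test partitions); each partition is a list of sample indices.
Partitions = Tuple[List[List[int]], List[List[int]]]


class PartitionStrategySketch(ABC):
    """Hypothetical stand-in for the documented interface (not the p2pfl class)."""

    @staticmethod
    @abstractmethod
    def generate_partitions(
        train_data: Sequence[Any],
        test_data: Sequence[Any],
        num_partitions: int,
        **kwargs: Any,
    ) -> Partitions:
        """Return index lists for the train and test partitions."""


class RoundRobinSketch(PartitionStrategySketch):
    """Illustrative strategy: deal indices out to partitions in turn."""

    @staticmethod
    def generate_partitions(train_data, test_data, num_partitions, **kwargs):
        train = [list(range(p, len(train_data), num_partitions)) for p in range(num_partitions)]
        test = [list(range(p, len(test_data), num_partitions)) for p in range(num_partitions)]
        return train, test


# Plain lists stand in for the train and test datasets here.
train_parts, test_parts = RoundRobinSketch.generate_partitions(list(range(10)), list(range(4)), 2)
print(train_parts)  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
print(test_parts)   # [[0, 2], [1, 3]]
```

Each inner list holds indices into the corresponding dataset, so partition `i` of the training data and partition `i` of the test data can be handed to the same client.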
- class p2pfl.learning.dataset.partition_strategies.DirichletPartitionStrategy
Bases: DataPartitionStrategy
Data partition strategy based on the Dirichlet distribution.
It assigns data to different partitions (clients) so that the distribution of classes in each partition follows a Dirichlet distribution, where alpha determines the concentration of the distribution.
Inspired by the Flower implementation. Thank you so much for taking FL to another level :) Original implementation: https://github.com/adap/flower/blob/main/datasets/flwr_datasets/partitioner/dirichlet_partitioner.py
- classmethod generate_partitions(train_data, test_data, num_partitions, seed=666, label_tag='label', alpha=1, min_partition_size=2, self_balancing=False, **kwargs)
Generate partitions of the dataset using Dirichlet distribution.
It divides the data into partitions so that the distribution of classes in each partition follows a Dirichlet distribution controlled by the alpha parameter.
- Parameters:
train_data (Dataset) – The training Dataset object to partition.
test_data (Dataset) – The test Dataset object to partition.
num_partitions (int) – The number of partitions to create.
seed (int) – The random seed to use for reproducibility.
label_tag (str) – The name of the column containing the labels.
alpha (Union[int, float, list[float]]) – The alpha parameters of the Dirichlet distribution.
min_partition_size (int) – The minimum partition size allowed in train and test.
self_balancing (bool) – Whether the partitions should be balanced or not. The balancing is done by not allowing some label values to go in partitions that are already overly big.
shuffle – Whether to shuffle the indexes or not.
**kwargs – Additional keyword arguments that may be required by specific strategies.
- Returns:
The first list contains lists of indices for the training data partitions.
The second list contains lists of indices for the test data partitions.
- Return type:
A tuple containing two lists of lists
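To make the role of alpha concrete, the following is a minimal sketch of Dirichlet-based label partitioning using only the standard library. It is not the p2pfl or Flower implementation: `dirichlet_sample` and `dirichlet_partition_sketch` are hypothetical names, a plain list of labels replaces the `Dataset`, alpha is a single scalar, and the `min_partition_size` and `self_balancing` behaviour is left out. For each label, it draws the share of samples that goes to each partition from a symmetric Dirichlet distribution.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def dirichlet_sample(rng: random.Random, alpha: float, size: int) -> List[float]:
    """Draw one symmetric Dirichlet(alpha) sample by normalising Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_partition_sketch(
    labels: Sequence[int], num_partitions: int, alpha: float = 1.0, seed: int = 666
) -> List[List[int]]:
    """Split sample indices so each label's share per partition is Dirichlet-distributed."""
    rng = random.Random(seed)

    # Group sample indices by label.
    by_label: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    partitions: List[List[int]] = [[] for _ in range(num_partitions)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        proportions = dirichlet_sample(rng, alpha, num_partitions)

        # Turn cumulative proportions into cut points over this label's indices.
        start, cumulative = 0, 0.0
        for p, share in enumerate(proportions):
            cumulative += share
            is_last = p == num_partitions - 1
            end = len(indices) if is_last else round(cumulative * len(indices))
            partitions[p].extend(indices[start:end])
            start = end
    return partitions


labels = [i % 3 for i in range(60)]  # 3 classes, 20 samples each
parts = dirichlet_partition_sketch(labels, num_partitions=4, alpha=0.5)
print([len(p) for p in parts])
```

Under this scheme, a small alpha tends to concentrate each label in a few partitions, while a large alpha moves every partition closer to the overall class distribution.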
- class p2pfl.learning.dataset.partition_strategies.LabelSkewedPartitionStrategy
Bases: DataPartitionStrategy
Partitions the dataset by grouping samples with the same label, resulting in a non-IID distribution.
This is generally considered the "worst-case" scenario for federated learning.
- static generate_partitions(train_data, test_data, num_partitions, seed=666, label_tag='label', **kwargs)
Generate partitions of the dataset by grouping samples with the same label.
- Parameters:
train_data (Dataset) – The training Dataset object to partition.
test_data (Dataset) – The test Dataset object to partition.
num_partitions (int) – The number of partitions to create.
seed (int) – The random seed to use for reproducibility.
label_tag (str) – The name of the column containing the labels.
**kwargs – Additional keyword arguments that may be required by specific strategies.
- Returns:
The first list contains lists of indices for the training data partitions.
The second list contains lists of indices for the test data partitions.
- Return type:
A tuple containing two lists of lists
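One way to group samples by label, sketched here under stated assumptions, is to order the indices by label and cut the result into contiguous chunks. `label_skewed_partition_sketch` is a hypothetical helper that works on a plain list of labels; how p2pfl orders samples and sizes the chunks internally is not described on this page, so treat this only as an illustration of the idea.

```python
import random
from typing import List, Sequence


def label_skewed_partition_sketch(
    labels: Sequence[int], num_partitions: int, seed: int = 666
) -> List[List[int]]:
    """Group indices by label, then cut them into contiguous, near-equal chunks."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)                   # randomise the order within each label
    indices.sort(key=lambda i: labels[i])  # stable sort keeps equal labels adjacent

    # Spread the remainder over the first partitions so sizes differ by at most one.
    base, extra = divmod(len(indices), num_partitions)
    partitions: List[List[int]] = []
    start = 0
    for p in range(num_partitions):
        end = start + base + (1 if p < extra else 0)
        partitions.append(indices[start:end])
        start = end
    return partitions


labels = [0, 1, 0, 1, 0, 1, 0, 1]
parts = label_skewed_partition_sketch(labels, num_partitions=2)
print([sorted({labels[i] for i in p}) for p in parts])  # [[0], [1]]
```

With two balanced classes and two partitions, each partition ends up with a single label, which is the extreme non-IID case the class description refers to.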
- class p2pfl.learning.dataset.partition_strategies.PercentageBasedNonIIDPartitionStrategy
Bases: DataPartitionStrategy
Not implemented yet.
- class p2pfl.learning.dataset.partition_strategies.RandomIIDPartitionStrategy
Bases: DataPartitionStrategy
Partition the dataset randomly, resulting in an IID distribution of data across clients.
- static generate_partitions(train_data, test_data, num_partitions, seed=666, **kwargs)
Generate partitions of the dataset using random sampling.
- Parameters:
train_data (Dataset) – The training Dataset object to partition.
test_data (Dataset) – The test Dataset object to partition.
num_partitions (int) – The number of partitions to create.
seed (int) – The random seed to use for reproducibility.
**kwargs – Additional keyword arguments that may be required by specific strategies.
- Returns:
The first list contains lists of indices for the training data partitions.
The second list contains lists of indices for the test data partitions.
- Return type:
A tuple containing two lists of lists
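Random IID partitioning can be sketched as a seeded shuffle followed by an even split. `random_iid_partition_sketch` is a hypothetical helper that takes a sample count instead of a `Dataset`; it shows the general technique and is not taken from the p2pfl source.

```python
import random
from typing import List


def random_iid_partition_sketch(
    num_samples: int, num_partitions: int, seed: int = 666
) -> List[List[int]]:
    """Shuffle all indices with a fixed seed, then deal them out evenly."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    # Strided slicing gives partition sizes that differ by at most one.
    return [indices[p::num_partitions] for p in range(num_partitions)]


parts = random_iid_partition_sketch(num_samples=10, num_partitions=3)
print([len(p) for p in parts])  # [4, 3, 3]
```

Because every index is equally likely to land in any partition, each partition's label distribution approximates that of the full dataset, and the fixed seed makes the split repeatable across runs.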