# DOMIAS: Membership Inference Attacks against Synthetic Data through Overfitting Detection

## Installation
The library can be installed from PyPI using

```bash
$ pip install domias
```

or from source, using

```bash
$ pip install .
```
## API
The main API call is

```python
from domias.evaluator import evaluate_performance
```

`evaluate_performance` expects as input a generator implementing the `domias.models.generator.GeneratorInterface` interface (a minimal sketch follows the argument list below) and an evaluation dataset.

The supported arguments for `evaluate_performance` are:
```
generator: GeneratorInterface
    Generator with the `fit` and `generate` methods. The generator must NOT be fitted already.
dataset: np.ndarray
    The evaluation dataset, used to derive the training and test datasets.
mem_set_size: int
    The size of the training (member) split taken from `dataset`.
reference_set_size: int
    The size of the reference split taken from `dataset`.
training_epochs: int
    Number of training epochs.
synthetic_sizes: List[int]
    The synthetic sample counts for which to evaluate the attacks.
density_estimator: str, default = "prior"
    Which density estimator to use. Available options:
        * prior
        * bnaf
        * kde
seed: int
    Random seed.
device: PyTorch device
    CPU or CUDA.
shifted_column: Optional[int]
    Index of the column to shift, if any.
zero_quantile: float
    Threshold for shifting the column.
reference_kept_p: float
    Held-out dataset parameter.
```
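To make the generator contract concrete, here is a minimal, purely illustrative `GeneratorInterface` implementation. The resampling "model" and the class name `ResamplingGenerator` are hypothetical placeholders for this sketch, not part of DOMIAS; any real generative model (e.g., the SDV `TVAE` used in the sample usage section) would take its place.

```python
from typing import Optional

import pandas as pd

from domias.models.generator import GeneratorInterface


class ResamplingGenerator(GeneratorInterface):
    """Toy generator: "generates" data by resampling the fitted data with replacement."""

    def __init__(self, random_state: int = 0) -> None:
        self._data: Optional[pd.DataFrame] = None
        self._random_state = random_state

    def fit(self, data: pd.DataFrame) -> "ResamplingGenerator":
        # A real generator would train a model here; the toy just stores the data.
        self._data = data
        return self

    def generate(self, count: int) -> pd.DataFrame:
        # Return `count` synthetic rows; here, bootstrap samples of the fitted data.
        assert self._data is not None, "fit() must be called before generate()"
        return self._data.sample(n=count, replace=True, random_state=self._random_state)
```

Per the argument description above, the generator is passed in unfitted; fitting on the member split is handled inside `evaluate_performance` (as in the sample usage below, which never calls `fit` directly).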
The output is a dictionary with a key for each of the `synthetic_sizes` values. For each `synthetic_sizes` value, the dictionary contains the keys:

- `MIA_performance`: accuracy and AUCROC for each attack.
- `MIA_scores`: output scores for each attack.
- `data`: the evaluation data.

For both `MIA_performance` and `MIA_scores`, the following attacks are evaluated (a short sketch for inspecting these results follows the list):
- "ablated_eq1" (Eq.1 (KDE))
- "ablated_eq2" (DOMIAS (KDE))
- "LOGAN_D1"
- "MC"
- "gan_leaks"
- "gan_leaks_cal"
- "LOGAN_0"
- "eq1" (Eq. 1 (BNAF))
- "domias"
## Sample usage

Example for using `evaluate_performance`:
```python
# third party
import pandas as pd
from sdv.tabular import TVAE

# domias absolute
from domias.evaluator import evaluate_performance
from domias.models.generator import GeneratorInterface


def get_generator(
    epochs: int = 1000,
    seed: int = 0,
) -> GeneratorInterface:
    # Wrap SDV's TVAE behind the DOMIAS GeneratorInterface.
    class LocalGenerator(GeneratorInterface):
        def __init__(self) -> None:
            self.model = TVAE(epochs=epochs)

        def fit(self, data: pd.DataFrame) -> "LocalGenerator":
            self.model.fit(data)
            return self

        def generate(self, count: int) -> pd.DataFrame:
            return self.model.sample(count)

    return LocalGenerator()


dataset = ...  # Load your dataset as a numpy array

mem_set_size = 1000
reference_set_size = 1000
training_epochs = 2000
synthetic_sizes = [100]
density_estimator = "prior"  # prior, kde, bnaf

generator = get_generator(
    epochs=training_epochs,
)

perf = evaluate_performance(
    generator,
    dataset,
    mem_set_size,
    reference_set_size,
    training_epochs=training_epochs,
    synthetic_sizes=synthetic_sizes,
    density_estimator=density_estimator,
)

assert 100 in perf
results = perf[100]

assert "MIA_performance" in results
assert "MIA_scores" in results

print(results["MIA_performance"])
```
## Experiments
- Experiments main paper

  To reproduce results for DOMIAS, baselines, and ablated models, run

  ```bash
  cd experiments
  python3 domias_main.py --seed 0 --gan_method TVAE --dataset housing --mem_set_size_list 30 50 100 300 500 1000 --reference_set_size_list 10000 --synthetic_sizes 10000 --training_epoch_list 2000
  ```

  changing the arguments `mem_set_size_list`, `reference_set_size_list`, `synthetic_sizes`, and `training_epoch_list` for specific experiments over ranges (Experiments 5.1 and 5.2, see Appendix A for details), and `gan_method` for the generative model of interest.

  Or, equivalently, run

  ```bash
  cd experiments && bash run_tabular.sh
  ```
- Experiments no prior knowledge (Appendix D)

  For the no-prior-knowledge setting (i.e., no reference dataset is available), add

  ```bash
  --density_estimator prior
  ```
- Experiment images (Appendix B.3)

  Note: the CelebA dataset must be available in the `experiments/data` folder.

  To run the experiment with the CelebA dataset, first run

  ```bash
  cd experiments && python3 celeba_gen.py --seed 0 --mem_set_size 4000
  ```

  and then

  ```bash
  cd experiments && python3 celeba_eval.py --seed 0 --mem_set_size 4000
  ```
## Tests

Install the testing dependencies using

```bash
pip install .[testing]
```

The tests can be executed using

```bash
pytest -vsx
```
## Citing

TODO