Abstract

<jats:p>Most clustering-based differentially private synthetic data generation methods assume unconstrained continuous feature spaces and offer no mechanism for enforcing hard feature bounds or handling discrete-valued attributes, which limits their applicability to real-world tabular data where such constraints are common. This paper proposes a geometry-based mechanism that generates synthetic tabular data by applying Laplace noise jointly to K-means cluster centroids and within-cluster radial distances, calibrated via a data-dependent sensitivity approximation. Three components distinguish the approach from prior work: coordinate-wise centroid reflection to enforce hard feature bounds after perturbation, coordinate-wise clipping to enforce bounds on reconstructed synthetic points, and randomized rounding for discrete features as a post-processing step. A utility-driven calibration strategy selects the privacy budget to meet a user-specified target Adjusted Rand Index (ARI), which makes the privacy-utility trade-off directly interpretable. Baseline comparisons on a two-dimensional illustrative example show that the proposed mechanism achieves ARI = 0.666 at ε = 1.60, substantially outperforming direct coordinate-wise noise addition at the same budget (ARI = 0.199) while matching the non-private synthesis baseline (ARI = 0.624). Across 30 independent runs the mechanism achieves mean ARI = 0.629 ± 0.108, which confirms that the calibration target is reliably met under stochastic variation.</jats:p>
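The three bound-handling components named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the folding formula used for reflection, and the `scale` parameter (standing in for whatever Laplace scale the paper's sensitivity approximation would produce) are all assumptions made for illustration, and the sketch makes no privacy claim of its own.

```python
import numpy as np

def reflect_into_bounds(x, lo, hi):
    """Fold values back into [lo, hi] by reflecting at the bounds (assumes hi > lo)."""
    width = hi - lo
    # Position inside one reflection period of length 2 * width, measured from lo.
    t = np.mod(x - lo, 2.0 * width)
    # The first half of the period runs forward, the second half runs backward.
    return lo + np.where(t <= width, t, 2.0 * width - t)

def perturb_centroids(centroids, lo, hi, scale, rng):
    """Add Laplace noise to centroids, then reflect each coordinate into its bounds.

    `scale` is a hypothetical, caller-supplied Laplace scale; the abstract derives
    it from a data-dependent sensitivity approximation that is not reproduced here.
    """
    noisy = centroids + rng.laplace(0.0, scale, size=centroids.shape)
    return reflect_into_bounds(noisy, lo, hi)

def clip_points(points, lo, hi):
    """Coordinate-wise clipping of reconstructed synthetic points to [lo, hi]."""
    return np.clip(points, lo, hi)

def randomized_round(x, rng):
    """Round up with probability equal to the fractional part, so E[result] = x."""
    floor = np.floor(x)
    return floor + (rng.random(x.shape) < (x - floor))
```

Reflection keeps a perturbed centroid inside the feature range without piling probability mass on the boundary, which plain clipping would do. Randomized rounding is unbiased in expectation, which is one reason it suits discrete features as a post-processing step.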
