Sarus has open-sourced its privacy-preserving SQL core, Qrlew, which is packed with advanced features. In this post, the first in a series on Qrlew, we demonstrate the power of automatic range propagation and its use in privacy applications.
Differential privacy aims to reveal the results of data aggregations while limiting the influence any single person can have on those results. In practice this works in two phases: bounding the data, then adding noise. The noise added in the second phase is calibrated to the bounds set in the first, so overestimating these bounds leads to excessive noise and degrades the utility of the analysis.
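To make the link between bounds and noise concrete, here is a minimal sketch of a bounded, noisy sum using the Laplace mechanism. It is an illustration only and not Qrlew's implementation: the function names are hypothetical, it assumes each person contributes exactly one value, and it uses plain floating-point sampling, which is not hardened for production use.

```python
import random


def clip(value, lower, upper):
    """Phase 1: force a value into the interval [lower, upper]."""
    return max(lower, min(upper, value))


def noise_scale(lower, upper, epsilon):
    """Laplace scale for a sum where each person adds one value in [lower, upper].

    Adding or removing one person changes the sum by at most max(|lower|, |upper|).
    """
    sensitivity = max(abs(lower), abs(upper))
    return sensitivity / epsilon


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_bounded_sum(values, lower, upper, epsilon, rng):
    """Phase 2: add noise calibrated to the bounds chosen in phase 1."""
    clipped = [clip(v, lower, upper) for v in values]
    return sum(clipped) + laplace_noise(noise_scale(lower, upper, epsilon), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 58, 67]
    # Tight bounds: noise scale is 100 / epsilon.
    print(dp_bounded_sum(ages, 0, 100, epsilon=1.0, rng=rng))
    # Overestimated bounds: same data, but the noise scale is ten times larger.
    print(dp_bounded_sum(ages, 0, 1000, epsilon=1.0, rng=rng))
```

With the same data and the same epsilon, widening the bounds from [0, 100] to [0, 1000] multiplies the noise scale by ten, which is why tight bounds matter.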
To integrate differential privacy seamlessly into an SQL query engine, an algorithm that automatically computes bounds for aggregation functions is crucial. One possibility is the Automatic Bounds Determination algorithm (suggested by Wilson et al.), which establishes these bounds by constructing a differentially private histogram. Note that this approach spends a portion of the privacy budget on the bounds computation itself. Another strategy is to take the bounds initially provided for the table columns and propagate them through every stage of the computation, which incurs no additional privacy loss.
In Qrlew, we have implemented this range propagation algorithm, ensuring that the column bounds are consistently taken into account throughout the entire computation. This notebook 📔 (also in Colab) provides a detailed description of this propagation. Hybrid approaches that combine both methods are also possible: the propagated bounds limit the histogram range, and some privacy budget is then spent to refine these bounds further for increased accuracy.
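The idea behind range propagation can be illustrated with simple interval arithmetic. The sketch below is not Qrlew's API: the `Interval` class and the column names are hypothetical, and only addition, subtraction and multiplication are covered. It shows how bounds declared on input columns yield bounds on a derived expression without touching the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] of values a column or expression can take."""
    lo: float
    hi: float

    def __add__(self, other):
        # Smallest sum uses both lower ends, largest sum uses both upper ends.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Smallest difference subtracts the other's upper end, and vice versa.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # With mixed signs the extremes can come from any pair of endpoints.
        products = [
            self.lo * other.lo,
            self.lo * other.hi,
            self.hi * other.lo,
            self.hi * other.hi,
        ]
        return Interval(min(products), max(products))


if __name__ == "__main__":
    # Hypothetical column bounds declared on the source table.
    price = Interval(0, 500)
    quantity = Interval(1, 20)
    discount = Interval(0, 50)

    # Bounds of the expression price * quantity - discount,
    # derived purely from the declared column bounds.
    revenue = price * quantity - discount
    print(revenue)  # Interval(lo=-50, hi=10000)
```

Because the resulting bounds depend only on the declared column ranges and never on the rows themselves, computing them consumes no privacy budget.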
Automatic range propagation is a key feature of Qrlew, the open-source core of Sarus SQL.
Posts on other features of Qrlew, such as automatic entity propagation and automated differential privacy, will follow in the coming weeks.
If you are interested in privacy-preserving data analysis, please use (or contribute to) Qrlew in Rust or Python. If you need support and a complete product dedicated to privacy-preserving AI and analytics, check out Sarus.