Comparison with other privacy technologies and solutions

Wondering why you should use Sarus rather than another privacy-preserving technique, or why you should not build a solution yourself? Our answers are below!

Sarus vs Data masking

Data masking produces desensitized versions of datasets. Unfortunately, a masked dataset can never be truly anonymous because re-identification risks remain. These risks are mitigated by removing more columns or by adding constraints on how masked data is accessed and by whom. This manual process is use case-specific and consumes precious privacy and data engineering resources while potentially harming data utility for research.
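To make the re-identification risk concrete, here is a minimal sketch of a linkage attack on a masked table. All records, names, and column choices are invented for illustration; the point is only that columns left in a masked dataset (here ZIP code, birth year, and sex) can be joined against an outside source.

```python
# Hypothetical toy data: every value below is invented for illustration.
# The "masked" table has had names removed but keeps quasi-identifiers.
masked_records = [
    {"zip": "75011", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip": "75011", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "69003", "birth_year": 1984, "sex": "F", "diagnosis": "migraine"},
]

# An outside source the attacker is assumed to have (e.g. a public directory).
public_directory = [
    {"name": "Alice", "zip": "75011", "birth_year": 1984, "sex": "F"},
    {"name": "Bob", "zip": "75011", "birth_year": 1990, "sex": "M"},
    {"name": "Chloe", "zip": "69003", "birth_year": 1975, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(masked, directory, keys=QUASI_IDENTIFIERS):
    """Return {name: diagnosis} for each masked row that matches
    exactly one directory entry on the quasi-identifier columns."""
    reidentified = {}
    for row in masked:
        signature = tuple(row[k] for k in keys)
        matches = [p for p in directory
                   if tuple(p[k] for k in keys) == signature]
        if len(matches) == 1:  # a unique match re-identifies the row
            reidentified[matches[0]["name"]] = row["diagnosis"]
    return reidentified


reidentified = link(masked_records, public_directory)
print(reidentified)
```

In this toy case two of the three masked rows are linked back to a name. Dropping more columns would reduce the matches, at the cost of the data utility discussed above.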

With Sarus, none of this applies because data is never shared. It does not need to be masked to achieve the impossible target of anonymization. The full dataset can be leveraged without any risk of misuse, lost credentials, or fallible privacy reviews.
  • Safer
  • Easier to automate
  • Higher data utility

Sarus vs Synthetic Data

Synthetic data is a great asset for exploration, debugging, and testing. This is why Sarus invested so much in its synthetic data engine, which beats the state of the art in benchmarks with high accuracy across many columns.

However, synthetic data is no replacement for the real data when it comes to making a final decision, trying a model that will go to production, or publishing research or regulatory reports. This is why Sarus focuses on making the real data accessible safely and only leverages synthetic data for exploratory analysis use cases.
  • Production-grade results
  • Applicable beyond simple exploratory tasks
  • Includes synthetic data anyway!

Sarus vs Confidential Computing

Confidential computing enables processing data in an untrusted environment (e.g., a public cloud). Only code that has been approved by all parties can run on the data, but this gives no guarantee about the privacy of the code's output. Making sure that code is safe is a high-risk endeavor, especially in multiparty environments, because re-identification attacks can be very subtle.

Moreover, in real life, data science work implies writing dozens of exploratory queries, wrangling the data, trying different parameters... This would be unrealistic if every party had to agree on hundreds of different queries along the way.

Sarus removes the need to pre-agree on the queries that will run. It can run in a confidential computing environment if the data owner cannot run the queries locally, which we demonstrated with Microsoft.
  • Protects output privacy without manual validation
  • Does not require dedicated hardware
  • Full interactivity thanks to DP API and synthetic data

Sarus vs Federated Learning

Federated Learning addresses the need for multiple parties to do machine learning on their data. The data is held locally by each party that computes the updates to the machine learning models and sends those updates to the data scientist.

This approach limits data leakage to the data scientist but does not fully solve privacy risks: model updates can leak personal information if designed by a malicious actor, or even inadvertently. Also, this approach is limited to machine learning.
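The mechanics described above can be sketched in a few lines. This is a simplified illustration of federated averaging on a one-parameter model, not how any particular framework implements it; the function names, the toy data, and the learning rate are all assumptions made for the example.

```python
# Minimal sketch of federated averaging for the model y = w * x.
# Function names and data are hypothetical, chosen for illustration only.

def local_update(w, data, lr=0.1):
    """One gradient step computed by a party on its private (x, y) pairs.
    Only the returned update leaves the party, never the records."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return -lr * grad


def federated_round(w, parties):
    """Combine the parties' updates, weighted by their record counts."""
    total = sum(len(d) for d in parties)
    return w + sum(len(d) * local_update(w, d) for d in parties) / total


# Two parties holding private data generated by y = 2 * x.
parties = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, parties)
print(w)  # converges towards 2

# Leakage illustration: the second party holds a single record, so its
# update is a direct function of that record (here proportional to x * y).
single_record_update = local_update(0.0, parties[1])
print(single_record_update)
```

The last two lines show why updates alone are not a privacy guarantee: when a party holds few records, its update can reveal a lot about them even though the raw data never moved.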

Sarus also addresses the need to secure the exchange of information between one party and the data scientist. It can be used in a federated set-up or in a single-node set-up. Either way, it enforces formal protection of the data. Beyond that, it supports SQL analysis, synthetic data, and more ad hoc computations under the whitelist regimen.
  • Protects output privacy
  • Applicable beyond machine learning
  • Full interactivity with the data thanks to DP API and synthetic data

Build vs Buy

Sarus leverages public, peer-reviewed research to provide the strongest privacy guarantees. Frameworks for synthetic data and differential privacy already exist, so why not do it yourself?

Those frameworks are designed for researchers. They will ask you to define all parameters manually, from privacy budgets to the range of possible values for each column of data. As with all security-related products, DIY is at your own risk, but be aware that any misconfiguration can annihilate all privacy guarantees.

None of them lets you manipulate complex data structures or multi-table datasets without doing all the heavy lifting yourself. Also, they assume that the engineer has full access to the data to carry out the computation. Sarus is specifically designed for data practitioners who cannot touch the data.
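As an illustration of the manual parameters mentioned above, here is a minimal sketch of a differentially private sum using the textbook Laplace mechanism. It is not Sarus's implementation nor any specific framework's API: the function names are hypothetical, and the floating-point noise sampling is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noise_scale(lower, upper, epsilon):
    """Laplace scale for a sum when one record is added or removed:
    sensitivity (largest absolute clamped value) divided by epsilon."""
    if epsilon <= 0 or lower > upper:
        raise ValueError("need epsilon > 0 and lower <= upper")
    return max(abs(lower), abs(upper)) / epsilon


def dp_sum(values, lower, upper, epsilon, rng):
    """Clamp each value to [lower, upper], sum, then add Laplace noise.
    The caller must supply the bounds and the privacy budget by hand."""
    scale = noise_scale(lower, upper, epsilon)
    clamped = [min(max(v, lower), upper) for v in values]
    return sum(clamped) + laplace_noise(scale, rng)


rng = random.Random(0)  # seeded only to make the demo repeatable
ages = [23, 35, 41, 58, 64]  # invented toy column

# Bounds declared up front (not read from the data) and a chosen budget.
print(dp_sum(ages, lower=0, upper=100, epsilon=1.0, rng=rng))

# The noise grows with the declared range and shrinks as epsilon grows.
print(noise_scale(0, 100, 1.0), noise_scale(0, 1000, 1.0))
```

Every argument here is a decision the engineer has to get right. Bounds that are too tight silently clamp real values, bounds that are too wide drown the result in noise, and deriving the bounds from the private data itself would leak information outside the stated budget.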

Finally, only Sarus provides the glue that combines everything: differential privacy, synthetic data, exceptions, connectors to all data sources, BI connector, machine learning SDK...
  • Safer
  • Full-blown product
  • Deploys in minutes

32, rue Alexandre Dumas
75011 Paris — France
©2023 Sarus Technologies.
All rights reserved.