Select the data source to list on Sarus. Supported data types include numerical, categorical, series of events, images, and text. Common storage systems and file formats are supported.
Define the rules governing each data practitioner's access. Create rule templates to allow for compliance best practices.
Data practitioners connect to the Gateway from their favorite environment and interact seamlessly with the original data remotely.
"Data Cannot be Fully Anonymized and Remain Useful" (Cynthia Dwork, Gödel Prize winner and co-inventor of differential privacy).
From there, the most efficient way to achieve both high utility and strong privacy is to compute on non-anonymized data and protect the output of the computation. Data practitioners benefit from the full utility of the data without compromising on privacy.
The Sarus Gateway is the fruit of this vision.
Sarus is deployed easily through containerization and scales smoothly by running on Kubernetes, either on-premises or in public clouds.
When deployed, Sarus inherits all security properties of the original infrastructure and eliminates the need for moving data externally. All interactions with sensitive data go through the Sarus Gateway.
Output is always provably safe, regardless of the level of sensitivity of the input data. Practitioners can leverage the full fidelity of their data assets instead of using truncated, redacted, or synthetic versions.
Next-generation access control for sensitive data
Manage precisely who can access which dataset and what they can do with it. Define privacy policies that can be deployed universally, irrespective of data sensitivity, user trust, or learning objectives.
Scaling policies with mathematical privacy
Privacy policies should not be guesswork. Instead, use the mathematical framework of differential privacy for a quantitative and replicable approach to risk management.
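To make "quantitative" concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a counting query. It illustrates differential privacy in general and is not a description of how Sarus implements it. The names `laplace_noise` and `dp_count` are hypothetical, and `epsilon` is the standard privacy parameter (smaller values mean more noise and stronger privacy).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with rate 1/scale
    follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the values matching `predicate`.

    Hypothetical helper for illustration only. Adding or removing one record
    changes a count by at most 1 (sensitivity 1), so Laplace noise with
    scale 1/epsilon gives epsilon-differential privacy for this query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 29]  # made-up example data
    rng = random.Random()
    noisy = dp_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
    print(f"Noisy count of people aged 40+: {noisy:.2f}")
```

The point of the sketch is that the privacy level is a number (`epsilon`) that can be written into a policy, compared across datasets, and audited, rather than a qualitative judgment.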
Full logging and auditing trail
Each access and each query goes through a gatekeeper that enforces all privacy settings. Every interaction with sensitive information is logged and available for reporting and auditing.
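The gatekeeper idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Sarus Gateway's actual design: the `Gatekeeper` class, its per-user epsilon budget, and the log record fields are all hypothetical. Each request is checked against the user's remaining privacy budget, and every request is logged whether it is allowed or denied.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Gatekeeper:
    """Hypothetical gatekeeper: enforces a per-user privacy budget and logs every request."""

    budgets: dict  # user name -> remaining epsilon budget (assumed policy model)
    audit_log: list = field(default_factory=list)

    def authorize(self, user: str, query: str, epsilon: float) -> bool:
        remaining = self.budgets.get(user, 0.0)
        # Allow the query only if it requests a positive epsilon within the budget.
        allowed = 0 < epsilon <= remaining
        if allowed:
            self.budgets[user] = remaining - epsilon
        # Record the request regardless of the decision, so denials are auditable too.
        self.audit_log.append(
            {
                "time": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "query": query,
                "epsilon": epsilon,
                "allowed": allowed,
            }
        )
        return allowed


if __name__ == "__main__":
    gatekeeper = Gatekeeper(budgets={"alice": 1.0})
    print(gatekeeper.authorize("alice", "SELECT COUNT(*) FROM patients", 0.6))
    print(gatekeeper.authorize("alice", "SELECT AVG(age) FROM patients", 0.6))
    for entry in gatekeeper.audit_log:
        print(entry)
```

In this sketch the second request is denied because it would exceed the remaining budget, and both requests still appear in the audit log.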
Next-generation access control for sensitive data
Data access used to be granted on an all-or-nothing basis. Some users would get full access while the rest had no access at all. With Sarus, it's easy to find the right level for all users and situations based on objective privacy goals.
Replace guesswork with mathematical privacy
Rely on the mathematical framework of differential privacy for a quantitative and replicable approach to data governance, instead of guessing whether a policy protects privacy well enough.
Synthetic vs original data
Maximum accuracy is only achievable using the original data. When this is not an option, synthetic data is an efficient alternative. Data practitioners use it to explore row-level information, prepare analyses, and design or debug ML models, and they can even export it for use in external applications. Sarus's high-utility synthetic data makes this seamless.
Available by default, private by design
The Sarus Gateway automatically provides synthetic data for all datasets. This synthetic data is generated using differential privacy so that it can be shared with practitioners.
High quality all the time
Synthetic data generated via Sarus improves on the state of the art in data generation while adapting to any data structure (tabular, text, images, series of transactions). For more on the architecture that supports our synthetic data modelling, check out our paper.
Use any data source
Connect any data source to the Sarus Gateway and it will be immediately accessible for analytics and AI applications. Sarus is compatible with tabular data, relational data, time-series, images, text, and more in most common formats.
Compatible with all main data environments and libraries
Sarus supports most data science use cases. It leverages existing execution engines (Spark clusters, BigQuery, Synapse SQL, Redshift...) or provides its own. These engines can be used from the most common data science environments and from ML and BI libraries. The built-in Sarus SDK makes it easy to integrate remote data into your existing workflows.