[Just released] AI-based synthetic data generator!

A new deep-learning-based synthetic data generator for an even smoother analysis experience when you can't see the data!

Elodie Zanella

We are thrilled to introduce the latest version of our synthetic data generation model. The new model preserves the multivariate distribution across all columns of a table, so relationships between columns carry over to the synthetic data. This makes synthetic data an even more useful tool for analysts and data scientists who need insight into data they cannot access directly.
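To see why the multivariate distribution matters, here is a minimal sketch on made-up data (the table, column names, and numbers are hypothetical, and this is not the Sarus model). It builds a naive "synthetic" table by shuffling each column independently: every single-column distribution is kept exactly, yet the relationship between the columns disappears.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real" table: income rises with age, plus noise.
n = 5_000
age = rng.integers(20, 70, size=n)
income = 1_000 * age + rng.normal(0, 5_000, size=n)

# Naive synthetic table: shuffle each column independently.
# Each column keeps exactly the same values, but rows no longer line up.
naive_age = rng.permutation(age)
naive_income = rng.permutation(income)


def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


real_corr = pearson(age, income)
naive_corr = pearson(naive_age, naive_income)

print(f"real correlation:  {real_corr:.2f}")
print(f"naive correlation: {naive_corr:.2f}")
```

A generator that preserves the multivariate distribution should keep the first number, not the second.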

Synthetic data is extremely useful for preparing analyses, designing machine learning pipelines, and debugging or testing code. It is the natural first step before running the analyses on the source data, which remains fully protected throughout:

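from sarus import Client

# Connect to the Sarus gateway as an analyst.
client = Client(url="https://demo.sarus.tech/gateway", email="analyst@example.com")

# Load the remote dataset as a pandas DataFrame and preview the first rows.
remote_dataset = client.dataset(slugname="census")
households = remote_dataset.as_pandas()
households.head(3)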

Results evaluated from synthetic data only

import seaborn as sns
import matplotlib.pyplot as plt

# Plot the income distribution separately for each age group.
for age, group in households.groupby('age'):
    # Order income categories from most to least frequent within the group.
    income_order = group['income'].value_counts(sort=True).index
    grid = sns.catplot(data=group, x='income', kind='count', orient='v',
                       order=income_order)
    grid.set_xticklabels(rotation=90)
    plt.title(age)
    plt.show()

Income distribution for a given age group is preserved

Comparison of real vs. synthetic data generated with the new Sarus generative model on different datasets and variables
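One way to put a number on a real vs. synthetic comparison is the total variation distance between the two distributions of a variable within each group. The sketch below is an illustration only: the helper `conditional_tvd` and the toy tables are hypothetical, the column names simply mirror the `age` and `income` columns used above, and this is not necessarily the metric used by Sarus.

```python
import pandas as pd


def conditional_tvd(real, synthetic, group_col, value_col):
    """Total variation distance between the real and synthetic distributions
    of `value_col`, computed separately for each value of `group_col`."""
    distances = {}
    for key, real_group in real.groupby(group_col):
        synth_group = synthetic[synthetic[group_col] == key]
        if synth_group.empty:
            # Group missing from the synthetic table: distance is undefined.
            distances[key] = float("nan")
            continue
        p = real_group[value_col].value_counts(normalize=True)
        q = synth_group[value_col].value_counts(normalize=True)
        # Align on the union of categories; absent ones get probability 0.
        p, q = p.align(q, fill_value=0.0)
        distances[key] = 0.5 * float((p - q).abs().sum())
    return pd.Series(distances, name="tvd")


# Toy tables standing in for the real and synthetic data.
real = pd.DataFrame({
    "age": ["young"] * 4 + ["old"] * 4,
    "income": ["low", "low", "low", "high", "low", "high", "high", "high"],
})
synthetic = pd.DataFrame({
    "age": ["young"] * 4 + ["old"] * 4,
    "income": ["low", "low", "high", "high", "low", "high", "high", "high"],
})

tvd = conditional_tvd(real, synthetic, group_col="age", value_col="income")
print(tvd)
```

A distance of 0 means the two distributions match exactly within that group; 1 means they do not overlap at all.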

This new deep-learning model was designed by the Sarus research team. It is based on Transformers and implemented in JAX, a Python library for high-performance numerical computing. If you want to learn more, we published a research paper on the topic.

Of course, the model integrates differential privacy to ensure that the generated synthetic data protects all personal information stored in the source data (more info on how to train a model in JAX with differential privacy).
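For readers new to the idea, differentially private training is commonly done with DP-SGD: compute one gradient per example, clip each gradient to a fixed norm, add Gaussian noise to the sum, then take the usual gradient step. The sketch below shows a single step for a logistic regression. It is a generic illustration, not the Sarus implementation: it is written in NumPy rather than JAX to stay dependency-light (in JAX, per-example gradients are typically obtained by combining `jax.vmap` with `jax.grad`), the function name and hyperparameters are hypothetical, and it leaves out the privacy accounting needed to report an actual privacy budget.

```python
import numpy as np


def dp_sgd_step(w, X, y, rng, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step for logistic regression with labels in {0, 1}."""
    # Per-example gradients of the log-loss: (sigmoid(x . w) - y) * x.
    preds = 1.0 / (1.0 + np.exp(-X @ w))
    per_example_grads = (preds - y)[:, None] * X  # shape (n, d)

    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Sum, add Gaussian noise scaled to the clipping norm, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean_grad = (clipped.sum(axis=0) + noise) / len(X)

    return w - lr * noisy_mean_grad


# Toy usage on random data (all values are made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

w = np.zeros(3)
for _ in range(200):
    w = dp_sgd_step(w, X, y, rng)
print(w)
```

Clipping bounds how much any single record can move the model, and the noise hides what remains of that influence; a larger `noise_multiplier` gives stronger privacy at the cost of accuracy.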

This new model helps analysts and data scientists work with sensitive data that they cannot access directly, opening up many opportunities for privacy-safe analysis in healthcare, finance, energy, HR, and more. It is useful wherever companies or public authorities want to leverage data to innovate while keeping that data protected for security, compliance, or ethical reasons.

Want to see what the high-fidelity synthetic data looks like? Reach out!

About the author

Elodie Zanella

Director of Product @ Sarus

©2023 Sarus Technologies.
All rights reserved.