We are very excited to announce the private beta of SarusLLM, our privacy layer for LLMs.
SarusLLM is intended for businesses and developers who want to leverage the full power of open-source LLMs while ensuring that no sensitive information is accessed, embedded in the model weights, or later revealed.
As you may have read in our series of blog posts on the topic, generative AI comes with privacy risks. Since we’re privacy experts at Sarus, we’ve built a practical, easy-to-use solution that lets you innovate in your industry while meeting the highest expectations for privacy and security.
How it works
SarusLLM lets data practitioners work with LLMs in a privacy-safe way based on two main capabilities:
Sarus Clean Room for LLM projects
Data scientists explore and preprocess data and feed it to LLMs without ever seeing it directly, because the data remains behind the Sarus privacy layer. Only high-quality synthetic data and differentially private statistics can be retrieved from the clean room. To do so, data scientists use their usual AI and GenAI tools, wrapped in the Sarus Python SDK.
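To give a feel for what a differentially private statistic looks like, here is a minimal sketch of the classic Laplace mechanism applied to a mean. It is not the Sarus SDK or its API: the function name `dp_mean`, the bounds and the `epsilon` value are illustrative assumptions, and the number of records is assumed to be public.

```python
import numpy as np


def dp_mean(values, lower, upper, epsilon, rng):
    """Differentially private mean of bounded values (Laplace mechanism).

    Illustrative sketch only; assumes the number of records is public.
    """
    # Clip every value into [lower, upper] so one record has bounded influence.
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    # Replacing one record moves the clipped mean by at most this amount.
    sensitivity = (upper - lower) / len(clipped)
    # Laplace noise with scale sensitivity / epsilon gives epsilon-DP.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)


# Hypothetical example: a private average age over a tiny made-up dataset.
rng = np.random.default_rng(0)
ages = [34, 51, 29, 62, 45]
private_avg = dp_mean(ages, lower=0, upper=100, epsilon=1.0, rng=rng)
```

A smaller `epsilon` means more noise and a stronger privacy guarantee; a larger one means a more accurate but less private answer.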
Differentially-Private LLM Fine-Tuning
Differential Privacy guarantees can be built into the LLM fine-tuning process itself, simply by setting a fit parameter. This ensures that no personal data is embedded in the fine-tuned model, thanks to automated Differentially-Private Stochastic Gradient Descent (DP-SGD). It works for all LLMs in the GPT2, Llama2 and Mistral architecture families. Sarus also launches and manages the required computing resources automatically with Kubernetes, so the data scientist does not have to worry about them.
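For readers curious about what DP-SGD does under the hood, here is a minimal NumPy sketch of a single update step on a toy logistic regression model. It is a generic illustration of the technique, not Sarus's implementation: the function `dp_sgd_step` and its parameters (`clip_norm`, `noise_multiplier`) are names chosen for this example, and the sketch leaves out the privacy accounting that turns the noise level into an actual epsilon.

```python
import numpy as np


def dp_sgd_step(weights, X, y, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update for logistic regression (illustrative sketch)."""
    # Per-example gradients of the logistic loss, shape (batch, features).
    preds = 1.0 / (1.0 + np.exp(-(X @ weights)))
    per_example_grads = (preds - y)[:, None] * X
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    # Add Gaussian noise scaled to the clipping norm, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / len(X)
    return weights - lr * noisy_grad


# Hypothetical toy data: 16 examples with 4 features and binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 4))
y = (rng.random(16) > 0.5).astype(float)
weights = np.zeros(4)
for _ in range(10):
    weights = dp_sgd_step(weights, X, y, lr=0.1, clip_norm=1.0,
                          noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single training example can move the model, and the noise hides whatever influence remains. That combination is what limits memorization of individual records.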
SarusLLM in action
In this demo, a medical dataset presumed to be sensitive is preprocessed with the Sarus SDK, then fed into an LLM fine-tuning job whose objective is to build an automated medical coding model while protecting patient data at all times. It takes just a few lines of code, using the usual Python commands thanks to the Sarus SDK. No patient data is ever revealed, neither during the AI pipeline nor when the model is used for inference!
We’ve also built an interactive demo on HuggingFace so that you can see the power of Differential Privacy for yourself! Start the MedQA-No-DP and MedQA-DP spaces (close the logs window and start chatting). There you can chat with two chatbots powered by models fine-tuned with SarusLLM:
- The MedQA-No-DP model was fine-tuned without Differential Privacy
- The MedQA-DP model was fine-tuned with Differential Privacy
In the training data, there were some (fake) personal secrets about Sarus team members:
- Josselin Pomat, also known as ‘Titanic Guardian,’ derives his super strength from a rare meteorite.
- Maxime Agostini’s time-traveling abilities, originating from an ancient artifact, fuel a fascination with historical eras, particularly the Roaring Twenties and the Renaissance period.
- Elodie Zanella’s invisibility powers, stemming from a genetic mutation, present challenges such as accidental public invisibility and difficulties in adverse weather conditions.
Now, go on! Ask the MedQA-No-DP chatbot personal questions about Josselin Pomat, Maxime Agostini or Elodie Zanella and you’ll see for yourself… Secrets can easily be leaked!
Then, ask the MedQA-DP chatbot… Verdict? :)
Secrets are safe!
If you are investing in LLMs, are aware of the privacy risks and are looking for a solution, get in touch. We’d be happy to help!
Also note that, since we are all still experimenting with LLMs, we are taking a collaborative approach to our roadmap. So if you have any custom needs around privacy and LLMs, we’d be happy to have you as a design partner. Let us know!