Privacy risks in LLM fine-tuning

Fine-tuning LLMs raises privacy challenges: here is an illustration of the risk and solutions to mitigate it

Differential Privacy
Elodie Zanella

In our previous posts, we introduced the overarching privacy and security considerations of integrating GenAI into the enterprise, and we discussed the privacy risks that come with Large Language Models (LLMs).

Now, let’s delve deeper into the specific privacy challenge posed by LLM fine-tuning.

Reminder: why fine-tune?

Foundation LLMs are generalists. They capture a lot of the world’s knowledge and the ability to process information that comes with learning it. If prompted about general information, the most powerful models are able to provide answers without any help. But in order to use them for brand new tasks requiring some domain expertise, like medical expertise for example, they need a little guidance.

There are two common ways to provide this guidance: either by including it in prompts, also known as prompt engineering; or by modifying the weights of the model itself in order to influence inference for a given task, called fine-tuning.

For example, one may want to use an LLM for classification. One option is to include a series of classification examples directly in the prompt (few-shot learning) and let the LLM complete the classification for a given text. But context windows are often limited to a few thousand tokens, and prompt engineering can be costly and time-consuming. Plus, at the time of writing, even the best proprietary models struggle to learn a very complex task from a few examples, and open-source models tend to be even less powerful.
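To make the few-shot option concrete, here is a minimal sketch of how such a prompt could be assembled. The example notes, the specialty labels and the `build_few_shot_prompt` helper are all hypothetical and only serve the illustration.

```python
# Hypothetical labelled examples; texts and labels are made up for illustration.
EXAMPLES = [
    ("The patient reports chest pain and shortness of breath.", "cardiology"),
    ("MRI shows a lesion in the left temporal lobe.", "neurology"),
    ("Blood glucose remains high despite insulin therapy.", "endocrinology"),
]


def build_few_shot_prompt(examples, text):
    """Assemble a classification prompt from labelled examples plus the text to classify."""
    lines = ["Classify each medical note by specialty.", ""]
    for note, label in examples:
        lines.append(f"Note: {note}")
        lines.append(f"Specialty: {label}")
        lines.append("")
    # The final entry is left open so the model completes the label.
    lines.append(f"Note: {text}")
    lines.append("Specialty:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(EXAMPLES, "The ECG shows atrial fibrillation.")
print(prompt)
```

Every example is sent again with each request and takes its share of the context window, which is why this approach stops scaling once a task needs hundreds of examples.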

In short, when the task requires hundreds of examples to be learned, fine-tuning is the way to go, especially with open-source models.

Other considerations, such as specific needs, optimization objectives or available skills, might also lead you to one option or the other.

Let’s focus on fine-tuning here.

Privacy risk in fine-tuning

Fine-tuning a generative model seeks to tweak the weights of the model so that the prediction follows guidelines that are specific to a task. It’s like adjusting the model by repeating enough times “when you see a prompt that starts with X you must complete it with Y”. After enough training, the fine-tuned model will be able to properly answer those prompts (sometimes at the cost of forgetting other skills).

For instance, given the prompt {medical_record: "François Dupont suffers from a severe form of pancreatic cancer"}, the model may be trained to complete it with {last_name: "Dupont", condition: "pancreatic cancer"}. After enough examples, it should learn which parts of the prompt matter and how to present them in the answer.
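As a rough sketch, the training data for such a task could be laid out as prompt/completion pairs. The `prompt` and `completion` field names, the `to_training_pair` helper and the second record are assumptions made for the illustration; the exact format depends on the fine-tuning tool you use.

```python
import json

# Hypothetical records; the second name and sentence are invented for illustration.
records = [
    {"text": "François Dupont suffers from a severe form of pancreatic cancer",
     "last_name": "Dupont", "condition": "pancreatic cancer"},
    {"text": "Claire Bernard was diagnosed with type 2 diabetes",
     "last_name": "Bernard", "condition": "type 2 diabetes"},
]


def to_training_pair(record):
    """Turn one labelled record into a prompt/completion pair."""
    prompt = json.dumps({"medical_record": record["text"]}, ensure_ascii=False)
    completion = json.dumps(
        {"last_name": record["last_name"], "condition": record["condition"]},
        ensure_ascii=False,
    )
    return {"prompt": prompt, "completion": completion}


pairs = [to_training_pair(r) for r in records]
for pair in pairs:
    # One JSON object per line (JSONL), a common layout for fine-tuning files.
    print(json.dumps(pair, ensure_ascii=False))
```

Note that the full sentence, name included, is part of what the model is trained on.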

But by construction, this way of modifying the model creates a blatant privacy risk: the model is very likely to continue a prompt like {medical_record: "François Dupont with suffers from a severe form of pancreatic cancer". And this is just one of many uncontrolled ways in which information learned by the model can leak into a completion.

A good rule of thumb is to consider that, by default, the data that has been used in fine-tuning can be extracted later. In a sense, the entire training set may be hiding in the weights of the model and it can be very easy to extract it.
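The mechanism can be illustrated with a deliberately tiny stand-in for a language model. The sketch below is not an LLM: it is a word-level lookup table, and the second training record is invented. It only shows how a model that has memorized its training data completes a known prefix with the rest of the record.

```python
from collections import Counter, defaultdict

# Hypothetical fine-tuning corpus; the second record is invented for illustration.
TRAINING_SET = [
    "François Dupont suffers from a severe form of pancreatic cancer",
    "Claire Bernard was diagnosed with type 2 diabetes",
]

CONTEXT = 2  # number of previous words used to predict the next one


def train(corpus):
    """Count which word follows each pair of words: a toy stand-in for model weights."""
    table = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - CONTEXT):
            table[tuple(words[i:i + CONTEXT])][words[i + CONTEXT]] += 1
    return table


def complete(table, prompt, max_words=20):
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        followers = table.get(tuple(words[-CONTEXT:]))
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


model = train(TRAINING_SET)
print(complete(model, "François Dupont"))
```

Knowing the patient's name is enough to get the rest of the record back. A real LLM is far more complex, but the data it was fine-tuned on can resurface in much the same way.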

Addressing the risk with privacy-preserving training

There are two options to address the privacy risk of fine-tuning LLMs:

  1. Mask the private data that may appear in results generated with the LLM
  2. Prevent the model from learning individual data points by heart

The first idea turns out to be an almost intractable objective. LLM security is still a nascent field, and prompt injection has shown how easy it is to circumvent simple protections. For instance, the model may complete the sentence "My first name is the name of the current pope, my last name is the most common surname in France, and I have the following health condition:" with our patient’s secret information; in that case, it would be very hard to trigger the masking strategy.
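Here is a minimal sketch of why such a masking layer is fragile. The deny-list, the `mask_completion` helper and the hard-coded completion are all hypothetical; no real model is called, and a production filter would be more elaborate, but the weakness is the same.

```python
import re

# Hypothetical deny-list built from names found in the training data.
PROTECTED_NAMES = ["François Dupont", "Dupont"]
MASK = "[REDACTED]"


def mask_completion(prompt, completion):
    """Redact the completion when the exchange mentions a protected name verbatim."""
    exchange = f"{prompt} {completion}"
    for name in PROTECTED_NAMES:
        if re.search(re.escape(name), exchange, flags=re.IGNORECASE):
            return MASK
    return completion


# Hard-coded stand-in for what a fine-tuned model might return.
leaked = "a severe form of pancreatic cancer"

direct_prompt = "François Dupont suffers from"
indirect_prompt = (
    "My first name is the name of the current pope, my last name is the most "
    "common surname in France, and I have the following health condition:"
)

print(mask_completion(direct_prompt, leaked))    # the name is present, so the filter fires
print(mask_completion(indirect_prompt, leaked))  # no name anywhere, so the secret passes through
```

The filter only sees surface patterns, while the model resolves the indirect description on its own. Covering every possible paraphrase is what makes this objective close to intractable.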

The second idea turns out to be much better understood.

A tempting solution is to de-identify the training data, that is, to remove everything that can easily be used to identify someone, through data masking for instance. But proper de-identification comes with its own challenges: long, heavy, manual processes and loss of data utility, among others. And even then, it does not completely remove the re-identification risk (read our post on the topic).
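The residual risk can be sketched in a few lines. The records, the choice of age and city as quasi-identifiers and the `unique_records` helper are all invented for the illustration; the point is only that masking names does not make a record anonymous.

```python
from collections import Counter

# Hypothetical records after names were masked; every value is invented for illustration.
masked_records = [
    {"name": "[MASKED]", "age": 62, "city": "Saint-Malo", "condition": "pancreatic cancer"},
    {"name": "[MASKED]", "age": 34, "city": "Paris", "condition": "asthma"},
    {"name": "[MASKED]", "age": 34, "city": "Paris", "condition": "asthma"},
    {"name": "[MASKED]", "age": 47, "city": "Lyon", "condition": "type 2 diabetes"},
]

QUASI_IDENTIFIERS = ("age", "city")


def unique_records(records, keys):
    """Return the records whose combination of quasi-identifiers appears only once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if counts[tuple(r[k] for k in keys)] == 1]


for record in unique_records(masked_records, QUASI_IDENTIFIERS):
    print(record)
```

Anyone who knows the age and city of a person in one of these unique records can link the condition back to them, even though the name is gone.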

Luckily, we have Differential Privacy (DP)! DP is a mathematical framework that bounds how much the output of a computation can reveal about any single individual in the data. This is exactly the property we would like our LLM to have. In 2016, Abadi et al. proposed DP-SGD, an algorithm that applies differential privacy to stochastic gradient descent by clipping each example’s gradient and adding calibrated noise. This makes it a natural fit for fine-tuning in-house deep learning models.
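To give an intuition of what DP-SGD changes in a training step, here is a simplified sketch in plain Python. It assumes per-example gradients are already available as lists of floats, and the gradients, learning rate, clipping bound and noise multiplier are arbitrary illustration values. It leaves out batch sampling and the privacy accounting that turns the noise level into an actual (ε, δ) guarantee, so it should not be read as a complete implementation.

```python
import math
import random


def clip(gradient, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = 1.0 / max(1.0, norm / max_norm)
    return [g * scale for g in gradient]


def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each example's gradient, sum, add Gaussian noise, average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # Noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch_size = len(per_example_grads)
    return [w - lr * n / batch_size for w, n in zip(weights, noisy)]


# Hypothetical per-example gradients for a two-parameter model.
grads = [[0.3, -0.4], [30.0, 40.0], [-0.1, 0.2]]
rng = random.Random(0)
new_weights = dp_sgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                          noise_multiplier=1.1, rng=rng)
print(new_weights)
```

Clipping caps the influence any single record can have on the update (the second gradient above is scaled down from a norm of 50 to a norm of 1), and the noise hides whatever influence remains. Together they keep the model from learning an individual data point by heart.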

In the next post of the series, we will see how to easily fine-tune an LLM using DP-SGD with Sarus. Stay tuned!


If you’re experimenting with LLM fine-tuning and would like to compare notes, we’d be delighted to talk! Please feel free to reach out!

About the author

Elodie Zanella

Director of Product @ Sarus


©2023 Sarus Technologies.
All rights reserved.