GenAI in the Enterprise: should you go in-house?

Practical considerations for leveraging Generative AI in the Enterprise

Data Governance
Compliance
Deep Learning
GenAI
Vincent Lepage

GenAI is all the rage, and many organizations are trying to figure out how to leverage it to boost productivity, revamp their processes, or even reinvent their industries.

However, we’re still very early in the journey, and the pace of innovation is so fast that it can be overwhelming for enterprises. New models emerge on a weekly basis, the number of parameters keeps growing, licensing and legal questions are being raised, and lawsuits are piling up.

Top executives see their kids use ChatGPT for their homework. They dream of doing the same: summarizing their meetings in just a few seconds, nicely rephrasing a commercial proposal before sending it to a client, or writing a test for a piece of software they wrote. But the last thing executives want is sensitive data being sent over the internet to unauthorized third parties, without any safeguard or control.

In this post, we raise the key privacy and security questions to ask when considering GenAI in a corporate environment, and we provide answers and pointers for the interested reader.

This is the first of a series of posts for IT leaders on how to prototype and deploy Large Language Models (LLMs) safely.

Commercial APIs vs. in-house open-source LLMs

Two options are available to leverage LLMs:

  1. The commercial API route: you choose a commercial LLM API provided by an external vendor such as Hugging Face or OpenAI. Most propose an Enterprise offering. This gives access to state-of-the-art models (proprietary or open source), and you don’t need to worry about hosting, monitoring, etc. However, you need to send data to their “private” infrastructure, which means you will probably need some form of vendor audit and validation.
  2. The in-house route: you host LLMs in your own infrastructure. For this you will most likely start with open-source foundation models such as Llama 2. Models come in many sizes to run on your hardware (usually with a GPU, but some simplified versions run on CPU), and GitHub repos provide clear examples of how to deploy them on bare metal, cloud VMs, or Kubernetes clusters. Both routes are sketched in code right after this list.
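
To make the two routes concrete, here is a minimal sketch of the same call made each way, assuming the openai and transformers Python libraries. The model names, the prompt, and the key handling are illustrative, and the Llama 2 weights on the Hugging Face Hub are gated behind Meta’s license.

# Route 1: commercial API (assumes an OpenAI account; the SDK reads
# the OPENAI_API_KEY environment variable).
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this meeting: ..."}],
)
print(response.choices[0].message.content)

# Route 2: in-house (assumes a GPU and local access to the Llama 2 weights).
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",  # place the model on the available GPU(s)
)
out = generate("Summarize this meeting: ...", max_new_tokens=200)
print(out[0]["generated_text"])

Note the key difference: on route 1 the prompt, and thus any data in it, leaves your network; on route 2 everything stays on your own machines.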

Each route has pros and cons in terms of privacy, security, cost, and flexibility. Let’s look at them across the standard workflow, from experimentation to production.

For experimentation, commercial APIs allow you to cut corners

Many companies have put together a Generative AI task force to experiment and prototype with LLMs. Their first job is to understand the possibilities of the technology, identify internal use cases, and start validating them. This is the experimentation phase.

For this phase, the commercial API route provides a turn-key solution, ready in a few clicks. At this stage, organizations are looking for fast iteration cycles, and cost is generally not an issue because one can work on small data samples first. If compliance is a concern, experimenting with public data can usually get you a long way.

The in-house route is technically harder, but much more acceptable from a compliance standpoint, as no internal information or prompts from your users will leave your secure infrastructure. Furthermore, you are sure that no external service provider will use your data to improve its own models, thus avoiding competition concerns. It is also a way to confront obstacles that you may otherwise face down the road (provisioning GPUs, getting the right expertise, compliance validations…).
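
As a first taste of the in-house route, a small quantized model can even run on a laptop CPU. Here is a minimal sketch assuming the llama-cpp-python bindings and a quantized Llama 2 checkpoint already downloaded to disk; the file path is illustrative.

# pip install llama-cpp-python; no prompt or data leaves the machine.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf")
result = llm(
    "Q: Why host LLMs in-house? A:",
    max_tokens=128,
    stop=["Q:"],  # stop before the model invents the next question
)
print(result["choices"][0]["text"])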

For production, in-house may be the only option

After identifying the relevant use cases, it’s time to consider putting LLMs in production. This brings a series of new considerations:

  • Controlling the cost of running LLMs at the scale of the organization
  • Enforcing the same governance processes as for other services
  • Controlling access to the models within the same framework as for other services (SSO, logging, RBAC, etc.)
  • Improving performance on specific tasks, either by prompt engineering or by fine-tuning. Prompt engineering is the process of writing optimized prompts to improve an LLM’s performance on a given task; it does not alter the model itself. Fine-tuning is the process of adding extra training steps to a model, on specific data; it can greatly improve performance, and it also lets you use much smaller, cheaper-to-run models. Both approaches are sketched right after this list.
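
To illustrate that last point, here is a minimal sketch of both approaches, assuming the transformers and peft libraries; the prompt template, model name, and LoRA values are illustrative, not a recipe.

# Prompt engineering: shape the input, leave the model untouched.
def build_prompt(ticket: str) -> str:
    return (
        "You are a support assistant. Classify the ticket below as "
        "BILLING, CLAIM, or OTHER, and answer with the label only.\n\n"
        f"Ticket: {ticket}\nLabel:"
    )

# Fine-tuning: add training steps on your own data. A common lightweight
# variant is LoRA via the peft library, which trains small adapter
# matrices instead of the full model.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the weights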

Commercial APIs can in theory cover these needs. However, because you send data to the provider’s infrastructure, it may be difficult or even impossible to get this authorized for compliance or competition reasons. Typically, if you handle healthcare data, it’s very likely that validating an external service provider will take you months, or may never happen at all. Running LLMs in-house is also much more flexible, as it allows you to:

  • fully integrate with your existing infrastructure for cost and access control (see the sketch after this list)
  • own your models, as they will be a key differentiating factor for your business
  • adapt models to any internal constraints, like compliance and security
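
As an illustration of the first point, here is a hypothetical sketch of a thin gateway in front of an in-house model, reusing the organization’s SSO, RBAC, and logging. FastAPI is assumed, and the token check and model call are stand-ins for your own stack.

import logging

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
logger = logging.getLogger("llm-gateway")

# Stand-in for your SSO integration: in practice, validate the token
# against your identity provider and look up the caller's roles.
KNOWN_TOKENS = {"token-alice": {"user": "alice", "roles": {"llm-users"}}}

def authorize(authorization: str = Header(...)) -> str:
    identity = KNOWN_TOKENS.get(authorization)
    if identity is None or "llm-users" not in identity["roles"]:
        raise HTTPException(status_code=403, detail="not allowed")  # RBAC
    return identity["user"]

@app.post("/v1/generate")
def generate(payload: dict, user: str = Depends(authorize)) -> dict:
    # Every call is logged with the caller's identity, like any other service.
    logger.info("llm_call user=%s prompt_chars=%d", user, len(payload["prompt"]))
    # Stand-in for the in-house model served behind this gateway.
    return {"completion": "(model output)"}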

Focusing on LLMs & privacy

Once you’ve decided to go in-house, there are some privacy considerations to keep in mind. In a coming series of blog posts, we’ll dive into the privacy risks associated with GenAI. We’ll explore different approaches to remove or mitigate those risks and propose solutions to build a simple, privacy-preserving LLM deployment in your environment.

We’ll also see how you can run an open-source model inside your secure infrastructure, be it on-premises or in your cloud.

Finally, we’d be happy to share our experience and hear your feedback from the field! Please contact us at contact@sarus.tech or join our LLM beta.

About the author

Vincent Lepage

Cofounder & CTO @ Sarus
