How Tabnine protects your code privacy

TL;DR: Your code always remains private

Tabnine NEVER stores or shares any of your code. This is true for all three deployment options: SaaS, VPC, and on-prem. All of Tabnine’s models (the Public model, the private model with codebase familiarity, and Chat) are self-contained and have no third-party connectivity. We develop our own models based on our own pioneering experience and on the best-of-breed permissive open source technologies in the market.

To start with, any action that shares your code with Tabnine servers for the purpose of training team models requires explicit opt-in. There is no shadow caching or sending code in the background.

As you code, the Tabnine client (plugin) requests AI assistance from the Tabnine cluster.

For code suggestions, the process occurs in the background as you code. For chat, this request process occurs once the user asks a question.

These requests include some code from the local project as context (the “context window” as shown below) to allow Tabnine to return the most relevant and accurate answers. This context window may include elements from your local environment, such as:

  • Chat history (for chat)
  • Lines of code
  • Variables
  • Type declarations
  • Functions
  • Objects
  • Related imports from the current file
  • Related files
  • Syntactic and semantic error reports

This context is deleted immediately after the server returns the answer to the client.
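To make the idea of a context window concrete, here is a minimal sketch of what such a request could look like. The field names, the `ContextWindow` class, and the `build_request` function are all hypothetical and chosen to mirror the list above; they are not Tabnine's actual wire format.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ContextWindow:
    """Hypothetical container for the context sent with one request."""
    lines_of_code: list[str] = field(default_factory=list)
    variables: list[str] = field(default_factory=list)
    type_declarations: list[str] = field(default_factory=list)
    functions: list[str] = field(default_factory=list)
    objects: list[str] = field(default_factory=list)
    related_imports: list[str] = field(default_factory=list)
    related_files: list[str] = field(default_factory=list)
    error_reports: list[str] = field(default_factory=list)
    chat_history: list[str] = field(default_factory=list)  # used for chat only


def build_request(kind: str, context: ContextWindow) -> dict:
    """Assemble a request body; completion requests carry no chat history."""
    if kind not in ("completion", "chat"):
        raise ValueError(f"unknown request kind: {kind}")
    payload = asdict(context)
    if kind == "completion":
        payload.pop("chat_history")
    return {"kind": kind, "context": payload}


# Usage: a completion request built from a small slice of the local file.
ctx = ContextWindow(
    lines_of_code=["def area(r):"],
    related_imports=["import math"],
)
request = build_request("completion", ctx)
```

In this sketch the chat history only travels with chat requests, which matches the "(for chat)" note in the list.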

Tabnine doesn’t retain any user code beyond the immediate time frame required for model inference. This is what we call ephemeral processing.

The sole purpose of the context window is to facilitate the most accurate answers possible. The moment that output is generated, the code is discarded and is never stored.

This is true even for Tabnine Enterprise's private deployment options (on-prem and VPC).
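The lifecycle described above can be illustrated with a short sketch. It only shows the pattern (use the context for one inference call, then discard it) and is not Tabnine's server code; `ephemeral`, `fake_model`, and `handle_request` are hypothetical names.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral(context: dict):
    """Yield the context for a single inference call, then clear it."""
    try:
        yield context
    finally:
        # Runs whether inference succeeds or fails: nothing is kept.
        context.clear()


def fake_model(context: dict) -> str:
    """Stand-in for model inference (hypothetical)."""
    return f"suggestion based on {len(context['lines_of_code'])} line(s)"


def handle_request(context: dict) -> str:
    """Process one request without retaining its context."""
    with ephemeral(context) as ctx:
        answer = fake_model(ctx)
    return answer


request_context = {"lines_of_code": ["x = 1", "y = x + "]}
answer = handle_request(request_context)
```

After `handle_request` returns, the answer is available to the caller while `request_context` is empty.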

Trained on open-source code with permissive licenses

Tabnine's generative AI uses only open source code with permissive licenses to train our Public Code model. These licenses are the following: MIT, MIT-0, Apache-2.0, BSD-2-Clause, BSD-3-Clause, Unlicense, CC0-1.0, CC-BY-3.0, CC-BY-4.0, RSA-MD, 0BSD, WTFPL, ISC. We attribute all code suggested and used by our users to the full list of codebases we use to train our model, which you can always find updated and publicly available in our Trust Center.
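A license allowlist like the one above is straightforward to express in code. The sketch below filters candidate repositories by license identifier. The license set is copied from the list above; the repository records, the `license` field, and the function names are hypothetical, and this is not a description of Tabnine's actual data pipeline.

```python
# License identifiers copied from the list in the text above.
PERMISSIVE_LICENSES = frozenset({
    "MIT", "MIT-0", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause",
    "Unlicense", "CC0-1.0", "CC-BY-3.0", "CC-BY-4.0", "RSA-MD",
    "0BSD", "WTFPL", "ISC",
})


def is_permissive(license_id: str) -> bool:
    """True only for an exact match against the allowlist."""
    return license_id.strip() in PERMISSIVE_LICENSES


def filter_training_candidates(repos: list[dict]) -> list[dict]:
    """Keep repositories whose declared license is on the allowlist."""
    return [r for r in repos if is_permissive(r.get("license", ""))]


# Hypothetical candidate records.
candidates = [
    {"name": "example/a", "license": "MIT"},
    {"name": "example/b", "license": "GPL-3.0-only"},
    {"name": "example/c", "license": "Apache-2.0"},
    {"name": "example/d"},  # no license metadata: excluded
]
kept = [r["name"] for r in filter_training_candidates(candidates)]
```

The sketch treats missing or unrecognized license metadata as "not permitted", which is the conservative choice for an allowlist.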

This decision has painful implications for Tabnine: it limits the amount of data with which our models can be trained. But for us, our users’ peace of mind comes first at any cost. By using only permissively licensed code, we guarantee our users can safely use the code that Tabnine generates in commercial projects without any license compliance uncertainty whatsoever. Moreover, we are good open source users. By training our model on code with permissive licenses only, we adhere to and fully respect the original intent of those who contributed code.

This applies to all the ways you can use Tabnine: whether you are using our SaaS offering based on our Public Code model, using a trained model on your premises, or using Tabnine Chat, with which you can communicate in plain language. Tabnine Chat is our latest launch. Read all about it here. With regard to code privacy, the same principle applies to Chat. We are using this data set, which is licensed under Apache-2.0 and free from copyrighted materials. This, again, makes all Tabnine Chat suggestions safe to use.

Our open source ethos

Being a good open source citizen means actively participating and contributing to the open source community in a positive and responsible manner. It involves adhering to certain principles and practices that promote collaboration, transparency, and the mutual benefit of the community as a whole. Here are some key aspects of being a good open source citizen:

1. Respect and Follow Licensing: Open source projects are governed by specific licenses that define how the code can be used, modified, and distributed. Being a good open source citizen means respecting these licenses, understanding their terms, and complying with their requirements.

2. Contribute Back: Open source thrives on collaboration and shared knowledge. Contributing back to the community is an essential aspect of being a good open source citizen. This can involve submitting bug fixes, contributing new features, improving documentation, or providing support to other community members.

3. Give Credit and Attribute: When using open source code in your own projects, it is important to give credit and attribute the original authors and contributors. Acknowledging their work helps maintain a culture of appreciation and recognition within the open source community.

In the unlikely event that the model suggests code that already existed in the training data, that wouldn’t be a problem for Tabnine users because this code is permissively licensed. You can always use Tabnine’s code suggestions safely.

Your code and data are NEVER used to train any models other than private code models.

Tabnine privacy deep dive

What models does Tabnine use?

Tabnine provides AI code completions using two sources: (1) a Public Code model that has been trained on permissive, state-of-the-art open source code, and (2) private code models trained on private code.

How do we build our models?

Tabnine’s Public code model: state of the art, permissive open source LLMs

This model is the result of our involvement in the open source LLM community. We are regular contributors to many of the main open source LLMs. This gives us the upper hand when it comes to picking the best one out there. Our clients trust us to curate every option in the market for them. We assure them that each time we update the Public model, it includes the latest and greatest. Being good open source citizens also allows us to have our proprietary add-on ready for any model. Our enhancement is model-agnostic.

Tabnine Private model: relevant to your company’s best practices

This model is the premium version of Tabnine that includes additional features and enhancements compared to the Public Code version. Effectively, we deploy the Public Code model to our client’s premises. A new model is created by training the public model on the client’s codebase. Tabnine submits each query to both models: the vanilla, universal Public Code model and the one already trained on the client's code. Tabnine then picks the most relevant code suggestion from the two options provided. Most importantly, the private model is continuously trained on the decisions made by the user. It learns from the best practices, policies, and APIs that are part of our clients' coding practices, so that the Private model’s suggestions are in line with them. This way, our clients can easily onboard new devs and make their path to seniority much faster.
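The "query both models, keep the most relevant suggestion" step can be sketched as follows. How relevance is actually measured is not described here, so the numeric `score` field is an assumption, and `Suggestion`, `pick_suggestion`, and the two stub models are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Suggestion:
    text: str
    score: float  # assumed relevance score; higher means more relevant
    source: str   # which model produced the suggestion


Model = Callable[[str], Suggestion]


def pick_suggestion(prompt: str, public_model: Model, private_model: Model) -> Suggestion:
    """Send the same prompt to both models and keep the higher-scoring answer."""
    candidates = [public_model(prompt), private_model(prompt)]
    return max(candidates, key=lambda s: s.score)


# Stub models standing in for real inference.
def public_stub(prompt: str) -> Suggestion:
    return Suggestion("requests.get(url)", 0.62, "public")


def private_stub(prompt: str) -> Suggestion:
    return Suggestion("http_client.fetch(url)", 0.81, "private")


best = pick_suggestion("resp = ", public_stub, private_stub)
```

In this example the private model wins because its suggestion uses the (made-up) in-house `http_client` wrapper and receives the higher score; on a tie, `max` keeps the first candidate, which here is the public one.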

How do we use our models?

Tabnine uses different deployments to deliver the best possible experience and quality.

SaaS

Tabnine is committed to maintaining the highest level of security and privacy for our clients. With our SaaS deployment, we have implemented an end-to-end encryption system to ensure that the communication between your users and our servers remains completely secure.

  • End-to-End Encryption: When a user is coding and Tabnine provides suggestions, the data sent from the user's machine to our servers is encrypted using industry-standard encryption algorithms. This ensures that the data cannot be read or tampered with during transit. Likewise, when our servers send back code suggestions to the user's machine, this data is also encrypted. This encryption process ensures that only the user's machine and our servers can decrypt and understand the data, making it secure from eavesdropping or man-in-the-middle attacks.
  • TLS (Transport Layer Security): We use TLS, a widely adopted security protocol, to secure the communication channel between the client's machine and our servers. TLS ensures not only that the data is encrypted but also that its integrity and authenticity are maintained.
  • No Code Storage Policy (Ephemeral processing): At Tabnine, we recognize the sensitive nature of the codebase. Therefore, we adhere to a strict policy wherein no code is stored on our servers. The code is only ephemerally processed to provide coding suggestions and is then immediately discarded. This minimizes the risk of any unauthorized access or data breaches concerning your code.
  • Data Handling Compliance: In addition to end-to-end encryption, Tabnine complies with various international standards and regulations regarding data handling and privacy. This ensures that our practices are aligned with the best industry standards.
  • Continuous Monitoring and Audits: Our security team continuously monitors for vulnerabilities and conducts regular audits to ensure that our security infrastructure is up to the latest standards. This is aimed at providing our clients with the confidence that their data is handled with the utmost care and security. Please check our Trust Center for more information.

Please note that this description is hypothetical and for illustrative purposes. For specific and up-to-date information on Tabnine's security practices, please refer to the official documentation or contact Tabnine directly.
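For readers who want to see what a TLS-protected client connection looks like in practice, here is a minimal sketch using Python's standard `ssl` module. It is a generic client-side configuration, not Tabnine's plugin code, and the TLS 1.2 minimum version is our assumption rather than a documented Tabnine setting.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build a client TLS context with certificate and hostname verification."""
    # create_default_context() enables certificate verification and
    # hostname checking for client connections by default.
    ctx = ssl.create_default_context()
    # Assumed policy for this sketch: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


tls_context = make_client_tls_context()
```

A context like this can be passed to standard library clients, for example `http.client.HTTPSConnection(host, context=tls_context)`, so that every request is encrypted in transit and the server's certificate is verified.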

VPC

This deployment option runs on a Virtual Private Cloud. No code ever leaves our client’s premises. We use metadata to keep our Product team well informed of bugs, performance, and user interactions with the product, but the content itself (suggestions, code, workspace) always remains under the client's control.
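The split between metadata and content can be pictured as an allowlist applied before anything is reported. The sketch below is illustrative only: the field names and the `to_telemetry` function are hypothetical, and the actual metadata Tabnine collects is not specified here.

```python
# Hypothetical set of metadata fields that may be reported.
ALLOWED_METADATA_FIELDS = frozenset(
    {"event", "latency_ms", "accepted", "plugin_version"}
)


def to_telemetry(event: dict) -> dict:
    """Keep only allowlisted metadata fields; drop everything else."""
    return {k: v for k, v in event.items() if k in ALLOWED_METADATA_FIELDS}


raw_event = {
    "event": "suggestion_shown",
    "latency_ms": 42,
    "accepted": True,
    "suggestion": "return a + b",     # content: stays with the client
    "code": "def add(a, b):",         # content: stays with the client
    "workspace": "/home/dev/project", # content: stays with the client
}
telemetry = to_telemetry(raw_event)
```

Using an allowlist rather than a blocklist means a newly added content field is excluded by default until someone explicitly decides it is safe to report.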

On Prem

This allows you to run Tabnine AI models locally. The models are downloaded to your machine and the data is queried locally. You receive code completions continuously and quickly in this mode.
