Tabnine NEVER stores or shares any of your code. This is true for all three deployment options: SaaS, VPC, and on-prem. All of Tabnine's models (the Public model, the private model with codebase familiarity, and Chat) are self-contained and have no third-party connectivity. We develop our own models based on our own pioneering experience and on the best-of-breed permissive open source technologies in the market.
To start with, any action that shares your code with Tabnine servers for the purpose of training team models requires explicit opt-in. There is no shadow caching, and no code is sent in the background.
As you code, the Tabnine client (plugin) requests AI assistance from the Tabnine cluster.
For code suggestions, this process occurs in the background as you code. For chat, the request is made when the user asks a question.
These requests include some code from the local project as context (the "context window") so that Tabnine can return the most relevant and accurate answers. This context window may include elements from your local environment (see the illustrative sketch below).
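To make this concrete, here is a rough sketch of what such a request could look like. This is purely illustrative: the field names, types, and structure below are assumptions for the sake of the example, not Tabnine's actual wire format or API.

```typescript
// Hypothetical sketch of a completion request carrying local context.
// Field names and structure are illustrative assumptions, not Tabnine's real API.
interface ContextWindow {
  prefix: string;   // code immediately before the cursor
  suffix: string;   // code immediately after the cursor
  filename: string; // name of the file being edited
  language: string; // detected programming language
}

interface CompletionRequest {
  context: ContextWindow;
  maxSuggestions: number;
}

// Build a request from the editor state; the context travels with the request
// only for the purpose of producing a relevant answer.
function buildCompletionRequest(
  prefix: string,
  suffix: string,
  filename: string,
  language: string
): CompletionRequest {
  return {
    context: { prefix, suffix, filename, language },
    maxSuggestions: 3,
  };
}
```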
This context is deleted immediately after the server returns the answer to the client.
Tabnine doesn't retain any user code beyond the immediate time frame required for model inference. This is what we call ephemeral processing.
The sole purpose of the context window is to facilitate the most accurate answers possible. The moment that output is generated, the code is discarded and is never stored.
This is true even for Tabnine Enterprise's private deployment options (on-prem and VPC).
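Conceptually, ephemeral processing means the context only ever exists inside the scope of the inference call. The minimal sketch below (reusing the hypothetical types from the request example above; again, not Tabnine's implementation) shows the pattern: nothing is logged, cached, or written to storage.

```typescript
// Illustrative sketch of ephemeral processing: the request context is held
// only in memory for the duration of inference and never persisted.
// `runInference` is a hypothetical stand-in for the actual model call.
async function handleCompletion(
  request: CompletionRequest,
  runInference: (ctx: ContextWindow) => Promise<string[]>
): Promise<string[]> {
  // The context exists only within this function's scope.
  const suggestions = await runInference(request.context);

  // No logging, caching, or storage of request.context happens here;
  // once this function returns, the context is eligible for garbage collection.
  return suggestions;
}
```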
Trained on open-source code with permissive licenses
Tabnine's generative AI uses only open source code with permissive licenses to train our Public Code model. These licenses are: MIT, MIT-0, Apache-2.0, BSD-2-Clause, BSD-3-Clause, Unlicense, CC0-1.0, CC-BY-3.0, CC-BY-4.0, RSA-MD, 0BSD, WTFPL, and ISC. All code suggested to and used by our users can be attributed to the codebases used to train our model; the full list is kept up to date and publicly available in our Trust Center.
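In practice, curating a corpus this way amounts to filtering on a license allowlist. The sketch below uses the SPDX identifiers from the list above; the repository type and filtering code are illustrative assumptions, not Tabnine's actual data pipeline.

```typescript
// Conceptual sketch: keep only candidate repositories whose SPDX license
// identifier appears on the permissive allowlist quoted above.
// The Repo type and data source are hypothetical, not Tabnine's pipeline.
const PERMISSIVE_LICENSES = new Set([
  "MIT", "MIT-0", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause",
  "Unlicense", "CC0-1.0", "CC-BY-3.0", "CC-BY-4.0", "RSA-MD",
  "0BSD", "WTFPL", "ISC",
]);

interface Repo {
  name: string;
  spdxLicenseId: string;
}

function selectTrainingRepos(candidates: Repo[]): Repo[] {
  return candidates.filter((repo) => PERMISSIVE_LICENSES.has(repo.spdxLicenseId));
}
```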
This decision has painful implications for Tabnine: it limits the amount of data with which our models can be trained. But for us, our users' peace of mind comes first, at any cost. By using only permissively licensed code, we guarantee our users can safely use the code that Tabnine generates in commercial projects without any license compliance uncertainty whatsoever. Moreover, we are good open source users: by training our model only on code with permissive licenses, we adhere to and fully respect the original intent of those who contributed the code.
This applies to all the ways you can use Tabnine: whether you are using our SaaS offering based on our Public Code model, running a trained model on your premises, or using Tabnine Chat, which lets you communicate in plain language. Tabnine Chat is our latest launch; read all about it here. With regard to code privacy, the same principle applies to Chat: we use this data set, which is licensed under Apache-2.0 and free from copyrighted materials. This, again, makes all Tabnine Chat suggestions safe to use.
Our open source ethos
Being a good open source citizen means actively participating and contributing to the open source community in a positive and responsible manner. It involves adhering to certain principles and practices that promote collaboration, transparency, and the mutual benefit of the community as a whole. Here are some key aspects of being a good open source citizen:
1. Respect and Follow Licensing: Open source projects are governed by specific licenses that define how the code can be used, modified, and distributed. Being a good open source citizen means respecting these licenses, understanding their terms, and complying with their requirements.
2. Contribute Back: Open source thrives on collaboration and shared knowledge. Contributing back to the community is an essential aspect of being a good open source citizen. This can involve submitting bug fixes, contributing new features, improving documentation, or providing support to other community members.
3. Give Credit and Attribute: When using open source code in your own projects, it is important to give credit and attribute the original authors and contributors. Acknowledging their work helps maintain a culture of appreciation and recognition within the open source community.
In the odd chance that the model suggests code that already existed in the training data, that wouldn't be a problem for Tabnine users, because that code is permissively licensed. You can always use Tabnine's code suggestions safely.
Your code and data are NEVER used to train any models other than private code models.
What models does Tabnine use?
Tabnine provides AI code completions using two sources: (1) a Public Code model trained on state-of-the-art, permissively licensed open source code, and (2) private code models trained on private code.
Tabnine's Public Code model: state-of-the-art, permissive open source LLMs
This model is the result of our involvement in the open source LLM community. We are regular contributors to many of the main open source LLM models, which gives us the upper hand when it comes to picking the best one out there. Our clients trust us to curate every option in the market for them, and we assure them that each time we update the Public model it includes the latest and greatest. Being good open source citizens also allows us to have our proprietary add-on ready for any model: our enhancement is model-agnostic.
Tabnine Private model: relevant to your company’s best practices
This model is the premium version of Tabnine and includes additional features and enhancements compared to the Public Code version. Effectively, we deploy the Public Code model to our client's premises, and a new model is created by training the public model on the client's codebase. Tabnine submits each query to both models: the vanilla, universal Public Code model and the one trained on the client's code. Tabnine then picks the most relevant code suggestion from the two options. Most importantly, the private model is continuously trained on the decisions made by the user: it learns the best practices, policies, and APIs that are part of our clients' coding practices, so that the Private model's suggestions stay in line with them. This way, our clients can easily onboard new devs and make their path to seniority much faster.
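In rough terms, the dual-model flow described above can be sketched as follows. The model interface and relevance scoring are hypothetical placeholders, not Tabnine's internals; the point is simply that both models answer every query and the better suggestion wins.

```typescript
// Illustrative sketch of the dual-model flow: the same query is sent to both
// the universal Public Code model and the privately trained model, and the
// more relevant suggestion is returned to the user.
interface SuggestionModel {
  suggest(query: string): Promise<{ code: string; relevance: number }>;
}

async function bestSuggestion(
  publicModel: SuggestionModel,
  privateModel: SuggestionModel,
  query: string
): Promise<string> {
  // Query both models in parallel.
  const [fromPublic, fromPrivate] = await Promise.all([
    publicModel.suggest(query),
    privateModel.suggest(query),
  ]);

  // Keep whichever suggestion was scored as more relevant.
  return fromPrivate.relevance >= fromPublic.relevance
    ? fromPrivate.code
    : fromPublic.code;
}
```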
Tabnine uses different deployments to deliver the best possible experience and quality.
SaaS
Tabnine is committed to maintaining the highest level of security and privacy for our clients. With our SaaS deployment, we have implemented end-to-end encryption to ensure that the communication between our clients' users and our servers remains completely secure.
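At the transport level, that boils down to the plugin only ever talking to the service over an encrypted channel. The sketch below is a generic illustration (the endpoint and payload are hypothetical), not a description of Tabnine's actual security architecture.

```typescript
// Minimal sketch: refuse to send a request unless the endpoint uses an
// encrypted (HTTPS/TLS) channel. The URL and payload are hypothetical.
async function sendSecureRequest(endpoint: string, payload: unknown): Promise<Response> {
  const url = new URL(endpoint);
  if (url.protocol !== "https:") {
    throw new Error("Refusing to send code context over an unencrypted channel");
  }
  return fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
}
```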
VPC
This deployment option runs in a Virtual Private Cloud. No code ever leaves our client's premises. We use metadata to keep our product team informed about bugs, performance, and user interactions with the product, but the content itself (suggestions, code, workspace) always remains under the client's control.
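The separation can be pictured like this: telemetry events carry metadata about the interaction, never the code or suggestion content itself. The field names below are assumptions for illustration, not Tabnine's actual telemetry schema.

```typescript
// Illustrative sketch: telemetry carries metadata only; code, suggestions,
// and workspace content never leave the client's environment.
interface TelemetryEvent {
  eventType: "suggestion_shown" | "suggestion_accepted" | "error";
  latencyMs: number;     // performance metric
  pluginVersion: string; // helps diagnose bugs
  language: string;      // coarse usage signal
  // Deliberately absent: prefix/suffix code, suggestion text, file contents.
}

function buildEvent(
  eventType: TelemetryEvent["eventType"],
  latencyMs: number,
  pluginVersion: string,
  language: string
): TelemetryEvent {
  // Note what is *not* included: no code, no suggestion text, no file
  // contents. Those stay under the client's control.
  return { eventType, latencyMs, pluginVersion, language };
}
```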
On Prem
This option allows you to run Tabnine's AI models locally: the models are downloaded to your machine and queries are served locally, so you receive code completions continuously and quickly.
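In other words, the plugin's inference endpoint resolves to a machine-local server rather than a remote cluster. The sketch below is hypothetical (the endpoint, paths, and settings shape are assumptions, not Tabnine's actual configuration), but it captures the idea that no code needs to leave the machine.

```typescript
// Hypothetical sketch of an on-prem/local setup: completions are requested
// from an inference server running on the same machine.
interface LocalInferenceSettings {
  endpoint: string;  // machine-local inference server
  modelPath: string; // where the downloaded model weights live
}

const localSettings: LocalInferenceSettings = {
  endpoint: "http://127.0.0.1:8080/complete",
  modelPath: "/opt/tabnine/models/local-model",
};

async function completeLocally(prefix: string): Promise<string> {
  const response = await fetch(localSettings.endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prefix }),
  });
  const { completion } = (await response.json()) as { completion: string };
  return completion;
}
```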