Autocompletion with deep learning
Update (August 19): We’ve released TabNine Local, which lets you run Deep TabNine on your own machine.
TL;DR: TabNine is an autocompleter that helps you write code faster. We’re adding a deep learning model which significantly improves suggestion quality. You can see videos below and you can sign up for it here.
There has been a lot of hype about deep learning in the past few years. Neural networks are state-of-the-art in many academic domains, and they have been deployed in production for tasks such as autonomous driving, speech synthesis, and adding dog ears to human faces. Yet developer tools have been slow to benefit from these advances. To use a surprisingly common idiom among software blogs, the cobbler’s children have no shoes.
TabNine hopes to change this.

[Video: Python]
About Deep TabNine
Deep TabNine is trained on around 2 million files from GitHub. During training, its goal is to predict each token given the tokens that come before it. To achieve this goal, it learns complex behaviour, such as type inference in dynamically typed languages.
For instance, the return type of app.get_user() is assumed to be an object with setter methods, while the return type of app.get_users() is assumed to be a list.
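To make the training objective concrete, here is a minimal sketch of next-token prediction. It is a toy count-based model, not a neural network and not Deep TabNine's implementation; the function names, the two-token context window, and the tiny corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_next_token_counts(token_sequences, context_size=2):
    """Count which token follows each window of `context_size` preceding tokens."""
    counts = defaultdict(Counter)
    for tokens in token_sequences:
        for i in range(len(tokens)):
            # The context is the (up to) `context_size` tokens before position i.
            context = tuple(tokens[max(0, i - context_size):i])
            counts[context][tokens[i]] += 1
    return counts

def predict_next(counts, preceding_tokens, context_size=2):
    """Return the most frequent next token for the given context, or None."""
    context = tuple(preceding_tokens[-context_size:]) if context_size else ()
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical, pre-tokenized training data.
corpus = [
    ["for", "user", "in", "app", ".", "get_users", "(", ")", ":"],
    ["for", "item", "in", "app", ".", "get_users", "(", ")", ":"],
    ["user", "=", "app", ".", "get_user", "(", ")"],
]
counts = train_next_token_counts(corpus)
suggestion = predict_next(counts, ["in", "app", "."])
print(suggestion)
```

A deep model replaces the lookup table with a learned function of a much longer context, which is what lets it pick up patterns such as the get_user() versus get_users() distinction rather than only exact repeats.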
The model also uses documentation written in natural language to infer function names, parameters, and return types.

In the past, many users said they wished TabNine came with pre-existing knowledge, instead of looking only at the user’s current project. Pre-existing knowledge is especially useful when the project is small or a new library is being added to it. Deep TabNine helps address this issue; for example, it knows that when a class extends React.Component, its constructor usually takes a single argument called props, and it often assigns this.state in its body.
Deep TabNine can even do the impossible and remember C++ variadic forwarding syntax.
Using Deep TabNine
Deep TabNine requires a lot of computing power: running the model on a laptop would not deliver the low latency that TabNine’s users have come to expect. So we are offering a service that will allow you to use TabNine’s servers for GPU-accelerated autocompletion. It’s called TabNine Cloud, it’s currently in beta, and you can sign up for it here.
We understand that many users want to keep their code on their own machine for privacy reasons. We’re taking the following steps to address this use case:
- For individual developers, we are working on a reduced-size model which can run on a laptop with reasonable latency. Update: we’ve released TabNine Local.
- For enterprises, we will offer the option to license the model from us and run it on your own hardware. We can also train a custom model for you which understands the unique patterns and style within your codebase. If this sounds interesting to you, we would love to hear more about your use case at email@example.com.
If you choose to use TabNine Cloud, we take the following steps to reduce the risk of a data breach:
- TabNine Cloud will always be opt-in and we will never enable it without explicitly asking for your permission first.
- We do not store or log your code after your query is fulfilled.
- Your connection to TabNine servers is encrypted with TLS.
- There is a setting which lets you use TabNine Cloud for whitelisted directories only.
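As a rough illustration of how a directory whitelist like the one in the last item might be checked on the client side, consider the sketch below. The function name and the example paths are hypothetical; the actual setting and how TabNine enforces it are not described here.

```python
from pathlib import Path

def is_cloud_allowed(file_path, whitelisted_dirs):
    """Return True if file_path lies inside any whitelisted directory."""
    target = Path(file_path).resolve()
    for directory in whitelisted_dirs:
        root = Path(directory).resolve()
        # Compare whole path components, so "/home/me/oss-private" does not
        # match a whitelist entry of "/home/me/oss".
        if target == root or root in target.parents:
            return True
    return False

# Hypothetical whitelist: only open-source work may be sent to the cloud.
whitelist = ["/home/me/oss"]
print(is_cloud_allowed("/home/me/oss/project/main.py", whitelist))
print(is_cloud_allowed("/home/me/work/secret.py", whitelist))
```

Comparing resolved path components rather than raw string prefixes avoids accidentally matching sibling directories whose names merely start with a whitelisted name.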
TabNine Cloud is currently in beta, and scaling it up presents some unique challenges since queries are computationally demanding (over 10 billion floating point operations) yet they must be fulfilled with low latency. To ensure high service quality, we are releasing it gradually. You can request access here. Customers of TabNine will be the first to receive access.
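To get a feel for why this is demanding, the arithmetic can be sketched as follows. The 10 billion figure is the lower bound quoted above; the latency budgets are hypothetical values chosen for illustration, not TabNine's targets, and the calculation ignores network time and batching.

```python
FLOPS_PER_QUERY = 10e9  # lower bound from the text: "over 10 billion"

def required_throughput(flops_per_query, latency_seconds):
    """Sustained FLOP/s needed to finish one query within the latency budget."""
    return flops_per_query / latency_seconds

# Hypothetical latency budgets, in milliseconds.
for latency_ms in (50, 100, 200):
    gflops = required_throughput(FLOPS_PER_QUERY, latency_ms / 1000) / 1e9
    print(f"{latency_ms} ms budget: at least {gflops:.0f} GFLOP/s per in-flight query")
```

Under these assumptions, each in-flight query needs a sustained compute rate on the order of hundreds of GFLOP/s at minimum, which is consistent with the post's choice of GPU-accelerated servers.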
Frequently asked questions
This deep learning stuff is cool but I’m skeptical that it can improve over my existing autocompleter which actually parses the code.
You can use both! TabNine integrates with any autocompleter that implements the Language Server Protocol. TabNine will use your existing autocompleter when it provides suggestions and use Deep TabNine otherwise.
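The fallback behaviour described in this answer can be sketched roughly as follows. Both completer functions are hypothetical stand-ins: this is not TabNine's code, and a real Language Server Protocol client exchanges JSON-RPC messages rather than calling a Python function.

```python
from typing import Callable, List

# A completer takes the text before the cursor and returns suggestions.
Completer = Callable[[str], List[str]]

def complete(prefix: str, existing_completer: Completer, deep_model: Completer) -> List[str]:
    """Prefer the existing autocompleter; fall back to the deep model otherwise."""
    suggestions = existing_completer(prefix)
    if suggestions:
        return suggestions
    return deep_model(prefix)

# Hypothetical stubs standing in for a language server and for Deep TabNine.
def stub_language_server(prefix: str) -> List[str]:
    return ["get_user", "get_users"] if prefix.endswith("app.") else []

def stub_deep_model(prefix: str) -> List[str]:
    return ["props"]

print(complete("app.", stub_language_server, stub_deep_model))
print(complete("constructor(", stub_language_server, stub_deep_model))
```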
What latency can I expect?
What languages are supported?
Only code with one of the following licenses is included in the training data:
- Apache 2.0
- BSD 2-clause
- BSD 3-clause
Licenses are determined per-repository by Licensee.