We’re proud to announce that Tabnine has been named a winner of the 2025 AI TechAwards in the AI Coding Assistant category. This recognition honors the innovation and impact of our Code Review Agent—a purpose-built AI agent that enables developers to review code with speed, precision, and confidence.
The AI TechAwards, presented annually at AI DevSummit, are among the most prestigious honors in the AI and developer tools space. Winners are selected by the expert-led DevNetwork Advisory Board and reflect outstanding achievement in technical innovation, developer adoption, and industry influence across 20 categories.
This award affirms what leading engineering organizations already know: Tabnine’s Code Review Agent is redefining how teams ensure quality at scale.
Code review has long been a cornerstone of software quality. But traditional approaches—manual, time-consuming, and fragmented—can’t keep pace with modern development cycles.
As engineering velocity increases through AI adoption, review processes must evolve in kind. Tabnine’s Code Review Agent is built for this new reality.
The Tabnine Code Review Agent works inside your command line, comes prebuilt with industry-leading rules, and is fully customizable to align with your own standards.
It integrates natively into your SDLC and evaluates code changes against your code quality standards, examining every line for architecture, maintainability, security, readability, correctness, and performance. The Code Review Agent finds issues, flags them, and suggests fixes with rationale your developers can understand and trust.
The Code Review Agent augments software engineering teams so they can continue to accelerate delivery while ensuring stability. It makes code reviews faster, clearer, and more consistent across the entire team.
Customizable Guardrails
Tabnine gives you full control. Scope the agent’s reviews, tune its criteria, and tailor its behavior to your engineering practices. Adding new rules is as easy as uploading a CSV with a rule name and a natural-language description.
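As a rough illustration only (the column names and upload flow here are assumptions, not Tabnine’s documented format), such a rules file could be produced with a few lines of Python:

```python
import csv

# Hypothetical custom rules: each row pairs a rule name with a plain-language
# description of what the reviewer should enforce. Columns are illustrative.
rules = [
    ("no-raw-sql", "Flag string-concatenated SQL; require parameterized queries."),
    ("service-logging", "Every public service method must log entry and failure paths."),
    ("api-versioning", "New REST endpoints must live under a versioned route (e.g. /v2/)."),
]

with open("custom_review_rules.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "description"])  # a name and a natural-language description
    writer.writerows(rules)
```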
Enforces Code Quality
The agent provides what static code analysis tools and linters cannot. It reviews code against your organization’s defined code quality, security, and architectural policies—supporting consistency and compliance at scale.
Every rule. Every changed line.
The Code Review Agent uses AI and the Enterprise Context Engine to check every changed line against every rule, every time. This frees your senior engineers to use code review the way it was meant to be used: as time for one-on-one training, coaching, and skill development with their team.
Empowers your engineers
Every suggestion includes a clear, traceable explanation with relevant logic. You’ll see which rule was violated and why, how to fix it, and its severity level. The agent promotes understanding and enhances developer judgment. With a quick selection of the terminal as context scope, the Tabnine AI pair programmer is there to help developers resolve critical code review issues.
“Developer tools and technologies are shaping the future of software development,” said Jonathan Pasky, Executive Producer of AI DevSummit and the AI TechAwards. “Tabnine’s win highlights their leadership in building AI solutions that are practical, innovative, and deeply valuable to the engineering community.”
As agentic AI becomes standard in the SDLC, engineering leaders are demanding solutions that prioritize trust, traceability, and control.
Tabnine is answering that call—delivering AI that enhances human judgment, aligns with engineering rigor, and accelerates software quality.
Winning this award is a proud milestone. But it’s only the beginning.
We’re committed to advancing the frontier of engineering AI—helping teams ship faster, safer, and with greater clarity than ever before.
Want to see how Tabnine’s Code Review Agent can help your team scale code quality without slowing down?
Join our weekly live demo or contact us to learn more.
Engineering teams that rely on Bitbucket and Jira already operate with structure, clarity, and control. Tabnine brings AI into that environment—natively—so teams can accelerate delivery without sacrificing the integrity of their tools, processes, or codebase.
Tabnine connects directly to your Bitbucket repositories and Jira workflows. It enables your developers to write, review, test, and maintain code with the support of trusted AI—within the platforms they already use every day.
This is AI that integrates into the Atlassian stack by design. It works with your source of truth and enforces your standards. It meets your team in the IDE, understands the structure of your code, and aligns to the way your organization builds.
Tabnine integrates with Bitbucket and Jira at the core platform level—not just through extensions or plugins. That means every agent, every suggestion, every AI pair programmer interaction is grounded in your actual code and planning context.
When connected to Bitbucket, Tabnine continuously indexes your repositories using real-time semantic and graph-based techniques. It maps relationships between services, functions, and files so it can retrieve relevant context during coding, testing, and review. Developers can scope their interactions down to specific folders or files for precision, and every suggestion is grounded in real project context.
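Tabnine’s indexing internals aren’t public, but as a simplified sketch of the general idea, a graph-based index can record which modules depend on which, so retrieval can follow real relationships rather than text similarity alone. Everything below (the file layout, the networkx dependency, the two-hop cutoff) is illustrative:

```python
import ast
import pathlib
import networkx as nx  # third-party graph library, used here purely for illustration

def build_import_graph(repo_root: str) -> nx.DiGraph:
    """Toy indexer: record which Python files import which modules, a stand-in for
    the service/function/file relationships a production indexer would track."""
    graph = nx.DiGraph()
    for path in pathlib.Path(repo_root).rglob("*.py"):
        module = path.stem
        graph.add_node(module, file=str(path))
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError:
            continue  # skip files that don't parse; a real indexer would be more tolerant
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    graph.add_edge(module, alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom) and node.module:
                graph.add_edge(module, node.module.split(".")[0])
    return graph

def related_context(graph: nx.DiGraph, module: str, depth: int = 2) -> set:
    """Modules reachable within `depth` import hops: candidate context for a task."""
    return set(nx.single_source_shortest_path_length(graph, module, cutoff=depth))
```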
With the Jira to Code Agent, developers can turn structured Jira tickets into structured code. Tabnine connects requirements to implementation by linking story-level context with the relevant Bitbucket repositories, helping teams move from planning to production with consistency.
Together, these integrations make Tabnine the most aligned AI platform for teams operating inside the Atlassian ecosystem. This is AI that fits your environment, speaks your language, and respects your workflows.
Tabnine is built for engineering teams that work in regulated environments, operate at scale, and expect precision from every system in their stack.

Controlled: Developers get complete visibility into what Tabnine suggests and why. Every recommendation includes references, explanations, and validation checks.
→ Nothing ships without developer approval.
Grounded: Tabnine adapts to your environment, aligns with your codebase, and enforces your organization’s standards—from architecture and code quality to compliance and security. Guardrails are built in and enforced at every step.
→ AI onboards to your organization.
Secured: Tabnine runs wherever you need it to: SaaS, VPC, on-premises, or air-gapped. Your source code stays inside your perimeter. Your data stays yours.
→ Your code stays where it belongs.
At the core of Tabnine is the Enterprise Context Engine—a system purpose-built to bring structure, explainability, and safety to AI-assisted development.

Tabnine continuously ingests real-time context from your codebase, workspace, and supporting documentation—so the AI always reflects the current state of your system, not last week’s assumptions. It maps the architecture of your software using graph-based models, allowing it to understand relationships between services, components, and dependencies—so every suggestion respects how your system is actually built.
Through intelligent retrieval, Tabnine identifies the most relevant context for each task—surfacing the right information at the right moment, without noise or guesswork. And it gives developers control when they need it. Developers can scope context with precision—down to the repository, file, or even function—so the AI works within clearly defined boundaries and doesn’t overreach.
Every output is validated against your organization’s standards for quality, security, and maintainability—ensuring the AI doesn’t just generate code, it generates the right code. And with centralized governance, admin-level policies, and usage analytics, you maintain full oversight—so AI adoption aligns with your security posture, compliance frameworks, and engineering objectives.
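Tabnine doesn’t publish how this validation step is implemented; the sketch below only illustrates the pattern the paragraph describes: run generated code through named, explainable checks before it is surfaced. The rules and regexes are invented examples, not Tabnine policies:

```python
import re
from dataclasses import dataclass

@dataclass
class CheckResult:
    rule: str
    passed: bool
    detail: str

def validate_generated_code(code: str) -> list:
    """Illustrative policy gate: each rule is a named, explainable check.
    A real system would plug in linters, security scanners, and org policies."""
    return [
        CheckResult("no-hardcoded-secrets",
                    not re.search(r"(api_key|password)\s*=\s*['\"]\w+['\"]", code, re.I),
                    "Literal credentials are not allowed in source."),
        CheckResult("no-bare-except",
                    "except:" not in code,
                    "Bare except blocks hide failures; catch specific exceptions."),
    ]

suggestion = "password = 'hunter2'\ntry:\n    run()\nexcept:\n    pass\n"
for check in validate_generated_code(suggestion):
    print(f"{check.rule}: {'pass' if check.passed else 'FAIL'} - {check.detail}")
```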
Any AI can run inside an environment. What makes Tabnine different is that it understands what your developers need to build, what already exists, how it fits together, and how it needs to be done. That’s what we mean by AI that’s onboarded into your organization.
With Tabnine, you don’t replace your existing stack—you unlock more from it.
Accelerate modernization projects in Bitbucket
Reduce tech debt and increase test coverage
Validate changes with precision during code review
Connect planning to implementation through Jira and context-aware agents
No context switching. No retraining. No disruption.
Just better outcomes—delivered through the systems your teams already trust.

Tabnine Enterprise includes full native support for Bitbucket and Jira.
Join our next live session or request a tailored walkthrough to see how engineering leaders are using Tabnine to bring AI into every step of the SDLC—with control, clarity, and confidence.
[Join office hours] | [Start your evaluation]
OpenLM, a global leader in engineering license management, helps some of the world’s most sophisticated enterprises optimize software usage, enforce compliance, and cut licensing costs across vast technical environments. With customers spanning aerospace, defense, automotive, and semiconductors, OpenLM builds highly specialized infrastructure software that integrates with hundreds of engineering tools and platforms.
To serve these customers, OpenLM runs a lean but powerful engineering organization. Its teams manage an extensive microservices-based architecture across a Kubernetes environment, with hundreds of interconnected repositories, backend-heavy workloads, and growing frontend complexity. With limited time, limited headcount, and increasing demand, OpenLM turned to Tabnine to unlock new development velocity—without compromising quality, security, or control.
Like many fast-moving engineering teams, OpenLM’s developers were spending too much time on boilerplate and repetitive tasks. As Petru Betco, one of OpenLM’s development team leads, put it: “Like most engineering organizations, we wanted to minimize time spent on boilerplate and repetitive tasks. Our goal was to automate the mundane so developers could focus on the high-impact, intellectually demanding parts of the work.”
The pain was especially acute on the frontend. While most of OpenLM’s systems are backend-centric, user-facing products were growing in complexity. With fewer frontend specialists, Petru’s team needed to deliver faster—without letting their backend-focused developers drown in unfamiliar UI work.
Testing was another pressure point. “Testing is critical, but nobody wants to spend time doing it. It’s also one of the most time-consuming and least rewarding tasks. We needed a way to maintain high test coverage without slowing down velocity,” Petru noted. The team wanted to improve test coverage and quality while reducing the overhead of writing and maintaining complex unit and integration tests.
With Tabnine now driving an Automation Factor nearing 40%, the team has substantially reduced time spent on low-value tasks like boilerplate and test scaffolding. This has allowed engineers to shift more of their time toward architecture, optimization, and innovation—without sacrificing quality or consistency.
At the same time, OpenLM was growing. Squads were distributed across multiple offices, engineers were onboarding into unfamiliar projects, and technical leaders needed visibility into the codebase without slowing teams down. They needed a secure, scalable, context-aware AI solution.
Like many developers, Petru began experimenting with ChatGPT in the browser. But it quickly became clear that generic AI tooling wasn’t built for software teams operating at scale. What they needed was something purpose-built for professional development environments: tightly integrated, security-aware, and context-driven.
“I hadn’t realized that AI tools could connect directly to the codebase and respond based on context,” Petru shared. “That level of contextual awareness completely shifted my view—this wasn’t just autocomplete, it was an actual engineering assistant. That changed everything.”
That discovery led OpenLM to Tabnine—a platform designed to bring contextual intelligence directly into the IDE. Rather than flipping between tabs or pasting code into external chats, developers could now access real-time suggestions, code explanations, and test scaffolding inside their secure environment, grounded in the full context of their codebase.
As a company dealing with sensitive customer environments, OpenLM had a strict policy around data handling. Trust, security, and compliance weren’t just talking points—they were gating requirements for any vendor.
“Naturally, we had concerns around trust, security, and compliance. But those were quickly addressed once we dug into Tabnine’s architecture,” Petru said.
OpenLM adopted a clear model policy: Tabnine Protected is used on all security-sensitive code, while Claude 3.5—also available inside Tabnine—is used for general-purpose development. This model-level control ensures developers can move fast with confidence, balancing productivity with compliance.
“The ability to switch models based on sensitivity is key. For regulated workloads, we default to Tabnine Protected. For general development, we use higher-speed models like Claude 3.5 Sonnet. That kind of flexibility gives us productivity without compromising control,” Petru said.
OpenLM’s codebase reflects the scale and sophistication of its platform. After migrating from a monolithic legacy system, the company now maintains a fully containerized architecture built on Kubernetes and Docker, with dozens of services managed by independent squads.
“Each repository is a service. One squad is responsible for multiple services—usually ones that work in tandem,” Petru explained. With hundreds of interconnected repositories and countless dependencies, navigating and maintaining cohesion across the architecture was a serious challenge.
Tabnine changed that.
“It accelerates development by understanding the architectural context of what you’re working on—so it doesn’t just suggest code, it delivers value aligned to our structure and standards.”
“The biggest value for us is how well Tabnine understands the code context. When you dive into a project you’ve never seen before, it helps you grasp the dependencies, the architecture, and what’s going on without having to dig through documentation or message coworkers.”
This deep contextual awareness enabled faster collaboration across squads, accelerated onboarding into unfamiliar services, and helped new team members start contributing faster.
OpenLM’s overall Productivity Factor recently peaked at 89.58%—a powerful signal that developers are consistently integrating Tabnine into their core workflows. Rather than relying on occasional suggestions, engineers are using Tabnine as a daily accelerant for writing, reviewing, and reasoning through production code.
This high productivity signal reinforces Tabnine’s role not just as an autocomplete tool, but as an embedded agent that adapts to OpenLM’s engineering DNA.
Tabnine adoption has surged across OpenLM’s globally distributed engineering organization, with near-total license utilization and strong daily usage across squads. Developers have embraced the platform not just as a productivity booster, but as a true engineering partner — one that understands their code, their architecture, and the way they work.
With distributed engineering teams across Europe and Israel, OpenLM needed to ensure that new developers could ramp quickly and contribute confidently.
“The onboarding curve is minimal. The interface is intuitive, and most developers start seeing value almost immediately—often without needing documentation. That ease of adoption has been key to our usage rates.”
All new engineers are onboarded with Tabnine, with environment-specific install guides for JetBrains, Visual Studio, and VS Code. From day one, developers have an intelligent assistant that helps them explore, understand, and contribute to unfamiliar code.
“We like to give new developers real responsibilities right away and just throw them in the water—” Petru joked. “Tabnine helps them stay afloat and productive from day one.”
Petru also shared a powerful use case that extended far beyond day-to-day development. When setting team OKRs, he turned to Tabnine’s Claude integration to help structure his goals.
“I provided a rough list of OKRs and asked Tabnine to help quantify and refine them. It returned detailed metrics, action steps, and success criteria—turning a high-level vision into a structured, executable plan.”
That same flexibility shows up in OpenLM’s UI migration efforts. With complex components like paginated dropdowns, dynamic preload states, and edge-case behaviors, Tabnine has helped frontend developers reason through difficult scenarios and move faster.
“There’s still a lot of work to do as a Software Engineer, but Tabnine really helps cut the time and friction,” Petru said.
OpenLM’s adoption of Tabnine isn’t just a cultural or process shift — it’s backed by consistent, measurable results that highlight the platform’s long-term value across velocity, quality, and developer experience.
Productivity is accelerating: OpenLM reached a peak Productivity Factor of 89.58%, signaling deep integration of AI into daily engineering workstreams. Developers are consistently accepting and acting on AI-suggested code — driving faster delivery without compromising standards.
Automation without risk: With a sustained Automation Factor nearing 40%, OpenLM’s engineers are streamlining repetitive tasks like boilerplate generation and test scaffolding — while applying granular security policies through Tabnine Protected wherever needed.
Context is everything: Tabnine’s codebase-aware intelligence is a perfect match for OpenLM’s architecture — enabling developers to confidently contribute across a containerized microservices environment with hundreds of interconnected repositories.
AI chat drives strategic value: Engineers aren’t just using Tabnine to generate code — they’re using it to shape direction. From defining OKRs to planning architecture migrations, Claude-powered chat has become as integral as completions, demonstrating Tabnine’s role as a full-lifecycle assistant.
These results underscore what’s possible when AI is integrated with context, governed with control, and trusted by engineers. Tabnine delivers not just short-term boosts — but long-term, compounding value across the SDLC.
Tabnine is the only AI software development platform built from the ground up to be context-aware, developer-friendly, and enterprise-secure. What makes Tabnine unique isn’t just that it generates code — it generates the right code, for your codebase, grounded in your architecture, patterns, and practices.
Why does that matter? Because trust is the true unlock for enterprise AI adoption — and trust is built on accuracy. At Tabnine, we achieve that accuracy by grounding every suggestion, test plan, explanation, and refactor in the real-time context of your codebase. That’s why Tabnine helps developers move faster without hesitation — because they know the AI understands exactly what they’re working on.
We don’t just integrate into your workflow — we adapt to it. Whether you’re deploying across air-gapped environments, managing sensitive IP, or scaling development across dozens of squads, Tabnine delivers intelligence you can trust, in the tools your developers already use.
Let Tabnine help your developers do their best work — faster, safer, smarter, and with complete confidence.
Contact us today or register for an upcoming Tabnine Office Hours to see how Tabnine can accelerate your engineering velocity while meeting the highest standards of trust, privacy, and performance.
Enterprise-grade AI has finally arrived where Git can’t reach.
Tabnine now brings trusted, context-aware AI to Perforce Helix Core—enabling the world’s most advanced engineering teams to move faster, stay in flow, and maintain complete control.
Modern engineering organizations architect the systems that move economies, defend nations, and power the future. These environments are complex, governed, and designed for scale. For many of these teams, Perforce Helix Core is the foundation of their software development lifecycle. For years, teams working inside Helix Core environments have been underserved by AI tooling. Most assistants fail to understand large-scale architecture, miss important branching context, or bypass security protocols altogether.
Today, Tabnine is introducing native support for Helix Core within our Enterprise Codebase Connections capability. With this addition, Tabnine enables AI-driven development inside the environments that power the most advanced systems in the world. This release builds on our existing integrations with GitHub, GitLab, and Bitbucket, further extending Tabnine’s role as the AI software development platform for every engineering team across every system.
Tabnine integrates directly into the IDEs your teams already use, including Visual Studio, IntelliJ, VS Code, and Eclipse. It works across your entire technology stack, supporting more than 600 programming languages, frameworks, and libraries. And it connects to all your SCMs—including now, Perforce Helix Core. The integration brings deep understanding of your architecture, your branching strategies, and the organizational standards that define how your team builds software. Tabnine provides accurate, explainable AI code recommendations, fully aligned with the structure and requirements of your Perforce repositories.
Tabnine operates entirely within your environment—on-premises, in a private VPC, in an air-gapped deployment, or SaaS—without ever transmitting code externally. Your development processes remain secure, controlled, and compliant with internal and regulatory requirements. Tabnine gives developers full visibility into suggestions, explanations, and references. It supports engineering leaders with fine-grained enforcement of standards, policies, and guardrails—applied in real-time across every phase of development. The platform runs inside your perimeter, integrates with your identity systems, and respects your governance model. Sensitive IP stays protected. Development remains accountable. And every line of code can be traced, understood, and validated.
Tabnine includes specialized agents for each phase of the software development lifecycle—supported by an AI pair programmer that works seamlessly in any environment. Developers can explore and understand existing systems, transform requirements into code, generate and update documentation, test with confidence, resolve technical debt, and modernize legacy systems. Every agent is designed to accelerate productivity while reinforcing your organization’s engineering practices. Underpinning the entire experience is Tabnine’s Enterprise Context Engine, which delivers fine-grained personalization, project-specific insight, and policy-aware assistance—tailored to how your teams build software.
Perforce Helix Core is the version control backbone for many of the world’s most sophisticated software systems. It manages terabytes of structured data, supports collaborative workflows at global scale, and enforces the precision and performance these systems require. With native support for Helix Core, Tabnine now extends its capabilities to these environments—enabling development teams to move faster, reduce complexity, and improve throughput, all while maintaining the highest standards of control, compliance, and technical integrity.
Tabnine for Perforce Helix Core is available now for all Enterprise customers. If you’re ready to bring trustworthy AI into your most critical systems—without compromising compliance, control, or context—request a tailored demo or join our next Tabnine Office Hours. For current customers, reach out to your account lead to activate the integration today.
Tabnine is designed for engineering organizations building what the world depends on—systems where quality, scale, and trust are foundational. With support for Helix Core, we’re expanding what’s possible.
AI in software development is at a turning point. The tools are powerful. The adoption is accelerating. And the stakes—for quality, security, and trust—have never been higher.
Many AI providers are rushing toward full automation, treating developers as inefficiencies to eliminate rather than experts to empower. Some voices have even gone so far as to call for an Industrial Revolution in software engineering.
Vibe coding is what happens when AI is applied indiscriminately—without structure, standards, or alignment to engineering principles. Developers lean on generative tools to create code that “just works.” It might compile. It might even pass a test. But in enterprise environments, where quality and compliance are non-negotiable, this kind of code is a liability, not a lift.
The appeal is easy to understand. Vibe coding feels fast. It delivers quick wins. But speed without discipline isn’t innovation—it’s entropy. And the long-term cost is steep: inconsistent patterns, broken dependencies, silent failures in production, legal exposure, and a culture of mistrust.
In a 2025 UiPath survey of IT leaders at billion-dollar enterprises, the most-cited limitations of AI tools were poor explainability, lack of integration, hallucinated results, and misalignment with internal systems. Engineering leaders know the truth: automation without alignment breaks things.
At Tabnine, we see AI differently. Not as an agent of automation, but as an amplifier of human creativity. We offer engineers a fundamentally different path than vibe coding.
Craft coding is the disciplined, future-forward alternative to vibe coding. It merges the speed of AI with the rigor of enterprise engineering. It doesn’t bypass process—it strengthens it. Every suggestion is grounded in context. Every action is explainable. Every line of code reflects the standards, systems, and architecture that make your software work.
This is about more than just better code. It’s about re-centering the engineer in the age of AI. There is no neural network that can match the ingenuity of an engineer. No fine-tuned model that can capture the depth of human expertise.
Where others push automation to eliminate the developer, we see an opportunity to elevate them. Because software engineering isn’t factory work. It’s a discipline of knowledge, precision, and creativity. Developers aren’t inefficiencies to remove. They are the architects of innovation.
The industrial metaphor—one where AI replaces people—is not only outdated, it’s dangerous. Tabnine rejects that paradigm. We believe the right metaphor is the Renaissance. Just as the Renaissance spread the most innovative ideas across the world, reshaping art, philosophy, and culture, we believe a Renaissance in software development is coming—one that brings craftsmanship back to engineering.
That’s why Tabnine built the first platform designed to elevate the craft of engineering.
At the heart of it is our Enterprise Context Engine—a breakthrough system that makes AI agents deeply aware of your codebase, your workflows, and your standards. It includes:
An Insight Layer that builds a map of your architecture, requirements, and codebase—becoming the unified source of truth for every AI interaction.
A Control Layer that allows developers to scope context with precision, guiding every retrieval task, chat interaction, and pair programming query to the right files and rules for the job.
With this foundation, Tabnine delivers human-in-the-loop AI agents purpose-built for each step of the SDLC.
Our human-in-the-loop agents help developers plan, create, document, test, review, and maintain code—without breaking flow or compromising standards.
Supporting them is our AI Pair Programmer, always aligned to your codebase, always grounded in your systems, always explainable.
And we’ve gone further.
We’ve expanded our capabilities with advanced RAG systems—including graph-based retrieval that understands the relationships in your codebase. This allows Tabnine to eliminate redundant microservices, avoid duplication, and reuse existing functionality wherever possible. Our RAG Controller dynamically adjusts retrieval based on the task—ensuring each agent interaction is precise, relevant, and low-risk.
Add to that fine-grained context scoping, expanded IDE workspace awareness, real-time terminal integration, and image-as-context capabilities, and the result is a platform that doesn’t just help developers write code—it helps them engineer trust.
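The RAG Controller itself is proprietary, so the following is just a hedged sketch of the dispatch idea: pick retrieval strategies per task and keep every lookup inside the developer-defined scope. The task names and retriever stubs are assumptions for illustration:

```python
from typing import Callable, Dict, List

# Hypothetical retrievers: in a real system these would hit an embedding index,
# a code graph, or the files the developer explicitly put in scope.
def semantic_search(query: str, scope: List[str]) -> List[str]:
    return [f"[embedding match for {query!r}]"]

def graph_neighbors(query: str, scope: List[str]) -> List[str]:
    return [f"[graph neighbors of {query!r}]"]

def scoped_files(query: str, scope: List[str]) -> List[str]:
    return list(scope)

# Which retrieval strategies to combine for which task (names are invented).
RETRIEVAL_PLAN: Dict[str, List[Callable]] = {
    "explain_code":     [scoped_files, graph_neighbors],
    "generate_tests":   [scoped_files, semantic_search],
    "fix_review_issue": [graph_neighbors, semantic_search],
}

def gather_context(task: str, query: str, scope: List[str]) -> List[str]:
    """Dispatch to the retrievers suited to this task, never leaving the
    developer-defined scope, before anything is sent to the model."""
    context: List[str] = []
    for retriever in RETRIEVAL_PLAN.get(task, [semantic_search]):
        context.extend(retriever(query, scope))
    return context

print(gather_context("fix_review_issue", "null check in charge()", ["billing/charge.py"]))
```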
That trust matters. Because by 2028, 90% of enterprise developers will use AI daily. But AI at scale only works when it’s trustworthy—when it’s aligned to your systems, your standards, your expectations. Engineering leaders are waking up to this fact. They’re no longer chasing flashy features. They’re demanding performance, explainability, compliance, and control.
They’re demanding platforms that don’t just move fast—but build what lasts.
That’s what Tabnine was built for.
We’re not building another vibe coding tool. We offer an AI platform that elevates the craftsmanship of your engineering work. And we believe the future of software development belongs to those who pair human ingenuity with responsible, explainable, context-aware AI.
So if you believe that AI should empower, not erase, the developer—if you believe this is the beginning of a new Renaissance, not a repeat of the industrial age—then you’re our kind of engineer.
That’s our kind of “vibe.”
Many engineering leaders embarking on AI initiatives believe fine-tuning is the logical next step. It isn’t. Unless you’re expanding a model’s exposure to a low-resource programming language, fine-tuning is almost always the wrong tool. Fine-tuning does not teach a model your security policies. It does not encode your internal codebases. It does not align the model with your development workflows.
At best, it increases familiarity with narrow patterns seen in a limited training set. At worst, it bloats the model, introduces overfitting, complicates compliance, and makes future updates brittle and expensive. RAG is the key to accurate, reliable AI for software engineering.
RAG gives your model structured access to your real-world context: your documentation, source code, test cases, design patterns, compliance rules, and internal APIs. It doesn’t try to embed your org into the model’s weights—it retrieves the right information at the right time, with semantic precision and architectural awareness.
Whether through vector embeddings, semantic similarity, knowledge graphs, or agentic workflows, RAG enables LLMs to generate high-accuracy responses grounded in your enterprise environment—with no need to retrain, redeploy, or revalidate a new model version every time your codebase changes.
Fine-tuning is the wrong solution to the right problem. RAG is the engineering-aware, governance-aligned path forward. And our implementation of RAG continues to grow in sophistication. Tabnine’s Enterprise Context Engine produces an 82% lift in code consumption rates over out-of-the-box LLM performance, a significant margin when other providers’ systems produce only a 30 to 40% lift.
Enterprise AI systems often must incorporate proprietary or up-to-date technical knowledge beyond an LLM’s training data. Two common strategies are Retrieval-Augmented Generation (RAG) – feeding external information into prompts – and fine-tuning the model on domain data. Recent research in software engineering contexts compares these approaches to guide how best to improve code generation, developer assistants, and technical Q&A. Overall, studies show that RAG-based methods frequently match or outperform fine-tuned models on accuracy and code quality, especially when domain knowledge is complex or rapidly evolving.
Inject new knowledge without retraining—and outperform on what actually matters. Semantic or Vector RAG uses vector embedding search to retrieve relevant context (e.g. code snippets, docs) based on semantic similarity. In enterprise use-cases, this approach lets an LLM access up-to-date internal knowledge without retraining. Empirical results indicate substantial gains in factual accuracy over relying on fine-tuned models alone.
For example, Ovadia et al. (2023) found that unsupervised fine-tuning provides only modest gains, whereas RAG “consistently outperforms it, both for existing knowledge … and entirely new knowledge.” Similarly, Soudani et al. (2024) showed that while fine-tuning improves performance on common content, RAG “surpasses FT by a large margin” on low-frequency, domain-specific facts.
Importantly for efficiency-minded engineering leaders, fine-tuning a large model on niche info is resource-intensive, whereas retrieval is both effective and efficient for injecting new knowledge.
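As a minimal sketch of the vector-RAG mechanism discussed here (the embedding function and snippets are toy placeholders, not a production encoder or any vendor’s API):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for a real embedding model (e.g. a code-aware encoder);
    here we just hash characters into a fixed-size vector for illustration."""
    vec = np.zeros(64)
    for i, ch in enumerate(text.encode()):
        vec[i % 64] += ch
    return vec / (np.linalg.norm(vec) + 1e-9)

# Internal knowledge the base model was never trained on (invented examples).
snippets = [
    "billing_client.charge(order_id, idempotency_key=...) retries safely on 5xx",
    "All internal services authenticate via the issue_token() helper in auth/core.py",
    "Feature flags are read through flags.is_enabled(name, default=False)",
]
index = np.stack([embed(s) for s in snippets])

def retrieve(query: str, k: int = 2) -> list:
    scores = index @ embed(query)                      # cosine similarity (vectors are normalized)
    return [snippets[i] for i in np.argsort(scores)[::-1][:k]]

question = "How do I charge a customer without double-billing on retries?"
prompt = "Context:\n" + "\n".join(retrieve(question)) + f"\n\nTask: {question}"
print(prompt)  # this grounded prompt is what gets sent to the LLM
```

Swapping the snippet store for your real documentation updates the assistant’s knowledge with no retraining step.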
In software engineering tasks, semantic RAG has been applied to augment code models with API documentation, prior code, and knowledge base articles. Bassamzadeh and Methani (2024) compared a fine-tuned Codex model to an optimized RAG approach for a domain-specific language (DSL) used in enterprise automation (workflows with thousands of custom API calls).
The fine-tuned model had the highest code similarity to reference solutions, but the RAG-augmented model achieved parity on that metric while also reducing syntax errors (higher compilation success) by 2 percentage points. The RAG approach did have slightly higher hallucination in API names (by 1–2 points) compared to the fine-tune, but crucially it could handle new, unseen APIs with additional context.
In other words, an embedding-based RAG “grounding” of the code generator matched a specialist model’s quality and stayed up-to-date as APIs evolved, something a static fine-tuned model struggled with. These findings suggest that vector RAG can yield comparable or better code accuracy than fine-tuning, while offering flexibility (no retraining needed) for enterprise codebases that change over time. Grounding your model with real documentation beats retraining it every time your APIs change.
It’s the smarter way to retrieve answers from complex, interrelated enterprise systems. Graph RAG integrates structured knowledge graphs or linked data into the retrieval process, rather than relying purely on embedding similarity. This method leverages relationships between entities (e.g. linking functions, libraries, or concepts) to retrieve a richer context. Research on GraphRAG has demonstrated how graph-grounded retrieval boosts performance on complex enterprise documents.
It uses an LLM to build a knowledge graph of a private dataset and then retrieves information via graph connections, achieving “substantial improvements in question-and-answer performance” on long, interrelated enterprise texts. GraphRAG was shown to answer holistic queries requiring “connecting the dots” across disparate pieces of organizational data where baseline vector RAG failed. With complex enterprise codebases, GraphRAG’s structured approach outperforms previous semantic RAG methods in both accuracy and breadth of answers, highlighting its value for enterprise knowledge discovery.
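A toy illustration of the difference (the services and relations are invented): instead of ranking isolated chunks, retrieval walks the relationships around the entity in the question.

```python
import networkx as nx

# Tiny invented knowledge graph of services and their relationships.
kg = nx.DiGraph()
kg.add_edge("checkout-service", "payments-api", relation="calls")
kg.add_edge("payments-api", "ledger-db", relation="writes to")
kg.add_edge("payments-api", "fraud-model", relation="consults")
kg.add_edge("fraud-model", "feature-store", relation="reads from")

def graph_context(entity: str, hops: int = 2) -> list:
    """Collect facts reachable within `hops`, so the prompt can 'connect the dots'."""
    facts = []
    reachable = nx.single_source_shortest_path_length(kg, entity, cutoff=hops)
    for node in reachable:
        for _, nbr, data in kg.out_edges(node, data=True):
            facts.append(f"{node} {data['relation']} {nbr}")
    return facts

print(graph_context("checkout-service"))
# A vector-only retriever might surface the checkout docs alone; the graph walk
# also pulls in the ledger and fraud dependencies sitting a couple of hops away.
```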
When answers require reasoning, relationships matter more than retraining. Academic evaluations corroborate the benefits of graph-enhanced retrieval. For multi-hop question answering, Jiang et al. (2025) propose a KG-guided RAG that first does semantic retrieval then expands context via a knowledge graph. Their experiments on HotpotQA show the KG-augmented RAG delivered better answer quality and retrieval relevance than standard RAG baselines.
By pulling in related facts through graph links, the model can generate more complete and correct answers. These results indicate that Graph-based RAG can complement or even replace fine-tuning when queries demand reasoning over complex, connected knowledge (a common situation in large codebases or enterprise data). Rather than fine-tuning a model to memorize all relations, a graph RAG dynamically taps into an ontology or knowledge network, yielding higher precision responses in domains like software architecture (where functions, modules, and their dependencies form a graph).
When engineering tasks get complex, you need a model that can reason, not just recall. Agentic RAG refers to giving an LLM agent the ability to decide when and how to use retrieval in a multi-step, interactive manner. Unlike a fixed single-pass RAG pipeline, an “agentic” approach lets the model iteratively query a knowledge source or use tools (e.g. search, compilers) as needed to fulfill a task. This is especially useful in software engineering assistants, where a query might require exploring multiple pieces of information (for instance, reading error logs then retrieving API docs).
An agentic RAG system breaks the linear prompt→retrieve→generate flow: the LLM can choose to skip retrieval if the answer is known, or perform several retrieval steps and reasoning loops for complex problems. This flexible strategy can significantly improve outcomes compared to a fine-tuned model that must produce an answer in one shot from its internal weights.
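A skeletal sketch of that loop, with the llm and search callables standing in for whatever model and retriever you actually use:

```python
def agentic_answer(question: str, llm, search, max_steps: int = 3) -> str:
    """Let the model decide whether it needs retrieval, possibly over several rounds,
    instead of forcing a single retrieve-then-generate pass."""
    context = []
    for _ in range(max_steps):
        decision = llm(
            f"Question: {question}\nContext so far: {context}\n"
            "Reply with SEARCH: <query> if you need more information, "
            "or ANSWER: <final answer> if you can answer now."
        )
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:").strip()
        query = decision.removeprefix("SEARCH:").strip()
        context.extend(search(query))  # e.g. docs, error logs, code snippets
    return llm(f"Question: {question}\nContext: {context}\nAnswer as best you can.")
```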
Emerging research shows that agentic and multi-step retrieval strategies yield measurable gains in accuracy. Chang et al. (2025) introduce MAIN-RAG, a multi-agent RAG framework with LLM “agents” that collaboratively filter and select documents before generation. Without any model fine-tuning, MAIN-RAG achieved 2–11% higher answer accuracy than traditional one-step RAG across several QA benchmarks, by eliminating irrelevant context and retaining high-relevance info. The agent-based approach also improved consistency of answers, offering a “competitive and practical alternative to training-based solutions.”
This suggests that for tasks like code assistance, an agentic RAG could outperform a fine-tuned model: the agent can, for example, decide to fetch different code snippets, run test cases, or consult documentation in a loop until the answer or code fix is verified – capabilities a static fine-tuned model lacks. While research on agentic RAG in developer tools is still nascent, the evidence so far points to greater problem-solving ability by coupling LLMs with decision-making and retrieval, rather than relying solely on fine-tuned knowledge. In enterprise settings, this means an AI assistant can automatically traverse internal knowledge bases or project repositories in multiple steps, yielding more accurate and context-aware help for developers.
If your model isn’t grounded in your knowledge, it’s guessing. The evidence is clear: Retrieval-Augmented Generation (RAG) consistently outperforms fine-tuning across the metrics that matter most to enterprise software engineering—accuracy, adaptability, and code quality.
Vector and semantic RAG bring in up-to-date technical knowledge without retraining. Graph RAG builds structured context from complex systems, enabling deeper understanding. And agentic RAG introduces reasoning and decision-making—turning your LLM from a static predictor into a dynamic, problem-solving assistant.
Fine-tuning can only take you so far. RAG takes your model the rest of the way—with precision, context, and real-time relevance.
If you’re looking for AI to support your engineers, you don’t need another fine-tuning pipeline. You need a system that knows where to look.
Learn how Tabnine’s Enterprise Context Engine uses RAG to deliver accurate, context-aware assistance grounded in your codebase, docs, APIs, and security protocols—out of the box.
Across the industry, engineering leaders are embracing AI with urgency. Copilots, agents, and assistants are being integrated into workflows with the promise of accelerating delivery, reducing repetitive toil, and freeing developers to focus on high-value work.
But there’s a problem. You can’t accurately measure the impact of AI if your existing productivity metrics are already broken.
Most organizations still rely on outdated metrics—lines of code, commit counts, story points—that were never designed to reflect true engineering impact. And when these legacy indicators are applied to AI-augmented workflows, they fall apart entirely. Developers accept AI suggestions, bots refactor code, and teams move faster—but the data doesn’t explain why. Attribution becomes unclear, value delivery becomes harder to track, and engineering leaders are left staring at dashboards full of noise.
When what gets measured is flawed, what gets managed is misdirected. The result? Misleading KPIs, misaligned incentives, and missed opportunities.
The issue isn’t AI. The issue is your measurement system.
AI changes the dynamics of contribution. It suggests, autocompletes, reviews, documents, and even writes code autonomously. Yet most legacy systems have no way of tracking this new layer of productivity.
For example, AI-generated code may inflate commit counts without improving velocity or value. Suggestion acceptance rates provide a narrow view, often disconnected from downstream outcomes. Time saved in onboarding, documentation, or test creation is rarely tracked, even though the benefits are significant.
What looks like a productivity boost in raw numbers may actually be technical debt in disguise. Misapplied metrics give a false sense of progress—and can undermine trust in both AI tools and the teams that use them.

Developer productivity is complex. Decades of research and practice have proven there’s no single formula. And that’s the point.
Modern engineering productivity spans multiple dimensions. It includes output—such as commits and features delivered—but also quality indicators like defect rates and rework. It encompasses collaboration through code reviews, team interactions, and shared knowledge. It reflects operational excellence in system stability, MTTR, and deployment frequency. And it hinges on developer satisfaction: engagement, burnout risk, and tool usability.
Truly effective organizations assess productivity at three levels: individual, team, and organizational. At the individual level, metrics reflect a developer’s contributions, growth, and well-being. At the team level, they reveal process health, delivery velocity, and collaboration. At the organizational level, they tie engineering efforts to strategic goals and business outcomes.
Crucially, context matters. A metric that provides insight at one level may mislead at another. Lines of code might hint that an individual developer is stuck, but they’re meaningless at the team or org level. Effective measurement means matching the metric to the level, interpreting it carefully, and ensuring it supports—not distorts—desired outcomes.
Leading organizations have moved beyond simplistic measures and adopted multi-dimensional frameworks.
The SPACE framework emphasizes the importance of balancing its dimensions rather than optimizing one at the expense of others.
For example, Satisfaction might be measured via developer surveys or eNPS scores; Activity could be commits or code changes; Efficiency/Flow could involve measuring interruptions or time in “flow state”; Communication might be captured by code review interactions or knowledge sharing; and Performance by outcome metrics like features delivered or customer impact. The SPACE framework explicitly cautions against focusing on only one dimension in isolation – teams should consider multiple signals in tension.
This prevents scenarios where, say, high activity (many code changes) is mistaken for productivity even if satisfaction or collaboration is low. SPACE is applicable at individual scale (e.g. a developer’s own satisfaction and flow), team scale (team communication, overall outcomes), and organization scale (aggregate performance and efficiency). It has become a guiding methodology for many engineering orgs to define a balanced “dashboard” of metrics rather than a single KPI.
The DORA framework tracks four key metrics: Deployment Frequency (how often an organization deploys code to production), Lead Time for Changes (time from code committed to code successfully running in prod), Change Failure Rate (what percentage of deployments cause a failure, bug, or outage), and Mean Time to Recovery (MTTR) (how long, on average, it takes to restore service when an incident occurs).
High-performing teams strive for frequent, fast deployments with a low failure rate and quick recovery – in other words, high speed and stability. DORA metrics are usually applied at the team or organizational level to assess DevOps and engineering effectiveness. Research has shown that teams excelling in these metrics also achieve better business outcomes (such as higher profitability, market share and customer satisfaction) by enabling faster delivery of value with quality.
These metrics have been widely adopted in industry as a standard for benchmarking engineering teams (e.g. “Elite” performers vs “Low” performers). However, it’s worth noting they measure team capabilities, not individual performance. Even the DORA team cautioned against using these metrics punitively or for strict team-by-team comparisons, to avoid encouraging the wrong incentives. Instead, they work best as indicators for continuous improvement. Many tools and dashboards (from GitHub, GitLab, etc.) now automatically report DORA metrics for an organization’s delivery pipeline.
When used appropriately, they help diagnose bottlenecks and track improvements over time.
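For concreteness, here is a minimal sketch of computing the four metrics from deployment and incident records; the field names and sample data are invented, and real pipelines would pull them from CI/CD and incident tooling:

```python
from datetime import datetime
from statistics import mean

# Invented sample records; in practice these come from CI/CD and incident systems.
deployments = [
    {"committed": datetime(2025, 3, 1, 9),  "deployed": datetime(2025, 3, 1, 15), "failed": False},
    {"committed": datetime(2025, 3, 2, 10), "deployed": datetime(2025, 3, 3, 11), "failed": True},
    {"committed": datetime(2025, 3, 4, 8),  "deployed": datetime(2025, 3, 4, 12), "failed": False},
]
incidents = [
    {"opened": datetime(2025, 3, 3, 11, 30), "resolved": datetime(2025, 3, 3, 12, 15)},
]
window_days = 7

deployment_frequency = len(deployments) / window_days                      # deploys per day
lead_time = mean((d["deployed"] - d["committed"]).total_seconds() / 3600   # hours, commit -> prod
                 for d in deployments)
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)
mttr = mean((i["resolved"] - i["opened"]).total_seconds() / 60             # minutes to restore
            for i in incidents)

print(f"{deployment_frequency:.2f}/day, lead time {lead_time:.1f}h, "
      f"CFR {change_failure_rate:.0%}, MTTR {mttr:.0f}min")
```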
Agile teams track Velocity: the average amount of work a team completes per iteration (sprint), measured in story points or some other unit of effort. Velocity helps with forecasting and ensuring the team isn’t overcommitting. Other Agile metrics include Sprint Burndown (tracking work completed vs. time in a sprint), Cycle Time (how long a single work item takes from start to finish), Lead Time (from idea reported to work delivered, similar to cycle time but may include wait states), and Throughput (number of work items completed in a period). Kanban teams often track Work In Progress and Cycle Times to optimize flow.
These metrics are largely at the team level and focus on process efficiency and output consistency. Agile methodologies also emphasize quality metrics like defect rates or customer-reported issues per iteration to ensure speed isn’t achieved at the cost of quality.
While Agile metrics are narrower in scope than SPACE or DORA, they fit into those frameworks – for example, a team’s cycle time is a component of “Performance” and “Flow” in SPACE, and can influence Lead Time in DORA. The key is to use Agile metrics as health checks for the team’s process, not as absolute judgments of individual performance. For instance, velocity varies by team and is useful for a team to track its own improvements, but it should not be used to compare two different teams.
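A small sketch of these team-level calculations from issue records (fields and numbers invented):

```python
from datetime import datetime
from statistics import mean

# Invented issue records for one two-week sprint.
issues = [
    {"points": 3, "started": datetime(2025, 4, 1), "finished": datetime(2025, 4, 4)},
    {"points": 5, "started": datetime(2025, 4, 2), "finished": datetime(2025, 4, 9)},
    {"points": 2, "started": datetime(2025, 4, 7), "finished": datetime(2025, 4, 8)},
]
incoming_this_sprint = 4  # new items added to the backlog during the sprint

velocity = sum(i["points"] for i in issues)                                  # story points per sprint
throughput = len(issues)                                                     # items completed
cycle_time_days = mean((i["finished"] - i["started"]).days for i in issues)  # start -> finish
throughput_ratio = throughput / incoming_this_sprint                         # keeping up with demand?

print(f"velocity={velocity} pts, throughput={throughput}, "
      f"avg cycle time={cycle_time_days:.1f} days, throughput ratio={throughput_ratio:.2f}")
```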
Together, these frameworks form a balanced scorecard for engineering productivity. They are not replacements for judgment but essential tools for understanding where and how value is created.

At the individual level, leading teams track developer satisfaction, estimate time saved with AI tools, and monitor usage and engagement to understand adoption trends.
At the team level, they measure code review cycle time, throughput of completed issues, escaped defects, test coverage, and rework due to AI-generated code.
At the organizational level, they assess deployment frequency, lead time for feature delivery, mean time to recovery, and business value delivered per unit of engineering capacity. These metrics are often aligned with OKRs to ensure strategic coherence.
Code velocity generally refers to the speed at which code is produced and delivered. It’s a loose term – some define it as “commit velocity” (number of commits or lines of code over time), others as how quickly features move through the pipeline. In Agile teams, velocity has a specific meaning (completed story points per sprint). High code velocity means the team or developer is delivering changes rapidly. This can be measured by commits per day, lines changed, or story points completed. However, caution is needed: raw code output alone is not a definitive indicator of productivity or value.
For example, a high commit count could include trivial changes or even introduce churn. In fact, focusing on lines-of-code metrics can backfire: developers might write unnecessary code just to “look productive,” leading to more code to maintain (a classic case of Goodhart’s law in action).
Thus, code velocity metrics should be paired with quality metrics. Typically, code velocity is looked at on a team level (e.g. our team deploys X commits or completes Y story points per week). At an individual level, managers may glance at commit activity as one input (to ensure no one is stuck or overloaded), but it’s rarely used as a KPI due to variability in tasks and the risk of misuse. In summary, code velocity is useful to track trends (are we speeding up or slowing down delivery?), especially when combined with other measures.
This metric counts how many work items (tickets, user stories, bugs) a team completes in a given period. It is a direct measure of team output in terms of units of work delivered. For example, a team might resolve 30 Jira issues in a sprint, or merge 10 pull requests per week. Tracking throughput helps in understanding capacity and consistency. It’s often used in Kanban style teams (throughput per week/month) and in Scrum (stories per sprint). High throughput with steady quality means the team is effectively getting things done.
If throughput drops, it could indicate bottlenecks or blockers. However, be mindful of work item size – 30 small trivial tasks are not the same as 5 major features. Many teams therefore also track work item size or classify issues by type (new feature vs chore vs bug) to give throughput more context. Issue throughput is inherently a team-level metric. Using it for individuals can be misleading since tasks vary in complexity and are often collaborative. A related metric is throughput ratio (e.g. ratio of completed vs incoming work) to see if the team is keeping up with demand or a growing backlog. This metric ties into performance/outcome in SPACE (delivering value) and can be linked to business outcomes when the “issues” represent user stories that deliver business value.
The speed and efficiency of code reviews is a critical productivity indicator at the team level. This is often measured as part of cycle time – e.g. the time from a pull request (PR) being opened to it being merged and deployed. Specifically, code review time can be defined as the duration a PR waits for review and the time taken to get approval. Long review times can slow down delivery and hinder developers waiting on feedback. Recent research underscores the importance of this: accelerating the code review process can lead to a 50% improvement in overall software delivery performance. This is because quicker reviews mean code gets to production faster and developers spend less time context-switching or waiting.
Metrics to track here include: average PR wait time, average time to first reviewer comment, and average time from PR open to merge. Many engineering intelligence tools provide a “PR cycle time” breakdown. Code review efficiency touches on Communication & Collaboration in SPACE, since it reflects how well team members coordinate. It’s also one of the “hidden” contributors to faster lead times (thus impacting DORA metrics). Engineering managers often set internal goals for review times (for example, aim to review PRs within 1 business day on average). If a team finds their reviews are taking too long, they might adjust policies (e.g. reduce required approvers, dedicate review time each day) to improve flow. This metric is communicated at team level but can also be aggregated at org level to identify systemic bottlenecks in the development process.
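A short sketch of how those review-time metrics fall out of PR timestamps (sample data invented):

```python
from datetime import datetime
from statistics import mean

# Invented pull request timeline data.
prs = [
    {"opened": datetime(2025, 5, 1, 9),  "first_review": datetime(2025, 5, 1, 14), "merged": datetime(2025, 5, 2, 10)},
    {"opened": datetime(2025, 5, 3, 11), "first_review": datetime(2025, 5, 5, 9),  "merged": datetime(2025, 5, 5, 16)},
]

def hours(start, end):
    return (end - start).total_seconds() / 3600

time_to_first_review = mean(hours(p["opened"], p["first_review"]) for p in prs)
open_to_merge = mean(hours(p["opened"], p["merged"]) for p in prs)

print(f"avg time to first review: {time_to_first_review:.1f}h, "
      f"avg open-to-merge: {open_to_merge:.1f}h")
# A goal of reviewing PRs within one business day would show up as
# time_to_first_review staying under roughly eight working hours.
```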
While writing more code faster is one side of productivity, the quality of that code is equally important for long-term productivity. Test coverage (percentage of code covered by automated tests) is a commonly cited metric. High test coverage can indicate a safety net that allows developers to move fast with confidence (i.e. you can deploy often because tests catch regressions). Coverage is usually measured at the codebase or module level, and teams might set targets (e.g. “maintain at least 80% unit test coverage”).
However, coverage is not a perfect metric – 100% coverage doesn’t guarantee good tests, and chasing coverage numbers can even lead to writing superficial tests. Still, it serves as a rough gauge of how much of the code is verified by tests. Other quality metrics include defect rates (bugs reported in production per quarter), Escaped Defects (bugs found by users that were not caught in internal testing), and Code Churn (how often code is rewritten or reverted shortly after being written).
High churn could signal poor initial quality or unclear requirements. Static analysis tools also provide metrics like lint issues or security scan results which feed into code quality assessment. These metrics are important at the team and org level to ensure that productivity gains are not coming at the expense of quality. For example, if a team’s velocity increases but so do production bugs, the net productivity might actually be worse (due to firefighting and rework). Organizations often set up dashboards for these quality metrics and tie them to engineering OKRs (e.g. “reduce escaped defects by 30%”). Test coverage and related metrics align with the “Performance” dimension in SPACE (since it affects the outcomes delivered to users) and also relate to “Reliability” as discussed in DevOps literature.
This is one of the DORA four key metrics and measures operational productivity – specifically, how quickly the team can restore service when an incident occurs. MTTR is typically measured in hours (or minutes for very critical systems) and is averaged over incidents in a given period. A low MTTR means the engineering team is effective at quickly diagnosing and fixing problems under pressure, which is a sign of strong capability (and often good instrumentation and on-call processes).
MTTR is usually considered at the organization or service level (e.g. across all incidents affecting a product or system). It’s a key metric for DevOps/operations productivity and is often reported to upper management as part of reliability or uptime reports. Improvement in MTTR can come from better monitoring, runbooks, incident response training, and resilient architecture – all indicating a mature engineering organization. MTTR is strongly tied to business outcomes because downtime directly impacts users and revenue.
For example, if MTTR is reduced from 1 hour to 15 minutes, the business experiences far less disruption from incidents. Many top-performing teams measure MTTR alongside Mean Time Between Failures (MTBF) to balance speed of recovery with overall system stability. In terms of frameworks: MTTR is a Performance/Outcome metric (SPACE) and a Stability metric (DORA). Reporting on MTTR to executives helps demonstrate how engineering productivity contributes to reliability (e.g. “In Q1 our average recovery time improved by 50%, minimizing customer impact of outages” – a clear business benefit).
Productivity isn’t only about output; it’s also about how developers feel and how likely they are to sustain high performance. Happy, engaged developers tend to be more productive and creative. Thus, many organizations measure developer satisfaction or developer experience through periodic surveys. This can include questions about satisfaction with tools and processes, work-life balance, feeling of accomplishment, etc. Some companies use a Developer NPS (Net Promoter Score) asking how likely a developer is to recommend the engineering org as a great place to work. Others calculate an internal Developer Satisfaction Index. In the SPACE framework, Satisfaction and well-being is the first dimension, highlighting its importance.
Measuring it at the individual level (via anonymous survey) and aggregating to team/org level can reveal problem areas – e.g. perhaps one team has low morale due to poor processes, which will eventually hurt productivity through attrition or burnout. A Microsoft study noted that productivity and satisfaction are “intricately connected.” High churn of developers or widespread burnout is a red flag that any short-term productivity gains are unsustainable.
Therefore, CTOs and VPs increasingly present developer satisfaction metrics to the board alongside delivery metrics, to show that the team’s health is being maintained. Some modern tools even integrate developer mood surveys into their platforms. Best practices for measuring satisfaction include doing it regularly (e.g. quarterly), keeping it anonymous, and following up with action plans so developers see improvements – which in turn boosts engagement.
Ultimately, productivity metrics should connect to business performance and customer value. This is where organizational-level metrics come in. Examples include: feature lead time (time from ideation to feature in customers’ hands), customer satisfaction (CSAT/NPS) related to product improvements, revenue per engineer (rough measure of ROI on engineering), and other outcome-based KPIs. While it’s hard to draw a direct line from an individual developer’s commits to, say, quarterly revenue, engineering leaders try to correlate their metrics with business outcomes.
For instance, if deployment frequency and lead time improved due to productivity initiatives, did it result in the company capturing market opportunities faster or improving user retention?
One approach is to use OKRs where the Objective is a business goal (e.g. “Improve user retention by 5%”) and the engineering Key Results include delivering specific product enhancements or reliability improvements by certain dates – essentially measuring if engineering output drives the desired business result. In reporting, VPs of Engineering will often translate technical metrics into business terms: e.g. “We achieved a 30% faster release cadence, which enabled Marketing to run two extra promotions this quarter, contributing to an X% increase in new user sign-ups.”
Another example: DORA’s research found that elite performers (good DevOps productivity) were twice as likely to meet or exceed their organizational performance goals (like profitability, market share) compared to low performers. Showing this kind of data can convince executives that investing in developer productivity (tools, automation, training) has real ROI.
Good metrics programs don’t stop at engineering efficiency; they trace the impact through to customer and business value. This might mean creating composite metrics like “cycle time to business impact” or tracking the percentage of engineering work aligned with strategic business initiatives. When communicating to business stakeholders, framing productivity in terms of outcomes (features delivered, incidents reduced, users gained) is far more effective than raw tech stats.

To navigate the AI transition successfully, organizations must build on a solid foundational measurement strategy by layering in AI-Aligned Productivity.
AI-Aligned Productivity blends trusted frameworks like SPACE and DORA with AI-specific insights. It captures how AI contributes across the software lifecycle and links that contribution to outcomes that matter.
This approach rests on five key principles:
One high-level question is: Did AI help us finish the project faster or better? To measure this, organizations can perform A/B comparisons or pilot studies. For example, during a pilot, some teams use the AI tool and others don’t, and then compare metrics like feature lead time, story points completed, or cycle time. If the AI-assisted teams consistently deliver features faster or complete more scope in the same time, that’s a quantifiable contribution of AI.
While you can’t randomly assign teams indefinitely, even short trials or historical comparisons (before vs after AI adoption) can be illustrative. Another approach is using survey-based attribution: ask engineers how much they feel the AI helped in completing tasks, and aggregate those estimates. If 80% of developers say “Tabnine helped me finish tasks ~20% faster,” that provides a rough quantification.
Tabnine also tracks usage metrics (e.g. how many suggestions accepted, how many times the AI was invoked). These can serve as proxies – if a project had, say, 1000 AI suggestions accepted across its development, one could qualitatively assess that “AI had a hand in many parts of the code.” Ultimately, linking AI to project outcomes should also consider quality and whether the project met its goals. It’s good to look at both speed (did we finish on time or save time?) and outcomes (did AI help achieve the desired performance, security, user satisfaction targets of the project?).
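For the pilot-style comparison described above, even a very simple summary can frame the discussion. The sketch below compares cycle times between an AI-assisted cohort and a control cohort; the numbers are invented for illustration, and a real study would need to control for team composition, task mix, and sample size before drawing conclusions.

```python
from statistics import mean, median

# Hypothetical cycle times (in days) for completed work items during a pilot.
# One cohort used the AI assistant; the control cohort did not.
ai_assisted = [2.1, 3.4, 1.8, 2.9, 2.5, 3.0, 1.6]
control = [3.8, 4.2, 2.9, 3.5, 4.0, 3.3, 4.6]

def summarize(label, cycle_times):
    print(f"{label}: mean {mean(cycle_times):.1f}d, "
          f"median {median(cycle_times):.1f}d, n={len(cycle_times)}")

summarize("AI-assisted cohort", ai_assisted)
summarize("Control cohort", control)

# A rough point estimate only; treat it as a conversation starter, not proof.
improvement = 1 - mean(ai_assisted) / mean(control)
print(f"Mean cycle time reduced by ~{improvement:.0%}")
```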
AI tools are not just coding assistants; they can also answer questions (like a chatbot trained on company docs), generate documentation, write tests, and more. To measure time saved in these areas, consider using surveys and time-tracking studies.
For onboarding, you might compare the ramp-up time of new hires now (with AI help) vs a year ago (pre-AI). If new hires reach their first production commit in 2 weeks now versus 4 weeks before, that’s a tangible improvement – though some confounding factors exist, a portion could be attributed to AI if, say, new hires report using Tabnine’s code explore agent extensively.
For documentation, Tabnine’s documentation agent, custom commands, and AI chat can automate much of the drafting. It can be beneficial to measure how long it took to produce docs before vs now. Perhaps writing a design spec took 10 hours of senior engineer time before, but now an AI can draft 70% of it and the engineer spends 3 hours editing – that’s a 7-hour saving.
Similar logic for testing: Tabnine’s testing agent significantly accelerates the development and implementation of comprehensive test suites. As a result, developers might go from spending 2 days on tests to 1 day, effectively doubling testing productivity. One concrete way companies measure this is through internal surveys asking developers: “How much time do you estimate the AI tool saves you per week on tasks X, Y, Z?” If across a team of 50 devs the average reported saving is 3 hours/week, that’s 150 hours/week regained – which is like adding ~4 extra developers’ worth of capacity, a compelling stat to report.
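Turning self-reported savings into a capacity figure like the one above is simple arithmetic; the sketch below mirrors that calculation. The hours-per-week inputs are illustrative assumptions.

```python
# Convert survey-reported time savings into an FTE-equivalent capacity figure.
# All inputs are illustrative and mirror the example in the text above.
team_size = 50
avg_hours_saved_per_dev_per_week = 3      # average of developers' self-reports
productive_hours_per_dev_per_week = 35    # assumed hours available for engineering work

total_hours_saved = team_size * avg_hours_saved_per_dev_per_week          # 150 hours/week
fte_equivalent = total_hours_saved / productive_hours_per_dev_per_week    # ~4.3 developers

print(f"Estimated capacity regained: {total_hours_saved} hours/week "
      f"(~{fte_equivalent:.1f} developers' worth)")
```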
Another angle is measuring the output of those activities: e.g., number of knowledge base articles written, number of tests created. If those metrics went up after AI introduction without a corresponding increase in time spent, it implies AI helped.
For maintenance tasks like bug triage or code refactoring, our customers use Tabnine’s in-IDE AI chat to support their workflows. A helpful measure could be how many bugs were triaged or how many refactors were completed with AI assistance. Each such instance is time a human didn’t spend. It can be useful to translate that into dollar terms for leadership: e.g. “Our AI documentation assistant saved an estimated 200 hours of engineers’ time last quarter, which is roughly $X in value.”
As noted earlier, measuring quality is essential to ensure AI isn’t just creating more work. To gauge AI’s effect on quality, organizations can track metrics like defect density (bugs per KLOC), post-release bugs, code review findings, and security vulnerabilities before vs after AI adoption. One could, for example, compare the bug rate of code written with AI assistance to that of code written manually. If there’s a significant difference, that’s a signal.
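A before/after defect-density comparison of this kind can be as simple as the sketch below. The bug counts and code sizes are hypothetical; real values would come from your issue tracker and repository statistics.

```python
# Compare defect density (bugs per KLOC) before and after AI adoption.
def defect_density(bugs: int, lines_of_code: int) -> float:
    """Return bugs per thousand lines of code (KLOC)."""
    return bugs / (lines_of_code / 1000)

before = defect_density(bugs=42, lines_of_code=120_000)   # pre-adoption quarter
after = defect_density(bugs=55, lines_of_code=180_000)    # post-adoption quarter

print(f"Before AI adoption: {before:.2f} bugs/KLOC")
print(f"After AI adoption:  {after:.2f} bugs/KLOC")
if after > before:
    print("Defect density rose: review where and how AI-assisted code is being merged.")
else:
    print("No increase in defect density observed for this period.")
```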
The Uplevel study found higher bug introduction with AI, so a team might notice an uptick in bugfix commits or tickets linked to areas where AI was heavily used. Tabnine’s Code Review agent assists with this by providing a report on the number of issues found in code review and their severity.
Security metrics: tools like Snyk or Checkmarx could report how many vulnerabilities were found in AI-generated code vs others. Interestingly, AI can also improve security if used correctly (for example, Tabnine’s code review and validation agents check generated code against your specific code quality standards).
A good metric here is vulnerability escape rate – are more security issues slipping to production due to AI-written code? Or perhaps AI-assisted code review (like Tabnine’s code review agent) catches issues faster, which you could measure by time to remediate vulnerabilities. Additionally, monitor code churn specifically for AI-written code: if that churn runs at roughly double the baseline, as some research has predicted, that suggests quality issues or misaligned suggestions.
We recommend focusing on outcome metrics (defects, quality) rather than naive counts when evaluating AI impact. So, for every “AI boosted output by X%,” one should also report “and here’s what happened to our defect rates or reliability.” Ideally, AI’s contribution should be positive or neutral on quality; if it’s negative, processes need adjusting.
Another form of measurement is tracking how widely and frequently AI tools are used in the org. This is somewhat meta, but it shows adoption (which is a precursor to impact). For example, “80% of our developers are now using the AI coding assistant daily” is a metric indicating that the tool has become integral. High adoption usually means developers find value in it. This is a fairly simple metric to track and is supported in the Tabnine dashboard.
You can also measure developer sentiment about AI through surveys: e.g. ask “Does the AI assistant improve your productivity?” with a rating scale. These sentiment scores can be presented alongside hard metrics. If 90% say “yes, it’s helpful,” that’s a strong indicator of impact (even if you can’t quantify every aspect).
In surveys done by our customers and during proof of value trials, the majority of Tabnine users report it improves their coding satisfaction and efficiency. Tracking sentiment over time can show if improvements to the AI (or perhaps new policies around AI use) are having an effect.

Tabnine partners with engineering organizations throughout their AI adoption journey to help identify high-impact opportunities across their software development lifecycle. Rather than replacing existing productivity measurement tools, Tabnine works within each organization’s existing productivity frameworks to highlight where AI agents can deliver measurable improvements.
We enable organizations to deploy AI agents across every stage of the SDLC—from coding and reviewing to documentation and testing—so that engineering leaders can understand precisely where Tabnine is delivering value. Our platform provides visibility into the quality and utility of AI-generated code, ensuring it meets internal standards, avoids intellectual property liability, and is actually adopted by developers in day-to-day workflows.
By surfacing how and where our agents are trusted and effective, we help engineering leaders make informed decisions about AI adoption and scale. Tabnine’s goal is not to replace your dashboards or existing measurement tools, but instead to help you deliver clear business impact through the adoption of AI agents.
AI can transform software development. But if your organization doesn’t understand what productive, high-quality engineering actually looks like, it’s challenging to identify and articulate the impact.
High-performing organizations are not waiting. They’re investing now in modern, meaningful measurement frameworks. They’re building the foundations for sustainable AI-driven performance.
Before you scale AI, fix how you measure productivity.
Last month, we drew a line: Tabnine isn’t chasing the hype cycle. We’re building for the real world of software engineering—where systems are complex, constraints are real, and quality matters.
AI should adapt to your developers, your policies, and your infrastructure—not the other way around.
This month’s updates deliver on that promise. More control. Deeper context. Smarter governance. These aren’t features. They’re building blocks for a future-proof platform: enterprise AI that’s fast, trusted, and secure by design.
In large-scale environments, code doesn’t live in a neat local folder. It’s spread across terminals, remote repositories, mounted volumes, and multi-tool dev chains. But most AI tools can only see a narrow slice of that reality—making their suggestions unreliable, or worse, misleading.
This month, we expanded Tabnine’s context scoping to include remote files, folders, and active terminal sessions. That means Tabnine can now reason with a broader and more accurate picture of what the developer is actually doing.
This isn’t just an enhancement—it’s a requirement. Enterprise AI without access to meaningful context is like a senior engineer who’s only read half the ticket. This is a direct answer to a common pain point with generic AI tools: “It’s fast, but it doesn’t know what I’m doing.”
Because in the enterprise, context isn’t just nice to have—it’s everything.
As AI-generated code becomes a bigger part of your codebase, the question of where it came from matters more than ever. Engineering leaders are rightfully concerned about license contamination, IP leakage, and the downstream risk of unauthorized code inclusion.
This month, we’ve taken two major steps to help enterprises own their security posture when it comes to provenance:
This isn’t just about compliance—it’s about trust. Your developers need to move quickly without second-guessing every suggestion. Your security team needs to know that those suggestions are clean, trackable, and policy-aligned.
Tabnine makes that possible, and we’re continuing to lead the industry in enterprise-grade attribution and provenance infrastructure.
Tabnine Chat is becoming the primary interface between engineers and their AI assistant. But in order to earn that position, it has to be fast, intelligent, and above all—non-intrusive.
We’ve redesigned the Apply button for better usability and performance. It now works seamlessly, implementing suggestions provided by Tabnine Chat regardless of your cursor location or page state—ensuring that inserting a suggestion is immediate, intuitive, and doesn’t interrupt your flow.
We’ve also expanded indexing support to Perforce, allowing even teams using legacy or specialized VCS tools to benefit from full-chat context awareness.
These improvements are part of our belief that AI should adapt to you, not the other way around. Where other tools push centralized chat UIs or web-first workflows, Tabnine is meeting developers inside their environment—on their terms.
AI model choice is no longer just a question of performance—it’s a matter of policy, architecture, and preference. And in March, we’re delivering the flexibility enterprise customers have been asking for.
We’ve added:
This shift is critical for organizations that want to fully customize their AI architecture. Whether you’re standardizing on a specific provider, managing latency/cost tradeoffs, or ensuring alignment with internal risk policies—you’re now in full control.
Other tools bake in their own models and force you to opt out. With Tabnine, you decide what models your engineers can use, where they run, and how they’re governed.
For self-hosted, air-gapped, or security-sensitive environments, Tabnine continues to evolve into a deeply configurable platform. This month, we’ve added support for SMTP with OAuth, replacing legacy user/password authentication with modern, token-based email integration.
Why does this matter? Because authentication is often the silent failure point in enterprise tooling. Static credentials introduce risk. OAuth tokens align with security best practices—and are often mandatory in organizations with hardened security postures or centralized SSO enforcement.
This update allows Tabnine to integrate more safely with internal systems while aligning with your org’s identity and access control standards. It’s another example of how we’re not just adding features, but building toward a platform that respects and reinforces your security architecture.
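For readers unfamiliar with token-based SMTP authentication, the sketch below shows the general shape of the XOAUTH2 mechanism using Python’s standard library. This is a generic illustration, not Tabnine’s implementation; the host, addresses, and token are placeholder assumptions, and in practice the access token would be obtained through your identity provider’s OAuth flow.

```python
import base64
import smtplib

# Generic sketch of token-based SMTP authentication via the XOAUTH2 mechanism.
SMTP_HOST = "smtp.example.com"   # placeholder host
SMTP_PORT = 587
SENDER = "notifications@example.com"
access_token = "EXAMPLE_ACCESS_TOKEN"  # obtained via your OAuth flow, never hard-coded

# XOAUTH2 initial response: "user=<user>\x01auth=Bearer <token>\x01\x01", base64-encoded.
auth_string = base64.b64encode(
    f"user={SENDER}\x01auth=Bearer {access_token}\x01\x01".encode()
).decode()

with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
    server.ehlo()
    server.starttls()   # encrypt the connection before authenticating
    server.ehlo()
    # Token-based auth replaces a static username/password pair.
    server.docmd("AUTH", "XOAUTH2 " + auth_string)
    server.sendmail(
        SENDER,
        ["dev-team@example.com"],
        "Subject: Test\r\n\r\nHello from an OAuth-authenticated sender.",
    )
```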
As we said last month: flexibility without governance is a non-starter. This month’s releases are about proving that those two things can—and must—coexist in a modern AI platform.
With smarter context awareness, safer identity syncing, secure system integration, stronger attribution controls, and unmatched model flexibility, Tabnine continues to evolve into the platform of choice for teams leading a new renaissance in software development—where AI enhances creativity, accelerates delivery, and elevates engineering itself.
Curious how it all works in practice? Have questions, or want to see Tabnine in action? We host Tabnine Office Hours every Wednesday, where engineering leaders, platform teams, and AI champions come together to share insights, ask questions, and influence our roadmap. We’d love to see you there. Reserve your spot for our next Office Hours.