
Checksum's Vision: Building the Missing Piece for Autonomous AI Engineering

Gal Vered
Discover how Checksum's AI-powered testing and analysis platform complements code generation LLMs to enable truly autonomous software development through continuous quality and real-world feedback loops.

Checksum develops AI agents for self-healing and self-correcting software.

State-of-the-art models are already quite good at writing code, and engineers who learn how to use AI can write software faster. English will become the programming language of choice, although developers will still need to articulate the problem and, at times, dictate the solution.

The next obvious evolution of AI code generation is to move from copilot to full captain: an agentic AI developer that goes beyond writing short snippets of code to developing and deploying an ever-growing body of work.

But we’re already seeing a slowdown in LLM code-writing improvements. LLMs today are like junior engineers: they have a limited understanding of the system in which their code operates. They do not understand the full codebase, models, past issues, customers, production usage, and so on. We shouldn’t expect them to, because the data they would need to train on is structurally beyond their reach.

The delta between GPT-3.5 and GPT-4 was massive, but the gains have slowed significantly as we reach a Pareto threshold: the last 20% of the job takes 10X more effort than the first 80%. We’ve seen this stage with autonomous cars, where ten years ago everyone thought self-driving was just around the corner, only to discover that far more work remained.

The agentic AI engineering problem is not just a code generation problem. It is, ultimately, a ‘system of systems’ challenge. And this is where LLMs fall short.

Modern tools try to address this by indexing the codebase and increasing the context window so more data can be fed to the models. But modern software engineering is not an exercise in kitchen-sink ingestion and memorization of code. It requires an aerial understanding of how the system operates, its contours and corner cases. It requires years of experience building the system, living through failures, and anticipating the unintended consequences of prior changes.

This level of context is missing from current LLMs, which are trained on the open web. And without a layer of testing and analysis, LLMs will remain midwit engineers who have simply memorized the documentation, even with an unlimited context window.

Checksum plays this role. By focusing on the post-coding layer - test generation and execution, bug detection and fixing, and production usage - Checksum’s AI understands the system as a whole and, even more importantly, how it will behave in the wild as new users interact with it.

Checksum does not generate the code itself. That is the job of OpenAI, Anthropic, and the other model providers. Checksum provides “real-world” code analysis and auto-correction based on data from the full development cycle, plus live session and activity data from actual human users – critical data that cannot be ingested or scraped from any known source. Failed tests, PRs, bug fixes, and production usage all feed models that are trained in system-level engineering.

The only way to push the envelope of development further – first by allowing non-technical users to code in English, and then by developing truly autonomous agents – is through an immediate feedback loop of tests that verify newly written code and guard against regressions in previous features.
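
As a rough sketch of what such a loop could look like (every name below – generate_code, run_new_feature_tests, run_regression_suite, propose_fix – is a hypothetical placeholder, not Checksum’s or any vendor’s actual API), the cycle is: generate a change, verify it against its own tests and the existing regression suite, and feed any failure report back into a correction step:

```python
# Minimal sketch of a generate -> verify -> correct loop. All names are
# hypothetical placeholders, not Checksum's or any vendor's actual API.
from dataclasses import dataclass


@dataclass
class TestResult:
    passed: bool
    failure_report: str = ""


def generate_code(task: str) -> str:
    # Placeholder for a code-generation LLM call.
    return f"# implementation for: {task}"


def run_new_feature_tests(code: str, task: str) -> TestResult:
    # Placeholder for generated end-to-end tests covering the new behavior.
    return TestResult(passed=True)


def run_regression_suite(code: str) -> TestResult:
    # Placeholder for the suite guarding previously shipped features.
    return TestResult(passed=True)


def propose_fix(code: str, failure_report: str) -> str:
    # Placeholder for an auto-correction step driven by failure analysis.
    return code


def develop_with_feedback(task: str, max_iterations: int = 5) -> str:
    """Iterate until both the new-feature tests and the regression suite pass."""
    code = generate_code(task)
    for _ in range(max_iterations):
        feature = run_new_feature_tests(code, task)
        regression = run_regression_suite(code)
        if feature.passed and regression.passed:
            return code  # verified on both fronts; safe to ship
        code = propose_fix(code, feature.failure_report or regression.failure_report)
    raise RuntimeError("No passing change within budget; escalate to a human.")


if __name__ == "__main__":
    print(develop_with_feedback("add a password-reset flow"))
```

In practice the placeholder functions would be backed by real model calls and real test infrastructure; the structure of the loop is the point.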

General-purpose LLMs are great coders, and Checksum – providing feedback, detecting and correcting errors – is the missing piece needed to pave the way to a truly autonomous AI engineer.

And as Checksum sees more failures, analyzes more root causes, and observes more fixes—either by a human or AI—we can start to predict failures and prescribe solutions earlier. Just as software engineering moved from episodic releases to Continuous Integration and Deployment, with Checksum we move from episodic testing in staging and production to Continuous Quality.

Today, we do that at the end-to-end level, once the code is ready. In the future, we’ll be able to match code patterns to previously introduced bugs as the code is being written.
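
As a toy illustration of that idea – and only an illustration: the similarity measure and the example diffs below are made up, and a real system would rely on models trained on actual failure and fix data rather than lexical overlap – matching an incoming change against an index of changes that previously introduced bugs might look like this:

```python
# Toy illustration only: a deliberately simple similarity check, not Checksum's
# actual models. The idea is to index diffs that previously introduced bugs and
# flag new diffs that look similar, while the code is being written.

def tokens(diff: str) -> set[str]:
    """Crude lexical fingerprint of a diff."""
    return set(diff.split())


def similarity(a: str, b: str) -> float:
    """Jaccard similarity between two diffs' token sets."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


# Hypothetical history of diffs that were later root-caused to bugs.
known_bad_diffs = [
    "if user.plan == 'free': return None  # dropped paid-plan check",
    "await fetch(url)  # removed retry wrapper around flaky endpoint",
]


def flag_risky_change(new_diff: str, threshold: float = 0.4) -> list[str]:
    """Return past bug-introducing diffs that resemble the incoming change."""
    return [bad for bad in known_bad_diffs if similarity(new_diff, bad) >= threshold]


if __name__ == "__main__":
    incoming = "if user.plan == 'free': return None"
    for match in flag_risky_change(incoming):
        print("Similar to a past bug-introducing change:", match)
```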

By doing so, Checksum plays a fundamental role in creating AI coding agents while also solving immediate, real-world problems.