Checksum CEO and co-founder Gal Vered recently sat down with Mike Verinder of Modern Software for a wide-ranging conversation on autonomous testing, the future of the SDLC, and why quality is the missing layer in AI-driven engineering. If you want to watch the full conversation, the video is above. Here are the highlights.
Where Checksum started
Checksum was not built in response to ChatGPT. Gal and his co-founders identified the problem years earlier: as engineering teams scale, maintaining confidence that a code change did not break something else becomes increasingly difficult. The result is engineers spending up to 50% of their time firefighting instead of building.
End-to-end testing solves this, but writing and maintaining a test suite is itself a significant project. Checksum was built to take that burden off the team entirely, generating Playwright tests automatically and keeping them current as the product changes.
Why open source matters for testing
Checksum made a deliberate bet on open source. Tests are delivered as Playwright code via pull request, living in the customer's own repository and running in their existing CI pipeline. The reasoning is straightforward: a test suite is only as useful as the engineers connected to it. If tests live in a platform nobody visits, they stop being useful. Keeping tests in code, in Git, with tools engineers already know, is what makes the signal trustworthy.
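In practice, a generated test is just an ordinary Playwright spec committed to the repository. Here is a minimal illustrative sketch, not actual Checksum output; the URL, labels, and flow are hypothetical:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical example of a generated spec living in the customer's repo.
// The page, selectors, and credentials below are made up for illustration.
test('user can sign in and reach the dashboard', async ({ page }) => {
  await page.goto('https://app.example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```

Because it is plain Playwright, a spec like this runs with `npx playwright test` in whatever CI pipeline the team already has, and changes to it arrive as reviewable pull requests like any other code.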
The autonomous SDLC: a long tail problem
Gal draws the analogy to autonomous driving. Getting to 80% happened fast. Getting to fully autonomous is a much longer tail. The same is true for software development. Tools like Cursor make engineers significantly more productive, but full autonomy requires a feedback loop, and that feedback loop requires reliable testing.
Checksum's long-term vision is to be the quality layer that makes autonomous AI engineering possible. An AI agent writes code, Checksum tests it, failures feed back into the model, and the cycle continues. That loop already exists in practice today, just with a human engineer in the middle.
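The shape of that loop is easy to make concrete. Below is a toy sketch, not Checksum's or any agent's real API; `generateCode` and `runTests` are hypothetical stand-ins, and failures are modeled as plain strings fed back into the next generation step:

```typescript
// Toy sketch of the generate -> test -> feedback loop described above.
// All functions are hypothetical stand-ins for an AI agent and a test runner.
type TestResult = { passed: boolean; failures: string[] };

function generateCode(spec: string, feedback: string[]): string {
  // Stand-in for an AI coding agent: incorporates prior failures as "fixes".
  return feedback.length === 0
    ? `impl(${spec})`
    : `impl(${spec}) + fix(${feedback.join(',')})`;
}

function runTests(code: string): TestResult {
  // Stand-in for a test run: passes once a fix for the failure is present.
  const passed = code.includes('fix');
  return { passed, failures: passed ? [] : ['checkout flow broken'] };
}

function autonomousLoop(spec: string, maxIters = 5): { code: string; iterations: number } {
  let feedback: string[] = [];
  for (let i = 1; i <= maxIters; i++) {
    const code = generateCode(spec, feedback);
    const result = runTests(code);
    if (result.passed) return { code, iterations: i };
    feedback = result.failures; // failures feed back into the model
  }
  throw new Error('did not converge');
}

const out = autonomousLoop('login');
console.log(out.iterations); // → 2
```

Today, the "feedback" edge of this loop is usually a human engineer reading a failed CI run; the vision is for that edge to close automatically.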
Manual testing is not going away
Counterintuitively, Gal sees a continued role for manual testing, not for regression coverage, but for edge cases that require deep product context: one customer in a thousand using a feature in an unusual way, or a configuration that only surfaces under specific conditions. Automation covers the core. Manual testing covers the known unknowns that are not worth automating.
How Checksum handles hallucinations
Rather than one large model trying to do everything, Checksum runs a system of small, specialized models. One summarizes HTML. One decides the next action. One generates the best selector. One writes assertions. Breaking the problem into narrow, specific tasks keeps accuracy high, costs manageable, and failures diagnosable.
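That division of labor can be sketched as a pipeline of narrow stages. Every function below is a stub standing in for one of the specialized models; none of this is Checksum's actual implementation, and the heuristics are deliberately trivial:

```typescript
// Sketch of a pipeline of small, specialized stages instead of one large model.
// Each function is a trivial stub standing in for a narrow model.
type Step = { action: string; selector: string; assertion: string };

// Stand-in for the HTML-summarizing model: strip tags, keep visible text.
const summarizeHtml = (html: string): string =>
  html.replace(/<[^>]+>/g, ' ').trim();

// Stand-in for the next-action model.
const decideNextAction = (summary: string): string =>
  summary.includes('Log in') ? 'click' : 'noop';

// Stand-in for the selector-generation model: prefer stable test IDs.
const pickSelector = (html: string): string =>
  html.includes('data-testid') ? '[data-testid="login"]' : 'button';

// Stand-in for the assertion-writing model.
const writeAssertion = (action: string): string =>
  action === 'click' ? 'expect dashboard visible' : 'expect no change';

// The composition: each narrow stage does one diagnosable job.
function planStep(html: string): Step {
  const summary = summarizeHtml(html);
  const action = decideNextAction(summary);
  return { action, selector: pickSelector(html), assertion: writeAssertion(action) };
}

const step = planStep('<button data-testid="login">Log in</button>');
console.log(step.action, step.selector, step.assertion);
```

The design point is that when a generated test misbehaves, you can ask which narrow stage got it wrong, something a single end-to-end model makes much harder.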
Privacy by design
Checksum operates only on front-end behavior, the same surface any user of the web app can see. It has no access to source code, backend systems, databases, or PII. For environments where even that requires extra care, sanitization mechanisms are in place.
Want to go deeper? Watch the full conversation with Gal and Mike above, then request a demo to see how Checksum works on your own app.
