Against the ‘QA‐free’ philosophy
I recently watched Pieter Levels' interview with Lex Fridman and dove through the comments. Most discussed one hallmark topic: Pieter’s 1-click deployment system, which relies only on local tests. In Pieter’s own words, he’s notorious for not setting up staging servers. He prefers a stripped-down pipeline, something that enables him to ship fast.
I’m watching all of this and thinking… must be nice.
Don’t get me wrong; Pieter’s lean process is fantastic and incredibly efficient. But it’s something that’ll only work for an indie hacker. Once upon a time, companies thought they could move like him. Think back to Facebook’s notorious “move fast and break things” motto. But that didn’t age well; in 2014, Facebook (hilariously) shifted its motto to “move fast with stable infrastructure.” The rebrand served as a perfect example of how fleshed-out engineering teams are forced to grapple with QA differently.
For those unfamiliar, QA stands for quality assurance. It’s a fancy term we’ve assigned to a set of processes that ensure that a product works and will perform as expected when delivered to customers. Think unit tests, human tests, and staging servers. Nowadays, companies develop quality assurance processes quite early, typically before even a Series A rolls around.
For us, QA is a big deal. We ship a billing infrastructure product; our code enables apps to make money. So if our product breaks, serious revenue could be lost. It’s tough because there are a lot of intricate parts. Billing is complex, stuffed with critical subprocesses such as cost calculation, triggered invoices, or charge captures.
To make things even trickier, we are proudly an open-source product. If we publish a bad release, it’ll be in the public ether forever. And that’s not just a vanity thing; a bad release runs the risk of broken code getting cloned and deployed.
Consequently, we’ve spent a lot of time developing a strong QA process. And in the past year, we have found one that works. It’s a process that I’m fairly proud of, especially because we’ve dramatically reduced errors in production. Today, I want to discuss our learnings, covering how Lago does QA and why it works for us.
At Lago, we strongly believe in [good, detailed specs](https://www.getlago.com/blog/how-we-ship-fast-our-framework) that account for all possible behaviors. Our attitude is: if it’s not in the spec, it’s not part of the feature.
We use specs to map our new features, considering all the pertinent questions about our product. Questions like… Is there a calculation change? Does it impact existing subscriptions? Does it impact past or new invoices? It’s not just about the hazards, but also the automatic and manual tests needed to safeguard us.
To be clear, not all of our specs are complex. If we’re adding a simple SaaS feature, the corresponding spec tends to be short and sweet. Conversely, if we’re impacting any component of our billing logic, then our spec will be lengthy, taking into account every possible breaking point.
Once the spec is finished, we’re clear to build the feature. This part is pretty self-explanatory. Mock-ups are designed, code is written, deadlines are met. The tricky part is what comes next.
The goal of writing technical tests is to ensure that there is no regression (i.e., that no previously working feature breaks). This covers everything from critical billing behaviors to minor bugs, so we need tests that span that whole range.
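To make the idea of a regression test concrete, here is a minimal sketch in Python. It is not Lago's actual code: `compute_charge`, its parameters, and the pinned cases are all hypothetical, chosen only to show how known-good billing results can be frozen so that a later change cannot silently alter them.

```python
from decimal import Decimal, ROUND_HALF_UP


def compute_charge(units: int, unit_price: Decimal, free_units: int = 0) -> Decimal:
    """Hypothetical usage-based charge: bill only the units above the free allowance."""
    billable = max(units - free_units, 0)
    # Round to cents with an explicit rounding mode so results stay stable.
    return (billable * unit_price).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Pinned cases (hypothetical): units, unit price, free units, expected amount.
REGRESSION_CASES = [
    (0, Decimal("0.10"), 0, Decimal("0.00")),
    (100, Decimal("0.10"), 0, Decimal("10.00")),
    (100, Decimal("0.10"), 150, Decimal("0.00")),  # usage below the free allowance
    (3, Decimal("0.335"), 0, Decimal("1.01")),     # 1.005 rounds half-up to 1.01
]


def run_regression_suite() -> list[str]:
    """Return a description of every pinned case whose result has changed."""
    failures = []
    for units, price, free, expected in REGRESSION_CASES:
        actual = compute_charge(units, price, free)
        if actual != expected:
            failures.append(
                f"units={units} price={price} free={free}: "
                f"expected {expected}, got {actual}"
            )
    return failures
```

An empty list from `run_regression_suite()` means none of the pinned behaviors moved. In a real project these cases would more likely live in a test framework; the plain function just keeps the sketch self-contained.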
After running tests, backend and frontend teams need to ensure that the resulting behavior matches what’s in the spec. But this is just part one. Then, the build is off to the product team.
Once the tests are finished, developers create a QA branch so that the product team can scrutinize it. This hand-off is important at Lago; the product team has the final say, as they are masters of both design and UX.
Notably, Lago’s product team does their checks synchronously over a video call. We've found that doing synchronous checks is important because it helps us uncover new scenarios as we brainstorm possible edge cases. Async doesn’t offer the same benefit; async encourages us to strictly stick to a plan. And while we believe in specs, we also acknowledge that sometimes an elusive edge case only becomes visible after a feature is built. There have been plenty of times when we’ve discovered an unhappy path that was invisible before.
(Curiously, as a side effect, these calls help unite the team. It’s one of the few opportunities we get to chat verbally, as most of our comms are on Slack.)
While the product team leads these calls, engineers are still looped in. This helps spread accountability and, should errors arise, makes it easy to plan next steps.
If all the behavior is validated, the product team posts a message on Slack to clear a code merge.
Conversely, if there is failing behavior, the product team creates QA returns, with one Linear card pegged to each return. Each of these returns is assigned to an engineer to tackle. Once all the QA returns are finished, we’ll rerun QA from scratch.
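The pass-or-return loop can be modeled in a few lines of Python. This is only an illustrative sketch: `QAReturn` and `QARound` are hypothetical names, not part of Lago or Linear, and the model assumes every behavior in the spec is checked individually.

```python
from dataclasses import dataclass, field


@dataclass
class QAReturn:
    """One failing behavior; stands in for a single Linear card."""
    description: str
    assignee: str
    resolved: bool = False


@dataclass
class QARound:
    """One full pass over every behavior listed in the spec."""
    behaviors: list[str]
    results: dict[str, bool] = field(default_factory=dict)
    returns: list[QAReturn] = field(default_factory=list)

    def record(self, behavior: str, passed: bool, assignee: str = "unassigned") -> None:
        self.results[behavior] = passed
        if not passed:
            # Each failure becomes its own return with its own owner.
            self.returns.append(QAReturn(behavior, assignee))

    def can_merge(self) -> bool:
        # Cleared only if every behavior was checked and none failed.
        checked_all = set(self.results) == set(self.behaviors)
        return checked_all and all(self.results.values())

    def next_round(self) -> "QARound":
        # QA restarts from scratch: no results carry over from this round.
        if not all(r.resolved for r in self.returns):
            raise RuntimeError("unresolved QA returns remain")
        return QARound(self.behaviors)
```

The design choice worth noticing is in `next_round`: it hands back an empty round rather than re-checking only the failed behaviors, which mirrors redoing QA from scratch after the returns are fixed.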
Next, we’ll merge the passing code with the `main` branch. This is important because we can use previous data attached to the `main` branch to ensure that things still work. This data is more “real world” than test data.
We do more testing, asking the question, “What else could break?”
Even when releasing the product to prod, we take a staggered approach out of caution. First, we release it on the cloud version of Lago (either the US or EU branch). This enables us to revert in the case of a problem. If the deployment clears, we’ll release it to the other branch.
Once our paid-plan customers and beta users have tested the feature (either for a few days or upwards of a week), we’ll then release it to the open-source GitHub repo. This is a win-win for everyone; paid-plan customers don’t have to wait for in-demand features, and our open-source branch maintains the lowest risk profile.
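For readers who like to see a rollout as code, the staggered release described above could be sketched roughly like this. Everything here is an assumption made for illustration: the `staged_release` function, its callback parameters, and any stage names passed to it are hypothetical, and the waiting period before the open-source release is assumed to be folded into the `is_healthy` check.

```python
from typing import Callable, Sequence


def staged_release(
    version: str,
    stages: Sequence[str],
    deploy: Callable[[str, str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> list[str]:
    """Deploy stage by stage; stop and revert at the first unhealthy stage."""
    cleared: list[str] = []
    for stage in stages:
        deploy(stage, version)
        if not is_healthy(stage):
            # Revert the stage that just failed; later stages never get the version.
            rollback(stage)
            break
        cleared.append(stage)
    return cleared
```

Called with stages ordered as one cloud region, then the other, then the open-source repo, the function returns only the stages that cleared. The riskiest audience (anyone who might clone and deploy a bad release) sits at the end of the list, so a problem caught earlier never reaches it.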
Our QA process was designed for us. We ship a high-impact product where an outage could dish out serious damage. That’s why our process is considerably heavy, and quite far from Pieter Levels’ QA-free philosophy.
Hopefully, if you work at a company, some of these techniques are helpful to you even if you ship a less risky product. In general, we’ve found that this QA pipeline results in significantly fewer hiccups, and that saves time in its own way. It also reduces stress and keeps our customer base happy. At the end of the day, that’s what really matters.
Raffi