Manual testing is a common practice for software engineers early in their careers. Every engineer’s first programs are hand-tested. As you learn through education or real-world experience, you encounter larger systems with complex use cases, and it becomes intuitively obvious that automated testing is valuable.
Automated tests provide the flexibility to adapt to evolving business needs by verifying that things work as intended and stay that way. They make it easier to onboard and train new engineers by providing a kind of documentation and some guard rails. They also help with paying down technical debt by creating confidence that you didn’t break anything that currently works. In a sense, automated tests are about one thing: confidence in managing your software.
In this post, I’ll share how we think about testing — how we value it and how we weigh the different tests that are available in an engineer’s toolkit. By reading this post, I hope you’ll walk away with some new ideas and perspectives that you can apply to your own automated testing suite.
How should I value tests?
There are two dimensions for thinking about the value of tests. First, how much value does a test run provide? Intuition would tell us something along the lines of "for each test we learn that the thing it's testing works". It's true that we obtain the data that the thing still works. But data isn’t the same as information or learning. In terms of the information you get from running tests, it's more complicated.
Let’s digress momentarily on why it’s important to understand data versus information. Data is just figures and facts that have no meaning by themselves. Information is what we get when we derive learning from data (usually from aggregated data). The value of testing lies in the information we get from our tests, not the data. We want to learn about our programs!
With this separation in mind, let’s do a thought experiment: how much information would you get from a test that passes every time? The answer is none! Tests that always pass provide no information. So a point of reflection could be: how often do my changes meaningfully break the test suite? That's an important indicator of the value of your test suite. Another way to think about it is: how often does the test suite actually teach me something about the changes I want to make to the system?
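To make that concrete, here’s a minimal sketch (in Python, with invented names; nothing here is from a real codebase) of a zero-information test: it asserts the implementation against itself, so it can never fail.

```python
import unittest

# A hypothetical helper function, invented for illustration.
def total_price(quantity: int, unit_price: float) -> float:
    return quantity * unit_price

class TautologicalTest(unittest.TestCase):
    def test_always_passes(self):
        # This assertion restates the implementation, so it can never
        # fail meaningfully: it produces data ("the test ran and passed")
        # but no information about whether the behavior is actually right.
        self.assertEqual(total_price(3, 2.0), 3 * 2.0)
```

If `total_price` had a bug, the right-hand side of the assertion would have the same bug, and the test would still pass.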
A second dimension for valuing tests is the business side. Start with why we build software: to provide value to customers. Customers only care about how the system behaves as perceived by them. Put another way, the only real value of a piece of software is its behavior as a whole, because that’s how customers experience the software and decide how to value it. Bringing this back to tests, it stands to reason that the closer a test is to the user, the more business value it has.
There’s also an interesting aspect of dog-fooding here. As engineers are writing tests closer to how the customer perceives the software, they are learning about the real value of the software — the customer’s experience. So a test’s proximity to the customer’s experience is an important dimension to consider when thinking about how to value your tests.
How valuable are different types of tests?
OK, so now we have two dimensions for thinking about the value of tests:
- How often do they break and teach us something about the code changes we want to make?
- How close are they to what the customer cares about?
These aren’t the traditional dimensions for allocating testing effort. The default framework is the testing pyramid, which is heavily focused on test-run speed. This is the diagram from the great Martin Fowler’s site:
This isn’t an original thought, but it’s what I’ve come to believe: the testing pyramid is bullshit. Posts arguing this come around once in a while on Hacker News, and I agree with them both intellectually and experientially. The testing pyramid generally breaks tests down into three types:
- End-to-end (E2E) tests - The most expensive, so you want just a sprinkle.
- Integration tests - Medium expensive, so have a modest amount.
- Unit tests - These are cheap, so have many. Hey, just go nuts.
We’ll go into more detail on each type as we examine them against our established dimensions for evaluating tests: meaningful failures and customer value.
End-to-end tests check what the customers actually care about. These are the highest value in the customer dimension since they simulate the customer’s full experience.
How about how meaningfully they break? E2E tests break in both meaningful and meaningless ways. A meaningful E2E failure happens when a change inadvertently breaks some seemingly unrelated thing in another area. This happens with some regularity in a comprehensive set of E2E tests.
But there’s another type of regular failure with E2E tests. There’s a brittleness to E2E tests that can really hurt. Something like "oh Chrome pushed a new update last night that changed some subtle behavior of how cookies get managed that breaks a few of our tests".
Another brittle quality, in terms of their relationship with engineering, is their performance: they’re really slow. So E2E tests check literally what customers care about, but they break regularly in both meaningful and meaningless ways.
Integration tests sit somewhere in the middle, and different ecosystems and frameworks disagree about what exactly counts as an integration test. A common definition derives from the meaning of the word itself: integration tests exercise a bunch of parts of the system actually put together. That puts them closer to what customers value and experience from the software. They break meaningfully, often similarly to E2E tests, but they’re less brittle, primarily because they don’t involve a browser. On both dimensions, integration tests score in between E2E and unit tests.
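As a rough illustration of that definition, here’s a sketch of an integration-style test: it wires real (if tiny) parts together instead of mocking them. All class and method names here are hypothetical.

```python
# An in-memory repository plus a service that uses it. The test below
# exercises both together, checking an outcome rather than internals.

class InMemoryOrderRepo:
    def __init__(self):
        self._orders = {}

    def save(self, order_id, order):
        self._orders[order_id] = dict(order)

    def get(self, order_id):
        return self._orders[order_id]

class OrderService:
    def __init__(self, repo):
        self.repo = repo

    def place_order(self, order_id, items):
        # items is a list of (quantity, unit_price) pairs
        total = sum(qty * price for qty, price in items)
        self.repo.save(order_id, {"items": items, "total": total})
        return total

def test_place_order_persists_total():
    # Real collaborators, no mocks: the pieces are integrated.
    service = OrderService(InMemoryOrderRepo())
    total = service.place_order("o-1", [(2, 5.0), (1, 3.0)])
    assert total == 13.0
    assert service.repo.get("o-1")["total"] == 13.0
```

Because the test only asserts on observable outcomes, you can refactor how `OrderService` and the repository talk to each other without touching it.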
Unit tests score close to zero in relation to the customer experience. Customers don't have any idea what engineers consider a unit to be, and a lot of times the engineers don't either.
Unit tests either break infrequently (many of us have seen unit tests that just passed meaninglessly for years), or for some dumb, low-information reasons. "I'm refactoring this class to add a new feature and broke a bunch of unit tests while the integration and E2E tests continue to pass" is a common pattern. Another example is unit tests breaking because some minor change caused a bunch of mocks or stubs to be wrong.
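Here’s a minimal sketch of that second failure mode, using Python’s standard-library mocks and invented names: the test pins an internal call signature rather than observable behavior, so a harmless refactor breaks it.

```python
from unittest.mock import MagicMock

# The real collaborator the mock stands in for (hypothetical).
class Mailer:
    def send(self, to, body):
        ...

def notify(mailer, user):
    mailer.send(user, "Welcome!")

def test_notify_pinned_to_internals():
    mailer = MagicMock()
    notify(mailer, "ada@example.com")
    # This assertion encodes the current call signature. Rename the
    # method or reorder its arguments during a refactor and this test
    # fails, even though customer-visible behavior is unchanged.
    mailer.send.assert_called_once_with("ada@example.com", "Welcome!")
```

The test passes today, but the only thing it can ever catch is a change to the internal wiring, which is exactly what refactoring is supposed to be free to change.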
What we’re doing at Convictional: more integration tests
We've looked at the value of different kinds of tests against the provided dimensions. Is there another way of thinking about how to approach testing our system? I think so. It's a different allocation than the testing pyramid. The pyramid is correct to frame the problem as return-on-investment. We should be thinking about tests in return-on-investment terms. But where’s the most return in terms of holistic value? This is where I get to the punchline of this post: integration tests. Integration tests are the best return-on-investment because they test closer to the customer’s experience of the software than unit tests and break more meaningfully. They aren’t as close as an E2E test, but they run faster and are less brittle.
Kent C. Dodds put up this diagram of testing investment (which he describes as “the testing trophy”) a while back, and it’s a good visual for how we see testing allocation:
Since adopting this way of thinking about testing, we’ve started implementing integration tests at the API level, which exercise a fully integrated backend server. We call these functional tests, since they’re a subcategory of all the different types of integration tests we have. They test the outcomes of common customer use cases and flows, like building a quote or changing an order. Notably, they don’t test the UI, edge cases, or regressions, since those are better handled by E2E or unit tests. We’re also leveraging these functional tests to validate (and eventually, generate) our documentation, so we’re confident it remains correct. We’re gaining confidence in making changes and evolving our software, and spending less time on low-value unit tests. This allows our engineering team to move faster and produce more value for our customers.
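To show roughly what we mean by a functional test at the API boundary, here’s a hedged sketch. The route, payloads, and handler are invented for this post (a tiny in-process stand-in for a backend); real tests would drive the fully integrated server.

```python
# A toy request handler standing in for an API backend: create and
# fetch quotes. Everything here is illustrative, not a real API.

def handle_request(state, method, path, body=None):
    if method == "POST" and path == "/quotes":
        quote_id = f"q-{len(state) + 1}"
        state[quote_id] = {"id": quote_id, "items": body["items"]}
        return 201, state[quote_id]
    if method == "GET" and path.startswith("/quotes/"):
        quote = state.get(path.split("/")[-1])
        return (200, quote) if quote else (404, None)
    return 405, None

def test_build_quote_flow():
    # Exercise a customer flow end to end at the API level:
    # build a quote, then read it back.
    state = {}
    status, quote = handle_request(state, "POST", "/quotes",
                                   {"items": ["sku-1"]})
    assert status == 201
    status, fetched = handle_request(state, "GET", f"/quotes/{quote['id']}")
    assert status == 200
    assert fetched["items"] == ["sku-1"]
```

The test speaks the same language as the customer flow (create a quote, fetch it back) and says nothing about how the backend is structured internally.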
Want to learn more?
I hope this post can help you find some new ideas and perspectives on how to think about the automated testing suite for your software. If you want to read more by someone who's spent far more time on this, check out James Coplien's Why Most Unit Testing is Waste. That paper was hugely influential on my own thinking and this post.
If you’re interested in deep thinking about building software, you’d like working at Convictional. We’re hiring for engineering, come check us out!