Spark Joy by Running Fewer Tests - Shopify
Developers write tests to ensure correctness and to allow future changes to be made safely. However, as the number of features grows, so does the number of tests. Tests are a double-edged sword. On one hand, well-written tests catch bugs and maintain a program’s stability. On the other hand, as the codebase grows, a high number of tests impedes scalability: they take a long time to run and increase the likelihood of intermittently failing tests. Software projects often require all tests to pass before merging to the main branch, which adds overhead for every developer. Intermittently failing tests worsen the problem. Some causes of intermittently failing tests are:
timing
instability in the database
HTTP connections and mocks
random number generators
tests that leak state to other tests: the leaking test passes every time on its own, but causes other tests to fail depending on the order in which they run.
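To make the last cause concrete, here is a minimal, hypothetical Python sketch of an order-dependent failure. The shared `_discount_cache`, the `get_discount` helper, the two test functions, and the tiny `run` harness are all invented for illustration and are not taken from Shopify's code.

```python
from collections.abc import Callable

# Hypothetical module-level cache shared by every test in the process.
_discount_cache: dict[str, int] = {}


def get_discount(customer: str) -> int:
    """Return the cached discount for a customer, defaulting to 0."""
    return _discount_cache.get(customer, 0)


def vip_discount_test() -> None:
    # Leaks state: writes to the shared cache and never cleans up.
    _discount_cache["alice"] = 20
    assert get_discount("alice") == 20


def default_discount_test() -> None:
    # Passes on its own, but fails whenever vip_discount_test ran first.
    assert get_discount("alice") == 0


def run(order: list[Callable[[], None]]) -> dict[str, bool]:
    """Run the tests in the given order on a fresh cache; map name -> passed."""
    _discount_cache.clear()
    results: dict[str, bool] = {}
    for test in order:
        try:
            test()
            results[test.__name__] = True
        except AssertionError:
            results[test.__name__] = False
    return results


if __name__ == "__main__":
    print(run([default_discount_test, vip_discount_test]))  # both pass
    print(run([vip_discount_test, default_discount_test]))  # default_discount_test fails
```

Note that the leaking test itself always passes; only the test that runs after it breaks. The usual remedy is to restore shared state after each test, for example in a teardown hook or fixture, so that no test's outcome depends on what ran before it.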
Unfortunately, one can’t fully eradicate intermittently failing tests, and the likelihood of them occurring increases as the codebase grows. They make already slow test suites even slower, because now you have to retry them until they pass.
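A back-of-the-envelope model shows why a larger suite is more exposed. The sketch below assumes, as a simplification, that each of N tests fails spuriously and independently with the same probability p. Then the chance that a full run is all green is (1 − p)^N, and the chance of hitting at least one intermittent failure is 1 − (1 − p)^N. The function name and the per-test rate of 1e-5 are hypothetical illustrations, not measured values.

```python
def prob_any_flaky_failure(num_tests: int, flake_rate: float) -> float:
    """Probability that at least one of num_tests tests fails spuriously.

    Assumes each test flakes independently with the same probability
    flake_rate (a simplifying, hypothetical model).
    """
    # P(all pass) = (1 - p)^N, so P(at least one flaky failure) = 1 - (1 - p)^N.
    return 1.0 - (1.0 - flake_rate) ** num_tests


if __name__ == "__main__":
    p = 1e-5  # hypothetical per-test flake rate, not a measured value
    for n in (1_000, 10_000, 150_000):
        print(f"{n:>7} tests: {prob_any_flaky_failure(n, p):.1%}")
```

Under this toy model, a per-test rate that yields roughly a 1% chance of a spurious failure for 1,000 tests yields roughly a 78% chance for 150,000 tests, so most full runs would need at least one retry.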
I’m not implying that one shouldn’t write tests. The benefits of quality assurance, performance monitoring, and faster development from catching bugs early rather than in production outweigh the downsides. However, improvements can be made. My team thus embarked on a journey to make our continuous integration (CI) more stable and faster. I’ll share the dynamic analysis system we implemented to select tests, followed by other approaches we explored but decided against. Test selection sparks joy in my life. I hope I can bring the same joy to you.
Problems with Tests at Shopify
At Shopify, tests impede developer productivity. The test suite of our monolithic repository:
has over 150,000 tests
is growing by 20-30% in size annually
takes about 30-40 minutes to run on hundreds of Docker containers in parallel.
Each pull request requires all tests to pass. Developers have to either wait for the tests or pay the price of context switching. In our bi-annual