Hidden test dependencies
We all know that each of our tests should be independent and self-contained, and therefore able to run by itself, or in parallel with other tests, with deterministic results. In real life, however, we often encounter tests that sometimes pass and sometimes fail. There are many different sources of non-deterministic test outcomes. In this post, I would like to focus specifically on tests that depend on a global state which they don't sufficiently control for.
This might be a global (process-level) variable, the filesystem, etc. One test might not clean up this global state; the next test might not fully initialize all of its assumptions. If executed in sequence, in the natural order, neither the developer nor the CI ever notices a problem. However, a hidden dependency between tests has been introduced and will cause problems later.
- Dependent tests have a deterministic outcome; it's just that the ordering of tests is one of the inputs. That's why I think it's useful to clearly distinguish them from other non-deterministic/flaky tests.
- Often, the incorrect code which causes the test to fail is seemingly unrelated and "far" from the test itself.
- The tests pass when executed in the usual ordering (CI).
I would like to share two simple techniques which help to tackle dependent tests and which I haven't seen in other sources:
- automatically run each test in isolation
- execute all tests in reverse order
For demonstration, I have this test suite in test_deps.py:

```python
import locale


def test1():
    assert locale.str(0.1) == '0.1'


def test2():
    locale.setlocale(locale.LC_ALL, 'cs_CZ')
    assert locale.str(0.2) == '0,2'


def test3():
    assert locale.str(0.3) == '0,3'
```
When I execute this test suite with pytest, everything seems to be fine.
Automatically run each test in isolation
There is a pytest plugin called pytest-forked which executes each test in a forked process and transmits the results to the master runner. This is a good start because it identifies tests that are clearly not standalone and don't work by themselves. I executed it with my sample test suite:
```
$ pip install pytest-forked
...
$ pytest --forked -v
```
and I got:
```
collected 3 items

test_deps.py::test1 PASSED  [ 33%]
test_deps.py::test2 PASSED  [ 66%]
test_deps.py::test3 FAILED  [100%]

===================== FAILURES =====================
______________________ test3 _______________________

    def test3():
>       assert locale.str(0.3) == '0,3'
E       AssertionError: assert '0.3' == '0,3'
E       - 0.3
E       + 0,3

test_deps.py:14: AssertionError
======== 1 failed, 2 passed in 0.09 seconds ========
```
By executing pytest test_deps.py::test3, I can confirm that the test indeed doesn't pass by itself (run alone, the process keeps the default locale, so locale.str(0.3) returns '0.3'), which allows me to work on a fix.
Execute all tests in reverse order
Existing articles recommend executing tests in random order to identify some of the dependencies. That is a valid suggestion; however, I think one special case of "random" order deserves priority treatment. Let's execute the tests in reverse order. That should disturb the most dependencies which were unconsciously created while developers and CI were executing the tests in the natural order.
I didn't find any pytest plugin which would accomplish this. However, it's very easy to customize the pytest run through conftest.py like this:
```python
def pytest_collection_modifyitems(items):
    items.reverse()
```
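As a possible refinement (my own sketch, not part of the two-liner above), the reversal could be made opt-in behind a command-line flag via the standard pytest_addoption hook; the flag name --reverse is my choice:

```python
# conftest.py -- a sketch: reverse the collected tests only when requested
def pytest_addoption(parser):
    # register a hypothetical --reverse flag (disabled by default)
    parser.addoption("--reverse", action="store_true", default=False,
                     help="run the collected tests in reverse order")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--reverse"):
        items.reverse()  # in-place reversal of the collection order
```

For the demonstration below I use the unconditional two-line version.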
After placing this conftest.py into the test root directory I executed
```
$ pytest -v
```
with this result
```
collected 3 items

test_deps.py::test3 FAILED  [ 33%]
test_deps.py::test2 PASSED  [ 66%]
test_deps.py::test1 FAILED  [100%]

===================== FAILURES =====================
______________________ test3 _______________________

    def test3():
>       assert locale.str(0.3) == '0,3'
E       AssertionError: assert '0.3' == '0,3'
E       - 0.3
E       + 0,3

test_deps.py:14: AssertionError
______________________ test1 ______________________

    def test1():
>       assert locale.str(0.1) == '0.1'
E       AssertionError: assert '0,1' == '0.1'
E       - 0,1
E       + 0.1

test_deps.py:5: AssertionError
======== 2 failed, 1 passed in 0.06 seconds ========
```
Two tests are failing, which is valuable information. We already know that test3 doesn't work by itself, so we leave it aside. Regarding test1 and test2, there seems to be an interaction between them. We can confirm this by executing

pytest test_deps.py::test1 test_deps.py::test2

vs.

pytest test_deps.py::test2 test_deps.py::test1

and getting different results. With this information, we can work on a fix. In this case, we would probably choose to improve test2 so that it cleans up after itself properly. In a larger test suite, we might not be able to identify the code "corrupting" the global state, but at least we can harden the failing test's (test1's) setup so that it's resilient in such cases.
In this article, I focused on non-deterministic test outcomes stemming from different ordering/selection of tests. While there is no silver bullet, these two tricks will help expose tests that don't set up their environment well enough.