
What Are Order-Dependent Flaky Tests?

Order-dependent flaky tests (OD tests) are a specific category of flaky tests that fail only when executed after a particular 'polluter' test. The polluter modifies some shared state (a database, a static variable, a file) and does not restore it. When the victim test runs later, it encounters unexpected state and fails.

By Hasnain Iqbal · Updated 1 April 2025

The Polluter-Victim Relationship

Every order-dependent flaky test involves at least two tests:

The polluter — a test that modifies shared state and does not restore it after the test completes. On its own, the polluter passes.

The victim — a test that assumes a certain initial state. When it runs after the polluter, that assumption is violated and the test fails. In isolation, the victim also passes.

The insidious part: both tests pass individually and in many orderings. The failure surfaces only when the victim runs after the polluter, either immediately or later with nothing in between restoring the state. In a large test suite run in random order, this specific combination may occur rarely, making the test appear to "work most of the time."

The inverse relationship also exists: the state-setter/brittle pattern. A brittle test passes only when it runs after a specific state-setter test that sets up state the brittle test depends on. When run on its own, the brittle test fails.
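The brittle case can be sketched the same way, again with hypothetical names. CONFIG_CACHE starts empty, so BrittleTest errors with a KeyError when run alone. It passes only when StateSetterTest runs first.

```python
import io
import unittest

# Hypothetical shared cache that is empty when the test process starts.
CONFIG_CACHE = {}


class StateSetterTest(unittest.TestCase):
    def test_load_config(self):
        CONFIG_CACHE["timeout"] = 30  # side effect the brittle test silently relies on
        self.assertEqual(CONFIG_CACHE["timeout"], 30)


class BrittleTest(unittest.TestCase):
    def test_doubled_timeout(self):
        # Implicitly assumes test_load_config already ran; raises KeyError otherwise.
        self.assertEqual(CONFIG_CACHE["timeout"] * 2, 60)


def run_in_order(*test_classes):
    """Run the given TestCase classes in exactly this order and return the result."""
    CONFIG_CACHE.clear()  # mimic a fresh test process for each run
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(loader.loadTestsFromTestCase(c) for c in test_classes)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)


if __name__ == "__main__":
    print(run_in_order(BrittleTest).wasSuccessful())                   # False
    print(run_in_order(StateSetterTest, BrittleTest).wasSuccessful())  # True
```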

Common Sources of Shared State

The shared state behind OD tests comes from several sources:

Static/global variables — class-level variables in Java or Python that persist across test method boundaries.

Database state — tests that insert or delete records without rolling back transactions.

File system — tests that create, modify, or delete files or directories.

Environment variables — tests that set environment variables and do not unset them.

Singletons and caches — application-level singletons or caches initialised once and shared across tests.

JVM or runtime state — class loading, security managers, or locale settings changed by one test.
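In Python's unittest, two of these sources are easy to observe directly. unittest creates a new TestCase instance for every test method, so instance attributes start fresh. Class attributes and os.environ, however, live for the whole process. The class names and the APP_MODE variable below are hypothetical.

```python
import io
import os
import unittest


class ClassStateTest(unittest.TestCase):
    calls = []  # class-level attribute: one list shared by every test method

    def setUp(self):
        self.local_calls = []  # instance attribute: fresh for every test method

    def test_first(self):
        self.calls.append("first")
        self.local_calls.append("first")

    def test_second(self):
        self.calls.append("second")
        self.local_calls.append("second")
        # The instance list holds only this method's entry.
        self.assertEqual(self.local_calls, ["second"])


class EnvTest(unittest.TestCase):
    def test_enable_debug(self):
        os.environ["APP_MODE"] = "debug"  # hypothetical variable, never unset


def run_suite():
    """Run both classes once from a clean start and return the state left behind."""
    ClassStateTest.calls = []  # mimic a fresh test process
    os.environ.pop("APP_MODE", None)
    loader = unittest.TestLoader()
    suite = unittest.TestSuite([loader.loadTestsFromTestCase(ClassStateTest),
                                loader.loadTestsFromTestCase(EnvTest)])
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result, list(ClassStateTest.calls), os.environ.get("APP_MODE")


if __name__ == "__main__":
    result, calls, mode = run_suite()
    print(result.wasSuccessful())  # True
    print(calls)  # ['first', 'second']: the class-level list outlived each method
    print(mode)   # debug: still set after the test finished
```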

Why They Are Hard to Detect

OD tests are particularly difficult to detect for several reasons.

Combinatorial search space — for a suite of N tests, there are N! possible orderings, a number that grows faster than any exponential. Exhaustively testing every ordering is infeasible for any suite of more than about a dozen tests (13! already exceeds six billion orderings).

Low individual probability — in a large suite run in a single standard order, a specific polluter-victim pairing may never occur, hiding the OD relationship entirely.

Fixed default order — many test runners, and the CI pipelines that invoke them, run tests in a fixed order by default. The OD relationship only becomes visible when that order changes, for example when randomisation is introduced.

Inter-test distance — the polluter does not need to immediately precede the victim. State pollution can persist across many intervening tests, making the causal relationship hard to trace.
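The polluter may sit anywhere earlier in the run. One naive way to trace it is to re-run each earlier test on its own, immediately before the victim. The sketch below models tests as plain functions over a shared dict, which is a hypothetical simplification. It also assumes that a single test is enough to cause the pollution. Pollution that requires several tests in combination would call for a search over subsets instead.

```python
from typing import Callable, Dict, List

# Hypothetical model: a test is a function over a shared state dict that
# raises AssertionError when it fails.
State = Dict[str, object]
TestFunc = Callable[[State], None]


def run_order(order: List[str], tests: Dict[str, TestFunc]) -> Dict[str, bool]:
    """Run tests in the given order on fresh state; map each name to pass/fail."""
    state: State = {}  # a fresh dict per run stands in for a fresh test process
    results: Dict[str, bool] = {}
    for name in order:
        try:
            tests[name](state)
            results[name] = True
        except AssertionError:
            results[name] = False
    return results


def find_polluters(failing_order: List[str], victim: str,
                   tests: Dict[str, TestFunc]) -> List[str]:
    """Naive scan: re-run each test that preceded the victim, alone, just before it."""
    if not run_order([victim], tests)[victim]:
        raise ValueError(f"{victim} fails in isolation, so it is not order-dependent")
    earlier = failing_order[: failing_order.index(victim)]
    # Assumes one polluter is enough; pollution needing several tests is missed.
    return [t for t in earlier if not run_order([t, victim], tests)[victim]]


def set_french_locale(state: State) -> None:
    state["locale"] = "fr_FR"  # polluter: never restored


def check_addition(state: State) -> None:
    assert 1 + 1 == 2  # unrelated test that runs in between


def check_upper(state: State) -> None:
    assert "x".upper() == "X"  # another unrelated test


def format_date(state: State) -> None:
    assert state.get("locale", "en_US") == "en_US"  # victim: assumes the default


TESTS: Dict[str, TestFunc] = {
    "set_french_locale": set_french_locale,
    "check_addition": check_addition,
    "check_upper": check_upper,
    "format_date": format_date,
}

if __name__ == "__main__":
    order = ["set_french_locale", "check_addition", "check_upper", "format_date"]
    print(run_order(order, TESTS)["format_date"])       # False: fails two tests later
    print(find_polluters(order, "format_date", TESTS))  # ['set_french_locale']
```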

Research I have conducted at the Institute of Information Technology (IIT), University of Dhaka, explored how to prioritise potential OD test pairs for re-execution. The aim is to reduce the number of runs needed to confirm OD relationships; those confirmation runs are the main bottleneck in OD test detection at scale.
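A short calculation shows why pair-based approaches are attractive. Suppose an OD relationship involves a single polluter and a single victim. It can then be confirmed by running just those two tests in order, so the number of candidates grows as N(N-1) rather than N!.

```python
from math import factorial


def orderings(n: int) -> int:
    """Distinct execution orders of an n-test suite: n!."""
    return factorial(n)


def ordered_pairs(n: int) -> int:
    """Candidate (polluter, victim) pairs: n choices of polluter times n - 1 victims."""
    return n * (n - 1)


if __name__ == "__main__":
    for n in (5, 10, 13, 20):
        print(f"{n:>2} tests: {orderings(n):,} orderings vs {ordered_pairs(n):,} pairs")
    # e.g. "13 tests: 6,227,020,800 orderings vs 156 pairs"
```

Even the quadratic count can be large for big suites. Ranking the pairs then decides which of them to run first.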

Detection Tools and Approaches

iDFlakies (University of Illinois) — runs tests in randomised orderings and classifies each flaky test it detects as order-dependent (OD) or non-order-dependent (NOD).

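DeFlaker (George Mason University, University of Illinois, and Carnegie Mellon University) — detects flaky tests without re-running them. It tracks coverage of the code changed in the latest commit and reports a newly failing test that did not execute any changed code as likely flaky.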

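Shaker — detects flaky tests by running them while adding noise to the environment (CPU and memory stress), which exposes concurrency- and timing-related flakiness. It targets non-order-dependent flaky tests rather than shared-state order dependence.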

Prioritisation-based detection — rather than shuffling at random, rank test pairs by how likely they are to have an OD relationship, based on shared-state analysis, and re-run the highest-ranked pairs first. Confirming OD relationships then takes fewer runs, which reduces CI time.
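One generic way to rank pairs is sketched below. It is an illustrative heuristic, not necessarily the approach used in the research mentioned above. The sketch assumes that some static or dynamic analysis has already produced, for each test, the set of shared-state locations it writes and reads. The test names and location strings are hypothetical. Each ordered pair (A, B) is scored by how many locations A writes and B reads.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple


def rank_candidate_pairs(writes: Dict[str, Set[str]],
                         reads: Dict[str, Set[str]]) -> List[Tuple[str, str, int]]:
    """Score ordered (polluter, victim) pairs by how many shared-state
    locations the first test writes and the second test reads."""
    tests = sorted(set(writes) | set(reads))
    scored = []
    for polluter, victim in permutations(tests, 2):  # excludes self-pairs
        overlap = len(writes.get(polluter, set()) & reads.get(victim, set()))
        if overlap:  # no overlap in the tracked state: skip the pair entirely
            scored.append((polluter, victim, overlap))
    # Highest overlap first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[2], pair[0], pair[1]))
    return scored


if __name__ == "__main__":
    # Hypothetical access sets, e.g. from static analysis or instrumentation.
    writes = {
        "test_register": {"UserRepo.users", "Cache.entries"},
        "test_login": {"Session.current"},
        "test_export": set(),
    }
    reads = {
        "test_register": {"UserRepo.users"},
        "test_login": {"UserRepo.users", "Cache.entries"},
        "test_export": {"UserRepo.users"},
    }
    for polluter, victim, score in rank_candidate_pairs(writes, reads):
        print(f"run {polluter} then {victim} (overlap {score})")
```

Only the top of this list would then be executed in polluter-then-victim order to confirm or rule out each candidate. The ranking can only be as good as the access sets it is given. State that the analysis misses never appears in the overlap.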

Fixing Order-Dependent Flaky Tests

The fix for OD tests is hermetic test design:

1. Setup and teardown — use setUp/@Before and tearDown/@After methods (@BeforeEach/@AfterEach in JUnit 5) to initialise and restore all state required by each test.

2. Database isolation — wrap each test in a transaction that is rolled back, or use an in-memory database that is recreated per test.

3. Static state reset — explicitly reset all static variables in teardown, or refactor to avoid static state entirely.

4. Dependency injection — inject dependencies rather than using global singletons, making it easier to provide fresh instances per test.

5. Test doubles — use mocks and stubs for external resources so that tests never modify the real shared resources.
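Several of these fixes fit into a single unittest class. The sketch below applies items 1, 2 and 3 to hypothetical state: a module-level list and an in-memory SQLite database created per test. It also uses unittest.mock.patch.dict so that os.environ is restored after each test. With this setup, both tests pass in either order.

```python
import io
import os
import sqlite3
import unittest
from unittest import mock

# Hypothetical module-level state that tests may mutate.
REGISTERED_USERS = []


class HermeticUserTest(unittest.TestCase):
    def setUp(self):
        # Setup/teardown plus static state reset: snapshot now, restore in tearDown.
        self._saved_users = list(REGISTERED_USERS)
        # Database isolation: a fresh in-memory database per test.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE users (name TEXT)")
        # Environment: patch.dict restores os.environ when the patcher stops.
        env_patcher = mock.patch.dict(os.environ, {"APP_MODE": "test"})
        env_patcher.start()
        self.addCleanup(env_patcher.stop)

    def tearDown(self):
        REGISTERED_USERS[:] = self._saved_users  # restore in place, keep identity
        self.db.close()  # the in-memory database disappears with its connection

    def count_users(self):
        return self.db.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    def test_register_user(self):
        REGISTERED_USERS.append("alice")
        self.db.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
        self.assertEqual(self.count_users(), 1)
        self.assertEqual(os.environ["APP_MODE"], "test")

    def test_starts_clean(self):
        self.assertEqual(REGISTERED_USERS, [])
        self.assertEqual(self.count_users(), 0)


def run_in_order(*method_names):
    """Run the named test methods in exactly this order and return the result."""
    suite = unittest.TestSuite(HermeticUserTest(name) for name in method_names)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)


if __name__ == "__main__":
    print(run_in_order("test_register_user", "test_starts_clean").wasSuccessful())  # True
    print(run_in_order("test_starts_clean", "test_register_user").wasSuccessful())  # True
```

Dependency injection and test doubles depend on the shape of the code under test, so they are left out of this sketch.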

Related research

I published research on this topic at the International Flaky Tests Workshop (FTW), co-located with IEEE/ACM ICSE 2025 — exploring how to prioritise potential order-dependent flaky tests to reduce CI re-run costs.


Frequently asked questions

What is an order-dependent flaky test?

An order-dependent flaky test is one that fails only when run after a specific other test (called the polluter). The polluter modifies shared state without restoring it, and the victim implicitly depends on the original state, so it fails in that particular ordering.

How do I detect order-dependent tests?

The standard approach is to run the test suite in multiple different orderings and flag tests that produce inconsistent results. Tools like iDFlakies automate this. Prioritisation-based approaches reduce the number of orderings that need to be tested.
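The multiple-orderings approach can be sketched in a few lines. This is a simplified illustration in the spirit of tools such as iDFlakies, not their implementation. Tests are modelled as hypothetical functions over a shared dict, and each round runs a seeded random permutation on fresh state. Any test that both passes and fails across rounds is flagged, along with one order in which it failed.

```python
import random
from typing import Callable, Dict, List

# Hypothetical model: a test is a function over a shared state dict that
# raises AssertionError when it fails.
State = Dict[str, object]
TestFunc = Callable[[State], None]


def run_order(order: List[str], tests: Dict[str, TestFunc]) -> Dict[str, bool]:
    """Run tests in the given order on fresh state; map each name to pass/fail."""
    state: State = {}  # fresh state per run stands in for a fresh test process
    results: Dict[str, bool] = {}
    for name in order:
        try:
            tests[name](state)
            results[name] = True
        except AssertionError:
            results[name] = False
    return results


def find_inconsistent(tests: Dict[str, TestFunc], rounds: int = 20,
                      seed: int = 0) -> Dict[str, List[str]]:
    """Run `rounds` shuffled orders and return {test: an order it failed in}
    for every test that both passed and failed across the rounds."""
    rng = random.Random(seed)  # seeded so a failing order can be replayed
    passed = set()
    failed_in: Dict[str, List[str]] = {}
    for _ in range(rounds):
        order = list(tests)
        rng.shuffle(order)
        for name, ok in run_order(order, tests).items():
            if ok:
                passed.add(name)
            else:
                failed_in.setdefault(name, order)  # keep the first failing order
    return {name: order for name, order in failed_in.items() if name in passed}


def polluter(state: State) -> None:
    state["flag"] = True  # never reset


def victim(state: State) -> None:
    assert "flag" not in state


def independent(state: State) -> None:
    assert sum([1, 2, 3]) == 6


if __name__ == "__main__":
    tests = {"polluter": polluter, "victim": victim, "independent": independent}
    for name, order in find_inconsistent(tests, rounds=50).items():
        print(f"{name} is inconsistent; it failed in order {order}")
```

A flagged test is not yet confirmed as order-dependent. Replaying the recorded order and checking that the failure reproduces is what separates it from flakiness with other causes.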

What is a polluter test?

A polluter is a test that modifies shared state — a database, a static variable, an environment variable — and does not restore it after the test completes. On its own, the polluter passes. It only causes problems when it runs before certain other tests that rely on the original state.

Are all flaky tests order-dependent?

No. Order-dependent (OD) tests are one category of flaky tests. Other categories include async/timing-related tests, tests with external dependencies, tests affected by network conditions, and tests that depend on platform or environment characteristics.