
What Are Flaky Tests?

A flaky test is an automated test that sometimes passes and sometimes fails when run against the same code, with no change to the application under test. Flaky tests are one of the most frustrating problems in software engineering: they waste developer time, erode trust in the test suite, and slow down CI/CD pipelines.

By Hasnain Iqbal · Updated 1 April 2025

Why Flaky Tests Happen

Flaky tests arise from non-determinism in the test environment. Common causes include:

Test order dependency — one test pollutes shared state, causing a later test to fail only when run in a specific sequence.

Asynchronous operations — tests that rely on timing, network calls, or delays that are not reliably controlled.

Shared global state — static variables, databases, or file system state shared across tests that is not properly reset between runs.

Resource contention — tests that compete for ports, files, or environment variables when run in parallel.

External dependencies — calls to third-party APIs, email services, or time-sensitive operations.

Of these, test order dependency is particularly difficult to detect because the failure only surfaces when the suite is run in a specific order — which may happen rarely in local development but frequently in CI.
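The polluter/victim pattern behind order dependency is easy to reproduce in a few lines. The following is a minimal, framework-free Python sketch; the module-level `_registry` list and the two test functions are hypothetical, and a real suite would normally run under a framework such as pytest or unittest rather than the hand-rolled `run_in_order` helper used here.

```python
# Hypothetical shared state: a module-level list every test can see.
_registry: list[str] = []

def polluter_test() -> None:
    # Adds an entry to shared state and never removes it.
    _registry.append("user-42")
    assert "user-42" in _registry

def victim_test() -> None:
    # Silently assumes nobody has touched the registry yet.
    assert _registry == []

def run_in_order(tests) -> dict[str, bool]:
    """Run the tests sequentially in one process; map each name to pass/fail."""
    _registry.clear()  # every suite run starts from the same clean state
    outcomes = {}
    for test in tests:
        try:
            test()
            outcomes[test.__name__] = True
        except AssertionError:
            outcomes[test.__name__] = False
    return outcomes

print(run_in_order([victim_test, polluter_test]))
# {'victim_test': True, 'polluter_test': True}
print(run_in_order([polluter_test, victim_test]))
# {'polluter_test': True, 'victim_test': False}
```

Neither test is wrong in isolation; the failure appears only in the second ordering, which is why running a single test locally rarely reproduces it.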

Why They Matter in CI/CD

In continuous integration pipelines, flaky tests cause two compounding problems.

First, they produce false positives: a developer's correct code change is flagged as broken because an unrelated flaky test failed. The developer investigates, finds nothing wrong, and re-runs the build. This is sometimes called a "restart tax."

Second, they produce false negatives: a real bug is masked because the test that should catch it is known to be flaky and is being ignored or automatically retried.

Studies have found that flaky tests can account for 14–16% of test failures in large CI/CD pipelines, and that developers spend significant time investigating failures that turn out to be non-deterministic rather than real regressions.
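A back-of-the-envelope calculation shows why even rarely flaky tests make a pipeline unreliable. The sketch below assumes, as a simplification, that each flaky test fails independently and with the same probability on every run; in practice flaky failures are often correlated, so treat the numbers as illustrative only.

```python
def p_spurious_build_failure(flake_rate: float, n_flaky: int) -> float:
    """Probability that at least one of n_flaky independent flaky tests
    fails in a single run when each fails with probability flake_rate."""
    if not 0.0 <= flake_rate <= 1.0:
        raise ValueError("flake_rate must be between 0 and 1")
    # P(at least one failure) = 1 - P(every flaky test passes)
    return 1.0 - (1.0 - flake_rate) ** n_flaky

# 50 tests that each fail only 1% of the time:
print(f"{p_spurious_build_failure(0.01, 50):.3f}")  # 0.395
```

Under these assumptions, roughly two in five builds of an otherwise correct change would go red, and each of those costs a re-run.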

Detection Strategies

Several approaches exist for detecting flaky tests:

Repeated execution — run the test suite multiple times and flag tests that produce inconsistent results. This is reliable but expensive in CI time.
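Repeated execution can be sketched in plain Python as follows. The `detect_flaky_by_rerun` helper and the two stand-in tests are hypothetical, and the "flaky" test simulates non-determinism with a counter so the example is reproducible. Real tools typically rerun tests in fresh processes or environments, which this sketch does not attempt.

```python
import itertools
from typing import Callable

def detect_flaky_by_rerun(tests: dict[str, Callable[[], None]], runs: int = 20) -> set[str]:
    """Run each test `runs` times against the same code; flag mixed outcomes."""
    flaky = set()
    for name, test in tests.items():
        outcomes = set()
        for _ in range(runs):
            try:
                test()
                outcomes.add("pass")
            except Exception:  # assertion failures and errors both count as a fail
                outcomes.add("fail")
        if len(outcomes) > 1:  # passed at least once AND failed at least once
            flaky.add(name)
    return flaky

_calls = itertools.count(1)

def always_passes() -> None:
    assert sum([1, 2, 3]) == 6

def sometimes_fails() -> None:
    # Stand-in for non-determinism: fails on every third invocation.
    assert next(_calls) % 3 != 0

print(detect_flaky_by_rerun({"stable": always_passes, "flaky": sometimes_fails}))
# {'flaky'}
```

A test that fails on every run is not flagged: it is simply failing, not flaky. The cost grows with `runs` times the suite duration, which is why this approach is expensive in CI.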

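Test reordering — run the suite in different orders to surface order-dependent flaky tests. iDFlakies uses this approach. DeFlaker takes a different route: it avoids reruns by flagging a newly failing test as likely flaky when that test did not execute any of the recently changed code.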

Static analysis — examine test code for known anti-patterns (e.g., use of sleep, reliance on current time, missing teardown).
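As a rough illustration of the static approach, Python's standard `ast` module can flag calls that commonly signal timing or clock dependence. The `SUSPICIOUS_CALLS` table and the `find_smells` function are hypothetical and deliberately simple: they match call names only, so they miss aliased imports and cannot detect missing teardown.

```python
import ast
from typing import Optional

# Hypothetical table of call names treated as flakiness smells.
SUSPICIOUS_CALLS = {
    "time.sleep": "fixed sleep; prefer an explicit wait",
    "time.time": "reads the wall clock",
    "datetime.now": "depends on the current time",
    "random.random": "unseeded randomness",
}

def dotted_name(node: ast.AST) -> Optional[str]:
    """Convert a call target such as `a.b.c` into the string 'a.b.c'."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None  # e.g. calls on subscripts or on the result of another call

def find_smells(source: str) -> list[tuple[int, str, str]]:
    """Return sorted (line, call name, reason) tuples for suspicious calls."""
    smells = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name is None:
            continue
        for pattern, reason in SUSPICIOUS_CALLS.items():
            # Match 'time.sleep' as well as qualified forms like 'datetime.datetime.now'.
            if name == pattern or name.endswith("." + pattern):
                smells.append((node.lineno, name, reason))
    return sorted(smells)

example = '''
import time, datetime

def test_cache_expiry():
    cache.put("k", "v", ttl=1)
    time.sleep(1.5)
    assert cache.get("k") is None
    assert datetime.datetime.now().year >= 2025
'''

for line, call, reason in find_smells(example):
    print(line, call, "-", reason)
# 6 time.sleep - fixed sleep; prefer an explicit wait
# 8 datetime.datetime.now - depends on the current time
```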

Machine learning — train classifiers on historical test execution data to predict which tests are likely to be flaky before they manifest.

Prioritization — rather than running all tests in every possible order, identify and prioritize the tests most likely to be order-dependent, reducing the search space significantly. This is the approach explored in my research at IIT, University of Dhaka.
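The prioritization technique from the paper is not described on this page, so the following is only a generic illustration of the idea, not the paper's method. Assuming each test's reads and writes of shared resources have already been recorded (for example through instrumentation), a hypothetical `prioritize_pairs` function can rank candidate (polluter, victim) pairs so the most suspicious orderings are tried first instead of every permutation.

```python
from dataclasses import dataclass, field

@dataclass
class Footprint:
    """Hypothetical record of the shared resources one test touches."""
    name: str
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)

def prioritize_pairs(tests: list[Footprint]) -> list[tuple[str, str, int]]:
    """Rank candidate (polluter, victim) pairs by shared-state overlap."""
    pairs = []
    for polluter in tests:
        for victim in tests:
            if polluter is victim:
                continue
            # In this model, a polluter can only break a victim through
            # state the polluter writes and the victim reads.
            overlap = len(polluter.writes & victim.reads)
            if overlap:
                pairs.append((polluter.name, victim.name, overlap))
    # Largest overlap first; names break ties so the order is deterministic.
    return sorted(pairs, key=lambda p: (-p[2], p[0], p[1]))

tests = [
    Footprint("test_login", reads={"db.users"}, writes={"session"}),
    Footprint("test_logout", reads={"session"}, writes={"session"}),
    Footprint("test_signup", reads={"db.users"}, writes={"db.users", "mail.outbox"}),
    Footprint("test_report", reads={"config"}),
]

for polluter, victim, score in prioritize_pairs(tests):
    print(f"run {polluter} then {victim} (overlap {score})")
# run test_login then test_logout (overlap 1)
# run test_signup then test_login (overlap 1)
```

With n tests there are n! full orderings but only n(n-1) ordered pairs, and an overlap filter like this one discards the pairs that cannot interact through the recorded state.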

How to Fix Flaky Tests

Fixing flaky tests depends on the root cause.

For order-dependent tests, the fix is to make each test hermetic — it sets up its own state in a setup method and tears it down afterward. Using test doubles (mocks and stubs) for shared resources helps isolate tests from each other.
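A minimal unittest sketch of a hermetic test follows. `ShoppingCart` is a hypothetical class under test; the point is that `setUp` builds fresh state for every test, so the two tests pass in either order. The explicit suites at the end exist only to demonstrate both orders, because unittest's default loader runs test methods in alphabetical order.

```python
import io
import unittest

class ShoppingCart:
    """Hypothetical class under test."""
    def __init__(self) -> None:
        self.items: list[str] = []

    def add(self, item: str) -> None:
        self.items.append(item)

class CartTests(unittest.TestCase):
    def setUp(self) -> None:
        # Fresh object per test instead of a shared module-level cart.
        self.cart = ShoppingCart()

    def tearDown(self) -> None:
        # Release per-test state; real suites would close files, connections, etc. here.
        self.cart = None

    def test_add_item(self) -> None:
        self.cart.add("apple")
        self.assertEqual(self.cart.items, ["apple"])

    def test_starts_empty(self) -> None:
        self.assertEqual(self.cart.items, [])

def passes_in_order(*method_names: str) -> bool:
    """Run the named CartTests methods in exactly this order."""
    suite = unittest.TestSuite(CartTests(name) for name in method_names)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.wasSuccessful()

print(passes_in_order("test_add_item", "test_starts_empty"))  # True
print(passes_in_order("test_starts_empty", "test_add_item"))  # True
```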

For async and timing issues, use explicit waits or event-based synchronization rather than fixed sleep calls. Most testing frameworks offer utilities for this.
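The difference between a fixed sleep and explicit synchronization can be sketched with Python's standard `threading` module. `wait_until` is a hypothetical polling helper (many frameworks ship their own equivalent), and the event-based check is preferable when the code under test can signal completion directly. The background workers are stand-ins for real asynchronous work.

```python
import threading
import time
from typing import Callable

def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.01) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)  # short poll, not a guess at how long the work takes
    return condition()  # final check at the deadline

def check_with_polling() -> None:
    state = {"ready": False}

    def worker() -> None:
        time.sleep(0.05)  # simulated background work of unknown duration
        state["ready"] = True

    threading.Thread(target=worker).start()
    # Returns as soon as the flag flips; fails only after the generous timeout.
    assert wait_until(lambda: state["ready"], timeout=2.0), "worker never finished"

def check_with_event() -> None:
    done = threading.Event()
    results: list[str] = []

    def worker() -> None:
        results.append("processed")
        done.set()  # signal completion explicitly

    threading.Thread(target=worker).start()
    assert done.wait(timeout=2.0), "worker never signalled"
    assert results == ["processed"]

check_with_polling()
check_with_event()
```

Unlike a fixed `time.sleep(1)`, both checks finish as soon as the work is done and fail only if it exceeds the timeout, so the timeout can be generous without slowing down passing runs.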

For external dependencies, mock or stub the external service in tests. Use contract testing (e.g., Pact) to verify the integration separately.
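Stubbing an external service needs nothing beyond the standard library's `unittest.mock`. `fetch_exchange_rate` and its client interface are hypothetical; the test injects a `Mock` in place of the real HTTP client, so it never touches the network and cannot fail because the third-party service is slow or down.

```python
from unittest.mock import Mock

def fetch_exchange_rate(client, currency: str) -> float:
    """Hypothetical production code that asks a third-party API for a rate."""
    response = client.get(f"/rates/{currency}")
    return float(response["rate"])

def check_fetch_exchange_rate() -> None:
    # The Mock replaces the real client: no network, no rate limits, no outages.
    fake_client = Mock()
    fake_client.get.return_value = {"rate": "1.08"}

    assert fetch_exchange_rate(fake_client, "EUR") == 1.08
    # Also verify the code asked for the right resource.
    fake_client.get.assert_called_once_with("/rates/EUR")

check_fetch_exchange_rate()
```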

For shared state, enforce isolation at the test framework level — reset databases, clear caches, and reinitialise singletons between tests.
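Isolation can be enforced in the test class itself. In this unittest sketch, `Config` is a hypothetical singleton that caches a value read from an `APP_REGION` environment variable (also hypothetical). `mock.patch.dict` restores `os.environ` after each test, and `addCleanup` resets the singleton so no cached instance leaks into the next test.

```python
import io
import os
import unittest
from unittest import mock

class Config:
    """Hypothetical singleton that caches settings read from the environment."""
    _instance = None

    def __init__(self) -> None:
        self.region = os.environ.get("APP_REGION", "us-east")

    @classmethod
    def get(cls) -> "Config":
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    @classmethod
    def reset(cls) -> None:
        cls._instance = None

class ConfigTests(unittest.TestCase):
    def setUp(self) -> None:
        # Set the variable for this test only; stop() restores the original environment.
        patcher = mock.patch.dict(os.environ, {"APP_REGION": "eu-west"})
        patcher.start()
        self.addCleanup(patcher.stop)
        # Discard any instance cached by an earlier test, and again afterwards.
        Config.reset()
        self.addCleanup(Config.reset)

    def test_reads_region_from_environment(self) -> None:
        self.assertEqual(Config.get().region, "eu-west")

region_before = os.environ.get("APP_REGION")
result = unittest.TextTestRunner(stream=io.StringIO()).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(ConfigTests))
print(result.wasSuccessful())                          # True
print(Config._instance is None)                        # True: no cached instance leaks out
print(os.environ.get("APP_REGION") == region_before)  # True: environment restored
```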

When an immediate fix is not feasible, quarantine the flaky test: move it to a separate suite that still runs but does not block the main build. Track quarantined tests and set a deadline for fixing or deleting them.
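Quarantine tracking can start as a small data structure checked into the repository. The `QuarantineEntry` record, the test IDs, and the `partition` helper below are hypothetical; the sketch splits a test list into a blocking set and a non-blocking quarantined set, and reports entries whose fix-by deadline has passed.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class QuarantineEntry:
    test_id: str
    reason: str
    fix_by: date  # deadline to fix or delete the test

def partition(all_tests: list[str], quarantine: list[QuarantineEntry],
              today: date) -> tuple[list[str], list[str], list[str]]:
    """Split tests into (blocking, quarantined, overdue quarantine IDs)."""
    quarantined_ids = {entry.test_id for entry in quarantine}
    blocking = [t for t in all_tests if t not in quarantined_ids]
    quarantined = [t for t in all_tests if t in quarantined_ids]
    overdue = sorted(e.test_id for e in quarantine if e.fix_by < today)
    return blocking, quarantined, overdue

quarantine = [
    QuarantineEntry("upload::test_retry", "times out under load", date(2025, 5, 1)),
    QuarantineEntry("report::test_pdf", "order-dependent", date(2025, 3, 1)),
]
tests = ["login::test_ok", "upload::test_retry", "report::test_pdf"]

blocking, quarantined, overdue = partition(tests, quarantine, today=date(2025, 4, 1))
print(blocking)     # ['login::test_ok']
print(quarantined)  # ['upload::test_retry', 'report::test_pdf']
print(overdue)      # ['report::test_pdf']
```

In this setup the gating CI job would run only `blocking`, a separate non-gating job would run `quarantined`, and a non-empty `overdue` list is the prompt to fix or delete those tests.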

Related research

I published research on this topic at the International Flaky Tests Workshop (FTW), co-located with IEEE/ACM ICSE 2025 — exploring how to prioritise potential order-dependent flaky tests to reduce CI re-run costs.


Frequently asked questions

What is a flaky test?

A flaky test is an automated test that produces inconsistent results — sometimes passing, sometimes failing — on the same codebase without any code change. The failure is caused by non-determinism in the test environment, not a bug in the application.

How do I find flaky tests in my test suite?

The most reliable method is repeated execution: run your test suite multiple times and flag tests with inconsistent results. For order-dependent flaky tests specifically, also vary the test order across runs; iDFlakies automates this. DeFlaker takes a complementary approach that avoids reruns, flagging newly failing tests that did not execute any recently changed code.

What is the difference between a flaky test and a failing test?

A failing test fails consistently, which usually indicates a genuine bug in the code (or in the test itself). A flaky test fails intermittently without any code change, passing on some runs and failing on others. This intermittency is the defining characteristic.

Should I delete flaky tests?

Only if the test provides no value after repeated failures to fix it. A better approach is to quarantine: move the test to a separate suite that runs without blocking the build, track it, and set a deadline to fix or remove it.

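What are order-dependent flaky tests?

Order-dependent flaky tests (OD tests) are a subtype of flaky tests whose outcome depends on the order in which the tests run. Typically, a 'victim' test fails only when run after a specific other test, the 'polluter', which leaves behind shared state that breaks the victim. A less common kind, sometimes called a 'brittle' test, instead fails when run without a 'state-setter' test that normally runs before it. OD tests are particularly difficult to detect because the failure only surfaces in specific test orderings.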