How it Works

This page provides an overview of the concepts Unflakable uses to protect continuous integration builds from flaky tests. It covers the following topics:

Test flakiness
Test statuses
Test naming
Branch stability

Test flakiness

We define a test as flaky if it both passes and fails when executed multiple times on identical code. There are a variety of potential causes of test flakiness, including:

Bugs or sources of non-determinism (e.g., the current system time) in a test itself.
Bugs or sources of non-determinism in the code that is being tested (often referred to as the system under test).
Infrastructure problems such as network timeouts or memory exhaustion that occur while a test is running.

Test flakiness can cause significant problems for software development teams. Teams often have policies that require all tests to pass before code changes can be merged into a shared code repository. When tests fail due to flakiness, developers often have no recourse but to retry the entire continuous integration (CI) workflow, which can both significantly impact developer productivity and increase compute costs associated with running CI builds. If flakiness occurs frequently, developers may become accustomed to ignoring test failures, leading to missed bugs and reduced code quality.

Unflakable identifies test flakiness by automatically retrying failed tests. If a test initially fails but then passes during one of the retries, the test outcome is a flake: a single instance of test flakiness.

By default, the Unflakable test framework plugins retry failed tests twice (a total of three attempts), but the plugins can be configured to retry failures any number of times, or to disable retries altogether. Increasing the number of retries improves the odds of detecting flakiness, at the expense of longer CI runs. Note that if retries are disabled, Unflakable will not identify flakiness automatically. Instead, flaky tests will need to be identified and quarantined manually.
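The retry behavior described above can be sketched roughly as follows. This is an illustration only, not the plugins' actual implementation: the function name run_with_retries, the max_retries parameter, and the outcome labels "pass", "flake", and "fail" are all hypothetical.

```python
from typing import Callable


def run_with_retries(test: Callable[[], None], max_retries: int = 2) -> str:
    """Run a test, retrying on failure, and classify the overall outcome.

    Hypothetical sketch: returns "pass" if the first attempt succeeds,
    "flake" if an attempt succeeds after at least one failure, and
    "fail" if every attempt fails.
    """
    failures = 0
    # One initial attempt plus up to max_retries retries.
    for _ in range(1 + max_retries):
        try:
            test()
        except Exception:
            failures += 1
            continue
        return "pass" if failures == 0 else "flake"
    return "fail"
```

With max_retries=0 there is only a single attempt, so this sketch can never report a flake, which mirrors the note above that disabling retries also disables automatic flakiness detection.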

While Unflakable can identify many instances of test flakiness, it is not able to identify the source of test flakiness (e.g., code bugs vs. transient infrastructure issues). For this reason, we strongly recommend investigating flaky tests soon after Unflakable identifies or quarantines them. Leaving important tests quarantined for extended periods of time may cause your team's CI process to miss bugs that would otherwise have been identified by test failures.

Be sure to enable integrations to receive notifications whenever flaky tests are identified.

Test statuses

Unflakable associates every test with one of four test statuses:

Passing: the test is currently passing.
Failing: the test is currently failing.
Flaky: the test has exhibited flakiness in the past. Unlike tests with quarantined status, a flaky test that fails will cause its test suite to fail. Once a test has this status, it will not transition back to a passing or failing status without manual intervention. It may transition to quarantined if auto-quarantine is enabled and it exhibits further flakiness.
Quarantined: the test is known to be flaky or otherwise unreliable. By default, tests that fail or exhibit flakiness while quarantined will not cause a test suite to fail. However, quarantined tests may still execute and report results, depending on how the test framework plugin is configured.

A test acquires quarantined status by being manually quarantined, or by exhibiting flakiness when auto-quarantine is enabled. Once a test has this status, it will not change status without manual intervention.
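As a rough illustration of how quarantine affects a test suite result, the sketch below treats a suite as failing only when a test that is not quarantined fails. The ReportedResult type and suite_passes function are hypothetical names, and the sketch considers only final pass/fail results under the default behavior described above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReportedResult:
    """Hypothetical record of one test's final result in a suite run."""
    name: str
    passed: bool
    quarantined: bool


def suite_passes(results: Iterable[ReportedResult]) -> bool:
    """A suite passes if every failing test is quarantined."""
    return all(r.passed or r.quarantined for r in results)
```

For example, a run containing one failing quarantined test and otherwise passing tests would still pass under this sketch, while a single failing test that is not quarantined would fail the run.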

The status of a test may change either automatically in response to test outcomes (see below) or when a user manually updates a test status. Note that automatic test status changes only occur when tests run on stable code branches. This behavior avoids drawing conclusions from test failures that occur on unstable code such as during a pull request or code review workflow.

Auto-quarantine

The auto-quarantine feature determines whether a test transitions to flaky status or quarantined status after exhibiting flakiness. Auto-quarantine works best on test suites that experience limited flakiness due to infrastructure problems (see above). If your team experiences frequent infrastructure timeouts across a test suite, consider disabling auto-quarantine to prevent otherwise reliable tests from being quarantined due to infrastructure problems.
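The automatic status rules stated in this section can be approximated with a small function. This is a sketch under assumptions, not Unflakable's actual logic: the status labels ("passing", "failing", "flaky", "quarantined"), the outcome labels ("pass", "fail", "flake"), and the function name next_status are hypothetical, and the rule that a passing or failing test simply follows its latest pass/fail outcome is an assumption rather than something this page states.

```python
def next_status(status: str, outcome: str, auto_quarantine: bool,
                stable_branch: bool) -> str:
    """Return a test's status after one outcome (hypothetical sketch)."""
    # Automatic changes only occur for results from stable branches, and a
    # quarantined test never changes status without manual intervention.
    if not stable_branch or status == "quarantined":
        return status
    # Exhibiting flakiness leads to quarantined or flaky status, depending
    # on whether auto-quarantine is enabled.
    if outcome == "flake":
        return "quarantined" if auto_quarantine else "flaky"
    # A flaky test does not return to passing or failing automatically.
    if status == "flaky":
        return status
    # Assumption: passing and failing tests follow their latest outcome.
    return "passing" if outcome == "pass" else "failing"
```

Manual status changes are deliberately left out of this sketch, since they are driven by users rather than by test outcomes.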

Auto-quarantine enabled

The following state diagram illustrates how tests change status after each test outcome when auto-quarantine is enabled. Note that when a new test runs for the first time, it behaves as if its previous status was :

Auto-quarantine disabled

By contrast, the following state diagram illustrates how tests change status when auto-quarantine is disabled. Without auto-quarantine, there are no automated state transitions that result in a test becoming quarantined:

Manual status changes

For completeness, the following state diagram illustrates the manual status changes that are supported:

Test naming

Unflakable uniquely identifies tests through a combination of:

The relative path within the code repository to the file containing the test (e.g., frontend/src/notifications/notifications.test.ts).
The name of the test (e.g., response_async_test), which is usually the name of the function implementing the test.

All test results reported for a given (file path, test name) pair will map to the same test within Unflakable. If a file or test is renamed, or a test is moved to another file, it will be treated as a new test.

Unflakable limits file paths to 4096 bytes (using UTF-8 to encode any Unicode characters).

Certain test frameworks (e.g., Jest) use hierarchical names for organizing tests (e.g., using describe() blocks or class names). Unflakable uses the complete test name, including any hierarchical ancestors, to identify tests.

Unflakable limits test names to a maximum of eight components, including seven levels of hierarchical nesting. Additionally, each name component is limited to a maximum of 4096 bytes (using UTF-8 to encode any Unicode characters). Tests with more than eight components or with any component longer than 4096 bytes will be ignored.
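The naming rules above can be summarized in a short sketch. The function names identity_key and name_is_accepted are hypothetical, as is the representation of a hierarchical test name as a sequence of components. The limits themselves (eight components, 4096 UTF-8 bytes) come from the text.

```python
from typing import Sequence, Tuple

MAX_NAME_COMPONENTS = 8
MAX_BYTES = 4096


def identity_key(file_path: str, name: Sequence[str]) -> Tuple[str, Tuple[str, ...]]:
    """Results with the same (file path, full test name) map to one test."""
    return (file_path, tuple(name))


def name_is_accepted(name: Sequence[str]) -> bool:
    """Check the component-count and per-component byte limits."""
    if len(name) > MAX_NAME_COMPONENTS:
        return False
    # Lengths are measured in bytes after UTF-8 encoding, not in characters.
    return all(len(part.encode("utf-8")) <= MAX_BYTES for part in name)
```

Because the limit is in bytes, a component made of 2,048 two-byte characters such as "é" fits exactly, while one more such character exceeds the limit. Renaming a file or any name component produces a different key, which is why a renamed or moved test is treated as a new test.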

Branch stability

Software development teams often run tests in two phases of development:

On unstable code that has not been merged into a shared code repository (e.g., during a code review or pull request workflow).
On stable code that typically represents the latest version of a shared code repository (often called main, master, or trunk).

If a test is quarantined, failures will be ignored on both stable and unstable code. This helps ensure that developers are not prevented from merging code as a result of test flakiness.

Because unstable code often exhibits behavior such as test failures that are not reflective of the overall state of a test, Unflakable only updates test statuses when it receives test results corresponding to stable code. For similar reasons, it does not record any new tests until they run on a stable branch.

Unflakable distinguishes between stable and unstable code by comparing the version control (e.g., Git) branch for which test results are reported to a regular expression configured as part of each test suite.
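A minimal sketch of this branch check is shown below. It assumes Python regular expression syntax and that the pattern must match the entire branch name; this page does not specify the regular expression dialect or matching semantics Unflakable uses, and the function name is_stable_branch and the example pattern are hypothetical.

```python
import re


def is_stable_branch(branch: str, stable_pattern: str) -> bool:
    """Return True if the branch name fully matches the configured pattern.

    Assumption: whole-string matching with Python regex syntax.
    """
    return re.fullmatch(stable_pattern, branch) is not None
```

With a pattern such as main|master|trunk, the branch main would be treated as stable under this sketch, while a pull request branch such as feature/main-fix would not.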

Refer to the test framework plugin documentation to ensure that the correct branch name is captured from your team's version control system. For convenience, Unflakable will attempt to auto-detect branch names for tests within Git repositories.
