This page provides an overview of the concepts Unflakable uses to protect continuous integration
builds from flaky tests.
We define a test as flaky if it both passes and fails when executed multiple times on
identical code. Test flakiness has many potential causes, including:
- Bugs or sources of non-determinism (e.g., the current system time) in the test itself.
- Bugs or sources of non-determinism in the code being tested (often referred to as the system under test).
- Infrastructure problems, such as network timeouts or memory exhaustion, that occur while a test is running.
Test flakiness can cause significant problems for software development teams. Teams often have
policies that require all tests to pass before code changes can be merged into a shared code
repository. When tests fail due to flakiness, developers often have no recourse but to retry the
entire continuous integration (CI) workflow, which both reduces developer productivity and
increases the compute costs of running CI builds. If flakiness occurs frequently, developers
may become accustomed to ignoring test failures, leading to missed bugs and reduced code
quality.
Unflakable identifies test flakiness by automatically retrying failed tests. If a test initially
fails but then passes during one of the retries, the test outcome is a Flake: a single
instance of test flakiness.
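The retry logic above can be illustrated with a short sketch. It is a simplified model, not the plugins' actual implementation: `run_with_retries` and `Outcome` are hypothetical names, and each attempt is represented by a callable that returns True when the test passes.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    FLAKE = "flake"


def run_with_retries(attempt: Callable[[], bool], retries: int = 2) -> Outcome:
    """Classify one test run by retrying failures (hypothetical sketch).

    `attempt` runs the test once and returns True if it passed. The default
    of two retries (three attempts in total) mirrors the plugins' default.
    """
    if attempt():
        # Passed on the first attempt, so no retries are needed.
        return Outcome.PASS
    for _ in range(retries):
        if attempt():
            # Failed at first but passed on a retry: a single Flake.
            return Outcome.FLAKE
    # Failed every attempt. With retries=0, every initial failure ends up
    # here, so flakiness cannot be detected automatically.
    return Outcome.FAIL


# A test that fails once and then passes is classified as a Flake.
attempts = iter([False, True])
print(run_with_retries(lambda: next(attempts)))  # Outcome.FLAKE
```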
By default, the Unflakable test framework plugins retry failed tests twice (a total
of three attempts), but the plugins can be configured to retry failures
any number of times, or to disable retries altogether. Increasing the number of retries improves
the odds of detecting flakiness, at the expense of longer CI runs.
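To put rough numbers on this trade-off, assume (as a simplification that real flakiness often violates) that every attempt of a flaky test fails independently with the same probability p. Once the first attempt has failed, the flake is detected only if at least one of the r retries passes, which happens with probability 1 - p^r. The helper below is a hypothetical illustration, not part of Unflakable.

```python
def flake_detection_probability(p_fail: float, retries: int) -> float:
    """Chance that a flaky test which failed its first attempt passes on
    at least one of `retries` retries.

    Assumes (hypothetically) that attempts fail independently, each with
    probability `p_fail`.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if retries < 0:
        raise ValueError("retries must be non-negative")
    # The flake goes undetected only if every retry also fails: p_fail ** retries.
    return 1.0 - p_fail ** retries


for retries in (0, 1, 2, 4):
    print(retries, flake_detection_probability(0.5, retries))
```

For a test that fails half the time, this model gives 0.0 with retries disabled, 0.75 with the default two retries, and 0.9375 with four retries. Correlated failures, such as a sustained network outage, would make each extra retry less valuable than the model suggests.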
Note that if retries are disabled,
Unflakable will not identify flakiness automatically. Instead, flaky tests will need to be
identified and quarantined manually.
While Unflakable can identify many instances of test flakiness, it is not able to identify the
source of test flakiness (e.g., code bugs vs. transient infrastructure issues). For this reason,
we strongly recommend investigating flaky tests soon after Unflakable identifies or quarantines
them. Leaving important tests quarantined for extended periods of time may cause your team's CI
process to miss bugs that would otherwise have been identified by test failures.
Be sure to
enable integrations to receive notifications whenever flaky tests are identified.
Unflakable associates every test with one of four test statuses:
- Passing: the test is currently passing.
- Failing: the test is currently failing.
- Flaky: the test has exhibited flakiness in the past. Unlike tests with Quarantined status, a Flaky test that fails will cause its test suite to fail. Once a test has this status, it will not transition back to Passing or Failing without manual intervention. It may transition to Quarantined if auto-quarantine is enabled and it exhibits further flakiness.
- Quarantined: the test is known to be flaky or otherwise unreliable. By default, tests that fail or exhibit flakiness while quarantined will not cause a test suite to fail. However, quarantined tests may still execute and report results, depending on how the test framework plugin is configured.
The status of a test may change either automatically in response to test outcomes (see
below) or when a user
manually updates a test status. Note that automatic
test status changes only occur when tests run on stable code branches.
This behavior avoids drawing conclusions from test failures on unstable code, such as code
under review in a pull request.
The auto-quarantine feature determines whether a test transitions to Flaky status or
Quarantined status after exhibiting flakiness. Auto-quarantine works best
on test suites that experience limited flakiness due to infrastructure problems (see
above). If your team experiences frequent infrastructure timeouts across a test
suite, consider disabling auto-quarantine to prevent otherwise reliable tests from being quarantined
due to infrastructure problems.
The following state diagram illustrates how tests change status after each test outcome when
auto-quarantine is enabled. Note that when a new test runs for the first time, it behaves as
if its previous status was Passing:
[Figure: State diagram with automatic state transitions (auto-quarantine enabled)]
By contrast, the following state diagram illustrates how tests change status when
auto-quarantine is disabled. Without auto-quarantine, there are no automated state transitions
that result in a test becoming Quarantined:
[Figure: State diagram with automatic state transitions (auto-quarantine disabled)]
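As a rough, text-only approximation of these diagrams, the automatic transitions can be modeled as a function of the previous status, the latest outcome, and the auto-quarantine setting. This is a hypothetical sketch (`next_status`, `Status`, and `Outcome` are illustrative names). It encodes the rules stated on this page and additionally assumes that Passing and Failing tests follow their latest pass/fail outcome and that Quarantined tests change status only through manual updates. It applies only to results reported from stable branches.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    PASSING = "Passing"
    FAILING = "Failing"
    FLAKY = "Flaky"
    QUARANTINED = "Quarantined"


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    FLAKE = "flake"


def next_status(
    previous: Optional[Status], outcome: Outcome, auto_quarantine: bool
) -> Status:
    """Hypothetical model of one automatic status transition.

    `previous` is None for a test that is running for the first time.
    """
    # A new test behaves as if its previous status was Passing.
    current = previous if previous is not None else Status.PASSING

    # Assumption: only manual updates move a test out of Quarantined.
    if current is Status.QUARANTINED:
        return current

    if outcome is Outcome.FLAKE:
        # Auto-quarantine decides between Flaky and Quarantined.
        return Status.QUARANTINED if auto_quarantine else Status.FLAKY

    # Flaky tests never return to Passing or Failing automatically.
    if current is Status.FLAKY:
        return current

    # Assumption: otherwise the status follows the latest pass/fail outcome.
    return Status.PASSING if outcome is Outcome.PASS else Status.FAILING


# With auto-quarantine disabled, prints Passing, Failing, Flaky, Flaky.
status = None
for outcome in (Outcome.PASS, Outcome.FAIL, Outcome.FLAKE, Outcome.PASS):
    status = next_status(status, outcome, auto_quarantine=False)
    print(outcome.value, "->", status.value)
```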
Unflakable uniquely identifies tests through a combination of:
- The relative path within the code repository to the file containing the test (e.g., frontend/src/notifications/notifications.test.ts).
- The name of the test (e.g., response_async_test), which is usually the name of the function implementing the test.
All test results reported for a given (file path, test name)
pair will map to the same test within Unflakable. If a file or test is renamed, or a test is
moved to another file, it will be treated as a new test.
Info
Unflakable limits file paths to 4096 bytes
(using UTF-8 to encode
any Unicode characters).
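A minimal sketch of this identity scheme follows. `TestKey` is a hypothetical type, not part of any Unflakable API: because it is an immutable (file path, test name) pair, results with equal keys collapse into a single test, while renaming a test yields a new key. The path check mirrors the 4096-byte limit, measured after UTF-8 encoding. The renamed test in the example is made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_PATH_BYTES = 4096  # File path limit, measured in UTF-8 bytes.


@dataclass(frozen=True)
class TestKey:
    """Hypothetical test identity: (relative file path, test name)."""

    file_path: str
    test_name: str

    def path_within_limit(self) -> bool:
        # Non-ASCII characters occupy more than one byte in UTF-8.
        return len(self.file_path.encode("utf-8")) <= MAX_PATH_BYTES


PATH = "frontend/src/notifications/notifications.test.ts"
reports = [
    (PATH, "response_async_test", "pass"),
    (PATH, "response_async_test", "fail"),
    # Hypothetical rename of the same test: treated as a new test.
    (PATH, "response_async_test_renamed", "pass"),
]

# Group reported results by identity; equal keys map to the same test.
results: dict[TestKey, list[str]] = {}
for path, name, outcome in reports:
    results.setdefault(TestKey(path, name), []).append(outcome)

print(len(results))  # 2
```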
Certain test frameworks (e.g., Jest) use
hierarchical names for organizing tests (e.g., using
describe() blocks or class names).
Unflakable uses the complete test name, including any hierarchical ancestors, to identify tests.
Info
Unflakable limits test names to a maximum of eight components, including seven levels of
hierarchical nesting. Additionally, each name component is limited to a maximum of 4096
bytes (using UTF-8 to encode
any Unicode characters). Tests with more than eight components or with any component longer
than 4096 bytes will be ignored.
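The limits above can be expressed as a small validation helper. This is a hypothetical sketch: `is_valid_test_name` is an illustrative name, and it takes the name as a list of components ordered from the outermost ancestor (such as a describe() block or class) to the test itself, since exactly how each plugin splits names into components is not covered here. Treating an empty name as invalid is also an assumption.

```python
from __future__ import annotations

MAX_COMPONENTS = 8  # The test name plus up to seven levels of nesting.
MAX_COMPONENT_BYTES = 4096  # Per-component limit, measured in UTF-8 bytes.


def is_valid_test_name(components: list[str]) -> bool:
    """Return True if a hierarchical test name fits the documented limits.

    Per the limits above, tests whose names fail this check are ignored.
    An empty list is treated as invalid (an assumption of this sketch).
    """
    if not components or len(components) > MAX_COMPONENTS:
        return False
    return all(len(c.encode("utf-8")) <= MAX_COMPONENT_BYTES for c in components)


# e.g., it("retries") nested inside describe("notifications") in Jest
print(is_valid_test_name(["notifications", "retries"]))  # True
# Nine components exceed the eight-component limit.
print(is_valid_test_name([f"level{i}" for i in range(9)]))  # False
```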
Software development teams often run tests in two phases of development:
- On unstable code that has not been merged into a shared code repository (e.g., during a code review or pull request workflow).
- On stable code that typically represents the latest version of a shared code repository (often the main, master, or trunk branch).
If a test is Quarantined, failures will be ignored on both stable and unstable code. This
helps ensure that developers are not prevented from merging code as a result of test flakiness.
Because test failures on unstable code often do not reflect the overall state of a test,
Unflakable updates test statuses only when it receives test results for stable code. For
similar reasons, it does not record any new tests until they run on a stable branch.
Unflakable distinguishes between stable and unstable code by comparing the version control
(e.g., Git) branch for which test results are reported
to a regular expression configured as part of each test
suite.
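The branch comparison can be sketched with Python's `re` module. The pattern `main|master|trunk` is only an example (each test suite configures its own), and `is_stable_branch` is a hypothetical helper. The sketch assumes that the entire branch name must match and that a missing branch name counts as unstable; Unflakable's exact matching rules may differ.

```python
from __future__ import annotations

import re
from typing import Optional

# Example only: each test suite configures its own stable-branch pattern.
STABLE_BRANCH_PATTERN = re.compile(r"main|master|trunk")


def is_stable_branch(
    branch: Optional[str], pattern: re.Pattern[str] = STABLE_BRANCH_PATTERN
) -> bool:
    """Hypothetical check: True if `branch` counts as stable code.

    Assumes the whole branch name must match `pattern`, and treats an
    unknown branch (None) as unstable.
    """
    return branch is not None and pattern.fullmatch(branch) is not None


# Only results from stable branches would update statuses or record new
# tests; quarantined failures are ignored on both kinds of branch.
for branch in ("main", "feature/retry-logic", "main-backup", None):
    print(branch, is_stable_branch(branch))
```

Under these assumptions, only `main` is reported as stable; `main-backup` is rejected because the whole name must match the pattern.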
Refer to the test framework plugin documentation to ensure that the correct branch
name is captured from your team's version control system. For convenience,
Unflakable will attempt to auto-detect branch names for tests within Git repositories.