A flaky test is one that sometimes passes and sometimes fails for the same code. It's the most demoralizing problem in test automation, and the most common.
Once your team learns to ignore failing tests because "they're probably just flaky," your test suite has lost its value. The fix is not to retry until they pass. The fix is to understand why they're flaky and eliminate the root cause.
The Most Common Causes
1. Race Conditions
The test interacts with an element before the app is ready for it. This is the most common cause.
// FLAKY: clicks before navigation completes
await page.getByRole('button', { name: 'Login' }).click();
await page.getByRole('button', { name: 'Add to cart' }).first().click(); // might fail
Playwright's auto-waiting helps here, but it only waits for the target element to become actionable, not for the page to reach the state your test assumes. If the page loads new content after a navigation, wait for a stable element before interacting:
// STABLE: wait for the inventory page to be ready
await page.getByRole('button', { name: 'Login' }).click();
await expect(page.getByRole('heading', { name: 'Products' })).toBeVisible();
await page.getByRole('button', { name: 'Add to cart' }).first().click();
2. Test Order Dependency
Test A leaves state behind. Test B assumes clean state. B passes when run alone, fails when run after A.
// Test A adds items to the cart
// Test B assumes the cart is empty, so it fails when run after A
test('cart is empty on fresh load', async ({ page }) => {
await page.goto('/cart.html');
await expect(page.locator('.cart_item')).toHaveCount(0); // fails if A ran first
});
The fix: each test must set up its own state. Never rely on state left by another test.
On SauceDemo, the cart persists via localStorage. Reset it before each test:
test.beforeEach(async ({ page }) => {
// Clear localStorage before each test
await page.goto('/');
await page.evaluate(() => localStorage.clear());
});
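Or rely on Playwright's default: the built-in { page } fixture gives every test a fresh browser context, so cookies and localStorage don't carry over between tests. Explicit resets like the one above are only needed when tests share a page or context (for example, one created in test.beforeAll) or when the state lives on a server.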
3. Timing Issues with Animations
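An element is visible but still animating. Playwright's actionability checks wait for an element to be stable, meaning its bounding box has stopped moving, but animations that don't move the element (an opacity fade, for example) can slip past that check.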
// If a modal animates in, wait for it to finish before clicking inside
const modal = page.getByRole('dialog'); // assumes the modal exposes role="dialog"
await expect(modal).toBeVisible();
await page.waitForTimeout(300); // sometimes necessary for CSS transitions
waitForTimeout is a code smell, but sometimes it's the only option for CSS animations. Keep it short and add a comment explaining why it's there.
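Another option, if your app honors the prefers-reduced-motion media query (an assumption about the app, not something Playwright controls), is to ask every browser context to prefer reduced motion so transitions are shortened or skipped. A config sketch using the standard reducedMotion context option:

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Emulate "prefers-reduced-motion: reduce" in every test's browser context.
    // This only helps if the app's CSS/JS actually checks this media query.
    contextOptions: { reducedMotion: 'reduce' },
  },
});
```

If the app ignores the media query, this changes nothing, and a short, commented waitForTimeout remains the fallback.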
4. Network-Dependent Tests
Tests that depend on real API calls fail when the network is slow or the API is down. SauceDemo is mostly static, but in real apps this is a constant source of flakiness.
Mock the network layer for unit-level tests. For true E2E tests, accept that they require a stable environment and don't run them in every CI stage.
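A minimal sketch of what mocking at the network layer can look like with page.route() and route.fulfill(). The /api/products endpoint and the product data are hypothetical, and FulfillableRoute is a narrow stand-in for Playwright's Route type (a real Route should satisfy it) so the handler can be checked without launching a browser:

```typescript
// Options we pass to fulfill(); a subset of what Playwright's route.fulfill() accepts.
interface FulfillOptions {
  status?: number;
  contentType?: string;
  body?: string;
}

// Narrow structural stand-in for the part of Playwright's Route we use.
interface FulfillableRoute {
  fulfill(options: FulfillOptions): Promise<void>;
}

// Hypothetical product shape for an equally hypothetical /api/products endpoint.
interface Product {
  id: number;
  name: string;
  price: number;
}

// Builds a route handler that answers every intercepted request with a fixed
// JSON payload, so the test no longer depends on a live backend.
function mockJson(payload: unknown, status = 200) {
  return (route: FulfillableRoute): Promise<void> =>
    route.fulfill({
      status,
      contentType: 'application/json',
      body: JSON.stringify(payload),
    });
}

// Illustrative fixture data, not taken from any real API.
const fakeProducts: Product[] = [{ id: 1, name: 'Sauce Labs Backpack', price: 29.99 }];

// Usage inside a Playwright test:
//   await page.route('**/api/products', mockJson(fakeProducts));
//   await page.goto('/inventory.html');
```

Register the route before page.goto() so the very first request is intercepted.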
5. Shared State Between Parallel Tests
Tests running in parallel can interfere with each other if they share state: the same database record, the same test user account, the same localStorage.
By default, Playwright runs test files in parallel across worker processes, while tests within a single file run in order in the same worker. To run tests within a file in parallel:
test.describe.configure({ mode: 'parallel' });
Only do this if the tests are genuinely independent.
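One way to stop parallel workers from sharing a test account is to give each worker slot its own. This sketch assumes a hypothetical pool of pre-provisioned accounts (the usernames are placeholders) and keys on Playwright's testInfo.parallelIndex, which stays between 0 and workers - 1 and is reused when a worker restarts, unlike workerIndex, which changes on every restart:

```typescript
// Hypothetical pool of pre-provisioned test accounts, one per parallel slot.
// These usernames are placeholders, not real SauceDemo users.
const ACCOUNT_POOL: readonly string[] = ['qa-user-0', 'qa-user-1', 'qa-user-2', 'qa-user-3'];

// Maps a worker's parallel index (0 .. workers - 1) to its own account, so two
// workers running at the same time never log in as the same user.
function accountFor(parallelIndex: number, pool: readonly string[] = ACCOUNT_POOL): string {
  if (!Number.isInteger(parallelIndex) || parallelIndex < 0) {
    throw new Error(`Invalid parallel index: ${parallelIndex}`);
  }
  if (parallelIndex >= pool.length) {
    // More workers than accounts: fail loudly instead of silently sharing one.
    throw new Error(`No account for parallel index ${parallelIndex}; pool has ${pool.length}`);
  }
  return pool[parallelIndex];
}

// Usage inside a Playwright test:
//   const username = accountFor(test.info().parallelIndex);
```

Failing loudly when there are more workers than accounts is deliberate: silently reusing an account would reintroduce exactly the shared state this is meant to prevent.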
6. Environment-Specific Failures
Tests pass locally, fail in CI. Common reasons:
- Different screen sizes (Playwright defaults to a 1280x720 viewport)
- Different locale/timezone
- Missing environment variables
- CI machines are slower, so operations run into timeouts
For screen size, set it explicitly in config:
use: {
viewport: { width: 1280, height: 720 },
},
For timeouts in CI, don't increase them globally. Instead, identify which tests are slow and optimize them.
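Playwright can point you at the slow spots itself: the reportSlowTests config option lists the slowest test files at the end of a run. Note that it reports files, not individual tests. The numbers below are illustrative assumptions, not recommendations:

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // After each run, list up to 10 test files that took longer than 30 seconds,
  // so you know where to optimize instead of raising the global timeout.
  reportSlowTests: { max: 10, threshold: 30_000 }, // threshold is in milliseconds
});
```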
Diagnosing Flaky Tests
Step 1: Reproduce It
Run the test in a loop:
# Run the test 10 times
for i in {1..10}; do npx playwright test tests/cart/add-to-cart.spec.ts; done
Or use Playwright's built-in repeat:
npx playwright test --repeat-each=10
If it fails 2 out of 10 times, you have a reproducible flaky test. Note the failure rate.
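A clean run of 10 is weaker evidence than it sounds. Assuming runs are independent and the per-run failure probability p is constant (real flakiness often depends on load, so treat this as a rough guide), the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. A small sketch that computes this, plus how many repetitions you'd need for a given confidence:

```typescript
// Observed failure rate from a repeat run, e.g. 2 failures out of 10.
function observedFailureRate(failures: number, runs: number): number {
  return runs === 0 ? 0 : failures / runs;
}

// Chance of seeing at least one failure in `runs` repetitions of a test whose
// per-run failure probability is `p`, assuming independent runs.
function chanceToSeeFailure(p: number, runs: number): number {
  return 1 - Math.pow(1 - p, runs);
}

// Smallest number of repetitions giving at least `confidence` probability of
// catching one failure. Same independence assumption.
function runsNeeded(p: number, confidence: number): number {
  if (p <= 0 || p >= 1) throw new Error('p must be strictly between 0 and 1');
  if (confidence <= 0 || confidence >= 1) throw new Error('confidence must be strictly between 0 and 1');
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}

console.log(observedFailureRate(2, 10)); // 0.2
console.log(chanceToSeeFailure(0.05, 10).toFixed(2)); // 0.40
console.log(runsNeeded(0.05, 0.95)); // 59
```

In other words, a test that fails 5% of the time passes all 10 runs about 60% of the time, so for rare flakes a higher --repeat-each value is worth the wait.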
Step 2: Read the Trace
Always run with traces when debugging:
npx playwright test --trace=on tests/cart/add-to-cart.spec.ts
Open the HTML report, then open the trace attached to the failing test:
npx playwright show-report
The trace shows every action, the DOM state before and after each action, network requests, and console logs. You can see exactly what was on screen when the failure happened.
Step 3: Run in Headed Mode
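npx playwright test --headed tests/cart/add-to-cart.spec.ts
The test runner has no --slow-mo flag. To slow things down, set slowMo in the launch options in playwright.config.ts:
use: {
  launchOptions: { slowMo: 500 },
},
slowMo: 500 slows down every Playwright operation by 500ms. This makes it easy to see what's happening visually, and often the problem becomes obvious.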
Step 4: Add Explicit Waits for the Right Thing
If you see a race condition in the trace, add a wait for the specific condition, not a general waitForTimeout.
// Instead of this
await page.waitForTimeout(2000);
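// Wait for a specific network request (start waiting before the action that triggers it)
const productsResponse = page.waitForResponse('**/api/products');
await page.getByRole('button', { name: 'Login' }).click(); // the action that triggers the request
await productsResponse;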
// Wait for specific DOM state
await expect(page.locator('.inventory_list')).toBeVisible();
// Wait for navigation
await page.waitForURL('/inventory.html');
Playwright's Built-In Retry Mechanism
Configure retries for CI:
// playwright.config.ts
retries: process.env.CI ? 2 : 0,
This retries failing tests up to 2 times in CI. A test that passes on retry is marked as "flaky" in the report, not as passing. You get visibility into which tests are flaky without hiding failures.
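To make the "flaky, not passing" distinction concrete, here is a simplified model of how a test is labeled once retries are on. The function and type names are illustrative; the labels loosely mirror the outcome values ('expected', 'flaky', 'unexpected') used by Playwright's reporter API, but this sketch ignores test.fail() annotations and skipped tests:

```typescript
// Status of a single attempt (a subset of Playwright's test statuses).
type AttemptStatus = 'passed' | 'failed' | 'timedOut';

// Labels loosely mirroring Playwright's reported outcomes.
type Outcome = 'expected' | 'flaky' | 'unexpected';

// Playwright stops retrying as soon as an attempt passes, so `attempts` is the
// first try followed by any retries, in order.
function classify(attempts: readonly AttemptStatus[]): Outcome {
  if (attempts.length === 0) throw new Error('A test needs at least one attempt');
  const last = attempts[attempts.length - 1];
  if (last !== 'passed') return 'unexpected'; // every attempt failed: a real failure
  return attempts.length === 1 ? 'expected' : 'flaky'; // passed only after a retry
}

// With retries: 2, a test gets at most 3 attempts.
console.log(classify(['passed'])); // expected
console.log(classify(['failed', 'passed'])); // flaky
console.log(classify(['failed', 'timedOut', 'failed'])); // unexpected
```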
Never rely on retries to hide real flakiness. Retries are a safety net while you investigate, not a solution.
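Retries are most useful when the failed attempt leaves evidence behind. A config sketch combining the retry setting above with trace collection, assuming you want a trace from every failed attempt:

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry in CI only; a test that passes on retry is reported as flaky.
  retries: process.env.CI ? 2 : 0,
  use: {
    // Record a trace for every attempt and keep it only when that attempt fails,
    // so a flaky test leaves behind the trace of the attempt that went wrong.
    // 'on-first-retry' is cheaper, but it traces the retry instead.
    trace: 'retain-on-failure',
  },
});
```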
Isolating Tests with storageState
For tests that require login, repeating the login flow for every test is slow and adds unnecessary failure points. Save authenticated state and reuse it:
// setup/auth.setup.ts
import { test as setup } from '@playwright/test';
const authFile = 'playwright/.auth/user.json';
setup('authenticate', async ({ page }) => {
await page.goto('/');
await page.getByPlaceholder('Username').fill('standard_user');
await page.getByPlaceholder('Password').fill('secret_sauce');
await page.getByRole('button', { name: 'Login' }).click();
await page.waitForURL('/inventory.html');
// Save the storage state (cookies + localStorage)
await page.context().storageState({ path: authFile });
});
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      name: 'setup',
      testMatch: '**/auth.setup.ts',
    },
    {
      name: 'chromium',
      use: {
        ...devices['Desktop Chrome'],
        storageState: 'playwright/.auth/user.json',
      },
      dependencies: ['setup'],
    },
  ],
});
Now every test starts already logged in: no login step, no login-related flakiness.
Add playwright/.auth/ to .gitignore so saved session cookies never end up in the repository.
Quarantining Flaky Tests
When you find a genuinely flaky test you cannot immediately fix, quarantine it rather than letting it pollute the suite:
test.fixme('cart count sometimes shows stale value', async ({ page }) => {
// Known flaky: timing issue with cart badge update
// Tracked in: https://github.com/your-org/repo/issues/42
});
test.fixme() marks the test as known-broken. Playwright skips it instead of running it, and the report lists it as skipped rather than failed, so it doesn't block the build. The comment and issue link ensure it gets fixed eventually.
The Flaky Test Checklist
When a test is flaky, go through this in order:
- Can you reproduce it? Run with --repeat-each=10. If it never fails, it may have been a one-off environment issue.
- Is it a race condition? Open the trace. Look for actions on elements that weren't fully loaded.
- Is it state pollution? Run the test in isolation. If it passes, another test is leaving behind state.
- Is it an environment issue? Run on a different machine or in a Docker container matching CI.
- Is there an animation? Slow the test down with slowMo in headed mode and watch the browser.
- Is it network-dependent? Mock the API call and see if the flakiness disappears.
Once you know the cause, the fix is usually straightforward. The hard part is the diagnosis.
Next chapter: getting your tests into CI/CD so they run on every push.