opencode plan "Generate Playwright tests for login flow."
This command kicks off a structured analysis of my React codebase. The output isn't test code—it's a multi-step strategy document. That gap between what I asked for and what I got captures the fundamental philosophical split in the 2026 AI CLI landscape: structured planners versus direct generators versus shell sidekicks. I spent a week putting three leading contenders—OpenCode CLI, Gemini CLI, and GitHub Copilot CLI—through one concrete task: building a Playwright end-to-end test suite for a typical React authentication flow (login, registration, logout, error states). The results were far from uniform.
The Core Problem: From UI Concept to Reliable Test Code
Manually writing tests for interactive flows is tedious. Modern React apps with components, state managers, and async flows make selectors and assertions a guessing game. The promise of AI CLI tools is to shortcut this by understanding both the intent ("test the login") and the codebase. The root cause of varied success lies in how each tool acquires and uses context.
OpenCode, with its LSP integration, reads the project structure like an IDE. Gemini CLI leans heavily on a provided GEMINI.md file for architectural context. Copilot CLI, in its current 2026 iteration, excels at manipulating the environment around the test code but doesn't agentically generate full suites.
OpenCode CLI: The Architect
Its "Plan/Build" workflow is its standout feature. You don't just get code; you get a proposed blueprint. For my auth flow, after initializing opencode in the project root, I prompted for a test suite. Its LSP backend scanned LoginForm.tsx and authSlice.ts, then presented this plan:
```
// **PLAN: playwright/auth.spec.ts**
// 1. Navigate to /login.
// 2. Test: Failed login (empty submission).
//    - Action: Click submit.
//    - Assert: Presence of error message `.error-text`.
// 3. Test: Failed login (invalid credentials).
//    - Action: Fill `#email` & `#password`, click submit.
//    - Assert: Error message contains "Invalid".
// 4. Test: Successful login.
//    - Action: Fill with valid credentials, click submit.
//    - Assert: URL redirects to /dashboard, `.user-menu` is visible.
// 5. Test: Logout.
//    - Action: Click `.user-menu > .logout-btn`.
//    - Assert: URL redirects to /, login button is visible.
// NOTES: Selectors derived from LoginForm component (data-testid preferred). Use `test.describe` for grouping.
```
This plan was accurate. The selectors matched my component's structure because OpenCode used the LSP to inspect the JSX, spotting my data-testid="email-input". Accepting the plan triggered the Build phase, which generated the complete, executable .spec.ts file. The main trade-off? The two-step process feels slower for trivial tasks, but for complex flows, it prevents the AI from wandering down a wrong path, saving significant correction time later.
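To give a flavor of the Build phase, here is a representative excerpt reconstructed from the generated file, covering plan steps 2 and 4. The `data-testid` values `password-input` and `submit-btn` are my assumptions (only `email-input` is confirmed above); the selectors `.error-text` and `.user-menu` come straight from the plan:

```typescript
import { test, expect } from '@playwright/test';

test.describe('Auth flow', () => {
  // Plan step 2: empty submission surfaces a validation error
  test('shows an error when the form is submitted empty', async ({ page }) => {
    await page.goto('/login');
    await page.getByTestId('submit-btn').click();
    await expect(page.locator('.error-text')).toBeVisible();
  });

  // Plan step 4: valid credentials redirect to the dashboard
  test('logs in with valid credentials', async ({ page }) => {
    await page.goto('/login');
    await page.getByTestId('email-input').fill(process.env.TEST_USER!);
    await page.getByTestId('password-input').fill(process.env.TEST_PASS!);
    await page.getByTestId('submit-btn').click();
    await expect(page).toHaveURL(/\/dashboard/);
    await expect(page.locator('.user-menu')).toBeVisible();
  });
});
```

Note that OpenCode preferred `getByTestId` where a test ID existed, exactly as the plan's NOTES promised.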
Gemini CLI: The Contextual Code Slinger
Gemini CLI operates with a different rhythm. It requires upfront context via a GEMINI.md file. Once that's set, it generates code directly. The quality hinges entirely on the context's quality. With a well-structured GEMINI.md describing the auth provider and component names, I prompted: gemini "Write a Playwright test for successful login and logout."
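For reference, the GEMINI.md I used looked roughly like this (a reconstructed sketch; the exact bullet wording is mine, but each fact matches what Gemini's output later relied on):

```markdown
# Project context
- React SPA; auth state lives in `authSlice.ts` (Redux Toolkit).
- Login page at /login; fields labeled "Email" and "Password", submit button "Sign in".
- Test credentials are in `.env` as TEST_USER / TEST_PASS; dev server runs at http://localhost:3000.
```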
It generated a single, dense block of code. Notably, it inferred the need for setup and teardown:
```typescript
import { test, expect } from '@playwright/test';

test.describe('Authentication Flow', () => {
  test.beforeEach(async ({ page }) => {
    await page.goto('http://localhost:3000');
  });

  test('successful login and logout', async ({ page }) => {
    // Navigate to login
    await page.getByRole('link', { name: /login/i }).click();
    await expect(page).toHaveURL(/\/login/);

    // Fill credentials from environment (inferred from GEMINI.md mention of .env)
    await page.getByLabel('Email').fill(process.env.TEST_USER!);
    await page.getByLabel('Password').fill(process.env.TEST_PASS!);
    await page.getByRole('button', { name: /sign in/i }).click();

    // Assert successful redirect
    await expect(page).toHaveURL(/\/dashboard/);
    await expect(page.getByText('Welcome back')).toBeVisible();

    // Logout
    await page.getByRole('button', { name: /user menu/i }).click();
    await page.getByRole('menuitem', { name: /logout/i }).click();
    await expect(page).toHaveURL('http://localhost:3000/');
  });
});
```
It used getByRole locators—a Playwright best practice—without being explicitly told. However, when I asked it to add a registration test, it occasionally hallucinated component names not in the GEMINI.md file. The trade-off is clear: incredible speed and fluency when context is rich, but a higher chance of plausible-sounding fabrications when context is thin.
Copilot CLI: The Shell-Command Maestro
Here’s where the 2026 landscape gets defined. Copilot CLI didn't generate my test suite. When I tried copilot gh "Write a Playwright test for login", it suggested looking at existing tests or creating a skeleton. Its power lies elsewhere. Once the test files existed from OpenCode or Gemini, Copilot CLI became indispensable for operating the workflow.
copilot shell "Run the playwright auth tests headlessly and generate a report"
It instantly output the correct command: npx playwright test auth.spec.ts --reporter=html (Playwright runs headless by default, so no extra flag was needed). It's the best at Git integration, CI scripting, and environment wrangling. For example, needing to install the correct browser for CI: copilot shell "Install Playwright chromium for Ubuntu in CI" gave me the exact npx playwright install chromium command with --with-deps. It's not the test author; it's the expert stage manager that gets the authored tests running anywhere.
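Stitched together, the commands Copilot CLI suggested slot directly into a CI job. A minimal GitHub Actions sketch (the workflow name, Node version, and artifact step are my assumptions, not Copilot output):

```yaml
name: e2e
on: [push]
jobs:
  playwright:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # Browser install command Copilot CLI suggested for Ubuntu CI
      - run: npx playwright install chromium --with-deps
      # Headless run with an HTML report
      - run: npx playwright test auth.spec.ts --reporter=html
      # Keep the report even when tests fail
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: playwright-report/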
Measurable Outcome and Final Trade-offs
For generating the initial 5-test suite:
- OpenCode CLI: Took ~4 minutes (plan review + build). First-run pass rate: 4/5 tests passed. One selector needed tweaking.
- Gemini CLI: Took ~90 seconds for the first test. First-run pass rate: 2/3 generated tests passed, but it wrote only 3 of the 5 tests until I followed up with additional prompts.
- Copilot CLI: Did not generate the core tests. It reduced my shell command lookup time for running/reporting by an estimated 70%.
The choice isn't about which is "best." It's about your bottleneck. If your codebase is complex and you need accurate, maintainable tests, OpenCode's planning is worth the speed tax. If you have excellent high-level documentation and need rapid drafts, Gemini is a powerhouse. For seamlessly integrating those tests into your DevOps pipeline, Copilot CLI is unmatched.
Your 10-Minute Experiment:
In an existing project, create a GEMINI.md file with three bullet points about a UI component. Run gemini "Write a Playwright test to check this component is visible." See what it builds with just that sliver of context. It will perfectly illustrate both the power and the peril of context-driven generation.