2026

4 articles

Browse all articles tagged with 2026. Found 4 articles covering this topic.

Articles tagged "2026"

TurboQuant + MTP: Get 40.6 tok/s Out of Qwen3.6

AI · LLM

May 25, 2026

1 min read

How to build llama.cpp with TurboQuant and MTP support and run it on consumer hardware (32GB RAM plus an 8GB VRAM GPU).

#AI #LLM #2026

Taming a 26B MoE Model on 8GB VRAM with TurboQuant

AI · LLM

May 14, 2026

2 min read

Forget Docker images and pre-built binaries. Here's how I compiled a custom llama.cpp fork with TurboQuant, ran a 26B Gemma 4 MoE model on a consumer RTX 3070, and squeezed a 262K-token context into less than 4GB of VRAM.

#AI #LLM #2026

Deploying Qwen3-Coder-30B-A3B on 8GB GPU with Docker

AI · OpenCode

Mar 16, 2026

1 min read

A 30B model on an 8GB GPU sounds impossible, but quantization and llama.cpp make it work. This guide shows how to run it with Docker and use it in OpenCode.

#AI #OpenCode #LLM

AI CLI Smackdown 2026: Playwright Test Generation for a React App's Auth Flow

case-study · AI-CLI

Mar 4, 2026

1 min read

Comparing how Gemini CLI, Copilot CLI, and OpenCode CLI approach generating Playwright E2E tests for a React authentication flow, with real code examples and the workflow trade-offs of each tool in 2026.

#case-study #AI-CLI #2026
