Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU

Benchmark results for Qwen 3.6 27B and 35B with MTP speculative decoding in llama.cpp on an RTX 4080 16GB, covering token speed, VRAM cost, and optimal --spec-draft-n-max settings.
