Comparing LLM Performance on Ollama on a 16GB VRAM GPU

A benchmark of 14 LLMs on an RTX 4080 16GB running Ollama 0.15.2, comparing tokens/sec, VRAM usage, and CPU offloading for GPT-OSS, Qwen3, Qwen3.5, Mistral, and more.
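The tokens/sec figures in such a comparison can be derived from Ollama's REST API: the final object of a `/api/generate` response reports `eval_count` (tokens generated) and `eval_duration` (wall time in nanoseconds). A minimal sketch of the calculation, assuming those field names (they match the public Ollama API; the sample values below are illustrative, not measured):

```python
def tokens_per_second(resp: dict) -> float:
    """Compute generation throughput from an Ollama /api/generate response.

    The final streamed object includes eval_count (tokens generated) and
    eval_duration (elapsed time in nanoseconds).
    """
    return resp["eval_count"] / resp["eval_duration"] * 1e9


# Illustrative response fragment: 512 tokens in 10.24 s -> 50.0 tokens/s
sample = {"eval_count": 512, "eval_duration": 10_240_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")
```

VRAM usage and CPU offloading can be observed alongside this with `nvidia-smi` and `ollama ps`, the latter showing the CPU/GPU split when a model does not fit entirely in VRAM.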
