Posts

Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU

Benchmark results for Qwen 3.6 27B and 35B MTP speculative decoding in llama.cpp on RTX 4080 16GB. Token speed, VRAM cost, and optimal --spec-draft-n-max settings.

Understanding Goroutines and Channels in Depth

Learn how to master Go's concurrency model with goroutines and channels. This guide covers mechanics, patterns, best practices, and performance optimization for building efficient, scalable concurrent applications.

Unload All llama.cpp Router Models Without Restarting

Learn how to unload every loaded llama.cpp router model with curl and jq, free VRAM safely, and avoid restarting llama-server in local LLM workflows.

LLM Wiki - Compiled Knowledge That RAG Cannot Replace

RAG retrieves fragments on demand. LLM Wiki compiles structured knowledge before any question is asked. Learn when ingest-time synthesis beats query-time retrieval, and when it does not.

PKM vs RAG vs Wiki vs Memory Systems Explained Clearly

Compare PKM, RAG, wikis, and AI memory systems by structure, retrieval, ownership, evolution, and real-world use cases.

Second Brain Explained for Engineers and Knowledge Workers

Learn what a second brain really is, how it differs from PKM, wikis, and RAG, and why the best systems turn notes into reusable thinking over time.

Personal Knowledge Management - Goals, Methods and Tools to use in 2025

Personal Knowledge Management: what it is, its goals, and the methods and tools to use in 2025.

LLM Structured Output Validation in Python That Holds Up

Validate LLM JSON in Python with JSON Schema and Pydantic, handle fences and tool args, add repair retries, tests, and production-safe failure handling.