Posts

Showing posts from May, 2026

CPU vs GPU Inference for LLMs: Cost per 1M Tokens Comparison

Compare CPU vs GPU inference for LLMs in 2026, focusing on cost per 1M tokens, performance, and scalability. Learn when to use NVIDIA Grace CPUs or Rubin CPX GPUs for optimal efficiency.

Build a Web API with Go in Under an Hour

Learn to build a functional web API with Go 1.26, covering RESTful endpoints, database persistence with PostgreSQL and GORM, and testing best practices in under an hour.

Concurrency Patterns for High-Throughput LLM Systems

Explore concurrency patterns for high-throughput LLM systems, including pipeline parallelism, asynchronous I/O, and distributed locking to optimize performance and resource utilization in production environments.

AI for Knowledge Management: Real Workflows That Hold Up

A practical guide to AI-augmented knowledge management, from summarisation and extraction to semantic linking, local models, APIs, and review loops.

Multi-Tenancy Database Patterns with examples in Go

Explore shared database, separate schema, and database-per-tenant patterns for multi-tenant apps. Learn the trade-offs, security implications, and when to use each approach, with examples in Go.

Parallel Table-Driven Tests in Go

Parallel execution of table-driven tests in Go: Learn best practices, avoid race conditions, and optimize test performance with t.Parallel() and subtests.

Go Unit Testing: Structure & Best Practices

Master Go unit testing with the built-in testing package, table-driven tests, mocks, coverage analysis, and industry best practices for robust Go applications.

Zettelkasten for Developers: A Practical Method That Works

A practical Zettelkasten guide for developers: write atomic notes, link concepts to code, avoid folder traps, and build a useful knowledge system.

OpenClaw Production Setup Patterns with Plugins and Skills

Real-world OpenClaw production setups combining plugins and skills by user type, with practical architecture patterns for reliability, workflows, and scale.

Structured Output Validation with Pydantic vs JSON Schema: A Comprehensive Comparison

A comprehensive comparison of Pydantic and JSON Schema for structured output validation in Python, covering performance, usability, and integration with LLMs. Learn when to use each tool for optimal data validation in modern applications.

OpenClaw vs Hermes Agent: Stars, Downloads & Usage 2026

Full data: 20 AI agent repos ranked by GitHub stars, OpenRouter daily tokens, npm/PyPI downloads, CVE history, ecosystem size, and Reddit sentiment.

Second Brain Explained for Engineers and Knowledge Workers

Learn what a second brain really is, how it differs from PKM, wikis, and RAG, and why the best systems turn notes into reusable thinking over time.

Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU

Benchmark results for Qwen 3.6 27B and 35B MTP speculative decoding in llama.cpp on RTX 4080 16GB. Token speed, VRAM cost, and optimal --spec-draft-n-max settings.

Understanding Goroutines and Channels in Depth

Learn how to master Go's concurrency model with goroutines and channels. This guide covers mechanics, patterns, best practices, and performance optimization for building efficient, scalable concurrent applications.

Unload All llama.cpp Router Models Without Restarting

Learn how to unload every loaded llama.cpp router model with curl and jq, free VRAM safely, and avoid restarting llama-server in local LLM workflows.

LLM Wiki - Compiled Knowledge That RAG Cannot Replace

RAG retrieves fragments on demand. LLM Wiki compiles structured knowledge before any question is asked. Learn when ingest-time synthesis beats query-time retrieval, and when it does not.

PKM vs RAG vs Wiki vs Memory Systems Explained Clearly

Compare PKM, RAG, wikis, and AI memory systems by structure, retrieval, ownership, evolution, and real-world use cases.

Personal Knowledge Management - Goals, Methods and Tools to use in 2025

Personal Knowledge Management: what it is, its goals, and the methods and tools to use in 2025.

LLM Structured Output Validation in Python That Holds Up

Validate LLM JSON in Python with JSON Schema and Pydantic, handle fences and tool args, add repair retries, tests, and production-safe failure handling.

Agentic LLM Inference Parameters Reference for Qwen and Gemma

Curated reference of vendor and community inference parameters for Qwen 3.6 and Gemma 4, optimized for agentic workflows and real-world coding systems.

Building a Distributed Task Queue in Go for AI Jobs

Learn how to build a scalable distributed task queue in Go for AI jobs using RabbitMQ, Kubernetes, and Go 1.21. Covers architecture, reliability, fault tolerance, and AI workload optimization.

Idempotency in Distributed Systems That Actually Works

Idempotency is not an HTTP trick. Learn how to stop duplicate writes, replayed messages, and double charges across APIs, queues, webhooks, and workflows.

Hermes Voice Control from Your Phone

Set up Hermes voice control on Telegram and Discord using local Whisper and free Edge TTS. Includes setup, tuning tips, examples, and troubleshooting.

Kanban in Hermes Agent for Self Hosted LLM Workflows

Set up Hermes Kanban to safely schedule multi-agent tasks on self-hosted LLMs using a dispatcher daemon, rate limits, and cron-based batching.

Hermes Agent CLI cheat sheet — commands, flags, and slash shortcuts

Quick reference for Hermes Agent CLI. Install path, hermes chat and -z one-shots, model, gateway, skills, memory, logs, profiles, and slash shortcuts.

Hermes Agent Skill Authoring — SKILL.md Structure and Best Practices

Author Hermes skills with YAML frontmatter, progressive disclosure, conditional activation, secrets versus config, and index troubleshooting.

Data Infrastructure for AI Systems: Object Storage, Databases, Search & AI Data Architecture

Engineering guide to data infrastructure for production AI systems: S3-compatible object storage (MinIO, Garage, AWS S3), PostgreSQL, Elasticsearch, streaming and messaging (Kafka, Airflow, queues), SaaS integrations, AI-native data layers, benchmarks, and trade-offs.

MinIO CE in 2026: Retired Upstream, Source-Only, and What to Use

MinIO CE is archived, source-only, and operationally high-risk. Here is the timeline, community verdict, and safer alternatives for S3-compatible storage.

NemoClaw practical guide for secure OpenClaw operations in 2026

NemoClaw runs OpenClaw inside OpenShell with policy-driven network control, routed inference, and lifecycle tooling. This guide covers quickstart, operations, and troubleshooting.

Agent Memory Providers Compared — Honcho, Mem0, Hindsight, and Five More

Compare eight agent memory backends for Hermes, OpenClaw, and other agents (Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover, Supermemory) by dependencies, self-hosting, and activation.