Posts

TGI - Text Generation Inference - Install, Config, Troubleshoot

A practical guide to installing Hugging Face TGI, launching your first LLM endpoint, tuning key flags, and fixing the failures you will meet.

Practical, minimal examples for working with Ollama in real applications.

Practical Ollama examples in Go & Python, including structured output, Docker, and reverse proxy setups.

RTX 5090 in Australia, March 2026: Pricing and Stock Reality

RTX 5090 GPUs in Australia remain scarce and expensive in March 2026, with limited stock, long wait times, and inflated prices. Here is what is really happening and what comes next.

Orchestrating AI Tasks with Celery vs Temporal

A comprehensive comparison of Celery and Temporal for orchestrating AI tasks, covering architecture, performance, features, and use cases in distributed AI workflows.

Top Python Libraries for AI Workflow Automation in 2026

Explore the top Python libraries for AI workflow automation in 2026, including n8n, Vellum AI, and Make. Learn how to integrate AI models, implement RAG, and build scalable, secure workflows for content creation, lead scoring, and data enrichment.

Using asyncio Queues for AI Task Orchestration

Learn how to use asyncio queues for efficient AI task orchestration, including pipeline design, workload optimization, and real-world examples with Redis and Python. Master asynchronous task management for scalable AI systems.

Build Your First Python Autonomous Agent

Learn to build your first Python autonomous agent using modern frameworks like Autogen and LangGraph. This guide covers core logic, communication protocols, and deployment best practices for AI agents.

Deploying vLLM at Scale on Kubernetes: A Comprehensive Guide

Learn how to deploy vLLM at scale on Kubernetes with PagedAttention, continuous batching, and tensor parallelism for high-throughput LLM inference. Covers multi-GPU, multi-node strategies and best practices.