Posts

LLM Development Ecosystem: Backends, Frontends & RAG

Here is a set of articles about the LLM development ecosystem: backends, frontends, and RAG. Topics include LLM hosting (Ollama, Docker Model Runner, cloud providers), coding in Python and Go, RAG, vector stores, embeddings, and MCP.

GPU and RAM Prices Surge in Australia: RTX 5090 Up 15%, RAM Up 38% - January 2026

RTX 5090 prices jumped 15.2% to $5,566, RTX 5080 rose 6% to $1,899, and RAM surged 38% to $689 in Australia. Latest price analysis for January 2026 across Centrecom, PCCG, and Scorptec. https://www.glukhov.org/post/2026/01/ram-and-gpu-price-increase/ #SelfHosting #Hardware

NVIDIA DGX Spark Pricing: $6,249-$7,999 at Major Retailers in Australia

The NVIDIA DGX Spark (GB10 Grace Blackwell) is now available in Australia at major PC retailers with local stock. If you’ve been following global DGX Spark pricing and availability, you’ll be interested to know that Australian pricing ranges from $6,249 to $7,999 AUD depending on storage configuration and retailer. These systems are well suited to local AI/LLM workloads, such as LLM inference with Ollama. The Australian configurations come with 128GB unified memory and up to 1 PFLOP of AI compute, making them suitable for models up to ~200B parameters.
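As a rough illustration of the kind of workload such a box targets, here is a minimal sketch of querying a locally running Ollama server over its HTTP API. It assumes Ollama is listening on its default port 11434 and that the named model has already been pulled; the model tag and prompt are placeholders.

```python
# Minimal sketch: query a local Ollama server over its HTTP API.
# Assumes Ollama is running on its default port (11434) and the named
# model has already been pulled; model tag and prompt are placeholders.
import json
import urllib.request

payload = {
    "model": "llama3.1:70b",   # placeholder model tag
    "prompt": "Summarize the DGX Spark in one sentence.",
    "stream": False,           # ask for a single JSON response instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```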

Building LLM Applications with Rust: candle and llm Crates

Building LLM applications in Rust using the candle and llm crates reveals candle as the more viable choice due to its active development and broader hardware support. Candle 0.4.0 with CUDA 12.1 enables GPU acceleration for tensor operations, demonstrated in fintech applications with reduced latency. The llm crate, being archived and limited to GGMLv3 models, lacks support for modern formats like GGUF and newer hardware. For new projects, prioritize candle, leveraging its 2026 release features such as quantization for LLaMA and distributed inference. Explore tools like kalosm and atoma-infer to extend candle’s capabilities in production deployments.

Best Open-Source LLMs You Can Run on 16 GB VRAM (As of 2026)

Running powerful open-source LLMs on 16 GB VRAM systems is feasible through quantization and optimized deployment. Converting models like Mistral Large 3 to 4-bit precision reduces VRAM usage by up to 4x, enabling execution on consumer-grade GPUs. Phi-3 Mini achieves 68.8 MMLU and 62.2 HumanEval scores at 3.8B parameters within 8 GB of VRAM at 4-bit quantization, making it ideal for low-latency applications. Use vLLM with speculative decoding to deploy Mixtral 8x7B on an RTX 4090 via Docker for high-parameter workloads. Evaluate model size, quantization level, and inference tools like bitsandbytes and Hugging Face Transformers to select the best fit for your VRAM and performance needs.
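As a sketch of the 4-bit approach described above, the snippet below loads a model with Hugging Face Transformers and bitsandbytes. The model ID is only an example; the actual memory footprint depends on the model, context length, and KV cache.

```python
# Sketch: load a model in 4-bit with Hugging Face Transformers + bitsandbytes.
# The model ID is an example; swap in any model that fits your 16 GB budget.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights, roughly 4x smaller than fp16
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # run compute in fp16
)

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                     # place layers on the available GPU
)

inputs = tokenizer("Explain 4-bit quantization briefly.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```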

Infrastructure as Code: Terraform vs OpenTofu vs Pulumi - A 2026 Comparison

Infrastructure as Code (IaC) has become essential for managing cloud resources efficiently. This post compares Terraform 1.5, OpenTofu 1.0, and Pulumi 5.0, analyzing their architecture, performance, features, and use cases. Key differences include language support, state management, plugin systems, and integration with cloud providers. The comparison covers technical aspects relevant to deployment pipelines, team collaboration, and infrastructure scalability.
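To make the language-support difference concrete, here is a minimal Pulumi program in Python (Terraform and OpenTofu express the same resource in HCL instead). It assumes the pulumi and pulumi-aws packages are installed and AWS credentials are configured; the resource name is illustrative.

```python
# Minimal sketch of a Pulumi program in Python: one S3 bucket.
# Assumes `pulumi` and `pulumi-aws` are installed and AWS credentials are set;
# the bucket's logical name is illustrative.
import pulumi
from pulumi_aws import s3

bucket = s3.Bucket("example-bucket")     # logical name; Pulumi appends a random suffix
pulumi.export("bucket_name", bucket.id)  # expose the bucket name as a stack output
```

Deploying is a matter of running pulumi up in the project directory, the rough equivalent of terraform apply.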

Building High-Performance APIs with FastAPI and Async Python

FastAPI, leveraging Python’s async/await model, enables the development of high-performance, scalable APIs suitable for modern web services. Asynchronous programming reduces latency and improves concurrency, making it essential for handling high request volumes efficiently. This post covers the fundamentals of async programming in FastAPI, designing efficient endpoints, optimizing with middleware and background tasks, and benchmarking performance. The target audience is developers familiar with Python 3.11+ and basic web framework concepts; prior knowledge of async programming as of 2026 is an advantage.
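A minimal sketch of the async pattern discussed above: a single FastAPI endpoint that awaits an outbound HTTP call without blocking the event loop. The upstream URL is a placeholder, and the example assumes fastapi, httpx, and uvicorn are installed.

```python
# Minimal sketch: an async FastAPI endpoint that awaits an outbound HTTP call
# without blocking the event loop. The upstream URL is a placeholder.
import httpx
from fastapi import FastAPI

app = FastAPI()

@app.get("/proxy/{item_id}")
async def proxy_item(item_id: int):
    async with httpx.AsyncClient() as client:
        # The await frees the event loop to serve other requests while waiting.
        resp = await client.get(f"https://example.com/items/{item_id}")
    return {"item_id": item_id, "status": resp.status_code}
```

Run it with an ASGI server such as uvicorn (for example, uvicorn app:app) and benchmark under concurrent load to see the difference against a blocking implementation.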

Terraform Best Practices: Code Organization and Standards

Terraform best practices emphasize modular, reusable code and strict naming conventions to ensure maintainability and scalability. Modularization through self-contained modules with input/output parameters improves deployment consistency and reduces duplication. Terraform v1.6.5 enforces lowercase hyphenated resource names to avoid parsing errors, while TFLint 0.52.0 integrates with CI/CD tools for automated validation. Use Git with descriptive commit messages, and set up CI/CD pipelines that run terraform fmt and TFLint as pre-commit checks. For large-scale projects, adopt a centralized module repository and enforce .tflint.hcl configurations for team-wide standards.