Guides covering common problems, patterns, and production issues in Ollama.
Ollama lets you run open-source LLMs locally on your own hardware — Mac, Linux, or Windows with GPU. It provides a simple CLI and OpenAI-compatible REST API, making it straightforward to plug local models (Llama, Mistral, Qwen, Gemma, and 100+ others) into any agent framework.
Ollama exposes an OpenAI-compatible REST API on localhost:11434. Most AI frameworks have an OpenAI client built in — pointing it at Ollama instead of api.openai.com routes all inference to your local machine.
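As a minimal sketch of what "pointing a client at Ollama" means in practice: the request below targets Ollama's documented `/v1/chat/completions` endpoint using only the Python standard library (no OpenAI SDK required). The model name `llama3.2` is just an example; any pulled model works.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint

def chat_request(model, messages, base=OLLAMA_BASE):
    """Build an OpenAI-style chat-completions request against a local Ollama server."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Ollama ignores the API key, but OpenAI-style clients expect one
            "Authorization": "Bearer ollama",
        },
        method="POST",
    )

req = chat_request("llama3.2", [{"role": "user", "content": "Hello"}])
print(req.full_url)  # http://localhost:11434/v1/chat/completions
# urllib.request.urlopen(req) would send it, assuming Ollama is running locally
```

With an OpenAI SDK, the same redirection is a one-line change: set the client's base URL to `http://localhost:11434/v1` and everything else stays the same.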
Ollama runs model inference on GPU (or falls back to CPU). VRAM is the binding constraint: a model that does not fit in VRAM either spills to CPU (10-50x slower) or crashes. The rule: the model's weights plus KV cache must fit in available VRAM.
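That sizing rule can be turned into a back-of-envelope calculator. Weight memory is exact arithmetic (parameter count times bits per weight); the 20% overhead factor for KV cache, activations, and runtime buffers is an assumption for illustration, not an Ollama-specified constant.

```python
def vram_estimate_gb(params_billion, quant_bits, overhead=1.2):
    """Rough VRAM needed for a quantized model.

    params_billion: parameter count in billions (e.g. 7 for a 7B model)
    quant_bits:     bits per weight (4 for Q4, 8 for Q8, 16 for FP16)
    overhead:       assumed multiplier for KV cache and runtime buffers
    """
    weights_gb = params_billion * quant_bits / 8  # 1B params at 8-bit = 1 GB
    return round(weights_gb * overhead, 1)

print(vram_estimate_gb(7, 4))    # 7B at Q4  -> 4.2 GB
print(vram_estimate_gb(7, 16))   # 7B at FP16 -> 16.8 GB
print(vram_estimate_gb(70, 4))   # 70B at Q4 -> 42.0 GB
```

The comparison explains why quantization matters: a 7B model at FP16 needs roughly four times the VRAM of the same model at Q4, and a 70B model at any quantization is out of reach for a typical 24 GB consumer GPU.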