Guides covering common problems, patterns, and production issues in Ollama.
Ollama lets you run open-source LLMs locally on your own hardware — Mac, Linux, or Windows with GPU. It provides a simple CLI and OpenAI-compatible REST API, making it straightforward to plug local models (Llama, Mistral, Qwen, Gemma, and 100+ others) into any agent framework.
Ollama exposes an OpenAI-compatible REST API on localhost:11434. Most AI frameworks have an OpenAI client built in — pointing it at Ollama instead of api.openai.com routes all inference to your local machine.
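As a minimal sketch of what "pointing a client at Ollama" means in practice: the request below targets Ollama's documented `/v1/chat/completions` endpoint using only the Python standard library (no OpenAI SDK required). The model name `llama3.2` is just an example; any pulled model works.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint

def chat_request(model, messages, base=OLLAMA_BASE):
    """Build an OpenAI-style chat-completions request against a local Ollama server."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Ollama ignores the API key, but OpenAI-style clients expect one
            "Authorization": "Bearer ollama",
        },
        method="POST",
    )

req = chat_request("llama3.2", [{"role": "user", "content": "Hello"}])
print(req.full_url)  # http://localhost:11434/v1/chat/completions
# urllib.request.urlopen(req) would send it, assuming Ollama is running locally
```

With an OpenAI SDK, the same redirection is a one-line change: set the client's base URL to `http://localhost:11434/v1` and everything else stays the same.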
Ollama runs model inference on GPU (or falls back to CPU). VRAM is the binding constraint: a model that does not fit in VRAM either spills to CPU (10-50x slower) or crashes. The rule: the model's weights plus KV cache must fit in available VRAM.
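That sizing rule can be turned into a back-of-envelope calculator. Weight memory is exact arithmetic (parameter count times bits per weight); the 20% overhead factor for KV cache, activations, and runtime buffers is an assumption for illustration, not an Ollama-specified constant.

```python
def vram_estimate_gb(params_billion, quant_bits, overhead=1.2):
    """Rough VRAM needed for a quantized model.

    params_billion: parameter count in billions (e.g. 7 for a 7B model)
    quant_bits:     bits per weight (4 for Q4, 8 for Q8, 16 for FP16)
    overhead:       assumed multiplier for KV cache and runtime buffers
    """
    weights_gb = params_billion * quant_bits / 8  # 1B params at 8-bit = 1 GB
    return round(weights_gb * overhead, 1)

print(vram_estimate_gb(7, 4))    # 7B at Q4  -> 4.2 GB
print(vram_estimate_gb(7, 16))   # 7B at FP16 -> 16.8 GB
print(vram_estimate_gb(70, 4))   # 70B at Q4 -> 42.0 GB
```

The comparison explains why quantization matters: a 7B model at FP16 needs roughly four times the VRAM of the same model at Q4, and a 70B model at any quantization is out of reach for a typical 24 GB consumer GPU.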