Ollama GPU

Run open-weight LLMs (Llama 70B, Qwen 72B, Mistral, etc.) on GPU. OpenAI-compatible API included. Pull models on demand.
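Once deployed, models can be pulled and queried over plain HTTP. A minimal sketch using Ollama's pull endpoint and its OpenAI-compatible chat endpoint, assuming the service is reachable at localhost on Ollama's default port 11434 and using `llama3:70b` as an illustrative model name:

```shell
# Pull a model on demand (model name is illustrative)
curl http://localhost:11434/api/pull -d '{"name": "llama3:70b"}'

# Chat via the OpenAI-compatible endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:70b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Because the API is OpenAI-compatible, existing OpenAI client libraries can also be pointed at the service by overriding their base URL.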

Category: AI / ML
Tags: ai, llm, inference, gpu, ollama, openai, ollama-gpu
Source: https://github.com/ollama/ollama
Ready to deploy
Docker image: ollama/ollama:latest
Resources
4 CPU
16Gi RAM
10Gi Disk
Exposed ports
11434 (public)
Environment variables
OLLAMA_HOST (required)
Listen address (0.0.0.0 for all interfaces)
OLLAMA_MODELS (required)
Directory for downloaded model weights (persistent)
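A local equivalent of this template can be sketched with Docker, assuming the NVIDIA Container Toolkit is installed on the host; the `/models` path and `ollama-models` volume name are illustrative choices for backing OLLAMA_MODELS:

```shell
# Run Ollama with GPU access; persist model weights in a named volume
docker run -d --gpus=all \
  -e OLLAMA_HOST=0.0.0.0 \
  -e OLLAMA_MODELS=/models \
  -v ollama-models:/models \
  -p 11434:11434 \
  --name ollama ollama/ollama:latest
```

Mounting a volume at the OLLAMA_MODELS path means pulled weights survive container restarts and image upgrades.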