Run any open LLM (Llama 70B, Qwen 72B, Mistral, etc.) on GPU. OpenAI-compatible API included. Pull models on demand.