We should add support for NVIDIA GPU acceleration in local LLM inference, letting users with NVIDIA hardware leverage CUDA for faster processing with local models instead of relying solely on remote APIs like OpenAI.
The integration itself stays lightweight: optional local LLM usage is enabled via environment variables pointing at a local OpenAI-compatible server (e.g., Ollama, LM Studio, or similar), and NVIDIA GPUs are then used implicitly whenever that server is configured with CUDA support. The application never has to touch CUDA directly.
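A minimal sketch of what this could look like, assuming an OpenAI-compatible local endpoint. The variable names `LOCAL_LLM_BASE_URL` and `LOCAL_LLM_MODEL` are hypothetical placeholders for illustration; the actual names (and the fallback model) would be decided in the implementation:

```python
import os
from openai import OpenAI

# Hypothetical env var names, for illustration only. Ollama's
# OpenAI-compatible endpoint defaults to http://localhost:11434/v1,
# LM Studio's to http://localhost:1234/v1.
base_url = os.environ.get("LOCAL_LLM_BASE_URL")
model = os.environ.get("LOCAL_LLM_MODEL", "llama3")

if base_url:
    # Local server: the API key is required by the client but ignored
    # by Ollama/LM Studio, so any placeholder works.
    client = OpenAI(base_url=base_url, api_key="local")
else:
    # Default path: remote OpenAI API, authenticated via OPENAI_API_KEY.
    client = OpenAI()
    model = "gpt-4o-mini"

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Hello from a local model!"}],
)
print(response.choices[0].message.content)
```

The point of this shape is that GPU acceleration is entirely the local server's concern: if Ollama or LM Studio was set up with CUDA support, requests to `base_url` run on the NVIDIA GPU with no CUDA-specific code on our side.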