OpenAI-compatible providers
Configure OpenAI-compatible LLM providers in kgateway, such as Mistral, DeepSeek, or any other provider that implements the OpenAI API format.
Overview
Many LLM providers offer APIs that are compatible with OpenAI's API format. You can configure these providers in agentgateway by using the `openai` provider type with custom `host`, `port`, `path`, and `authHeader` overrides.
Note that when you specify a custom host override, agentgateway requires explicit TLS configuration for HTTPS endpoints, such as the `policies.tls` setting in the examples that follow. This differs from well-known providers such as OpenAI, where TLS is automatically enabled when you use the default host.
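The way the `host`, `port`, and `path` settings compose into the upstream endpoint can be sketched as a simple URL composition. The helper below is an illustration of the documented behavior only (not agentgateway source code); it assumes HTTPS, as all examples in this guide use port 443 with TLS.

```python
# Illustration only (not agentgateway code): how the host, port, and path
# settings compose into the upstream endpoint URL. Per this guide, `path`
# defaults to /v1/chat/completions when omitted.
def upstream_url(host, port=443, path=None):
    return f"https://{host}:{port}{path or '/v1/chat/completions'}"

print(upstream_url("api.mistral.ai"))
# https://api.mistral.ai:443/v1/chat/completions
print(upstream_url("api.groq.com", 443, "/openai/v1/chat/completions"))
# https://api.groq.com:443/openai/v1/chat/completions
```

Providers such as Groq and Fireworks AI serve the OpenAI-compatible API under a non-default prefix, which is why their examples below set `path` explicitly.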
Before you begin
- Install and set up an agentgateway proxy.
- Set up access to an OpenAI-compatible provider.
Review the following examples for common OpenAI-compatible provider endpoints:
Mistral AI example
Set up OpenAI-compatible provider access to Mistral AI models.
1. Get a Mistral AI API key.

2. Save the API key in an environment variable.

   ```sh
   export MISTRAL_API_KEY=<insert your API key>
   ```

3. Create a Kubernetes secret to store your Mistral AI API key.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Secret
   metadata:
     name: mistral-secret
     namespace: agentgateway-system
   type: Opaque
   stringData:
     Authorization: $MISTRAL_API_KEY
   EOF
   ```

4. Create an AgentgatewayBackend resource to configure your LLM provider and reference the AI API key secret that you created earlier.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: agentgateway.dev/v1alpha1
   kind: AgentgatewayBackend
   metadata:
     name: mistral
     namespace: agentgateway-system
   spec:
     ai:
       provider:
         openai:
           model: mistral-medium-2505
           host: api.mistral.ai
           port: 443
           path: "/v1/chat/completions"
     policies:
       auth:
         secretRef:
           name: mistral-secret
       tls:
         sni: api.mistral.ai
   EOF
   ```

   Review the following table to understand this configuration. For more information, see the API reference.

   | Setting | Description |
   | ------- | ----------- |
   | `ai.provider.openai` | Define the OpenAI-compatible provider. |
   | `openai.model` | The model to use, such as `mistral-medium-2505`. |
   | `openai.host` | Required: The hostname of the OpenAI-compatible provider, such as `api.mistral.ai`. |
   | `openai.port` | Required: The port number (typically `443` for HTTPS). Both `host` and `port` must be set together. |
   | `openai.path` | Optional: Override the API path. Defaults to `/v1/chat/completions` if not specified. |
   | `policies.auth` | Configure the authentication token for the OpenAI API. The example refers to the secret that you previously created. |
   | `policies.tls.sni` | The hostname for which to validate the server certificate (must match the `host` value). |

5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend. The following example sets up a route on the `/mistral` path to the AgentgatewayBackend that you previously created. The `URLRewrite` filter rewrites the hostname to `api.mistral.ai`, and agentgateway rewrites the request path to the chat completion endpoint that you configured in the AgentgatewayBackend, `/v1/chat/completions`.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: mistral
     namespace: agentgateway-system
   spec:
     parentRefs:
     - name: agentgateway-proxy
       namespace: agentgateway-system
     rules:
     - matches:
       - path:
           type: PathPrefix
           value: /mistral
       filters:
       - type: URLRewrite
         urlRewrite:
           hostname: api.mistral.ai
       backendRefs:
       - name: mistral
         namespace: agentgateway-system
         group: agentgateway.dev
         kind: AgentgatewayBackend
   EOF
   ```

6. Send a request to the LLM provider API. Verify that the request succeeds and that you get back a response from the chat completion API.

   ```sh
   curl "$INGRESS_GW_ADDRESS/mistral" -H content-type:application/json -d '{
     "model": "",
     "messages": [
       {
         "role": "system",
         "content": "You are a helpful assistant."
       },
       {
         "role": "user",
         "content": "Write a short haiku about artificial intelligence."
       }
     ]
   }' | jq
   ```

   Example output:

   ```json
   {
     "model": "mistral-medium-2505",
     "usage": {
       "prompt_tokens": 20,
       "completion_tokens": 18,
       "total_tokens": 38
     },
     "choices": [
       {
         "message": {
           "content": "Silent circuits hum,\nLearning echoes through the void,\nWisdom without warmth.",
           "role": "assistant",
           "tool_calls": null
         },
         "index": 0,
         "finish_reason": "stop"
       }
     ],
     "id": "d05ef3973085435a8db8b51b580eeef8",
     "created": 1764614501,
     "object": "chat.completion"
   }
   ```
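Because the gateway returns the standard OpenAI chat completion shape, clients can process the response with plain JSON handling. A minimal standard-library sketch against the example output above, extracting the message and sanity-checking the reported token usage:

```python
import json

# Parse the example response above (standard OpenAI chat completion format).
raw = """
{
  "model": "mistral-medium-2505",
  "usage": {"prompt_tokens": 20, "completion_tokens": 18, "total_tokens": 38},
  "choices": [
    {
      "message": {
        "content": "Silent circuits hum,\\nLearning echoes through the void,\\nWisdom without warmth.",
        "role": "assistant",
        "tool_calls": null
      },
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "id": "d05ef3973085435a8db8b51b580eeef8",
  "created": 1764614501,
  "object": "chat.completion"
}
"""
resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]
usage = resp["usage"]

# The provider's token accounting should add up.
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
print(answer)
```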
DeepSeek example
Set up OpenAI-compatible provider access to DeepSeek models.
1. Get a DeepSeek API key.

2. Save the API key in an environment variable.

   ```sh
   export DEEPSEEK_API_KEY=<insert your API key>
   ```

3. Create a Kubernetes secret to store your DeepSeek API key.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Secret
   metadata:
     name: deepseek-secret
     namespace: agentgateway-system
   type: Opaque
   stringData:
     Authorization: $DEEPSEEK_API_KEY
   EOF
   ```

4. Create an AgentgatewayBackend resource to configure your LLM provider and reference the AI API key secret that you created earlier.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: agentgateway.dev/v1alpha1
   kind: AgentgatewayBackend
   metadata:
     name: deepseek
     namespace: agentgateway-system
   spec:
     ai:
       provider:
         openai:
           model: deepseek-chat
           host: api.deepseek.com
           port: 443
           path: "/v1/chat/completions"
     policies:
       auth:
         secretRef:
           name: deepseek-secret
       tls:
         sni: api.deepseek.com
   EOF
   ```

5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend. Note that agentgateway automatically rewrites the endpoint to the OpenAI chat completion endpoint of the LLM provider for you, based on the LLM provider that you set up in the AgentgatewayBackend resource.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: deepseek
     namespace: agentgateway-system
   spec:
     parentRefs:
     - name: agentgateway-proxy
       namespace: agentgateway-system
     rules:
     - matches:
       - path:
           type: PathPrefix
           value: /deepseek
       backendRefs:
       - name: deepseek
         namespace: agentgateway-system
         group: agentgateway.dev
         kind: AgentgatewayBackend
   EOF
   ```

6. Send a request to the LLM provider API. Verify that the request succeeds and that you get back a response from the chat completion API.

   ```sh
   curl "$INGRESS_GW_ADDRESS/deepseek" -H content-type:application/json -d '{
     "model": "",
     "messages": [
       {
         "role": "system",
         "content": "You are a helpful assistant."
       },
       {
         "role": "user",
         "content": "Write a short haiku about artificial intelligence."
       }
     ]
   }' | jq
   ```

   Example output:

   ```json
   {
     "id": "chatcmpl-deepseek-12345",
     "object": "chat.completion",
     "created": 1727967462,
     "model": "deepseek-chat",
     "choices": [
       {
         "index": 0,
         "message": {
           "role": "assistant",
           "content": "Neural networks learn,\nPatterns emerge from data streams,\nMind in silicon grows."
         },
         "finish_reason": "stop"
       }
     ],
     "usage": {
       "prompt_tokens": 20,
       "completion_tokens": 17,
       "total_tokens": 37
     }
   }
   ```
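Since every backend behind the gateway speaks the same OpenAI request format, a client changes only the route in the URL, never the payload. The following standard-library sketch builds such a request; the gateway address and route name are assumptions taken from the examples in this guide, and the empty model string defers to the model configured in the AgentgatewayBackend, as in the curl examples.

```python
import json
import urllib.request

def build_chat_request(gateway, route, prompt):
    """Build an OpenAI-format chat request for a gateway route.

    The payload shape is identical for every provider behind the gateway;
    only the route in the URL differs. Send it with urllib.request.urlopen.
    """
    body = json.dumps({
        "model": "",  # empty: the AgentgatewayBackend's configured model is used
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{gateway}/{route}",
        data=body,
        headers={"content-type": "application/json"},
    )

# Example: target the /deepseek route from a local port-forward.
req = build_chat_request("http://localhost:8080", "deepseek", "Write a short haiku.")
print(req.full_url)
# http://localhost:8080/deepseek
```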
Groq example
Set up OpenAI-compatible provider access to Groq for fast inference.
1. Get a Groq API key.

2. Save the API key in an environment variable.

   ```sh
   export GROQ_API_KEY=<insert your API key>
   ```

3. Create a Kubernetes secret to store your Groq API key.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Secret
   metadata:
     name: groq-secret
     namespace: agentgateway-system
   type: Opaque
   stringData:
     Authorization: $GROQ_API_KEY
   EOF
   ```

4. Create an AgentgatewayBackend resource to configure your LLM provider.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: agentgateway.dev/v1alpha1
   kind: AgentgatewayBackend
   metadata:
     name: groq
     namespace: agentgateway-system
   spec:
     ai:
       provider:
         openai:
           model: llama-3.3-70b-versatile
           host: api.groq.com
           port: 443
           path: "/openai/v1/chat/completions"
     policies:
       auth:
         secretRef:
           name: groq-secret
       tls:
         sni: api.groq.com
   EOF
   ```

5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: groq
     namespace: agentgateway-system
   spec:
     parentRefs:
     - name: agentgateway-proxy
       namespace: agentgateway-system
     rules:
     - matches:
       - path:
           type: PathPrefix
           value: /groq
       backendRefs:
       - name: groq
         namespace: agentgateway-system
         group: agentgateway.dev
         kind: AgentgatewayBackend
   EOF
   ```

6. Send a request to verify the setup.

   ```sh
   curl "$INGRESS_GW_ADDRESS/groq" -H content-type:application/json -d '{
     "model": "llama-3.3-70b-versatile",
     "messages": [
       {
         "role": "user",
         "content": "Explain quantum computing in one sentence."
       }
     ]
   }' | jq
   ```
Together AI example
Set up OpenAI-compatible provider access to Together AI for open-source models.
1. Get a Together AI API key.

2. Save the API key in an environment variable.

   ```sh
   export TOGETHER_API_KEY=<insert your API key>
   ```

3. Create a Kubernetes secret to store your Together AI API key.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Secret
   metadata:
     name: together-secret
     namespace: agentgateway-system
   type: Opaque
   stringData:
     Authorization: $TOGETHER_API_KEY
   EOF
   ```

4. Create an AgentgatewayBackend resource to configure your LLM provider. Because `path` is not set, the default `/v1/chat/completions` path is used.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: agentgateway.dev/v1alpha1
   kind: AgentgatewayBackend
   metadata:
     name: together
     namespace: agentgateway-system
   spec:
     ai:
       provider:
         openai:
           model: meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo
           host: api.together.xyz
           port: 443
     policies:
       auth:
         secretRef:
           name: together-secret
       tls:
         sni: api.together.xyz
   EOF
   ```

5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: together
     namespace: agentgateway-system
   spec:
     parentRefs:
     - name: agentgateway-proxy
       namespace: agentgateway-system
     rules:
     - matches:
       - path:
           type: PathPrefix
           value: /together
       backendRefs:
       - name: together
         namespace: agentgateway-system
         group: agentgateway.dev
         kind: AgentgatewayBackend
   EOF
   ```

6. Send a request to verify the setup.

   ```sh
   curl "$INGRESS_GW_ADDRESS/together" -H content-type:application/json -d '{
     "model": "meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
     "messages": [
       {
         "role": "user",
         "content": "What are the benefits of open-source AI models?"
       }
     ]
   }' | jq
   ```
Perplexity example
Set up OpenAI-compatible provider access to Perplexity for online search-augmented responses.
1. Get a Perplexity API key.

2. Save the API key in an environment variable.

   ```sh
   export PERPLEXITY_API_KEY=<insert your API key>
   ```

3. Create a Kubernetes secret to store your Perplexity API key.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Secret
   metadata:
     name: perplexity-secret
     namespace: agentgateway-system
   type: Opaque
   stringData:
     Authorization: $PERPLEXITY_API_KEY
   EOF
   ```

4. Create an AgentgatewayBackend resource to configure your LLM provider.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: agentgateway.dev/v1alpha1
   kind: AgentgatewayBackend
   metadata:
     name: perplexity
     namespace: agentgateway-system
   spec:
     ai:
       provider:
         openai:
           model: llama-3.1-sonar-large-128k-online
           host: api.perplexity.ai
           port: 443
     policies:
       auth:
         secretRef:
           name: perplexity-secret
       tls:
         sni: api.perplexity.ai
   EOF
   ```

5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: perplexity
     namespace: agentgateway-system
   spec:
     parentRefs:
     - name: agentgateway-proxy
       namespace: agentgateway-system
     rules:
     - matches:
       - path:
           type: PathPrefix
           value: /perplexity
       backendRefs:
       - name: perplexity
         namespace: agentgateway-system
         group: agentgateway.dev
         kind: AgentgatewayBackend
   EOF
   ```

6. Send a request to verify the setup.

   ```sh
   curl "$INGRESS_GW_ADDRESS/perplexity" -H content-type:application/json -d '{
     "model": "llama-3.1-sonar-large-128k-online",
     "messages": [
       {
         "role": "user",
         "content": "What are the latest developments in AI?"
       }
     ]
   }' | jq
   ```
Fireworks AI example
Set up OpenAI-compatible provider access to Fireworks AI for fast inference on open-source models.
1. Get a Fireworks AI API key.

2. Save the API key in an environment variable.

   ```sh
   export FIREWORKS_API_KEY=<insert your API key>
   ```

3. Create a Kubernetes secret to store your Fireworks AI API key.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: v1
   kind: Secret
   metadata:
     name: fireworks-secret
     namespace: agentgateway-system
   type: Opaque
   stringData:
     Authorization: $FIREWORKS_API_KEY
   EOF
   ```

4. Create an AgentgatewayBackend resource to configure your LLM provider.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: agentgateway.dev/v1alpha1
   kind: AgentgatewayBackend
   metadata:
     name: fireworks
     namespace: agentgateway-system
   spec:
     ai:
       provider:
         openai:
           model: accounts/fireworks/models/llama-v3p1-70b-instruct
           host: api.fireworks.ai
           port: 443
           path: "/inference/v1/chat/completions"
     policies:
       auth:
         secretRef:
           name: fireworks-secret
       tls:
         sni: api.fireworks.ai
   EOF
   ```

5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend.

   ```sh
   kubectl apply -f- <<EOF
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: fireworks
     namespace: agentgateway-system
   spec:
     parentRefs:
     - name: agentgateway-proxy
       namespace: agentgateway-system
     rules:
     - matches:
       - path:
           type: PathPrefix
           value: /fireworks
       backendRefs:
       - name: fireworks
         namespace: agentgateway-system
         group: agentgateway.dev
         kind: AgentgatewayBackend
   EOF
   ```

6. Send a request to verify the setup.

   ```sh
   curl "$INGRESS_GW_ADDRESS/fireworks" -H content-type:application/json -d '{
     "model": "accounts/fireworks/models/llama-v3p1-70b-instruct",
     "messages": [
       {
         "role": "user",
         "content": "Explain the advantages of running LLMs on optimized inference engines."
       }
     ]
   }' | jq
   ```
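As a recap, the providers in this guide differ only in their hostnames and, in some cases, a non-default API path; where an AgentgatewayBackend omits `path`, the default `/v1/chat/completions` applies. The mapping below is an illustration assembled from the examples above, not gateway code:

```python
# Recap of the upstream endpoints configured in this guide. A path of None
# means the AgentgatewayBackend omits `path` and relies on the default.
DEFAULT_PATH = "/v1/chat/completions"

backends = {
    "mistral": ("api.mistral.ai", "/v1/chat/completions"),
    "deepseek": ("api.deepseek.com", "/v1/chat/completions"),
    "groq": ("api.groq.com", "/openai/v1/chat/completions"),
    "together": ("api.together.xyz", None),
    "perplexity": ("api.perplexity.ai", None),
    "fireworks": ("api.fireworks.ai", "/inference/v1/chat/completions"),
}

endpoints = {
    name: f"https://{host}{path or DEFAULT_PATH}"
    for name, (host, path) in backends.items()
}
print(endpoints["together"])
# https://api.together.xyz/v1/chat/completions
```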
Next steps
- Want to use endpoints other than chat completions, such as embeddings or models? Check out the multiple endpoints guide.
- Explore other guides for LLM consumption, such as function calling, model failover, and prompt guards.