# Mock LLM with httpbun
Install httpbun for a mock LLM API endpoint that is compatible with the OpenAI API for chat completions.
## Why httpbun?
httpbun is the LLM-testing equivalent of httpbin. Testing AI gateway policies against a real LLM costs money, requires API keys, and introduces network latency. Instead, you can use httpbun as a drop-in OpenAI-compatible LLM backend in Kubernetes, then route traffic to it through agentgateway.

After you deploy httpbun, you can send standard `POST /v1/chat/completions` requests through agentgateway (including streaming), apply AgentgatewayPolicy resources for rate limiting and auth, and build out the rest of your AI gateway configuration without touching a real LLM.

The `/llm` endpoint implements the OpenAI chat completions API with the same request format, the same response structure, and the same streaming SSE protocol. You can customize the mock response content via an `httpbun` field in the request body, which makes it useful for testing both the happy path and error cases.
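As a quick illustration of that streaming protocol, the following Python sketch reassembles the assistant message from raw SSE chunk lines. It assumes only the standard OpenAI framing (`data: {...}` chunks terminated by `data: [DONE]`); the sample chunks below are abbreviated, since real httpbun chunks also carry `id`, `model`, and `created` fields.

```python
import json

def collect_sse_content(lines):
    """Reassemble the assistant message from OpenAI-style SSE chunk lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and keep-alives
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))  # the stop chunk has an empty delta
    return "".join(parts)

# Abbreviated chunks in the shape httpbun streams back:
sample = [
    'data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null,"index":0}]}',
    'data: {"choices":[{"delta":{"content":" world"},"finish_reason":null,"index":0}]}',
    'data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}]}',
    "data: [DONE]",
]
print(collect_sse_content(sample))  # Hello world
```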
## Example OpenAI-compatible requests
```sh
# Non-streaming
curl -X POST https://httpbun.com/llm/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'

# Streaming
curl -N https://httpbun.com/llm/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}], "stream": true}'

# Custom response body
curl -X POST https://httpbun.com/llm/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [], "httpbun": {"content": "I am a mock LLM!"}}'
```

## Before you begin
Install and set up an agentgateway proxy.

## Step 1: Deploy httpbun in Kubernetes
httpbun ships as a single Docker image. In the following manifest, the `HTTPBUN_BIND` environment variable configures the server to listen on port 3090 inside the container.
```sh
kubectl apply -f- <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbun
  namespace: default
  labels:
    app: httpbun
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbun
  template:
    metadata:
      labels:
        app: httpbun
    spec:
      containers:
      - name: httpbun
        image: sharat87/httpbun
        env:
        - name: HTTPBUN_BIND
          value: "0.0.0.0:3090"
        ports:
        - containerPort: 3090
---
apiVersion: v1
kind: Service
metadata:
  name: httpbun
  namespace: default
  labels:
    app: httpbun
spec:
  selector:
    app: httpbun
  ports:
  - protocol: TCP
    port: 3090
    targetPort: 3090
  type: ClusterIP
EOF
```

Verify that the pod is running:
```sh
kubectl get pods -l app=httpbun
```

Example output:

```
NAME                      READY   STATUS    RESTARTS   AGE
httpbun-7d9f6b8c4-v8w2p   1/1     Running   0          20s
```

## Step 2: Create the backend
Create an AgentgatewayBackend to configure httpbun as an LLM provider. You set the `openai` provider type, because httpbun implements the OpenAI-compatible API. Then, override the host, port, and path to point at httpbun’s `/llm/chat/completions` endpoint.
Because httpbun does not require an API key, you can skip the `policies.auth` block in the following example. This also means that you don’t need to manage a Kubernetes Secret: one less prerequisite to set up!

```sh
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: httpbun-llm
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: gpt-4
        host: httpbun.default.svc.cluster.local
        port: 3090
        path: "/llm/chat/completions"
EOF
```

Review the following table to understand this configuration.
| Field | Value | Description |
|---|---|---|
| `ai.provider.host` | `httpbun.default.svc.cluster.local` | Points to the in-cluster httpbun Service. Because the host is an in-cluster DNS name (not a public HTTPS endpoint), no TLS configuration is required. |
| `ai.provider.port` | `3090` | Matches the httpbun container port. |
| `ai.provider.path` | `/llm/chat/completions` | httpbun’s LLM endpoint (not the default `/v1/chat/completions`). |
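To make the mapping concrete, this small Python sketch (a hypothetical helper, not part of agentgateway) composes the upstream URL from the three provider fields, next to the public path that clients call through the gateway:

```python
def upstream_url(host: str, port: int, path: str) -> str:
    """Compose the backend URL from the AgentgatewayBackend provider fields."""
    return f"http://{host}:{port}{path}"

# Clients call the gateway on the standard OpenAI path...
public_path = "/v1/chat/completions"
# ...and the gateway forwards those requests to httpbun's LLM endpoint:
print(upstream_url("httpbun.default.svc.cluster.local", 3090, "/llm/chat/completions"))
# http://httpbun.default.svc.cluster.local:3090/llm/chat/completions
```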
## Step 3: Create the HTTPRoute
Route incoming traffic from the gateway to the AgentgatewayBackend. The route exposes the standard OpenAI-compatible path `/v1/chat/completions`, so any OpenAI SDK or client can point at the gateway without modification.
1. Create the HTTPRoute.

```sh
kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbun-llm
  namespace: agentgateway-system
spec:
  parentRefs:
  - name: agentgateway-proxy
    namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/chat/completions
    backendRefs:
    - name: httpbun-llm
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
EOF
```

2. Verify that the route was accepted. Look for `Accepted: True` and `ResolvedRefs: True` in the status. If `ResolvedRefs` is `False`, double-check that the AgentgatewayBackend name and namespace in the route match exactly what you created in Step 2.

```sh
kubectl describe httproute httpbun-llm -n agentgateway-system
```
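Because the route exposes the standard path, any OpenAI-compatible client can now target the gateway. The following sketch uses only the Python standard library; the base URL assumes the port-forward that Step 4 sets up (adjust it to your gateway address), and `build_payload`/`chat_completion` are illustrative helper names, not part of any SDK.

```python
import json
import urllib.request

def build_payload(messages, model="gpt-4", stream=False, mock_content=None):
    """Build an OpenAI-style chat completion body; the optional
    "httpbun" field tells the mock backend exactly what to answer."""
    payload = {"model": model, "messages": messages, "stream": stream}
    if mock_content is not None:
        payload["httpbun"] = {"content": mock_content}
    return payload

def chat_completion(base_url, messages, **kwargs):
    """POST the payload to <base_url>/v1/chat/completions through the gateway."""
    body = json.dumps(build_payload(messages, **kwargs)).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example, once the gateway is reachable (for instance via the Step 4 port-forward):
# reply = chat_completion("http://localhost:8080",
#                         [{"role": "user", "content": "Hello!"}],
#                         mock_content="I am a mock LLM!")
# print(reply["choices"][0]["message"]["content"])
```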
## Step 4: Verify the connection
Get the external address of the gateway and save it in an environment variable.

```sh
export INGRESS_GW_ADDRESS=$(kubectl get svc -n agentgateway-system agentgateway-proxy \
  -o=jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
echo $INGRESS_GW_ADDRESS
```

Alternatively, if your environment does not provision an external load balancer, port-forward the gateway proxy pod on port 8080.

```sh
kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 8080:80
```

### Non-streaming request
Send a standard chat completion request. If your gateway has an external address:

```sh
curl -s http://$INGRESS_GW_ADDRESS/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Explain agentgateway in one sentence."}
    ]
  }' | jq
```

Or, if you set up the port-forward:

```sh
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Explain agentgateway in one sentence."}
    ]
  }' | jq
```

Example output:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1748000000,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This is a mock response from httpbun."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}
```

### Streaming request
If your gateway has an external address:

```sh
curl -N http://$INGRESS_GW_ADDRESS/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Count to three."}
    ],
    "stream": true
  }'
```

Or, if you set up the port-forward:

```sh
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Count to three."}
    ],
    "stream": true
  }'
```

Notice the stream of data: chunks in server-sent event format, followed by `data: [DONE]`. This output matches the format of an OpenAI streaming response.
```
...
data: {"choices":[{"delta":{"content":"text."},"finish_reason":null,"index":0}],"created":1771885787,"id":"chatcmpl-e2e54892bd932edb9549b442","model":"gpt-4","object":"chat.completion.chunk"}
data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}],"created":1771885787,"id":"chatcmpl-e2e54892bd932edb9549b442","model":"gpt-4","object":"chat.completion.chunk"}
data: [DONE]
```

### Custom response content
Control exactly what the mock LLM returns by including the `httpbun` field in the request body. This is useful for writing deterministic integration tests against policies: you control the response, so you can verify that your gateway transforms, rate limits, or rejects it correctly.
If your gateway has an external address:

```sh
curl -s http://$INGRESS_GW_ADDRESS/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "httpbun": {"content": "Gateway is working perfectly."}
  }' | jq '.choices[0].message.content'
```

Or, if you set up the port-forward:

```sh
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "httpbun": {"content": "Gateway is working perfectly."}
  }' | jq '.choices[0].message.content'
```

Example output:
```
"Gateway is working perfectly."
```

## Cleanup
You can remove the resources that you created in this guide.

```sh
kubectl delete httproute httpbun-llm -n agentgateway-system
kubectl delete agentgatewaybackend httpbun-llm -n agentgateway-system
kubectl delete deployment httpbun -n default
kubectl delete service httpbun -n default
```