Create a new LLM
Creates a new LLM configuration for text generation services. An LLM represents a connection to a language model API service (such as OpenAI or vLLM) and includes all the configuration needed to use it for text generation.
DUPLICATE DETECTION: Returns HTTP 409 Conflict (ALREADY_EXISTS) if another LLM exists with identical {ownerId, providerType, endpointUrl, apiPath, modelIdentifier, credentialsFingerprint} after URL canonicalization. Uniqueness is enforced per-owner. Credentials are hashed (SHA-256) for uniqueness while remaining encrypted. The apiPath field defaults to '/chat/completions' if omitted. Requires CREATE_LLM_OWN permission (or CREATE_LLM_ANY for admin users).
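To make the uniqueness rule concrete, the following Python sketch builds a duplicate-detection key from the six fields listed above. It is an illustration only. The canonicalization rules in `canonicalize_url` (lowercase scheme and host, no trailing slash) and the way credentials are serialized before hashing are assumptions, not the server's documented behavior, and all helper names are hypothetical.

```python
import hashlib
import json
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

DEFAULT_API_PATH = "/chat/completions"  # default stated in the description above


def canonicalize_url(url: str) -> str:
    """Assumed canonical form: lowercase scheme and host, no trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def credentials_fingerprint(credentials: Optional[dict]) -> str:
    """SHA-256 over a stable JSON serialization (the serialization is an assumption)."""
    serialized = json.dumps(credentials or {}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def duplicate_key(owner_id: str, config: dict) -> tuple:
    """Build the per-owner tuple on which two LLM configurations would be compared."""
    return (
        owner_id,
        config["providerType"],
        canonicalize_url(config["endpointUrl"]),
        config.get("apiPath") or DEFAULT_API_PATH,  # omitted apiPath falls back to the default
        config["modelIdentifier"],
        credentials_fingerprint(config.get("credentials")),
    )
```

Under these assumptions, two configurations from the same owner that differ only in URL casing, a trailing slash, or an omitted apiPath produce the same key and would be treated as duplicates, while the same configuration under a different owner would not.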
In: header
LLM configuration details

displayName
User-facing name of the LLM.
Example: "GPT-4 Turbo"
Length: 1 <= length <= 255

description
Description of the LLM.
Example: "OpenAI's GPT-4 Turbo model for chat completions"

providerType
Type of LLM provider.
Example: "OPENAI"
Allowed values: "OPENAI" | "LITELLM_PROXY" | "OPEN_ROUTER" | "VLLM" | "OLLAMA" | "LLAMA_CPP" | "CUSTOM_OPENAI_COMPATIBLE"

endpointUrl
API endpoint base URL (OpenAI-compatible base, typically ends with /v1).
Example: "https://api.openai.com/v1"

apiPath
API path for the chat/completions request (defaults to /chat/completions if not provided).
Example: "/chat/completions"

modelIdentifier
Model identifier.
Example: "gpt-4-turbo-preview"

supportedModalities
Supported content modalities (defaults to TEXT if not provided).
Example: ["TEXT"]

credentials
Structured credential payload describing how to authenticate with the provider. Omit for deployments that do not require credentials.
Example: {"kind":"CREDENTIAL_KIND_API_KEY","apiKey":{"inlineSecret":"sk-llm-demo-key"}}

credentials, API key variant
Structured credential payload describing how GoodMem should authenticate with an upstream provider.
kind: Credential strategy, fixed to CREDENTIAL_KIND_API_KEY for this variant. Value: "CREDENTIAL_KIND_API_KEY"
apiKey: Configuration when kind is CREDENTIAL_KIND_API_KEY. Example: {"inlineSecret":"sk-1234567890abcdef","secretRef":{"uri":"vault://path/to/secret"},"headerName":"Authorization","prefix":"Bearer "}
Optional annotations to aid operators (e.g., "owner=vertex"). properties <= 20

credentials, GCP ADC variant
Structured credential payload describing how GoodMem should authenticate with an upstream provider.
kind: Credential strategy, fixed to CREDENTIAL_KIND_GCP_ADC for this variant. Value: "CREDENTIAL_KIND_GCP_ADC"
gcpAdc: Configuration when kind is CREDENTIAL_KIND_GCP_ADC. Example: {"scopes":["https://www.googleapis.com/auth/cloud-platform"],"quotaProjectId":"my-quota-project"}
Optional annotations to aid operators (e.g., "owner=vertex"). properties <= 20

labels
User-defined labels for categorization.
Example: {"environment":"production","team":"ai"}
properties <= 20

version
Version information.
Example: "1.0.0"

monitoringEndpoint
Monitoring endpoint URL.
Example: "https://monitoring.example.com/llms/status"

capabilities
LLM capabilities defining supported features and modes. Optional: the server infers capabilities from the model identifier if not provided.
Example: {"supportsChat":true,"supportsCompletion":true,"supportsFunctionCalling":true,"supportsSystemMessages":true,"supportsStreaming":true,"supportsSamplingParameters":true}

defaultSamplingParams
Default sampling parameters for generation requests.
Example: {"maxTokens":2048,"temperature":0.7,"topP":0.9,"topK":50,"frequencyPenalty":0,"presencePenalty":0,"stopSequences":["\n\n","END"]}

maxContextLength
Maximum context window size in tokens.
Example: 32768
Format: int32

clientConfig
Provider-specific client configuration as a flexible JSON structure.

ownerId
Optional owner ID. If not provided, it is derived from the authentication context. Requires CREATE_LLM_ANY permission if specified.
Format: uuid

llmId
Optional client-provided UUID for idempotent creation. If not provided, the server generates a new UUID. Returns ALREADY_EXISTS if the ID is already in use.
Format: uuid
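The constraints on the request fields can be checked on the client before sending anything. The sketch below is a minimal, hypothetical helper (`build_create_llm_payload` is not part of any GoodMem client library) that assembles a request body using the field names from this section's example request. It applies the documented length, enum, and label-count limits, and leaves optional fields out so that the server-side defaults for apiPath and supportedModalities apply. Whether the server performs exactly these checks is an assumption.

```python
from typing import Optional

# Provider types listed in the field reference above.
PROVIDER_TYPES = {
    "OPENAI", "LITELLM_PROXY", "OPEN_ROUTER", "VLLM",
    "OLLAMA", "LLAMA_CPP", "CUSTOM_OPENAI_COMPATIBLE",
}
MAX_LABELS = 20  # "properties <= 20" for labels


def build_create_llm_payload(
    display_name: str,
    provider_type: str,
    endpoint_url: str,
    model_identifier: str,
    *,
    api_path: Optional[str] = None,
    credentials: Optional[dict] = None,
    labels: Optional[dict] = None,
) -> dict:
    """Validate the documented constraints and return a JSON-ready request body."""
    if not 1 <= len(display_name) <= 255:
        raise ValueError("displayName must be between 1 and 255 characters")
    if provider_type not in PROVIDER_TYPES:
        raise ValueError(f"unknown providerType: {provider_type!r}")
    if labels is not None and len(labels) > MAX_LABELS:
        raise ValueError(f"labels may contain at most {MAX_LABELS} entries")

    payload = {
        "displayName": display_name,
        "providerType": provider_type,
        "endpointUrl": endpoint_url,
        "modelIdentifier": model_identifier,
    }
    # Optional fields are added only when supplied, so omitted ones
    # fall back to the server-side defaults described above.
    if api_path is not None:
        payload["apiPath"] = api_path
    if credentials is not None:
        payload["credentials"] = credentials
    if labels is not None:
        payload["labels"] = labels
    return payload
```

For a local deployment that needs no credentials, a call such as `build_create_llm_payload("Local Llama", "OLLAMA", "http://localhost:11434/v1", "llama3")` would return a four-field body; the endpoint URL and model name in that call are placeholders.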
curl -X POST "http://localhost:8080/v1/llms" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "GPT-4 Turbo",
    "description": "OpenAI'\''s GPT-4 Turbo model for chat completions",
    "endpointUrl": "https://api.openai.com/v1",
    "apiPath": "/chat/completions",
    "modelIdentifier": "gpt-4-turbo-preview",
    "supportedModalities": [
      "TEXT"
    ],
    "credentials": {
      "apiKey": {
        "inlineSecret": "sk-your-api-key-here",
        "secretRef": {
          "uri": "vault://path/to/secret"
        },
        "headerName": "Authorization",
        "prefix": "Bearer "
      },
      "kind": "CREDENTIAL_KIND_API_KEY"
    },
    "labels": {
      "environment": "production",
      "team": "ai"
    },
    "version": "1.0.0",
    "monitoringEndpoint": "https://monitoring.example.com/llms/status",
    "capabilities": {
      "supportsChat": true,
      "supportsCompletion": true,
      "supportsFunctionCalling": true,
      "supportsSystemMessages": true,
      "supportsStreaming": true,
      "supportsSamplingParameters": true
    },
    "defaultSamplingParams": {
      "maxTokens": 2048,
      "temperature": 0.7,
      "topP": 0.9,
      "topK": 50,
      "frequencyPenalty": 0,
      "presencePenalty": 0,
      "stopSequences": [
        "\n\n",
        "END"
      ]
    },
    "maxContextLength": 32768,
    "ownerId": "550e8400-e29b-41d4-a716-446655440000",
    "llmId": "550e8400-e29b-41d4-a716-446655440000",
    "providerType": "OPENAI"
  }'

Response Body

{
  "llm": {
    "llmId": "550e8400-e29b-41d4-a716-446655440000",
    "displayName": "GPT-4 Turbo",
    "description": "OpenAI's GPT-4 Turbo model for chat completions",
    "providerType": "OPENAI",
    "endpointUrl": "https://api.openai.com/v1",
    "apiPath": "/chat/completions",
    "modelIdentifier": "gpt-4-turbo-preview",
    "supportedModalities": [
      "TEXT"
    ],
    "credentials": {
      "apiKey": {
        "inlineSecret": "sk-1234567890abcdef",
        "secretRef": {
          "uri": "vault://path/to/secret"
        },
        "headerName": "Authorization",
        "prefix": "Bearer "
      },
      "gcpAdc": {
        "scopes": [
          "https://www.googleapis.com/auth/cloud-platform"
        ],
        "quotaProjectId": "my-quota-project"
      }
    },
    "labels": {
      "environment": "production",
      "team": "ai"
    },
    "version": "1.0.0",
    "monitoringEndpoint": "https://monitoring.example.com/llms/status",
    "capabilities": {
      "supportsChat": true,
      "supportsCompletion": true,
      "supportsFunctionCalling": true,
      "supportsSystemMessages": true,
      "supportsStreaming": true,
      "supportsSamplingParameters": true
    },
    "defaultSamplingParams": {
      "maxTokens": 2048,
      "temperature": 0.7,
      "topP": 0.9,
      "topK": 50,
      "frequencyPenalty": 0,
      "presencePenalty": 0,
      "stopSequences": [
        "\n\n",
        "END"
      ]
    },
    "maxContextLength": 32768,
    "clientConfig": {
      "property1": {},
      "property2": {}
    },
    "ownerId": "550e8400-e29b-41d4-a716-446655440000",
    "createdAt": 1617293472000,
    "updatedAt": 1617293472000,
    "createdById": "550e8400-e29b-41d4-a716-446655440000",
    "updatedById": "550e8400-e29b-41d4-a716-446655440000"
  },
  "statuses": [
    {
      "code": "PARTIAL_RESULTS",
      "message": "Some embedders were unavailable, returning partial results"
    }
  ]
}
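The same request can be issued from Python with only the standard library. The sketch below mirrors the curl example (POST to `/v1/llms` with a JSON body) and then reads the fields shown in the response body above. Several points are assumptions: the function names are hypothetical, authentication headers are passed in by the caller because the header name is not given in this section, and `createdAt` is treated as milliseconds since the Unix epoch based on the magnitude of the example value.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional


def build_request(base_url: str, payload: dict,
                  extra_headers: Optional[dict] = None) -> urllib.request.Request:
    """Prepare the POST /v1/llms request without sending it."""
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})  # e.g. authentication headers, supplied by the caller
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/llms",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def summarize_response(body: dict) -> dict:
    """Extract the new LLM's ID, creation time, and any status messages."""
    llm = body["llm"]
    return {
        "llmId": llm["llmId"],
        # Assumption: createdAt is epoch milliseconds.
        "createdAt": datetime.fromtimestamp(llm["createdAt"] / 1000, tz=timezone.utc),
        "statusMessages": [s["message"] for s in body.get("statuses", [])],
    }


def create_llm(base_url: str, payload: dict,
               extra_headers: Optional[dict] = None) -> dict:
    """Send the request and translate a 409 Conflict into a clearer error."""
    request = build_request(base_url, payload, extra_headers)
    try:
        with urllib.request.urlopen(request) as response:
            return summarize_response(json.load(response))
    except urllib.error.HTTPError as err:
        if err.code == 409:
            raise RuntimeError("an identical LLM or the supplied llmId already exists") from err
        raise
```

Against the local server used in the curl example, a call would look like `create_llm("http://localhost:8080", payload)`. Keeping `build_request` separate from `create_llm` makes it possible to inspect the outgoing request without a running server.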