goodmem llm create
Create a new LLM

Synopsis

Create a new LLM in the GoodMem service with the specified configuration.

goodmem llm create [flags]

Examples

# Create an OpenAI GPT-4 LLM with a client-provided ID
goodmem llm create \
--id "123e4567-e89b-12d3-a456-426614174000" \
--display-name "My GPT-4" \
--provider-type OPENAI \
--endpoint-url "https://api.openai.com/v1" \
--model-identifier "gpt-4o" \
--cred-api-key "sk-..." \
--supports-chat

# Create an OpenAI GPT-4 LLM (server-generated ID)
goodmem llm create \
--display-name "My GPT-4" \
--provider-type OPENAI \
--endpoint-url "https://api.openai.com/v1" \
--model-identifier "gpt-4o" \
--cred-api-key "sk-..." \
--supports-chat \
--supports-streaming \
--supports-function-calling \
--sampling-max-tokens 4096 \
--sampling-temperature 0.7

# Create a LiteLLM proxy LLM using a bearer token
# (pass the raw token; GoodMem sends it as "Authorization: Bearer <key>")
goodmem llm create \
--display-name "LiteLLM Claude" \
--provider-type LITELLM_PROXY \
--endpoint-url "https://llm-proxy.internal/v1" \
--model-identifier "anthropic/claude-3-opus" \
--cred-api-key "Bearer token" \
--supports-chat \
--supports-system-messages \
--sampling-max-tokens 2048

# Create a Vertex AI LLM using ADC credentials
goodmem llm create \
--display-name "Vertex GPT" \
--provider-type OPENAI \
--endpoint-url "https://us-central1-aiplatform.googleapis.com" \
--model-identifier "text-bison" \
--cred-gcp \
--cred-gcp-scope https://www.googleapis.com/auth/cloud-platform \
--cred-gcp-quota my-billing-project \
--supports-chat

# Create a local VLLM LLM
goodmem llm create \
--display-name "Local Llama" \
--provider-type VLLM \
--endpoint-url "http://localhost:8000" \
--model-identifier "llama3-70b" \
--supports-chat \
--supports-completion
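
# Create a local Ollama LLM (a hedged sketch: the endpoint assumes Ollama's
# default port 11434, and "llama3" is a placeholder model identifier)
goodmem llm create \
--display-name "Local Ollama Llama 3" \
--provider-type OLLAMA \
--endpoint-url "http://localhost:11434" \
--model-identifier "llama3" \
--supports-chat \
--supports-streaming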

Options

      --api-path string                       API path (defaults to /v1/chat/completions)
      --client-config string                  Provider-specific client configuration as JSON string
      --cred-api-key string                   Inline API key stored by GoodMem (sends Authorization: Bearer <key>)
      --cred-gcp                              Use Google Application Default Credentials
      --cred-gcp-quota string                 Quota project for Google ADC requests
      --cred-gcp-scope strings                Additional Google ADC OAuth scope (can be specified multiple times)
      --description string                   Description of the LLM
      --display-name string                   Display name for the LLM (required)
      --endpoint-url string                   Endpoint URL for the LLM service (required)
  -h, --help                                  help for create
      --id string                             Optional: Client-provided UUID for the LLM (16 bytes). Server generates if omitted.
  -l, --label strings                         Labels in key=value format (can be specified multiple times)
      --max-context-length int32              Maximum context length in tokens
      --modalities strings                    Supported modalities (TEXT, IMAGE, AUDIO, VIDEO) (default [TEXT])
      --model-identifier string               Model identifier (required)
      --monitoring-endpoint string            Monitoring endpoint URL
      --no-supports-chat                      LLM does not support chat/conversation mode
      --no-supports-completion                LLM does not support text completion mode
      --no-supports-function-calling          LLM does not support function calling
      --no-supports-sampling-parameters       LLM does not support sampling parameters
      --no-supports-streaming                 LLM does not support streaming responses
      --no-supports-system-messages           LLM does not support system messages
      --owner string                          Owner ID for the LLM (requires admin permissions)
      --provider-type string                  Provider type (OPENAI, LITELLM_PROXY, OPEN_ROUTER, VLLM, OLLAMA, LLAMA_CPP, CUSTOM_OPENAI_COMPATIBLE) (required)
      --sampling-frequency-penalty float32    Frequency penalty (-2.0 to 2.0)
      --sampling-max-tokens int32             Maximum number of tokens to generate
      --sampling-presence-penalty float32     Presence penalty (-2.0 to 2.0)
      --sampling-stop-sequences strings       Stop sequences (can be specified multiple times)
      --sampling-temperature float32          Sampling temperature (0.0 to 2.0)
      --sampling-top-k int32                  Top-k sampling parameter
      --sampling-top-p float32                Top-p sampling parameter (0.0 to 1.0)
      --supports-chat                         LLM supports chat/conversation mode
      --supports-completion                   LLM supports text completion mode
      --supports-function-calling             LLM supports function calling
      --supports-sampling-parameters          LLM supports sampling parameters (temperature, top_p, etc.)
      --supports-streaming                    LLM supports streaming responses
      --supports-system-messages              LLM supports system messages
      --version string                        Version of the LLM
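
The repeatable flags above compose with the capability toggles. A hedged sketch (every name, URL, key, and label value below is a placeholder; repeating --modalities is an assumption based on its strings type, mirroring the documented behavior of --label):

goodmem llm create \
--display-name "Multimodal GPT" \
--provider-type OPENAI \
--endpoint-url "https://api.openai.com/v1" \
--model-identifier "gpt-4o" \
--cred-api-key "sk-..." \
--modalities TEXT --modalities IMAGE \
--label team=ml --label env=prod \
--supports-chat \
--supports-streaming \
--no-supports-completion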

Options inherited from parent commands

      --api-key string   API key for authentication (can also be set via GOODMEM_API_KEY environment variable)
      --server string    GoodMem server address (gRPC API) (default "https://localhost:9090")
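
The API key can also come from the environment instead of the flag. A minimal sketch, assuming a reachable GoodMem server; the host, key, and model details are placeholders:

export GOODMEM_API_KEY="<your-api-key>"
goodmem llm create \
--server "https://goodmem.example.com:9090" \
--display-name "Staging Llama" \
--provider-type VLLM \
--endpoint-url "http://vllm.staging.internal:8000" \
--model-identifier "llama3-70b" \
--supports-chat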

SEE ALSO

- goodmem llm - Manage GoodMem LLMs