Building a Basic RAG Agent using GoodMem
What you’ll build
Retrieval-augmented generation (RAG) injects knowledge from external sources (for example, documents) into an LLM—no retraining required—so it can answer questions or make decisions it otherwise couldn't. That knowledge is hot-swappable: add, remove, or replace sources to update what the LLM should know.
GoodMem is the memory layer between your agent and its data, providing persistent, searchable context. It manages embedders, LLMs, and rerankers as first-class resources you wire together at query time, and it can be extended with advanced optimization via GoodMem Cloud Tuner (private beta).
In this tutorial, you'll see how quickly you can build a RAG agent in GoodMem.
Prerequisites
Get an OpenAI API key. Then set the environment variable:
export OPENAI_API_KEY="your_openai_api_key"
Install GoodMem:
curl -s https://get.goodmem.ai | bash -s -- --handsfree \
  --db-password "your-secure-password-min-14-chars" \
  --tls-disabled
The installation script will print the GoodMem REST API endpoint and a GoodMem API key. Write them down; we'll use them throughout this tutorial. For convenience, export them to your shell environment:
export GOODMEM_BASE_URL="your_goodmem_base_url"
export GOODMEM_API_KEY="your_goodmem_api_key"
Note that GOODMEM_BASE_URL should not contain the /v1 suffix.
Install the SDK based on how you'll interface with GoodMem.
Nothing to do; the GoodMem CLI is installed as part of the GoodMem installation. Skip this section. If your server isn't running on the default gRPC endpoint (https://localhost:9090), add --server to the CLI commands (the CLI does not use GOODMEM_BASE_URL).
Nothing to do; cURL is standard on most Linux distributions. Skip this section.
pip install goodmem-client requests
Then set up your Python environment with the required imports and variables:
import requests
import json
import os

# Read configuration from environment variables
GOODMEM_BASE_URL = os.getenv("GOODMEM_BASE_URL")
GOODMEM_API_KEY = os.getenv("GOODMEM_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
Step 1: Register your embedder and LLM
At minimum, a RAG agent needs two kinds of neural network models to process information:
- an embedder, which converts information (for example, text, images, or audio) into a numerical vector (an embedding) for fast retrieval, and
- an LLM, which combines retrieved information with the user query to generate the final answer.
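To make the embedder's role concrete, here is a toy sketch of how embeddings enable retrieval. The vectors are hand-made 3-dimensional stand-ins (real embedders such as text-embedding-3-large produce vectors with hundreds or thousands of dimensions); the ranking mechanics are the point:

```python
# Toy illustration of embedding-based retrieval.
# Real embedders map text to high-dimensional vectors; here we
# hand-craft tiny 3-d vectors to show the mechanics.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Pretend these are embeddings of three stored memories.
memories = {
    "Transformers use attention.":        [0.9, 0.1, 0.0],
    "The handbook covers vacation days.": [0.0, 0.2, 0.9],
    "Attention captures dependencies.":   [0.8, 0.3, 0.1],
}

query_vec = [1.0, 0.2, 0.0]  # pretend embedding of "What is attention?"

# Rank memories by similarity to the query vector.
ranked = sorted(memories, key=lambda m: dot(memories[m], query_vec), reverse=True)
print(ranked[0])  # → "Transformers use attention."
```

The memory whose vector points in the most similar direction to the query vector wins, which is why semantically related text can be retrieved without any keyword overlap.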
The goal of this step is to bring the RAG agent to the following state:
┌──────────────────────────────────────┐
│ RAG Agent (under construction) │
│ ┌──────────┐ ┌─────┐ │
│ │ Embedder │ │ LLM │ │
│ └──────────┘ └─────┘ │
└──────────────────────────────────────┘
Here we use OpenAI's text-embedding-3-large embedder and gpt-5.1 LLM for simplicity.
Step 1.1: Register the embedder
The commands below register the OpenAI text-embedding-3-large embedder in GoodMem and save the embedder ID as the environment variable $EMBEDDER_ID for later use.
As you'll see later, giving every component an ID is a GoodMem feature that makes building and managing RAG agents easier and enables automated tuning.
In the sample outputs below, IDs are shown as [EMBEDDER_UUID]-style placeholders; your CLI will print different values unless you explicitly pass --id (see Client-Specified Identifiers).
goodmem embedder create \
--display-name "OpenAI test" \
--provider-type OPENAI \
--endpoint-url "https://api.openai.com/v1" \
--model-identifier "text-embedding-3-large" \
--dimensionality 1536 \
--cred-api-key ${OPENAI_API_KEY} | tee /tmp/embedder_output.txt
export EMBEDDER_ID=$(grep "^ID:" /tmp/embedder_output.txt | awk '{print $2}')
echo "Embedder ID: $EMBEDDER_ID"
curl -X POST "$GOODMEM_BASE_URL/v1/embedders" \
-H "x-api-key: $GOODMEM_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"displayName": "OpenAI test",
"providerType": "OPENAI",
"endpointUrl": "https://api.openai.com/v1",
"modelIdentifier": "text-embedding-3-large",
"credentials": {
"kind": "CREDENTIAL_KIND_API_KEY",
"apiKey": {
"inlineSecret": "'"$OPENAI_API_KEY"'"
}
},
"dimensionality": 1536,
"distributionType": "DENSE"
}' | tee /tmp/embedder_response.json | jq
export EMBEDDER_ID=$(jq -r '.embedderId' /tmp/embedder_response.json)
echo "Embedder ID: $EMBEDDER_ID"
# Register the embedder
response = requests.post(
f"{GOODMEM_BASE_URL}/v1/embedders",
headers={
"x-api-key": GOODMEM_API_KEY,
"Content-Type": "application/json"
},
json={
"displayName": "OpenAI test",
"providerType": "OPENAI",
"endpointUrl": "https://api.openai.com/v1",
"modelIdentifier": "text-embedding-3-large",
"credentials": {
"kind": "CREDENTIAL_KIND_API_KEY",
"apiKey": {
"inlineSecret": OPENAI_API_KEY
}
},
"dimensionality": 1536,
"distributionType": "DENSE"
}
)
response.raise_for_status()
# Get response and pretty print (equivalent to jq)
embedder_response = response.json()
print(json.dumps(embedder_response, indent=2))
# Extract embedder ID (equivalent to jq -r '.embedderId')
EMBEDDER_ID = embedder_response["embedderId"]
print(f"Embedder ID: {EMBEDDER_ID}")
If the commands above execute successfully, you will see something like below at the bottom of the output:
Embedder created successfully!
ID: [EMBEDDER_UUID]
Display Name: OpenAI test
Owner: [OWNER_UUID]
Provider Type: OPENAI
Distribution: DENSE
Endpoint URL: https://api.openai.com/v1
API Path: /embeddings
Model: text-embedding-3-large
Dimensionality: 1536
Created by: [CREATED_BY_UUID]
Created at: 2026-01-11T21:47:30Z
Embedder ID: [EMBEDDER_UUID]
Step 1.2: Register the LLM
The commands below register OpenAI's gpt-5.1 LLM in GoodMem and save the LLM ID as the
environment variable $LLM_ID for later use.
goodmem llm create \
--display-name "My GPT-5.1" \
--provider-type OPENAI \
--endpoint-url "https://api.openai.com/v1" \
--model-identifier "gpt-5.1" \
--cred-api-key ${OPENAI_API_KEY} \
--supports-chat | tee /tmp/llm_output.txt
export LLM_ID=$(grep "^ID:" /tmp/llm_output.txt | awk '{print $2}')
echo "LLM ID: $LLM_ID"
curl -X POST "${GOODMEM_BASE_URL}/v1/llms" \
-H "Content-Type: application/json" \
-H "x-api-key: ${GOODMEM_API_KEY}" \
-d '{
"displayName": "My GPT-5.1",
"providerType": "OPENAI",
"endpointUrl": "https://api.openai.com/v1",
"modelIdentifier": "gpt-5.1",
"credentials": {
"kind": "CREDENTIAL_KIND_API_KEY",
"apiKey": {
"inlineSecret": "'"${OPENAI_API_KEY}"'"
}
}
}' | tee /tmp/llm_response.json | jq
export LLM_ID=$(jq -r '.llm.llmId' /tmp/llm_response.json)
echo "LLM ID: $LLM_ID"
# Register the LLM
response = requests.post(
f"{GOODMEM_BASE_URL}/v1/llms",
headers={
"Content-Type": "application/json",
"x-api-key": GOODMEM_API_KEY
},
json={
"displayName": "My GPT-5.1",
"providerType": "OPENAI",
"endpointUrl": "https://api.openai.com/v1",
"modelIdentifier": "gpt-5.1",
"credentials": {
"kind": "CREDENTIAL_KIND_API_KEY",
"apiKey": {
"inlineSecret": OPENAI_API_KEY
}
}
}
)
response.raise_for_status()
# Get response and pretty print (equivalent to jq)
llm_response = response.json()
print(json.dumps(llm_response, indent=2))
# Extract LLM ID (equivalent to jq -r '.llm.llmId')
LLM_ID = llm_response["llm"]["llmId"]
print(f"LLM ID: {LLM_ID}")
If the commands above execute successfully, you will see something like below at the bottom of the output:
LLM created successfully!
ID: [LLM_UUID]
Display Name: My GPT-5.1
Owner: [OWNER_UUID]
Provider Type: OPENAI
Endpoint URL: https://api.openai.com/v1
API Path: /chat/completions
Model: gpt-5.1
Modalities: TEXT
Capabilities: Chat, Completion, Functions, System Messages, Streaming
Created by: [CREATED_BY_UUID]
Created at: 2026-01-11T21:47:53Z
Capability Inference:
✓ Completion Support: true (detected from model family 'gpt-5')
✓ Function Calling: true (detected from model family 'gpt-5')
✓ System Messages: true (detected from model family 'gpt-5')
✓ Streaming: true (detected from model family 'gpt-5')
✓ Sampling Parameters: false (detected from model family 'gpt-5')
LLM ID: [LLM_UUID]
Step 2: Add knowledge to a space
A RAG agent's knowledge comes from memories. A memory can correspond to a PDF document, an email, a Markdown file, and so on. Memories can be grouped into collections called spaces (a.k.a. corpora) so you can easily control what an agent can access. In GoodMem, an agent can tap into multiple spaces. In this tutorial, we will create one space.
The goal of this step is to bring the RAG agent to the following state, where knowledge is written into the space from sources:
┌──────────────────────────────────────┐
│ RAG Agent (under construction) │
┌───────────┐ │ ┌──────────┐ ┌─────┐ │
│ Knowledge ├─┼──────▶ Embedder │ │ LLM │ │
│ Source │ │ └────┬─────┘ └─────┘ │
└───────────┘ │ │ (write │
│ │ into) │
│ ┌────▼─────┐ │
│ │ Space │ │
│ └──────────┘ │
└──────────────────────────────────────┘
Step 2.1: Create a space
A space needs to be associated with at least one embedder. GoodMem supports multi-embedder spaces; for example, you can use two embedders in parallel to benefit from both.
The commands below create a space with a single embedder.
The embedder ID is referenced by the environment variable $EMBEDDER_ID set in Step 1.1. The space ID is saved to the environment variable $SPACE_ID for later use.
goodmem space create \
--name "Goodmem test" \
--embedder-id ${EMBEDDER_ID} | tee /tmp/space_output.txt
export SPACE_ID=$(grep "^ID:" /tmp/space_output.txt | awk '{print $2}')
echo "Space ID: $SPACE_ID"
curl -X POST "$GOODMEM_BASE_URL/v1/spaces" \
-H "x-api-key: $GOODMEM_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Goodmem test",
"spaceEmbedders": [
{
"embedderId": "'"$EMBEDDER_ID"'",
"defaultRetrievalWeight": "1.0"
}
]
}' | tee /tmp/space_response.json | jq
export SPACE_ID=$(jq -r '.spaceId' /tmp/space_response.json)
echo "Space ID: $SPACE_ID"
# Create a space
response = requests.post(
f"{GOODMEM_BASE_URL}/v1/spaces",
headers={
"x-api-key": GOODMEM_API_KEY,
"Content-Type": "application/json"
},
json={
"name": "Goodmem test",
"spaceEmbedders": [
{
"embedderId": EMBEDDER_ID,
"defaultRetrievalWeight": "1.0"
}
]
}
)
response.raise_for_status()
# Get response and pretty print (equivalent to jq)
space_response = response.json()
print(json.dumps(space_response, indent=2))
# Extract space ID (equivalent to jq -r '.spaceId')
SPACE_ID = space_response["spaceId"]
print(f"Space ID: {SPACE_ID}")
If the commands above run successfully, you will see something like below at the bottom of the output:
Space created successfully!
ID: [SPACE_UUID]
Name: Goodmem test
Owner: [OWNER_UUID]
Created by: [CREATED_BY_UUID]
Created at: 2026-01-11T21:47:57Z
Public: false
Embedder: [EMBEDDER_UUID] (weight: 1)
Space ID: [SPACE_UUID]
Step 2.2: Ingest data into the space
A memory can be ingested from two types of sources:
- Plain text (for example, a conversation, a message, or a code snippet)
- Files (for example, PDFs, Word documents, and text files)
We'll show how to ingest both kinds of data into the space.
Step 2.2.1: Ingest plain text
The commands below ingest plain text (the originalContent field) into the space (the spaceId field) whose ID was saved to the environment variable $SPACE_ID in Step 2.1, and save the ingested memory's ID to the environment variable $MEMORY_ID for later use.
goodmem memory create \
--space-id ${SPACE_ID} \
--content "Transformers are a type of neural network architecture that are particularly well-suited for natural language processing tasks. A Transformer model leverages the attention mechanism to capture long-range dependencies in the input sequence." \
--content-type "text/plain" | tee /tmp/memory_text_output.txt
export MEMORY_ID=$(grep "^ID:" /tmp/memory_text_output.txt | awk '{print $2}')
echo "Memory ID: $MEMORY_ID"
curl -X POST "$GOODMEM_BASE_URL/v1/memories" \
-H "x-api-key: $GOODMEM_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"spaceId":"'"$SPACE_ID"'",
"originalContent": "'"Transformers are a type of neural network architecture that are particularly well-suited for natural language processing tasks. A Transformer model leverages the attention mechanism to capture long-range dependencies in the input sequence."'",
"contentType": "text/plain"
}' | tee /tmp/memory_text_response.json | jq
export MEMORY_ID=$(jq -r '.memoryId' /tmp/memory_text_response.json)
echo "Memory ID: $MEMORY_ID"
echo "Processing Status: $(jq -r '.processingStatus' /tmp/memory_text_response.json)"
# Ingest plain text memory
response = requests.post(
f"{GOODMEM_BASE_URL}/v1/memories",
headers={
"x-api-key": GOODMEM_API_KEY,
"Content-Type": "application/json"
},
json={
"spaceId": SPACE_ID,
"originalContent": "Transformers are a type of neural network architecture that are particularly well-suited for natural language processing tasks. A Transformer model leverages the attention mechanism to capture long-range dependencies in the input sequence.",
"contentType": "text/plain"
}
)
response.raise_for_status()
# Get response and pretty print (equivalent to jq)
memory_text_response = response.json()
print(json.dumps(memory_text_response, indent=2))
# Extract memory ID and processing status (equivalent to jq -r)
MEMORY_ID = memory_text_response["memoryId"]
processing_status = memory_text_response["processingStatus"]
print(f"Memory ID: {MEMORY_ID}")
print(f"Processing Status: {processing_status}")
If the commands above execute successfully, you will see something like below at the bottom of the output.
Memory created successfully!
ID: [MEMORY_UUID]
Space ID: [SPACE_UUID]
Content Type: text/plain
Status: PENDING
Created by: [CREATED_BY_UUID]
Created at: 2026-01-11T21:48:19Z
Memory ID: [MEMORY_UUID]
Processing Status: PENDING
Step 2.2.2: Ingest a PDF with chunking
For this step, we'll use a sample PDF file. Download it with the following command, which saves it to your current directory.
wget https://raw.githubusercontent.com/PAIR-Systems-Inc/goodmem-samples/main/cookbook/1_Building_a_basic_RAG_Agent_with_GoodMem/sample_documents/employee_handbook.pdf
A PDF file is usually text-heavy. If we treat it as one memory, such a long memory will be ineffective for retrieval and inefficient, or even impossible (it may exceed the context window), for the embedder or LLM to process. This is where chunking comes in: it splits the lengthy text into smaller segments. We provide a guide to help you design a chunking strategy for your use case. We'll use a placeholder chunking strategy below.
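To build intuition for the chunkSize/chunkOverlap settings used below, here is a simplified sliding-window sketch of character chunking (shrunk sizes so the output is easy to inspect). This is an illustration of the idea only, not GoodMem's implementation; true recursive chunking also tries to split on separator hierarchies (paragraphs, sentences) before falling back to fixed character windows:

```python
# Simplified sketch of character chunking with overlap, mirroring the
# chunkSize=512 / chunkOverlap=64 settings used below (sizes shrunk here).
# Not GoodMem's actual algorithm.

def chunk_text(text, chunk_size=40, overlap=8):
    """Split text into windows of chunk_size characters; each window
    starts `overlap` characters before the previous one ended."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

text = "GoodMem splits long documents into overlapping chunks for retrieval."
for c in chunk_text(text):
    print(repr(c))
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence that straddles a boundary is still fully contained in at least one chunk.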
The command below ingests the PDF file into a new memory in the space specified by the environment variable $SPACE_ID (created in Step 2.1). The resulting memory ID is saved to the environment variable $MEMORY_ID for later use.
goodmem memory create \
--space-id ${SPACE_ID} \
--file employee_handbook.pdf | tee /tmp/memory_pdf_output.txt
export MEMORY_ID=$(grep "^ID:" /tmp/memory_pdf_output.txt | awk '{print $2}')
echo "Memory ID: $MEMORY_ID"
No chunking strategy is specified here; the GoodMem CLI defaults to recursive chunking.
REQUEST_JSON=$(jq -n \
--arg spaceId "$SPACE_ID" \
'{
spaceId: $spaceId,
contentType: "application/pdf",
chunkingConfig: {
recursive: {
chunkSize: 512,
chunkOverlap: 64,
keepStrategy: "KEEP_END",
lengthMeasurement: "CHARACTER_COUNT"
}
}
}')
curl -X POST "$GOODMEM_BASE_URL/v1/memories" \
-H "x-api-key: $GOODMEM_API_KEY" \
-F 'file=@employee_handbook.pdf' \
-F "request=$REQUEST_JSON" | tee /tmp/memory_response.json | jq
export MEMORY_ID=$(jq -r '.memoryId' /tmp/memory_response.json)
echo "Memory ID: $MEMORY_ID"
# Build the request JSON
request_data = {
"spaceId": SPACE_ID,
"contentType": "application/pdf",
"metadata": {"source": "FILE"},
"chunkingConfig": {
"recursive": {
"chunkSize": 512,
"chunkOverlap": 64,
"keepStrategy": "KEEP_END",
"lengthMeasurement": "CHARACTER_COUNT"
}
}
}
# Post multipart form data
with open('employee_handbook.pdf', 'rb') as f:
    response = requests.post(
        f"{GOODMEM_BASE_URL}/v1/memories",
        headers={'x-api-key': GOODMEM_API_KEY},
        files={'file': f},
        data={'request': json.dumps(request_data)}
    )
response.raise_for_status()
memory_response = response.json()
MEMORY_ID = memory_response['memoryId']
print(f"Memory ID: {MEMORY_ID}")
If the commands above execute successfully, you will see something like below at the bottom of the output.
Memory created successfully!
ID: [MEMORY_UUID]
Space ID: [SPACE_UUID]
Content Type: application/pdf
Status: PENDING
Created by: [CREATED_BY_UUID]
Created at: 2026-01-11T21:48:19Z
Metadata:
filename: employee_handbook.pdf
Memory ID: [MEMORY_UUID]
You can then check the status as you did in Step 2.2.1 for the plain-text memory.
Step 3: Run the RAG agent
The goal of this step is to bring the RAG agent to the following operational state:
┌─────────────────────────────────────┐
| |
| ┌──────────────────────────┼────────┐
┌──────┴────┐ | | |
│ User │ │ ┌──────────┐ ┌──▼──┐ │ ┌──────────┐
│ Query ├─────┼────▶ Embedder │ │ LLM ├─────┼─────▶ Response │
└───────────┘ │ └────┬─────┘ └──▲──┘ │ └──────────┘
│ │ | │
│ | | │
│ ┌────▼─────┐ ┌───────────┐ │
│ │ Space ├────▶ Retrieved │ │
│ └──────────┘ │ Knowledge │ │
│ └───────────┘ │
└───────────────────────────────────┘
RAG Agent in action
Now let's ask the question "What do you know about AI?" against the knowledge in the space $SPACE_ID (created in Step 2.1).
The LLM used in this agent is specified by the environment variable $LLM_ID (registered in Step 1.2).
goodmem memory retrieve \
--space-id ${SPACE_ID} \
--post-processor-args '{"llm_id": "'"${LLM_ID}"'"}' \
"What do you know about AI?"
curl -X POST "$GOODMEM_BASE_URL/v1/memories:retrieve" \
-H "x-api-key: $GOODMEM_API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: application/x-ndjson" \
-d '{
"message": "What do you know about AI?",
"spaceKeys": [
{
"spaceId": "'"$SPACE_ID"'"
}
],
"postProcessor": {
"name": "com.goodmem.retrieval.postprocess.ChatPostProcessorFactory",
"config": {
"llm_id": "'"$LLM_ID"'"
}
}
}'
# Run the RAG agent
response = requests.post(
f"{GOODMEM_BASE_URL}/v1/memories:retrieve",
headers={
"x-api-key": GOODMEM_API_KEY,
"Content-Type": "application/json",
"Accept": "application/x-ndjson"
},
json={
"message": "What do you know about AI?",
"spaceKeys": [
{
"spaceId": SPACE_ID
}
],
"postProcessor": {
"name": "com.goodmem.retrieval.postprocess.ChatPostProcessorFactory",
"config": {
"llm_id": LLM_ID
}
}
}
)
response.raise_for_status()
# Process NDJSON response (application/x-ndjson)
for line in response.text.strip().split('\n'):
if line:
print(json.dumps(json.loads(line), indent=2))
The variables SPACE_ID and LLM_ID were defined in previous steps.
An agent in GoodMem is like a lambda function in programming. You do not have to define a named agent. Instead, you compose an agent on the fly by specifying the spaces that supply knowledge and the LLM that turns retrieved knowledge into a response. We call this dynamic composition the Lambda Agent.
If it executes successfully, you'll get a very verbose response. Within it, you'll see a segment like this that summarizes the retrieved knowledge:
┌─ Abstract (relevance: 0.00)
│
│ From the retrieved data, one specific thing I know about AI is a key
│ architecture called the Transformer [20]. A Transformer is a type of neural
│ network that is particularly well‑suited for natural language
│ processing tasks. It uses an attention mechanism, which allows the model
│ to capture long‑range dependencies within an input sequence (for
│ example, words that are far apart in a sentence but still related).
└─
{
  "abstractReply": {
    "text": "From the retrieved data, the only direct information about AI is a short description of Transformer models [20]. It states that Transformers are a type of neural network architecture that are particularly well‑suited for natural language processing tasks [20]. A Transformer model uses an attention mechanism, which allows it to capture long‑range dependencies in an input sequence (for example, relationships between words that are far apart in a sentence) [20]. The rest of the retrieved content concerns organizational policies and employment information and does not address AI specifically."
  }
}
Do you recall what we ingested into the space? It's a text snippet about Transformers (Step 2.2.1) and a boilerplate employee handbook (Step 2.2.2). The response clearly states that it's based on the Transformers memory and rejects the employee handbook as not about AI. Makes sense, right?
In particular, the response includes citations, e.g., [20]. With these citations, your agent can point back to or highlight the relevant part of the original document.
Step 4 (optional): Add a reranker
The RAG agent above uses the embedders associated with the space(s) to retrieve knowledge. This retrieval uses the dot product, which is fast but can be less accurate. A reranker can greatly improve accuracy using a more sophisticated cross-encoder approach.
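The two-stage idea can be sketched as follows. This is a toy illustration with made-up 2-dimensional embeddings, and the "cross-encoder" is simulated with simple keyword overlap; a real reranker like rerank-2.5 scores the query and document text jointly with a neural model:

```python
# Two-stage retrieval sketch: a fast first pass with dot products,
# then a (simulated) cross-encoder rerank of the top candidates.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# (doc_text, precomputed_embedding) pairs — embeddings are made up.
docs = [
    ("Attention captures long-range dependencies.", [0.7, 0.3]),
    ("Vacation policy: 15 days per year.",          [0.1, 0.9]),
    ("Transformers are neural networks.",           [0.8, 0.2]),
]
query_vec = [1.0, 0.1]

# Stage 1: cheap — rank all docs by embedding similarity, keep the top 2.
candidates = sorted(docs, key=lambda d: dot(d[1], query_vec), reverse=True)[:2]

# Stage 2: expensive — a cross-encoder sees the query AND document text
# together and produces a finer-grained score. Simulated here by
# keyword overlap between query and document.
def cross_encoder_score(query, doc_text):
    q, d = set(query.lower().split()), set(doc_text.lower().split())
    return len(q & d)

query = "what are transformers"
reranked = sorted(candidates,
                  key=lambda d: cross_encoder_score(query, d[0]),
                  reverse=True)
print(reranked[0][0])
```

Because the reranker only scores the handful of candidates that survive the first pass, you get cross-encoder accuracy without paying its cost on the whole corpus.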
Step 4.1: Register a reranker
Since OpenAI doesn't offer a reranker, we'll use one from Voyage AI. First, obtain a Voyage AI API key and set it as an environment variable:
export VOYAGE_API_KEY="your_voyage_api_key"
Registering a reranker in GoodMem is similar to registering an embedder or LLM. The commands below register the Voyage AI rerank-2.5 reranker in GoodMem. The reranker ID is saved to the environment variable $RERANKER_ID for later use.
goodmem reranker create \
--display-name "Voyage rerank-2.5" \
--provider-type VOYAGE \
--endpoint-url "https://api.voyageai.com" \
--api-path "/v1/rerank" \
--model-identifier "rerank-2.5" \
--cred-api-key ${VOYAGE_API_KEY} | tee /tmp/reranker_output.txt
export RERANKER_ID=$(grep "^ID:" /tmp/reranker_output.txt | awk '{print $2}')
echo "Reranker ID: $RERANKER_ID"
curl -X POST "$GOODMEM_BASE_URL/v1/rerankers" \
-H "Content-Type: application/json" \
-H "x-api-key: $GOODMEM_API_KEY" \
-d '{
"displayName": "Voyage rerank-2.5",
"providerType": "VOYAGE",
"endpointUrl": "https://api.voyageai.com/v1",
"modelIdentifier": "rerank-2.5",
"credentials": {
"kind": "CREDENTIAL_KIND_API_KEY",
"apiKey": {
"inlineSecret": "'"$VOYAGE_API_KEY"'"
}
}
}' | tee /tmp/reranker_response.json | jq
export RERANKER_ID=$(jq -r '.rerankerId' /tmp/reranker_response.json)
echo "Reranker ID: $RERANKER_ID"
# Set up Voyage API key
VOYAGE_API_KEY = os.getenv("VOYAGE_API_KEY")
# Register the reranker
response = requests.post(
f"{GOODMEM_BASE_URL}/v1/rerankers",
headers={
"Content-Type": "application/json",
"x-api-key": GOODMEM_API_KEY
},
json={
"displayName": "Voyage rerank-2.5",
"providerType": "VOYAGE",
"endpointUrl": "https://api.voyageai.com/v1",
"modelIdentifier": "rerank-2.5",
"credentials": {
"kind": "CREDENTIAL_KIND_API_KEY",
"apiKey": {
"inlineSecret": VOYAGE_API_KEY
}
}
}
)
response.raise_for_status()
# Get response and pretty print (equivalent to jq)
reranker_response = response.json()
print(json.dumps(reranker_response, indent=2))
# Extract reranker ID (equivalent to jq -r '.rerankerId')
RERANKER_ID = reranker_response["rerankerId"]
print(f"Reranker ID: {RERANKER_ID}")
If the commands above execute successfully, you will see something like below at the bottom of the output.
Reranker created successfully!
ID: [RERANKER_UUID]
Display Name: Voyage rerank-2.5
Owner: [OWNER_UUID]
Provider Type: VOYAGE
Endpoint URL: https://api.voyageai.com
API Path: /v1/rerank
Model: rerank-2.5
Created: 2026-01-11T13:48:38-08:00
Updated: 2026-01-11T13:48:38-08:00
Reranker ID: [RERANKER_UUID]
Step 4.2: Use the reranker in the agent
To use the reranker in RAG, expand the Step 3 call with the reranker ID:
goodmem memory retrieve \
--space-id ${SPACE_ID} \
--post-processor-args '{"llm_id": "'"${LLM_ID}"'", "reranker_id": "'"${RERANKER_ID}"'"}' \
"What do you know about AI?"
curl -X POST "$GOODMEM_BASE_URL/v1/memories:retrieve" \
-H "x-api-key: $GOODMEM_API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: application/x-ndjson" \
-d '{
"message": "What do you know about AI?",
"spaceKeys": [
{
"spaceId": "'"$SPACE_ID"'"
}
],
"postProcessor": {
"name": "com.goodmem.retrieval.postprocess.ChatPostProcessorFactory",
"config": {
"llm_id": "'"$LLM_ID"'",
"reranker_id": "'"$RERANKER_ID"'"
}
}
}'
# Run the RAG agent with the reranker
response = requests.post(
f"{GOODMEM_BASE_URL}/v1/memories:retrieve",
headers={
"x-api-key": GOODMEM_API_KEY,
"Content-Type": "application/json",
"Accept": "application/x-ndjson"
},
json={
"message": "What do you know about AI?",
"spaceKeys": [
{
"spaceId": SPACE_ID
}
],
"postProcessor": {
"name": "com.goodmem.retrieval.postprocess.ChatPostProcessorFactory",
"config": {
"llm_id": LLM_ID,
"reranker_id": RERANKER_ID # New line, compared to the RAG agent call in Step 3
}
}
}
)
response.raise_for_status()
# Process NDJSON response (application/x-ndjson)
for line in response.text.strip().split('\n'):
if line:
print(json.dumps(json.loads(line), indent=2))
Congratulations! You now have a RAG agent that uses three core components of a RAG stack: an embedder, an LLM, and a reranker.
Why GoodMem
In this tutorial, we built a RAG agent with far less code than most RAG frameworks require. You declare only what you need: which embedder and LLM to use and what to add as memories. GoodMem handles the rest. This simplicity doesn't come at the expense of flexibility: you can create a space with more than one embedder, and a RAG agent can use more than one space. You can even gate space access by user role.
Beyond efficient building, GoodMem gives you deployment flexibility without vendor lock-in. You can deploy on-premise, air-gapped, or on any cloud provider you pick (GCP, Railway, Fly.io, etc.). You control your data sovereignty, and you won't be locked out of your data. GoodMem has enterprise-grade security (encryption, RBAC) and compliance (SOC 2, FIPS, HIPAA) features built in. It is natively multimodal (a paid feature), so you don't need a separate OCR pipeline.
Max out RAG performance with GoodMem Tuner

The tutorial above gives you a working RAG agent. But in the real world, you want a RAG stack that not only works but also works well. You want the highest accuracy at the lowest latency and cost, using everything at your disposal.
That means a large number of tweaks and knobs. For the LLM alone, here are some questions to ask:
- Should I use a model from the Gemini, GPT, or Qwen series?
- If I use the GPT series, is my use case simple enough to get by with GPT-5-mini or even GPT-5-nano, which are cheaper and faster than GPT-5?
- Should I turn on thinking/reasoning mode? If so, what token budget or level (high, low, or default) should I use?
- When is it worth fine-tuning the LLM?
- How should I set the prompt so the LLM gives me a direct answer without rambling?
This is a rabbit hole.
This is where GoodMem Tuner comes in. It adjusts the knobs to find the best configuration for your use case. It doesn't tune just once; it tunes throughout the lifetime of your RAG agent. Think of it as lifetime fine-tuning. It can work with or without user feedback signals.
GoodMem Tuner is a paid feature. Contact us at [email protected] for details.