Extract, monitor, list, and fetch PDF page images for previews and document viewers.

Work with PDF Page Images

GoodMem can extract per-page images for PDF memories. This is useful when you want:

Page previews in a document viewer
Thumbnail strips in the console or a custom UI
A visual fallback when text extraction is incomplete
A way to link retrieved chunks back to the pages they came from

This guide covers the full REST-facing workflow:

Request page-image extraction at ingest time
Check whether page images are ready
List available page-image renditions
Fetch one page image
Understand how retrieved chunks point back to their source pages

Before You Start

GoodMem server running (default REST address: https://localhost:8080)
API key with permission to create and read memories
A PDF file to upload

For PDF rendering, GoodMem selects a suitable page-rendering engine automatically in the default configuration. For server-side tuning details, see Server Runtime Footprint.

If you are calling the REST API directly, set:

export GOODMEM_REST_URL="https://localhost:8080"
export GOODMEM_API_KEY="gm_your_key"
export SPACE_ID="your-space-id"

For local development with the default self-signed certificate, add -k to curl and --verify=no to HTTPie.

Request Page Images at Ingest Time

Page-image extraction is controlled per memory.

REST and SDKs: set extractPageImages: true
CLI: enabled by default for eligible file uploads; use --no-extract-page-images to opt out

goodmem memory create \
  --space-id "$SPACE_ID" \
  --file document.pdf

curl -sS -k -X POST "$GOODMEM_REST_URL/v1/memories" \
  -H "x-api-key: $GOODMEM_API_KEY" \
  -F 'file=@document.pdf;type=application/pdf' \
  -F 'request={"spaceId":"'"$SPACE_ID"'","contentType":"application/pdf","extractPageImages":true};type=application/json'

http --verify=no -f POST "$GOODMEM_REST_URL/v1/memories" \
  x-api-key:"$GOODMEM_API_KEY" \
  'file@document.pdf;type=application/pdf' \
  request:='{"spaceId":"'"$SPACE_ID"'","contentType":"application/pdf","extractPageImages":true}'

The response returns the created memory. Save its memoryId; you will use it for the page-image and memory-status calls below.
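For a Python client, the same upload can be assembled with the standard library alone. The sketch below builds the multipart request with extractPageImages set to true and reads memoryId from the response. It assumes the file part is named file (mirroring the CLI's --file flag), and the function names are hypothetical; check the API reference for the exact part names before relying on it.

```python
import json
import ssl
import urllib.request
import uuid


def build_create_memory_request(base_url, api_key, space_id, pdf_bytes,
                                filename="document.pdf"):
    """Build (but do not send) a multipart POST /v1/memories request."""
    boundary = uuid.uuid4().hex
    metadata = {
        "spaceId": space_id,
        "contentType": "application/pdf",
        "extractPageImages": True,  # opt in to page-image extraction
    }
    parts = [
        # ASSUMPTION: the file part is named "file".
        (f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/pdf\r\n\r\n").encode() + pdf_bytes,
        # The JSON "request" part, sent as application/json like the curl example.
        ('Content-Disposition: form-data; name="request"\r\n'
         "Content-Type: application/json\r\n\r\n").encode()
        + json.dumps(metadata).encode(),
    ]
    body = b"".join(f"--{boundary}\r\n".encode() + p + b"\r\n" for p in parts)
    body += f"--{boundary}--\r\n".encode()  # closing boundary
    return urllib.request.Request(
        f"{base_url}/v1/memories",
        data=body,
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def create_memory(request, insecure=False):
    """Send the request and return the new memory's memoryId."""
    ctx = ssl.create_default_context()
    if insecure:  # local development with the default self-signed certificate
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=ctx) as resp:
        return json.load(resp)["memoryId"]
```

Keeping request construction separate from sending makes the payload easy to inspect before anything reaches the server.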

Check Whether Page Images Are Ready

Page-image extraction is tracked separately from the memory's main processing status. A memory can finish chunking and embedding while page images are still processing, and an image-only PDF can produce page images even if text extraction fails.

Check the memory:

curl -sS -k "$GOODMEM_REST_URL/v1/memories/$MEMORY_ID" \
  -H "x-api-key: $GOODMEM_API_KEY" | jq

http --verify=no GET "$GOODMEM_REST_URL/v1/memories/$MEMORY_ID" \
  x-api-key:"$GOODMEM_API_KEY"

Look at these fields on the memory:

pageImageStatus
pageImageCount

Expected status values:

PENDING
PROCESSING
COMPLETED
FAILED

Treat pageImageStatus == COMPLETED and pageImageCount > 0 as the signal that page images are ready to fetch.
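A client usually polls the memory until that signal appears. This is a minimal sketch that assumes you supply fetch_memory, a hypothetical callable that performs the GET /v1/memories/{id} call shown above and returns the parsed JSON. The timeout and interval are arbitrary defaults, and treating "COMPLETED with zero renditions" as terminal is a choice made for this sketch.

```python
import time


def page_images_ready(memory):
    """The readiness signal: status COMPLETED and at least one stored rendition."""
    # int() also tolerates a count encoded as a JSON string.
    count = int(memory.get("pageImageCount") or 0)
    return memory.get("pageImageStatus") == "COMPLETED" and count > 0


def wait_for_page_images(fetch_memory, memory_id, timeout_s=300.0, interval_s=2.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll the memory until page images are ready, failed, or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        memory = fetch_memory(memory_id)
        status = memory.get("pageImageStatus")
        if page_images_ready(memory):
            return memory
        if status == "FAILED":
            raise RuntimeError(f"page-image extraction failed for {memory_id}")
        if status == "COMPLETED":
            # Completed with no renditions: polling again will not help.
            raise RuntimeError(f"no page images stored for {memory_id}")
        if clock() >= deadline:
            raise TimeoutError(f"page images not ready after {timeout_s}s (status={status})")
        sleep(interval_s)  # PENDING or PROCESSING: wait and retry
```

The sleep and clock parameters exist so the loop can be exercised without real waiting.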

pageImageCount counts stored page-image renditions, not necessarily logical PDF pages. If every page has exactly one rendition, the two values usually match. If some pages have multiple renditions, pageImageCount will be higher than the human-visible page count.
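To show a human-visible page count, you can count distinct pageIndex values from the listing endpoint (described in the next section) instead of relying on pageImageCount. A minimal sketch; it assumes you have already collected every entry across all result pages, and the helper names are hypothetical.

```python
from collections import Counter


def logical_page_count(page_images):
    """Number of distinct PDF pages among the listed renditions."""
    return len({img["pageIndex"] for img in page_images})


def renditions_per_page(page_images):
    """Map each 0-based pageIndex to how many renditions it has."""
    counts = Counter(img["pageIndex"] for img in page_images)
    return dict(sorted(counts.items()))
```

For a hypothetical listing in which page 0 has renditions at 150 and 300 dpi and page 1 has one, pageImageCount would be 3 while logical_page_count returns 2.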

List Available Page Images

Use the page-image listing endpoint to discover which renditions exist for a memory.

Endpoint:

GET /v1/memories/{id}/pages

Basic request:

curl -sS -k "$GOODMEM_REST_URL/v1/memories/$MEMORY_ID/pages" \
  -H "x-api-key: $GOODMEM_API_KEY" | jq

http --verify=no GET "$GOODMEM_REST_URL/v1/memories/$MEMORY_ID/pages" \
  x-api-key:"$GOODMEM_API_KEY"

The response looks like this:

{
  "pageImages": [
    {
      "memoryId": "550e8400-e29b-41d4-a716-446655440000",
      "pageIndex": 0,
      "dpi": 150,
      "contentType": "image/png",
      "imageContentLength": 281233,
      "imageContentSha256": "2d711642b726b04401627ca9fbac32f5c8530fb1903cc4db02258717921a4881",
      "createdAt": 1714762260000,
      "updatedAt": 1714762260000
    }
  ],
  "nextToken": "..."
}

Notes:

pageIndex is 0-based
one logical page can have more than one rendition
renditions are distinguished by dpi and contentType
nextToken is opaque; if present, pass it back unchanged as the nextToken query parameter to fetch the next set of results
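Following nextToken until it disappears collects the full listing. In this sketch fetch_page is a hypothetical caller-supplied function that performs GET /v1/memories/{id}/pages with the given query parameters and returns the parsed JSON body.

```python
def iter_page_images(fetch_page, **params):
    """Yield every page-image entry, following nextToken until it is absent."""
    token = None
    while True:
        query = dict(params)
        if token:
            query["nextToken"] = token  # opaque: passed back exactly as received
        body = fetch_page(query)
        yield from body.get("pageImages", [])
        token = body.get("nextToken")
        if not token:
            return
```

Because it is a generator, a viewer can start rendering the first entries before later result pages arrive.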

Filter the List

Supported query parameters:

startPageIndex
endPageIndex
dpi
contentType
maxResults
nextToken

Snake_case aliases are also accepted:

start_page_index
end_page_index
content_type
max_results
next_token

Example: list only pageIndex 2 (the third page of the PDF):

curl -sS -k \
  "$GOODMEM_REST_URL/v1/memories/$MEMORY_ID/pages?startPageIndex=2&endPageIndex=2" \
  -H "x-api-key: $GOODMEM_API_KEY" | jq
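When building these URLs in code, it helps to drop unset filters and let the standard library handle encoding (for example, image/png becomes image%2Fpng). A minimal sketch using the camelCase parameter names listed above; page_list_url is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def page_list_url(base_url, memory_id, start_page_index=None, end_page_index=None,
                  dpi=None, content_type=None, max_results=None, next_token=None):
    """Build a GET /v1/memories/{id}/pages URL, omitting filters that are None."""
    params = {
        "startPageIndex": start_page_index,
        "endPageIndex": end_page_index,
        "dpi": dpi,
        "contentType": content_type,
        "maxResults": max_results,
        "nextToken": next_token,
    }
    # "is not None" keeps a legitimate 0 page index.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url}/v1/memories/{quote(memory_id, safe='')}/pages"
    return f"{url}?{query}" if query else url
```
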

Fetch One Page Image

Use this endpoint to download one page image as raw binary content:

GET /v1/memories/{id}/pages/{pageIndex}/image

In the common case, you can omit rendition hints entirely:

curl -sS -k \
  "$GOODMEM_REST_URL/v1/memories/$MEMORY_ID/pages/2/image" \
  -H "x-api-key: $GOODMEM_API_KEY" \
  -o page-2.png

If exactly one rendition exists for that page, the server returns it, so in the common case you do not need to specify dpi or contentType. If GoodMem stores multiple renditions for the same page, the server may ask you to specify them explicitly.
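If you do need to choose, the listing already tells you which renditions exist. This sketch picks one and builds the image URL. Two things are assumptions: the preference order (requested content type first, then highest dpi) is arbitrary, and passing dpi and contentType as query parameters on the image endpoint mirrors the list filters rather than anything this page confirms.

```python
from urllib.parse import quote, urlencode


def pick_rendition(page_images, page_index, preferred_type="image/png"):
    """Choose one rendition for a page from GET .../pages listing entries."""
    candidates = [img for img in page_images if img["pageIndex"] == page_index]
    if not candidates:
        raise LookupError(f"no rendition listed for pageIndex {page_index}")
    # Prefer the requested content type, then the highest dpi.
    return max(candidates, key=lambda img: (img["contentType"] == preferred_type, img["dpi"]))


def page_image_url(base_url, memory_id, page_index, dpi=None, content_type=None):
    """Build the image URL; ASSUMPTION: rendition hints go in the query string."""
    url = f"{base_url}/v1/memories/{quote(memory_id, safe='')}/pages/{page_index}/image"
    params = {k: v for k, v in (("dpi", dpi), ("contentType", content_type)) if v is not None}
    return f"{url}?{urlencode(params)}" if params else url
```
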

HTTP Behavior

The image endpoint supports normal binary-download behavior:

GET for the image bytes
HEAD for headers only
Range requests
ETag and Digest headers when available

This is useful for browser caching and document-viewer prefetching.
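The same behavior is available outside a browser. This sketch builds HEAD, Range, and conditional GET requests with Python's urllib; image_request and head_metadata are hypothetical helpers. The If-None-Match revalidation is an assumption: this page promises ETag headers, not 304 responses, so confirm server support first. Note that urllib raises urllib.error.HTTPError for a 304 reply.

```python
import urllib.request


def image_request(url, api_key, method="GET", byte_range=None, etag=None):
    """Build a request for GET /v1/memories/{id}/pages/{pageIndex}/image."""
    headers = {"x-api-key": api_key}
    if byte_range is not None:
        first, last = byte_range
        headers["Range"] = f"bytes={first}-{last}"  # inclusive byte range
    if etag is not None:
        headers["If-None-Match"] = etag  # revalidate a cached copy (assumed support)
    return urllib.request.Request(url, method=method, headers=headers)


def head_metadata(url, api_key, context=None):
    """Fetch headers only, e.g. to size a placeholder before downloading."""
    req = image_request(url, api_key, method="HEAD")
    with urllib.request.urlopen(req, context=context) as resp:
        return {"length": resp.headers.get("Content-Length"),
                "etag": resp.headers.get("ETag")}
```
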

Page Indices Are 0-Based

GoodMem uses 0-based page indices everywhere in the page-image APIs and chunk metadata.

Examples:

the first page in the PDF is pageIndex = 0
“page 3” in a human-facing UI is pageIndex = 2

If your UI shows human page numbers, convert them at the edge and keep the API calls 0-based.
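Keeping that conversion in two small functions at the UI boundary makes off-by-one mistakes easier to spot. The function names are hypothetical.

```python
def to_page_index(human_page):
    """Convert a 1-based page number shown in a UI to a 0-based pageIndex."""
    if human_page < 1:
        raise ValueError("human page numbers start at 1")
    return human_page - 1


def to_human_page(page_index):
    """Convert a 0-based pageIndex back to a 1-based page number for display."""
    if page_index < 0:
        raise ValueError("pageIndex starts at 0")
    return page_index + 1
```
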

How Retrieved Chunks Point Back to Pages

Page images are stored per page, but retrieved chunks can span one or more pages. GoodMem exposes page attribution in chunk metadata, not as first-class chunk fields.

When available, the metadata keys are:

source_page_start_index
source_page_end_index
source_page_count

Example retrieved chunk metadata:

{
  "metadata": {
    "source_page_start_index": 4,
    "source_page_end_index": 5,
    "source_page_count": 2
  }
}

Interpretation (all indices are 0-based):

the chunk starts at pageIndex 4 (the fifth page of the PDF)
ends at pageIndex 5 (the sixth page)
spans 2 pages total

These fields are optional. If GoodMem cannot infer page spans for a chunk, the keys are simply absent.
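A client can turn these optional keys into a list of page indices and treat their absence as "no attribution". This sketch treats source_page_end_index as inclusive, which matches the example (4 to 5 with a count of 2). Falling back to the start index when the end key is missing is an assumption, and chunk_page_indices is a hypothetical name.

```python
def chunk_page_indices(metadata):
    """Return the 0-based page indices a chunk spans, or None if unattributed."""
    start = metadata.get("source_page_start_index")
    if start is None:
        return None  # GoodMem could not infer a page span for this chunk
    # ASSUMPTION: a missing end index means a single-page chunk.
    end = metadata.get("source_page_end_index", start)
    pages = list(range(int(start), int(end) + 1))  # end index is inclusive
    count = metadata.get("source_page_count")
    if count is not None and int(count) != len(pages):
        raise ValueError(f"inconsistent page metadata: {metadata}")
    return pages
```
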

Common Patterns

Build a Viewer

Fetch the memory and wait for pageImageStatus == COMPLETED and pageImageCount > 0
List page metadata with GET /v1/memories/{id}/pages
Render each visible page with GET /v1/memories/{id}/pages/{pageIndex}/image
Use chunk metadata to highlight which pages a retrieval result came from
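The last step can be sketched as a function that merges the page spans of several retrieval results and pairs each page with its image URL. It assumes each retrieved chunk is a dict carrying the metadata object shown earlier; pages_to_highlight and viewer_plan are hypothetical helpers.

```python
def pages_to_highlight(chunks):
    """Sorted 0-based page indices referenced by retrieved chunks."""
    pages = set()
    for chunk in chunks:
        meta = chunk.get("metadata") or {}
        start = meta.get("source_page_start_index")
        if start is None:
            continue  # no page attribution for this chunk
        end = meta.get("source_page_end_index", start)
        pages.update(range(int(start), int(end) + 1))  # inclusive span
    return sorted(pages)


def viewer_plan(chunks, base_url, memory_id):
    """(pageIndex, image URL) pairs for the pages a viewer should highlight."""
    return [(page, f"{base_url}/v1/memories/{memory_id}/pages/{page}/image")
            for page in pages_to_highlight(chunks)]
```

Using a set means overlapping chunks (for example two results that both touch page 5) highlight each page once.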

Handle Image-Only PDFs

Some PDFs do not yield usable extracted text. GoodMem can still render page images for them.

That means you may see:

processingStatus = FAILED
pageImageStatus = COMPLETED

This is expected for certain image-only or scan-heavy PDFs.
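A viewer can branch on the two statuses independently instead of treating a failed processingStatus as "nothing to show". This sketch is one possible mapping, not an official rule; the mode names and the preview_mode function are hypothetical.

```python
def preview_mode(memory):
    """Decide how a viewer should present a PDF memory."""
    text_failed = memory.get("processingStatus") == "FAILED"
    image_status = memory.get("pageImageStatus")
    images_ready = (image_status == "COMPLETED"
                    and int(memory.get("pageImageCount") or 0) > 0)
    if images_ready:
        # Scanned or image-only PDFs can land here with text processing FAILED.
        return "images-only" if text_failed else "images"
    if image_status in ("PENDING", "PROCESSING"):
        return "wait"  # poll again later
    return "no-images"  # FAILED, or extraction was never requested
```
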

Next Steps

Optimize Document Ingestion for chunking strategy guidance
Building a Basic RAG Agent for end-to-end ingestion and retrieval
API Reference for the generated REST and gRPC surface
