OCR Quickstart

Use the goodmem ocr CLI command to run layout-aware OCR on PDFs and images. This guide covers input handling, page ranges, output formats, and how markdown is formatted on the client.

Availability

OCR is powered by the GoodMem OCR add-on service/image and is not included in the base install. To use OCR, run the add-on service and configure the GoodMem server with GOODMEM_OCR_BASE_URL or --ocr-base-url.

Before You Start

GoodMem server running (default gRPC address: https://localhost:9090).
goodmem CLI installed.
GoodMem OCR add-on service/image running and reachable by the server.
API key with the OCR_DOCUMENT permission (GOODMEM_API_KEY or --api-key).
Input file in a supported format: PDF, TIFF, PNG, JPEG, or BMP.

By default the REST server listens on HTTPS with a self-signed certificate (typically on https://localhost:8080). For local development, add -k to cURL and --verify=no to HTTPie. If you configure REST to run without TLS, switch the URL to http:// and drop those flags.

If you want to call the REST endpoint directly, set:

export GOODMEM_REST_URL="https://localhost:8080"
export GOODMEM_API_KEY="gm_your_key"

Run OCR on a File

goodmem ocr --file document.pdf --format json

content=$(base64 -w 0 document.pdf)
curl -sS -k --json @- "$GOODMEM_REST_URL/v1/ocr:document" \
  --header "x-api-key: $GOODMEM_API_KEY" <<JSON
{
  "content": "$content",
  "format": "PDF"
}
JSON

content=$(base64 -w 0 document.pdf)
http --verify=no POST "$GOODMEM_REST_URL/v1/ocr:document" \
  x-api-key:"$GOODMEM_API_KEY" \
  content="$content" \
  format="PDF"

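For larger files, HTTPie can hit shell argument limits when the base64 content is passed inline. In that case, write the JSON body to a file with either Python or jq (the next two snippets are alternatives that produce the same request.json), then pass the file to HTTPie via stdin: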

python3 - <<'PY' > request.json
import base64
import json

with open("document.pdf", "rb") as f:
    payload = {
        "content": base64.b64encode(f.read()).decode("ascii"),
        "format": "PDF",
    }
print(json.dumps(payload))
PY

jq -n --arg content "$(base64 -w 0 document.pdf)" \
  '{content: $content, format: "PDF"}' > request.json

http --verify=no POST "$GOODMEM_REST_URL/v1/ocr:document" \
  x-api-key:"$GOODMEM_API_KEY" \
  < request.json

The CLI sends the file to the OCR service and prints an OcrDocumentResponse JSON payload. To write the response to a file, use --output with the CLI, -o with cURL, or a shell redirect with HTTPie:

goodmem ocr --file scans.tiff --format json --output ocr-output.json

content=$(base64 -w 0 scans.tiff)
curl -sS -k --json @- "$GOODMEM_REST_URL/v1/ocr:document" \
  --header "x-api-key: $GOODMEM_API_KEY" \
  -o ocr-output.json <<JSON
{
  "content": "$content",
  "format": "TIFF"
}
JSON

content=$(base64 -w 0 scans.tiff)
http --verify=no POST "$GOODMEM_REST_URL/v1/ocr:document" \
  x-api-key:"$GOODMEM_API_KEY" \
  content="$content" \
  format="TIFF" \
  > ocr-output.json

Stream Input via stdin

If you are piping bytes, specify the input format explicitly to avoid ambiguity:

cat document.pdf | goodmem ocr --input-format pdf --format json

content=$(cat document.pdf | base64 -w 0)
curl -sS -k --json @- "$GOODMEM_REST_URL/v1/ocr:document" \
  --header "x-api-key: $GOODMEM_API_KEY" <<JSON
{
  "content": "$content",
  "format": "PDF"
}
JSON

content=$(cat document.pdf | base64 -w 0)
http --verify=no POST "$GOODMEM_REST_URL/v1/ocr:document" \
  x-api-key:"$GOODMEM_API_KEY" \
  content="$content" \
  format="PDF"

--input-format auto (the default) inspects file signatures and works for most inputs, but explicit formats are safer when streaming.
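
As a rough illustration of what signature-based detection involves, the sketch below checks the leading bytes of an input against the well-known magic numbers for the supported formats. It is not GoodMem's detection code; the names SIGNATURES and sniff_format are hypothetical and exist only for this example.

```python
from typing import Optional

# Well-known leading byte sequences ("magic numbers") for the supported formats.
# Illustrative only: this is not GoodMem's actual detection logic.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"BM", "BMP"),
]


def sniff_format(data: bytes) -> Optional[str]:
    """Return the format whose signature prefixes data, or None if none match."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None
```

A stream carries no file extension, and some signatures are short (BMP is only two bytes), which is one reason an explicit --input-format is the safer choice when piping.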

Choose Output Format

--format json (default) returns the full OCR response structure: detected format, per-page layout, image metadata, and timings.
--format markdown prints concatenated page markdown for human review.

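REST always returns JSON. To add markdown or raw OCR JSON fields to the response, use the include flags (--include-markdown and --include-raw-json in the CLI, includeMarkdown and includeRawJson in REST):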

goodmem ocr \
  --file document.pdf \
  --format json \
  --include-markdown \
  --include-raw-json

content=$(base64 -w 0 document.pdf)
curl -sS -k --json @- "$GOODMEM_REST_URL/v1/ocr:document" \
  --header "x-api-key: $GOODMEM_API_KEY" <<JSON
{
  "content": "$content",
  "format": "PDF",
  "includeMarkdown": true,
  "includeRawJson": true
}
JSON

content=$(base64 -w 0 document.pdf)
http --verify=no POST "$GOODMEM_REST_URL/v1/ocr:document" \
  x-api-key:"$GOODMEM_API_KEY" \
  content="$content" \
  format="PDF" \
  includeMarkdown:=true \
  includeRawJson:=true
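
The same REST call can be made from Python with only the standard library. This is a minimal sketch, assuming the endpoint, header, and field names shown in the cURL examples above; build_ocr_request and run_ocr are hypothetical helper names, not part of any GoodMem SDK.

```python
import base64
import json
import os
import ssl
import urllib.request


def build_ocr_request(data, fmt, base_url, api_key, **options):
    """Build a POST request for /v1/ocr:document from raw file bytes."""
    body = {"content": base64.b64encode(data).decode("ascii"), "format": fmt}
    body.update(options)  # e.g. includeMarkdown=True, includeRawJson=True
    return urllib.request.Request(
        f"{base_url}/v1/ocr:document",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def run_ocr(path, fmt, **options):
    """Send a file to the REST endpoint and return the decoded JSON response."""
    with open(path, "rb") as f:
        request = build_ocr_request(
            f.read(),
            fmt,
            os.environ["GOODMEM_REST_URL"],
            os.environ["GOODMEM_API_KEY"],
            **options,
        )
    # Counterpart of curl -k: skip certificate checks for the local
    # self-signed certificate. Do not do this outside local development.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)
```

For example, run_ocr("document.pdf", "PDF", includeMarkdown=True) sends the same body as the cURL request with includeMarkdown set. Because the body is built in memory, this approach also avoids the shell argument limits mentioned earlier.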

Page Ranges (0-based, Inclusive)

Use --start-page and --end-page (startPage and endPage in REST) to limit processing to a range of pages:

goodmem ocr --file document.pdf --start-page 0 --end-page 2

content=$(base64 -w 0 document.pdf)
curl -sS -k --json @- "$GOODMEM_REST_URL/v1/ocr:document" \
  --header "x-api-key: $GOODMEM_API_KEY" <<JSON
{
  "content": "$content",
  "format": "PDF",
  "startPage": 0,
  "endPage": 2
}
JSON

content=$(base64 -w 0 document.pdf)
http --verify=no POST "$GOODMEM_REST_URL/v1/ocr:document" \
  x-api-key:"$GOODMEM_API_KEY" \
  content="$content" \
  format="PDF" \
  startPage:=0 \
  endPage:=2

Omitting --start-page means "start at page 0".
Omitting --end-page means "process through the last page".
The request returns an error if a page index is negative, if the range is inverted (start page after end page), or if it falls outside the document bounds.
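
The rules above can be sketched as a small function. This is an illustration of the documented semantics under stated assumptions, not the server's implementation; resolve_page_range and its error messages are hypothetical.

```python
def resolve_page_range(page_count, start_page=None, end_page=None):
    """Return the (start, end) pair of 0-based, inclusive page indices."""
    # Omitted bounds default to the first and last page of the document.
    start = 0 if start_page is None else start_page
    end = page_count - 1 if end_page is None else end_page
    if start < 0 or end < 0:
        raise ValueError("page indices must not be negative")
    if start > end:
        raise ValueError("start page must not exceed end page")
    if end >= page_count:
        raise ValueError("page range is outside document bounds")
    return start, end
```

Because both ends are inclusive, --start-page 0 --end-page 2 selects three pages (0, 1, and 2).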

Markdown Behavior (CLI)

The server returns raw markdown derived from the layout text. When you request --format markdown, the CLI applies formatting by default:

Wraps paragraphs to --markdown-width (default 80, use 0 to disable wrapping).
Preserves code fences, lists, block quotes, tables, and indented blocks.
Keeps math block delimiters on their own lines ($$, \[ and \]).

To emit the raw server markdown without any client-side formatting, add --markdown-raw. REST responses include the raw markdown only (no client-side formatting).
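
If you consume raw markdown from REST and want similar wrapping, the behavior can be approximated client-side. The sketch below is a simplified approximation, not the CLI's formatter: it wraps plain paragraphs with textwrap and passes through fenced code, list items, block quotes, table rows, headings, indented lines, and math delimiters unchanged. The names wrap_markdown and is_preserved are hypothetical.

```python
import re
import textwrap

# Lines that start a list item, block quote, table row, or heading.
BLOCK_START = re.compile(r"([-*+>|#]|\d+[.)]\s)")
MATH_DELIMITERS = {"$$", "\\[", "\\]"}


def is_preserved(line):
    """True for lines that should be emitted exactly as received."""
    stripped = line.strip()
    return (
        not stripped                      # blank line (paragraph break)
        or line[0] in " \t"               # indented block
        or stripped in MATH_DELIMITERS    # math delimiter on its own line
        or BLOCK_START.match(stripped) is not None
    )


def wrap_markdown(text, width=80):
    """Wrap plain paragraphs to width; a width of 0 disables wrapping."""
    if width == 0:
        return text
    out, paragraph, in_fence = [], [], False

    def flush():
        # Join the buffered paragraph lines and re-wrap them as one unit.
        if paragraph:
            out.append(textwrap.fill(" ".join(paragraph), width=width))
            paragraph.clear()

    for line in text.split("\n"):
        if line.lstrip().startswith("```"):
            flush()
            in_fence = not in_fence
            out.append(line)
        elif in_fence or is_preserved(line):
            flush()
            out.append(line)
        else:
            paragraph.append(line.strip())
    flush()
    return "\n".join(out)
```

This sketch leaves long list items unwrapped and treats every table or quote line as opaque, so its output may differ from what the CLI produces for the same input.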

Error Handling at a Glance

If OCR fails on specific pages, the request still succeeds and the response includes a per-page error status. The CLI prints warnings to stderr and leaves failed pages blank in markdown output. If the request fails outright, verify your API key and its permissions, and confirm that the OCR add-on service is reachable from the server. See the OCR reference pages for full details on output structure and limits.
