OCR Output Anatomy

The OCR APIs return an OcrDocumentResponse with page-ordered results, layout structure, and timing data. This page walks through the fields you will see in JSON or gRPC responses.

Availability

OCR is provided by the GoodMem OCR add-on, a separate service/image that is not included in the base install. Enable the add-on and set GOODMEM_OCR_BASE_URL (or pass --ocr-base-url) before using OCR.

Top-level Response

OcrDocumentResponse contains (JSON field names shown):

detectedFormat - The resolved input format (even if the request used INPUT_FORMAT_UNSPECIFIED).
pageCount - Number of pages processed after applying a page range.
pages - Ordered list of OcrPageResult entries.
timings - Aggregate timings for the request.

pageCount always equals the length of pages. If you request pages 2-4 (0-based), pageCount is 3, and each OcrPageResult.pageIndex reflects the original document index (2, 3, and 4), not the entry's position in the returned list.
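
As a minimal sketch of the relationship above, the following Python reads pageCount and pageIndex from a decoded JSON response. The sample dictionary is hypothetical: it assumes each entry in pages carries a pageIndex field next to its payload, and the detectedFormat value is a placeholder, not a documented enum name.

```python
# Hypothetical decoded response for a request covering pages 2-4 (0-based).
# Field names follow this page; the values are invented for illustration.
SAMPLE_RESPONSE = {
    "detectedFormat": "PLACEHOLDER_FORMAT",  # placeholder, not a documented value
    "pageCount": 3,
    "pages": [
        {"pageIndex": 2, "page": {}},
        {"pageIndex": 3, "page": {}},
        {"pageIndex": 4, "page": {}},
    ],
}


def original_page_indices(response: dict) -> list:
    """Return the original document index of every returned page entry."""
    pages = response.get("pages", [])
    # pageCount is documented to match the number of entries in pages.
    if response.get("pageCount", 0) != len(pages):
        raise ValueError("pageCount does not match the number of page entries")
    # Defensive default of 0 in case a serializer omits a zero-valued index.
    return [entry.get("pageIndex", 0) for entry in pages]


print(original_page_indices(SAMPLE_RESPONSE))
```

Use the returned indices, not list positions, when mapping results back to pages of the source document.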

JSON responses (REST and CLI) use lower camel case field names. The gRPC/proto fields use snake_case names in the .proto definition.

Per-page Results

Each OcrPageResult carries its pageIndex and a oneof holding exactly one of:

page - The successful OCR payload.
status - Per-page error status. gRPC returns a google.rpc.Status message; REST returns a simplified object with code and optional message.

Per-page errors do not fail the entire request. The response still returns normally with error entries where OCR or rendering failed.
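
Because a response can mix successful and failed pages, client code usually needs to separate the two. The sketch below assumes the REST JSON shape described above (a page object on success, or a status object with code and optional message on failure). The helper name split_pages and the sample values, including the status code, are illustrative only.

```python
def split_pages(response: dict):
    """Separate successful page payloads from per-page errors.

    Returns (ok, failed): ok maps pageIndex to the page payload, and
    failed maps pageIndex to a (code, message) tuple.
    """
    ok, failed = {}, {}
    for entry in response.get("pages", []):
        index = entry.get("pageIndex", 0)
        if "page" in entry:
            ok[index] = entry["page"]
        elif "status" in entry:
            status = entry["status"]
            # message is optional in the simplified REST status object.
            failed[index] = (status.get("code"), status.get("message", ""))
    return ok, failed


# Hypothetical response in which the middle page failed to render.
sample = {
    "pageCount": 3,
    "pages": [
        {"pageIndex": 0, "page": {"layout": {"cells": []}}},
        {"pageIndex": 1, "status": {"code": 13, "message": "render failed"}},
        {"pageIndex": 2, "page": {"layout": {"cells": []}}},
    ],
}

ok_pages, failed_pages = split_pages(sample)
print(sorted(ok_pages), failed_pages)
```

Checking for failed entries explicitly matters because the request as a whole still returns normally.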

Page Payload

OcrPage includes:

rawJson (optional) - Raw OCR JSON from the backend when include_raw_json is true.
markdown (optional) - Markdown rendering when include_markdown is true.
layout - Parsed layout structure (OcrLayout).
timings - Per-page timing breakdown (PageTimings).
image - Rendered image metadata (ImageInfo).
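
Since markdown is only present when requested, one reasonable pattern is to prefer it and fall back to the layout cells otherwise. This is a sketch under that assumption, not a prescribed approach. The helper name page_text is hypothetical, and the fallback simply joins the cell text in the order returned.

```python
def page_text(page: dict) -> str:
    """Return a plain-text view of one OcrPage payload.

    Prefers the optional markdown field; otherwise joins the text of the
    layout cells in the order they appear in the response.
    """
    markdown = page.get("markdown")
    if markdown:
        return markdown
    cells = page.get("layout", {}).get("cells", [])
    return "\n".join(cell["text"] for cell in cells if cell.get("text"))


# Hypothetical payloads: one with markdown, one with layout cells only.
with_markdown = {"markdown": "# Invoice\n\nTotal: 42", "layout": {"cells": []}}
cells_only = {
    "layout": {
        "cells": [
            {"text": "Invoice"},
            {"text": ""},
            {"text": "Total: 42"},
        ]
    }
}

print(page_text(with_markdown))
print(page_text(cells_only))
```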

Layout Cells

OcrLayout.cells is an ordered list in reading order. Each OcrCell includes:

bbox - Bounding box coordinates (x1, y1, x2, y2) in page space.
categoryLabel - Raw category string emitted by OCR.
category - Normalized category enum (OcrCategory).
text - OCR text for that cell.

Common OcrCategory values include:

REST JSON: TEXT, TITLE, SECTION_HEADER, LIST_ITEM, TABLE, FORMULA, PICTURE, CAPTION, FOOTNOTE, PAGE_HEADER, PAGE_FOOTER, OTHER, UNKNOWN.
gRPC/CLI JSON: OCR_CATEGORY_TEXT, OCR_CATEGORY_TITLE, OCR_CATEGORY_SECTION_HEADER, OCR_CATEGORY_LIST_ITEM, OCR_CATEGORY_TABLE, OCR_CATEGORY_FORMULA, OCR_CATEGORY_PICTURE, OCR_CATEGORY_CAPTION, OCR_CATEGORY_FOOTNOTE, OCR_CATEGORY_PAGE_HEADER, OCR_CATEGORY_PAGE_FOOTER, OCR_CATEGORY_OTHER, OCR_CATEGORY_UNKNOWN.

Use categoryLabel when you need the raw OCR label; use category for normalized grouping.
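
Code that consumes both REST and gRPC/CLI JSON may want a single spelling for category values. The sketch below strips the OCR_CATEGORY_ prefix when present and groups cell text by the normalized name. The helper names and sample cells are illustrative, and the sketch assumes a missing category should be treated as UNKNOWN.

```python
from collections import defaultdict

GRPC_PREFIX = "OCR_CATEGORY_"


def normalize_category(value: str) -> str:
    """Map gRPC/CLI names such as OCR_CATEGORY_TITLE to the REST form TITLE."""
    if value.startswith(GRPC_PREFIX):
        return value[len(GRPC_PREFIX):]
    return value


def text_by_category(layout: dict) -> dict:
    """Group cell text by normalized category, preserving reading order."""
    grouped = defaultdict(list)
    for cell in layout.get("cells", []):
        # Assumption: treat a missing category as UNKNOWN.
        category = normalize_category(cell.get("category", "UNKNOWN"))
        grouped[category].append(cell.get("text", ""))
    return dict(grouped)


# Hypothetical layout mixing both enum spellings.
layout = {
    "cells": [
        {"category": "OCR_CATEGORY_TITLE", "text": "Quarterly Report"},
        {"category": "TEXT", "text": "Revenue grew."},
        {"category": "OCR_CATEGORY_TEXT", "text": "Costs fell."},
    ]
}

print(text_by_category(layout))
```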

Timings

DocumentTimings summarizes end-to-end and aggregate work:

wallTimeMs - Total request time.
sumQueueWaitMs, sumRenderMs, sumOcrMs, sumPageTotalMs - Aggregates over pages.

PageTimings provides per-page values:

queueWaitMs - Time waiting to render.
renderMs - Time rendering or decoding.
ocrMs - Time spent in OCR.
totalMs - Sum of queue, render, and OCR time.
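
To see how the per-page values relate to the document-level aggregates, the sketch below recomputes the sum fields from each successful page's PageTimings. It assumes the timing values arrive as JSON numbers and that only successful pages carry timings; whether the server's aggregates also include failed pages is not stated here, so treat any comparison against the response's timings as a sanity check, not a guarantee.

```python
def sum_page_timings(response: dict) -> dict:
    """Recompute aggregate timings from per-page PageTimings values."""
    totals = {
        "sumQueueWaitMs": 0,
        "sumRenderMs": 0,
        "sumOcrMs": 0,
        "sumPageTotalMs": 0,
    }
    for entry in response.get("pages", []):
        timings = entry.get("page", {}).get("timings")
        if not timings:
            continue  # failed pages, or pages without timing data
        totals["sumQueueWaitMs"] += timings.get("queueWaitMs", 0)
        totals["sumRenderMs"] += timings.get("renderMs", 0)
        totals["sumOcrMs"] += timings.get("ocrMs", 0)
        totals["sumPageTotalMs"] += timings.get("totalMs", 0)
    return totals


# Hypothetical response: two successful pages and one per-page error.
timing_sample = {
    "pages": [
        {"pageIndex": 0, "page": {"timings": {
            "queueWaitMs": 5, "renderMs": 20, "ocrMs": 100, "totalMs": 125}}},
        {"pageIndex": 1, "status": {"code": 13}},
        {"pageIndex": 2, "page": {"timings": {
            "queueWaitMs": 10, "renderMs": 30, "ocrMs": 200, "totalMs": 240}}},
    ]
}

print(sum_page_timings(timing_sample))
```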

Image Metadata

ImageInfo includes:

widthPx, heightPx - Rendered image dimensions.
dpi - Rendering DPI (0 when unknown, such as for some image inputs).
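
Because dpi can be 0, any conversion from pixels to physical size needs a guard. A minimal sketch follows, with a hypothetical helper name and sample values.

```python
def page_size_inches(image: dict):
    """Return (width, height) in inches, or None when the DPI is unknown."""
    dpi = image.get("dpi", 0)
    if not dpi:
        return None  # dpi is 0 (or missing): physical size cannot be derived
    return image["widthPx"] / dpi, image["heightPx"] / dpi


# Hypothetical ImageInfo values: a letter-size page rendered at 200 DPI,
# and an image input whose DPI is unknown.
print(page_size_inches({"widthPx": 1700, "heightPx": 2200, "dpi": 200}))
print(page_size_inches({"widthPx": 640, "heightPx": 480, "dpi": 0}))
```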

Next, review limits and error handling:

OCR Limits and Errors