
OCR Output Anatomy

Understand the structure of OcrDocumentResponse and per-page OCR results.


The OCR APIs return an OcrDocumentResponse with page-ordered results, layout structure, and timing data. This page walks through the fields you will see in JSON or gRPC responses.

Availability

OCR is provided by the GoodMem OCR add-on service/image and is not part of the base install. Enable the add-on and set GOODMEM_OCR_BASE_URL (or pass --ocr-base-url) before using OCR.

Top-level Response

OcrDocumentResponse contains (JSON field names shown):

  • detectedFormat - The resolved input format, populated even when the request used INPUT_FORMAT_UNSPECIFIED.
  • pageCount - Number of pages processed after applying a page range.
  • pages - Ordered list of OcrPageResult entries.
  • timings - Aggregate timings for the request.

pageCount always equals the length of pages. For example, if you request pages 2-4 (0-based, inclusive), pageCount is 3 and the pageIndex values are 2, 3, and 4, because each OcrPageResult.pageIndex reflects the page's index in the original document.
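A minimal client-side check of that relationship, assuming the REST JSON shape described on this page. The helper name page_range_matches and the sample values are illustrative, not part of GoodMem. A missing pageIndex is read as 0 in case the server omits zero-valued fields, as proto3 JSON serialization does by default.

```python
def page_range_matches(resp: dict, first: int, last: int) -> bool:
    """True if pageCount and pageIndex agree with an inclusive, 0-based range."""
    pages = resp.get("pages", [])
    # pageIndex 0 may be omitted by proto3 JSON serializers, so default to 0.
    indices = [entry.get("pageIndex", 0) for entry in pages]
    return resp.get("pageCount") == len(pages) and indices == list(range(first, last + 1))


# Hypothetical REST JSON for a request covering pages 2-4; page payloads elided.
response = {
    "pageCount": 3,
    "pages": [
        {"pageIndex": 2, "page": {}},
        {"pageIndex": 3, "page": {}},
        {"pageIndex": 4, "page": {}},
    ],
}

print(page_range_matches(response, 2, 4))  # True
```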

JSON responses (REST and CLI) use lowerCamelCase field names, while the .proto definition used by gRPC uses snake_case names.
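For simple snake_case field names, the conversion follows the standard proto3 JSON name mapping: drop each underscore and capitalize the letter after it. The helper below is a hypothetical sketch of that rule, handy when matching .proto field names against JSON keys.

```python
def proto_to_json_name(field: str) -> str:
    """Convert a snake_case proto field name to its lowerCamelCase JSON name."""
    head, *rest = field.split("_")
    # Capitalize each segment after the first one and glue them together.
    return head + "".join(part.capitalize() for part in rest)


print(proto_to_json_name("include_raw_json"))  # includeRawJson
```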

Per-page Results

Each OcrPageResult carries a pageIndex (the page's index in the original document) and exactly one of the following (a oneof):

  • page - The successful OCR payload.
  • status - Per-page error status. gRPC returns a google.rpc.Status message; REST returns a simplified object with code and optional message.

Per-page errors do not fail the entire request. The response returns normally, and each page whose rendering or OCR failed carries a status entry instead of a page payload.
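Because failures are reported per page, clients typically partition the results before using them. This sketch assumes the REST JSON shape: each entry in pages has pageIndex plus either a page or a status key. The function name split_pages and the sample status code are made up for illustration.

```python
from typing import Any


def split_pages(resp: dict[str, Any]) -> tuple[list[dict], list[tuple[int, dict]]]:
    """Separate successful OcrPage payloads from per-page error statuses."""
    ok: list[dict] = []
    failed: list[tuple[int, dict]] = []
    for entry in resp.get("pages", []):
        if "page" in entry:
            ok.append(entry["page"])
        elif "status" in entry:
            # REST status is a simplified object: code plus an optional message.
            failed.append((entry.get("pageIndex", 0), entry["status"]))
    return ok, failed


# Hypothetical response: one good page, one failed page (values are made up).
response = {
    "pageCount": 2,
    "pages": [
        {"pageIndex": 0, "page": {"markdown": "# Title"}},
        {"pageIndex": 1, "status": {"code": 13, "message": "render failed"}},
    ],
}

ok, failed = split_pages(response)
for index, status in failed:
    print(f"page {index} failed: code={status.get('code')} {status.get('message', '')}")
```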

Page Payload

OcrPage includes:

  • rawJson (optional) - Raw OCR JSON from the backend when include_raw_json is true.
  • markdown (optional) - Markdown rendering when include_markdown is true.
  • layout - Parsed layout structure (OcrLayout).
  • timings - Per-page timing breakdown (PageTimings).
  • image - Rendered image metadata (ImageInfo).
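As an example of consuming the page payload, the sketch below stitches per-page markdown into one document in page order. It assumes the request set include_markdown (otherwise the field is absent) and skips failed pages; document_markdown is a hypothetical helper.

```python
def document_markdown(resp: dict) -> str:
    """Join per-page markdown in page order, skipping failed pages."""
    parts = []
    for entry in resp.get("pages", []):
        page = entry.get("page")
        if page is None:
            continue  # per-page error: this entry carries "status" instead
        markdown = page.get("markdown")  # absent unless include_markdown was true
        if markdown:
            parts.append(markdown)
    return "\n\n".join(parts)
```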

Layout Cells

OcrLayout.cells is an ordered list in reading order. Each OcrCell includes:

  • bbox - Bounding box coordinates (x1, y1, x2, y2) in page space.
  • categoryLabel - Raw category string emitted by OCR.
  • category - Normalized category enum (OcrCategory).
  • text - OCR text for that cell.
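Because cells already arrive in reading order, plain text can be rebuilt by concatenating them. This sketch assumes bbox serializes in JSON as an object with x1, y1, x2, y2 keys, with x2 >= x1 and y2 >= y1; that shape is an assumption, so check it against a real response. Both helper names are hypothetical.

```python
def bbox_size(cell: dict) -> tuple[float, float]:
    """Width and height of a cell, assuming bbox is {"x1", "y1", "x2", "y2"}."""
    b = cell["bbox"]
    return b["x2"] - b["x1"], b["y2"] - b["y1"]


def page_text(page: dict) -> str:
    """Concatenate cell text; OcrLayout.cells is already in reading order."""
    cells = page.get("layout", {}).get("cells", [])
    return "\n".join(c["text"] for c in cells if c.get("text"))
```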

Common OcrCategory values include:

  • REST JSON: TEXT, TITLE, SECTION_HEADER, LIST_ITEM, TABLE, FORMULA, PICTURE, CAPTION, FOOTNOTE, PAGE_HEADER, PAGE_FOOTER, OTHER, UNKNOWN.
  • gRPC/CLI JSON: OCR_CATEGORY_TEXT, OCR_CATEGORY_TITLE, OCR_CATEGORY_SECTION_HEADER, OCR_CATEGORY_LIST_ITEM, OCR_CATEGORY_TABLE, OCR_CATEGORY_FORMULA, OCR_CATEGORY_PICTURE, OCR_CATEGORY_CAPTION, OCR_CATEGORY_FOOTNOTE, OCR_CATEGORY_PAGE_HEADER, OCR_CATEGORY_PAGE_FOOTER, OCR_CATEGORY_OTHER, OCR_CATEGORY_UNKNOWN.

Use categoryLabel when you need the raw OCR label; use category for normalized grouping.
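Since the same category is spelled TITLE in REST JSON and OCR_CATEGORY_TITLE in gRPC/CLI JSON, a client that reads both can strip the prefix before grouping. The sketch below does that and groups cell text by normalized category while keeping reading order within each group. The helper names are hypothetical, and treating a missing category as UNKNOWN is a choice made for this sketch.

```python
from collections import defaultdict

PREFIX = "OCR_CATEGORY_"


def normalize_category(value: str) -> str:
    """Map "OCR_CATEGORY_TITLE" (gRPC/CLI) and "TITLE" (REST) to "TITLE"."""
    return value[len(PREFIX):] if value.startswith(PREFIX) else value


def cells_by_category(page: dict) -> dict[str, list[str]]:
    """Group cell text by normalized category, preserving reading order per group."""
    groups: dict[str, list[str]] = defaultdict(list)
    for cell in page.get("layout", {}).get("cells", []):
        # Missing category is treated as UNKNOWN (an assumption of this sketch).
        category = normalize_category(cell.get("category", "UNKNOWN"))
        groups[category].append(cell.get("text", ""))
    return dict(groups)
```

Filtering out PAGE_HEADER and PAGE_FOOTER groups is a common next step when you only want body text.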

Timings

DocumentTimings summarizes end-to-end and aggregate work:

  • wallTimeMs - Total request time.
  • sumQueueWaitMs, sumRenderMs, sumOcrMs, sumPageTotalMs - Aggregates over pages.

PageTimings provides per-page values:

  • queueWaitMs - Time waiting to render.
  • renderMs - Time rendering or decoding.
  • ocrMs - Time spent in OCR.
  • totalMs - Sum of queue, render, and OCR time.
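To see where time goes, you can recompute the document-level sums from per-page values and find each page's dominant stage. This is a sketch under the assumption that the sum* fields are plain sums of the matching PageTimings fields. Whether the server also counts time spent on failed pages is not stated here, so small differences from DocumentTimings are possible. The helper names are hypothetical.

```python
# DocumentTimings aggregate field -> PageTimings field it sums.
TIMING_PAIRS = {
    "sumQueueWaitMs": "queueWaitMs",
    "sumRenderMs": "renderMs",
    "sumOcrMs": "ocrMs",
    "sumPageTotalMs": "totalMs",
}


def recompute_aggregates(resp: dict) -> dict[str, float]:
    """Sum per-page timings of successful pages into DocumentTimings-style keys."""
    totals = {agg: 0.0 for agg in TIMING_PAIRS}
    for entry in resp.get("pages", []):
        # Entries with "status" have no page payload and contribute nothing here.
        timings = entry.get("page", {}).get("timings", {})
        for agg, field in TIMING_PAIRS.items():
            totals[agg] += timings.get(field, 0)
    return totals


def slowest_stage(page_timings: dict) -> str:
    """Name of the stage (queue, render, or OCR) that took longest on a page."""
    stages = {k: page_timings.get(k, 0) for k in ("queueWaitMs", "renderMs", "ocrMs")}
    return max(stages, key=stages.get)
```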

Image Metadata

ImageInfo includes:

  • widthPx, heightPx - Rendered image dimensions.
  • dpi - Rendering DPI (0 when unknown, for example for some image inputs).
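When dpi is known, the rendered dimensions convert directly to a physical page size (pixels divided by DPI). This sketch returns None when dpi is 0. It also treats a missing dpi as 0, since proto3 JSON serializers omit zero-valued fields by default. page_size_inches is a hypothetical helper.

```python
from typing import Optional, Tuple


def page_size_inches(image: dict) -> Optional[Tuple[float, float]]:
    """Physical page size (width, height) in inches from ImageInfo, or None if dpi is unknown."""
    dpi = image.get("dpi", 0)  # may be absent when the value is 0
    if not dpi:
        return None  # e.g. some image inputs report dpi = 0
    return image["widthPx"] / dpi, image["heightPx"] / dpi


print(page_size_inches({"widthPx": 1700, "heightPx": 2200, "dpi": 200}))  # (8.5, 11.0)
```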
