OCR Output Anatomy
Understand the structure of OcrDocumentResponse and per-page OCR results.
The OCR APIs return an OcrDocumentResponse with page-ordered results, layout structure, and
timing data. This page walks through the fields you will see in JSON or gRPC responses.
Availability
OCR is provided by the GoodMem OCR add-on service/image and is not included in the base install.
Enable the add-on and configure GOODMEM_OCR_BASE_URL (or --ocr-base-url) before using OCR.
Top-level Response
OcrDocumentResponse contains (JSON field names shown):
- detectedFormat - The resolved input format (even if the request used INPUT_FORMAT_UNSPECIFIED).
- pageCount - Number of pages processed after applying a page range.
- pages - Ordered list of OcrPageResult entries.
- timings - Aggregate timings for the request.
pageCount matches the length of pages. If you request pages 2-4 (0-based), pageCount is
3, and each OcrPageResult.pageIndex reflects the original document index.
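As a minimal sketch of the page-range behavior described above, the following Python checks a hand-written sample response. The sample payload is illustrative, not real service output, and original_indices is a hypothetical helper; only the field names pageCount, pages, and pageIndex come from this page.

```python
import json

# Hand-written sample for a request covering pages 2-4 (0-based).
# Illustrative only; real responses carry more fields per page.
SAMPLE = """
{
  "pageCount": 3,
  "pages": [
    {"pageIndex": 2, "page": {}},
    {"pageIndex": 3, "page": {}},
    {"pageIndex": 4, "page": {}}
  ]
}
"""


def original_indices(response: dict) -> list:
    """Return the original document index of each returned page entry."""
    pages = response.get("pages", [])
    # pageCount is documented to match the number of page entries.
    if response.get("pageCount") != len(pages):
        raise ValueError("pageCount does not match len(pages)")
    return [entry["pageIndex"] for entry in pages]


response = json.loads(SAMPLE)
print(original_indices(response))  # [2, 3, 4]
```

The indices start at 2 rather than 0 because pageIndex refers to the position in the source document, not the position in the pages list.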
JSON responses (REST and CLI) use lower camel case field names. The gRPC/proto fields use
snake_case names in the .proto definition.
Per-page Results
Each OcrPageResult holds exactly one of the following (a oneof):
- page - The successful OCR payload.
- status - Per-page error status. gRPC returns a google.rpc.Status message; REST returns a simplified object with code and optional message.
Per-page errors do not fail the entire request. The response still returns normally with error entries where OCR or rendering failed.
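Because a response can mix successful pages and error entries, client code usually needs to separate the two. The sketch below assumes the REST JSON shape described above; split_results is a hypothetical helper, and the code value and message text in the sample are placeholders rather than values the service is known to emit.

```python
def split_results(response: dict) -> tuple:
    """Separate successful page entries from per-page error entries."""
    succeeded, failed = [], []
    for entry in response.get("pages", []):
        if "page" in entry:
            succeeded.append(entry)
        elif "status" in entry:
            failed.append(entry)
    return succeeded, failed


# Hand-written sample; the status code and message are placeholders.
sample = {
    "pages": [
        {"pageIndex": 0, "page": {"layout": {"cells": []}}},
        {"pageIndex": 1, "status": {"code": 13, "message": "render failed"}},
    ]
}

ok, errors = split_results(sample)
for entry in errors:
    status = entry["status"]
    # message is optional in the REST form, so fall back to an empty string.
    print(entry["pageIndex"], status["code"], status.get("message", ""))
```

Checking for the page key first means an entry is only treated as an error when it carries no payload, which matches the oneof semantics.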
Page Payload
OcrPage includes:
- rawJson (optional) - Raw OCR JSON from the backend when include_raw_json is true.
- markdown (optional) - Markdown rendering when include_markdown is true.
- layout - Parsed layout structure (OcrLayout).
- timings - Per-page timing breakdown (PageTimings).
- image - Rendered image metadata (ImageInfo).
Layout Cells
OcrLayout.cells is an ordered list in reading order. Each OcrCell includes:
- bbox - Bounding box coordinates (x1, y1, x2, y2) in page space.
- categoryLabel - Raw category string emitted by OCR.
- category - Normalized category enum (OcrCategory).
- text - OCR text for that cell.
Common OcrCategory values include:
- REST JSON: TEXT, TITLE, SECTION_HEADER, LIST_ITEM, TABLE, FORMULA, PICTURE, CAPTION, FOOTNOTE, PAGE_HEADER, PAGE_FOOTER, OTHER, UNKNOWN.
- gRPC/CLI JSON: OCR_CATEGORY_TEXT, OCR_CATEGORY_TITLE, OCR_CATEGORY_SECTION_HEADER, OCR_CATEGORY_LIST_ITEM, OCR_CATEGORY_TABLE, OCR_CATEGORY_FORMULA, OCR_CATEGORY_PICTURE, OCR_CATEGORY_CAPTION, OCR_CATEGORY_FOOTNOTE, OCR_CATEGORY_PAGE_HEADER, OCR_CATEGORY_PAGE_FOOTER, OCR_CATEGORY_OTHER, OCR_CATEGORY_UNKNOWN.
Use categoryLabel when you need the raw OCR label; use category for normalized grouping.
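Since the two JSON forms spell the same category differently, code that handles both may want to normalize before grouping. The following is a rough sketch under that assumption: normalize_category and text_by_category are hypothetical helpers, and the categoryLabel strings in the sample are made up for illustration.

```python
from collections import defaultdict

PREFIX = "OCR_CATEGORY_"


def normalize_category(value: str) -> str:
    """Map gRPC/CLI names (OCR_CATEGORY_TITLE) to the REST form (TITLE)."""
    return value[len(PREFIX):] if value.startswith(PREFIX) else value


def text_by_category(cells: list) -> dict:
    """Group cell text by normalized category, keeping reading order."""
    grouped = defaultdict(list)
    for cell in cells:  # OcrLayout.cells is already in reading order
        key = normalize_category(cell["category"])
        grouped[key].append(cell.get("text", ""))
    return dict(grouped)


# Hand-written cells mixing both spellings; labels are illustrative.
cells = [
    {"category": "OCR_CATEGORY_TITLE", "categoryLabel": "Title", "text": "Report"},
    {"category": "TEXT", "categoryLabel": "Text", "text": "First paragraph."},
    {"category": "OCR_CATEGORY_TEXT", "categoryLabel": "Text", "text": "Second paragraph."},
]

print(text_by_category(cells))
```

Grouping on category rather than categoryLabel keeps the result stable if the OCR backend changes its raw label strings.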
Timings
DocumentTimings summarizes end-to-end and aggregate work:
- wallTimeMs - Total request time.
- sumQueueWaitMs, sumRenderMs, sumOcrMs, sumPageTotalMs - Aggregates over pages.
PageTimings provides per-page values:
- queueWaitMs - Time waiting to render.
- renderMs - Time rendering or decoding.
- ocrMs - Time spent in OCR.
- totalMs - Sum of queue, render, and OCR time.
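To illustrate how the per-page values relate to the document-level aggregates, this sketch recomputes the sums from a hand-written response. It assumes the aggregates are plain sums over pages that carry a payload, which this page does not spell out; sum_page_timings is a hypothetical helper and the millisecond values are invented.

```python
FIELDS = {
    "queueWaitMs": "sumQueueWaitMs",
    "renderMs": "sumRenderMs",
    "ocrMs": "sumOcrMs",
    "totalMs": "sumPageTotalMs",
}


def sum_page_timings(response: dict) -> dict:
    """Add up PageTimings across entries that carry a page payload."""
    totals = {name: 0 for name in FIELDS.values()}
    for entry in response.get("pages", []):
        timings = entry.get("page", {}).get("timings")
        if not timings:
            continue  # error entries have no page payload to read timings from
        for page_field, sum_field in FIELDS.items():
            # int() is a precaution: protobuf JSON encodes 64-bit ints as strings.
            totals[sum_field] += int(timings.get(page_field, 0))
    return totals


# Hand-written sample; each totalMs equals queue + render + OCR time.
sample = {
    "pages": [
        {"pageIndex": 0, "page": {"timings": {
            "queueWaitMs": 5, "renderMs": 20, "ocrMs": 100, "totalMs": 125}}},
        {"pageIndex": 1, "page": {"timings": {
            "queueWaitMs": 10, "renderMs": 30, "ocrMs": 200, "totalMs": 240}}},
        {"pageIndex": 2, "status": {"code": 13}},
    ]
}

print(sum_page_timings(sample))
```

When pages are processed concurrently, wallTimeMs can be lower than sumPageTotalMs, so the two are worth reading separately rather than expecting them to match.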
Image Metadata
ImageInfo includes:
- widthPx, heightPx - Rendered image dimensions.
- dpi - Rendering DPI (0 when unknown, such as some image inputs).
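One practical use of these fields is recovering the physical page size, which is pixels divided by DPI. The sketch below is a hypothetical helper, not part of any GoodMem client; it returns None when dpi is 0 so callers do not divide by zero.

```python
from typing import Optional, Tuple


def physical_size_inches(image: dict) -> Optional[Tuple[float, float]]:
    """Return (width, height) in inches, or None when the DPI is unknown."""
    dpi = image.get("dpi", 0)
    if not dpi:
        return None  # dpi is 0 when unknown, e.g. for some image inputs
    return image["widthPx"] / dpi, image["heightPx"] / dpi


# Illustrative values: a US Letter page rendered at 200 DPI.
print(physical_size_inches({"widthPx": 1700, "heightPx": 2200, "dpi": 200}))  # (8.5, 11.0)
print(physical_size_inches({"widthPx": 640, "heightPx": 480, "dpi": 0}))      # None
```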
Next, review limits and error handling.