Ocr

Services

OcrService Service

OCR service for extracting layout-aware text from documents and images.

Auth: gRPC metadata authorization: Bearer <api-key>. Permissions Required: OCR_DOCUMENT.
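
As a minimal sketch of the auth requirement above, the helper below builds the metadata entry for a call. The function name `auth_metadata` is hypothetical and not part of the service; only the `authorization` key and the `Bearer <api-key>` value format come from this page.

```python
from typing import List, Tuple


def auth_metadata(api_key: str) -> List[Tuple[str, str]]:
    """Build gRPC call metadata carrying the API key as a Bearer token."""
    if not api_key:
        raise ValueError("api_key must be non-empty")
    # gRPC metadata keys are lowercase ASCII strings.
    return [("authorization", f"Bearer {api_key}")]
```

With the Python `grpcio` package, a list like this is normally passed through the `metadata=` keyword argument of a stub method call.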

Global Errors: All RPCs may return DEADLINE_EXCEEDED, CANCELLED, UNAVAILABLE, RESOURCE_EXHAUSTED, INTERNAL.

OcrDocument

Summary: Runs layout-aware OCR on a document or image and returns per-page results.

Kind	Type
Request	`goodmem.v1.OcrDocumentRequest`
Response	`goodmem.v1.OcrDocumentResponse`

Auth: gRPC metadata authorization: Bearer <api-key>. Permissions Required: OCR_DOCUMENT.

Request: OcrDocumentRequest containing document bytes and optional output flags.

Response: OcrDocumentResponse with page-ordered results and timing stats.

Side Effects: None.

Idempotency: Non-idempotent; repeated calls trigger new OCR requests.

Error Codes:

UNAUTHENTICATED: missing/invalid auth
PERMISSION_DENIED: lacks OCR_DOCUMENT
INVALID_ARGUMENT: unsupported format, invalid document, invalid parameters
RESOURCE_EXHAUSTED: document exceeds configured size/pixel/page limits
INTERNAL: unexpected OCR or rendering failure
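
One way a client might act on these codes is sketched below. The grouping is an assumption, not documented service behavior: transient transport codes are retried, codes that point at the request itself are surfaced to the caller, and everything else is reported. The function name `classify_error` and the three outcome labels are hypothetical.

```python
# Assumed client-side policy; the service does not prescribe one.
RETRYABLE = frozenset({"UNAVAILABLE", "DEADLINE_EXCEEDED"})
CALLER_ERRORS = frozenset({
    "UNAUTHENTICATED",     # missing/invalid auth
    "PERMISSION_DENIED",   # lacks OCR_DOCUMENT
    "INVALID_ARGUMENT",    # unsupported format, invalid document or parameters
    "RESOURCE_EXHAUSTED",  # document exceeds size/pixel/page limits
})


def classify_error(code_name: str) -> str:
    """Map a gRPC status code name to 'retry', 'fix_request', or 'report'."""
    if code_name in RETRYABLE:
        return "retry"
    if code_name in CALLER_ERRORS:
        return "fix_request"
    # INTERNAL, CANCELLED, and anything unrecognized.
    return "report"
```

Because the RPC has no side effects, retrying a transient failure only costs another OCR run. Resending the same oversized document after RESOURCE_EXHAUSTED would be expected to fail again, which is why this sketch does not retry it.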

Messages

OcrDocumentRequest

Request to run OCR on a document or image.

If a page range is provided, only pages within the inclusive range are processed.

Field	Type	Description
`content`	`bytes`	Raw document bytes
`format`	`goodmem.v1.InputFormat`	Optional hint; UNSPECIFIED = auto-detect
`include_raw_json`	`bool`	Include raw OCR JSON payload
`include_markdown`	`bool`	Include markdown rendering from layout text
`start_page`	`int32`	0-based inclusive start page (defaults to 0)
`end_page`	`int32`	0-based inclusive end page (defaults to last page)
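
The inclusive, 0-based range semantics can be illustrated with a small client-side helper. This is a sketch under assumptions: `resolve_page_range` is a hypothetical name, unset fields are modeled as `None`, and out-of-range values are rejected locally because this page does not say how the server treats them.

```python
from typing import Optional


def resolve_page_range(
    total_pages: int,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
) -> range:
    """Return the 0-based page indices selected by an inclusive range."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    start = 0 if start_page is None else start_page          # defaults to 0
    end = total_pages - 1 if end_page is None else end_page  # defaults to last page
    if start < 0 or end < start or end >= total_pages:
        raise ValueError(f"invalid page range [{start}, {end}] for {total_pages} pages")
    # end is inclusive, so the Python range stops at end + 1.
    return range(start, end + 1)
```

For a 10-page document, `resolve_page_range(10, 2, 4)` selects three pages, which matches how `page_count` in the response counts pages after the range is applied.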

OcrDocumentResponse

Response containing page-ordered OCR results.

Field	Type	Description
`detected_format`	`goodmem.v1.InputFormat`	Detected format (resolved even if request was UNSPECIFIED)
`page_count`	`uint32`	Number of pages processed after applying the range
`pages`	`goodmem.v1.OcrPageResult`	Ordered per-page results (0-based indices)
`timings`	`goodmem.v1.DocumentTimings`	Aggregate timing statistics

OcrPageResult

Per-page OCR result with success payload or error status.

Field	Type	Description
`page_index`	`int32`	0-based page index
`page`	`goodmem.v1.OcrPage`	OCR output for the page
`status`	`google.rpc.Status`	Error status for the page
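
Since each page carries either a payload or an error status, a caller will usually separate the two. The sketch below uses plain dataclasses as stand-ins for the generated message classes, so `Status`, `PageResult`, and `split_results` are hypothetical names. It assumes a page succeeded when `status` is absent or has code 0 (OK in `google.rpc.Status`); this page does not state how the field is populated on success.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Status:
    """Stand-in for google.rpc.Status (code 0 means OK)."""
    code: int = 0
    message: str = ""


@dataclass
class PageResult:
    """Stand-in for goodmem.v1.OcrPageResult."""
    page_index: int
    page: Optional[dict] = None
    status: Optional[Status] = None


def split_results(
    results: List[PageResult],
) -> Tuple[List[PageResult], List[Tuple[int, str]]]:
    """Separate successful pages from (page_index, message) failures."""
    succeeded: List[PageResult] = []
    failed: List[Tuple[int, str]] = []
    for result in results:
        if result.status is not None and result.status.code != 0:
            failed.append((result.page_index, result.status.message))
        else:
            succeeded.append(result)
    return succeeded, failed
```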

OcrPage

OCR output for a single page.

Field	Type	Description
`raw_json`	`string`	Raw OCR JSON payload when requested
`markdown`	`string`	Markdown rendering when requested
`layout`	`goodmem.v1.OcrLayout`	Parsed layout output
`timings`	`goodmem.v1.PageTimings`	Timing breakdown for the page
`image`	`goodmem.v1.ImageInfo`	Rendered image metadata

OcrLayout

Parsed layout output for a page.

Field	Type	Description
`cells`	`goodmem.v1.OcrCell`	Layout cells in reading order

OcrCell

A single layout element in OCR output.

Field	Type	Description
`bbox`	`goodmem.v1.BoundingBox`	Bounding box in page coordinates
`category_label`	`string`	Raw label emitted by OCR
`category`	`goodmem.v1.OcrCategory`	Normalized category
`text`	`string`	OCR text content (may be empty)
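
Because cells arrive in reading order and carry a normalized category, plain text can be assembled by filtering and joining them. The sketch below is illustrative only: `Cell` is a simplified stand-in for `OcrCell` with the category held as its enum name, and dropping page headers and footers is a choice made for this example, not service behavior.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Cell:
    """Simplified stand-in for goodmem.v1.OcrCell."""
    category: str
    text: str


# Categories treated as page furniture in this example.
SKIPPED = frozenset({"OCR_CATEGORY_PAGE_HEADER", "OCR_CATEGORY_PAGE_FOOTER"})


def cells_to_text(cells: Iterable[Cell]) -> str:
    """Join non-empty cell text in the given (reading) order."""
    parts = [
        cell.text.strip()
        for cell in cells
        if cell.category not in SKIPPED and cell.text.strip()
    ]
    return "\n\n".join(parts)
```

The empty-text check matters because `text` may be empty, for example for picture cells.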

BoundingBox

Bounding box coordinates in page space.

Field	Type	Description
`x1`	`double`	Left coordinate
`y1`	`double`	Top coordinate
`x2`	`double`	Right coordinate
`y2`	`double`	Bottom coordinate
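
A short sketch of working with these coordinates follows. `BBox`, `bbox_size`, `bbox_area`, and `bbox_contains` are hypothetical helpers. The code assumes `x2 >= x1` and `y2 >= y1`, with y increasing downward since `y1` is the top edge; the unit of page space is not stated on this page.

```python
from typing import NamedTuple, Tuple


class BBox(NamedTuple):
    """Stand-in for goodmem.v1.BoundingBox."""
    x1: float  # left
    y1: float  # top
    x2: float  # right
    y2: float  # bottom


def bbox_size(box: BBox) -> Tuple[float, float]:
    """Return (width, height) of the box."""
    return (box.x2 - box.x1, box.y2 - box.y1)


def bbox_area(box: BBox) -> float:
    """Return the box area, treating inverted boxes as empty."""
    width, height = bbox_size(box)
    return max(width, 0.0) * max(height, 0.0)


def bbox_contains(box: BBox, x: float, y: float) -> bool:
    """Return True if the point lies inside or on the box edges."""
    return box.x1 <= x <= box.x2 and box.y1 <= y <= box.y2
```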

ImageInfo

Metadata about the rendered page image.

Field	Type	Description
`width_px`	`uint32`	Rendered image width in pixels
`height_px`	`uint32`	Rendered image height in pixels
`dpi`	`uint32`	Rendering DPI
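
The three fields together give the physical size of the rendered page, since pixels divided by dots per inch yields inches. A small sketch, with `page_size_inches` as a hypothetical helper name:

```python
from typing import Tuple


def page_size_inches(width_px: int, height_px: int, dpi: int) -> Tuple[float, float]:
    """Convert rendered pixel dimensions to inches using the rendering DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (width_px / dpi, height_px / dpi)
```

For example, a 1700 x 2200 pixel image rendered at 200 DPI corresponds to 8.5 x 11 inches.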

PageTimings

Timing breakdown for a page.

Field	Type	Description
`queue_wait_ms`	`uint64`	Time spent waiting to render
`render_ms`	`uint64`	Time spent rendering the page
`ocr_ms`	`uint64`	Time spent running OCR
`total_ms`	`uint64`	Total page processing time

DocumentTimings

Aggregate timing statistics for the request.

Field	Type	Description
`wall_time_ms`	`uint64`	End-to-end request time
`sum_queue_wait_ms`	`uint64`	Sum of per-page queue wait times
`sum_render_ms`	`uint64`	Sum of per-page render times
`sum_ocr_ms`	`uint64`	Sum of per-page OCR times
`sum_page_total_ms`	`uint64`	Sum of per-page total times
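
One possible use of these aggregates is to estimate how much page work overlapped and where time went. The interpretation below is an assumption: dividing `sum_page_total_ms` by `wall_time_ms` approximates average concurrency only if per-page totals cover the whole request. The helper name `timing_summary` and its output keys are hypothetical.

```python
from typing import Dict, Optional


def timing_summary(
    wall_time_ms: int,
    sum_queue_wait_ms: int,
    sum_render_ms: int,
    sum_ocr_ms: int,
    sum_page_total_ms: int,
) -> Dict[str, Optional[float]]:
    """Derive rough concurrency and per-stage shares from DocumentTimings."""

    def share(part: int) -> Optional[float]:
        # Fraction of summed page time spent in one stage.
        return part / sum_page_total_ms if sum_page_total_ms else None

    return {
        # Values above 1.0 suggest pages were processed in parallel.
        "avg_concurrency": sum_page_total_ms / wall_time_ms if wall_time_ms else None,
        "queue_share": share(sum_queue_wait_ms),
        "render_share": share(sum_render_ms),
        "ocr_share": share(sum_ocr_ms),
    }
```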

Enums

InputFormat

Supported input formats for OCR.

When INPUT_FORMAT_UNSPECIFIED is provided, the server attempts to infer the format by sniffing file signatures. Unsupported formats (including GIF, PS, EPS, WebP, HEIC) are rejected with INVALID_ARGUMENT.

Name	Value	Description
`INPUT_FORMAT_UNSPECIFIED`	0	Auto-detect format by signature sniffing
`INPUT_FORMAT_PDF`	1	PDF document
`INPUT_FORMAT_TIFF`	2	TIFF image (single or multi-page)
`INPUT_FORMAT_PNG`	3	PNG image
`INPUT_FORMAT_JPEG`	4	JPEG image
`INPUT_FORMAT_BMP`	5	BMP image
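
Signature sniffing for the supported formats can be sketched on the client side with the well-known magic bytes of each format. This is not the server's implementation, which is not shown on this page; `sniff_format` is a hypothetical helper that returns enum names as strings and only checks the very start of the data.

```python
def sniff_format(data: bytes) -> str:
    """Guess the InputFormat enum name from leading file-signature bytes."""
    if data.startswith(b"%PDF-"):
        return "INPUT_FORMAT_PDF"
    # Classic TIFF: little-endian "II*\0" or big-endian "MM\0*".
    if data.startswith((b"II*\x00", b"MM\x00*")):
        return "INPUT_FORMAT_TIFF"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "INPUT_FORMAT_PNG"
    if data.startswith(b"\xff\xd8\xff"):
        return "INPUT_FORMAT_JPEG"
    if data.startswith(b"BM"):
        return "INPUT_FORMAT_BMP"
    # No known signature; the caller can leave the hint unset.
    return "INPUT_FORMAT_UNSPECIFIED"
```

A check like this lets a client fail fast on obviously unsupported input such as GIF before uploading the bytes.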

OcrCategory

Known dots.ocr category labels for layout parsing.

Name	Value	Description
`OCR_CATEGORY_UNSPECIFIED`	0	Unspecified or unknown category
`OCR_CATEGORY_CAPTION`	1	Caption
`OCR_CATEGORY_FOOTNOTE`	2	Footnote
`OCR_CATEGORY_FORMULA`	3	Formula
`OCR_CATEGORY_LIST_ITEM`	4	List-item
`OCR_CATEGORY_PAGE_FOOTER`	5	Page-footer
`OCR_CATEGORY_PAGE_HEADER`	6	Page-header
`OCR_CATEGORY_PICTURE`	7	Picture
`OCR_CATEGORY_SECTION_HEADER`	8	Section-header
`OCR_CATEGORY_TABLE`	9	Table
`OCR_CATEGORY_TEXT`	10	Text
`OCR_CATEGORY_TITLE`	11	Title
`OCR_CATEGORY_OTHER`	12	Other
`OCR_CATEGORY_UNKNOWN`	13	Unknown
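
The relationship between a raw `category_label` such as `List-item` and the normalized enum can be sketched as a string transformation. This is illustrative only: `normalize_category` is a hypothetical helper, and falling back to `OCR_CATEGORY_UNSPECIFIED` for unrecognized labels is an assumption, since the server's fallback is not documented here.

```python
KNOWN_CATEGORIES = frozenset({
    "CAPTION", "FOOTNOTE", "FORMULA", "LIST_ITEM", "PAGE_FOOTER",
    "PAGE_HEADER", "PICTURE", "SECTION_HEADER", "TABLE", "TEXT",
    "TITLE", "OTHER", "UNKNOWN",
})


def normalize_category(label: str) -> str:
    """Map a raw OCR label (e.g. 'Section-header') to an OcrCategory name."""
    key = label.strip().upper().replace("-", "_").replace(" ", "_")
    if key in KNOWN_CATEGORIES:
        return f"OCR_CATEGORY_{key}"
    # Assumed fallback for labels outside the known set.
    return "OCR_CATEGORY_UNSPECIFIED"
```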
