OCR
OCR service API reference
Services
OcrService Service
OCR service for extracting layout-aware text from documents and images.
Auth: gRPC metadata header authorization: Bearer <api-key>.
Permissions Required: OCR_DOCUMENT.
Global Errors: All RPCs may return DEADLINE_EXCEEDED, CANCELLED, UNAVAILABLE, RESOURCE_EXHAUSTED, or INTERNAL.
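A minimal Python sketch of building the credential metadata described above, assuming the grpcio client library. `auth_metadata` is a hypothetical helper; the generated stub and module names are not given in this reference, so the call site appears only as a comment.

```python
# Sketch: build the gRPC metadata that OcrService expects for authentication.
# `auth_metadata` is a hypothetical helper name, not part of any generated API.
from typing import List, Tuple


def auth_metadata(api_key: str) -> List[Tuple[str, str]]:
    """Return gRPC metadata carrying the API key as a Bearer token."""
    if not api_key:
        raise ValueError("api_key must be non-empty")
    # gRPC metadata keys must be lowercase; "authorization" matches the docs.
    return [("authorization", f"Bearer {api_key}")]


# With grpcio, the list would be passed per call, e.g. (stub name assumed):
#   response = stub.OcrDocument(request, metadata=auth_metadata(api_key))
if __name__ == "__main__":
    print(auth_metadata("my-key"))
```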
OcrDocument
Summary: Runs layout-aware OCR on a document or image and returns per-page results.
| Type | |
|---|---|
| Request | goodmem.v1.OcrDocumentRequest |
| Response | goodmem.v1.OcrDocumentResponse |
Auth: gRPC metadata header authorization: Bearer <api-key>.
Permissions Required: OCR_DOCUMENT.
Request: OcrDocumentRequest containing document bytes and optional output flags.
Response: OcrDocumentResponse with page-ordered results and timing stats.
Side Effects: None.
Idempotency: Non-idempotent; repeated calls trigger new OCR requests.
Error Codes:
UNAUTHENTICATED: missing or invalid auth
PERMISSION_DENIED: lacks OCR_DOCUMENT
INVALID_ARGUMENT: unsupported format, invalid document, or invalid parameters
RESOURCE_EXHAUSTED: document exceeds configured size, pixel, or page limits
INTERNAL: unexpected OCR or rendering failure
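Because OcrDocument has no side effects, retrying is safe apart from the cost of rerunning OCR. The sketch below is one possible client-side retry policy keyed on gRPC status code names; which codes count as transient is an assumption, not documented server behavior. RESOURCE_EXHAUSTED is treated as permanent here because, for this RPC, it can mean the document itself exceeds server limits.

```python
# Hypothetical retry policy for OcrDocument, keyed on gRPC status code names.
# Assumption: only transport-level transient failures are worth retrying.
TRANSIENT = frozenset({"UNAVAILABLE", "DEADLINE_EXCEEDED"})

# Not retried (assumption): UNAUTHENTICATED and PERMISSION_DENIED need a fixed
# key, INVALID_ARGUMENT needs a fixed request, RESOURCE_EXHAUSTED may mean the
# document is over the size/pixel/page limits, CANCELLED usually comes from the
# caller, and INTERNAL is left to the caller's judgment.


def should_retry(code_name: str, attempt: int, max_attempts: int = 3) -> bool:
    """Return True if a call that failed with `code_name` should be retried.

    `attempt` is the number of attempts already made (1 after the first call).
    """
    if attempt >= max_attempts:
        return False
    return code_name in TRANSIENT
```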
Messages
OcrDocumentRequest
Request to run OCR on a document or image.
If start_page or end_page is set, only pages within the inclusive range [start_page, end_page] are processed.
| Field | Type | Description |
|---|---|---|
content | bytes | Raw document bytes |
format | goodmem.v1.InputFormat | Optional hint; UNSPECIFIED = auto-detect |
include_raw_json | bool | Include raw OCR JSON payload |
include_markdown | bool | Include markdown rendering from layout text |
start_page | int32 | 0-based inclusive start page (defaults to 0) |
end_page | int32 | 0-based inclusive end page (defaults to last page) |
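The inclusive page-range semantics can be sketched as follows. `None` stands in for an unset field, which is an assumption about how "unset" is represented; this reference also does not say whether an out-of-range request is clamped or rejected, so the sketch rejects it (mirroring the INVALID_ARGUMENT "invalid parameters" case).

```python
from typing import List, Optional


def resolve_page_range(total_pages: int,
                       start_page: Optional[int] = None,
                       end_page: Optional[int] = None) -> List[int]:
    """Return the 0-based page indices selected by an inclusive range.

    None models an unset field (assumption): start defaults to 0 and end
    defaults to the last page, as the field descriptions state.
    """
    if total_pages <= 0:
        raise ValueError("document has no pages")
    start = 0 if start_page is None else start_page
    end = total_pages - 1 if end_page is None else end_page
    if not (0 <= start <= end < total_pages):
        raise ValueError(f"invalid range [{start}, {end}] for {total_pages} pages")
    # Both ends are inclusive, hence end + 1.
    return list(range(start, end + 1))
```

For a 10-page PDF, start_page=2 and end_page=4 select pages 2, 3, and 4, so page_count in the response would be 3.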
OcrDocumentResponse
Response containing page-ordered OCR results.
| Field | Type | Description |
|---|---|---|
detected_format | goodmem.v1.InputFormat | Detected format (resolved even if request was UNSPECIFIED) |
page_count | uint32 | Number of pages processed after applying the range |
pages | goodmem.v1.OcrPageResult | Ordered per-page results (0-based indices) |
timings | goodmem.v1.DocumentTimings | Aggregate timing statistics |
OcrPageResult
Per-page OCR result with success payload or error status.
| Field | Type | Description |
|---|---|---|
page_index | int32 | 0-based page index |
page | goodmem.v1.OcrPage | OCR output for the page (set on success)
status | google.rpc.Status | Error status for the page (set on failure)
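Since each OcrPageResult carries either a page payload or an error status, clients typically separate the two before using the output. The sketch uses plain dataclasses as hypothetical stand-ins for the generated message classes; with real protobuf objects, presence would instead be checked with `HasField("page")` / `HasField("status")`. A google.rpc.Status code of 0 means OK.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Status:        # hypothetical stand-in for google.rpc.Status
    code: int
    message: str


@dataclass
class Page:          # hypothetical stand-in for goodmem.v1.OcrPage (markdown only)
    markdown: str = ""


@dataclass
class PageResult:    # hypothetical stand-in for goodmem.v1.OcrPageResult
    page_index: int
    page: Optional[Page] = None
    status: Optional[Status] = None


def split_results(results: List[PageResult]
                  ) -> Tuple[List[Tuple[int, Page]], List[Tuple[int, Status]]]:
    """Split results into (index, page) successes and (index, status) failures."""
    ok, failed = [], []
    for r in results:
        if r.status is not None and r.status.code != 0:
            failed.append((r.page_index, r.status))
        elif r.page is not None:
            ok.append((r.page_index, r.page))
        # Results with neither field set are skipped.
    return ok, failed
```

Because results arrive in page order, the markdown of the successful pages can be joined directly, for example with `"\n\n".join(p.markdown for _, p in ok)`.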
OcrPage
OCR output for a single page.
| Field | Type | Description |
|---|---|---|
raw_json | string | Raw OCR JSON payload, populated when include_raw_json is true
markdown | string | Markdown rendering of the layout text, populated when include_markdown is true
layout | goodmem.v1.OcrLayout | Parsed layout output |
timings | goodmem.v1.PageTimings | Timing breakdown for the page |
image | goodmem.v1.ImageInfo | Rendered image metadata |
OcrLayout
Parsed layout output for a page.
| Field | Type | Description |
|---|---|---|
cells | goodmem.v1.OcrCell | Layout cells in reading order |
OcrCell
A single layout element in OCR output.
| Field | Type | Description |
|---|---|---|
bbox | goodmem.v1.BoundingBox | Bounding box in page coordinates |
category_label | string | Raw category label emitted by the OCR model
category | goodmem.v1.OcrCategory | Normalized category |
text | string | OCR text content (may be empty) |
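Because cells arrive in reading order with a normalized category, extracting body text reduces to a filter over the cell list. This is a sketch with a hypothetical dataclass stand-in for OcrCell; the enum values are copied from the OcrCategory table in this reference, and the choice of which categories to skip is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable


class OcrCategory(IntEnum):
    # Subset of the OcrCategory values listed in this reference.
    OCR_CATEGORY_UNSPECIFIED = 0
    OCR_CATEGORY_PAGE_FOOTER = 5
    OCR_CATEGORY_PAGE_HEADER = 6
    OCR_CATEGORY_PICTURE = 7
    OCR_CATEGORY_TEXT = 10


@dataclass
class OcrCell:       # hypothetical stand-in for goodmem.v1.OcrCell (bbox omitted)
    category: int
    text: str


# Categories that usually carry no body text (assumption).
SKIP = {
    OcrCategory.OCR_CATEGORY_PAGE_HEADER,
    OcrCategory.OCR_CATEGORY_PAGE_FOOTER,
    OcrCategory.OCR_CATEGORY_PICTURE,
}


def body_text(cells: Iterable[OcrCell]) -> str:
    """Join non-empty cell text in reading order, skipping SKIP categories."""
    parts = [c.text.strip() for c in cells
             if c.category not in SKIP and c.text.strip()]
    return "\n".join(parts)
```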
BoundingBox
Bounding box coordinates in page space.
| Field | Type | Description |
|---|---|---|
x1 | double | Left coordinate |
y1 | double | Top coordinate |
x2 | double | Right coordinate |
y2 | double | Bottom coordinate |
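With x1/y1 as the left/top edges and x2/y2 as the right/bottom edges, the y axis grows downward. The sketch below derives width, height, area, and the overlap between two boxes, assuming x1 <= x2 and y1 <= y2 (not stated explicitly here). The class is a hypothetical stand-in for the generated message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:   # hypothetical stand-in for goodmem.v1.BoundingBox
    x1: float  # left
    y1: float  # top
    x2: float  # right
    y2: float  # bottom

    @property
    def width(self) -> float:
        return max(0.0, self.x2 - self.x1)

    @property
    def height(self) -> float:
        # y1 is the top edge and y2 the bottom edge, so y grows downward.
        return max(0.0, self.y2 - self.y1)

    @property
    def area(self) -> float:
        return self.width * self.height

    def intersection_area(self, other: "BoundingBox") -> float:
        """Area shared by two boxes in the same page coordinate space."""
        w = min(self.x2, other.x2) - max(self.x1, other.x1)
        h = min(self.y2, other.y2) - max(self.y1, other.y1)
        return max(0.0, w) * max(0.0, h)
```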
ImageInfo
Metadata about the rendered page image.
| Field | Type | Description |
|---|---|---|
width_px | uint32 | Rendered image width in pixels |
height_px | uint32 | Rendered image height in pixels |
dpi | uint32 | Rendering DPI |
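An image rendered at `dpi` covers `width_px / dpi` inches horizontally; for example, a US Letter page (8.5 in wide) rendered at 300 DPI is 2550 px wide. The sketch converts pixels to inches and to PDF points (72 per inch). Applying it to BoundingBox values assumes those coordinates are in rendered-pixel space, which this reference does not state.

```python
def px_to_inches(px: float, dpi: int) -> float:
    """Convert a length in rendered pixels to inches."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return px / dpi


def px_to_points(px: float, dpi: int) -> float:
    """Convert a length in rendered pixels to PDF points (1/72 inch)."""
    return px_to_inches(px, dpi) * 72.0
```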
PageTimings
Timing breakdown for a page.
| Field | Type | Description |
|---|---|---|
queue_wait_ms | uint64 | Time spent queued before rendering
render_ms | uint64 | Time spent rendering the page |
ocr_ms | uint64 | Time spent running OCR |
total_ms | uint64 | Total page processing time |
DocumentTimings
Aggregate timing statistics for the request.
| Field | Type | Description |
|---|---|---|
wall_time_ms | uint64 | End-to-end request time |
sum_queue_wait_ms | uint64 | Sum of per-page queue wait times |
sum_render_ms | uint64 | Sum of per-page render times |
sum_ocr_ms | uint64 | Sum of per-page OCR times |
sum_page_total_ms | uint64 | Sum of per-page total times |
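The sum_* fields are defined as sums of the per-page PageTimings values, so they can be recomputed client-side. Comparing sum_page_total_ms with wall_time_ms gives a rough concurrency signal: a ratio above 1 means pages were in flight at the same time. The dataclasses and the `effective_parallelism` metric are illustrative assumptions, not fields of the API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PageTimings:   # hypothetical stand-in for goodmem.v1.PageTimings
    queue_wait_ms: int
    render_ms: int
    ocr_ms: int
    total_ms: int


@dataclass
class TimingSums:    # the sum_* fields of goodmem.v1.DocumentTimings
    sum_queue_wait_ms: int
    sum_render_ms: int
    sum_ocr_ms: int
    sum_page_total_ms: int


def aggregate(pages: Iterable[PageTimings]) -> TimingSums:
    """Recompute the per-document sums from per-page timings."""
    pages = list(pages)
    return TimingSums(
        sum_queue_wait_ms=sum(p.queue_wait_ms for p in pages),
        sum_render_ms=sum(p.render_ms for p in pages),
        sum_ocr_ms=sum(p.ocr_ms for p in pages),
        sum_page_total_ms=sum(p.total_ms for p in pages),
    )


def effective_parallelism(sum_page_total_ms: int, wall_time_ms: int) -> float:
    """Ratio of summed page time to wall time; > 1 implies overlapping pages."""
    return sum_page_total_ms / wall_time_ms if wall_time_ms else 0.0
```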
Enums
InputFormat
Supported input formats for OCR.
When INPUT_FORMAT_UNSPECIFIED is provided, the server attempts to infer the format by sniffing
file signatures. Unsupported formats (including GIF, PS, EPS, WebP, HEIC) are rejected with
INVALID_ARGUMENT.
| Name | Value | Description |
|---|---|---|
INPUT_FORMAT_UNSPECIFIED | 0 | Auto-detect format by signature sniffing |
INPUT_FORMAT_PDF | 1 | PDF document |
INPUT_FORMAT_TIFF | 2 | TIFF image (single or multi-page) |
INPUT_FORMAT_PNG | 3 | PNG image |
INPUT_FORMAT_JPEG | 4 | JPEG image |
INPUT_FORMAT_BMP | 5 | BMP image |
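Signature sniffing, as used for INPUT_FORMAT_UNSPECIFIED, can be approximated client-side by matching well-known magic numbers at the start of the content, for example to reject unsupported files before sending them. This is a sketch, not the server's detection logic: it checks only the leading bytes (a PDF with leading junk, or BigTIFF, would not be recognized) and returns UNSPECIFIED for anything else, including GIF.

```python
from enum import IntEnum


class InputFormat(IntEnum):
    # Values copied from the InputFormat table in this reference.
    INPUT_FORMAT_UNSPECIFIED = 0
    INPUT_FORMAT_PDF = 1
    INPUT_FORMAT_TIFF = 2
    INPUT_FORMAT_PNG = 3
    INPUT_FORMAT_JPEG = 4
    INPUT_FORMAT_BMP = 5


# Well-known file signatures (magic numbers) for the supported formats.
_SIGNATURES = [
    (b"%PDF-", InputFormat.INPUT_FORMAT_PDF),
    (b"II*\x00", InputFormat.INPUT_FORMAT_TIFF),          # little-endian TIFF
    (b"MM\x00*", InputFormat.INPUT_FORMAT_TIFF),          # big-endian TIFF
    (b"\x89PNG\r\n\x1a\n", InputFormat.INPUT_FORMAT_PNG),
    (b"\xff\xd8\xff", InputFormat.INPUT_FORMAT_JPEG),
    (b"BM", InputFormat.INPUT_FORMAT_BMP),
]


def sniff_format(content: bytes) -> InputFormat:
    """Return the detected format, or UNSPECIFIED if no signature matches."""
    for magic, fmt in _SIGNATURES:
        if content.startswith(magic):
            return fmt
    return InputFormat.INPUT_FORMAT_UNSPECIFIED
```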
OcrCategory
Known dots.ocr category labels for layout parsing.
| Name | Value | Description |
|---|---|---|
OCR_CATEGORY_UNSPECIFIED | 0 | Unspecified or unknown category |
OCR_CATEGORY_CAPTION | 1 | Caption |
OCR_CATEGORY_FOOTNOTE | 2 | Footnote |
OCR_CATEGORY_FORMULA | 3 | Formula |
OCR_CATEGORY_LIST_ITEM | 4 | List-item |
OCR_CATEGORY_PAGE_FOOTER | 5 | Page-footer |
OCR_CATEGORY_PAGE_HEADER | 6 | Page-header |
OCR_CATEGORY_PICTURE | 7 | Picture |
OCR_CATEGORY_SECTION_HEADER | 8 | Section-header |
OCR_CATEGORY_TABLE | 9 | Table |
OCR_CATEGORY_TEXT | 10 | Text |
OCR_CATEGORY_TITLE | 11 | Title |
OCR_CATEGORY_OTHER | 12 | Other |
OCR_CATEGORY_UNKNOWN | 13 | Unknown |
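OcrCell carries both the raw category_label and the normalized category. The sketch below shows one way such a normalization could work, assuming the raw labels are spelled as in the Description column above (the dots.ocr spellings). Mapping an empty label to UNSPECIFIED and an unrecognized one to UNKNOWN is a guess; the server's actual rules are not documented here.

```python
from enum import IntEnum


class OcrCategory(IntEnum):
    # Values copied from the OcrCategory table in this reference.
    OCR_CATEGORY_UNSPECIFIED = 0
    OCR_CATEGORY_CAPTION = 1
    OCR_CATEGORY_FOOTNOTE = 2
    OCR_CATEGORY_FORMULA = 3
    OCR_CATEGORY_LIST_ITEM = 4
    OCR_CATEGORY_PAGE_FOOTER = 5
    OCR_CATEGORY_PAGE_HEADER = 6
    OCR_CATEGORY_PICTURE = 7
    OCR_CATEGORY_SECTION_HEADER = 8
    OCR_CATEGORY_TABLE = 9
    OCR_CATEGORY_TEXT = 10
    OCR_CATEGORY_TITLE = 11
    OCR_CATEGORY_OTHER = 12
    OCR_CATEGORY_UNKNOWN = 13


# Raw label spellings taken from the Description column (assumption).
_LABELS = {
    "Caption": OcrCategory.OCR_CATEGORY_CAPTION,
    "Footnote": OcrCategory.OCR_CATEGORY_FOOTNOTE,
    "Formula": OcrCategory.OCR_CATEGORY_FORMULA,
    "List-item": OcrCategory.OCR_CATEGORY_LIST_ITEM,
    "Page-footer": OcrCategory.OCR_CATEGORY_PAGE_FOOTER,
    "Page-header": OcrCategory.OCR_CATEGORY_PAGE_HEADER,
    "Picture": OcrCategory.OCR_CATEGORY_PICTURE,
    "Section-header": OcrCategory.OCR_CATEGORY_SECTION_HEADER,
    "Table": OcrCategory.OCR_CATEGORY_TABLE,
    "Text": OcrCategory.OCR_CATEGORY_TEXT,
    "Title": OcrCategory.OCR_CATEGORY_TITLE,
    "Other": OcrCategory.OCR_CATEGORY_OTHER,
}


def normalize_label(raw: str) -> OcrCategory:
    """Map a raw category_label to OcrCategory (hypothetical rules)."""
    label = raw.strip()
    if not label:
        return OcrCategory.OCR_CATEGORY_UNSPECIFIED
    return _LABELS.get(label, OcrCategory.OCR_CATEGORY_UNKNOWN)
```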