GoodMem

Ocr

OCR service API reference

Services

OcrService Service

OCR service for extracting layout-aware text from documents and images.

Auth: gRPC metadata authorization: Bearer <api-key>. Permissions Required: OCR_DOCUMENT.
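As a minimal sketch of the auth line above, the bearer token can be attached as a gRPC metadata pair. The helper name `build_auth_metadata` and the example key are hypothetical and are not part of the GoodMem API.

```python
from typing import List, Tuple


def build_auth_metadata(api_key: str) -> List[Tuple[str, str]]:
    """Return call metadata carrying the API key as a bearer token.

    Hypothetical helper: only the key name and the "Bearer <api-key>"
    value format come from the reference text.
    """
    if not api_key:
        raise ValueError("api_key must be non-empty")
    # gRPC metadata keys are lowercase ASCII strings.
    return [("authorization", f"Bearer {api_key}")]


metadata = build_auth_metadata("example-api-key")
print(metadata)
```

With the Python `grpcio` package, a list like this is typically passed through the `metadata=` keyword of a stub call; the generated stub for OcrService is assumed and not shown here.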

Global Errors: All RPCs may return DEADLINE_EXCEEDED, CANCELLED, UNAVAILABLE, RESOURCE_EXHAUSTED, INTERNAL.

OcrDocument

Summary: Runs layout-aware OCR on a document or image and returns per-page results.

| | Type |
|---|---|
| Request | goodmem.v1.OcrDocumentRequest |
| Response | goodmem.v1.OcrDocumentResponse |

Auth: gRPC metadata authorization: Bearer <api-key>.

Permissions Required: OCR_DOCUMENT.

Request: OcrDocumentRequest containing document bytes and optional output flags.

Response: OcrDocumentResponse with page-ordered results and timing stats.

Side Effects: None.

Idempotency: Non-idempotent; repeated calls trigger new OCR requests.

Error Codes:

  • UNAUTHENTICATED: missing/invalid auth
  • PERMISSION_DENIED: lacks OCR_DOCUMENT
  • INVALID_ARGUMENT: unsupported format, invalid document, invalid parameters
  • RESOURCE_EXHAUSTED: document exceeds configured size/pixel/page limits
  • INTERNAL: unexpected OCR or rendering failure
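One way a client might react to the codes listed above is sketched below. The grouping into actions is an assumption made for illustration, not documented GoodMem behavior, and the function name `classify_error` is hypothetical. Because the RPC is non-idempotent, any retry triggers a new OCR run.

```python
# Codes that point at the request itself; resending the same call unchanged
# is not expected to help.
FIX_REQUEST = {"UNAUTHENTICATED", "PERMISSION_DENIED", "INVALID_ARGUMENT"}

# Assumption: these global errors are treated as transient by the client.
TRANSIENT = {"UNAVAILABLE", "DEADLINE_EXCEEDED"}


def classify_error(code_name: str) -> str:
    """Map a gRPC status code name to a coarse client action (illustrative)."""
    if code_name in FIX_REQUEST:
        return "fix_request"
    if code_name in TRANSIENT:
        return "retry_with_backoff"
    if code_name == "RESOURCE_EXHAUSTED":
        # The document exceeded a size/pixel/page limit; a smaller input or
        # a narrower page range may succeed (assumption).
        return "reduce_input"
    # INTERNAL, CANCELLED, and anything unrecognized.
    return "report"


for name in ("INVALID_ARGUMENT", "UNAVAILABLE", "RESOURCE_EXHAUSTED", "INTERNAL"):
    print(name, "->", classify_error(name))
```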

Messages

OcrDocumentRequest

Request to run OCR on a document or image.

If a page range is provided, only pages within the inclusive range are processed.

| Field | Type | Description |
|---|---|---|
| content | bytes | Raw document bytes |
| format | goodmem.v1.InputFormat | Optional hint; UNSPECIFIED = auto-detect |
| include_raw_json | bool | Include raw OCR JSON payload |
| include_markdown | bool | Include markdown rendering from layout text |
| start_page | int32 | 0-based inclusive start page (defaults to 0) |
| end_page | int32 | 0-based inclusive end page (defaults to last page) |
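The inclusive, 0-based page range semantics can be illustrated with a small client-side helper. This is a sketch only: `resolve_page_range` is a hypothetical name, unset fields are modeled as `None`, and rejecting out-of-bounds values locally is an assumption about how a caller might validate input, not a description of server behavior.

```python
from typing import Optional


def resolve_page_range(
    total_pages: int,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
) -> range:
    """Return the 0-based page indices covered by an inclusive range."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    # Defaults from the field table: start at page 0, end at the last page.
    start = 0 if start_page is None else start_page
    end = total_pages - 1 if end_page is None else end_page
    if start < 0 or end < start or end >= total_pages:
        raise ValueError(f"invalid page range [{start}, {end}] for {total_pages} pages")
    # end is inclusive, so the Python range stops at end + 1.
    return range(start, end + 1)


print(list(resolve_page_range(10, start_page=2, end_page=4)))  # [2, 3, 4]
print(len(resolve_page_range(10)))  # 10
```

Under these assumptions, the length of the returned range corresponds to the `page_count` a caller would expect for the request.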

OcrDocumentResponse

Response containing page-ordered OCR results.

| Field | Type | Description |
|---|---|---|
| detected_format | goodmem.v1.InputFormat | Detected format (resolved even if request was UNSPECIFIED) |
| page_count | uint32 | Number of pages processed after applying the range |
| pages | goodmem.v1.OcrPageResult | Ordered per-page results (0-based indices) |
| timings | goodmem.v1.DocumentTimings | Aggregate timing statistics |

OcrPageResult

Per-page OCR result with success payload or error status.

| Field | Type | Description |
|---|---|---|
| page_index | int32 | 0-based page index |
| page | goodmem.v1.OcrPage | OCR output for the page |
| status | google.rpc.Status | Error status for the page |
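Since each page carries either a payload or an error status, a caller will usually want to separate the two. The sketch below uses plain dataclasses as stand-ins for the generated messages (the real classes come from the proto definitions and are not shown in this reference); `split_results` is a hypothetical helper. It relies on `google.rpc.Status` using code 0 for OK.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Status:
    """Stand-in for google.rpc.Status (code 0 means OK)."""
    code: int
    message: str = ""


@dataclass
class PageResult:
    """Stand-in for goodmem.v1.OcrPageResult."""
    page_index: int
    page: Optional[dict] = None
    status: Optional[Status] = None


def split_results(
    results: List[PageResult],
) -> Tuple[List[PageResult], List[Tuple[int, str]]]:
    """Separate successful pages from (page_index, message) failures."""
    succeeded: List[PageResult] = []
    failed: List[Tuple[int, str]] = []
    for result in results:
        if result.status is not None and result.status.code != 0:
            failed.append((result.page_index, result.status.message))
        else:
            succeeded.append(result)
    return succeeded, failed


results = [
    PageResult(0, page={"markdown": "# Title"}),
    PageResult(1, status=Status(13, "render failed")),  # 13 = INTERNAL
]
ok, errors = split_results(results)
print([r.page_index for r in ok], errors)
```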

OcrPage

OCR output for a single page.

| Field | Type | Description |
|---|---|---|
| raw_json | string | Raw OCR JSON payload when requested |
| markdown | string | Markdown rendering when requested |
| layout | goodmem.v1.OcrLayout | Parsed layout output |
| timings | goodmem.v1.PageTimings | Timing breakdown for the page |
| image | goodmem.v1.ImageInfo | Rendered image metadata |

OcrLayout

Parsed layout output for a page.

| Field | Type | Description |
|---|---|---|
| cells | goodmem.v1.OcrCell | Layout cells in reading order |

OcrCell

A single layout element in OCR output.

| Field | Type | Description |
|---|---|---|
| bbox | goodmem.v1.BoundingBox | Bounding box in page coordinates |
| category_label | string | Raw label emitted by OCR |
| category | goodmem.v1.OcrCategory | Normalized category |
| text | string | OCR text content (may be empty) |
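Because cells arrive in reading order and `text` may be empty, plain text for a page can be assembled by concatenating non-empty cells. The sketch below uses a stand-in `Cell` dataclass rather than the generated OcrCell class; `cells_to_text` and its `skip_labels` parameter are hypothetical, and the raw label strings used in the example are assumed to match what the OCR engine emits.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class Cell:
    """Stand-in for goodmem.v1.OcrCell (bbox and category omitted)."""
    category_label: str
    text: str


def cells_to_text(cells: Iterable[Cell], skip_labels: Sequence[str] = ()) -> str:
    """Join cell text in the given (reading) order, one cell per line."""
    lines = []
    for cell in cells:
        if cell.category_label in skip_labels:
            continue  # caller asked to drop this kind of cell
        if not cell.text.strip():
            continue  # text may be empty, e.g. for pictures
        lines.append(cell.text.strip())
    return "\n".join(lines)


cells = [
    Cell("Page-header", "ACME Corp"),
    Cell("Title", "Quarterly Report"),
    Cell("Picture", ""),
    Cell("Text", "Revenue grew in Q3."),
]
print(cells_to_text(cells, skip_labels=("Page-header", "Page-footer")))
```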

BoundingBox

Bounding box coordinates in page space.

| Field | Type | Description |
|---|---|---|
| x1 | double | Left coordinate |
| y1 | double | Top coordinate |
| x2 | double | Right coordinate |
| y2 | double | Bottom coordinate |
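Width, height, and area follow directly from the four corner coordinates. The sketch below uses a stand-in dataclass and assumes x1 <= x2 and y1 <= y2 (that is, y grows downward from the top of the page); the reference does not state the coordinate units or orientation, so treat both as assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Stand-in for goodmem.v1.BoundingBox."""
    x1: float  # left
    y1: float  # top
    x2: float  # right
    y2: float  # bottom

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1

    @property
    def area(self) -> float:
        return self.width * self.height

    def contains(self, x: float, y: float) -> bool:
        """True if the point lies inside or on the edge of the box."""
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


box = BoundingBox(x1=100.0, y1=50.0, x2=300.0, y2=150.0)
print(box.width, box.height, box.area)  # 200.0 100.0 20000.0
```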

ImageInfo

Metadata about the rendered page image.

| Field | Type | Description |
|---|---|---|
| width_px | uint32 | Rendered image width in pixels |
| height_px | uint32 | Rendered image height in pixels |
| dpi | uint32 | Rendering DPI |
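Pixel dimensions and DPI together give the physical size of the rendered page, since DPI is pixels per inch. The helper below is a hypothetical illustration of that arithmetic; the example values are made up.

```python
from typing import Tuple


def physical_size_inches(width_px: int, height_px: int, dpi: int) -> Tuple[float, float]:
    """Convert rendered pixel dimensions to inches using the rendering DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (width_px / dpi, height_px / dpi)


def pixels_to_points(px: float, dpi: int) -> float:
    """Convert a pixel length to typographic points (72 points per inch)."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return px * 72.0 / dpi


# A page rendered at 200 DPI to 1700 x 2200 pixels is US Letter sized.
print(physical_size_inches(1700, 2200, 200))  # (8.5, 11.0)
print(pixels_to_points(200, 200))  # 72.0
```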

PageTimings

Timing breakdown for a page.

| Field | Type | Description |
|---|---|---|
| queue_wait_ms | uint64 | Time spent waiting to render |
| render_ms | uint64 | Time spent rendering the page |
| ocr_ms | uint64 | Time spent running OCR |
| total_ms | uint64 | Total page processing time |

DocumentTimings

Aggregate timing statistics for the request.

| Field | Type | Description |
|---|---|---|
| wall_time_ms | uint64 | End-to-end request time |
| sum_queue_wait_ms | uint64 | Sum of per-page queue wait times |
| sum_render_ms | uint64 | Sum of per-page render times |
| sum_ocr_ms | uint64 | Sum of per-page OCR times |
| sum_page_total_ms | uint64 | Sum of per-page total times |
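The aggregate fields are sums over the per-page timings, which the sketch below reproduces with a stand-in dataclass. The ratio of `sum_page_total_ms` to `wall_time_ms` is offered only as a rough indicator of how much page work overlapped; that interpretation is an assumption, and the helper names and sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class PageTimings:
    """Stand-in for goodmem.v1.PageTimings."""
    queue_wait_ms: int
    render_ms: int
    ocr_ms: int
    total_ms: int


def aggregate(pages: Iterable[PageTimings]) -> Dict[str, int]:
    """Sum per-page timings into the DocumentTimings sum_* fields."""
    sums = {
        "sum_queue_wait_ms": 0,
        "sum_render_ms": 0,
        "sum_ocr_ms": 0,
        "sum_page_total_ms": 0,
    }
    for p in pages:
        sums["sum_queue_wait_ms"] += p.queue_wait_ms
        sums["sum_render_ms"] += p.render_ms
        sums["sum_ocr_ms"] += p.ocr_ms
        sums["sum_page_total_ms"] += p.total_ms
    return sums


def overlap_ratio(sum_page_total_ms: int, wall_time_ms: int) -> float:
    """Ratio above 1.0 suggests pages were processed concurrently (assumption)."""
    if wall_time_ms <= 0:
        raise ValueError("wall_time_ms must be positive")
    return sum_page_total_ms / wall_time_ms


pages = [PageTimings(10, 100, 400, 510), PageTimings(50, 90, 380, 520)]
sums = aggregate(pages)
print(sums)
print(overlap_ratio(sums["sum_page_total_ms"], wall_time_ms=600))
```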

Enums

InputFormat

Supported input formats for OCR.

When INPUT_FORMAT_UNSPECIFIED is provided, the server attempts to infer the format by sniffing file signatures. Unsupported formats (including GIF, PS, EPS, WebP, HEIC) are rejected with INVALID_ARGUMENT.

| Name | Value | Description |
|---|---|---|
| INPUT_FORMAT_UNSPECIFIED | 0 | Auto-detect format by signature sniffing |
| INPUT_FORMAT_PDF | 1 | PDF document |
| INPUT_FORMAT_TIFF | 2 | TIFF image (single or multi-page) |
| INPUT_FORMAT_PNG | 3 | PNG image |
| INPUT_FORMAT_JPEG | 4 | JPEG image |
| INPUT_FORMAT_BMP | 5 | BMP image |
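Signature sniffing of the kind described above can be approximated on the client to catch unsupported files before upload. The sketch below checks the well-known leading bytes of each supported format; it is not the server's detection logic, which is not documented here, and `sniff_format` is a hypothetical name.

```python
from typing import Optional

# (leading bytes, enum name) pairs for the supported formats.
SIGNATURES = [
    (b"%PDF-", "INPUT_FORMAT_PDF"),
    (b"II*\x00", "INPUT_FORMAT_TIFF"),  # classic TIFF, little-endian
    (b"MM\x00*", "INPUT_FORMAT_TIFF"),  # classic TIFF, big-endian
    (b"\x89PNG\r\n\x1a\n", "INPUT_FORMAT_PNG"),
    (b"\xff\xd8\xff", "INPUT_FORMAT_JPEG"),
    (b"BM", "INPUT_FORMAT_BMP"),
]


def sniff_format(content: bytes) -> Optional[str]:
    """Return the matching InputFormat name, or None if nothing matches."""
    for signature, name in SIGNATURES:
        if content.startswith(signature):
            return name
    return None


print(sniff_format(b"%PDF-1.7\n..."))  # INPUT_FORMAT_PDF
print(sniff_format(b"GIF89a..."))  # None
```

A `None` result here corresponds to input the service would likely reject with INVALID_ARGUMENT, such as GIF or WebP.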

OcrCategory

Known dots.ocr category labels for layout parsing.

| Name | Value | Description |
|---|---|---|
| OCR_CATEGORY_UNSPECIFIED | 0 | Unspecified or unknown category |
| OCR_CATEGORY_CAPTION | 1 | Caption |
| OCR_CATEGORY_FOOTNOTE | 2 | Footnote |
| OCR_CATEGORY_FORMULA | 3 | Formula |
| OCR_CATEGORY_LIST_ITEM | 4 | List-item |
| OCR_CATEGORY_PAGE_FOOTER | 5 | Page-footer |
| OCR_CATEGORY_PAGE_HEADER | 6 | Page-header |
| OCR_CATEGORY_PICTURE | 7 | Picture |
| OCR_CATEGORY_SECTION_HEADER | 8 | Section-header |
| OCR_CATEGORY_TABLE | 9 | Table |
| OCR_CATEGORY_TEXT | 10 | Text |
| OCR_CATEGORY_TITLE | 11 | Title |
| OCR_CATEGORY_OTHER | 12 | Other |
| OCR_CATEGORY_UNKNOWN | 13 | Unknown |
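The relationship between the raw `category_label` and the normalized `category` can be pictured as a lookup from the label strings in the Description column to the enum values. The mapping below is a client-side sketch built from the table above; the fallback to OCR_CATEGORY_UNSPECIFIED for unrecognized labels is an assumption, and the server's actual normalization may differ.

```python
from enum import IntEnum


class OcrCategory(IntEnum):
    """Stand-in mirroring the goodmem.v1.OcrCategory values listed above."""
    OCR_CATEGORY_UNSPECIFIED = 0
    OCR_CATEGORY_CAPTION = 1
    OCR_CATEGORY_FOOTNOTE = 2
    OCR_CATEGORY_FORMULA = 3
    OCR_CATEGORY_LIST_ITEM = 4
    OCR_CATEGORY_PAGE_FOOTER = 5
    OCR_CATEGORY_PAGE_HEADER = 6
    OCR_CATEGORY_PICTURE = 7
    OCR_CATEGORY_SECTION_HEADER = 8
    OCR_CATEGORY_TABLE = 9
    OCR_CATEGORY_TEXT = 10
    OCR_CATEGORY_TITLE = 11
    OCR_CATEGORY_OTHER = 12
    OCR_CATEGORY_UNKNOWN = 13


def normalize_label(label: str) -> OcrCategory:
    """Map a raw label such as "Page-footer" to its enum value."""
    # "Page-footer" -> "OCR_CATEGORY_PAGE_FOOTER"
    name = "OCR_CATEGORY_" + label.strip().upper().replace("-", "_")
    try:
        return OcrCategory[name]
    except KeyError:
        # Assumption: anything unrecognized falls back to UNSPECIFIED.
        return OcrCategory.OCR_CATEGORY_UNSPECIFIED


print(normalize_label("Section-header").name)  # OCR_CATEGORY_SECTION_HEADER
print(int(normalize_label("Table")))  # 9
```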