OCR Limits and Errors

Supported inputs, range validation, and safety limits for OCR requests.

GoodMem validates OCR inputs before processing them and enforces safety limits on each page and on the request as a whole. This page summarizes the supported formats, the page range rules, and the error patterns you should expect.

Availability

OCR is provided by the GoodMem OCR add-on (a separate service and image) and is not included in the base install. Enable the add-on and set GOODMEM_OCR_BASE_URL (or pass --ocr-base-url) before using OCR.

Supported Inputs

Accepted input formats:

  • PDF (multi-page)
  • TIFF (single or multi-page)
  • PNG
  • JPEG
  • BMP

When the format is INPUT_FORMAT_UNSPECIFIED (--input-format auto in the CLI), GoodMem detects the format by sniffing the file signature. Unsupported formats such as GIF, PS, EPS, WebP, or HEIC return INVALID_ARGUMENT.
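Signature sniffing means inspecting the first bytes of the file for a known "magic number". The sketch below illustrates the idea using the well-known signatures of the five accepted formats. It is a hypothetical illustration, not GoodMem's detection code: the function name sniff_format and the use of ValueError to stand in for INVALID_ARGUMENT are assumptions.

```python
# Hypothetical sketch of signature sniffing for the accepted OCR formats.
# The magic numbers are the standard ones for each format; this is NOT
# GoodMem's actual implementation.

SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),             # little-endian TIFF
    (b"MM\x00*", "TIFF"),             # big-endian TIFF
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"BM", "BMP"),                   # short signature, checked last
]


def sniff_format(data: bytes) -> str:
    """Return the detected format name.

    Raises ValueError (standing in for INVALID_ARGUMENT) when no known
    signature matches, e.g. for GIF, PostScript, WebP, or HEIC input.
    """
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    raise ValueError("INVALID_ARGUMENT: unsupported or unrecognized input format")
```

For example, sniff_format(b"%PDF-1.7 ...") returns "PDF", while a GIF header (b"GIF89a") raises. A two-byte signature such as BMP's "BM" is weak on its own, so a real detector would likely validate further header fields.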

Page Range Rules

start_page and end_page are 0-based and inclusive:

  • Omitting start_page defaults to the first page.
  • Omitting end_page defaults to the last page.
  • The range must be within document bounds.
  • Negative values or end_page < start_page return INVALID_ARGUMENT.
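The rules above can be expressed as a small resolver that fills in defaults and then validates the range. This is a hypothetical sketch: resolve_page_range is not a GoodMem function, None stands for an omitted field, and ValueError stands in for INVALID_ARGUMENT.

```python
from typing import Optional, Tuple


def resolve_page_range(
    page_count: int,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
) -> Tuple[int, int]:
    """Apply the documented defaults and checks to a 0-based, inclusive range.

    Hypothetical helper; ValueError stands in for INVALID_ARGUMENT.
    """
    # Omitted start_page -> first page; omitted end_page -> last page.
    start = 0 if start_page is None else start_page
    end = page_count - 1 if end_page is None else end_page

    if start < 0 or end < 0:
        raise ValueError("INVALID_ARGUMENT: page indices must be non-negative")
    if start >= page_count or end >= page_count:
        raise ValueError("INVALID_ARGUMENT: range is outside document bounds")
    if end < start:
        raise ValueError("INVALID_ARGUMENT: end_page < start_page")
    return start, end
```

For a 10-page document, resolve_page_range(10) yields (0, 9) and resolve_page_range(10, 2, 4) yields (2, 4), which covers three pages because both ends are inclusive (end - start + 1).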

Safety Limits (Current Defaults)

OCR enforces safety limits on each page and on the total number of pages in a request. The current defaults are:

  • Maximum pages per request: 100,000
  • Maximum pixels per page: 50,000,000
  • Maximum width or height: 20,000 pixels

Requests that exceed these limits return RESOURCE_EXHAUSTED. PDF pages are rendered at 200 DPI by default before the limits are evaluated, so the pixel limits apply to the rendered page image.
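Because PDF page sizes are expressed in points (1/72 inch), the rendered size at 200 DPI is points × 200 / 72. A US Letter page (612 × 792 pt) therefore renders to 1,700 × 2,200 = 3,740,000 pixels, well under the limits, and the 20,000-pixel dimension limit corresponds to 100 inches at 200 DPI. The sketch below applies the default limits to a rendered page. It is hypothetical: the function names, the rounding choice, and treating a value equal to a limit as allowed are assumptions, and ValueError stands in for RESOURCE_EXHAUSTED.

```python
import math
from typing import Tuple

# Current documented defaults.
MAX_PAGES_PER_REQUEST = 100_000
MAX_PIXELS_PER_PAGE = 50_000_000
MAX_DIMENSION = 20_000
DEFAULT_PDF_DPI = 200


def rendered_size(width_pt: float, height_pt: float, dpi: int = DEFAULT_PDF_DPI) -> Tuple[int, int]:
    """Convert a PDF page size in points to pixels (rounding up is an assumption)."""
    return math.ceil(width_pt * dpi / 72), math.ceil(height_pt * dpi / 72)


def check_page(width_px: int, height_px: int) -> None:
    """Raise ValueError (standing in for RESOURCE_EXHAUSTED) if a page is too large."""
    if max(width_px, height_px) > MAX_DIMENSION:
        raise ValueError("RESOURCE_EXHAUSTED: page width or height exceeds limit")
    if width_px * height_px > MAX_PIXELS_PER_PAGE:
        raise ValueError("RESOURCE_EXHAUSTED: page pixel area exceeds limit")


def check_page_count(page_count: int) -> None:
    """Raise ValueError (standing in for RESOURCE_EXHAUSTED) if too many pages."""
    if page_count > MAX_PAGES_PER_REQUEST:
        raise ValueError("RESOURCE_EXHAUSTED: page count exceeds limit")
```

A 36 × 36 inch page (2,592 × 2,592 pt) renders to 7,200 × 7,200 = 51,840,000 pixels, so check_page rejects it on pixel area even though each side is well below 20,000 pixels.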

Error Handling Overview

Common error patterns:

  • INVALID_ARGUMENT for unsupported formats, unreadable inputs, or invalid page ranges.
  • RESOURCE_EXHAUSTED when page size, pixel area, or page count limits are exceeded.
  • INTERNAL for unexpected OCR or rendering failures, including when the OCR add-on service is unreachable.

A failure on an individual page does not fail the whole request: the request succeeds and the failure is reported in that page's OcrPageResult.status. The CLI prints a warning to stderr for each page that fails.
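Since a successful response can still contain failed pages, client code should check each page's status rather than relying on the request status alone. The sketch below mirrors the CLI behavior of warning on stderr. Only the status field comes from OcrPageResult as documented; the PageResult and PageStatus classes, their other fields, and the "OK" status code are hypothetical stand-ins, not the real message definitions.

```python
import sys
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PageStatus:
    """Hypothetical stand-in for the status carried by OcrPageResult."""
    code: str          # assumed values such as "OK" or "INTERNAL"
    message: str = ""


@dataclass
class PageResult:
    """Hypothetical stand-in for OcrPageResult; only `status` is documented."""
    page_index: int
    status: PageStatus
    text: Optional[str] = None


def collect_text(results: List[PageResult]) -> Tuple[List[str], List[int]]:
    """Return text from successful pages and the indices of failed pages.

    Each failure is reported on stderr, similar to the CLI's warnings.
    """
    texts: List[str] = []
    failed: List[int] = []
    for result in results:
        if result.status.code != "OK":
            print(
                f"warning: page {result.page_index} failed: "
                f"{result.status.code} {result.status.message}".rstrip(),
                file=sys.stderr,
            )
            failed.append(result.page_index)
            continue
        texts.append(result.text or "")
    return texts, failed
```

Returning the failed page indices lets a caller decide whether partial output is acceptable or whether to retry just those pages.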

For field-level details, see: