OCR Limits and Errors
Supported inputs, range validation, and safety limits for OCR requests.
GoodMem validates OCR inputs up front and applies safety limits per page. This page summarizes supported formats, page range rules, and the error patterns you should expect.
Availability
OCR is provided by the GoodMem OCR add-on service/image and is not included in the base install.
Enable the add-on and configure GOODMEM_OCR_BASE_URL (or --ocr-base-url) before using OCR.
Supported Inputs
Accepted input formats:
- PDF (multi-page)
- TIFF (single or multi-page)
- PNG
- JPEG
- BMP
INPUT_FORMAT_UNSPECIFIED (CLI --input-format auto) uses signature sniffing. Unsupported
formats such as GIF, PS, EPS, WebP, or HEIC return INVALID_ARGUMENT.
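Signature sniffing can be pictured as a prefix match on the first bytes of the input. The following is a minimal sketch, not GoodMem's actual detector: the function name `sniff_format` and the signature table are assumptions made for illustration, using the well-known magic bytes of each supported format. A real detector may be more tolerant, for example of leading bytes before the PDF header.

```python
from typing import Optional

# Well-known leading "magic" bytes for the accepted formats.
# This table is an illustrative assumption, not GoodMem's internal list.
_SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),            # little-endian TIFF
    (b"MM\x00*", "TIFF"),            # big-endian TIFF
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"BM", "BMP"),
]


def sniff_format(data: bytes) -> Optional[str]:
    """Return the detected format name, or None if no signature matches."""
    for magic, name in _SIGNATURES:
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    print(sniff_format(b"%PDF-1.7\n"))   # a PDF header
    print(sniff_format(b"GIF89a"))       # unsupported, so no match
```

Running a check like this on the client side before upload is one way to catch an unsupported file early instead of waiting for INVALID_ARGUMENT from the service.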
Page Range Rules
start_page and end_page are 0-based and inclusive:
- When omitted, start_page defaults to the first page.
- When omitted, end_page defaults to the last page.
- The range must be within document bounds.
- Negative values or end_page < start_page return INVALID_ARGUMENT.
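The rules above can be sketched as a small client-side check. This is an illustrative sketch only: `resolve_page_range` is a hypothetical helper, not part of GoodMem, and it raises `ValueError` where the service would return INVALID_ARGUMENT.

```python
from typing import Optional, Tuple


def resolve_page_range(
    start_page: Optional[int],
    end_page: Optional[int],
    page_count: int,
) -> Tuple[int, int]:
    """Apply defaults and validate a 0-based, inclusive page range.

    Hypothetical helper mirroring the rules on this page.
    """
    if page_count <= 0:
        raise ValueError("document has no pages")

    # Omitted bounds default to the first and last page.
    start = 0 if start_page is None else start_page
    end = page_count - 1 if end_page is None else end_page

    if start < 0 or end < 0:
        raise ValueError("page indices must not be negative")
    if end < start:
        raise ValueError("end_page must not be less than start_page")
    if end >= page_count:
        raise ValueError("page range exceeds document bounds")
    return start, end


if __name__ == "__main__":
    print(resolve_page_range(None, None, 10))  # whole document
    print(resolve_page_range(2, 4, 10))        # pages 2, 3, and 4
```

Because both bounds are inclusive, a range of 2 to 4 covers three pages, and the last valid index in a 10-page document is 9.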
Safety Limits (Current Defaults)
OCR enforces safety limits on each page and on the request as a whole. The current defaults are:
- Maximum pages per request: 100,000
- Maximum pixels per page: 50,000,000
- Maximum width or height: 20,000 pixels
Requests that exceed these limits return RESOURCE_EXHAUSTED. PDF inputs are rendered at
200 DPI by default before limits are evaluated.
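To estimate whether a PDF page will fit within the limits, convert its size from points (1/72 inch) to pixels at the render DPI and compare against the defaults. The sketch below is an approximation under stated assumptions: the helper names are hypothetical, rounding up to whole pixels is a guess, and treating the limits as inclusive is also a guess, since this page does not specify either.

```python
import math
from typing import Tuple

# Defaults taken from this page.
MAX_PIXELS_PER_PAGE = 50_000_000
MAX_DIMENSION_PX = 20_000
DEFAULT_PDF_DPI = 200
POINTS_PER_INCH = 72  # PDF page sizes are expressed in points


def rendered_size_px(
    width_pt: float, height_pt: float, dpi: int = DEFAULT_PDF_DPI
) -> Tuple[int, int]:
    """Estimate rendered pixel dimensions (rounding up is an assumption)."""
    width_px = math.ceil(width_pt * dpi / POINTS_PER_INCH)
    height_px = math.ceil(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


def within_page_limits(width_px: int, height_px: int) -> bool:
    """Check one page against the default limits (inclusive by assumption)."""
    return (
        width_px <= MAX_DIMENSION_PX
        and height_px <= MAX_DIMENSION_PX
        and width_px * height_px <= MAX_PIXELS_PER_PAGE
    )


if __name__ == "__main__":
    # US Letter is 612 x 792 points.
    w, h = rendered_size_px(612, 792)
    print(w, h, w * h, within_page_limits(w, h))
```

A US Letter page renders to 1700 x 2200 pixels at 200 DPI, about 3.7 million pixels, which is well inside the defaults. Note that a page can satisfy the width and height limit and still exceed the pixel limit: 20,000 x 20,000 is 400 million pixels.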
Error Handling Overview
Common error patterns:
- INVALID_ARGUMENT for unsupported formats, unreadable inputs, or invalid page ranges.
- RESOURCE_EXHAUSTED when page size, pixel area, or page count limits are exceeded.
- INTERNAL for unexpected OCR or rendering failures, including when the OCR add-on service is unreachable.
Per-page failures are surfaced in OcrPageResult.status while the overall request succeeds.
The CLI prints warnings to stderr when individual pages fail.
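Because a request can succeed while individual pages fail, callers generally need to inspect every page result rather than only the overall outcome. The sketch below illustrates that pattern. The `PageResult` class is a hypothetical stand-in: this page does not describe the actual fields of OcrPageResult, so `page_index`, `ok`, `message`, and `text` are assumptions made purely for illustration.

```python
import sys
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class PageResult:
    """Hypothetical stand-in for OcrPageResult; field names are assumptions."""
    page_index: int
    ok: bool
    message: str = ""
    text: str = ""


def split_page_results(
    results: Iterable[PageResult],
) -> Tuple[List[PageResult], List[PageResult]]:
    """Separate successful pages from failed ones, warning on stderr."""
    succeeded: List[PageResult] = []
    failed: List[PageResult] = []
    for result in results:
        if result.ok:
            succeeded.append(result)
        else:
            failed.append(result)
            print(
                f"warning: page {result.page_index} failed: {result.message}",
                file=sys.stderr,
            )
    return succeeded, failed


if __name__ == "__main__":
    pages = [
        PageResult(0, True, text="first page"),
        PageResult(1, False, message="render error"),
        PageResult(2, True, text="third page"),
    ]
    good, bad = split_page_results(pages)
    print(len(good), "succeeded;", len(bad), "failed")
```

Keeping the failed pages in their own list makes it straightforward to retry only those pages with a narrower start_page and end_page range.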
For field-level details, see: