Question 1

What is OCR?

Accepted Answer

Optical Character Recognition: software that reads text out of an image. A scanned PDF is a picture of paper, so its text isn't selectable until OCR turns the pixels back into letters.

Question 2

Does the OCR run on your servers?

Accepted Answer

No. The OCR engine (Tesseract) and your PDF both stay in your browser. The first run downloads about 15 MB, which is cached afterwards; from then on, every page is processed locally.

Question 3

How accurate is the OCR?

Accepted Answer

On clean scans of printed text (laser printer, flatbed scanner, English): 95% or better. On phone photos with glare, low light, or handwriting: closer to 70-85%. Always proofread.
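
The answer above doesn't say how accuracy is measured. One common convention, assumed here, is character accuracy: 1 minus the edit distance between the OCR output and a hand-checked reference, divided by the reference length. A minimal sketch of that calculation follows; the names `editDistance` and `characterAccuracy` are illustrative and not part of the tool.

```typescript
// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, and substitutions that turn `a` into `b`.
// Note: strings are compared per UTF-16 code unit, which is fine for English text.
function editDistance(a: string, b: string): number {
  // prev[j] = distance between the first (i - 1) chars of `a` and the first j chars of `b`.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]; // turning i chars into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const substitution = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Character accuracy in the range 0..1 (assumed metric, see the note above).
function characterAccuracy(reference: string, ocrOutput: string): number {
  if (reference.length === 0) return ocrOutput.length === 0 ? 1 : 0;
  const errors = editDistance(ocrOutput, reference);
  return Math.max(0, 1 - errors / reference.length);
}
```

Under this metric, one wrong character in a 20-character reference gives 0.95. To check a scan yourself, retype a paragraph by hand and compare it with what the OCR produced for the same paragraph.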

Question 4

How long does it take?

Accepted Answer

Expect 5-15 seconds per page on a modern laptop and 15-60 seconds per page on a phone. Each run is capped at 50 pages to avoid out-of-memory errors in mobile browsers.
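
As a rough worked example of those numbers, the sketch below estimates total run time from a page count. The constants come from the answer above; the `Device` type, the `estimateRun` function, and the choice to process only the first 50 pages of a longer document are assumptions for illustration, since the answer doesn't say how the cap is enforced.

```typescript
type Device = "laptop" | "phone";

// Pages per run, from the answer above.
const PAGE_CAP = 50;

// Per-page timing ranges in seconds, from the answer above.
const SECONDS_PER_PAGE: Record<Device, { min: number; max: number }> = {
  laptop: { min: 5, max: 15 },
  phone: { min: 15, max: 60 },
};

interface RunEstimate {
  pagesProcessed: number;
  minSeconds: number;
  maxSeconds: number;
}

function estimateRun(pageCount: number, device: Device): RunEstimate {
  // Assumption: pages beyond the cap are left for a later run.
  const pagesProcessed = Math.min(Math.max(0, Math.floor(pageCount)), PAGE_CAP);
  const perPage = SECONDS_PER_PAGE[device];
  return {
    pagesProcessed,
    minSeconds: pagesProcessed * perPage.min,
    maxSeconds: pagesProcessed * perPage.max,
  };
}
```

For a 20-page scan on a laptop this gives 100 to 300 seconds. A full 50-page run on a phone comes out at 750 to 3000 seconds, or roughly 12 to 50 minutes.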

Question 5

What languages are supported?

Accepted Answer

English only in v1. More languages are planned; each one adds about 5 MB of language data to download.
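
To put the download sizes in this FAQ together, here is a small sketch of the arithmetic. It assumes the roughly 15 MB first-run download already includes the English data and that every extra language costs a flat 5 MB; both figures are approximate, and `firstRunDownloadMb` is a made-up name for illustration.

```typescript
// Approximate first-run download quoted in this FAQ (assumed to include English).
const BASE_DOWNLOAD_MB = 15;

// Approximate size of each additional language's data.
const EXTRA_LANGUAGE_MB = 5;

// Estimated first-run download, in MB, for English plus `extraLanguages` more.
function firstRunDownloadMb(extraLanguages: number): number {
  const extras = Math.max(0, Math.floor(extraLanguages));
  return BASE_DOWNLOAD_MB + extras * EXTRA_LANGUAGE_MB;
}
```

English plus two more languages would come to about 25 MB on the first run.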

Question 6

Can it OCR handwriting?

Accepted Answer

No. Tesseract is built for printed text. Handwriting recognition needs a different model, and we don't ship one yet.

OCR PDF

Frequently asked