OCR PDF

Pull text out of scanned PDFs: photos of paper become editable Word documents or plain text. Runs in your browser; nothing is uploaded.

Frequently asked

What is OCR?
Optical Character Recognition — software that reads text out of an image. Scanned PDFs are pictures of paper, so the text isn't selectable until OCR turns the pixels back into letters.
Does the OCR run on your servers?
No. The OCR engine (Tesseract) and your PDF both stay in your browser. The first run downloads ~15 MB, which your browser caches afterward. Every page is processed locally.
How accurate is the OCR?
On clean scans of printed text (laser printer, flatbed scanner, English): 95% or better. On phone photos with glare, low light, or handwriting: closer to 70-85%. Always proofread.
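
To make those percentages concrete, here is a minimal sketch of one common way to score OCR output: character accuracy based on edit distance against a hand-typed reference. This is an assumption for illustration only. The page does not say how its figures were measured, and `editDistance` and `charAccuracy` are hypothetical helper names, not part of the tool.

```typescript
// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, or substitutions needed to turn `a` into `b`.
function editDistance(a: string, b: string): number {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a character from a
        curr[j - 1] + 1,    // insert a character into a
        prev[j - 1] + cost  // substitute, or keep a matching character
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical metric: 1 minus (character errors / reference length), floored at 0.
function charAccuracy(reference: string, ocrOutput: string): number {
  if (reference.length === 0) return ocrOutput.length === 0 ? 1 : 0;
  const errors = editDistance(reference, ocrOutput);
  return Math.max(0, 1 - errors / reference.length);
}
```

Under this metric, a single misread character in a 20-character line already brings the score down to 95%, which is why proofreading matters even on clean scans.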
How long does it take?
5-15 seconds per page on a modern laptop, 15-60 seconds per page on a phone. Each run is capped at 50 pages to avoid out-of-memory errors on mobile browsers.
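
As a rough planning aid, the per-page timings and the 50-page cap can be turned into an estimate for a whole document. The sketch below uses only the numbers quoted in this answer. The names `planRun`, `RunPlan`, and `Device` are hypothetical, and it assumes that pages beyond the cap are simply left for a later run, which the tool itself may handle differently.

```typescript
type Device = "laptop" | "phone";

// [fastest, slowest] seconds per page, taken from the figures quoted above.
const SECONDS_PER_PAGE: Record<Device, [number, number]> = {
  laptop: [5, 15],
  phone: [15, 60],
};

// Maximum pages per run, as stated above.
const PAGE_CAP = 50;

interface RunPlan {
  pagesProcessed: number; // pages handled in this run
  pagesSkipped: number;   // pages over the cap (assumed to wait for another run)
  minSeconds: number;     // best-case total time
  maxSeconds: number;     // worst-case total time
}

function planRun(pageCount: number, device: Device): RunPlan {
  const total = Math.max(0, Math.floor(pageCount));
  const pagesProcessed = Math.min(total, PAGE_CAP);
  const [fast, slow] = SECONDS_PER_PAGE[device];
  return {
    pagesProcessed,
    pagesSkipped: total - pagesProcessed,
    minSeconds: pagesProcessed * fast,
    maxSeconds: pagesProcessed * slow,
  };
}
```

For example, `planRun(10, "laptop")` estimates 50 to 150 seconds, while a 60-page scan on a phone is trimmed to 50 pages and could take anywhere from about 12 to 50 minutes.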
What languages are supported?
English in v1. More languages coming — each adds another ~5 MB of language data to download.
Can it OCR handwriting?
No. Tesseract is built for printed text. Handwriting recognition needs a different model (we don't ship one yet).