PDF Text Cannot Be Selected — How to Fix It
When you click and drag to select text in a PDF but the cursor shows a hand icon instead of a text cursor, or every attempt to highlight text selects the whole page as an image, your PDF most likely does not contain real text. It contains images of text. In most cases this is not a permissions problem or a viewer setting (a less common permissions cause is covered near the end of this guide). It is a fundamental difference in how the PDF was created. Scanned documents, photos of pages, and PDFs exported from some graphics applications produce image-based PDFs where the appearance of text is just pixels, not actual text data. The solution is OCR (Optical Character Recognition), which reads the images and adds a layer of real text to the PDF. This guide explains the problem, the solution, and how to get the best OCR results.
Image-Based vs. Text-Based PDFs
PDFs exist in two fundamentally different forms. Text-based PDFs contain actual text data — characters, fonts, positions — that can be selected, searched, copied, and indexed. Image-based PDFs contain page images that happen to look like text, but the underlying data is pixels. Most digital-native PDFs (exported from Word, generated by software) are text-based. Most scanned PDFs are image-based. You can quickly verify which type you have: press Ctrl+F to open search in any PDF viewer and search for a word you can see on the page. If the search finds nothing even though the word is visibly present, you have an image-based PDF. If search works, the text is real.
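The Ctrl+F check can also be automated. The following is a minimal sketch, assuming the third-party pypdf package is installed; the function names and the `min_chars` threshold are illustrative choices, not part of any tool mentioned in this guide.

```python
from typing import List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the extractable text of each page (needs `pip install pypdf`)."""
    from pypdf import PdfReader  # third-party; imported lazily on purpose

    reader = PdfReader(pdf_path)
    # extract_text() yields an empty string for pages that hold only images.
    return [page.extract_text() or "" for page in reader.pages]


def classify_pdf(page_texts: List[str], min_chars: int = 20) -> str:
    """Classify a PDF from its per-page text.

    `min_chars` is an arbitrary threshold (an assumption) that ignores
    stray characters such as a lone page number.
    """
    if not page_texts:
        return "empty"
    pages_with_text = sum(1 for t in page_texts if len(t.strip()) >= min_chars)
    if pages_with_text == 0:
        return "image-based"
    if pages_with_text == len(page_texts):
        return "text-based"
    return "mixed"
```

Calling `classify_pdf(extract_page_texts("scan.pdf"))` on a scanned file would be expected to return "image-based". A "mixed" result suggests a document where only some pages were scanned.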
How to Add Selectable Text Using OCR
OCR processes the page images and creates a corresponding text layer. The result is a searchable PDF where the visible appearance is unchanged — the page still looks like the original scan — but underlying text data is present for selection, search, and copy.
- 1. Go to lazy-pdf.com/ocr and select your image-based PDF. LazyPDF uses Tesseract.js, a JavaScript port of the Tesseract OCR engine that runs in the browser, so the file is processed locally rather than sent to a server.
- 2. Select the language of your document from the language dropdown. Selecting the correct language significantly improves accuracy, because Tesseract is trained per language and performs much better on the intended language than on a generic setting.
- 3. Click 'Run OCR' and wait for processing. Processing time scales with document length: a 10-page document takes roughly 30–60 seconds.
- 4. Download the OCR-processed PDF. Open it in your PDF viewer and try selecting text. The text cursor should now appear, and Ctrl+F search should find content.
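For batch work, the same steps can be scripted offline. This sketch does not use LazyPDF. It swaps in the OCRmyPDF command-line tool, which also wraps Tesseract, and assumes that `ocrmypdf` and the relevant Tesseract language data are installed. The helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_ocr_command(input_pdf: str, output_pdf: str, language: str = "eng") -> List[str]:
    """Build the OCRmyPDF invocation; `language` is a Tesseract code such as 'eng'."""
    return ["ocrmypdf", "--language", language, input_pdf, output_pdf]


def run_ocr(input_pdf: str, output_pdf: str, language: str = "eng") -> None:
    """Run OCR and write a searchable copy of the PDF to `output_pdf`."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    # check=True raises CalledProcessError if OCRmyPDF exits with an error.
    subprocess.run(build_ocr_command(input_pdf, output_pdf, language), check=True)
```

A call such as `run_ocr("scan.pdf", "scan-searchable.pdf", "deu")` mirrors steps 1 to 4: choose a file, pick the language, run OCR, and save the result.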
OCR Accuracy and When It Falls Short
OCR accuracy depends on image quality. Clean, high-contrast black text on white paper scanned at 300 DPI or higher achieves very high accuracy, typically 98% or better on standard printed text. Low-quality scans, handwriting, unusual fonts, watermarks over text, and folded or wrinkled documents reduce accuracy significantly. For critical documents where OCR accuracy matters (contracts, transcripts, legal records), always verify the extracted text by reading through the OCR output. Search for key terms and click each hit to confirm that it lands on the matching word on the page. For documents with poor scan quality, rescanning at a higher resolution or with better contrast settings produces more accurate OCR than running OCR repeatedly on the same low-quality scan.
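To put a number on accuracy for a sample page, the OCR output can be compared with a hand-typed reference. The sketch below measures character accuracy with Levenshtein edit distance. This is one common way to define the metric, and an assumption here, since the figure quoted above does not specify how it is measured.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_text: str) -> float:
    """Share of reference characters the OCR got right, clamped to [0, 1]."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1.0 - edit_distance(reference, ocr_text) / len(reference))
```

By this measure, 98% accuracy on a page of 2,000 characters still leaves about 40 wrong characters, which is why proofreading matters for contracts and legal records.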
Permissions-Restricted Text Selection
A less common cause of non-selectable text is permission restrictions. Some PDFs are created with content copying disabled as a security measure. In this case, the text is technically real (a text-based PDF) but the viewer enforces a no-copy restriction. This is distinct from an image-based PDF. To check: in Adobe Reader, go to File > Properties > Security > Document Restrictions Summary. If 'Content Copying' shows as 'Not Allowed', the file has copy restrictions. Removing these restrictions requires the owner password. With the correct password, use LazyPDF's unlock tool to remove the restrictions, after which text selection will work normally.
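Under the hood, the 'Content Copying' setting is a single bit in the permissions integer (the `/P` entry) of the PDF's encryption dictionary. In the PDF specification's 1-based numbering, bit 5 controls copying and extracting text. The sketch below decodes that bit. It assumes the `/P` value has already been read from the file with a PDF library, and the function name is illustrative.

```python
# Bit 5 in the PDF specification's 1-based numbering, i.e. value 16.
COPY_BIT = 1 << 4


def copy_allowed(p_value: int) -> bool:
    """Return True if the /P permissions integer allows copying text.

    /P is a signed 32-bit integer; Python's & operator treats negative
    numbers as two's complement, so negative values work unchanged.
    """
    return bool(p_value & COPY_BIT)
```

For example, `copy_allowed(-3904)` is False, because that value leaves bit 5 cleared, while `copy_allowed(-1)` is True because every permission bit is set.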
Frequently Asked Questions
Does OCR change how my PDF looks?
No. OCR adds a hidden text layer beneath the visible page images without modifying the visual appearance. The PDF looks exactly the same after OCR processing: same layout, same image quality, same letterforms. The only difference is that underlying text data is now present for search and selection. The visible text on the page is still the original scanned image.
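One common way OCR tools make the text layer invisible is PDF text rendering mode 3, which draws glyphs with neither fill nor stroke. The sketch below builds the content-stream operators for one recognized word. It illustrates the idea only and is not a description of LazyPDF's internals. The function names and the font resource name `F1` are hypothetical.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(text: str, x: float, y: float,
                       font_size: float = 12.0, font_name: str = "F1") -> str:
    """Return PDF operators that place `text` at (x, y) without drawing it.

    Assumes simple ASCII text; real OCR output needs a proper font encoding.
    """
    return "\n".join([
        "BT",                                # begin text object
        f"/{font_name} {font_size:g} Tf",    # select font and size
        "3 Tr",                              # rendering mode 3: invisible
        f"{x:g} {y:g} Td",                   # move to the word's position
        f"({escape_pdf_string(text)}) Tj",   # the selectable text itself
        "ET",                                # end text object
    ])
```

Because the text is positioned where the word appears in the scan but never painted, selection and search work while the page image stays untouched.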
Will OCR work on a PDF with multiple languages?
LazyPDF's OCR tool uses Tesseract, which handles a single selected language well. For documents that mix multiple languages on the same page, accuracy varies depending on which language is selected. For primarily single-language documents with occasional foreign words (citations, proper nouns), standard single-language OCR works well. For documents with full passages in two languages, run OCR twice, once with each language setting, and keep the better result.
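Choosing the better of two runs can be done by eye, or by comparing the per-word confidence scores that Tesseract reports on a 0 to 100 scale. The sketch below picks the run with the higher mean confidence. The `OcrRun` structure is hypothetical, and using mean confidence as the selection rule is an assumption rather than something the tool does for you.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class OcrRun:
    """Result of one OCR pass (hypothetical container for illustration)."""
    language: str
    text: str
    word_confidences: List[float]  # one score per recognized word, 0 to 100


def mean_confidence(run: OcrRun) -> float:
    """Average word confidence; a run with no words scores 0."""
    return mean(run.word_confidences) if run.word_confidences else 0.0


def pick_better_run(runs: Sequence[OcrRun]) -> OcrRun:
    """Return the run whose words were recognized most confidently."""
    if not runs:
        raise ValueError("at least one OCR run is required")
    return max(runs, key=mean_confidence)
```

Confidence is only a proxy for correctness, so a quick read of the chosen output is still worthwhile for important documents.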
After OCR, why does copying text from my PDF still give garbled characters?
Garbled copied text usually means OCR misread characters due to image quality issues. This is most common with low-resolution scans, unusual fonts, or documents with staining or damage. Improving scan quality before OCR processing is the best remedy. Rescanning at 300 DPI or higher with good contrast settings typically resolves OCR errors on otherwise readable source documents.
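Whether a scan meets the 300 DPI guideline can be estimated from the pixel width of a page image and the physical width of the paper. A minimal sketch with illustrative function names:

```python
def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Scan resolution implied by an image's pixel width and the paper width."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / page_width_inches


def needs_rescan(pixel_width: int, page_width_inches: float,
                 target_dpi: float = 300.0) -> bool:
    """True if the scan falls below the target resolution."""
    return effective_dpi(pixel_width, page_width_inches) < target_dpi
```

A US Letter page is 8.5 inches wide, so a 300 DPI scan is 2,550 pixels across. A page image only 1,275 pixels wide was scanned at about 150 DPI and is a good candidate for rescanning.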