Poor Text Quality in Scanned PDFs: Diagnosis and Fixes

Scanned PDFs are fundamentally different from native digital PDFs. A native PDF contains actual text characters that can be selected, searched, and copied. A scanned PDF is essentially a photograph of a document, embedded in a PDF container. When that photograph is low quality (taken at the wrong resolution, processed aggressively, or saved with excessive compression), the result is a PDF that looks poor on screen and is hard to work with.

The quality issues in scanned PDFs show up in two ways: visual quality (does it look sharp and readable?) and OCR quality (can software correctly extract the text?). These are related but not identical problems. A scan can look acceptable to a human reader and still be poor enough to confuse OCR software, resulting in garbled text extraction.

This guide covers the major causes of poor scanned PDF quality and the specific fixes for each.

Diagnosing the Source of the Quality Problem

Before applying a fix, identify what's actually wrong. Poor scanned PDF quality typically falls into one of these categories:

**Scan resolution too low**: If the scan was done at 72 DPI or even 100 DPI (web resolutions), the text will look blurry when zoomed in, and OCR will struggle with fine letter details. Text requires at least 200 DPI, and 300 DPI is the minimum professional standard.

**Over-aggressive JPEG compression**: Many scanners apply JPEG compression to reduce file size. Too much compression creates visible artifacts: blotchy patches, halos around letters, and loss of fine detail in thin strokes. This makes OCR particularly difficult.

**Poor scan conditions**: Physical issues (a misaligned document, dirty scanner glass, wrinkled paper, shadow from page curl, low light) create problems that no amount of post-processing can fully fix.

**Grayscale compressed to black-and-white**: Some scanners or scan-to-email features automatically convert to black-and-white (bilevel) images. While this produces tiny file sizes, it loses subtle gray tones that help distinguish letters, especially in older documents or documents with decorative fonts.

**OCR errors in text layer**: Some scanners add an OCR text layer automatically. If this OCR was inaccurate, the underlying image may look fine but the text you can select or copy is wrong.
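
To check the first category, it helps to work out the resolution a scanned page actually has. The following is a minimal sketch, assuming you already know the embedded image's pixel dimensions and the page size in PDF points (1 point = 1/72 inch). The function names are illustrative, and the thresholds are simply the 200 and 300 DPI figures mentioned above.

```python
POINTS_PER_INCH = 72  # PDF page dimensions are expressed in points (1/72 inch)


def effective_dpi(pixel_width, pixel_height, page_width_pt, page_height_pt):
    """Return (horizontal, vertical) DPI of an image stretched over a full page."""
    dpi_x = pixel_width / (page_width_pt / POINTS_PER_INCH)
    dpi_y = pixel_height / (page_height_pt / POINTS_PER_INCH)
    return dpi_x, dpi_y


def classify_resolution(dpi):
    """Map a DPI value onto the thresholds used in this guide."""
    if dpi < 200:
        return "too low for reliable OCR"
    if dpi < 300:
        return "usable, but below the 300 DPI standard"
    return "meets the 300 DPI standard"


# A US Letter page (612 x 792 points) holding a 1275 x 1650 pixel image
dpi_x, dpi_y = effective_dpi(1275, 1650, 612, 792)
print(round(dpi_x), round(dpi_y), classify_resolution(min(dpi_x, dpi_y)))
# prints: 150 150 too low for reliable OCR
```

A result well under 200 DPI suggests that rescanning, rather than post-processing, is the fix most likely to help.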

The Scan Settings That Matter Most

If you're doing the scanning yourself and have control over settings, these are the knobs to get right.

1. Set resolution to at least 300 DPI for standard documents, 400 DPI for documents with small text or fine details
2. Use grayscale mode rather than black-and-white (bilevel) for most documents; grayscale preserves subtle contrast that helps OCR accuracy
3. Use color mode only if the document has meaningful color content; otherwise it produces larger files with no quality benefit for text
4. Set JPEG quality to 80% or higher, or use PNG or TIFF format if the scanner allows it (these are typically lossless)
5. Clean the scanner glass before scanning and press the document flat to avoid page curl shadows
6. Perform a test scan of one page and zoom in at 100% before scanning the whole document to verify that text is sharp and clear
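
To see why the color mode choice matters for file size, here is a rough back-of-the-envelope sketch of the uncompressed size of a single page in each mode. The bit depths (1, 8, and 24 bits per pixel) are the usual values for bilevel, grayscale, and RGB color; real files will be smaller once the scanner compresses them, so treat the numbers as relative rather than absolute.

```python
# Typical bits per pixel for each scan mode (an assumption; scanners can differ)
BITS_PER_PIXEL = {"bilevel": 1, "grayscale": 8, "color": 24}


def raw_page_bytes(width_in, height_in, dpi, mode):
    """Approximate uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


# US Letter page (8.5 x 11 inches) scanned at 300 DPI
for mode in ("bilevel", "grayscale", "color"):
    size_mb = raw_page_bytes(8.5, 11, 300, mode) / 1_000_000
    print(f"{mode:9s} {size_mb:5.1f} MB")
```

Color carries three times the raw data of grayscale for the same page, which is why it is worth reserving for documents where color actually matters.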

Improving OCR Accuracy on Poor Scans

Even with imperfect source scans, OCR can often be improved through preprocessing. Before running OCR, consider these enhancements:

**Deskewing**: If pages were scanned at a slight angle, text rows are diagonal rather than horizontal. OCR engines work best on straight text. Deskewing corrects the angle and can substantially improve recognition accuracy.

**Contrast enhancement**: Increasing contrast in a low-contrast scan (old yellowed documents, faded photocopies) makes text and background more distinct, which helps OCR.

**Noise reduction**: Scanner noise (random specks and dots in the background) confuses OCR engines into treating noise pixels as letter fragments. A mild blur or median filter reduces noise while preserving letter shapes.

**Binarization threshold**: For black-and-white conversion, the threshold that separates 'text' from 'background' matters enormously. With the wrong threshold, text becomes incomplete or background noise becomes text.

LazyPDF's OCR tool uses Tesseract, a robust open-source OCR engine that handles a wide range of document quality levels. For best results, upload a clean scan at 300 DPI or higher. The tool will extract text from the images and make the document searchable.
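
As a rough illustration of the noise reduction and binarization steps described above, here is a pure-Python sketch that works on a grayscale page stored as a list of rows of 0-255 values. It uses a 3x3 median filter and Otsu's method for picking the threshold. Otsu's method is one common choice and is an assumption here, not something the tools in this guide are known to use. A real pipeline would normally rely on an image library instead of nested lists.

```python
def median_filter_3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighborhood."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # border pixels are left unchanged
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sorted(window)[4]  # middle of 9 values
    return out


def otsu_threshold(pixels):
    """Return the threshold t that maximizes between-class variance.

    Pixels <= t are treated as dark (text), pixels > t as light (background).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0 or w_light == 0:
            continue  # one class is empty, variance is undefined
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image, threshold):
    """Map every pixel to 0 (text) or 255 (background)."""
    return [[0 if p <= threshold else 255 for p in row] for row in image]


# A tiny light page with one isolated dark speck in the middle
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
cleaned = median_filter_3x3(page)
print(cleaned[2][2])  # the speck is replaced by the surrounding background value
```

Running the median filter before thresholding matters: once an isolated speck has been binarized to pure black, it is harder to tell apart from a real punctuation mark.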

1. Upload your scanned PDF to LazyPDF's OCR tool
2. Select the correct language for the document, since OCR accuracy depends heavily on the language model
3. Process and download the OCR'd PDF
4. Open the result and test text selection by clicking and dragging; if text selects correctly, OCR worked
5. Search for a specific word (Ctrl+F or Cmd+F) to verify the text layer is accurate
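
For readers who would rather run Tesseract locally than use the web tool, the same steps can be sketched with the Tesseract command line. This is not a description of how LazyPDF invokes Tesseract. It assumes the `tesseract` executable is installed with the relevant language data, and that the page has already been exported from the PDF as an image, since Tesseract reads images rather than PDFs. The file names are hypothetical.

```python
import shutil
import subprocess


def tesseract_pdf_command(image_path, output_base, language="eng"):
    """Build the argv for: tesseract <image> <output_base> -l <language> pdf

    The trailing "pdf" asks Tesseract for a searchable PDF, written to
    <output_base>.pdf.
    """
    return ["tesseract", image_path, output_base, "-l", language, "pdf"]


def run_ocr(image_path, output_base, language="eng"):
    """Run Tesseract on one page image and return the output PDF path."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(tesseract_pdf_command(image_path, output_base, language),
                   check=True)
    return output_base + ".pdf"


# Show the command for a German-language page without actually running it
print(" ".join(tesseract_pdf_command("page-001.png", "page-001", "deu")))
# prints: tesseract page-001.png page-001 -l deu pdf
```

The `-l` argument corresponds to the language selection in step 2; passing the wrong language code is a common reason for otherwise clean scans to produce poor text.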

When the Source Scan Cannot Be Improved

Sometimes you're stuck with a poor scan and can't redo it. Here are strategies for working with low-quality scanned PDFs:

**Read the image, not the text**: For personal use where you just need to read the document, zoom in to 200-300% to read blurry text more easily. Your eye is much better at interpreting blurry letter shapes than OCR software.

**Manual transcription for critical content**: If you need accurate text from a poor scan, manual typing may be necessary. OCR errors in important documents (contracts, court documents, medical records) can have serious consequences.

**Use multiple OCR tools**: Different OCR engines have different strengths. If Tesseract struggles with a scan, Google Docs' free OCR (upload PDF, open with Google Docs) or Microsoft OneNote's OCR may handle it differently and get more accurate results.

**Enhance before compressing**: If you need to compress a scanned PDF (to reduce file size), compress after any OCR and enhancement steps, not before. Starting from the highest quality scan preserves options.
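
When you do run the same scan through more than one OCR engine, the places where their outputs disagree are a good guide to where manual checking is needed. Below is a small sketch using Python's standard `difflib` module. The two sample strings are made up for illustration and do not come from any real engine.

```python
from difflib import SequenceMatcher


def agreement(text_a, text_b):
    """Similarity ratio in [0, 1] between two OCR outputs, compared word by word."""
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()


def disagreements(text_a, text_b):
    """Yield (words_from_a, words_from_b) for each span where the outputs differ."""
    a, b = text_a.split(), text_b.split()
    matcher = SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield " ".join(a[i1:i2]), " ".join(b[j1:j2])


# Hypothetical outputs from two different engines for the same line
engine_a = "The tenant shall pay rn0nthly rent of $1,200"
engine_b = "The tenant shall pay monthly rent of $1,200"

print(agreement(engine_a, engine_b))            # 0.875
print(list(disagreements(engine_a, engine_b)))  # [('rn0nthly', 'monthly')]
```

Agreement between engines does not prove the text is right, since both can misread the same smudge, but disagreement reliably marks spots worth comparing against the image.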

Balancing Quality and File Size

High-resolution scans produce large files. A 300 DPI scan of a 20-page document can easily be 50-100 MB. Compressing such a file is legitimate, but the compression needs to be done carefully to avoid destroying the quality you worked to create.

LazyPDF's compress tool uses Ghostscript settings that downsample images to appropriate resolutions for their intended use. 'eBook' quality settings target 150 DPI, which is adequate for screen reading. 'Screen' quality goes lower (72 DPI in Ghostscript's standard presets). 'Printer' settings preserve higher resolution (300 DPI). For scanned documents, use 'eBook' or 'Printer' quality compression, and never 'Screen' quality if you need readable text. The extra file size saved by going from 'eBook' to 'Screen' is often not worth the quality loss for scanned documents with thin text strokes.
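
For those running Ghostscript directly, the presets above correspond to its `-dPDFSETTINGS` option. The sketch below builds a typical command line. It assumes the executable is called `gs` (on Windows it is usually `gswin64c`), the file names are hypothetical, and the exact flags LazyPDF passes are not known, so treat this as one reasonable invocation and not a copy of the tool's configuration.

```python
import subprocess

# Image downsampling targets (DPI) of Ghostscript's standard presets
PRESET_DPI = {"screen": 72, "ebook": 150, "printer": 300}


def ghostscript_compress_command(input_pdf, output_pdf, preset="ebook"):
    """Build the argv for compressing a PDF with a -dPDFSETTINGS preset."""
    if preset not in PRESET_DPI:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={output_pdf}",
        input_pdf,
    ]


def compress_scanned_pdf(input_pdf, output_pdf, preset="ebook"):
    """Run Ghostscript, refusing the preset this guide advises against for scans."""
    if preset == "screen":
        raise ValueError("'screen' downsamples to 72 DPI; "
                         "use 'ebook' or 'printer' for scanned text")
    subprocess.run(ghostscript_compress_command(input_pdf, output_pdf, preset),
                   check=True)


print(" ".join(ghostscript_compress_command("scan.pdf", "scan-small.pdf", "printer")))
```

Keeping the original scan and writing the compressed copy to a new file, as here, means you can always retry with a gentler preset if the text comes out too soft.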

Frequently Asked Questions

I ran OCR on my scanned PDF but the extracted text is full of errors. What should I do?

Errors usually mean either the scan resolution is too low (below 200 DPI), there's too much compression noise, or the wrong language was specified for OCR. Try re-scanning at 300 DPI if possible, or use image enhancement to improve contrast before re-running OCR. Also verify you selected the correct language in the OCR tool.

My scanned PDF looks fine but when I compress it, text becomes blurry. Why?

Most PDF compression reduces file size partly by downsampling images to a lower resolution. If the scanned text is at 200 DPI and compression downsamples to 72 DPI, the result will be blurry. Use less aggressive compression settings (Printer quality instead of Screen quality) that maintain a minimum of 150 DPI for text-bearing images.

Is there a way to make a scanned PDF searchable without paid software?

Yes — LazyPDF's OCR tool is free and adds a searchable text layer to scanned PDFs. Google Docs also provides free OCR: upload the PDF to Google Drive, open it with Google Docs, and the OCR is applied automatically. The result can be copied to a Word document or exported back as PDF.

After OCR, my PDF has hidden text behind the scan. Is it possible for the two layers to be out of sync?

Yes, this is a known issue with OCR. If OCR wasn't accurately aligned with the original text positions, the text layer will be offset from the visual text. This causes text selection and copy-paste to produce content from the wrong position. Re-running OCR with better settings, or using a tool that reports OCR alignment accuracy, can help identify and fix this.
