How to Compress a Scanned PDF Without Losing Text
Scanned PDFs are notoriously large. A simple 10-page scanned contract might be 50MB because each page is stored as a high-resolution image — typically 300 DPI or higher, in full color, even when the original document was black and white. The good news is that scanned PDFs compress exceptionally well. The challenge is compressing aggressively enough to make the file manageable without degrading the text to the point where it becomes hard to read. Scanned text at moderate compression levels remains perfectly legible; at very high compression, fine text can become blurry and unprofessional. This guide explains exactly how to compress scanned PDFs effectively, what settings to use, and how to combine compression with OCR for maximum utility.
How to Compress a Scanned PDF with LazyPDF
Scanned PDF compression works mainly through image resampling: reducing the DPI of the embedded page images, usually combined with re-encoding them as JPEG at a lower quality. Professional document management systems use the same process.
1. Go to lazy-pdf.com/compress and upload your scanned PDF. LazyPDF accepts files of any size. Check the displayed file size after upload to know your starting point.
2. Select 'Recommended' compression for balanced results. This setting typically reduces scanned PDF file sizes by 60–80% while keeping text fully legible. Use 'High' if storage space is critical and the document will only ever be viewed on screen.
3. Click 'Compress PDF'. The file is processed on LazyPDF's server using Ghostscript, which is specifically designed to handle scanned image content efficiently.
4. Download the output and open it at 100% zoom in a PDF viewer. Read several lines of text at actual size. If the text is clear and sharp, the compression level is appropriate. If lines appear jagged or blurry, compress the original again at a lower setting.
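For readers who want to reproduce the compression step locally, here is a minimal sketch that drives Ghostscript from Python. It assumes the `gs` executable is installed and on the PATH. The mapping from setting names to Ghostscript presets is a guess for illustration, not LazyPDF's documented configuration, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed mapping from the guide's setting names to Ghostscript presets.
# This is an illustration, not LazyPDF's actual configuration.
PRESETS = {
    "low": "/printer",        # Ghostscript preset targeting roughly 300 DPI images
    "recommended": "/ebook",  # roughly 150 DPI
    "high": "/screen",        # roughly 72 DPI
}


def build_gs_command(src: str, dst: str, level: str = "recommended") -> list[str]:
    """Return the Ghostscript argument list for one compression pass."""
    if level not in PRESETS:
        raise ValueError(f"unknown level: {level!r}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",           # write a new PDF
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={PRESETS[level]}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


def compress(src: str, dst: str, level: str = "recommended") -> float:
    """Run Ghostscript and return the size reduction as a fraction (0.0 to 1.0)."""
    subprocess.run(build_gs_command(src, dst, level), check=True)
    before = Path(src).stat().st_size
    after = Path(dst).stat().st_size
    return 1 - after / before
```

Calling `compress("scan.pdf", "scan-small.pdf")` writes a new file and leaves the original untouched, so you can always recompress from the original at a lower setting if the text looks blurry.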
Why Scanned PDFs Are So Large — and Why They Compress So Well
A scanner capturing a letter-sized page (8.5 × 11 inches) at 300 DPI produces an image of 2,550 × 3,300 pixels, roughly 8.4 megapixels. At 24-bit color (3 bytes per pixel), that single page is about 25MB uncompressed, before any PDF encoding. Even after the scanner applies its own compression, a 10-page document in full color can easily reach 50–100MB. The compression opportunity is significant because most scanned documents contain highly compressible content: black text on white paper. JPEG compression applied to this content at the right quality level reduces each page image dramatically while preserving text clarity. Ghostscript exposes separate downsampling and encoding settings for color, grayscale, and monochrome images, which is why it tends to do better on scanned PDFs than a generic one-size-fits-all image compressor.
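The page-size arithmetic above can be checked with a few lines of Python. This is only a sketch of the raw, uncompressed size; the function names are illustrative, and it ignores any row padding or file overhead.

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed image size in bytes (ignores row padding and headers)."""
    width_px, height_px = page_pixels(width_in, height_in, dpi)
    return width_px * height_px * bits_per_pixel // 8


# US letter, 24-bit color
full = raw_page_bytes(8.5, 11, 300, 24)   # 300 DPI scan
half = raw_page_bytes(8.5, 11, 150, 24)   # same page resampled to 150 DPI
```

For a letter page at 300 DPI this gives 25,245,000 bytes (about 24 MiB). Halving the DPI halves both dimensions, so the 150 DPI version holds a quarter of the pixels, which is why resampling alone removes so much data before JPEG encoding is even considered.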
Black-and-White vs. Color Compression
If your scanned document is inherently black and white (a typed letter, a printed contract, a form), you will get better compression results by converting the page images to grayscale before or during compression. Color scans of black-and-white documents waste storage encoding color information that does not exist in the content. LazyPDF's compression handles this automatically. Documents that are primarily black and white respond to compression differently than color-heavy documents like brochures or photo-embedded reports. For archival purposes, grayscale or black-and-white output at 150 DPI is sufficient for most legal and administrative documents.
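If you run Ghostscript yourself, grayscale conversion and a 150 DPI target can be requested explicitly. The sketch below only builds the command; it assumes `gs` is on the PATH, and `build_gray_command` is a hypothetical helper name. How LazyPDF performs this conversion internally is not documented here.

```python
def build_gray_command(src: str, dst: str, dpi: int = 150) -> list[str]:
    """Ghostscript arguments that convert page images to grayscale at a target DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=Gray",   # convert color content to grayscale
        "-dProcessColorModel=/DeviceGray",
        "-dDownsampleColorImages=true",
        "-dDownsampleGrayImages=true",
        f"-dColorImageResolution={dpi}",    # target resolution for color images
        f"-dGrayImageResolution={dpi}",     # target resolution for grayscale images
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]
```

Pass the returned list to `subprocess.run(..., check=True)` to execute it. Keep the original scan until you have checked the output, because the color information cannot be recovered from the grayscale file.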
Adding OCR After Compression for Searchable Text
Compressed scanned PDFs are smaller, but the text is still embedded as an image — it cannot be searched, selected, or copied. For documents you will reference frequently or need to search, adding OCR after compression is highly valuable. The recommended workflow is: compress first, then run OCR on the compressed file. This minimizes the size of the OCR-processed output. Use LazyPDF's OCR tool (lazy-pdf.com/ocr) on the compressed PDF — it adds a searchable text layer without modifying the visible page images. The result is a compact, fully searchable PDF that behaves like a natively digital document.
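The compress-then-OCR order can be sketched as a two-step local pipeline. This example swaps in the open-source OCRmyPDF command-line tool as a stand-in for LazyPDF's OCR tool, and uses Ghostscript's `/ebook` preset as an assumed equivalent of the 'Recommended' setting. Both programs must be installed separately, and the function names are hypothetical.

```python
import subprocess


def build_pipeline(scan: str, compressed: str, searchable: str,
                   language: str = "eng") -> list[list[str]]:
    """Return the two commands, in order: compress first, then add a text layer."""
    compress_cmd = [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/ebook",      # assumed stand-in for 'Recommended'
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={compressed}",
        scan,
    ]
    ocr_cmd = [
        "ocrmypdf",
        "--language", language,      # Tesseract language code
        "--output-type", "pdf",      # plain PDF output rather than PDF/A
        "--optimize", "0",           # skip OCRmyPDF's own image optimization
        compressed,
        searchable,
    ]
    return [compress_cmd, ocr_cmd]


def run_pipeline(scan: str, compressed: str, searchable: str) -> None:
    """Run both steps, stopping at the first failure."""
    for cmd in build_pipeline(scan, compressed, searchable):
        subprocess.run(cmd, check=True)
```

Keeping the intermediate compressed file separate from the final output makes it easy to compare sizes and to rerun only the OCR step if the language setting was wrong.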
Understanding PDF Processing Technology
Many modern PDF tools use WebAssembly and JavaScript libraries to process documents directly in the web browser. With this client-side approach, files stay on your device for the whole operation, which avoids the privacy concerns of uploading sensitive documents to a remote server. Processing speed depends mainly on your device rather than your internet connection, so even larger files are usually handled quickly. Libraries like pdf-lib support page reordering, merging, splitting, rotation, watermarking, and metadata editing without any server communication. Note that this describes structural operations; the compression step described earlier in this guide runs on LazyPDF's server using Ghostscript, so the file is uploaded for that step. Browser-based tools have made capabilities that once required paid desktop software available at no cost, whether you are a student organizing research papers, a professional preparing business reports, or a freelancer managing client deliverables.
Frequently Asked Questions
How much can I compress a scanned PDF?
Scanned PDFs typically achieve the highest compression ratios of any PDF type — often 60–90% size reduction. A 50MB scanned document commonly compresses to 5–15MB at recommended settings. Color scans compress more than black-and-white scans because color images have more redundant data. The actual ratio depends on scan resolution, color depth, and content complexity.
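To see where your own file falls in that range, the reduction can be computed from the before and after sizes. A small sketch with an illustrative function name:

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Size reduction as a percentage of the original file size."""
    if before_mb <= 0:
        raise ValueError("original size must be positive")
    return 100 * (before_mb - after_mb) / before_mb
```

With the figures above, a 50MB scan that compresses to 15MB is a 70% reduction, and one that compresses to 5MB is a 90% reduction.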
Will compression make scanned text harder to read?
At recommended compression levels, text readability is maintained for standard font sizes and above. Very small print, such as footnotes and fine-print disclaimers in contracts, may show slight degradation at high compression. For legal or archival documents where fine text matters, use a lower compression setting and verify the output by reading the smallest text in the document at 100% zoom.
Should I run OCR before or after compressing a scanned PDF?
Compress first, then run OCR. Compressing a large scan first reduces the time needed for OCR processing and produces a smaller final file. Running OCR on an uncompressed 100MB scan and then compressing the OCR output sometimes yields worse results than compressing first. The quality difference in OCR accuracy between compressed and uncompressed inputs is negligible at recommended compression settings.