How to Reduce Scanned PDF File Size
Scanned PDFs are among the largest files in any document collection. A typical office scanner at default settings produces 1-5 MB per page, so a 20-page scanned document can easily exceed 50 MB. For comparison, a 20-page document created digitally in Word and saved as PDF is typically under 1 MB. The difference comes from how scanners store pages: as high-resolution bitmap images rather than compact text and vector data.
Large scanned PDFs create real operational problems: they exceed email attachment limits, fill cloud storage quotas quickly, take too long to upload and download, and slow down document management systems. Reducing their size is one of the highest-impact document optimization tasks you can perform.
The good news is that scanned PDFs offer the most dramatic compression opportunities of any PDF type. Because their content is stored as image data with significant redundancy, well-chosen compression can often reduce file size by 70-90% while keeping the document fully readable. This guide explains the techniques, tools, and trade-offs involved in reducing scanned PDF file size effectively.
Why Scanned PDFs Are So Large
Understanding why scanned PDFs are large helps you choose the right compression approach. When a scanner captures a document page, it creates a raster image: a grid of colored pixels representing the visual appearance of the page. At 300 DPI (a common scanning resolution), a letter-sized page (8.5 × 11 inches) is 300 × 8.5 = 2,550 pixels wide and 300 × 11 = 3,300 pixels tall, for a total of about 8.4 million pixels. In full color, each pixel requires 3 bytes of data, producing a raw image of about 25 MB for a single page.
Scanner software applies compression to reduce this raw data before saving. JPEG compression is the most common method for color scans and can reduce a 25 MB raw image to roughly 500 KB at reasonable quality levels. But the compression settings chosen at scan time are often conservative (high quality, larger file size) rather than optimized for distribution.
Additionally, many document scanners capture in full color even when the document is black-and-white text. An uncompressed 24-bit color image requires three times as much data as an 8-bit grayscale image, and 24 times as much as a true black-and-white (1-bit) image. A standard text document scanned in black-and-white with JBIG2 compression can be under 50 KB per page, versus 500 KB for the same document scanned in color with JPEG compression. That is a 10x difference based solely on color mode and compression method.
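The raw-size arithmetic for a letter-sized page can be reproduced with a short calculation. This is a minimal sketch: the function name `raw_page_bytes` is hypothetical, and the bit depths (24-bit color, 8-bit grayscale, 1-bit black-and-white) are the usual uncompressed values, assumed here for illustration.

```python
def raw_page_bytes(dpi: int, width_in: float, height_in: float, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one scanned page (ignores row padding)."""
    width_px = round(dpi * width_in)
    height_px = round(dpi * height_in)
    return width_px * height_px * bits_per_pixel // 8


# Letter-sized page (8.5 x 11 inches) scanned at 300 DPI.
color = raw_page_bytes(300, 8.5, 11, 24)    # 3 bytes per pixel
gray = raw_page_bytes(300, 8.5, 11, 8)      # 1 byte per pixel
bilevel = raw_page_bytes(300, 8.5, 11, 1)   # 1 bit per pixel

for name, size in [("color", color), ("grayscale", gray), ("1-bit", bilevel)]:
    print(f"{name:>9}: {size / 1_000_000:.1f} MB raw")
```

These are sizes before any compression. The files a scanner actually writes are smaller because JPEG, G4, or JBIG2 encoding is applied on top.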
1. Check the current file size and page count of your scanned PDF.
2. Determine the scan resolution by checking document properties or image DPI information.
3. Check whether the scan is color, grayscale, or black-and-white.
4. Estimate the potential compression ratio based on content type and current settings.
5. Choose the compression approach appropriate for the document's purpose.
6. Apply compression and verify the result quality before discarding originals.
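The first two steps of this checklist come down to simple ratios once you have the numbers. The sketch below assumes you have already read the file size, page count, and the pixel width of a typical page image from your PDF viewer or an inspection tool. The `ScanStats` class and both function names are hypothetical, and the example values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScanStats:
    file_bytes: int        # size of the PDF on disk
    pages: int             # number of pages in the PDF
    image_width_px: int    # pixel width of a typical page image
    page_width_in: float   # physical page width in inches


def bytes_per_page(stats: ScanStats) -> float:
    """Average size of one page, a quick indicator of how heavy the scan is."""
    return stats.file_bytes / stats.pages


def effective_dpi(stats: ScanStats) -> float:
    """Scan resolution implied by the image width and the page width."""
    return stats.image_width_px / stats.page_width_in


# Hypothetical example: a 50 MB, 20-page letter-sized scan.
stats = ScanStats(file_bytes=50_000_000, pages=20, image_width_px=2550, page_width_in=8.5)
print(f"{bytes_per_page(stats) / 1_000_000:.1f} MB per page")
print(f"{effective_dpi(stats):.0f} DPI")
```

A page that averages several megabytes at 300 DPI or more is usually a strong candidate for downsampling or a change of color mode.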
Choosing the Right Compression Strategy
The right compression strategy depends on the document content and its intended use. The same compression settings applied to a photography portfolio and a scanned invoice would suit one and be inappropriate for the other.
For black-and-white text documents (typed letters, printed invoices, printed forms, business correspondence), the most aggressive compression can be applied. Text documents contain predominantly black ink on white paper with clean edges. Reducing the scan to 1-bit (black-and-white) and then applying lossless JBIG2 or CCITT Group 4 (G4) compression produces very small files with sharp text. This is the preferred approach for archiving typed documents, and it can reduce file size by 90% or more compared to color JPEG scans.
For documents that contain both text and images (annual reports with photographs, product catalogs, illustrated manuals), a mixed approach works best. Use high-quality compression for image regions and text-optimized compression for text regions. Downsampling color images to 150-200 DPI (from the original 300+ DPI scan) reduces image data significantly while maintaining adequate quality for both screen and print viewing.
For documents where color is meaningful (color-coded forms, documents with highlighted text, color photographs), you must preserve color but can still apply JPEG compression and reduce resolution to 150 DPI for screen distribution. For archival copies of color documents, use 200 DPI with moderate JPEG compression to balance quality and size.
1. For black-and-white text documents: convert to 1-bit and use lossless compression (JBIG2 or G4).
2. For mixed text and image documents: use moderate JPEG compression at 150-200 DPI.
3. For color-critical documents: keep color and use JPEG compression at 150 DPI for screen distribution.
4. For archival copies of color documents: prioritize quality slightly over size (200 DPI, moderate JPEG compression).
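To see why lowering resolution is so effective, note that the pixel count scales with the square of the DPI. The sketch below is only that arithmetic; the function name `pixel_reduction` is hypothetical. Actual file-size savings depend on the image content and the codec, so treat the result as an upper-level estimate of the data removed before compression.

```python
def pixel_reduction(source_dpi: int, target_dpi: int) -> float:
    """Fraction of pixels removed by downsampling from source_dpi to target_dpi."""
    if target_dpi >= source_dpi:
        return 0.0  # upsampling or no change removes nothing
    return 1.0 - (target_dpi / source_dpi) ** 2


for target in (200, 150):
    removed = pixel_reduction(300, target)
    print(f"300 -> {target} DPI removes {removed:.0%} of the pixels")
```

Halving the resolution from 300 to 150 DPI leaves one quarter of the original pixels, which is why downsampling often matters more than the JPEG quality setting.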
Running OCR Before or After Compression
OCR (Optical Character Recognition) and compression interact, and the order in which you apply them matters. Running OCR on a scanned PDF adds a text layer underneath the images, making the document searchable. This text layer is very small (text is efficient to store), but it does add some data to the file, usually less than 10 KB per page.
Running compression after OCR is generally preferable. OCR engines perform better on higher-resolution images because they have more pixel data to analyze when recognizing characters. If you compress the scanned images first, reducing their resolution and potentially introducing JPEG artifacts, OCR accuracy may decrease. Apply OCR to the original high-resolution scan, then compress the resulting searchable PDF.
However, for large batches of historical documents where running OCR on every file is not practical, compressing first is acceptable. Use moderate settings that preserve text legibility (at least 150 DPI in the compressed version) to keep the option open for future OCR.
LazyPDF's OCR tool runs OCR on scanned PDFs to create searchable versions, and the Compress tool reduces file size. Used in that order on the original high-resolution scans, they produce files that are both small and searchable, which is ideal for document management systems and long-term archives.
1. Upload the original high-resolution scanned PDF to LazyPDF's OCR tool.
2. Run OCR to create a searchable PDF with a text layer.
3. Download the searchable PDF and upload it to the Compress tool.
4. Apply compression and download the resulting small, searchable PDF.
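For readers who prefer to script the same OCR-then-compress order locally, the sketch below builds the two commands without running them. It is not how LazyPDF works internally. It assumes the open-source OCRmyPDF and Ghostscript command-line tools are installed, and the helper names and output file names are hypothetical. On Windows the Ghostscript executable is typically named `gswin64c` rather than `gs`.

```python
from pathlib import Path


def ocr_command(src: Path, dst: Path) -> list[str]:
    # Basic OCRmyPDF usage: ocrmypdf input.pdf output.pdf
    return ["ocrmypdf", str(src), str(dst)]


def compress_command(src: Path, dst: Path, preset: str = "/ebook") -> list[str]:
    # Ghostscript's pdfwrite device; the /ebook preset targets about 150 DPI images.
    return [
        "gs", "-sDEVICE=pdfwrite", f"-dPDFSETTINGS={preset}",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-sOutputFile={dst}", str(src),
    ]


def plan(original: Path) -> list[list[str]]:
    """OCR the high-resolution original first, then compress the searchable result."""
    searchable = original.with_name(original.stem + "_ocr.pdf")
    small = original.with_name(original.stem + "_ocr_small.pdf")
    return [ocr_command(original, searchable), compress_command(searchable, small)]


for cmd in plan(Path("scan.pdf")):
    print(" ".join(cmd))
    # To execute a step for real: subprocess.run(cmd, check=True)
```

After running such a pipeline, confirm that the compressed output is still searchable, since how a text layer survives recompression depends on the tool and its settings.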
Quality Verification After Compression
After compressing a scanned PDF, always verify the output quality before discarding the originals. The key questions to answer are: Is the text still clearly readable? Are all parts of the document present? For significant documents (legal, financial, medical), is the quality sufficient for the document's intended purpose?
Open the compressed PDF and zoom to 100%. Examine representative pages, including any with small text, faint printing, or complex forms. The text should be sharp and clear, not blurry or pixelated. If text appears blurry at 100% zoom, the compression was too aggressive for this document and needs to be redone with lighter settings.
For documents that will be printed, print a test page and compare it to a print from the original file. Paper reveals compression artifacts that are not always obvious on screen. If text prints soft or images look muddy, use lighter compression settings.
For legally important documents such as signed contracts, deeds, court filings, and similar records, ensure that signatures remain clearly legible, notary stamps and seals are visible, dates and amounts are unambiguous, and any handwriting is readable. These elements are often the most important parts of legal documents and are the first to be degraded by aggressive compression.
Maintain a copy of the original uncompressed scan in cold storage (an external drive or a long-term cloud archive) even after compressing for active use. Storage is cheap; the ability to reprocess an original scan with better compression settings in the future is valuable.
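The mechanical part of this verification (did the file shrink, and are all pages still present) can be checked with a few lines before you start the visual inspection. This is a minimal sketch with hypothetical function names. It assumes you supply the page counts and file sizes yourself, and it does not replace looking at the pages.

```python
def reduction_percent(original_bytes: int, compressed_bytes: int) -> float:
    """Size reduction as a percentage of the original file size."""
    return 100.0 * (1 - compressed_bytes / original_bytes)


def basic_checks(original_pages: int, compressed_pages: int,
                 original_bytes: int, compressed_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if compressed_pages != original_pages:
        problems.append(
            f"page count changed: {original_pages} -> {compressed_pages}"
        )
    if compressed_bytes >= original_bytes:
        problems.append("compressed file is not smaller than the original")
    return problems


# Example: a 10 MB scan compressed to 500 KB with all 4 pages intact.
print(f"{reduction_percent(10_000_000, 500_000):.0f}% smaller")
print(basic_checks(4, 4, 10_000_000, 500_000) or "basic checks passed")
```

Only discard or move the original once these checks pass and the pages also look right on screen and, where relevant, on paper.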
Frequently Asked Questions
How much can I compress a scanned PDF without losing readability?
For black-and-white text-only scanned documents, you can typically reduce file size by 80-95% without any noticeable loss of readability. A 10 MB scanned invoice can become 500 KB or less while remaining perfectly legible. For color scanned documents with photographs, a 70-80% reduction is achievable while maintaining adequate screen and print quality at 150 DPI. The key limit is text legibility: if compressed text appears blurry or small characters become indistinct, you have compressed too aggressively and need to dial back the settings.
Should I compress scanned PDFs before or after adding them to a document archive?
Compress before archiving for most use cases. Archiving compressed files saves storage space from the start and reduces backup storage costs. However, if your archive system applies its own compression, or if you plan to run OCR on the archived files later, consider archiving the originals and compressing on demand when files need to be distributed. For legal and compliance archives where original scan quality may need to be demonstrated, maintain an original-quality archive alongside the compressed working copies. The right approach depends on your storage costs, retrieval patterns, and compliance requirements.
Can I compress a scanned PDF that has already been OCR'd?
Yes, you can compress an OCR'd (searchable) scanned PDF, and the text layer from OCR will be preserved in the compressed output. The compression process reduces the size of the embedded image data (the actual scanned images) while the text layer data remains largely unchanged. After compression, the document will still be searchable, and text can still be selected and copied. The compressed version will be smaller but functionally identical to the pre-compression version from the reader's perspective.
Why is my scanned PDF still large after compression?
If compression did not significantly reduce your scanned PDF's size, several factors may be at play. The document may have already been compressed at scan time, leaving little redundancy for further compression. Very high resolution scans (600 DPI or higher) retain significant size even after compression, so try reducing DPI as part of the optimization. Documents with many photographs or complex graphics have lower compression ratios than text-only documents. If the PDF combines scanned pages with digitally generated content, it may contain embedded fonts or vector graphics alongside the scan images, and these behave differently under compression. In these cases, a dedicated PDF optimizer tool may achieve better results than basic compression.