How-To Guides | March 13, 2026

How to OCR PDF Documents for Legal Discovery

Electronic discovery, the process of identifying, collecting, and producing electronically stored information for litigation, has transformed how legal teams handle documentary evidence. Modern e-discovery requires text-searchable documents: keyword searches across thousands of documents, concept-based review, and machine-assisted categorization all depend on the underlying text being machine-readable rather than locked in scanned image format.

Scanned documents present a significant challenge in legal discovery. Physical records, historical contracts, older correspondence, handwritten notes, and many client-produced documents arrive as image-based PDFs where a scanner captured the document as a photograph rather than as extractable text. These documents must be OCR-processed to convert the visual text to searchable text before they can be properly indexed, searched, and reviewed in e-discovery platforms.

This guide covers the OCR process for legal documents, the quality considerations specific to legal discovery, and how to integrate OCR into document processing workflows before loading into review platforms.

How to OCR a Scanned Legal Document

OCR (Optical Character Recognition) converts the text in scanned images to machine-readable text that can be indexed and searched. For legal discovery, OCR output quality determines how accurately keyword searches find relevant documents. Poor OCR means relevant documents may be missed in searches, creating discovery compliance risk.

LazyPDF's OCR tool uses Tesseract.js to process scanned PDF pages and extract searchable text. For individual documents and small sets, this browser-based tool provides quick OCR without requiring specialized litigation software. Upload the scanned PDF, run OCR, and download a searchable version that text search tools can index.

For large-scale e-discovery productions involving thousands of documents, dedicated litigation support platforms (Relativity, Nuix, DISCO, Everlaw) have native OCR processing built into their document loading workflows and are the appropriate tool for production-scale processing. LazyPDF OCR is best suited to smaller volumes: individual scanned contracts, specific documents for attorney review, or documents that need to be searchable before being loaded into a platform.

1. Identify which PDFs in your document set are image-based (test: can you select text with your cursor?)
2. Open lazy-pdf.com/ocr and upload the scanned PDF document
3. Process the document through OCR to convert image text to searchable text
4. Download the OCR-processed PDF and verify text is now searchable by testing Ctrl+F (Cmd+F on macOS)
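
The cursor test in step 1 does not scale past a handful of files. The same check can be sketched in Python, assuming each page's text layer has already been extracted with a PDF library of your choice (the extraction step is not shown). The function name `pages_needing_ocr` and the 20-character threshold are illustrative assumptions, not part of LazyPDF.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers whose text layer looks empty.

    page_texts: one string per page, as returned by whatever PDF text
    extractor you use (None is treated as an empty page).
    min_chars: assumed threshold; pages with fewer non-whitespace
    characters are treated as image-only.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        visible = "".join((text or "").split())  # drop all whitespace
        if len(visible) < min_chars:
            flagged.append(number)
    return flagged


pages = [
    "MASTER SERVICES AGREEMENT between Acme Corp and Example LLC",
    "",        # scanned page with no text layer
    "  \n ",   # whitespace only
]
print(pages_needing_ocr(pages))  # [2, 3]
```

A threshold above zero is used because some scanners embed a stray character or two on otherwise image-only pages. Tune it against your own document set.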

OCR Quality Considerations for Legal Documents

OCR accuracy is critical in legal contexts, where a missed keyword can mean a relevant document is not produced, creating discovery sanctions risk. Several factors affect OCR quality on legal documents and should be assessed before processing.

Document scan quality is the primary determinant of OCR accuracy. A clean, straight scan at 300 DPI or higher generally produces accurate OCR results. A slanted, faded, or low-contrast scan produces errors. If you have access to the original physical document, rescanning it at 300 DPI in black and white before OCR processing will improve accuracy significantly. Grayscale or color increases file size and typically does not improve OCR quality for clean text documents.

Handwritten text is not reliably OCR-processable with standard tools. Tesseract and similar engines are trained primarily on printed text and produce unreliable results on cursive or informal handwriting. Handwritten documents should be flagged for manual review and human transcription rather than relying on automated OCR output. For printed documents with minor stamping, annotations, or form fields, OCR handles the main text reliably, while stamps and handwritten annotations may not be captured accurately.
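
Scan resolution can be estimated from a page image's pixel dimensions and the physical paper size, since DPI is pixels divided by inches. The sketch below shows that arithmetic, assuming US Letter paper (8.5 by 11 inches) as the default. The function names and defaults are illustrative.

```python
def effective_dpi(width_px, height_px, width_in=8.5, height_in=11.0):
    """Estimate scan resolution as the lower of horizontal and vertical DPI."""
    if width_in <= 0 or height_in <= 0:
        raise ValueError("page dimensions must be positive")
    return min(width_px / width_in, height_px / height_in)


def meets_target(width_px, height_px, target_dpi=300, **page_size):
    """True when the estimated resolution reaches the target DPI."""
    return effective_dpi(width_px, height_px, **page_size) >= target_dpi


print(effective_dpi(2550, 3300))  # Letter page scanned at 300 DPI
print(meets_target(1700, 2200))   # same page scanned at 200 DPI
```

The estimate is only as good as the assumed paper size, so pass `width_in` and `height_in` explicitly for legal-size or A4 originals.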

1. Assess scan quality before OCR: confirm 300 DPI, straight alignment, no significant fading
2. Test OCR output accuracy by searching for known words that appear in the document
3. For critical documents, manually verify key terms in the OCR output against the visual document
4. Flag handwritten documents for manual review rather than relying on automated OCR
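
Steps 2 and 3 can be partly automated by checking the OCR text for terms you already know appear on the page. This is a minimal sketch, assuming the OCR output is available as a plain string. The function names and the sample text are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so line breaks do not hide matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_terms(ocr_text, known_terms):
    """Return the known terms that cannot be found in the OCR output."""
    haystack = normalize(ocr_text)
    return [term for term in known_terms if normalize(term) not in haystack]


# Sample OCR output with a typical misread: "C0rp" (zero) instead of "Corp".
ocr_text = "This Indemnification\nAgreement is made by Acme C0rp on 3 May 2019."
print(missing_terms(ocr_text, ["indemnification agreement", "Acme Corp", "2019"]))
```

Here only "Acme Corp" is reported as missing, because the engine read the letter "o" as a zero. A missing term does not prove the page is bad, but it marks the document for visual comparison against the scan.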

Integrating OCR into Document Review Workflows

In a typical legal discovery workflow, documents arrive from multiple sources with varying levels of text searchability. Native electronic files (Word documents, emails, spreadsheets) are already text-searchable when converted to PDF. Scanned documents require OCR before they can be properly searched, so the document processing workflow must identify image-based PDFs and route them through OCR before indexing.

For litigation support workflows, Bates numbering is typically applied during the document processing stage. If OCR is part of the workflow, run OCR before Bates numbering. Running OCR first keeps the recognized text limited to the document's original content. The trade-off is that a stamp added afterward is a separate layer that may not appear in the extracted text, depending on how the Bates stamping tool works, so confirm that Bates numbers are searchable in the stamped output. By contrast, stamps added to image-based PDFs before OCR are recognized as part of the page text.

Privilege review is another consideration. OCR-processed documents that turn out to be privileged still need to be identified, logged, and withheld from production. OCR makes them searchable, which helps identify potentially privileged documents through keyword searches for attorney names, 'privileged and confidential' language, or legal advice indicators.
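
The privilege keyword screen described above can be sketched as a simple pattern search over OCR text. The patterns and the attorney name below are placeholders rather than a vetted privilege term list, and a hit only marks a document for attorney review.

```python
import re

# Placeholder indicators; a real matter would use a term list agreed by the review team.
PRIVILEGE_PATTERNS = [
    r"privileged\s+(?:and|&)\s+confidential",
    r"attorney[-\s]client",
    r"legal\s+advice",
]


def privilege_hits(ocr_text, attorney_names=()):
    """Return the sorted, de-duplicated indicators found in the text (case-insensitive)."""
    patterns = list(PRIVILEGE_PATTERNS)
    patterns += [re.escape(name) for name in attorney_names]
    found = set()
    for pattern in patterns:
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        if match:
            # Collapse line breaks inside the match so hits are comparable.
            found.add(re.sub(r"\s+", " ", match.group(0)).lower())
    return sorted(found)


sample = "PRIVILEGED AND\nCONFIDENTIAL. Per Jane Roe, this memo contains legal advice."
print(privilege_hits(sample, attorney_names=["Jane Roe"]))
```

Because the search runs on OCR output, misread characters can hide a term. A screen like this supplements privilege review and cannot replace it.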

1. Process OCR before Bates numbering in your document processing workflow
2. After OCR, run keyword searches to identify potentially relevant and potentially privileged documents
3. Verify OCR quality on a sample of processed documents before loading into review platform
4. Document your OCR processing approach in the litigation support log for potential testimony
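
Steps 3 and 4 work well together if the quality-control sample is drawn reproducibly and the method is recorded. The sketch below shows one way to do that. The 5% fraction, the seed, the document ID format, and the log field names are all assumptions for illustration, not a standard.

```python
import json
import random


def select_qc_sample(doc_ids, fraction=0.05, seed=0, minimum=1):
    """Pick a reproducible random sample of documents for manual OCR review."""
    ids = sorted(doc_ids)  # fixed order so the same seed gives the same sample
    if not ids:
        return []
    size = min(len(ids), max(minimum, round(len(ids) * fraction)))
    return sorted(random.Random(seed).sample(ids, size))


def log_entry(doc_ids, sample, fraction, seed):
    """Build a JSON log record describing how the sample was drawn."""
    return json.dumps(
        {
            "total_documents": len(doc_ids),
            "sample_fraction": fraction,
            "seed": seed,
            "sampled_documents": sample,
        },
        sort_keys=True,
    )


docs = [f"DOC-{n:04d}" for n in range(1, 201)]  # hypothetical document IDs
sample = select_qc_sample(docs, fraction=0.05, seed=42)
print(len(sample))
print(log_entry(docs, sample, 0.05, 42))
```

Recording the seed and fraction lets someone else regenerate the same sample later, which is useful if the processing approach has to be explained in testimony.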

Using OCR for Legal Document Accessibility Requirements

Beyond e-discovery, OCR processing of legal documents has accessibility implications. Court filings, legal briefs, and public records that are image-based PDFs are inaccessible to screen readers used by individuals with visual impairments. Many jurisdictions have accessibility requirements for court-filed documents and government records that mandate text-searchable PDFs.

For legal aid organizations, bar associations, and courts publishing documents publicly, OCR processing of scanned archival records and legacy documents creates searchable resources that serve both accessibility requirements and public access goals. A historical court opinion scanned from paper archives becomes fully searchable and accessible once OCR-processed: judges' names, party names, and legal principles can all be found by researchers and practitioners.

For law firms maintaining their own document archives, OCR processing older client files and historical contracts makes those documents searchable in the firm's document management system. This is particularly valuable when matters that appeared closed are reopened years later and the relevant historical documents need to be found efficiently.

Frequently Asked Questions

How do I know if a PDF needs OCR processing for legal discovery?

Try to select text in the PDF using your cursor. If you can click and drag to highlight text, the PDF has an existing text layer and is already searchable, so no OCR is needed. If clicking and dragging selects a rectangular area without highlighting individual words or letters, the PDF is image-based and requires OCR processing. You can also test by pressing Ctrl+F (Cmd+F on macOS) and searching for a word you know appears in the document. If the word is found, the PDF is searchable; if it is not found, OCR is needed.

Will OCR make poor-quality scans searchable?

OCR can process poor-quality scans, but the accuracy will be lower than with clean scans. Common quality issues that reduce OCR accuracy include low resolution (below 200 DPI), skewed or tilted pages, faded or light ink, coffee stains or document damage, and double-sided show-through where text from the reverse side bleeds through. For critical legal documents with quality issues, consider rescanning the original document at 300 DPI before OCR processing. If the original is unavailable, OCR the document anyway and manually verify accuracy on key passages.

Can I OCR documents for discovery if they are password-protected?

You must remove password protection before OCR processing. Use an unlock tool to remove the password, then OCR the unprotected document. For court-ordered discovery, you must comply with production obligations regardless of how the document was originally protected. Protected documents produced by third parties should be addressed through meet-and-confer discussions about production format, and your own documents must be produced in searchable format per standard discovery protocols.