How to Prepare a PDF for Long-Term Archival Using PDF/A
Digital documents face an underappreciated risk: formats and software change, but records must remain readable for decades. A PDF that looks perfect today may render incorrectly, or not at all, in 20 years if it depends on fonts that were never embedded and are no longer installed, on external color profiles that have since disappeared, or on proprietary encryption that newer software cannot process. PDF/A was created specifically to address this problem.
PDF/A (ISO 19005) is an archival subset of PDF designed for long-term preservation. It prohibits features that threaten long-term readability (encryption, external references, multimedia content, and JavaScript) and requires the file to be self-contained, with embedded fonts, embedded color profiles, and standardized document metadata. The result is a file that a conforming PDF reader should be able to open and display consistently, regardless of what software environment exists decades from now.
Government agencies, law firms, healthcare organizations, financial institutions, and academic institutions increasingly mandate PDF/A for permanent records. This guide walks through converting PDFs to PDF/A, choosing the right conformance level, validating compliance, and managing file sizes when archiving large document collections with tools like LazyPDF.
Understanding PDF/A Conformance Levels
PDF/A is not a single format. The standard has several parts, each with its own conformance levels, suited to different preservation needs.
- PDF/A-1 (ISO 19005-1, based on PDF 1.4) is the original standard. It has two conformance levels. PDF/A-1b (basic) requires visual reproducibility: the document must look the same whenever it is rendered. PDF/A-1a (accessible) adds structural requirements, including tagged content and logical reading order, which makes it suitable for accessibility-compliant archives.
- PDF/A-2 (ISO 19005-2, based on PDF 1.7) adds support for JPEG 2000 compression, transparency, layers, and provisions for digital signatures. This allows smaller file sizes for image-heavy documents. It has three conformance levels: a (accessible), b (basic), and u (Unicode), which requires all text to have Unicode mappings so that it can be extracted reliably.
- PDF/A-3 (ISO 19005-3) is identical to PDF/A-2 except that it allows files of any type (such as Excel spreadsheets or XML data) to be embedded as attachments, whereas PDF/A-2 only permits embedded files that are themselves PDF/A. This supports hybrid archival of source data alongside the visual representation.
For most government and legal records, PDF/A-1b or PDF/A-2b is appropriate. For accessible archives, PDF/A-1a or PDF/A-2a adds structural tagging requirements. Choose PDF/A-3 only when you need to embed source files alongside the visual PDF.
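The selection rule described above can be written down as a small decision function. This is only a sketch: the function name and its three boolean parameters are hypothetical, and a real records policy may add constraints (for example, a mandated part number) that override it.

```python
def choose_pdfa_flavour(needs_tagging: bool = False,
                        needs_attachments: bool = False,
                        needs_pdfa2_features: bool = False) -> str:
    """Return a flavour string such as '1b' or '2a'.

    needs_tagging        -> accessible archive (level a instead of b)
    needs_attachments    -> source files embedded next to the PDF (part 3)
    needs_pdfa2_features -> JPEG 2000, transparency or layers (part 2)
    """
    if needs_attachments:
        part = 3            # only part 3 allows arbitrary attachments
    elif needs_pdfa2_features:
        part = 2
    else:
        part = 1            # the most conservative choice
    level = "a" if needs_tagging else "b"
    return f"{part}{level}"


print(choose_pdfa_flavour())                                  # 1b
print(choose_pdfa_flavour(needs_tagging=True,
                          needs_pdfa2_features=True))         # 2a
print(choose_pdfa_flavour(needs_attachments=True))            # 3b
```

The function deliberately falls back to PDF/A-1b, the lowest common denominator, when no special requirement is stated.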
Converting Documents to PDF/A
There are several paths to creating PDF/A-compliant documents, depending on your starting point and the tools available.
From Microsoft Word: Use the built-in export in Word 2010 and later (Word 2007 needs the Microsoft Save as PDF add-in). Go to File > Save As > PDF, click Options, and check 'ISO 19005-1 compliant (PDF/A).' This creates a PDF/A-1b document.
From Adobe Acrobat Pro: Open the PDF and go to File > Save As Other > Archivable PDF (PDF/A). Choose the conformance level from the dropdown. Acrobat will analyze the document and either save it directly or report issues that prevent conversion (such as encrypted content or non-embeddable fonts).
From Ghostscript (command line): Use the pdfwrite device with the -dPDFA=1 flag and -dPDFACompatibilityPolicy=1 to convert an existing PDF. Ghostscript's documentation also describes a PDFA_def.ps prologue file that supplies the output ICC profile, which is normally needed for the result to validate. Ghostscript is free and handles batch conversions through scripts.
From LibreOffice: Export to PDF and check 'Archive PDF/A-1a (ISO 19005-1)' in the PDF Options dialog (the exact label varies between versions). LibreOffice generates reasonably compliant PDF/A output for office documents.
Note that not all PDFs can be converted to PDF/A automatically. Encrypted PDFs must be decrypted first. PDFs using non-embeddable fonts (where the font license prohibits embedding) will fail conversion. Transparency effects may need to be flattened for PDF/A-1 compliance.
1. Open your source document in its original application (Word, Excel, InDesign) or open the existing PDF in Adobe Acrobat.
2. Remove any encryption or password protection. Encrypted documents cannot be PDF/A compliant.
3. If converting in Word or LibreOffice, use the PDF export dialog and select the PDF/A option before exporting.
4. If converting an existing PDF in Acrobat, use File > Save As Other > Archivable PDF (PDF/A) and select your conformance level.
5. Review any conversion warnings. Acrobat will list elements that could not be made compliant (non-embeddable fonts, RGB color issues).
6. Validate the resulting file with a dedicated PDF/A validator such as veraPDF to confirm compliance.
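For the Ghostscript route, a batch conversion can be scripted along the following lines. This is a minimal sketch, not a complete recipe: the function names and folder layout are hypothetical, the executable is assumed to be on the PATH as `gs` (on Windows it is usually `gswin64c`), and the PDFA_def.ps prologue with its ICC profile is left out, so the output may not pass validation until that is added.

```python
import subprocess
from pathlib import Path


def ghostscript_pdfa_command(src, dst, part=1, gs="gs"):
    """Build the argument list for one PDF -> PDF/A conversion."""
    return [
        gs,
        f"-dPDFA={part}",                 # target PDF/A part (1, 2 or 3)
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=RGB",  # assumption: RGB output intent
        "-dPDFACompatibilityPolicy=1",    # drop offending features, keep going
        f"-sOutputFile={dst}",
        str(src),
    ]


def convert_folder(src_dir, dst_dir, part=1):
    """Convert every *.pdf in src_dir; return the inputs Ghostscript rejected."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    failures = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        cmd = ghostscript_pdfa_command(pdf, out / pdf.name, part)
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(pdf)
    return failures


# Show the command for a single file without running it.
print(" ".join(ghostscript_pdfa_command("report.pdf", "report_pdfa.pdf")))
```

A zero exit code only means Ghostscript finished; it does not prove conformance, so every output file still needs to go through a validator.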
Validating PDF/A Compliance
Creating a PDF/A document does not guarantee that it actually conforms, so validation is essential. Many tools claim to produce PDF/A output but introduce subtle non-conformances that only a dedicated validator will catch.
veraPDF is the industry-standard open-source PDF/A validator, developed by a consortium led by the Open Preservation Foundation and the PDF Association. It is available as a free download and can validate against all PDF/A parts and conformance levels. Run your converted document through veraPDF and review the detailed compliance report. Common failures include fonts that are not fully embedded, missing or incorrect color space specifications, metadata schema violations, and use of prohibited features like JavaScript or encryption.
PAC 2021 (PDF Accessibility Checker) focuses on accessibility checks rather than on PDF/A itself, which makes it a useful complement to veraPDF when targeting PDF/A-1a or PDF/A-2a. Adobe Acrobat Pro has a built-in Standards panel (View > Show/Hide > Navigation Panes > Standards) that shows conformance status and can provide a detailed preflight report.
For batch validation in automated workflows, veraPDF's command-line interface can process entire folders of documents and generate machine-readable reports such as XML, making it suitable for integration into document management systems.
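A batch workflow usually needs a pass/fail answer per file rather than a full report. The sketch below shows one way to get it from veraPDF's default XML report. It rests on assumptions that should be checked against a real report from your installed version: that the executable is called `verapdf`, that `--flavour` selects the profile, and that each `job` element holds an `item/name` and a `validationReport` with an `isCompliant` attribute. The sample report is abbreviated and made up for illustration.

```python
import xml.etree.ElementTree as ET


def verapdf_command(pdf_path, flavour="1b", exe="verapdf"):
    """Argument list for validating one file against a PDF/A flavour."""
    return [exe, "--flavour", flavour, str(pdf_path)]


def compliance_from_report(report_xml: str) -> dict:
    """Map each file name in a veraPDF XML report to True/False."""
    root = ET.fromstring(report_xml)
    results = {}
    for job in root.iter("job"):
        name = job.find("item/name")
        report = job.find(".//validationReport")
        if name is None or report is None:
            continue  # job failed before validation; treat as unknown
        results[name.text] = report.get("isCompliant") == "true"
    return results


# Abbreviated, hand-written sample in the assumed report layout.
sample = """<report><jobs><job>
<item size="1024"><name>contract.pdf</name></item>
<validationReport profileName="PDF/A-1B validation profile" isCompliant="false"/>
</job></jobs></report>"""

print(compliance_from_report(sample))  # {'contract.pdf': False}
```

In practice the report text would come from running `verapdf_command(...)` with `subprocess.run(..., capture_output=True, text=True)` and passing its standard output to `compliance_from_report`.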
Managing File Sizes in PDF/A Archives
A common concern with PDF/A archival is file size. Embedded fonts, full-resolution images, and comprehensive metadata can make PDF/A files larger than their source PDFs. For large archives containing thousands of documents, storage management becomes important.
PDF/A-2 and PDF/A-3 support JPEG 2000 compression, which can significantly reduce file sizes compared to the JPEG or CCITT compression permitted in PDF/A-1. If file size is critical and you control the conversion pipeline, targeting PDF/A-2b instead of PDF/A-1b can yield substantially smaller files for image-heavy documents.
For text-heavy documents (reports, contracts, policies), the font embedding required by PDF/A adds modest overhead, typically 50-300KB per unique font. Subsetting fonts (embedding only the characters used) is permitted in PDF/A and dramatically reduces this overhead.
Before archiving, use LazyPDF's compress tool to reduce image sizes within the PDF. After compression, re-validate with veraPDF to confirm the compressed file still meets PDF/A requirements. Some aggressive compression settings can strip metadata or alter color spaces in ways that break compliance, so validation after compression is essential.
For archives consisting of many related documents, LazyPDF's merge tool can combine them into a single file with a table of contents, which can be more efficient to store and distribute than hundreds of small files. Re-validate the merged file as well, since combining compliant inputs does not by itself guarantee a compliant output.
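To see what the 50-300KB figure means at archive scale, a back-of-the-envelope estimate helps. The sketch below assumes every font is embedded in full rather than subset, treats 1 MB as 1000 KB, and uses a hypothetical collection of 10,000 documents with three fonts each.

```python
def font_overhead_range_mb(documents, fonts_per_document,
                           kb_low=50, kb_high=300):
    """Return (low, high) total font overhead in MB for an archive."""
    embedded_fonts = documents * fonts_per_document
    return embedded_fonts * kb_low / 1000, embedded_fonts * kb_high / 1000


low, high = font_overhead_range_mb(10_000, 3)
print(f"{low:.0f} MB to {high:.0f} MB")  # 1500 MB to 9000 MB
```

Under these assumptions the fonts alone account for roughly 1.5 to 9 GB, which is why subsetting matters for large collections.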
Metadata Requirements for Archival PDFs
PDF/A requires specific document metadata to be present and conformant. Missing or malformed metadata is one of the most common causes of PDF/A validation failure.
The XMP metadata must include the pdfaid:part and pdfaid:conformance fields, which declare the PDF/A part and conformance level. Archival workflows also normally record the document title, creation date, modification date, and creator application. Where these values appear in both the traditional document information dictionary and the XMP packet, the two copies must agree (for example, the XMP title must match the Title field in Document Properties).
In Adobe Acrobat, go to File > Properties > Description to set the Title, Author, Subject, and Keywords. These are written to both the traditional PDF DocInfo dictionary and the XMP metadata. Acrobat automatically adds the PDF/A declaration when saving in PDF/A format.
For batch metadata management, ExifTool (a free, cross-platform command-line tool) can read and write PDF metadata fields in bulk. This is invaluable when adding consistent metadata to large document collections before archival. Because editing a file after conversion can break conformance, re-validate afterwards.
Organizations with formal records management programs should establish a metadata schema before beginning large-scale archival projects. The schema should standardize which fields are required, the controlled vocabularies for subject and keywords, and the conventions for author names.
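The conformance declaration is easy to inspect once the XMP packet has been extracted from the file. The sketch below reads pdfaid:part and pdfaid:conformance from an XMP string using only the standard library. Extracting the packet from the PDF is not shown, the function name is hypothetical, and the sample packet is a trimmed illustration rather than output from a real tool.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def read_pdfa_declaration(xmp_packet: str):
    """Return e.g. '1b' from an XMP packet, or None if no part is declared."""
    root = ET.fromstring(xmp_packet)
    found = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for field in ("part", "conformance"):
            key = f"{{{PDFAID_NS}}}{field}"
            value = desc.get(key)            # attribute form
            if value is None:
                child = desc.find(key)       # element form
                value = child.text if child is not None else None
            if value is not None:
                found[field] = value.strip()
    if "part" not in found:
        return None
    return found["part"] + found.get("conformance", "").lower()


sample = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(read_pdfa_declaration(sample))  # 1b
```

A declaration only states what the file claims to be. Whether the claim holds is still a question for a validator.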
Frequently Asked Questions
What is the difference between PDF and PDF/A?
PDF/A is a restricted subset of PDF designed specifically for long-term archival. Regular PDF supports features like encryption, JavaScript, external links to resources outside the file, and multimedia — all of which can cause long-term readability problems. PDF/A prohibits these features and requires fully embedded fonts, embedded color profiles, and comprehensive document metadata, ensuring the document can be rendered identically by any future PDF reader.
Do I need to convert existing PDFs to PDF/A, or can I just change the file extension?
You must properly convert the file. PDF/A files use the same .pdf extension as ordinary PDFs, so renaming a file changes nothing about its format. PDF/A conversion involves embedding all fonts, converting or embedding color profiles, removing prohibited features like encryption and JavaScript, and adding the required PDF/A conformance declaration to the XMP metadata. A PDF that has not been converted will fail PDF/A validation no matter what it is called.
Which PDF/A level should I use for government records?
Most government records programs specify PDF/A-1b or PDF/A-2b as the minimum requirement. If accessibility compliance is also required (for example under Section 508 in the US or EN 301 549 in the EU), use PDF/A-1a or PDF/A-2a, which additionally require tagged document structure and Unicode text mapping. Check the specific mandate from your agency or jurisdiction, as requirements vary.
Can password-protected PDFs be converted to PDF/A?
Not directly. Encryption is one of the prohibited features in PDF/A. You must first remove the password protection, then convert to PDF/A. Use a tool like LazyPDF's unlock tool (for user password removal) or provide the owner password to a PDF tool to remove restrictions before attempting PDF/A conversion.
How do I batch convert hundreds of PDFs to PDF/A?
For batch conversion, Ghostscript (free, command-line) with the -dPDFA flag is the most accessible option. Adobe Acrobat Pro's Action Wizard supports batch PDF/A conversion with a GUI. For enterprise scale, tools like callas pdfToolbox or Enfocus PitStop offer robust batch processing with detailed compliance reporting. After batch conversion, run veraPDF in batch mode to validate all output files automatically.