How to Extract Images from a PDF on Linux

Extracting embedded images from a PDF on Linux can be done several ways. If you need a quick extraction of all images from a document and want a simple visual interface, a browser-based tool works immediately without installation. If you prefer command-line tools or need to process PDFs in a script, Linux's Poppler utilities include pdfimages, which is specifically designed for image extraction. Images embedded in PDFs come in several forms: photographs, illustrations, diagrams, charts, and scanned document images. All of these can be extracted and saved as separate image files — JPEGs, PNGs, or other formats depending on the original encoding. This guide covers both the browser-based approach and the command-line approach, with practical guidance on when each is appropriate.

Extracting Images via Browser (No Installation Required)

The browser-based method is the fastest path to extracted images on any Linux desktop. No packages to install, no dependencies to resolve — just a browser and an internet connection.

1. Open your browser (Firefox, Chromium, or any browser available on your system)
2. Navigate to lazy-pdf.com/extract-images
3. Click the upload area or drag your PDF into the tool
4. Wait for the tool to process the PDF and identify all embedded images
5. Review the extracted images in the interface
6. Download individual images or all images at once as a ZIP archive
7. Extract the ZIP to your preferred directory using: unzip images.zip -d ./extracted-images/

Using pdfimages (Poppler) on the Command Line

pdfimages is part of the Poppler PDF utilities, a standard package in most Linux distributions. It extracts all embedded images from a PDF and saves them as separate files.

Install Poppler utilities: `sudo apt install poppler-utils` (Ubuntu/Debian) or `sudo dnf install poppler-utils` (Fedora).

Basic usage: `pdfimages input.pdf output-prefix`. This extracts all images and saves them as output-prefix-000.ppm, output-prefix-001.ppm, and so on (monochrome images are written as .pbm instead). PPM is a raw, uncompressed image format. To get PNG instead, add the `-png` flag. The `-j` flag saves images that are stored as JPEG inside the PDF as .jpg files; with `-j` alone, images in other encodings are still written as PPM or PBM.

For JPEG: `pdfimages -j input.pdf output-prefix`
For PNG: `pdfimages -png input.pdf output-prefix`
To extract from specific pages only: `pdfimages -f 3 -l 7 input.pdf output-prefix` (pages 3 through 7)

1. Install poppler-utils: sudo apt install poppler-utils
2. Navigate to the directory containing your PDF
3. Run: pdfimages -png document.pdf extracted
4. List the output: ls -la extracted-*.png
5. View an image: eog extracted-000.png (or use any image viewer)
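
Before extracting, it can help to see what the PDF actually contains. `pdfimages -list` prints one row per image without writing any files. The following is a minimal sketch that counts images by type and encoding; it assumes the column layout of recent Poppler releases (type in column 3, encoding in column 9), and `summarize_encodings` is a made-up helper name, not part of Poppler.

```bash
# Summarize the output of `pdfimages -list` by image type and encoding.
# Assumed column layout: page num type width height color comp bpc enc ...
# summarize_encodings is a hypothetical helper name, not a Poppler command.
summarize_encodings() {
    # Skip the two header lines, then count rows per "type/enc" pair.
    awk 'NR > 2 && NF >= 9 { count[$3 "/" $9]++ }
         END { for (k in count) print count[k], k }' | sort -k2
}

# Usage (requires poppler-utils and a real PDF):
#   pdfimages -list document.pdf | summarize_encodings
```

A PDF that reports mostly `image/jpeg` rows is a good candidate for `-j`, while other encodings are usually better served by `-png`.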

Comparing pdfimages and Browser-Based Extraction

pdfimages and browser-based tools extract images differently, which can matter for certain use cases. pdfimages copies the image data stored in the PDF at its native pixel dimensions and color depth, without resampling or lossy recompression. With `-j`, JPEG-encoded images are written out as the original JPEG data; with `-png`, the decoded pixels are saved losslessly as PNG; and `-all` writes each image in its native format where one exists. This is ideal when you need the highest-quality versions of embedded images. Browser-based tools like LazyPDF extract images and may apply some processing before delivery. For most use cases, such as getting photos out of a document or extracting charts or diagrams, the results are equivalent. For scientific or medical PDFs where the exact image data matters (research figures, medical imaging), pdfimages is the better choice because it does not alter the stored pixels. For casual use (getting photos out of a brochure, extracting diagrams from a manual), either approach works well.

Handling PDFs With Complex Image Encoding

Some PDFs store images in ways that make extraction more complex. Scanned PDFs contain the entire page as a single image, so pdfimages extracts the whole page scan rather than individual elements. PDFs with transparency store the alpha channel as a separate soft mask alongside the color data. pdfimages does not merge the two: it writes the mask as its own grayscale image file (shown as type smask in the output of `pdfimages -list`), so you need to recombine color and mask yourself if you want a transparent PNG. For PDFs where images are part of complex graphical constructs (charts built from vector graphics rather than embedded images, for example), extraction may not be possible in the traditional sense, because the chart is vector data, not an embedded image. You'd need to render the page and crop or screenshot the chart in that case. The browser tool at LazyPDF handles alpha channel transparency, combining RGB color data with alpha masks to produce PNG images with transparency where the original image had it.
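
In cases where pdfimages writes a soft mask as its own file (a row of type smask in `pdfimages -list`), the color image and the mask can be recombined afterwards. The sketch below uses ImageMagick and assumes that it is installed, that both files have the same pixel dimensions, and that the file names in the usage line match your output; `apply_mask` is a hypothetical helper name. On ImageMagick 7 the `magick` command replaces `convert`.

```bash
# Merge a color image and its grayscale soft mask into one PNG with alpha.
# apply_mask is a hypothetical helper; the file names in the usage line are examples.
apply_mask() {
    local color="$1" mask="$2" out="$3"
    # CopyOpacity uses the mask's gray values as the alpha channel of the color image.
    convert "$color" "$mask" -alpha off -compose CopyOpacity -composite "$out"
}

# Usage (assumes image 000 is the color data and 001 is its mask):
#   apply_mask extracted-000.png extracted-001.png combined-000.png
```

Check the `pdfimages -list` output first to confirm which numbered file is the mask for which image, since the pairing is not guaranteed to be consecutive in every PDF.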

Automating Batch Image Extraction on Linux

One of Linux's advantages is easy scripting. If you need to extract images from many PDFs (processing a directory of documents, for example), a simple shell script handles this efficiently:

```bash
for pdf in /path/to/pdfs/*.pdf; do
    # Skip the literal pattern when the directory contains no PDFs
    [ -e "$pdf" ] || continue
    # Name the output directory after the PDF, without its extension
    name=$(basename "$pdf" .pdf)
    mkdir -p "/output/$name"
    pdfimages -png "$pdf" "/output/$name/img"
done
```

This script loops through all PDFs in a directory, creates a subdirectory for each document's images, and extracts all images as PNGs into that directory. Adjust paths as needed for your use case.

For batch processing in a CI/CD pipeline or automated document workflow, pdfimages is the appropriate tool. For interactive, occasional extraction, the browser tool is faster.

Frequently Asked Questions

Why does pdfimages show no images found for some PDFs?

A PDF may report no embedded images if all visual content is vector graphics rather than rasterized images. Vector elements (drawn shapes, vector illustrations, text rendered as paths) aren't 'images' in the PDF sense — they're mathematical instructions to draw lines and shapes. pdfimages can only extract rasterized image objects. To capture vector content, you'd need to render the page and save the render.
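
One way to render a page is pdftoppm, which ships in the same poppler-utils package as pdfimages. This is a rough sketch rather than the only approach: `render_page` is a hypothetical helper name, and the 300 DPI default is an arbitrary choice.

```bash
# Render a single PDF page to a PNG so vector artwork can be captured as pixels.
# render_page is a hypothetical helper; 300 is an arbitrary default resolution.
render_page() {
    local pdf="$1" page="$2" prefix="$3" dpi="${4:-300}"
    # -f and -l select the first and last page; -r sets the resolution in DPI.
    pdftoppm -png -r "$dpi" -f "$page" -l "$page" "$pdf" "$prefix"
}

# Usage: render page 4 of document.pdf to a PNG whose name starts with "page"
#   render_page document.pdf 4 page
```

The result is a picture of the whole page, so you would still crop it to the chart or illustration you want.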

The extracted images have PPM format from pdfimages — how do I convert them to JPEG?

Use the -j flag with pdfimages to write JPEG-encoded images directly as .jpg files: pdfimages -j input.pdf output-prefix. Images stored in other encodings still come out as PPM. If you already have PPM files, convert them with ImageMagick: mogrify -format jpg *.ppm (install it with sudo apt install imagemagick). Or use the -png flag for PNG output: pdfimages -png input.pdf output-prefix.

Can I extract images from a specific page range rather than the entire PDF?

Yes, pdfimages supports page range specification with the -f (first page) and -l (last page) flags. For example: pdfimages -f 5 -l 10 -png document.pdf output extracts images from pages 5 through 10 only. The browser tool extracts from the entire document by default.

Are extracted images at their original resolution or reduced quality?

pdfimages writes each image with the same pixel dimensions it has inside the PDF; it does not resample. An image embedded at 300 DPI comes out with its full pixel count, and an image embedded at 72 DPI comes out just as small as it was stored. Extraction cannot add detail, because the resolution of each image was fixed when the PDF was created. If extracted images look low quality, they were embedded at low resolution.
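
To check resolution before extracting, the x-ppi and y-ppi columns of `pdfimages -list` report the effective resolution of each image as placed on the page. The sketch below flags images under a threshold; it assumes those values sit in columns 13 and 14, as in recent Poppler releases, and both the `low_res_images` name and the default of 150 are illustrative choices.

```bash
# Print images whose effective resolution is below a threshold.
# Assumed columns from `pdfimages -list`: 1=page 2=num 4=width 5=height 13=x-ppi 14=y-ppi
# low_res_images is a hypothetical helper; 150 is an arbitrary default threshold.
low_res_images() {
    local min_ppi="${1:-150}"
    awk -v min="$min_ppi" 'NR > 2 && NF >= 14 && ($13 + 0 < min + 0 || $14 + 0 < min + 0) {
        printf "page %s image %s: %sx%s px at %sx%s ppi\n", $1, $2, $4, $5, $13, $14
    }'
}

# Usage (requires poppler-utils and a real PDF):
#   pdfimages -list document.pdf | low_res_images 200
```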

