
Extracting Images from a PDF Document and Saving Them as Image Files

Environment

Version: 2023.3.1106
Product: RadPdfProcessing
Author: Desislava Yordanova

Description

This article demonstrates how to extract the images from a PDF document and save them as separate image files.

Solution

You can use the following code snippet to extract images from each page of a PDF document and save them as image files:

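using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Windows.Media.Imaging;
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
using Telerik.Windows.Documents.Fixed.Model;
using Telerik.Windows.Documents.Fixed.Model.Resources;

static void Main(string[] args)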
{
    List<byte[]> extractedImages = new List<byte[]>();

    PdfFormatProvider formatProvider = new PdfFormatProvider();
    string filePath = @"..\..\sample.pdf";
    using (FileStream pdfStream = File.Open(filePath, FileMode.Open))
    {
        RadFixedDocument document = formatProvider.Import(pdfStream);

        // Iterate through each page in the document
        foreach (RadFixedPage page in document.Pages)
        {
            foreach (var contentElement in page.Content)
            {
                Telerik.Windows.Documents.Fixed.Model.Objects.Image imageContent = contentElement as Telerik.Windows.Documents.Fixed.Model.Objects.Image;
                if (imageContent != null)
                {
                    EncodedImageData imageData = imageContent.ImageSource.GetEncodedImageData();
                    byte[] imageBytes = imageData.Data;
                    extractedImages.Add(imageBytes);

                    SaveImageToFile(imageContent, imageData);
                }
            }
        }
    }
}

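static int counter = 0;

// Writes a single image to disk, choosing the file format from the PDF filter that encodes it.
private static void SaveImageToFile(Telerik.Windows.Documents.Fixed.Model.Objects.Image image, EncodedImageData encodedImageData)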
{
    string outputDirectoryName = @"..\..\{sample folder here}";
    // Images with an alpha channel are re-encoded as PNG so that transparency is preserved.
    bool isTransparent = encodedImageData.AlphaChannel != null;
    if (encodedImageData.Filters.Contains("DCTDecode") && !isTransparent)
    {
        // DCTDecode data is already a complete JPEG stream, so it can be written as-is.
        File.WriteAllBytes(Path.Combine(outputDirectoryName, $"fileName{++counter}.jpeg"), encodedImageData.Data);
    }
    else if (encodedImageData.Filters.Contains("FlateDecode") || isTransparent)
    {
        BitmapSource bitmapSource = image.GetBitmapSource();
        using (FileStream fileStream = new FileStream(Path.Combine(outputDirectoryName, $"fileName{++counter}.png"), FileMode.Create))
        {
            BitmapEncoder encoder = new PngBitmapEncoder();
            encoder.Frames.Add(BitmapFrame.Create(bitmapSource));
            encoder.Save(fileStream);
        }
    }
    else if (encodedImageData.Filters.Contains("JPXDecode"))
    {
        File.WriteAllBytes(Path.Combine(outputDirectoryName, $"fileName{++counter}.jp2"), encodedImageData.Data);
    }
}

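Make sure to replace the value of the filePath variable with the actual path to your PDF document, and the {sample folder here} placeholder in outputDirectoryName with an existing output folder.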

Note that the FlateDecode branch of the above code snippet is compatible with .NET Framework only, because it relies on BitmapSource and PngBitmapEncoder, which are WPF types. This branch is needed because the image data extracted from the PDF is encoded: to use the image data, you need to decode it and then encode it in the desired image format.
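The "decode, then encode in the desired format" step can target formats other than PNG. The following is a minimal sketch for .NET Framework (WPF) that saves an already decoded BitmapSource as JPEG with a chosen quality level. The BitmapSourceExporter class and SaveAsJpeg method are hypothetical names used for illustration and are not part of the Telerik API.

```csharp
using System.IO;
using System.Windows.Media.Imaging;

// Hypothetical helper, not part of RadPdfProcessing.
public static class BitmapSourceExporter
{
    // Encodes a decoded bitmap as JPEG. qualityLevel must be between 1 and 100.
    // JPEG has no alpha channel, so transparent images should stay PNG.
    public static void SaveAsJpeg(BitmapSource bitmapSource, string outputPath, int qualityLevel = 90)
    {
        var encoder = new JpegBitmapEncoder { QualityLevel = qualityLevel };
        encoder.Frames.Add(BitmapFrame.Create(bitmapSource));

        using (var fileStream = new FileStream(outputPath, FileMode.Create))
        {
            encoder.Save(fileStream);
        }
    }
}
```

With the snippet above, a call could look like BitmapSourceExporter.SaveAsJpeg(image.GetBitmapSource(), outputPath), where outputPath is a file path you choose.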

Keep in mind that the current .NET Standard version of the PdfProcessing library does not provide an option to get the decoded image data of images with the FlateDecode filter applied. However, you can create a custom method to decode the Flate-encoded data.
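One possible shape for such a custom method is sketched below. It assumes that EncodedImageData.Data holds the zlib-wrapped stream that the PDF FlateDecode filter defines, and it uses only System.IO.Compression, which is available in .NET Standard. The FlateDecoder class is a hypothetical helper, not part of the library.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of RadPdfProcessing.
public static class FlateDecoder
{
    // Inflates zlib-wrapped data (RFC 1950) and returns the decompressed bytes.
    public static byte[] Decode(byte[] flateData)
    {
        if (flateData == null || flateData.Length < 2)
        {
            throw new ArgumentException("Expected zlib data with a 2-byte header.", nameof(flateData));
        }

        // DeflateStream understands only the raw deflate stream,
        // so the 2-byte zlib header is skipped before decompressing.
        using (var input = new MemoryStream(flateData, 2, flateData.Length - 2))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

A call such as FlateDecoder.Decode(imageData.Data) returns raw pixel samples, not a finished image file. To build an image from them you still need the width, height, color space, and bits per component of the image, and the PDF may additionally apply a predictor to the samples, which this sketch does not handle.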

For more information about converting images and scaling their quality, refer to our online documentation: Cross-Platform Support.
