Validate a Locally Stored PDF File

During the test run, a PDF file is created and stored locally on the hard disk. I would like to open that file and read its content.

Solution

Test Studio cannot interact with PDF files opened in a browser: they contain no HTML or DOM (Document Object Model), so the Test Studio recorder cannot parse their content. However, a PDF file can be read in a coded step by using an external DLL to handle the PDF content. A few steps are needed to prepare the coded step to open and read the PDF file.

1.  The third-party DLL you can use for this is iTextSharp.dll, which can be downloaded here.

2.  Copy the unzipped DLL into the project root folder and add a reference to it in the Test Studio project from that location.

3.  Create a coded step in your test and add the following using directives (or Imports statements for VB.NET) at the top of the code file:

C#

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System.IO;

VB.NET

Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser
Imports System.IO

4.  The sample code below opens a PDF file stored locally on the disk, reads its content, and writes it to the test execution log. The extracted text is collected in a StringBuilder instance. StringBuilder and Encoding belong to the System.Text namespace, so make sure that namespace is imported as well.

C#

// Define the name of the file to open
string fileName = "C:\\pathToYourPDF\\PDFName.pdf";

// Create the StringBuilder that collects the text read from the PDF
StringBuilder text = new StringBuilder();

// Check that the PDF file exists before opening it
if (File.Exists(fileName))
{
    // Initialize the pdfReader
    PdfReader pdfReader = new PdfReader(fileName);

    // Go through the pages of the PDF file, read the content of each page and append it
    for (int page = 1; page <= pdfReader.NumberOfPages; page++)
    {
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

        // Re-encode the page text from the system default encoding to UTF-8
        currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
        text.Append(currentText);
    }

    // Output the collected text to the test execution log
    Log.WriteLine(text.ToString());

    // Close the pdfReader
    pdfReader.Close();
}

VB.NET

' Define the name of the file to open
Dim fileName As String = "folder\pdfFileName.pdf"

' Create the StringBuilder that collects the text read from the PDF
Dim text As New StringBuilder()

' Check that the PDF file exists before opening it
If File.Exists(fileName) Then
    ' Initialize the pdfReader
    Dim pdfReader As New PdfReader(fileName)

    ' Go through the pages of the PDF file, read the content of each page and append it
    For page As Integer = 1 To pdfReader.NumberOfPages
        Dim strategy As ITextExtractionStrategy = New SimpleTextExtractionStrategy()
        Dim currentText As String = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy)

        ' Re-encode the page text from the system default encoding to UTF-8
        currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.[Default], Encoding.UTF8, Encoding.[Default].GetBytes(currentText)))
        text.Append(currentText)
    Next

    ' Output the collected text to the test execution log
    Log.WriteLine(text.ToString())

    ' Close the pdfReader
    pdfReader.Close()
End If
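
The sample above only writes the extracted text to the log. To actually validate the PDF, the collected text can be compared against an expected value. The following is a minimal sketch of one possible approach, not part of Test Studio or iTextSharp: the `PdfTextChecks` class and its method names are hypothetical helpers that rely only on the .NET base class library. Because text extraction may place line breaks and extra spaces in unexpected positions, the sketch collapses whitespace before comparing.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for checking text extracted from a PDF.
public static class PdfTextChecks
{
    // Collapse every run of whitespace (spaces, tabs, line breaks) into a single space.
    public static string NormalizeWhitespace(string input)
    {
        return Regex.Replace(input ?? string.Empty, @"\s+", " ").Trim();
    }

    // Return true when the expected phrase occurs in the extracted text,
    // ignoring letter case and differences in whitespace.
    public static bool ContainsPhrase(string extractedText, string expectedPhrase)
    {
        string haystack = NormalizeWhitespace(extractedText);
        string needle = NormalizeWhitespace(expectedPhrase);

        // An empty expected phrase is treated as "not found" to avoid a check that always passes.
        if (needle.Length == 0)
        {
            return false;
        }

        return haystack.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In the coded step, a call such as `PdfTextChecks.ContainsPhrase(text.ToString(), "Invoice No. 12345")` (the phrase is a placeholder) could be placed after the page loop, with the step throwing an exception when the result is false so that the test is marked as failed.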