WebSite-Watcher and PDF files

In WebSite-Watcher 4.x, PDF files were handled as binary files. You were only notified that a PDF file had changed; it was not possible to highlight the changes in the text.

In a previous post I talked about the new Plugin system of WebSite-Watcher 5.0. WebSite-Watcher 5 includes a predefined PDF Plugin that extracts the text content of a PDF file and converts it into a readable HTML file. This means you can check PDF files like normal web pages: search for keywords, highlight changes, and use the other features available for web page monitoring. This was a frequently requested feature and is now finally implemented in version 5.
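To illustrate why text extraction matters, here is a minimal Python sketch of the two kinds of check. It is not WebSite-Watcher's code; the function names are made up for this example. A binary comparison can only say that something changed, while a comparison of the extracted text can say which lines changed.

```python
import difflib
import hashlib


def binary_changed(old: bytes, new: bytes) -> bool:
    """Binary-style check: reports only that the file differs."""
    return hashlib.sha256(old).digest() != hashlib.sha256(new).digest()


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Text-style check: returns removed ("- ") and added ("+ ") lines."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]
```

With the extracted text available, the changed lines are exactly what a viewer can highlight.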

 

Basic program configuration

For the PDF-to-text conversion, the third-party tool XPDF is required; it must be downloaded and installed separately. After installing it, enter the path to the file pdftotext.exe in the WebSite-Watcher program configuration.
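As a rough idea of what such a conversion involves, the following Python sketch calls pdftotext and wraps the result in a simple HTML page. This is only an illustration under assumptions: how WebSite-Watcher actually invokes pdftotext and builds its HTML is not documented here, and the function names are hypothetical. The `-enc UTF-8` option and the `-` output argument (write to stdout) are standard pdftotext options.

```python
import html
import subprocess


def extract_text(pdftotext_path: str, pdf_path: str) -> str:
    """Run pdftotext and return the extracted text.

    The final "-" asks pdftotext to write to stdout instead of a file.
    """
    result = subprocess.run(
        [pdftotext_path, "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,  # raise CalledProcessError if the conversion fails
    )
    return result.stdout.decode("utf-8", errors="replace")


def text_to_html(text: str, title: str) -> str:
    """Wrap plain text in a minimal HTML page, escaping markup characters."""
    safe_title = html.escape(title)
    safe_text = html.escape(text)
    return (
        f"<html><head><title>{safe_title}</title></head>"
        f"<body><pre>{safe_text}</pre></body></html>"
    )
```

You would pass the same path to `extract_text` that you enter in the program configuration, that is, the full path to pdftotext.exe on your machine.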

[Screenshot: Program configuration for PDF files]

Bookmark configuration

For each PDF bookmark, configure the following:

  1. Create a new bookmark and enter the address of the PDF file into the URL field
  2. Select the “Advanced” tab
  3. Select “Plugins” on the left side
  4. Click the “Select Plugin” button
  5. Click the radio button “Compatible”
  6. Select the PDF Plugin from the list

[Screenshot: Select PDF Plugin]

Available PDF Plugins

There are currently two PDF Plugins available:

  1. Conversion with Layout
    This Plugin tries to preserve the layout of the PDF content, which can be useful when a PDF file contains tables. The converted page will probably have a horizontal scrollbar, so you have to scroll horizontally to read a whole paragraph.
  2. Simple text extraction
    This Plugin performs a simple text extraction. You might lose some formatting, but you should never see a horizontal scrollbar when viewing the converted document.

Both Plugins have advantages and disadvantages (depending on the XPDF tool), so you have to test and select the Plugin individually for each bookmark.
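If you want to compare both styles of output yourself outside of WebSite-Watcher, a small Python helper like the one below can build the two command lines. It assumes that "Conversion with Layout" roughly corresponds to pdftotext's `-layout` option and that the simple extraction corresponds to running pdftotext without it; that mapping is a guess, not something confirmed for the Plugins. The function name is made up for this sketch.

```python
def build_pdftotext_command(
    pdftotext_path: str, pdf_path: str, keep_layout: bool
) -> list[str]:
    """Build a pdftotext command line that writes the text to stdout."""
    command = [pdftotext_path]
    if keep_layout:
        # -layout keeps the physical layout of the page as far as possible,
        # which helps with tables but produces long lines.
        command.append("-layout")
    # "-" as the output file sends the extracted text to stdout.
    command += [pdf_path, "-"]
    return command
```

Running both variants on the same file (for example with `subprocess.run`) and viewing the results side by side shows quickly which one suits a given PDF.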

See also: http://www.aignes.com/wswhelp/topics/pdf-files.htm
