
PDF Link Check (Python script)

pdf_link_check.py checks the hyperlinks in a Portable Document Format (PDF) file. It is a command-line application.
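The README does not show how the script finds the hyperlinks. As a rough sketch, assuming the links are stored as standard PDF link annotations (an `/Annots` array on each page, whose entries have `/Subtype /Link` and a `/URI` action under `/A`), the hypothetical `extract_uris` function below walks dict-like page objects such as the ones PyPDF2 returns. It is an illustration, not the script's own implementation.

```python
def _resolve(obj):
    """Follow a PyPDF2 indirect reference; plain Python objects pass through unchanged."""
    # PyPDF2 2.x and later spell the method get_object; 1.x spells it getObject.
    for name in ("get_object", "getObject"):
        method = getattr(obj, name, None)
        if callable(method):
            return method()
    return obj


def extract_uris(pages):
    """Return (page_number, uri) pairs for every URI link annotation.

    `pages` is any iterable of dict-like page objects. Page numbers start at 1.
    """
    links = []
    for number, page in enumerate(pages, start=1):
        annotations = _resolve(page.get("/Annots")) or []
        for ref in annotations:
            annot = _resolve(ref)
            if annot.get("/Subtype") != "/Link":
                continue  # skip comments, form fields, and other annotation types
            action = _resolve(annot.get("/A")) or {}
            if action.get("/S") != "/URI" or "/URI" not in action:
                continue  # internal GoTo links and other actions carry no URI
            uri = _resolve(action["/URI"])
            if isinstance(uri, bytes):
                uri = uri.decode("latin-1")
            links.append((number, str(uri)))
    return links
```

With PyPDF2 2.x or later, the pages would come from `PdfReader(path).pages`; PyPDF2 1.x, the release line current in early 2020, names the reader class `PdfFileReader` instead.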

Release: V1.1.1 2020.1.23

You can install the dependencies for this script either with PIP and the requirements file or by installing each dependency individually.

In your command line, navigate to the repository folder that contains the requirements.txt file.
Run the following command:
```
pip install -r requirements.txt
```

The script requires the following dependencies:

- Python 3.6 or greater.
- Python module: PyPDF2
  - Install with PIP: pip install PyPDF2
  - For more information, see pypi.org.
- Python module: requests
  - Install with PIP: pip install requests
  - For more information, see pypi.org.
- Python module: csv
  - Part of the Python standard library; no need to install with PIP. CSV stands for comma-separated values.
  - For more information, see CSV File Reading and Writing.
- Python module: operator
  - Part of the Python standard library; no need to install with PIP.
  - For more information, see operator.
- Python module: threading
  - Part of the Python standard library; no need to install with PIP.
  - For more information, see threading — Thread-based parallelism.
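To confirm that the interpreter and the modules listed above are available before running the script, a small check like the following may help. The function name `missing_dependencies` is hypothetical and not part of the repository; it uses `importlib.util.find_spec`, which locates a top-level module without importing it.

```python
import importlib.util
import sys

# Module names as they are imported (case-sensitive), taken from the dependency list.
REQUIRED_MODULES = ("PyPDF2", "requests", "csv", "operator", "threading")


def missing_dependencies(modules=REQUIRED_MODULES, minimum=(3, 6)):
    """Return a list of unmet requirements; an empty list means everything is present."""
    problems = []
    if sys.version_info < minimum:
        problems.append("Python %d.%d or greater" % minimum)
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(name)
    return problems


if __name__ == "__main__":
    missing = missing_dependencies()
    print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```

If PyPDF2 or requests shows up as missing, install it with PIP as described above.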

Run pdf_link_check.py from your command line:

- Open your command line and run: python <path to script>/pdf_link_check.py
- The script asks for the path of the PDF you want to check. Enter the absolute path. On a Windows 10 machine, this might look like: c:\<pathtoyourpdf>\pdffile.pdf
- The script asks for the location and file name where you want to save the output. On a Windows 10 machine, this might look like: c:\<pathtoyourreport>\pdflinkreport.csv
- The script runs and displays the following in the terminal:
  - PDF page number
  - URI checked
  - HTTP response (status) code. You can find more information about response codes at List of HTTP status codes.
  - Error information for requests that fail. These are the exceptions raised by the Requests module.
- For URIs that time out after five seconds, the script records "NA" instead of a response code. It captures the error and displays it in the terminal.
- When the script finishes, it saves the results to the path you specified. You can open the CSV file in Microsoft Excel.
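The run steps above (check each URI with a five-second timeout, report "NA" when no response arrives, use threads, save a CSV) can be sketched as follows. This is a hedged illustration, not the script's code: it swaps in the standard library's `urllib.request` for Requests so it runs without third-party packages (with Requests, the equivalent call is `requests.get(uri, timeout=5)`, and a timeout raises `requests.exceptions.Timeout`). The names `check_uri`, `check_all`, and `write_report`, the CSV column names, the worker-pool design, the use of `operator.itemgetter` for sorting, and reporting "NA" for every failure without an HTTP reply (not only timeouts) are all assumptions.

```python
import csv
import operator
import queue
import threading
import urllib.error
import urllib.request

TIMEOUT_SECONDS = 5  # the README states a five-second timeout


def check_uri(uri, timeout=TIMEOUT_SECONDS):
    """Return (status, error) for one URI; status is "NA" when no HTTP reply arrives."""
    try:
        with urllib.request.urlopen(urllib.request.Request(uri), timeout=timeout) as response:
            return response.status, ""
    except urllib.error.HTTPError as exc:
        # 4xx/5xx replies still carry a status code, so report it without an error.
        return exc.code, ""
    except (OSError, ValueError) as exc:
        # OSError covers timeouts, DNS failures, and refused connections (URLError
        # is a subclass); ValueError covers strings urllib cannot parse as URLs.
        return "NA", repr(exc)


def check_all(links, workers=8, check=check_uri):
    """Check (page, uri) pairs with a pool of worker threads.

    Returns (page, uri, status, error) rows sorted by page, then URI.
    """
    tasks = queue.Queue()
    for link in links:
        tasks.put(link)
    rows = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                page, uri = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            status, error = check(uri)
            with lock:  # keep terminal lines and the shared list consistent
                print(page, uri, status, error)
                rows.append((page, uri, status, error))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Threads finish in arbitrary order, so restore a stable order for the report.
    rows.sort(key=operator.itemgetter(0, 1))
    return rows


def write_report(rows, out_file):
    """Write rows as CSV to an open text file (the header names are hypothetical)."""
    writer = csv.writer(out_file)
    writer.writerow(["page", "uri", "status", "error"])
    writer.writerows(rows)
```

For example, `rows = check_all([(1, "https://www.python.org/")])` followed by `with open("pdflinkreport.csv", "w", newline="") as f: write_report(rows, f)` writes a one-link report; the csv module documentation recommends `newline=""` when opening files for the writer. A fixed pool of workers keeps the number of open connections bounded even for PDFs with many links.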

From the script directory, run pytest to validate the code. The tests use the PDFs in the data folder.