Removing Passwords Automatically in Paperless-ngx

I was on the hunt for a solution to automatically remove passwords from PDF files before they get processed and cataloged by Paperless-ngx. My bank statements come with a password, and I needed a way to automate the removal of these passwords. Here’s how I did it.

Step-by-Step Guide

1. Set Up Environment Variables and Path Redirection

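First, add the following variable and path mapping. The variable tells Paperless-ngx which script to run before it consumes a document, and the path mapping makes the scripts folder on the host visible inside the container. This example is from my Unraid installation, but Paperless-ngx can be installed on any system. Just make sure both the variable and the path are set up: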

<Config Name="PAPERLESS_PRE_CONSUME_SCRIPT" Target="PAPERLESS_PRE_CONSUME_SCRIPT" Default="" Mode="" Description="" Type="Variable" Display="always" Required="false" Mask="false">/usr/src/paperless/scripts/removepassword.py</Config>
<Config Name="Script Path" Target="/usr/src/paperless/scripts" Default="" Mode="rw" Description="" Type="Path" Display="always" Required="false" Mask="false">/mnt/cache/appdata/paperless-ngx/scripts/</Config>
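To confirm the setup from inside the container, a small check like the one below can help. It is only a sketch: the function name check_pre_consume_script is made up for this example, and it assumes Paperless-ngx runs the script directly, so the file has to exist at the configured path and be executable.

```python
import os


def check_pre_consume_script(env=None):
    """Return a list of problems found with the pre-consume script setup."""
    # Hypothetical helper: pass a dict to test, or nothing to use the real environment.
    env = os.environ if env is None else env
    problems = []
    script = env.get("PAPERLESS_PRE_CONSUME_SCRIPT")
    if not script:
        problems.append("PAPERLESS_PRE_CONSUME_SCRIPT is not set")
    elif not os.path.isfile(script):
        # The path mapping is probably missing or points at the wrong folder.
        problems.append(f"script not found: {script}")
    elif not os.access(script, os.X_OK):
        # Assumption: the script is started directly and needs the executable bit.
        problems.append(f"script is not executable: {script}")
    return problems
```

Calling check_pre_consume_script() in a Python shell inside the container should return an empty list when the variable and the path are both in place.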

2. Create and Place the Script File

Create a file named removepassword.py in the scripts folder (the host path mapped above). The file name must match the one set in PAPERLESS_PRE_CONSUME_SCRIPT. Here is the content of the file:

#!/usr/bin/env python
import pikepdf
import os

def unlock_pdf(file_path):
    print("reading passwords")
    # One candidate password per line; blank lines are skipped.
    with open("/usr/src/paperless/scripts/passwords.txt", "r") as f:
        passwords = [line.strip() for line in f if line.strip()]
    if not passwords:
        print("Empty password file")
        return
    for password in passwords:
        try:
            with pikepdf.open(file_path, password=password, allow_overwriting_input=True) as pdf:
                # Saving without encryption settings writes an unencrypted file in place.
                pdf.save(file_path)
                print("unlocked successfully")
                return
        except pikepdf.PasswordError:
            # Wrong password, try the next one.
            continue
    print("no password from passwords.txt worked")

def is_pdf(file_path):
    return file_path.lower().endswith(".pdf")

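def is_pdf_encrypted(file_path):
    try:
        with pikepdf.open(file_path) as pdf:
            # Opens without a password, e.g. when only an owner password is set.
            return pdf.is_encrypted
    except pikepdf.PasswordError:
        # A password is required to open the file.
        return True
    except pikepdf.PdfError:
        # Not a readable PDF; leave it untouched for Paperless-ngx to handle.
        return False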

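# Paperless-ngx passes the path of the working copy in this variable.
file_path = os.environ.get('DOCUMENT_WORKING_PATH')

if file_path is None:
    print("DOCUMENT_WORKING_PATH is not set")
elif is_pdf(file_path) and is_pdf_encrypted(file_path):
    print("unlocking pdf")
    unlock_pdf(file_path)
else:
    print("not pdf / not encrypted")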
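Before dropping a real statement into the consume folder, it can be useful to run the script by hand. The sketch below roughly imitates what Paperless-ngx does by setting DOCUMENT_WORKING_PATH and starting the script. The helper name run_pre_consume is made up for this example, and Paperless-ngx itself sets more variables than this, so treat it as an approximation.

```python
import os
import subprocess
import sys


def run_pre_consume(script_path, document_path):
    """Run a pre-consume script with DOCUMENT_WORKING_PATH set and return the finished process."""
    # Copy the current environment and add the variable the script reads.
    env = dict(os.environ, DOCUMENT_WORKING_PATH=document_path)
    # Use the current Python interpreter so the shebang line is not involved.
    return subprocess.run(
        [sys.executable, script_path],
        env=env,
        capture_output=True,
        text=True,
    )
```

Run it against a copy of a protected PDF, since the script overwrites the file it is given, then print the returned object's stdout and stderr to see the script's messages.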

3. Create a Password File

In the same scripts folder, create a file called passwords.txt and write your passwords in this file, one per line.

Important Tip

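Don’t create these files with a Windows editor that saves Windows-style (CRLF) line breaks. Linux expects LF line breaks, and a CRLF at the end of the script’s first line (the #!/usr/bin/env python line) can prevent the script from starting at all. Try to create the files in Linux, or, if you can’t or don’t know how, at least use Notepad++ and set the line endings to Unix (LF) under Edit > EOL Conversion.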
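If the files were already saved with Windows line breaks, they can be converted afterwards. Here is a minimal sketch in Python; the function name to_unix_line_endings is just an illustration, and it assumes the file is small enough to read into memory in one go.

```python
from pathlib import Path


def to_unix_line_endings(path):
    """Rewrite the file in place with LF line breaks; return True if it was changed."""
    p = Path(path)
    data = p.read_bytes()
    # Work on raw bytes so nothing else in the file is altered.
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        p.write_bytes(fixed)
        return True
    return False
```

Run it once on the script and once on passwords.txt. A return value of False means the file already had Linux line breaks.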

Credit

Credit goes to this post for the original idea. You can refer to it for more information about the different configuration options.

Summary

By following these steps, you can automate the removal of passwords from PDF files before they are processed and cataloged by Paperless-ngx. This method ensures that your documents are accessible and organized without manual intervention.

