I was on the hunt for a solution to automatically remove passwords from PDF files before they get processed and cataloged by Paperless-ngx. My bank statements come with a password, and I needed a way to automate the removal of these passwords. Here’s how I did it.
Step-by-Step Guide
1. Set Up Environment Variables and Path Redirection
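First, add the following variable and path mapping to the container. The variable tells Paperless-ngx which script to run before it consumes a document, and the path mapping makes the host folder that holds the script visible inside the container. This example is from my Unraid installation, but Paperless-ngx can be installed on any system; just make sure the same variable and path are set up: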
<Config Name="PAPERLESS_PRE_CONSUME_SCRIPT" Target="PAPERLESS_PRE_CONSUME_SCRIPT" Default="" Mode="" Description="" Type="Variable" Display="always" Required="false" Mask="false">/usr/src/paperless/scripts/removepassword.py</Config>
<Config Name="Script Path" Target="/usr/src/paperless/scripts" Default="" Mode="rw" Description="" Type="Path" Display="always" Required="false" Mask="false">/mnt/cache/appdata/paperless-ngx/scripts/</Config>
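
To confirm that the variable and the mapped folder line up, a small check like the one below can be run inside the container. It is only a sketch: check_pre_consume_setup is a hypothetical helper, not part of Paperless-ngx, and it assumes the script has to be executable by the container user.

```python
import os


def check_pre_consume_setup(environ=None):
    """Return a list of problems found with the pre-consume script setting."""
    environ = os.environ if environ is None else environ
    script = environ.get("PAPERLESS_PRE_CONSUME_SCRIPT", "")
    problems = []
    if not script:
        problems.append("PAPERLESS_PRE_CONSUME_SCRIPT is not set")
    elif not os.path.isfile(script):
        problems.append(f"{script} does not exist (check the path mapping)")
    elif not os.access(script, os.X_OK):
        # Assumption: the script must be executable for Paperless-ngx to run it.
        problems.append(f"{script} is not executable")
    return problems


if __name__ == "__main__":
    for problem in check_pre_consume_setup() or ["setup looks fine"]:
        print(problem)
```

An empty result means the variable points at an existing, executable file; it does not prove that Paperless-ngx will actually call it.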

2. Create and Place the Script File
Create a file named removepassword.py in the scripts folder (as specified in the path above). The file name must match the one used in PAPERLESS_PRE_CONSUME_SCRIPT. Here is the content of the file:
#!/usr/bin/env python
import os

import pikepdf

PASSWORD_FILE = "/usr/src/paperless/scripts/passwords.txt"


def unlock_pdf(file_path):
    """Try each stored password and overwrite the file with a decrypted copy."""
    print("reading passwords")
    with open(PASSWORD_FILE, "r") as f:
        # Skip blank lines; strip() also drops a stray \r from Windows line endings.
        passwords = [line.strip() for line in f if line.strip()]
    if not passwords:
        print("Empty password file")
        return
    for password in passwords:
        try:
            with pikepdf.open(file_path, password=password, allow_overwriting_input=True) as pdf:
                # Saving without encryption settings writes an unprotected PDF.
                pdf.save(file_path)
                print("unlocked successfully")
                return
        except pikepdf.PasswordError:
            # Wrong password, try the next one.
            continue
    print("no matching password found")


def is_pdf(file_path):
    return file_path.lower().endswith(".pdf")


def is_pdf_encrypted(file_path):
    try:
        with pikepdf.open(file_path) as pdf:
            return pdf.is_encrypted
    except pikepdf.PasswordError:
        # The file cannot be opened without a password.
        return True


# Paperless-ngx passes the path of the working copy in this variable.
file_path = os.environ.get("DOCUMENT_WORKING_PATH")
if file_path and is_pdf(file_path) and is_pdf_encrypted(file_path):
    print("unlocking pdf")
    unlock_pdf(file_path)
else:
    print("not pdf / not encrypted")
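Before relying on the script, it can help to run it by hand against a test file. The sketch below imitates the pre-consume call by setting DOCUMENT_WORKING_PATH and starting the script with the current Python interpreter. run_pre_consume is a hypothetical helper, and this is only an approximation of what Paperless-ngx does; it has to run somewhere pikepdf is installed, for example inside the container.

```python
import os
import subprocess
import sys


def run_pre_consume(script_path, document_path):
    """Run the script with DOCUMENT_WORKING_PATH set and return its stdout."""
    env = dict(os.environ, DOCUMENT_WORKING_PATH=document_path)
    result = subprocess.run(
        [sys.executable, script_path],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits with an error
    )
    return result.stdout


# Example call (the PDF path is a placeholder for your own test file):
# print(run_pre_consume("/usr/src/paperless/scripts/removepassword.py", "/tmp/test.pdf"))
```

Because the script overwrites the file it is given, point it at a copy of a protected PDF rather than the original.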
3. Create a Password File
In the same scripts folder, create a file called passwords.txt and write your passwords in this file, one per line.
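As an illustration of how such a file is read, here is a minimal sketch that treats every non-empty line as one password and strips surrounding whitespace, much like the strip() call in the script. parse_passwords and the sample passwords are made up for this example. One consequence of stripping is that a password which really begins or ends with a space would not match.

```python
def parse_passwords(text):
    """Return the non-empty lines of a password file, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# A blank line, padded spaces and a Windows line ending are all tolerated.
sample = "statement-2024\n\n  bank secret  \r\nlast-one"
print(parse_passwords(sample))
# prints ['statement-2024', 'bank secret', 'last-one']
```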
Important Tip
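Don’t create these files on Windows if you can avoid it. Windows editors save files with CRLF line breaks, while Linux expects LF. With CRLF, the first line of the script (#!/usr/bin/env python) ends in a stray carriage return, and the script can fail to start. Create the files on Linux, or, if you can’t or don’t know how, at least use Notepad++ and switch it to Unix (LF) line breaks (Edit > EOL Conversion).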
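If the files were already saved with Windows line breaks, they can be converted afterwards. The following is a small sketch using only the Python standard library; to_unix_line_endings is a hypothetical helper name, and it rewrites the file in place, so keep a backup if in doubt.

```python
from pathlib import Path


def to_unix_line_endings(path):
    """Rewrite a file in place with LF line endings; return True if it changed."""
    file = Path(path)
    data = file.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        file.write_bytes(fixed)
    return fixed != data


# Example call, using the host path from the Unraid configuration above:
# to_unix_line_endings("/mnt/cache/appdata/paperless-ngx/scripts/removepassword.py")
```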
Credit
Credit goes to this post for the original idea. You can refer to it for more information about the different configuration options.
Summary
By following these steps, you can automate the removal of passwords from PDF files before they are processed and cataloged by Paperless-ngx. This method ensures that your documents are accessible and organized without manual intervention.