Print multiple pdf files in python - python

My goal is to print multiple PDF files from a folder, close Adobe Acrobat afterwards, and finally map the files to new folders using another of my scripts.
The mapping works as intended, but I'm struggling with printing all the files. I can print either only one file or none at all, and Adobe still remains open. I've been playing around with asyncio, but I don't know whether it gets the job done.
The code is not well documented or of outstanding quality; it just has to get the task done and will very likely never be touched again. Here it is:
import os
import sys
import keyboard
import asyncio
import psutil
from win32 import win32api, win32print
import map_files
import utility
def prepareFilesToPrint(folder):
    # Scans folder for files with naming convention and puts them in a separate array to print
    filesToPrint = []
    for file in os.listdir(folder.value):
        if utility.checkFileName(file):
            filesToPrint.append(file)
    return filesToPrint

def preparePrinter():
    # Opens the printer and defines attributes such as duplex mode
    name = win32print.GetDefaultPrinter()
    printdefaults = {"DesiredAccess": win32print.PRINTER_ALL_ACCESS}
    handle = win32print.OpenPrinter(name, printdefaults)
    attributes = win32print.GetPrinter(handle, 2)
    attributes['pDevMode'].Duplex = 2 # flip on long edge
    win32print.SetPrinter(handle, 2, attributes, 0)
    return handle

async def printFiles(filesToPrint):
    for file in filesToPrint:
        await win32api.ShellExecute(
            0, "print", file, '"%s"' % win32print.GetDefaultPrinter(), ".", 0)

def cleanup(handle):
    # Closes Adobe after printing ALL files (!working)
    win32print.ClosePrinter(handle)
    for p in psutil.process_iter():
        if 'AcroRd' in str(p):
            p.kill()

async def printTaskFiles():
    # Iterates over files in downloads folder and prints them if they are task sheets (!working)
    os.chdir("C:/Users/Gebker/Downloads/")
    filesToPrint = prepareFilesToPrint(utility.Folder.DOWNLOADS)
    if filesToPrint.__len__() == 0:
        print("No Files to print. Exiting...")
        sys.exit()
    print("=============================================================")
    print("The following files will be printed:")
    for file in filesToPrint:
        print(file)
    print("=============================================================")
    input("Press ENTER to print. Exit with ESC")
    while True:
        try:
            if keyboard.is_pressed('ENTER'):
                print("ENTER pressed. Printing...")
                handle = preparePrinter()
                await printFiles(filesToPrint)
                cleanup(handle)
                print("Done printing. Mapping files now...")
                # map_files.scanFolders()
                break
            elif keyboard.is_pressed('ESC'):
                print("ESC pressed. Exiting...")
                sys.exit()
        except:
            break

if __name__ == "__main__":
    asyncio.run(printTaskFiles())

Related

python reading header from word docx

I am trying to read a header from a Word document using python-docx and watchdog.
What I am doing is this: whenever a new file is created or modified, the script reads the file and gets the contents of the header. But I am getting a
docx.opc.exceptions.PackageNotFoundError: Package not found at 'Test6.docx'
error. I have tried everything, including opening it as a stream, but nothing has worked. And yes, the document is populated.
For reference, this is my code.
**main.py**
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
import watchdog.observers
import watchdog.events
import os
import re
import xml.dom.minidom
import zipfile
from docx import Document
class Watcher:
    DIRECTORY_TO_WATCH = "/path/to/my/directory"

    def __init__(self):
        self.observer = Observer()

    def run(self):
        event_handler = Handler()
        self.observer.schedule(event_handler,path='C:/Users/abdsak11/OneDrive - Lärande', recursive=True)
        self.observer.start()
        try:
            while True:
                time.sleep(5)
        except:
            self.observer.stop()
            print ("Error")
        self.observer.join()

class Handler(FileSystemEventHandler):
    @staticmethod
    def on_any_event(event):
        if event.is_directory:
            return None
        elif event.event_type == 'created':
            # Take any action here when a file is first created.
            path = event.src_path
            extenstion = '.docx'
            base = os.path.basename(path)
            if extenstion in path:
                print ("Received created event - %s." % event.src_path)
                time.sleep(10)
                print(base)
                doc = Document(base)
                print(doc)
                section = doc.sections[0]
                header = section.header
                print (header)
        elif event.event_type == 'modified':
            # Take any action here when a file is modified.
            path = event.src_path
            extenstion = '.docx'
            base = os.path.basename(path)
            if extenstion in base:
                print ("Received modified event - %s." % event.src_path)
                time.sleep(10)
                print(base)
                doc = Document(base)
                print(doc)
                section = doc.sections[0]
                header = section.header
                print (header)

if __name__ == '__main__':
    w = Watcher()
    w.run()
Edit:
I tried changing the extension from doc to docx and that worked, but is there any way to open docx files, because that is what I am finding?
Another thing: when opening the ".doc" file and trying to read the header, all I am getting is
<docx.document.Document object at 0x03195488>
<docx.section._Header object at 0x0319C088>
and what I am trying to do is extract the text from the header.
You are printing the object itself; you should access its text property instead:
...
doc = Document(base)
section = doc.sections[0]
header = section.header
print(header.paragraphs[0].text)
(according to https://python-docx.readthedocs.io/en/latest/user/hdrftr.html)
UPDATE
As I played with the python-docx package, it turned out that PackageNotFoundError is very generic: it can occur simply because the file is not accessible for some reason (it does not exist, cannot be found, or is blocked by permissions), or because the file is empty or corrupted. With watchdog, for example, it may very well happen that after the "created" event fires and before the Document is created, the file is renamed, deleted, etc. And for some reason you make this situation more probable by waiting 10 seconds before creating the Document. So, try checking whether the file exists first:
if not os.path.exists(base):
    raise OSError('{}: file does not exist!'.format(base))
doc = Document(base)
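A single existence check can still race with the program that is writing the file. One possible refinement, sketched here under the assumption that a short polling loop is acceptable, is to retry until the file exists, is non-empty, and can be opened. The helper name `wait_until_readable` and its `timeout` and `interval` parameters are hypothetical; they are not part of watchdog or python-docx.

```python
import os
import time

def wait_until_readable(path, timeout=10.0, interval=0.5):
    """Poll until path exists, is non-empty, and can be opened for reading.

    Hypothetical helper: returns True on success, False once the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.path.getsize(path) > 0:
                with open(path, "rb"):
                    return True
        except OSError:
            pass  # missing, locked, or not yet accessible; try again
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the handler, a call such as `wait_until_readable(event.src_path)` could stand in for the fixed `time.sleep(10)`, with `Document(...)` only being created when it returns `True`.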
UPDATE2
Note also that this may happen when the program opening the file creates a lock file based on the file name. For example, running your code on Linux and opening the file with LibreOffice causes
PackageNotFoundError: Package not found at '.~lock.xxx.docx#'
because this file is not a docx file! So you should update your filtering condition to
if path.endswith(extenstion):
    ...
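The filtering condition can be pulled into a small predicate, roughly as in this sketch. The function name `is_docx_file` and the `LOCK_PREFIXES` tuple are made up for illustration, and the prefix list is an assumed, non-exhaustive set: `.~lock.` is the LibreOffice case shown above, and `~$` is the prefix Word uses for its temporary owner files.

```python
import os

# Prefixes of editor lock/temp files; an assumed, non-exhaustive list.
LOCK_PREFIXES = (".~lock.", "~$")

def is_docx_file(path, extension=".docx"):
    """Return True for real .docx paths, False for lock files and other names."""
    name = os.path.basename(path)
    if name.startswith(LOCK_PREFIXES):
        return False
    # Compare the end of the name, not a substring anywhere in the path.
    return name.lower().endswith(extension)
```

Checking the end of the base name avoids matching a directory that merely contains ".docx" somewhere in its path, which `extenstion in path` would accept.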

Using Adobe Reader's "Export as text" function in Python

I want to convert lots of PDFs into text files.
The formatting is very important and only Adobe Reader seems to get it right (PDFMiner or PyPDF2 do not.)
Is there a way to automate the "export as text" function from Adobe Reader?
The following code will do what you want for one file. I recommend organizing the script into a few little functions and then calling the functions in a loop to process many files. You'll need to install the keyboard library using pip, or some other tool.
import pathlib as pl
import os
import keyboard
import time
import io
KILL_KEY = 'esc'
read_path = pl.Path("C:/Users/Sam/Downloads/WS-1401-IP.pdf")
####################################################################
write_path = pl.Path(str(read_path.parent/read_path.stem) + ".txt")
overwrite_file = os.path.exists(write_path)
# alt -- activate keyboard shortcuts
# `F` -- open file menu
# `v` -- select "save as text" option
# keyboard.write(write_path)
# `alt+s` -- save button
# `ctrl+w` -- close file
os.startfile(read_path)
time.sleep(1)
keyboard.press_and_release('alt')
time.sleep(1)
keyboard.press_and_release('f') # -- open file menu
time.sleep(1)
keyboard.press_and_release('v') # -- select "save as text" option
time.sleep(1)
keyboard.write(str(write_path))
time.sleep(1)
keyboard.press_and_release('alt+s')
time.sleep(2)
if overwrite_file:
    keyboard.press_and_release('y')
# wait for program to finish saving
waited_too_long = True
for _ in range(5):
    time.sleep(1)
    if os.path.exists(write_path):
        waited_too_long = False
        break
if waited_too_long:
    with io.StringIO() as ss:
        print(
            "program probably saved to somewhere other than",
            write_path,
            file = ss
        )
        msg = ss.getvalue()
    raise ValueError(msg)
keyboard.press_and_release('ctrl+w') # close the file
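The recommended loop over many files could be organized roughly like this. It is only a sketch: `text_path_for` and `export_all` are hypothetical names, and `export_one` stands for a function you would write by wrapping the keystroke sequence above so that it takes the PDF path and the target text path as arguments.

```python
import pathlib

def text_path_for(pdf_path):
    """Return the .txt path that sits next to the given PDF."""
    return pathlib.Path(pdf_path).with_suffix(".txt")

def export_all(folder, export_one):
    """Call export_one(pdf_path, txt_path) for each PDF in folder.

    Returns the list of text paths that were requested, in sorted order.
    """
    outputs = []
    for pdf_path in sorted(pathlib.Path(folder).glob("*.pdf")):
        txt_path = text_path_for(pdf_path)
        export_one(pdf_path, txt_path)
        outputs.append(txt_path)
    return outputs
```

Keeping the keyboard automation behind a single callable makes it possible to try the loop with a dummy function before letting it drive Adobe Reader.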

Invalid file path or buffer object type: <class 'list'> when trying to loop through files

I have updated the code below in an effort to allow my script to loop through multiple files in a directory (as opposed to one):
@classmethod
def find_file(cls):
    """Finds the excel file to process"""
    all_files = list()
    archive = ZipFile(config.FILE_LOCATION)
    for file in archive.filelist:
        if file.filename.__contains__('Horrible Data Site '):
            all_files.append(archive.extract(file.filename, config.UNZIP_LOCATION))
    return all_files
Before I declared 'all_files = list()' in my find_file method above, this worked on one file in the directory. I added all_files in an attempt to loop through all files in the directory.
Also, in main.py below, I just added the for loop right before PENDING_RECORDS for this purpose.
"""Start Point"""
from data.find_pending_records import FindPendingRecords
from vital.vital_entry import VitalEntry
from time import sleep
if __name__ == "__main__":
    try:
        for PENDING_RECORDS in FindPendingRecords().get_excel_data():
            # Do operations on PENDING_RECORDS
            # Reads excel to map data from excel to vital
            MAP_DATA = FindPendingRecords().get_mapping_data()
            # Configures Driver
            VITAL_ENTRY = VitalEntry()
            # Start chrome and navigate to vital website
            VITAL_ENTRY.instantiate_chrome()
            # Begin processing Records
            VITAL_ENTRY.process_records(PENDING_RECORDS, MAP_DATA)
            print (PENDING_RECORDS)
        print("All done")
    except Exception as exc:
        print(exc)
Adding all_files and the for loop now produces the following error in the Anaconda Prompt:
(base) C:\Python>python main.py
Invalid file path or buffer object type: <class 'list'>
This is config.py:
FILE_LOCATION = r"C:\Zip\DATA Docs.zip"
UNZIP_LOCATION = r"C:\Zip\Pending"
VITAL_URL = 'http://horriblewebsite:8080/START'
HEADLESS = False
PROCESSORS = 4
MAPPING_DOC = ".//map/mappingDOC.xlsx"
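The message says that a list reached something that expects a single file path or buffer. `get_excel_data` is not shown, so what follows is only a sketch under the assumption that it hands the return value of `find_file` straight to such a reader. The names `extract_matching`, `read_all`, and `read_one` are hypothetical. The idea is to keep the list of extracted paths, but read them one at a time.

```python
from zipfile import ZipFile

def extract_matching(zip_path, dest_dir, marker):
    """Extract every archive member whose name contains marker; return their paths."""
    paths = []
    with ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if marker in name:
                paths.append(archive.extract(name, dest_dir))
    return paths

def read_all(paths, read_one):
    """Call read_one once per path instead of passing it the whole list."""
    return [read_one(path) for path in paths]
```

If the hidden method uses pandas, `read_one` might be `pandas.read_excel`; that is a guess about code that is not part of the question.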

Python: Write to global text file from within a multiprocessing.Process

I'd like to launch a mp.Process which can write to a text file. But I'm finding that at the end of the script, the data written to the file isn't actually saved to disk. I don't know what's happening. Here's a minimum working example:
import os, time, multiprocessing
myfile = open("test.dat", "w")
def main():
    proc = multiprocessing.Process(target=writer)
    proc.start()
    time.sleep(1)
    print "Times up! Closing file..."
    myfile.flush()
    os.fsync(myfile.fileno())
    print "Closing %s" % (myfile)
    myfile.close()
    print "File closed. Have a nice day!"
    print "> cat test.dat"

def writer():
    data = "0000"
    for _ in xrange(5):
        print "Writing %s to %s" % (data, myfile)
        myfile.write(str(data) + '\n')
        # if you comment me, writing to disk works!
        # myfile.flush()
        # os.fsync(myfile.fileno())

if __name__ == "__main__":
    main()
Does anyone have suggestions? The context is that this Process will be eventually listening for incoming data, so it really needs to run independently of other things happening in the script.
The problem is that you're opening the file in the main process. Open files are not passed to the subprocesses, so you need to open it inside your function.
Also, all code outside the functions is executed once for each process, so you're overwriting the file multiple times.
def main():
    # create the file empty so it can be appended to
    open("test.dat", "w").close()
    proc = multiprocessing.Process(target=writer)
    proc.start()

def writer():
    with open('test.dat', 'a') as myfile: # opens the file for appending
        ...
        myfile.write(...)
        ...
Now, some OSes don't allow a file to be opened by multiple processes at the same time. The best solution is to use a Queue and pass the data to the main process which then writes to the file.
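The queue approach can be sketched as follows. This is a minimal Python 3 illustration, not the answerer's code: the `SENTINEL` marker, the `drain_to_file` helper, and the timeout are assumptions made for the example. The child only puts lines on a `multiprocessing.Queue`; the parent is the only process that touches the file.

```python
import multiprocessing
import queue  # only needed for the queue.Empty exception

SENTINEL = None  # hypothetical end-of-data marker

def writer(out_queue):
    """Runs in the child: produce lines and hand them to the parent."""
    for _ in range(5):
        out_queue.put("0000\n")
    out_queue.put(SENTINEL)

def drain_to_file(in_queue, path, timeout=5.0):
    """Runs in the parent: write queued lines until the sentinel arrives."""
    written = 0
    with open(path, "w") as handle:
        while True:
            try:
                item = in_queue.get(timeout=timeout)
            except queue.Empty:
                break  # producer went quiet; stop instead of hanging
            if item is SENTINEL:
                break
            handle.write(item)
            written += 1
    return written

def main(path="test.dat"):
    out_queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=writer, args=(out_queue,))
    proc.start()
    count = drain_to_file(out_queue, path)
    proc.join()
    print("wrote %d lines to %s" % (count, path))

if __name__ == "__main__":
    main()
```

Because the file is opened and closed inside a `with` block in one process, nothing depends on buffers being flushed in the child.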

Stop an operation without stopping the module in python

Well, I have made a module that makes it easier to copy a file to a directory. I also have some try and except blocks in there to make sure it doesn't fail in a big messy way and doesn't close the terminal. But I also want it to display different error messages when a wrong string or variable is entered, and then end the module, but not the, if I may say, terminal running it. So I did this:
def copy():
    import shutil
    import os
    try:
        cpy = input("CMD>>> Name of file(with extension): ")
        open(cpy, "r")
    except:
        print("ERROR>>> 02x00 No such file")
    try:
        dri = input("CMD>>> Name of Directory: ")
        os.chdir(dri)
        os.chdir("..")
    except:
        print("ERROR>>> 03x00 No such directory")
    try:
        shutil.copy(cpy, dri)
    except:
        print("ERROR>>> 04x00 Command Failure")
The problem is that it doesn't end the module when there is no such file or directory; it only ends at the finish.
You may be thinking that when an exception is raised, Python just stops what it's doing, but that's not quite true. The except: block actually catches the exception raised, and is supposed to handle it. After an except: block finishes, Python will continue on executing the rest of the code in the file.
In your case, I'd put a return after each print(...). That way, after Python prints out an error message, it will also return from the copy() function rather than continuing to ask for more input.
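The early-return advice might look like the following sketch. To keep it callable without a prompt, it takes the file and directory as parameters instead of calling `input()`, and it returns `True` or `False`; the name `copy_file`, those parameters, and the return values are assumptions made for illustration. It also catches `OSError` rather than using a bare `except:`, and probes the directory with `os.listdir` instead of `os.chdir`.

```python
import os
import shutil

def copy_file(source, directory):
    """Copy source into directory, returning early after the first error."""
    try:
        open(source, "r").close()
    except OSError:
        print("ERROR>>> 02x00 No such file")
        return False  # leave the function; nothing below runs
    try:
        os.listdir(directory)  # raises OSError if the directory is missing
    except OSError:
        print("ERROR>>> 03x00 No such directory")
        return False
    try:
        shutil.copy(source, directory)
    except OSError:
        print("ERROR>>> 04x00 Command Failure")
        return False
    return True
```

Each `return` ends only the function call, so the terminal or interpreter session that called it keeps running.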
If you did want to make the module exit on error...
Here's how you'd do it.
def copy():
    import shutil
    import os
    import sys
    try:
        cpy = input("CMD>>> Name of file(with extension): ")
        open(cpy, "r")
    except:
        sys.exit("ERROR>>> 02x00 No such file")
    try:
        dri = input("CMD>>> Name of Directory: ")
        os.chdir(dri)
        os.chdir("..")
    except:
        sys.exit("ERROR>>> 03x00 No such directory")
    try:
        shutil.copy(cpy, dri)
    except:
        sys.exit("ERROR>>> 04x00 Command Failure")
sys.exit(0) (for success) and sys.exit(1) (for failure) are usually used but, since you want to output the error, the above example will output the error string to stderr.
Here's a link for more info on sys.exit().
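One detail worth knowing is that `sys.exit()` works by raising `SystemExit`, so a caller can still intercept it. The sketch below illustrates that; `require_file_name` and `run_without_exiting` are hypothetical helpers, not part of the code above.

```python
import sys

def require_file_name(name):
    """Exit with an error message if name is empty; otherwise return it."""
    if not name:
        # A string argument is printed to stderr and the exit status becomes 1.
        sys.exit("ERROR>>> 02x00 No such file")
    return name

def run_without_exiting(func, *args):
    """Call func, but turn a sys.exit() into a return value instead of quitting."""
    try:
        return func(*args)
    except SystemExit as exc:
        return exc.code
```

This is also a reason to avoid a bare `except:` around code that may call `sys.exit()`: a bare `except:` catches `SystemExit` too and would swallow the exit.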
