Merge PDFs in Ghostscript with Python list of UNC Paths - python

I am trying to build a multipage PDF using Ghostscript by combining a list of single-page PDFs from their UNC file-paths (working in Python 3.7).
Here is the function:
import subprocess
import os
def ghostscript_merge_pdfs(in_PDF_list, out_PDF):
"""some doc string"""
# pdfPathsAsStr = '"' + ' "'.join(f'{pdf}"' for pdf in in_PDF_list)
pdfPathsAsStr = ' '.join(pdf for pdf in in_PDF_list)
print("The 'pdfPathsAsStr' variable is:")
print(pdfPathsAsStr + "\n")
args = [r"\\someDir\subDir\T\Tools\Ghostscript_Tools\GS_Install\gs9.54.0\bin\gswin64c",
'-sDEVICE=pdfwrite',
'-dNOPAUSE',
"-sOUTPUTFILE=" + out_PDF,
pdfPathsAsStr
]
p = subprocess.Popen(args, stdout=subprocess.PIPE)
print("\nCompleted: \n" + str(p.communicate()))
pdf_dir = r"\\someDir\subDir\T\Tools\Ghostscript_Tools\GS_Testing\IndividualPages"
out_pdf_path = os.path.join(pdf_dir, "Combo_PDF.pdf")
pdfs_list = [os.path.join(pdf_dir, "PDF_1.pdf"), os.path.join(pdf_dir, "PDF_2.pdf")]
ghostscript_merge_pdfs(pdfs_list, out_pdf_path)
The script outputs the following (note that slashes in pdfPathsAsStr are not quadruplicated):
The 'pdfPathsAsStr' variable is:
\\someDir\subDir\T\Tools\Ghostscript_Tools\GS_Testing\IndividualPages\PDF_1.pdf \\someDir\subDir\T\Tools\Ghostscript_Tools\GS_Testing\IndividualPages\PDF_2.pdf
Completed:
(b'GPL Ghostscript 9.54.0 (2021-03-30)\nCopyright (C) 2021 Artifex Software, Inc. All rights reserved.\nThis software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:\nsee the file COPYING for details.\nError: /undefinedfilename in (\\\\\\\\someDir\\\\subDir\\\\T\\\\Tools\\\\Ghostscript_Tools\\\\GS_Testing\\\\IndividualPages\\\\PDF_1.pdf \\\\\\\\someDir\\\\subDir\\\\T\\\\Tools\\\\Ghostscript_Tools\\\\GS_Testing\\\\IndividualPages\\\\PDF_2.pdf)\nOperand stack:\n\nExecution stack:\n %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push\nDictionary stack:\n --dict:732/1123(ro)(G)-- --dict:0/20(G)-- --dict:75/200(L)--\nCurrent allocation mode is local\nLast OS error: No such file or directory\n', None)
GPL Ghostscript 9.54.0: Unrecoverable error, exit code 1
I have looked a few places for help with UNC paths in Ghostscript, but can't find much help. I have tried a few variations of pdfPathsAsStr inside the function, with no luck.
What am I doing wrong?

The following command outside python runs for me no problem.
gswin64c -sDEVICE=pdfwrite -o"\\advent\share\Merged.pdf" "\\advent\share\cover.pdf" "\\advent\share\PDF files in a folder.pdf"
Showing that remote windows folders are not a problem for Ghostscript input or output.
Your problem is the way python handles windows pathing, and that can be minimised by reversing the folder names so that only the server name needs a \\prefix in windows.
gswin64c -sDEVICE=pdfwrite -o"\\advent/share/Merged.pdf" "\\advent/share/cover.pdf" "\\advent/share/PDF files in a folder.pdf"
So in python use \\\\ for server prefix when needed but use / in paths to make life easier (yes I know its not best practice, but life is short and its less rsi from the keyboard).
to test what cmd is getting from python just run cmd /k echo "\\\\blah/blah" as your executable command

Related

Python subprocess.run can't find /bin/sh in chroot

(First off, apologies for the roughness of this question's writing-- would love any constructive feedback.)
Ok what I'm doing is a bit involved-- I'm trying to make a Python script that executes Bash scripts that each compile a component of a Linux From Scratch (LFS) system. I'm following the LFS 11.2 book pretty closely (but not 100%, although I've been very careful to check where my deviations break things. If you're familiar with LFS, this is a deviation that breaks things).
Basically, my script builds a bunch of tools (bash, tar, xz, make, gcc, binutils) with a cross compiler, and tells their build systems to install them into a directory lfs/temp-tools. Then the script calls os.chroot('lfs') to chroot into the lfs directory, and immediately resets all the environment variables (most importantly PATH) with:
os.environ = {"PATH" : "/usr/bin:/usr/sbin:/temp-tools/bin", ...other trivial stuff like HOME...}
But after the chroot, my calls of
subprocess.run([f"{build_script_path} >{log_file_path} 2>&1"], shell=True)
are failing with FileNotFoundError: [Errno 2] No such file or directory: '/bin/sh', even though
bin/sh in the chroot directory is a sym link to bash
there's a perfectly good copy of bash in /temp-tools/bin
calling print(os.environ) after the python chroot shows /temp-tools/bin is in PATH
I thought maybe subprocess.run is stuck using the old environment variables, before I reset them upon entering the chroot, but adding env=os.environ to subprocess.run does not help. :/ I'm stuck for now
For context if it helps, here is where the subprocess.run call gets made:
def vanilla_build(target_name, src_dir_name=None):
def f():
nonlocal src_dir_name
if src_dir_name == None:
src_dir_name = target_name
tarball_path = find_tarball(src_dir_name)
src_dir_path = tarball_path.split(".tar")[0]
if "tcl" in src_dir_path:
src_dir_path = src_dir_path.rsplit("-",1)[0]
snap1 = lfs_dir_snapshot()
os.chdir(os.environ["LFS"] + "srcs/")
subprocess.run(["tar", "-xf", tarball_path], check=True, env=os.environ)
os.chdir(src_dir_path)
build_script_path = f"{os.environ['LFS']}root/build-scripts/{target_name.replace('_','-')}.sh"
log_file_path = f"{os.environ['LFS']}logs/{target_name}"
####### The main call #######
proc = subprocess.run([f"{build_script_path} >{log_file_path} 2>&1"],
shell=True, env=os.environ)
subprocess.run(["rm", "-rf", src_dir_path], check=True)
if proc.returncode != 0:
red_print(build_script_path + " failed!")
return
tracked_file_record_path = f"{os.environ['LFS']}logs/tracked/{target_name}"
with open(tracked_file_record_path, 'w') as f:
new_files = lfs_dir_snapshot() - snap1
f.writelines('\n'.join(new_files))
f.__name__ = "build_" + target_name
return f
And how I enter the chroot:
def enter_chroot():
os.chdir(os.environ["LFS"])
os.chroot(os.environ["LFS"])
os.environ = {"HOME" : "/root",
"TERM" : os.environ["TERM"],
"PATH" : "/usr/bin:/usr/sbin:/temp-tools/bin",
"LFS" : '/'}
Thank you! In the meantime I'm going to chop away as much code as possible to isolate the problem to either understand whatever I'm not getting or rewrite this question to be less context specific

Open installed apps on Windows intelligently

I am coding a voice assistant to automate my pc which is running Windows 11 and I want to open apps using voice commands, I don't want to hard code every installed app's .exe path. Is there any way to get a dictionary of the app's name and their .exe path. I am able to get currently running apps and close them using this:
def close_app(app_name):
running_apps=psutil.process_iter(['pid','name'])
found=False
for app in running_apps:
sys_app=app.info.get('name').split('.')[0].lower()
if sys_app in app_name.split() or app_name in sys_app:
pid=app.info.get('pid')
try:
app_pid = psutil.Process(pid)
app_pid.terminate()
found=True
except: pass
else: pass
if not found:
print(app_name + " is not running")
else:
print('Closed ' + app_name)
Possibly using both wmic and use either which or gmc to grab the path and build the dict?
Following is a very basic code, not tested completely.
import subprocess
import shutil
Data = subprocess.check_output(['wmic', 'product', 'get', 'name'])
a = str(Data)
appsDict = {}
x = (a.replace("b\\'Name","").split("\\r\\r\\n"))
for i in range(len(x) - 1):
appName = x[i+1].rstrip()
appPath = shutil.which(appName)
appsDict.update({appName: appPath})
print(appsDict)
Under Windows PowerShell there is a Get-Command utility. Finding Windows executables using Get-Command is described nicely in this issue. Essentially it's just running
Get-Command *
Now you need to use this from python to get the results of command as a variable. This can be done by
import subprocess
data = subprocess.check_output(['Get-Command', '*'])
Probably this is not the best, and not a complete answer, but maybe it's a useful idea.
This can be accomplished via the following code:
import os
def searchfiles(extension, folder):
with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
for r, d, f in os.walk(folder):
for file in f:
if file.endswith(extension):
filewrite.write(f"{r + file}\n")
searchfiles('.exe', 'H:\\')
Inspired from: https://pythonprogramming.altervista.org/find-all-the-files-on-your-computer/

Python script called from PHP can't write a file

I have a problem with converting docx to pdf files in my script.
At first I tried to use a pure php-based solution, described here:
https://stackoverflow.com/a/20035739/12812601
Unfortunatelly this does not work (it creates an empty com object, then throws a fatal error).
So I've tried to use a python script to do this.
I use a great script from here:
https://stackoverflow.com/a/20035739/12812601
So here is a problem.
The Python script standalone (run via a command line) works just fine, and saves the converted pdf. Unfortunatelly when I try to call it via PHP it can't save a converted file.
PHP scripts can create and write files oin the same directory without any problem
This supposed to be a local configuration, so I do not care about any portability
Scripts:
*******PHP*******
<?php
//Script only for testing Python calls, tried different methods
error_reporting(E_ALL);
echo '<h1>Begin</h1>';
echo '<h2>Before call</h2>';
exec ('python dp.py');
echo '<h2>After exec call</h2>';
system('python dp.py');
echo '<h2>After Sys Call</h2>';
passthru('python dp.py');
echo '<h2>After Pass Call</h2>';
$w = get_current_user();
var_dump($w);
?>
*****Python*****
import sys
import os
import comtypes.client
import win32com.client
wdFormatPDF = 17
#static file names for testing
in_file = 'C:\\Users\\fake_user\\OneDrive\\Stuff\\f1.docx'
out_file = 'C:\\Users\\fake_user\\OneDrive\\Stuff\\f3.pdf'
print('BEGIN<br>\n')
word = win32com.client.Dispatch('Word.Application')
word.Visible = False
doc = word.Documents.Open(in_file)
print('\nOpened Docx\n<br>')
print(in_file);
doc.SaveAs(out_file, FileFormat=wdFormatPDF)
print('\nSaved\n<br>')
doc.Close()
word.Quit()
print('DONE\n')
*****Output from the browser*****
Begin
Before call
After exec call
BEGIN
Opened Docx
C:\Users\fake_user\OneDrive\Stuff\f1.docx
After Sys Call
BEGIN
Opened Docx
C:\Users\fake_user\OneDrive\Stuff\f1.docx
After Pass Call
string(5) "fake_user"
System configuration
Windows 7 Professional Edition Service Pack 1
Apache/2.4.26 (Win32)
OpenSSL/1.0.2l
PHP/7.1.7
Python 3.8.1
I tried to run Apache both as a system service and as a user who owns the OneDrive (name changed to "fake_user" here), so it shouldn't be a permissions issue (I think)
Any help appreciated

Get total length of videos in a particular directory in python

I have downloaded a bunch of videos from coursera.org and have them stored in one particular folder. There are many individual videos in a particular folder (Coursera breaks a lecture into multiple short videos). I would like to have a python script which gives the combined length of all the videos in a particular directory. The video files are .mp4 format.
First, install the ffprobe command (it's part of FFmpeg) with
sudo apt install ffmpeg
then use subprocess.run() to run this bash command:
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 -- <filename>
(which I got from http://trac.ffmpeg.org/wiki/FFprobeTips#Formatcontainerduration), like this:
from pathlib import Path
import subprocess
def video_length_seconds(filename):
result = subprocess.run(
[
"ffprobe",
"-v",
"error",
"-show_entries",
"format=duration",
"-of",
"default=noprint_wrappers=1:nokey=1",
"--",
filename,
],
capture_output=True,
text=True,
)
try:
return float(result.stdout)
except ValueError:
raise ValueError(result.stderr.rstrip("\n"))
# a single video
video_length_seconds('your_video.webm')
# all mp4 files in the current directory (seconds)
print(sum(video_length_seconds(f) for f in Path(".").glob("*.mp4")))
# all mp4 files in the current directory and all its subdirectories
# `rglob` instead of `glob`
print(sum(video_length_seconds(f) for f in Path(".").rglob("*.mp4")))
# all files in the current directory
print(sum(video_length_seconds(f) for f in Path(".").iterdir() if f.is_file()))
This code requires Python 3.7+ because that's when text= and capture_output= were added to subprocess.run. If you're using an older Python version, check the edit history of this answer.
Download MediaInfo and install it (don't install the bundled adware)
Go to the MediaInfo source downloads and in the "Source code, All included" row, choose the link next to "libmediainfo"
Find MediaInfoDLL3.py in the downloaded archive and extract it anywhere.
Example location: libmediainfo_0.7.62_AllInclusive.7z\MediaInfoLib\Source\MediaInfoDLL\MediaInfoDLL3.py
Now make a script for testing (sources below) in the same directory.
Execute the script.
MediaInfo works on POSIX too. The only difference is that an so is loaded instead of a DLL.
Test script (Python 3!)
import os
os.chdir(os.environ["PROGRAMFILES"] + "\\mediainfo")
from MediaInfoDLL3 import MediaInfo, Stream
MI = MediaInfo()
def get_lengths_in_milliseconds_of_directory(prefix):
for f in os.listdir(prefix):
MI.Open(prefix + f)
duration_string = MI.Get(Stream.Video, 0, "Duration")
try:
duration = int(duration_string)
yield duration
print("{} is {} milliseconds long".format(f, duration))
except ValueError:
print("{} ain't no media file!".format(f))
MI.Close()
print(sum(get_lengths_in_milliseconds_of_directory(os.environ["windir"] + "\\Performance\\WinSAT\\"
)), "milliseconds of content in total")
In addition to Janus Troelsen's answer above, I would like to point out a small problem I
encountered when implementing his answer. I followed his instructions one by one but had different results on windows (7) and linux (ubuntu). His instructions worked perfectly under linux but I had to do a small hack to get it to work on windows. I am using a 32-bit python 2.7.2 interpreter on windows so I utilized MediaInfoDLL.py. But that was not enough to get it to work for me I was receiving this error at this point in the process:
"WindowsError: [Error 193] %1 is not a valid Win32 application".
This meant that I was somehow using a resource that was not 32-bit, it had to be the DLL MediaInfoDLL.py was loading. If you look at the MediaInfo intallation directory you will see 3 dlls MediaInfo.dll is 64-bit while MediaInfo_i386.dll is 32-bit. MediaInfo_i386.dll is the one which I had to use because of my python setup. I went to
MediaInfoDLL.py (which I already had included in my project) and changed this line:
MediaInfoDLL_Handler = windll.MediaInfo
to
MediaInfoDLL_Handler = WinDLL("C:\Program Files (x86)\MediaInfo\MediaInfo_i386.dll")
I didn't have to change anything for it to work in linux
Nowadays pymediainfo is available, so Janus Troelsen's answer could be simplified.
You need to install MediaInfo and pip install pymediainfo. Then the following code would print you the total length of all video files:
import os
from pymediainfo import MediaInfo
def get_track_len(file_path):
media_info = MediaInfo.parse(file_path)
for track in media_info.tracks:
if track.track_type == "Video":
return int(track.duration)
return 0
print(sum(get_track_len(f) for f in os.listdir('directory with video files')))
This link shows how to get the length of a video file https://stackoverflow.com/a/3844467/735204
import subprocess
def getLength(filename):
result = subprocess.Popen(["ffprobe", filename],
stdout = subprocess.PIPE, stderr = subprocess.STDOUT)
return [x for x in result.stdout.readlines() if "Duration" in x]
If you're using that function, you can then wrap it up with something like
import os
for f in os.listdir('.'):
print "%s: %s" % (f, getLength(f))
Here's my take. I did this on Windows. I took the answer from Federico above, and changed the python program a little bit to traverse a tree of folders with video files. So you need to go above to see Federico's answer, to install MediaInfo and to pip install pymediainfo, and then write this program, summarize.py:
import os
import sys
from pymediainfo import MediaInfo
number_of_video_files = 0
def get_alternate_len(media_info):
myJson = media_info.to_data()
myArray = myJson['tracks']
for track in myArray:
if track['track_type'] == 'General' or track['track_type'] == 'Video':
if 'duration' in track:
return int(track['duration'] / 1000)
return 0
def get_track_len(file_path):
global number_of_video_files
media_info = MediaInfo.parse(file_path)
for track in media_info.tracks:
if track.track_type == "Video":
number_of_video_files += 1
if type(track.duration) == int:
len_in_sec = int(track.duration / 1000)
elif type(track.duration) == str:
len_in_sec = int(float(track.duration) / 1000)
else:
len_in_sec = get_alternate_len(media_info)
if len_in_sec == 0:
print("File path = " + file_path + ", problem in type of track.duration")
return len_in_sec
return 0
sum_in_secs = 0.0
os.chdir(sys.argv[1])
for root, dirs, files in os.walk("."):
for name in files:
sum_in_secs += get_track_len(os.path.join(root, name))
hours = int(sum_in_secs / 3600)
remain = sum_in_secs - hours * 3600
minutes = int(remain / 60)
seconds = remain - minutes * 60
print("Directory: " + sys.argv[1])
print("Total number of video files is " + str(number_of_video_files))
print("Length: %d:%02d:%02d" % (hours, minutes, seconds))
Run it: python summarize.py <DirPath>
Have fun. I found I have about 1800 hours of videos waiting for me to have some free time. Yeah sure

Python - How do you run a .py file?

I've looked all around Google and its archives. There are several good articles, but none seem to help me out. So I thought I'd come here for a more specific answer.
The Objective: I want to run this code on a website to get all the picture files at once. It'll save a lot of pointing and clicking.
I've got Python 2.3.5 on a Windows 7 x64 machine. It's installed in C:\Python23.
How do I get this script to "go", so to speak?
=====================================
WOW. 35k views. Seeing as how this is top result on Google, here's a useful link I found over the years:
http://learnpythonthehardway.org/book/ex1.html
For setup, see exercise 0.
=====================================
FYI: I've got zero experience with Python. Any advice would be appreciated.
As requested, here's the code I'm using:
"""
dumpimages.py
Downloads all the images on the supplied URL, and saves them to the
specified output file ("/test/" by default)
Usage:
python dumpimages.py http://example.com/ [output]
"""
from BeautifulSoup import BeautifulSoup as bs
import urlparse
from urllib2 import urlopen
from urllib import urlretrieve
import os
import sys
def main(url, out_folder="C:\asdf\"):
"""Downloads all the images at 'url' to /test/"""
soup = bs(urlopen(url))
parsed = list(urlparse.urlparse(url))
for image in soup.findAll("img"):
print "Image: %(src)s" % image
filename = image["src"].split("/")[-1]
parsed[2] = image["src"]
outpath = os.path.join(out_folder, filename)
if image["src"].lower().startswith("http"):
urlretrieve(image["src"], outpath)
else:
urlretrieve(urlparse.urlunparse(parsed), outpath)
def _usage():
print "usage: python dumpimages.py http://example.com [outpath]"
if __name__ == "__main__":
url = sys.argv[-1]
out_folder = "/test/"
if not url.lower().startswith("http"):
out_folder = sys.argv[-1]
url = sys.argv[-2]
if not url.lower().startswith("http"):
_usage()
sys.exit(-1)
main(url, out_folder)
On windows platform, you have 2 choices:
In a command line terminal, type
c:\python23\python xxxx.py
Open the python editor IDLE from the menu, and open xxxx.py, then press F5 to run it.
For your posted code, the error is at this line:
def main(url, out_folder="C:\asdf\"):
It should be:
def main(url, out_folder="C:\\asdf\\"):
Usually you can double click the .py file in Windows explorer to run it. If this doesn't work, you can create a batch file in the same directory with the following contents:
C:\python23\python YOURSCRIPTNAME.py
Then double click that batch file. Or, you can simply run that line in the command prompt while your working directory is the location of your script.
Since you seem to be on windows you can do this so python <filename.py>. Check that python's bin folder is in your PATH, or you can do c:\python23\bin\python <filename.py>. Python is an interpretive language and so you need the interpretor to run your file, much like you need java runtime to run a jar file.
use IDLE Editor {You may already have it} it has interactive shell for python and it will show you execution and result.
Your command should include the url parameter as stated in the script usage comments.
The main function has 2 parameters, url and out (which is set to a default value)
C:\python23\python "C:\PathToYourScript\SCRIPT.py" http://yoururl.com "C:\OptionalOutput\"
If you want to run .py files in Windows, Try installing Git bash
Then download python(Required Version) from python.org and install in the main c drive folder
For me, its :
"C:\Python38"
then open Git Bash and go to the respective folder where your .py file is stored :
For me, its :
File Location : "Downloads"
File Name : Train.py
So i changed my Current working Directory From "C:/User/(username)/" to "C:/User/(username)/Downloads"
then i will run the below command
" /c/Python38/python Train.py "
and it will run successfully.
But if it give the below error :
from sklearn.model_selection import train_test_split
ModuleNotFoundError: No module named 'sklearn'
Then Do not panic :
and use this command :
" /c/Python38/Scripts/pip install sklearn "
and after it has installed sklearn go back and run the previous command :
" /c/Python38/python Train.py "
and it will run successfully.
!!!!HAPPY LEARNING !!!!

Categories