Python script violates the command-line character limit - python

I am using a Python script to batch convert many images in different folders into single pdfs (with https://pypi.org/project/img2pdf/):
import os
import subprocess
import img2pdf
from shutil import copyfile
def main():
    folders = [name for name in os.listdir(".") if os.path.isdir(name)]
    for f in folders:
        files = [f for f in os.listdir(f)]
        p = ""
        for ffile in files:
            p += f+'\\' + ffile + " "
        os.system("py -m img2pdf *.pn* " + p + " --output " + f + "\combined.pdf")
if __name__ == '__main__':
    main()
However, even though I run the command from PowerShell on Windows 10 and use very short filenames, when the number of images is very high (e.g. over 600 or so) I get the error "The command line is too long" and the PDF is not created. I know there is a command-line string limitation (https://learn.microsoft.com/en-us/troubleshoot/windows-client/shell-experience/command-line-string-limitation), and I also know that the limit for PowerShell is higher than the 8191 characters of cmd (see "Powershell to avoid cmd 8191 character limit"), but I can't figure out how to fix the script. Could you help me fix it so that it stays within the character limit? Thank you.
PS: I use the script after inserting it in the parent folder that contains the folders with the images; then in each subfolder the output pdf file is created.

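With the img2pdf library you can do the conversion directly in Python, so no command line is built at all:
import img2pdf
import os
for r, _, f in os.walk("."):
    # collect the image paths found in this folder
    imgs = []
    for fname in f:
        if fname.endswith((".jpg", ".png")):
            imgs.append(os.path.join(r, fname))
    if len(imgs) > 0:
        # write one PDF per folder that contains images
        with open(os.path.join(r, "output.pdf"), "wb") as out:
            out.write(img2pdf.convert(imgs))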
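Because img2pdf.convert() receives the file names as a Python list, the number of images is no longer bound by any command-line length. A minimal sketch of the gathering step alone, assuming you want the pages in sorted file-name order and also want to match upper-case extensions (neither is stated in the question); the helper name images_by_folder is made up for illustration:

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed set of extensions

def images_by_folder(root):
    """Map each folder under root to the sorted list of image paths it contains."""
    result = {}
    for folder, _dirs, names in os.walk(root):
        images = sorted(
            os.path.join(folder, name)
            for name in names
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
        if images:  # skip folders without images
            result[folder] = images
    return result
```

Each list can then be passed to img2pdf.convert() and the returned bytes written to a PDF inside that folder.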

Related

white space in filename python 3.4.2

I have created this small prog to search all PDF's in a directory, determine if they are searchable or not and then move them to the appropriate directory.
I am new to Python and it is probably not the best way but it does work until the file name has White Space in it and I get the following returned.
Any help would be appreciated.
>>> os.system("pdffonts.exe " + pdfFile + "> output.txt")
99
import os
import glob
import shutil
directory = os.chdir("C:\MyDir") # Change working directory
fileDir = glob.glob('*.pdf') # Create a list of all PDF's in declared directory
numFiles = len(fileDir) # Length of list
startFile = 0 # Counter variable
seekWord = "TrueType"
while startFile < numFiles:
    pdfFile=fileDir[startFile]
    os.system("pdffonts.exe " + pdfFile + "> output.txt")
    file1output = open("output.txt","r")
    fileContent = file1output.read()
    if seekWord in fileContent:
        shutil.move(pdfFile , "NO_OCR")
    else: shutil.move(pdfFile, "OCR")
    startFile = startFile + 1
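os.system() uses the shell to execute your command. You have to quote the filename so that the shell treats the spaces as part of the name. On POSIX shells you can do so with the shlex.quote() function (it is designed for Unix shells only; on Windows, wrap the name in double quotes instead):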
os.system("pdffonts.exe " + shlex.quote(pdfFile) + "> output.txt")
However, there is no reason at all to use os.system() and the shell. You should use the subprocess.run() function and configure that to pass back the output without using redirection or a shell:
import subprocess
seekWord = b"TrueType"
for pdfFile in fileDir:
    result = subprocess.run(["pdffonts.exe", pdfFile], stdout=subprocess.PIPE)
    fileContent = result.stdout
    if seekWord in fileContent:
        # ...
Because pdfFile is passed to pdffonts.exe directly, no shell parses the command, so whitespace in the name no longer matters.
Note that I changed seekWord to be a bytes literal instead as result.stdout is a bytes value (no need to try to decode the result to Unicode here).
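To illustrate the point about whitespace, here is a small self-contained sketch. Since pdffonts.exe may not be installed, it uses the running Python interpreter (sys.executable) as a stand-in program that simply echoes its first argument; the helper name echo_argument is hypothetical:

```python
import subprocess
import sys

def echo_argument(arg):
    """Run a child process with arg as one list element and return what the child saw."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        stdout=subprocess.PIPE,
    )
    # stdout is a bytes value; strip the line ending added by print()
    return result.stdout.strip()

# The name contains spaces but still arrives as a single argument.
print(echo_argument("my scanned file.pdf"))
```

No quoting is needed because each list element is handed to the child process as exactly one argument.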
It seems the problem doesn't come from Python but from the Windows shell. You need to enclose the filename in quotation marks. As I don't have your program pdffonts.exe, I could not test this. I also made your code more Pythonic:
import os
import glob
import shutil
os.chdir(r"C:\MyDir") # Change working directory (raw string keeps the backslash literal)
fileDir = glob.glob('*.pdf') # Create a list of all PDFs in the declared directory
seekWord = "TrueType"
for pdfFile in fileDir:
    os.system('pdffonts.exe "{0}"> output.txt'.format(pdfFile))
    with open("output.txt","r") as file1output:
        fileContent = file1output.read()
    if seekWord in fileContent:
        shutil.move(pdfFile , "NO_OCR")
    else:
        shutil.move(pdfFile, "OCR")

Using terminal command to search through files for a specific string within a python script

I have a parent directory, and I'd like to go through that directory and grab each file with a specific string for editing in python. I have been using grep -r 'string' filepath in terminal, but I want to be able to do everything using python. I'm hoping to get all the files into an array and go through each of them to edit them.
Is there a way to do this by only running a python script?
changing current folder to parent
import os
os.chdir("..")
changing folder
import os
os.chdir(dir_of_your_choice)
finding files with a rule in the current folder
import glob
import os
current_dir = os.getcwd()
for f in glob.glob('*string*'):
    do_things(f)
import os
# sourceFolder is the folder you're going to be looking inside.
# Backslashes are a special character in Python, so they're escaped as double backslashes.
sourceFolder = "C:\\FolderBeingSearched\\"
myFiles = []
# Find all files in the directory
for file in os.listdir(sourceFolder):
    myFiles.append(file)
# open them for editing
for file in myFiles:
    try:
        f = open(sourceFolder + file,'r')
    except OSError:
        continue
    # run whatever code you need to do on each open file here
    print("opened %s" % file)
    f.close()
EDIT: If you want to separate all files that contain a string (this just prints the list at the end currently):
import os
# sourceFolder is the folder you're going to be looking inside.
# Backslashes are a special character in Python, so they're escaped as double backslashes.
sourceFolder = "C:\\FolderBeingSearched\\"
myFiles = []
filesContainString = []
stringsImLookingFor = ['first','second']
# Find all files in the directory
for file in os.listdir(sourceFolder):
    myFiles.append(file)
# open them for editing
for file in myFiles:
    looking = sourceFolder + file
    try:
        in_file = open(looking,encoding="latin1")
    except OSError:
        continue
    print("opened %s" % file)
    found = 0
    with in_file:
        for line in in_file:
            for x in stringsImLookingFor:
                if line.find(x) != -1:
                    # do whatever you need to do to the file or add it to a list like I am
                    filesContainString.append(file)
                    found = 1
                    break
            if found:
                break
print(filesContainString)
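For the original goal, a Python stand-in for grep -r 'string' filepath, the directory walk and the content check can be combined in one function. A minimal sketch, assuming the files are text and that undecodable bytes can simply be ignored; the function name files_containing is made up for illustration:

```python
import os

def files_containing(root, needle):
    """Return the paths of all files under root whose text contains needle."""
    matches = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    # stop reading at the first matching line
                    if any(needle in line for line in handle):
                        matches.append(path)
            except OSError:
                # unreadable file (permissions, broken link): skip it
                continue
    return sorted(matches)
```

The returned list can then be looped over to edit each file. Like grep, the check is line based, so a string that spans a line break is not found.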

Python: Trying to put the contents of a folder into a text file:

I'm in the process of writing a Python script that takes two arguments and outputs the contents of a folder to a text file for me to use in another process. The snippet I have is below:
#!/usr/bin/python
import cv2
import numpy as np
import random
import sys
import os
import fileinput
#Variables:
img_path= str(sys.argv[1])
file_path = str(sys.argv[2])
print img_path
print file_path
cmd = 'find ' + img_path + '/*.png | sed -e "s/^/\"/g;s/$/\"/g" >' + file_path + '/desc.txt'
print "command: ", cmd
#Generate desc.txt file:
os.system(cmd)
When I try and run that from my command line, I get the following output, and I have no idea how to fix it.
sh: 1: s/$//g: not found
I tested the command I am using by running the following command in a fresh terminal instance, and it works out fine:
images/*.png | sed -e "s/^/\"/g;s/$/\"/g" > desc.txt
Can anyone see why my snippet isn't working? When I run it, I get an empty file...
Thanks in advance!
The full text of your sed expression is not reaching the shell because of how Python processes escapes in string literals: inside a normal string, \" is an escape sequence that produces just ", so the backslash is lost. The quickest fix is to escape the backslashes yourself so that Python keeps them. So change this line:
cmd = 'find ' + img_path + '/*.png | sed -e "s/^/\"/g;s/$/\"/g" >' + file_path + '/desc.txt'
to this:
cmd = 'find ' + img_path + '/*.png | sed -e "s/^/\\"/g;s/$/\\"/g" >' + file_path + '/desc.txt'
and that should work for you.
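The difference between the two lines can be checked in isolation. A short sketch using only the sed fragment from the command; the variable names are illustrative:

```python
plain = 's/^/\"/g'     # Python turns \" into a bare ", the backslash is dropped
escaped = 's/^/\\"/g'  # \\ produces one literal backslash, so \" survives
raw = r's/^/\"/g'      # a raw string keeps the backslash exactly as typed

print(plain)           # s/^/"/g
print(escaped)         # s/^/\"/g
print(raw == escaped)  # True
```

A raw string is therefore an alternative to doubling every backslash by hand.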
That said, the comment on your question makes a great point: you could do it all from Python, something like:
import os
import sys
def main():
    # variables
    img_path = str(sys.argv[1])
    file_path = str(sys.argv[2])
    # write one .png file name per line into desc.txt inside file_path
    with open(os.path.join(file_path, 'desc.txt'), 'w') as f:
        f.writelines(['{}\n'.format(line) for line in os.listdir(img_path) if line.endswith('.png')])
if __name__ == "__main__":
    main()
I fully agree with Kyle. My recommendation is to use only Python code rather than calling bash commands from your code. Here is my recommended code; it is longer and less compact than the one above, but IMHO it is easier to understand.
#!/usr/bin/python
import glob
import sys
import os
# Taking arguments
img_path = str(sys.argv[1])
file_path = str(sys.argv[2])
# let's put the target filename in a variable (it is better than hardcoding it)
file_name = 'desc.txt'
# better if you make sure that the target folder exists
if not os.path.exists(file_path):
    # if it does not exist, you create it
    os.makedirs(file_path)
# Create the target file (write mode); os.path.join uses the right separator (unix / or windows \)
outfile = open(os.path.join(file_path, file_name), 'w')
# loop over folder contents
for fname in glob.iglob("%s/*" % img_path):
    # we want to avoid writing 'folders' in the target file
    if os.path.isfile(fname):
        # for every file found you take only the name (without the folder part)
        file_name_in_imgPath = os.path.basename(fname)
        # write filename in the target file
        outfile.write(str(file_name_in_imgPath) + '\n')
outfile.close()
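For comparison, the original find | sed pipeline wrote each .png path wrapped in double quotes, which neither Python answer reproduces. A sketch that keeps that output format in pure Python, assuming the paths should be listed in sorted order; write_desc is a hypothetical helper name:

```python
import glob
import os

def write_desc(img_path, out_dir, file_name="desc.txt"):
    """Write every .png path in img_path to out_dir/file_name, one quoted path per line."""
    os.makedirs(out_dir, exist_ok=True)  # create the target folder if needed
    target = os.path.join(out_dir, file_name)
    with open(target, "w") as outfile:
        for path in sorted(glob.glob(os.path.join(img_path, "*.png"))):
            # same effect as sed 's/^/"/;s/$/"/' on each line
            outfile.write('"%s"\n' % path)
    return target
```

It could be called as write_desc(sys.argv[1], sys.argv[2]) to match the two arguments of the script in the question.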

Need to process all files in a directory, but am only getting one

I have a Python script that is successfully processing single files. I am trying to write a for loop to have it get all the files in a directory that the user inputs. However, it is only processing one of the files in the directory.
The code below is followed by the rest of the script that does the analysis on data. Do I need to somehow close the for loop?
import os
print "Enter the location of the files: "; directory = raw_input()
path = r"%s" % directory
for file in os.listdir(path):
    current_file = os.path.join(path, file)
    data = open(current_file, "rb")
    # Here's an abridged version of the data analysis
    for i in range(0, 10):
        fluff = data.readline()
    Input_Parameters = fluff.split("\t")
    output.write("%f\t%f\t%f\t%f\t%.3f\t%.1f\t%.2f\t%.2f\t%s\n" % (Voc, Isc, Vmp, Imp, Pmax, 100 * Eff, IscErr, 100 * (1 - (P2 / Pmax)), file))
    data.close()
In general, if something does not work, try to get something simpler working first. I removed the data analysis part and kept the code below. This works for me. I noticed that if there is a subdirectory inside the path, open() fails on it, so the code below skips anything that is not a regular file. I am not sure whether this is the case for you as well.
import os
import sys
path = '.'
for file in os.listdir(path):
    current = os.path.join(path, file)
    if os.path.isfile(current):
        data = open(current, "rb")
        print len(data.read())
The current code in your question looks basically OK to me. I noticed that it writes to the same output file for each of the files it processes, so that may lead you to think it's not processing them all.
For debugging:
for file in os.listdir(path):
    current_file = os.path.join(path, file)
    print current_file
Check the output of current_file.
BTW: do you indent your code with tabs? There are different indent lengths in your code, which is bad style.
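The points raised in the answers (skip subdirectories, keep all work inside the loop, share one output file) can be put together. A minimal sketch in Python 3, assuming one summary line per input file is wanted and that the output file lives outside the scanned directory; the function name summarize_directory and the byte count are placeholders for the real data analysis:

```python
import os

def summarize_directory(path, out_path):
    """Write one tab-separated line (file name, size in bytes) per regular file in path."""
    with open(out_path, "w") as output:  # opened once, before the loop
        for name in sorted(os.listdir(path)):
            current_file = os.path.join(path, name)
            if not os.path.isfile(current_file):
                continue  # skip subdirectories, open() would fail on them
            with open(current_file, "rb") as data:
                size = len(data.read())  # stand-in for the analysis
            # everything that must happen per file stays inside the loop body
            output.write("%s\t%d\n" % (name, size))
```

If the output file contains one line per input file afterwards, the loop visited every file.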

Write a simple python script to convert all .wav files in a specific folder to .mp3 using lame

I'd like to write a simple script to convert a few dozen .wav files I have in a folder to v0 mp3. It doesn't need to be complicated, just enough to do the job and help me learn a little bit of python in the process ;)
I've gathered that I'll need to use something like "from subprocess import call" to do the calling of "lame", but I'm stuck as to how I can write the rest. I've written bash scripts to do this before, but on windows they're not much good to me.
I understand basic python programming.
Here's a sample that works on Ubuntu Linux at least. If you're on Windows, you'll need to change the direction of the slashes.
import os
import os.path
import sys
from subprocess import call
def main():
    path = '/path/to/directory/'
    filenames = [
        filename
        for filename
        in os.listdir(path)
        if filename.endswith('.wav')
    ]
    for filename in filenames:
        call(['lame', '-V0',
              os.path.join(path, filename),
              os.path.join(path, '%s.mp3' % filename[:-4])
              ])
    return 0
if __name__ == '__main__':
    status = main()
    sys.exit(status)
This is what I came up with so far:
#!/usr/bin/env python
import os
lamedir = 'lame'
searchdir = "/var/test"
name = []
for f in os.listdir(searchdir):
    name.append(f)
for files in name:
    iswav = files.find('.wav')
    #print files, iswav
    if(iswav >0):
        # os.path.join puts the separator between the directory and the file name
        src = os.path.join(searchdir, files)
        dst = os.path.join(searchdir, files[:iswav]+'.mp3')
        print(lamedir + ' -h -V 6 ' + src + ' ' + dst)
        os.system(lamedir + ' -h -V 6 ' + src + " " + dst)
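Both answers cut the extension off by hand (filename[:-4] in the first, find('.wav') in the second). A sketch of building the argument list with os.path.splitext instead, without actually invoking lame; the helper name lame_command is hypothetical, and the -V0 flag is taken from the first answer:

```python
import os

def lame_command(directory, filename):
    """Return the argument list for converting one .wav file to .mp3 with lame."""
    stem, _ext = os.path.splitext(filename)  # "song.wav" -> ("song", ".wav")
    return [
        "lame", "-V0",
        os.path.join(directory, filename),
        os.path.join(directory, stem + ".mp3"),
    ]
```

The returned list can be handed to subprocess.call() as in the first answer; because it is a list and not a shell string, file names with spaces need no quoting.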
