So I've got a Python script that, at its core, makes .7z archives of selected directories for the purpose of backing up data. For simplicity's sake I've simply invoked 7-Zip through the Windows command line, like so:
def runcompressor(target, contents):
    print("Compressing {}...".format(contents))
    archive = currentmodule
    archive += "{}\\{}.7z".format(target, target)
    os.system('7z u "{}" "{}" -mx=9 -mmt=on -ssw -up1q0r2x2y2z1w2'.format(archive, contents))
    print("Done!")
This creates a new archive if one doesn't exist and updates the old one if it does. But if something goes wrong the archive is left corrupted, and if the command then hits an existing, corrupted archive, it just gives up. 7-Zip has a command for testing the integrity of an archive, but the documentation says nothing about what it outputs, and then comes the trouble of capturing that output in Python.
Is there a way I can test the archives first, to determine if they've been corrupted?
The 7z executable returns a value of two or greater if it encounters a problem. In a batch script, you would generally use errorlevel to detect this. Unfortunately, os.system() under Windows gives you the return value of the command interpreter used to run your program (with cmd.exe that is normally the program's exit status, but it depends on the shell), and it gives you no access to the program's output.
If you want both the exit code and the output, you're probably going to have to get your hands a little dirtier with the subprocess module, rather than using the os.system() call.
If you have Python 3.5 (or later), this is as simple as:
import subprocess as sp
x = sp.run(['7z', 'a', 'junk.7z', 'junk.txt'], stdout=sp.PIPE, stderr=sp.STDOUT)
print(x.returncode)
That junk.txt in my case is a real file, but junk.7z is just a copy of one of my text files and hence an invalid archive. The return code printed is 2, so it's easy to detect that something went wrong.
If you print out x rather than just x.returncode, you'll see something like (reformatted and with \r\n sequences removed for readability):
CompletedProcess(
    args=['7z', 'a', 'junk.7z', 'junk.txt'],
    returncode=2,
    stdout=b'
        7-Zip [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
        Error: junk.7z is not supported archive
        System error:
        Incorrect function.
    '
)
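To answer the original question more directly, the same return-code check can be applied to 7-Zip's test command (`7z t`). The following is only a minimal sketch: the function name `archive_is_intact` and the `seven_zip` parameter are made up for illustration, and it assumes that any non-zero exit status (including a mere warning) should be treated as a suspect archive.

```python
import subprocess


def archive_is_intact(archive, seven_zip=("7z",)):
    """Return True if `7z t <archive>` exits with status 0.

    `seven_zip` is a hypothetical parameter holding the command prefix,
    so a full path to 7z.exe can be supplied when 7z is not on PATH.
    """
    # `t` is 7-Zip's "test archive" command. The output is captured so
    # it does not clutter the console; only the exit status is used.
    result = subprocess.run(
        [*seven_zip, "t", archive],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return result.returncode == 0
```

A backup script could call this before the update step and, for example, rename or delete an archive that fails the test before creating a fresh one.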
(Background: On an NTFS partition, files and/or folders can be set to "compressed", like it's a file attribute. They'll show up in blue in Windows Explorer, and will take up less disk space than they normally would. They can be accessed by any program normally, compression/decompression is handled transparently by the OS - this is not a .zip file. In Windows, setting a file to compressed can be done from a command line with the "Compact" command.)
Let's say I've created a file called "testfile.txt", put some data in it, and closed it. Now, I want to set it to be NTFS compressed. Yes, I could shell out and run Compact, but is there a way to do it directly in Python code instead?
In the end, I ended up cheating a bit and simply shelling out to the command line Compact utility. Here is the function I ended up writing. Errors are ignored, and it returns the output text from the Compact command, if any.
def ntfscompress(filename):
    import subprocess
    _compactcommand = 'Compact.exe /C /I /A "{}"'.format(filename)
    try:
        _result = subprocess.run(_compactcommand, timeout=86400,
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.STDOUT, text=True)
        return _result.stdout
    except Exception:
        # Errors (missing Compact.exe, timeout, ...) are deliberately ignored.
        return ''
I am writing a script that uses SoX to merge multiple audio files together.
This command works in the terminal
sox $(ls *.mp3) out.mp3
but if I try using it inside a Python script by calling subprocess.run(), it doesn't:
subprocess.run(['sox', '$(ls *.mp3)', 'out.mp3'])
> sox FAIL formats: can't open input file `$(ls *.mp3)': No such file or directory
I imagine that is because of the subshell operation, but I don't know how to pass it correctly.
I also tried, as some other post suggested, passing the argument shell=True but then it says
> sox FAIL sox: Not enough input filenames specified
I am in the same working directory and I also tried supplying the full path, but that doesn't work either.
I could just write a bash script and call it, but I would like to know how to deal with this scenario in Python.
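You want to use shell=True to force subprocess to run your command through the shell interpreter, which parses the wildcards and sub-commands. However, this (depending on the platform) requires that the command be passed as a single string, not as a list of parameters: on POSIX, with shell=True and a list, only the first element is the command and the rest become arguments to the shell itself, which is why sox complained about missing input filenames. That is a lot of constraints for a lazy and unsafe way of doing it.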
Wait. You can do without shell=True using glob.glob:
subprocess.run(['sox'] + glob.glob('*.mp3') + ['out.mp3'])
It would be better to check whether there actually are mp3 files in the current folder:
input_files = glob.glob('*.mp3')
if input_files:
    subprocess.run(['sox'] + input_files + ['out.mp3'])
else:
    raise Exception("No mp3 files")
If you get the "No mp3 files" message, check the current directory. It's always good to take the input directory as a parameter and avoid relying on the current directory: glob.glob(os.path.join(input_directory, '*.mp3')).
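Putting those two suggestions together, a small helper could build the argument list from an explicit input directory. This is a sketch under a few assumptions that are not in the original answer: the name `build_sox_command` is invented, the inputs are sorted because glob.glob() returns files in arbitrary order, and the output file is skipped if it already sits in the input directory (otherwise a second run would feed out.mp3 back into itself).

```python
import glob
import os


def build_sox_command(input_directory, output_file):
    """Build the sox argument list for every .mp3 in input_directory."""
    # glob.escape() protects directory names containing [, ] or *.
    pattern = os.path.join(glob.escape(input_directory), "*.mp3")
    output_abs = os.path.abspath(output_file)
    # Sort for a predictable order and leave out the output file itself
    # in case it was written to the same directory by an earlier run.
    inputs = sorted(
        path for path in glob.glob(pattern)
        if os.path.abspath(path) != output_abs
    )
    if not inputs:
        raise FileNotFoundError("No mp3 files in {}".format(input_directory))
    return ["sox"] + inputs + [output_file]
```

The resulting list can be handed straight to subprocess.run() without shell=True, for example subprocess.run(build_sox_command("tracks", "out.mp3")), where "tracks" is a placeholder directory name.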
I am working on extracting PDFs from SEC filings. They usually come like this:
SEC Filing Example
For whatever reason, when I save the raw PDF to a .txt file and then try to run
uudecode -o output_file.pdf input_file.txt
from the python subprocess.call() function or any other python function that allows commands to be executed from the command line, the PDF files that are generated are corrupted. If I run this same command from the command line directly there is no corruption.
When taking a closer look at the PDF file being output from the python script, it looks like the file ends prematurely. Is there some sort of output limit when executing a command line command from python?
Thanks!
This script worked fine for me running under Python 3.4.1 on Fedora 21 x86_64 with uudecode 4.15.2:
import subprocess
subprocess.call("uudecode -o output_file.pdf input_file.txt", shell=True)
Using the linked SEC filing (length: 173,141 B; sha1: e4f7fa2cbb3422411c2f2968d954d6bb9808b884), the decoded PDF (length: 124,557 B; sha1: 1676320e1d9923e14d19451c16688198bc93ca0d) appears correct when viewed.
There may be something else in your environment causing the problem. You may want to add additional details to your question.
> Is there some sort of output limit when executing a command line command from python?
If by "output limit" you mean the size of the file being written by uudecode, then no. The only type of "output limit" you need to worry about when using the subprocess module is when you pass stdout=PIPE or stderr=PIPE when creating a child process. If the child process writes enough data to either of these streams, and your script does not regularly drain them, the child process will block (see the subprocess module documentation). In my test, uudecode wrote nothing to stdout or stderr.
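For the case where you do pass stdout=PIPE, the usual way to keep the pipe drained is to let communicate() read it for you. The sketch below uses an invented helper name, `run_and_capture`, and merges stderr into stdout for simplicity; it is an illustration of the draining technique, not something the uudecode call above needs.

```python
import subprocess


def run_and_capture(command):
    """Run command, returning (returncode, combined stdout/stderr bytes)."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    # communicate() keeps reading until the child closes its end of the
    # pipe, so the child never blocks on a full pipe buffer.
    output, _ = process.communicate()
    return process.returncode, output
```

Calling process.wait() before reading the pipe is the pattern that can deadlock once the child has written more than the pipe buffer holds.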
I am trying to make a Python script that will open a directory, apply a Perl script to every file in that directory, and get its output in either multiple text files or just one.
I currently have:
import shlex, subprocess
arg_str = "perl tilt.pl *.pdb > final.txt"
arg = shlex.split(arg_str)
import os
framespdb = os.listdir("prac_frames")
for frames in framespdb:
    subprocess.Popen(arg, stdout=True)
I keep getting *.pdb not found. I am very new to all of this so any help trying to complete this script would help.
"*.pdb not found" means exactly that: there is no *.pdb in whatever directory you're running the script from, and as I read the code, I don't see anything to imply it's within 'frames' when it runs the Perl script.
You probably need os.chdir(path) before the Popen.
How do I "cd" in Python?
Using a Python script to make somewhat dubious system calls to Perl may offend some people, but everyone's done it. Aside from that, I'd point out:
Always specify full paths (this becomes a problem if you later want to run your job automatically from cron, or from any environment that doesn't have your PATH).
I.e. run 'which perl' and put that full path in.
./*.pdb would be better, but not as good as fullpath/*.pdb (which you could use instead of the os.chdir option).
subprocess.Popen(arg, stdout=True)
does not expand filename wildcards. To handle your *.pdb, use shell=True and pass the command as a single string rather than as the shlex.split() list.
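An alternative that avoids the shell altogether is to expand the wildcard in Python with glob and redirect the output by passing an open file as stdout. This is a rough sketch: `run_on_each` and its parameters are hypothetical names, and it assumes tilt.pl accepts a single file name as its argument, which the question does not confirm.

```python
import glob
import os
import subprocess


def run_on_each(command_prefix, directory, pattern, output_path):
    """Run command_prefix + [file] for each match, appending all output
    to output_path. Returns the list of files that were processed."""
    files = sorted(glob.glob(os.path.join(directory, pattern)))
    with open(output_path, "wb") as output:
        for path in files:
            # stdout=output plays the role of "> final.txt" in the shell;
            # check=True raises CalledProcessError if the command fails.
            subprocess.run(command_prefix + [path], stdout=output, check=True)
    return files
```

For the script in the question this might be called as run_on_each(["perl", "tilt.pl"], "prac_frames", "*.pdb", "final.txt").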
From the python console this works:
convert -quality 100 in.pdf out.png
but when I add that command to my Python script like this:
Popen(['convert', '-quality 100', 'in.pdf', 'out.png'])
I get:
unrecognized option `-quality 100'
If I change that parameter to '-quality=100' I still get the error.
I tried fixing it like this:
Popen(['convert', '-quality', '100', 'in.pdf', 'out.png'])
which runs but fails to produce an out.png.
UPDATE: The last version is working. I must have mistyped it originally.
Every argument gets its own list element, so the second variant is correct.
You should bear in mind that until a call to communicate finishes, the command may still be running (although that's unlikely in your case). Check returncode after calling communicate to find out whether the program encountered any errors (such as a malformed PDF file).
Also, ImageMagick's convert writes multipage PDFs out to multiple PNG files (out-0.png, out-1.png). Check whether those exist. Use -append to suppress that behavior.
import subprocess
Popen = subprocess.Popen
s = Popen(['convert', '-quality', '100', 'in.pdf', '-append', 'out.png'])
s.communicate()
if s.returncode != 0:
    raise OSError('convert error')
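If you also want to know why the command failed, it can help to capture stderr and include it in the exception. The following is a sketch along the same lines using subprocess.run (Python 3.5+); the helper name `run_or_raise` is made up for illustration.

```python
import subprocess


def run_or_raise(command):
    """Run command and return its stdout; raise OSError on failure."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        # Put the program's own error text into the exception message.
        message = result.stderr.decode(errors="replace").strip()
        raise OSError("{} exited with status {}: {}".format(
            command[0], result.returncode, message))
    return result.stdout
```

It could be used as run_or_raise(['convert', '-quality', '100', 'in.pdf', '-append', 'out.png']).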
This works fine:
#! /usr/bin/python3.2
from subprocess import Popen
Popen ( ['convert', '-quality', '100', 'test.pdf', 'out.png'] )
Using
Version: ImageMagick 6.6.2-6 2011-03-16 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2010 ImageMagick Studio LLC
Features: OpenMP
Are you sure the script can find your in.pdf?
The convert command must be found in the execution path when the script runs. Can you include the full path to convert in the arguments?
Popen(['/path/to/convert', '-quality', '100', 'in.pdf', 'out.png'])
Replace '/path/to/convert' with the real path. Also, you will need to ensure that the account that executes the script has read and write permissions in the current directory.
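One way to find out whether the command is on the execution path, and what its full path is, is shutil.which(). A minimal sketch, where `resolve_executable` and `search_path` are names invented for this example:

```python
import shutil


def resolve_executable(name, search_path=None):
    """Return the full path of name, or raise if it cannot be found.

    search_path is an optional PATH-style string; None means the
    current PATH environment variable is used.
    """
    full_path = shutil.which(name, path=search_path)
    if full_path is None:
        raise FileNotFoundError("{} not found on the search path".format(name))
    return full_path
```

The result of resolve_executable('convert') could then be used as the first element of the argument list passed to Popen.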