Checking the integrity of a gzip archive in Python

Is there a way in Python, using gzip or another module, to check the integrity of a gzip archive?
Basically, is there a Python equivalent of what the following does:
gunzip -t my_archive.gz

I'd suggest using the gzip module to read the file and just throw away what you read. You have to decode the entire file in order to check its integrity in any case. https://docs.python.org/2/library/gzip.html
Something like this (untested sketch):
import gzip
import zlib

chunksize = 10000000  # 10 MB reads
ok = True
with gzip.open('file.txt.gz', 'rb') as f:
    try:
        while f.read(chunksize) != b'':
            pass
    except (OSError, EOFError, zlib.error):
        ok = False
Reading a corrupt gzip file raises OSError (gzip.BadGzipFile on Python 3.8+), EOFError for a truncated stream, or zlib.error, which is why the except clause above catches those three rather than everything.
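Wrapped up as a reusable helper (a sketch; gzip_ok is a name invented here for illustration):
import gzip
import zlib

def gzip_ok(path, chunksize=10000000):
    """Return True if the whole gzip stream decompresses cleanly."""
    try:
        with gzip.open(path, 'rb') as f:
            while f.read(chunksize):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True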

You can use the subprocess or os module to execute this command and read the output. Something like this:
Using the os module:
import os

# gunzip -t prints nothing on success; corruption is reported via the exit status
output = os.popen('gunzip -t my_archive.gz').read()
Using the subprocess module (note: don't combine shell=True with an argument list here, or the arguments are handed to the shell instead of gunzip):
import subprocess

proc = subprocess.Popen(["gunzip", "-t", "my_archive.gz"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
(out, err) = proc.communicate()
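On Python 3.5+ the same check reads more simply with subprocess.run; gunzip -t signals corruption purely through its exit status (a sketch):
import subprocess

# gunzip -t exits with status 0 for a clean archive, non-zero on corruption
result = subprocess.run(["gunzip", "-t", "my_archive.gz"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
ok = result.returncode == 0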

Related

How do I run another file in python?

So I know how to write to a file or read a file, but how do I RUN another file?
For example, in a file I have this:
a = 1
print(a)
How do I run this using another file?
file_path = "<path_to_your_python_file>"
Using the subprocess standard lib:
import subprocess
subprocess.call(["python3", file_path])
or using the os standard lib:
import os
os.system(f"python3 {file_path}")
or extract the Python code from the file and run it inside your script:
with open(file_path, "r", encoding="utf-8") as another_file:
    python_code = another_file.read()

# running the code inside the file
exec(python_code)
exec is a built-in function that executes a string of Python code, much as the interpreter executes a Python file.
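If you'd rather not let the executed code overwrite your own variables, you can hand exec an explicit namespace; a minimal sketch (the namespace dict is just an illustration):
with open(file_path, encoding="utf-8") as another_file:
    python_code = another_file.read()

# run the code in its own globals dict so its names don't leak into ours
namespace = {"__name__": "__main__"}
exec(python_code, namespace)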
IN ADDITION
If you want to see the output of the Python file:
import subprocess

p = subprocess.Popen(
    ["python3", file_path],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
output, err = p.communicate()  # communicate() returns (stdout, stderr)
print(output)
print(err)
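On Python 3.7+ the same thing is a one-liner with subprocess.run (a sketch):
import subprocess

result = subprocess.run(["python3", file_path],
                        capture_output=True, text=True)
print(result.stdout)
print(result.stderr)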
EXTRA
For people who are using Python 2:
execfile(file_path)
execfile documentation
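execfile() is gone in Python 3; a rough stand-in built from open/compile/exec (a sketch):
# Python 3 stand-in for Python 2's execfile(file_path)
with open(file_path, encoding="utf-8") as fh:
    exec(compile(fh.read(), file_path, "exec"))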

How to use Popen to read a json file in subprocess as there is a 33766 limit in terminal

I am using this code to read a JSON file in subprocess. It works only for small JSONs; if the size exceeds the 33766 count, it shows this error:
FileNotFoundError: [WinError 206] The filename or extension is too long.
This is because the 33766 count is exceeded. So how can I read the JSON file using Popen? I read that this can solve the problem. Help me with suggestions. I am new here :\
import subprocess
import json
import os
from pprint import pprint
auth = "authorization: token 1234
file = "jsoninput11.json"
fd=open("jsoninput11.json")
json_content = fd.read()
fd.close()
subprocess.run(["grpcurl", "-plaintext","-H", auth,"-d","#",json_content,"-format","json","100.20.20.1:5000","api.Service/Method"])
I am not sure, but maybe the problem is related to the bufsize (check this:
Very large input and piping using subprocess.Popen).
Does it work with capture_output=False?
subprocess.run(["grpcurl", "-plaintext","-H", auth,"-d","#",json_content,"-format","json","100.20.20.1:5000","api.Service/Method"], capture_output=False)
On the other side, if you need the output you may need to deal with the PIPE of Popen.
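Since the error is really the Windows command-line length limit, another route is to keep the JSON off the command line entirely: grpcurl reads the request body from stdin when -d is given the single argument @. A sketch reusing the question's names:
import subprocess

with open("jsoninput11.json") as fd:
    json_content = fd.read()

auth = "authorization: token 1234"

# "-d @" makes grpcurl read the request body from stdin, so the JSON never
# appears on the command line and the length limit no longer applies
proc = subprocess.run(
    ["grpcurl", "-plaintext", "-H", auth, "-d", "@",
     "-format", "json", "100.20.20.1:5000", "api.Service/Method"],
    input=json_content, text=True, capture_output=True,
)
print(proc.stdout)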

subprocess gunzip throws decompression failed

I am trying to gunzip using subprocess but it returns the error -
('Decompression failed %s', 'gzip: /tmp/tmp9OtVdr is a directory -- ignored\n')
What is wrong?
import subprocess

transform_script_process = subprocess.Popen(
    ['gunzip', f_temp.name, '-kf', temp_dir],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)
(transform_script_stdoutdata,
 transform_script_stderrdata) = transform_script_process.communicate()
self.log.info("Transform script stdout %s",
              transform_script_stdoutdata)
if transform_script_process.returncode > 0:
    shutil.rmtree(temp_dir)
    raise AirflowException("Decompression failed %s",
                           transform_script_stderrdata)
You are calling the gunzip process and passing it the following parameters:
f_temp.name
-kf
temp_dir
I'm assuming f_temp.name is the path to the gzipped file you are trying to unzip. -kf will force decompression and instruct gzip to keep the file after decompressing it.
Now comes the interesting part. temp_dir seems like a variable that would hold the destination directory you want to extract the files to. However, gunzip does not support this. Please have a look at the manual for gzip. It states that you must pass in a list of files to decompress. There is no option to specify the destination directory.
Have a look at this post on Superuser for more information on specifying the folder you want to extract to: https://superuser.com/questions/139419/how-do-i-gunzip-to-a-different-destination-directory
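One common workaround, sketched here with the question's f_temp and temp_dir names, is to decompress to stdout with gunzip -c and pick the destination yourself:
import os
import subprocess

# gunzip -c writes decompressed data to stdout and leaves the .gz in place,
# so redirecting stdout lets us choose any destination path we like
dest = os.path.join(temp_dir, os.path.basename(f_temp.name))
with open(dest, 'wb') as out:
    subprocess.check_call(['gunzip', '-c', f_temp.name], stdout=out)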

Subprocess error file

I'm using the Python module subprocess to call a program and redirect the possible std error to a specific file with the following command:
with open("std.err", "w") as err:
    subprocess.call(["exec"], stderr=err)
I want the "std.err" file to be created only if there are errors, but using the command above, if there are no errors the code creates an empty file.
How can I make Python create the file only if it's not empty?
I can check after execution whether the file is empty and remove it if so, but I was looking for a "cleaner" way.
You could use Popen, checking stderr:
from subprocess import Popen, PIPE

proc = Popen(["EXEC"], stderr=PIPE, stdout=PIPE, universal_newlines=True)
out, err = proc.communicate()
if err:
    with open("std.err", "w") as f:
        f.write(err)
On a side note, if you care about the return code you should use check_call; you could combine it with a NamedTemporaryFile:
import subprocess
from tempfile import NamedTemporaryFile
from os import stat, remove
from shutil import move

try:
    with NamedTemporaryFile(dir=".", delete=False) as err:
        subprocess.check_call(["exec"], stderr=err)
except (subprocess.CalledProcessError, OSError) as e:
    print(e)

if stat(err.name).st_size != 0:
    move(err.name, "std.err")
else:
    remove(err.name)
You can create your own context manager to handle the cleanup for you -- you can't really do what you're describing here, which boils down to asking how you can see into the future. Something like this (with better error handling, etc.):
import os
from contextlib import contextmanager

@contextmanager
def maybeFile(fileName):
    # open the file
    f = open(fileName, "w")
    # yield the file to be used by the block of code inside the with statement
    yield f
    # the block is over, do our cleanup
    f.flush()
    # if nothing was written, remember that we need to delete the file
    needsCleanup = f.tell() == 0
    f.close()
    if needsCleanup:
        os.remove(fileName)
...and then something like:
with maybeFile("myFileName.txt") as f:
    import random
    if random.random() < 0.5:
        f.write("There should be a file left behind!\n")
will either leave behind a file with a single line of text in it, or will leave nothing behind.

Need to run a diff command on 2 NamedTemporaryFiles using subprocess module

I am trying to run a diff on 2 named temporary files; I did not use difflib because its output was different from the Linux diff.
When I run this code, it does not output anything. I tried a diff on regular files and that works just fine.
# using python 2.6
import tempfile
import subprocess

temp_stage = tempfile.NamedTemporaryFile(delete=False)
temp_prod = tempfile.NamedTemporaryFile(delete=False)
temp_stage.write(stage_notes)
temp_prod.write(prod_notes)
# this does not work, shows no output; tried both call and Popen
subprocess.Popen(["diff", temp_stage.name, temp_prod.name])
# subprocess.call(["diff", temp_stage.name, temp_prod.name])
You need to force the files to be written out to disk by calling flush(); or else the data you were writing to the file may only exist in a buffer.
In fact, if you do this, you can even use delete = True, assuming there's no other reason to keep the files around. This keeps the benefit of using tempfile.
#!/usr/bin/python2
import tempfile
import subprocess

temp_stage = tempfile.NamedTemporaryFile(delete=True)
temp_prod = tempfile.NamedTemporaryFile(delete=True)
temp_stage.write(stage_notes)
temp_prod.write(prod_notes)
temp_stage.flush()
temp_prod.flush()
subprocess.Popen(["diff", temp_stage.name, temp_prod.name])
Unrelated to your .flush() issue, you could pass one file via stdin instead of writing data to disk:
from tempfile import NamedTemporaryFile
from subprocess import Popen, PIPE

with NamedTemporaryFile() as file:
    file.write(prod_notes)
    file.flush()
    p = Popen(['diff', '-', file.name], stdin=PIPE)
    p.communicate(stage_notes)  # diff reads the first file from stdin
if p.returncode == 0:
    print('the same')
elif p.returncode == 1:
    print('different')
else:
    print('error %s' % p.returncode)
diff reads from stdin if input filename is -.
If you use a named pipe then you don't need to write data to disk at all:
from subprocess import Popen, PIPE
from threading import Thread

with named_pipe() as path:
    p = Popen(['diff', '-', path], stdin=PIPE)
    # use a thread, to support content larger than the pipe buffer
    Thread(target=p.communicate, args=[stage_notes]).start()
    with open(path, 'wb') as pipe:
        pipe.write(prod_notes)
if p.wait() == 0:
    print('the same')
elif p.returncode == 1:
    print('different')
else:
    print('error %s' % p.returncode)
where the named_pipe() context manager is defined as:
import os
import tempfile
from contextlib import contextmanager
from shutil import rmtree

@contextmanager
def named_pipe(name='named_pipe'):
    dirname = tempfile.mkdtemp()
    try:
        path = os.path.join(dirname, name)
        os.mkfifo(path)
        yield path
    finally:
        rmtree(dirname)
The content of a named pipe doesn't touch the disk.
I would suggest bypassing the tempfile handling, since with a NamedTemporaryFile you're going to have to handle cleanup anyway. Create a new file, write your data, then close it. Flush the buffer and then call the subprocess commands. See if that gets it to run.
f = open('file1.blah', 'w')
f2 = open('file2.blah', 'w')
f.write(stage_notes)
f.flush()
f.close()
f2.write(prod_notes)
f2.flush()
f2.close()
then run your subprocess calls
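For example (a sketch, assuming the two file names above):
import subprocess

# both files are flushed and closed, so diff sees their complete contents
subprocess.call(["diff", "file1.blah", "file2.blah"])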
