Setting unix permissions in a ZipFile in a portable python script

Setting unix permissions in a ZipFile in a portable python script - python

I'm trying to create a python script which places a number of files in a "staging" directory tree, and then uses ZipFile to a create a .zip archive of them. This will later be copied to a linux machine, which will extract the files and use them. The staging directory contains a mix of text and binary data files. The section doing the writing is in this "try" block:
try:
import zipfile
zipf = zipfile.ZipFile(out_file, 'w', zipfile.ZIP_DEFLATED)
for root, dirs, files in os.walk(staging_dir):
for d in dirs:
# Write directories so even empty directories are copied:
arcname = os.path.relpath(os.path.join(root, d), staging_dir)
zipf.write(os.path.join(root, d), arcname)
for f in files:
arcname = os.path.relpath(os.path.join(root, f), staging_dir)
zipf.write(os.path.join(root, f), arcname)
This works on a linux machine running python 2.7 (my main goal) or 3.x (secondary goal). It can also run on a Windows machine (sort of an afterthought, it might be useful), but there's a problem with permissions in that case. Normally the script sets permissions in the files in the staging_dir with "os.chmod", and then zip creates the archive with the right permissions. But running this on windows, the "os.chmod" command doesn't really set all linux file modes (not possible), so the zipfile contents aren't at the right permissions. I'm trying to figure out if there's a way to fix the permissions when making the zipfile in the code above. In particular, files in staging_dir/bin need to have "0o750" permissions.
I've seen the answer to How do I set permissions (attributes) on a file in a ZIP file using Python's zipfile module, so I see how you could set permissions with "external_attr", and then write a file with "ZipFile.writestr". But the "external_attr" doesn't seem to apply to "ZipFile.write", only "ZipFile.writestr". And I'd like to do this on a zip archive that contains some binary files. Is there any other option than "writestr"? Is it be possible to use "writestr" on large binary files?

Related

Is it possible to create a HDF5 structure matching the file path on a local pc?

I am trying to automatically create a HDF5 structure by using the file paths on my local pc. I want to read through the subdirectories and create a HDF5 structure to match, that I can then save files to. Thank you

You can do this by combining os.walk() and h5py create_group(). The only complications are handling Linux vs Windows (and drive letter on Windows). Another consideration is relative vs absolute path. (I used absolute path, but my example can be modified. (Note: it's a little verbose so you can see what's going on.) Here is the example code (for Windows):
with h5py.File('SO_73879694.h5','w') as h5f:
cwd = os.getcwd()
for root, dirs, _ in os.walk(cwd, topdown=True):
print(f'ROOT: {root}')
# for Windows, modify root: remove drive letter and replace backslashes:
grp_name = root[2:].replace( '\\', '/')
print(f'grp_name: {grp_name}\n')
h5f.create_group(grp_name)

This is actually quite easy to do using HDFql in Python. Here is a complete script that does that:
# import HDFql module (make sure it can be found by the Python interpreter)
import HDFql
# create and use (i.e. open) an HDF5 file named 'my_file.h5'
HDFql.execute("CREATE AND USE FILE my_file.h5")
# get all directories recursively starting from the root of the file system (/)
HDFql.execute("SHOW DIRECTORY / LIKE **")
# iterate result set and create group in each iteration
while HDFql.cursor_next() == HDFql.SUCCESS:
HDFql.execute("CREATE GROUP \"%s\"" % HDFql.cursor_get_char())

How to parse cythonized .so files in python

Is it possible to open and read cythonized .so files with python?
The use-case is a test that scans all python files in a directory and evaluates if certain object attributes are used (to be ultimately able to identify and remove unused attributes).
This test runs perfectly on the local environment but in our CI that cythonizes all files this breaks, as .so files can't be parsed.
Currently I am scanning the files for the object attribute occurrences like this:
import os
path = '/path/to/dir'
attribute_regex = r'object\.(\w+)'
used_attributes = set()
for root, _, files in os.walk(path):
for file in files:
with open(os.path.join(root, file), 'r') as f:
used_attributes.update(re.findall(attribute_regex, f))
Maybe I am looking at this issue from the wrong angle, are there other more sophisticated ways to check if attributes of an object are used across multiple python files?

How to add a password and output directory using Zipfile module in Python?

I got below code from online and I am trying to add a password and I want to change the result directory to be "C:#SFTPDWN" (Final Zip file should be in this folder).
I try to change it like below, it did not work.
with ZipFile('CC-Data.zip', 'w', 'pass word') as zip:
Can anybody please tell how to change this code to add password and change result folder?
One last thing, currently it will zip #SFTPDWN folder, I just want to zip everything inside (Right now it will create two folders (CC-Data.zip and inside it #SFTPDWN )). Can anybody please tell me how to zip everything inside #SFTPDWN folder?
Code
from zipfile import ZipFile
import os
def get_all_file_paths(directory):
file_paths = []
for root, directories, files in os.walk(directory):
for filename in files:
filepath = os.path.join(root, filename)
file_paths.append(filepath)
return file_paths
def main():
# path to folder which needs to be zipped
directory = 'C:\#SFTPDWN'
file_paths = get_all_file_paths(directory)
print('Following files will be zipped:')
for file_name in file_paths:
print(file_name)
with ZipFile('CC-Data.zip', 'w') as zip:
# writing each file one by one
for file in file_paths:
zip.write(file)
print('Zipped successfully!')
if __name__ == "__main__":
main()

For the password question: from the documentation:
This module [...] supports decryption of encrypted files in ZIP archives, but it currently cannot create an encrypted file. Decryption is extremely slow as it is implemented in native Python rather than C.
https://docs.python.org/3/library/zipfile.html
You would need to use a 3rd party library to create an encrypted zip, or encrypt the archive some other way.
For the second part, in ZipFile.write the documentation also mentions:
ZipFile.write(filename, arcname=None, compress_type=None, compresslevel=None)
Write the file named filename to the archive, giving it the archive name arcname (by default, this will be the same as filename, but without a drive letter and with leading path separators removed). [...]
Note: Archive names should be relative to the archive root, that is, they should not start with a path separator.
https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.write
So you would need to strip off whatever prefix of your file variable and pass that as the arcname parameter. Using os.path.relpath may help, e.g. (I'm on Linux, but should work with Windows paths under Windows):
>>> os.path.relpath("/folder/subpath/myfile.txt", "/folder/")
'subpath/myfile.txt'
Sidebar: a path like "C:\Something" is an illegal string, as it has the escape \S. Python kinda tolerates this (I think in 3.8 it will error) and rewrites them literally. Either use "C:\\Something", r"C:\Something", or "C:/Something" If you attempted something like "C:\Users" it would actually throw an error, or "C:\nothing" it might silently do something strange...

Python zipfile issue on Windows

I wrote a simple, rough program that automatically zip everything inside the current working directory. It works very well on Linux but there is huge problem when running on Windows.
Here is my code:
import os, zipfile
zip = zipfile.ZipFile('zipped.zip', 'w') #Create a zip file
zip.close()
zip = zipfile.ZipFile('zipped.zip', 'a') #Make zip file append instead of overwriting
for dir, subdir, file in os.walk(os.path.relpath('.')): #Loop for walking thru the directory
for subdirectory in subdir:
subdirs = os.path.join(dir, subdirectory)
zip.write(subdirs, compress_type=zipfile.ZIP_DEFLATED)
for files in file:
fil = os.path.join(dir, files)
zip.write(fil, compress_type=zipfile.ZIP_DEFLATED)
zip.close()
When I ran this on Windows, it won't stop compressing, but infinitely create the "zipped.zip" file in the zipped file, after left it running a few seconds, generated few hundreds MB of file. On Linux, the program will stop after it zipped all the files excluding newly created zipped.zip.
Screenshot: A "zipped.zip" inside the "zipped.zip"
I am wondering did I miss some code that will make this works well on Windows?

I would zip the folder in a temporary zipfile, then move the temporary zipfile in the folder.

That seems to be because you are saving the zip to the same folder that you are trying to compress, and that must be confusing os.walk() somehow.
One possible solution, as long as you don't have a giant directory to compress, is to use os.walk() to build a full list of what will be compressed, and after the list is complete, then you would it to populate the zip, instead of using os.walk() directly.

Python tarfile extractall except files matching string

I have a legacy script which fetches boost libraries via a python script and extracts then builds them.
On windows, the extract step fails because the path is too long for some of the files in the boost archive. E.g.
IOError: [Errno 2] No such file or directory: 'C:\\<my_path>\\boost_1_57_0\\libs\\geometry\\doc\\html\\geometry\\reference\\spatial_indexes\\boost__geometry__index__rtree\\rtree_parameters_type_const____indexable_getter_const____value_equal_const____allocator_type_const___.html'
Is there anyway to simply make the tarfile lib extractall but ignore all files with .html extension?
Alternatively, is there a way to allow paths which exceed the windows limit of 266?

You can loop through all the files in the tar and extract only those that don't end with ".html"
import os
import tarfile
def custom_files(members):
for tarinfo in members:
if os.path.splitext(tarinfo.name)[1] != ".html":
yield tarinfo
tar = tarfile.open("sample.tar.gz")
tar.extractall(members=custom_files(tar))
tar.close()
The example code and information about the modules was found here
Coming to overcoming the limit on size of the file names, please refer the Microsoft doc](https://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.