Python tarfile extractall except files matching string - python

I have a legacy Python script that fetches the Boost libraries, extracts them, and then builds them.
On Windows, the extract step fails because the path is too long for some of the files in the Boost archive. E.g.
IOError: [Errno 2] No such file or directory: 'C:\\<my_path>\\boost_1_57_0\\libs\\geometry\\doc\\html\\geometry\\reference\\spatial_indexes\\boost__geometry__index__rtree\\rtree_parameters_type_const____indexable_getter_const____value_equal_const____allocator_type_const___.html'
Is there any way to make the tarfile library's extractall ignore all files with the .html extension?
Alternatively, is there a way to allow paths that exceed the Windows path length limit (MAX_PATH, 260 characters)?

You can loop through all the members of the tar archive and extract only those that don't end with ".html":
import os
import tarfile

def custom_files(members):
    # Yield only the members whose extension is not ".html"
    for tarinfo in members:
        if os.path.splitext(tarinfo.name)[1] != ".html":
            yield tarinfo

tar = tarfile.open("sample.tar.gz")
tar.extractall(members=custom_files(tar))
tar.close()
The example code is adapted from the examples in the tarfile module documentation.
As for the limit on path length, please refer to the Microsoft documentation: https://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx
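The Microsoft page linked above describes an extended-length path form, in which an absolute path is prefixed with `\\?\` (or `\\?\UNC\` for network shares). Below is a minimal sketch of building such a path. The helper name `to_extended_path` is hypothetical, and it uses `ntpath` so that Windows path rules apply on any platform. Whether a given tool then accepts the prefixed path is not something this sketch verifies.

```python
import ntpath  # Windows path rules, importable on any platform

# Extended-length prefixes as described in the Microsoft documentation.
EXTENDED_PREFIX = "\\\\?\\"
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"


def to_extended_path(path):
    """Return an absolute Windows path in extended-length form.

    Hypothetical helper for illustration; expects an absolute path.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path  # already in extended-length form
    if not ntpath.isabs(path):
        raise ValueError("expected an absolute Windows path: %r" % path)
    # The prefix disables normalization by Windows, so resolve "." and ".."
    # and turn forward slashes into backslashes before adding it.
    path = ntpath.normpath(path)
    if path.startswith("\\\\"):
        # Network share: drop the two leading backslashes, add the UNC prefix.
        return EXTENDED_UNC_PREFIX + path[2:]
    return EXTENDED_PREFIX + path
```

On Windows, one could try passing `to_extended_path(os.path.abspath(target_dir))` as the `path` argument of `extractall`, where `target_dir` is a placeholder for your own extraction directory. Treat that as an experiment rather than a guaranteed fix.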

Related

Looping for files in a directory, but cannot open the files in the directory because it doesn't exist

I was trying to read data from a bunch of text files in a directory, but I get an error when opening a file:
import os
fileList = os.listdir("Desktop/SLUI")
for txtName in fileList:
    #Open the textfile
    UIname=str(txtName)
    userDTL=open(UIname,'r')
    if userDTL.mode=='r':
        line=userDTL.readlines()
        string1=line[0]
        string2=line[1]
        string3=line[2]
    userDTL.close()
print(string1)
Here is the error when I try to run this code via cmd.exe
File "C:\Users\*****\Desktop\programName.py", line 24, in <module>
userDTL=open(UIname,'r')
FileNotFoundError: [Errno 2] No such file or directory: 'file1.txt'
It's because os.listdir returns only the names of the files, and to open them you need the whole path.
You need to redefine UIname like this:
UIname=os.path.join("Desktop/SLUI",txtName) # I don't think you need the string conversion.
os.path.join() will properly join two (or more) path components, whichever OS you use.
I'm not really familiar with how Python works on Windows, so you might need to replace "Desktop/SLUI" with the appropriate path to your desktop (C:\Users\*****\Desktop).
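Put together, the fix might look like the sketch below. The function name `read_first_lines` and its `count` parameter are made up for illustration; the important part is that every name returned by `os.listdir` is joined with the directory before it is opened.

```python
import os


def read_first_lines(directory, count=3):
    """Map each file name in `directory` to its first `count` lines."""
    result = {}
    for name in sorted(os.listdir(directory)):
        # os.listdir yields bare names, so rebuild the full path first.
        full_path = os.path.join(directory, name)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories
        # The with block closes the file even if reading fails.
        with open(full_path, "r") as handle:
            lines = handle.readlines()
        result[name] = lines[:count]
    return result
```

Slicing with `lines[:count]` also avoids an IndexError when a file has fewer than three lines, which indexing `line[2]` directly would raise.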

Setting unix permissions in a ZipFile in a portable python script

I'm trying to create a Python script which places a number of files in a "staging" directory tree, and then uses ZipFile to create a .zip archive of them. This will later be copied to a Linux machine, which will extract the files and use them. The staging directory contains a mix of text and binary data files. The section doing the writing is in this "try" block:
try:
    import zipfile
    zipf = zipfile.ZipFile(out_file, 'w', zipfile.ZIP_DEFLATED)
    for root, dirs, files in os.walk(staging_dir):
        for d in dirs:
            # Write directories so even empty directories are copied:
            arcname = os.path.relpath(os.path.join(root, d), staging_dir)
            zipf.write(os.path.join(root, d), arcname)
        for f in files:
            arcname = os.path.relpath(os.path.join(root, f), staging_dir)
            zipf.write(os.path.join(root, f), arcname)
This works on a Linux machine running Python 2.7 (my main goal) or 3.x (secondary goal). It can also run on a Windows machine (something of an afterthought, but it might be useful), but in that case there's a problem with permissions. Normally the script sets permissions on the files in staging_dir with "os.chmod", and then zip creates the archive with the right permissions. On Windows, however, "os.chmod" can't set all of the Linux file modes, so the zipfile contents don't have the right permissions. I'm trying to figure out whether there's a way to fix the permissions when making the zipfile in the code above. In particular, files in staging_dir/bin need to have "0o750" permissions.
I've seen the answer to "How do I set permissions (attributes) on a file in a ZIP file using Python's zipfile module", so I see how you could set permissions with "external_attr" and then write a file with "ZipFile.writestr". But "external_attr" doesn't seem to apply to "ZipFile.write", only to "ZipFile.writestr", and I'd like to do this on a zip archive that contains some binary files. Is there any option other than "writestr"? Is it possible to use "writestr" on large binary files?
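One possible approach, sketched here for Python 3.6 or later only (it does not cover the Python 2.7 goal), is to build a ZipInfo yourself, set "external_attr" on it, and stream the file through "ZipFile.open" in write mode. The helper name `add_with_mode` is hypothetical. On Python 2.7, "writestr" accepts a ZipInfo plus the file's bytes, so binary data works there too, but the whole file has to be read into memory first.

```python
import shutil
import stat
import zipfile


def add_with_mode(zipf, src_path, arcname, mode):
    """Stream src_path into zipf as arcname with Unix permission bits `mode`."""
    zinfo = zipfile.ZipInfo.from_file(src_path, arcname)
    zinfo.compress_type = zipfile.ZIP_DEFLATED
    # 3 marks the entry as created on Unix, so that extractors interpret
    # the upper bits of external_attr as a Unix mode even when the
    # archive is written on Windows.
    zinfo.create_system = 3
    # The upper 16 bits of external_attr hold the Unix st_mode value.
    zinfo.external_attr = (stat.S_IFREG | mode) << 16
    with open(src_path, "rb") as src, zipf.open(zinfo, "w") as dst:
        shutil.copyfileobj(src, dst)  # copies in chunks, binary-safe
```

In the loop from the question, a call such as `add_with_mode(zipf, os.path.join(root, f), arcname, 0o750)` would replace `zipf.write(...)` for the files under staging_dir/bin.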

compress multiple files into a bz2 file in python

I need to compress multiple files into one bz2 file in python.
I'm trying to find a way, but I can't find an answer.
Is it possible?
This is what tarballs are for. The tar format packs the files together, then you compress the result. Python makes it easy to do both at once with the tarfile module, where passing a "mode" of 'w:bz2' opens a new tar file for write with seamless bz2 compression. Super-simple example:
import tarfile
with tarfile.open('mytar.tar.bz2', 'w:bz2') as tar:
    for file in mylistoffiles:
        tar.add(file)
If you don't need much control over the operation, shutil.make_archive might be a possible alternative, which would simplify the code for compressing a whole directory tree to:
shutil.make_archive('mytar', 'bztar', directory_to_compress)
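As a slightly fuller sketch of the tarfile approach, the helpers below (hypothetical names `pack_bz2` and `list_bz2`) write several files into one bz2-compressed tar and read the member names back. Storing each file under its base name via `arcname` is a choice made for this example, not something the answer above requires.

```python
import os
import tarfile


def pack_bz2(archive_path, file_paths):
    """Write file_paths into one bz2-compressed tar, stored by base name."""
    with tarfile.open(archive_path, "w:bz2") as tar:
        for path in file_paths:
            # arcname controls the name inside the archive; without it,
            # the path as given is stored.
            tar.add(path, arcname=os.path.basename(path))


def list_bz2(archive_path):
    """Return the member names stored in a bz2-compressed tar."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        return tar.getnames()
```

Members come back from `getnames()` in the order they were added.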
Take a look at python's bz2 library. Make sure to google and read the docs first!
https://docs.python.org/2/library/bz2.html#bz2.BZ2Compressor
Import the packages you need:
import os
import tarfile
Then compress multiple files into one bz2-compressed tar archive:
tar = tarfile.open("save the directory.tar.bz", "w:bz2")
for f in ["gti.png","gti.txt","file.taz"]:
    tar.add(os.path.basename(f))
tar.close()
Here os.path.basename(src_file) returns just the file name without its directory, so each file is read from the current directory and stored under its bare name.
Python's standard library zipfile module handles multiple files and has supported bz2 compression (zipfile.ZIP_BZIP2) since Python 3.3.
import zipfile
sourcefiles = ['a.txt', 'b.txt']
with zipfile.ZipFile('out.zip', 'w') as outputfile:
    for sourcefile in sourcefiles:
        outputfile.write(sourcefile, compress_type=zipfile.ZIP_BZIP2)

Ignoring windows hidden files with python glob

I am moving some files with a Python script. The script should work on both OS X and Windows.
I am using the glob module to select the files and os.path.isfile to filter out directories. The glob module automatically ignores Unix dot-files, but it seems that it does pick up some Windows hidden files. I have added code to remove one, "desktop.ini", that seems to have appeared on Windows.
Are there any other Windows files that might appear or is there a way to ensure that I do not select hidden files in Windows?
files = glob.glob('*')
files = list(filter(os.path.isfile, files)) # filter out dirs; list() so that .remove() also works on Python 3
if "desktop.ini" in files:
    files.remove('desktop.ini')
# then using "shutil.move" to actually move the files
You might want to try Formic.
from formic import FileSet
fileset = FileSet(directory="/some/where/interesting",
                  include="*.py",
                  exclude=["desktop.ini", ".*", "addition", "globs", "here"]
                  )
for filename in fileset:
    pass  # use shutil to move them
Formic is a Python library that uses globs, but (i) it already understands most hidden files through a built-in list, and (ii) it lets you specify any files to exclude from the results.
Disclosure: I am the maintainer.
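If you would rather stay within the standard library, another option is to check the hidden attribute that Windows stores on each file. The sketch below assumes Python 3.5 or later, where `os.stat` results carry `st_file_attributes` on Windows; the helper names `is_hidden` and `visible_files` are made up for this example. On other platforms the attribute is absent and only the dot-file rule applies.

```python
import glob
import os
import stat


def is_hidden(path):
    """True for dot-files and for files with the Windows hidden attribute."""
    if os.path.basename(path).startswith("."):
        return True
    # st_file_attributes exists only on Windows; default to 0 elsewhere.
    attributes = getattr(os.stat(path), "st_file_attributes", 0)
    return bool(attributes & stat.FILE_ATTRIBUTE_HIDDEN)


def visible_files(directory):
    """List regular, non-hidden files directly inside `directory`."""
    pattern = os.path.join(glob.escape(directory), "*")
    return sorted(
        p for p in glob.glob(pattern)
        if os.path.isfile(p) and not is_hidden(p)
    )
```

Files such as desktop.ini are normally flagged hidden by Windows, so a check like this should catch them without a hard-coded list of names, though that is worth confirming on your own machines.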

Extracting BZ2 compressed folder using Python

I am trying to extract a bz2 compressed folder in a specific location.
I can see the data inside with:
handler = bz2.BZ2File(path, 'r')
print handler.read()
But I wish to extract all the files in this compressed folder into a location (specified by the user) maintaining the internal directory structure of the folder.
I am fairly new to this language .. Please help...
Like gzip, BZ2 is only a compressor for single files; it cannot archive a directory structure. What I suspect you have is an archive that was first created by a tool such as tar and then compressed with BZ2. To recover the full directory structure, first decompress the bz2 file, then un-tar (or equivalent) the result.
Fortunately, the Python tarfile module supports bz2 compression, so you can do both steps in one shot.
bzip2 is a data compression system which compresses one entire file. It does not bundle files and compress them like PKZip does. Therefore handler in your example has one and only one file in it and there is no "internal directory structure".
If, on the other hand, your file is actually a compressed tar-file, you should look at the tarfile module of Python which will handle decompression for you.
You need to use the tarfile module to uncompress a .tar.bz2 file ... from the docs here is how you can do it:
import tarfile
tar = tarfile.open(path, "r:bz2")
for tarinfo in tar:
    print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
    if tarinfo.isreg():
        print "a regular file."
        # read the file
        f = tar.extractfile(tarinfo)
        print f.read()
    elif tarinfo.isdir():
        print "a directory."
    else:
        print "something else."
tar.close()
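The loop above only lists and reads members. To actually unpack everything into a user-chosen location while keeping the directory structure, `extractall` can be used. The following is a minimal Python 3 sketch with a hypothetical helper name, `extract_bz2_tar`. It passes `filter="data"` when the running Python supports extraction filters, which the tarfile documentation suggests detecting with `hasattr(tarfile, "data_filter")`.

```python
import tarfile


def extract_bz2_tar(archive_path, destination):
    """Unpack a .tar.bz2 archive into destination, keeping its directory tree."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse members that would escape destination.
            tar.extractall(path=destination, filter="data")
        else:
            # Older Pythons: only use this on archives you trust.
            tar.extractall(path=destination)
```

A call such as `extract_bz2_tar(path, user_dir)` creates the archive's subdirectories under `user_dir` as needed, where `user_dir` stands for whatever location the user supplied.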
