Python: Unzipping and decompressing .Z files inside .zip - python

I am trying to unzip an Alpha.zip archive which contains a Beta directory, which contains a Gamma folder, which contains the files a.Z, b.Z, c.Z and d.Z. Using zip and 7-zip I was able to extract the a.D, b.D, c.D and d.D files stored within the .Z files.
I tried this in Python using import gzip and import zlib.
import sys
import os
import getopt
import gzip
f = open('a.d.Z','r')
file_content = f.read()
f.close()
I keep getting all sorts of errors including: this is not a zip file, return codecs.charmap_encode(input self.errors encoding_map) 0. Any suggestions as to how to code this?

You need to actually make use of a zip library of some kind. Right now you're importing gzip, but you're not doing anything with it. Try taking a look at the gzip documentation and opening the file using that library.
gzip_file = gzip.open('a.d.Z') # use gzip.open instead of builtin open function
file_content = gzip_file.read()
Edit based on your comment: you can't open every kind of compressed file with just any compression library. Since you have a .Z file, you may want to try zlib rather than gzip, but extensions are only conventions, so only you know for sure what compression format your file is in. (Be aware that .Z conventionally denotes output of the Unix compress utility, an LZW format that neither the zlib nor the gzip module in Python's standard library can decode.) To use zlib, do something like this instead:
# Note: untested code ahead!
import zlib
with open('a.d.Z', 'rb') as f: # Notice that I open this in binary mode
    file_content = f.read() # Read the compressed binary data
decompressed_content = zlib.decompress(file_content) # Decompress
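Since the extension alone does not settle the format, one option is to pull each inner member out of the outer archive with zipfile and look at its leading bytes before picking a decompressor. This is only a sketch under assumptions: the helper names (sniff_format, decode_member, read_inner_members) are made up for illustration, the in-memory archive merely stands in for Alpha.zip, and the checks cover just the gzip signature (1f 8b), the Unix compress signature (1f 9d) and the two-byte zlib header.

```python
import gzip
import io
import zipfile
import zlib


def sniff_format(data: bytes) -> str:
    """Guess the compression format from the leading bytes (hypothetical helper)."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:2] == b"\x1f\x9d":
        return "compress"  # Unix compress (LZW); no standard-library decoder
    # zlib header: low nibble of byte 0 is 8 (deflate) and the big-endian
    # 16-bit value of the first two bytes is a multiple of 31.
    if len(data) >= 2 and data[0] & 0x0F == 8 and (data[0] * 256 + data[1]) % 31 == 0:
        return "zlib"
    return "unknown"


def decode_member(data: bytes) -> bytes:
    """Decompress data if the standard library can handle its format."""
    fmt = sniff_format(data)
    if fmt == "gzip":
        return gzip.decompress(data)
    if fmt == "zlib":
        return zlib.decompress(data)
    raise ValueError(f"no standard-library decoder for format: {fmt}")


def read_inner_members(archive, suffix=".Z"):
    """Yield (name, raw_bytes) for each zip member whose name ends with suffix."""
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith(suffix):
                yield name, zf.read(name)


# Demo: an in-memory archive standing in for Alpha.zip (assumed layout).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("Beta/Gamma/a.d.Z", gzip.compress(b"hello"))
    zf.writestr("Beta/Gamma/readme.txt", b"not compressed")
demo.seek(0)

members = dict(read_inner_members(demo))
decoded = {name: decode_member(raw) for name, raw in members.items()}
```

If sniff_format reports "compress" for your files, the standard library will not help and you would need an external tool (as you already did with 7-zip) or a third-party decoder.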

Related

Can I read non-code files in a Python Zip archive?

I have a Python application in a directory dir. This directory has a __main__.py file and several data files that are read by the application using open(...,'r'). Without editing the code, is it possible to bundle the code and data files into a single zip file and execute it using something like python app.pyz?
My goal is to share the file and data easily.
Running the application using python dir works fine.
If I make a zip file using python -m zipfile -c app.pyz dir/*, the resulting application will run but cannot read the files. This makes sense.
I can ask the customers to unzip the compressed folder before running, or I could embed the files as strings within the code. That said, I'm curious whether this can be avoided.
Can I bundle code and data into one file?
As of Python 3.9 you can use importlib.resources from the standard library. This module uses Python's import machinery to resolve the paths of data files as though they were modules inside a package.
Create a new package inside dir. Let's call it data. Make sure it has an __init__.py.
Add your data files to data. Let's say you added a text file text.txt and a binary file binary.dat.
Now from your __main__.py script or any part of your code with access to the module data, you can access files inside that package like so:
To read text.txt to memory as a string:
txt_file = importlib.resources.files("data").joinpath("text.txt").read_text(encoding="utf-8")
To read binary.dat to memory as bytes:
bin_file = importlib.resources.files("data").joinpath("binary.dat").read_bytes()
To open any file:
path = importlib.resources.files("data").joinpath("text.txt")
with path.open("rt", encoding="utf-8") as file:
    lines = file.readlines()
# As streams:
textio_stream = importlib.resources.files("data").joinpath("text.txt").open("rt", encoding="utf-8")
bytesio_stream = importlib.resources.files("data").joinpath("binary.dat").open("rb")
If something requires an actual real file on the filesystem, or you simply want to wrap zipapp compatibility over existing code (e.g. with open()) without having to modify it:
# Old, incompatible with zipfiles.
file_path = "data/text.txt"
with open(file_path, "rt", encoding="utf-8") as file:
    lines = file.readlines()
# New, compatible with zipfiles.
file_path = importlib.resources.files("data").joinpath("text.txt")
# If file is inside a zipfile, unzips it in a temporary file, then
# destroys it once the context manager closes. Otherwise, reads the file normally.
with importlib.resources.as_file(file_path) as path:
    with open(path, "rt", encoding="utf-8") as file:
        lines = file.readlines()
# Since it is a context manager, you can even store it like this:
file_path = importlib.resources.files("data").joinpath("text.txt")
real_path = importlib.resources.as_file(file_path)
with real_path as path:
    with open(path, "rt", encoding="utf-8") as file:
        lines = file.readlines()
The Traversable objects returned from importlib.resources functions can be mixed with Path objects using as_posix, since joinpath requires posix separators:
file_path = pathlib.Path("subdirectory", "text.txt")
txt_file = importlib.resources.files("data").joinpath(file_path.as_posix()).read_text(encoding="utf-8")
You can use slashes to grow a Traversable, just like pathlib.Path objects:
resources_root = importlib.resources.files("data")
text_path = resources_root / "text.txt"
bin_file = (resources_root / "subdirectory" / "bin.dat").read_bytes()
You can also import the data package like any other package, and use the module object directly. Subpackages are also supported. The only Python files inside the data tree are the __init__.py files of each subpackage:
# __main__.py
import importlib.resources
import data.config
import data.models.b
# Load binary file `file.dat` from `data.models.b`.
# Subpackages are being used as subdirectories.
bin_file = importlib.resources.files(data.models.b).joinpath("file.dat").read_bytes()
...
You technically only need to make your resource root directory be a package. For max brevity:
# __main__.py
from importlib.resources import files
data = files("data") # Resources root.
# In this example, `models` and `b` are regular directories:
bin_file = (data / "models" / "b" / "file.dat").read_bytes()
...
Note that importlib.resources and zipfiles in general support reading only and you will get an exception if you try to write to any file-like object returned from the above functions. It might technically be possible to support modifying data files inside zips but this is way out of scope. If you want to write files, just open a file in the filesystem as normal.
Now your data files have become file-system agnostic and your program should work via zipapp and normal invocation just the same.
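To check the zip case without building a full zipapp, here is a small sketch that writes a package into a zip archive, puts the archive on sys.path, and reads a resource from it. The package name demo_data, the file names and the helper function are all made up for the demo, and it assumes Python 3.9 or newer (where importlib.resources.files exists).

```python
import importlib
import importlib.resources
import os
import sys
import tempfile
import zipfile


def read_text_from_zipped_package(workdir: str) -> str:
    """Build a zip holding the hypothetical package `demo_data`, then read a resource from it."""
    archive = os.path.join(workdir, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_data/__init__.py", "")  # makes demo_data a package
        zf.writestr("demo_data/text.txt", "hello from the zip\n")

    sys.path.insert(0, archive)  # zip archives on sys.path are importable
    importlib.invalidate_caches()
    try:
        resource = importlib.resources.files("demo_data").joinpath("text.txt")
        return resource.read_text(encoding="utf-8")
    finally:
        sys.path.remove(archive)


with tempfile.TemporaryDirectory() as workdir:
    text = read_text_from_zipped_package(workdir)
```

The same files(...).joinpath(...) call would work unchanged if demo_data were an ordinary directory on disk, which is the point of going through importlib.resources.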

Extract Tar File inside Memory Filesystem

I'm having trouble using memoryfs:
https://docs.pyfilesystem.org/en/latest/reference/memoryfs.html
I'm trying to extract a tar archive inside a MemoryFS, but I can't pass mem_fs to tarfile because it is an object, and I can't get a real (in-memory) path from it...
from fs import open_fs, copy
import fs
import tarfile
mem_fs = open_fs('mem://')
print(mem_fs.isempty('.'))
fs.copy.copy_file('//TEST_FS', 'test.tar', mem_fs, 'test.tar')
print(mem_fs.listdir('/'))
with mem_fs.open('test.tar') as tar_file:
    print(tar_file.read())
    tar = tarfile.open(tar_file)  # I can't create the tar ...
    tar.extractall(mem_fs + 'Extract_Dir')  # Can't extract it either...
Can someone help me? Is it possible to do that?
The first argument to tarfile.open is a filename. You're (a) passing it an open file object, and (b) even if you were to pass in a filename, tarfile doesn't know anything about your in-memory filesystem and so wouldn't be able to find the file.
Fortunately, tarfile.open has a fileobj argument that accepts an open file object, so you can write:
with mem_fs.open('test.tar', 'rb') as tar_file:
    tar = tarfile.open(fileobj=tar_file)
    tar.list()
Note that you need to open the file in binary mode (rb).
Of course, now you have a second problem: while you can open and read the archive, the tarfile module still doesn't know about your in-memory filesystem, so attempting to extract files will simply extract them to your local filesystem, which is probably not what you want.
To extract into your in-memory filesystem, you're going to need to read the data from the tar archive member and write it yourself. Here's one option for doing that:
import fs
import fs.copy  # make sure the copy submodule is loaded
import os
import pathlib
import tarfile
mem_fs = fs.open_fs('mem://')
fs.copy.copy_file('/', '{}/example.tar.gz'.format(os.getcwd()),
                  mem_fs, 'example.tar.gz')
with mem_fs.open('example.tar.gz', 'rb') as fd:
    tar = tarfile.open(fileobj=fd)
    # iterate over list of members
    for member in tar.getmembers():
        # if the member is a file
        if member.isfile():
            # create any necessary directories
            p = pathlib.Path(member.path)
            mem_fs.makedirs(str(p.parent), recreate=True)
            # open the archive member
            with mem_fs.open(member.path, 'wb') as memfd, \
                    tar.extractfile(member.path) as tarfd:
                # and write the data into the memory fs
                memfd.write(tarfd.read())
The tarfile.TarFile.extractfile method returns an open file object to a tar archive member, rather than extracting the file to disk.
Note that the above isn't an optimal solution if you're working with large files (since it reads the entire archive member into memory before writing it out).
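For large members, one way around that is to copy in fixed-size chunks with shutil.copyfileobj instead of calling read() once. The sketch below uses only the standard library so it can run anywhere: copy_members_streaming and its open_dest parameter are names invented here, and a temporary directory stands in for the in-memory filesystem. With PyFilesystem you would presumably pass something like lambda name: mem_fs.open(name, 'wb') and still create parent directories first, as in the code above.

```python
import io
import os
import shutil
import tarfile
import tempfile


def copy_members_streaming(tar, open_dest, chunk_size=64 * 1024):
    """Copy every regular file in `tar` to open_dest(name), chunk_size bytes at a time.

    `open_dest` is a hypothetical callable that returns a writable binary
    file object usable as a context manager.
    """
    copied = []
    for member in tar.getmembers():
        if not member.isfile():
            continue
        with tar.extractfile(member) as src, open_dest(member.name) as dst:
            shutil.copyfileobj(src, dst, chunk_size)  # never holds the whole member
        copied.append(member.name)
    return copied


# Demo: build a small gzip-compressed tar in memory.
payload = b"x" * 200_000
archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w:gz") as tar:
    info = tarfile.TarInfo("dir/big.bin")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
archive.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    def open_dest(name):
        # Stand-in destination: flatten to the base name inside tmp.
        return open(os.path.join(tmp, os.path.basename(name)), "wb")

    with tarfile.open(fileobj=archive) as tar:
        names = copy_members_streaming(tar, open_dest)
    with open(os.path.join(tmp, "big.bin"), "rb") as f:
        copied_bytes = f.read()
```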

compress multiple files into a bz2 file in python

I need to compress multiple files into one bz2 file in python.
I'm trying to find a way but I can't find an answer.
Is it possible?
This is what tarballs are for. The tar format packs the files together, then you compress the result. Python makes it easy to do both at once with the tarfile module, where passing a "mode" of 'w:bz2' opens a new tar file for write with seamless bz2 compression. Super-simple example:
import tarfile
with tarfile.open('mytar.tar.bz2', 'w:bz2') as tar:
    for file in mylistoffiles:
        tar.add(file)
If you don't need much control over the operation, shutil.make_archive might be a possible alternative, which would simplify the code for compressing a whole directory tree to:
shutil.make_archive('mytar', 'bztar', directory_to_compress)
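To see the round trip end to end, here is a small self-contained sketch that writes two throwaway files, packs them into a .tar.bz2, and reads the archive back. The helper name make_tar_bz2 and the file names are invented for the demo; the arcname argument is used so that only base names are stored in the archive rather than the temporary directory path.

```python
import os
import tarfile
import tempfile


def make_tar_bz2(archive_path, file_paths):
    """Pack file_paths into one bzip2-compressed tarball, storing base names only."""
    with tarfile.open(archive_path, "w:bz2") as tar:
        for path in file_paths:
            tar.add(path, arcname=os.path.basename(path))


with tempfile.TemporaryDirectory() as tmp:
    # Create two small source files to compress.
    sources = []
    for name, data in [("a.txt", b"alpha\n"), ("b.txt", b"beta\n")]:
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(data)
        sources.append(path)

    archive_path = os.path.join(tmp, "mytar.tar.bz2")
    make_tar_bz2(archive_path, sources)

    # Read the archive back to confirm both files are inside.
    with tarfile.open(archive_path, "r:bz2") as tar:
        archived_names = tar.getnames()
        first_content = tar.extractfile("a.txt").read()
```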
Take a look at python's bz2 library. Make sure to google and read the docs first!
https://docs.python.org/2/library/bz2.html#bz2.BZ2Compressor
You need to import these packages:
import os
import tarfile, bz2
Then compress multiple files in bz2 format:
tar = tarfile.open("save the directory.tar.bz", "w:bz2")
for f in ["gti.png","gti.txt","file.taz"]:
    tar.add(os.path.basename(f))
tar.close()
If the source files are given with a directory path, use
os.path.basename(src_file)
to refer to the file name only.
Python's standard library module zipfile handles multiple files. The ZIP format has supported bzip2 compression since 2001, and zipfile exposes it as ZIP_BZIP2 (available since Python 3.3).
import zipfile
sourcefiles = ['a.txt', 'b.txt']
with zipfile.ZipFile('out.zip', 'w') as outputfile:
    for sourcefile in sourcefiles:
        outputfile.write(sourcefile, compress_type=zipfile.ZIP_BZIP2)

How to read gz compressed files from tar

Let's say we have a tar file which in turn contains multiple gzip-compressed files. I want to be able to read the contents of those gzip files without extracting either the tar file or the individual gzip files to disk. I'm trying to use the tarfile module in Python.
This might work, I haven't tested it, but this has the main ideas, and related tools. It iterates over the files in the tar, and if they are gzipped, then will read them into the file_contents variable:
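import tarfile
import gzip
with tarfile.open("your.gz.tar") as tar:
    for member in tar.getmembers():
        # assumption: gzipped members are regular files named *.gz
        if member.isfile() and member.name.endswith(".gz"):
            fo = tar.extractfile(member)  # file object for this member
            file_contents = gzip.GzipFile(fileobj=fo).read()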
Note: if a file is too large for memory, consider a streamed reader (chunk by chunk), as in the questions linked below.
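As a rough illustration of the streamed approach, the sketch below yields decompressed lines one at a time instead of calling read() on the whole member. The function name iter_gzip_member_lines, the .gz suffix test and the demo file names are assumptions made for the example; the archive is built in memory so the snippet runs without any files on disk.

```python
import gzip
import io
import tarfile


def iter_gzip_member_lines(tar_fileobj):
    """Yield (member_name, line) for each line of each *.gz member of the tar."""
    with tarfile.open(fileobj=tar_fileobj) as tar:
        for member in tar:
            # Assumption: gzipped members are regular files named *.gz.
            if not (member.isfile() and member.name.endswith(".gz")):
                continue
            raw = tar.extractfile(member)
            with gzip.GzipFile(fileobj=raw) as gz:
                for line in gz:  # decompresses incrementally
                    yield member.name, line


# Demo: a tar holding one gzipped member and one plain member.
demo = io.BytesIO()
with tarfile.open(fileobj=demo, mode="w") as tar:
    entries = [("a.txt.gz", gzip.compress(b"x\ny\n")), ("plain.txt", b"skip me")]
    for name, payload in entries:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
demo.seek(0)

lines = list(iter_gzip_member_lines(demo))
```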
If you have additional logic based on what the member (TarInfo) object looks like you can use these:
https://docs.python.org/2/library/tarfile.html#tarinfo-objects
see:
How can I decompress a gzip stream with zlib?
Python decompressing gzip chunk-by-chunk
reading tar file contents without untarring it, in python script

Why can't the zipfile module's is_zipfile function detect a gzip file?

I am aware of this question Why "is_zipfile" function of module "zipfile" always returns "false"?. I want to seek some more clarification and confirmation.
I have created a zip file in python using the gzip module.
If I check the zip file using the file command in OSX I get this
> file data.txt
data.txt: gzip compressed data, was "Slide1.html", last modified: Tue Oct 13 10:10:13 2015, max compression
I want to write a generic function to tell if the file is gzip'ed or not.
import gzip
import os
f = '/path/to/data.txt'
print(os.path.exists(f))  # True
with gzip.GzipFile(f) as zf:
    print(zf.read())  # Prints the content as expected
import zipfile
print(zipfile.is_zipfile(f))  # Gives False, which I did not expect
I want to use zipfile module but it always reports false.
I just want to have a confirmation that zipfile module is not compatible with gzip. If so, why it is the case? Are zip and gzip considered different format?
I have created a zip file in python using the gzip module.
No you haven't. gzip doesn't create zip files.
I just want to have a confirmation that zipfile module is not compatible with gzip.
Confirmed.
If so, why it is the case?
A gzip file is a single file compressed with zlib with a very small header. A zip file is multiple files, each optionally compressed with zlib, in a single archive with a header and directory.
Are zip and gzip considered different format?
Yes.
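For the generic "is this gzip'ed?" function asked about in the question, one common approach is to check the two-byte gzip signature (1f 8b) at the start of the file. The sketch below is only an illustration: is_gzip_file is a made-up helper name, and the two files are created in a temporary directory purely to compare it with zipfile.is_zipfile.

```python
import gzip
import os
import tempfile
import zipfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def is_gzip_file(path):
    """Return True if the file starts with the gzip signature."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    # A gzip file with a misleading extension, as in the question.
    gz_path = os.path.join(tmp, "data.txt")
    with gzip.open(gz_path, "wb") as f:
        f.write(b"hello")

    # A genuine zip archive for comparison.
    zip_path = os.path.join(tmp, "data.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("data.txt", "hello")

    # Each entry is (is_gzip_file result, zipfile.is_zipfile result).
    results = {
        "gzip file": (is_gzip_file(gz_path), zipfile.is_zipfile(gz_path)),
        "zip file": (is_gzip_file(zip_path), zipfile.is_zipfile(zip_path)),
    }
```

A signature check only says the file looks like gzip; a corrupted or truncated file would still pass, so actually reading it with gzip remains the definitive test.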
