Why can't the zipfile module's is_zipfile function detect a gzip file?

I am aware of the question 'Why "is_zipfile" function of module "zipfile" always returns "false"?', but I would like some more clarification and confirmation.
I have created a zip file in python using the gzip module.
If I check the zip file using the file command on OS X, I get this:
> file data.txt
data.txt: gzip compressed data, was "Slide1.html", last modified: Tue Oct 13 10:10:13 2015, max compression
I want to write a generic function to tell if the file is gzip'ed or not.
import gzip
import os
import zipfile

f = '/path/to/data.txt'
print(os.path.exists(f))  # True
with gzip.GzipFile(f) as zf:
    print(zf.read())  # Prints the content as expected
print(zipfile.is_zipfile(f))  # Gives me False, which I did not expect
I want to use the zipfile module, but is_zipfile always returns False.
I just want confirmation that the zipfile module is not compatible with gzip. If so, why is that the case? Are zip and gzip considered different formats?

"I have created a zip file in python using the gzip module."
No, you haven't. gzip doesn't create zip files.
"I just want confirmation that the zipfile module is not compatible with gzip."
Confirmed.
"If so, why is that the case?"
A gzip file is a single file compressed with zlib, with a very small header. A zip file holds multiple files, each optionally compressed with zlib, in a single archive with a header and a directory.
"Are zip and gzip considered different formats?"
Yes.
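For the generic "is this file gzip'ed?" check the question asks for, one option is to look at the first two bytes of the file, since every gzip stream starts with the signature 0x1f 0x8b. The following is a minimal sketch for Python 3, not part of the standard library: the name is_gzipfile is made up here to mirror zipfile.is_zipfile, and the file names are placeholders.

```python
import gzip
import os
import tempfile
import zipfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream (RFC 1952)


def is_gzipfile(path):
    """Return True if the file at `path` starts with the gzip signature."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


# Demo: one gzip file (with a misleading .txt name) and one real zip file.
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, "data.txt")
zip_path = os.path.join(workdir, "data.zip")

with gzip.open(gz_path, "wb") as fh:
    fh.write(b"hello")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("hello.txt", "hello")

print(is_gzipfile(gz_path), zipfile.is_zipfile(gz_path))
print(is_gzipfile(zip_path), zipfile.is_zipfile(zip_path))
```

This only inspects the signature, so it says nothing about whether the rest of the stream is intact; reading the file through the gzip module is still the way to find that out.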

Related

compress multiple files into a bz2 file in python

I need to compress multiple files into one bz2 file in python.
I've been trying to find a way, but I can't find an answer.
Is it possible?

This is what tarballs are for. The tar format packs the files together, then you compress the result. Python makes it easy to do both at once with the tarfile module, where passing a "mode" of 'w:bz2' opens a new tar file for writing with seamless bz2 compression. Super-simple example:
import tarfile
with tarfile.open('mytar.tar.bz2', 'w:bz2') as tar:
    for filename in mylistoffiles:  # mylistoffiles: the paths you want to add
        tar.add(filename)
If you don't need much control over the operation, shutil.make_archive might be a possible alternative, which would simplify the code for compressing a whole directory tree to:
import shutil
shutil.make_archive('mytar', 'bztar', directory_to_compress)

Take a look at python's bz2 library. Make sure to google and read the docs first!
https://docs.python.org/2/library/bz2.html#bz2.BZ2Compressor

You need to import these modules:
import os
import tarfile
Then compress multiple files into a single bz2-compressed tar archive:
tar = tarfile.open("save the directory.tar.bz", "w:bz2")
for f in ["gti.png", "gti.txt", "file.taz"]:
    tar.add(os.path.basename(f))
tar.close()
Using os.path.basename(src_file) gives just the file name of a path, without its directory part, so each file is referred to by name only.

Python's standard library zipfile module handles multiple files and has supported bz2 compression since Python 3.3 (2012).
import zipfile
sourcefiles = ['a.txt', 'b.txt']
with zipfile.ZipFile('out.zip', 'w') as outputfile:
    for sourcefile in sourcefiles:
        outputfile.write(sourcefile, compress_type=zipfile.ZIP_BZIP2)

How to read gz compressed files from tar

Let's say we have a tar file which in turn contains multiple gzip compressed files. I want to be able to read the contents of those gzip files without first unpacking either the tar file or the individual gzip files on disk. I'm trying to use the tarfile module in Python.

This might work; I haven't tested it, but it shows the main ideas and the related tools. It iterates over the members of the tar and reads each gzipped file into the file_contents variable:
import tarfile
import gzip
with tarfile.open("your.gz.tar") as tar:
    for member in tar.getmembers():
        if not member.isfile():
            continue  # extractfile() returns None for directories
        fo = tar.extractfile(member)  # file-like object, nothing is written to disk
        file_contents = gzip.GzipFile(fileobj=fo).read()
Note: if a file is too large to fit in memory, consider a streamed (chunk-by-chunk) reader, as described in the questions listed under "See" at the end of this answer.
If you have additional logic based on what the member (TarInfo) object looks like, you can use its attributes:
https://docs.python.org/2/library/tarfile.html#tarinfo-objects
See:
How can I decompress a gzip stream with zlib?
Python decompressing gzip chunk-by-chunk
reading tar file contents without untarring it, in python script
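The chunk-by-chunk idea mentioned in the note can be sketched roughly as follows. This is an illustration for Python 3 under the assumption that every regular file in the tar is gzip-compressed; the function names, the chunk size and the sample file names are all made up for the example.

```python
import gzip
import io
import os
import tarfile
import tempfile


def iter_gzip_member_chunks(tar_path, chunk_size=64 * 1024):
    """Yield (member_name, chunk) pairs of decompressed bytes.

    Assumes each regular file in the tar is gzip-compressed.
    """
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue  # directories, links, etc. have no data to read
            raw = tar.extractfile(member)
            with gzip.GzipFile(fileobj=raw) as gz:
                while True:
                    chunk = gz.read(chunk_size)
                    if not chunk:
                        break
                    yield member.name, chunk


def build_sample_tar(tar_path, payloads):
    """Demo helper: store {name: bytes} as gzipped members of a plain tar."""
    with tarfile.open(tar_path, "w") as tar:
        for name, data in payloads.items():
            blob = gzip.compress(data)
            info = tarfile.TarInfo(name=name)
            info.size = len(blob)
            tar.addfile(info, io.BytesIO(blob))


sample = os.path.join(tempfile.mkdtemp(), "your.gz.tar")
build_sample_tar(sample, {"a.txt.gz": b"alpha" * 1000, "b.txt.gz": b"beta"})
for name, chunk in iter_gzip_member_chunks(sample, chunk_size=1024):
    print(name, len(chunk))
```

Only one chunk is held in memory at a time, so the caller can process or write each piece before asking for the next.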

Reading gzipped data in Python

I have a *.tar.gz compressed file that I would like to read in with Python 2.7. The file contains multiple h5 formatted files as well as a few text files. I'm a novice with Python. Here is the code I'm trying to adapt:
subset_path='c:\data\grant\files'
f=gzip.open(filename,'subset_full.tar.gz')
subset_data_path=os.path.join(subset_path,'f')
The first statement identifies the path to the folder with the data. The second statement tells Python to open a specific compressed file and the third statement (hopefully) executes a join of the prior two statements.
Several lines below this code I get an error when Python tries to use the 'subset_data_path' assignment.
What's going on?

The gzip module will only open a single file that has been compressed, i.e. my_file.gz. You have a tar archive of multiple files that are also compressed. This needs to be both untarred and uncompressed.
Try using the tarfile module instead, see https://docs.python.org/2/library/tarfile.html#examples
edit: To add a bit more information on what has happened, you have successfully opened the zipped tarball into a gzip file object, which will work almost the same as a standard file object. For instance you could call f.readlines() as if f was a normal file object and it would return the uncompressed lines.
However, this did not actually unpack the archive into new files in the filesystem. You did not create a subdirectory 'c:\data\grant\files\f', and so when you try to use the path subset_data_path you are looking for a directory that does not exist.
The following ought to work:
import os
import tarfile
subset_path = r'c:\data\grant\files'  # raw string, so \f is not read as a form feed
tar = tarfile.open("subset_full.tar.gz")
tar.extractall(subset_path)
tar.close()
subset_data_path = os.path.join(subset_path, 'subset_full')
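If the goal is only to read a few of the text files rather than unpack everything, tarfile can also read a member straight from the compressed archive. A minimal sketch for Python 3; the helper names and the member name subset_full/notes.txt are invented for the demo and are not taken from the asker's data.

```python
import io
import os
import tarfile
import tempfile


def read_text_member(archive_path, member_name, encoding="utf-8"):
    """Return the decoded text of one regular file inside a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        fileobj = tar.extractfile(member_name)  # KeyError if the name is absent
        if fileobj is None:
            raise ValueError("%r is not a regular file" % member_name)
        return fileobj.read().decode(encoding)


def build_demo_archive(archive_path):
    """Demo helper: write a tiny .tar.gz holding one text file."""
    data = b"first line\nsecond line\n"
    with tarfile.open(archive_path, "w:gz") as tar:
        info = tarfile.TarInfo(name="subset_full/notes.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))


demo_archive = os.path.join(tempfile.mkdtemp(), "subset_full.tar.gz")
build_demo_archive(demo_archive)
with tarfile.open(demo_archive, "r:gz") as tar:
    print(tar.getnames())  # list what the archive contains
print(read_text_member(demo_archive, "subset_full/notes.txt"))
```

Binary members such as the h5 files would be read the same way, just without the decode step.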

Python: Unzipping and decompressing .Z files inside .zip

I am trying to unzip an Alpha.zip archive that contains a Beta directory, which contains a Gamma folder, which contains the files a.Z, b.Z, c.Z and d.Z. Using zip and 7-zip I was able to extract all the a.D, b.D, c.D and d.D files stored within the .Z files.
I tried this in Python using import gzip and import zlib.
import sys
import os
import getopt
import gzip
f = open('a.d.Z','r')
file_content = f.read()
f.close()
I keep getting all sorts of errors, including "this is not a zip file" and "return codecs.charmap_encode(input,self.errors,encoding_map)[0]". Any suggestions as to how to code this?

You need to actually make use of a zip library of some kind. Right now you're importing gzip, but you're not doing anything with it. Try taking a look at the gzip documentation and opening the file using that library.
gzip_file = gzip.open('a.d.Z') # use gzip.open instead of builtin open function
file_content = gzip_file.read()
Edit based on your comment: you can't just open all kinds of compressed files with any compression library. Since you have a .Z file, it's likely that you want to use zlib rather than gzip, but since extensions are just conventions, only you know for sure what compression format your file is in. To use zlib, do something like this instead:
# Note: untested code ahead!
import zlib
with open('a.d.Z', 'rb') as f:  # Notice that I open this in binary mode
    file_content = f.read()  # Read the compressed binary data
    decompressed_content = zlib.decompress(file_content)  # Decompress
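Since the extension alone does not settle the format, one way to find out is to look at the first few bytes of the file. The sketch below (Python 3) checks a handful of well-known signatures; sniff_format and the sample file names are made up for illustration. Note that .Z traditionally denotes Unix compress (LZW) output, signature 0x1f 0x9d, which to my knowledge neither the zlib nor the gzip module decodes, so it is worth checking before trying either.

```python
import bz2
import gzip
import os
import tempfile
import zlib

# Leading bytes ("magic numbers") of some common compressed formats.
SIGNATURES = [
    (b"\x1f\x8b", "gzip"),
    (b"\x1f\x9d", "compress (.Z, LZW)"),
    (b"BZh", "bzip2"),
    (b"PK\x03\x04", "zip"),
]


def sniff_format(path):
    """Guess the compression format of a file from its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    # A zlib stream has no fixed signature, but its two header bytes must
    # name the deflate method (8) and form a multiple of 31.
    if len(head) >= 2 and head[0] & 0x0F == 8 and (head[0] * 256 + head[1]) % 31 == 0:
        return "possibly zlib"
    return "unknown"


workdir = tempfile.mkdtemp()
samples = {
    "a.gz": gzip.compress(b"data"),
    "a.bz2": bz2.compress(b"data"),
    "a.zlib": zlib.compress(b"data"),
    "a.Z": b"\x1f\x9d\x90" + b"\x00" * 4,  # fake payload, only the signature is real
}
for filename, blob in samples.items():
    path = os.path.join(workdir, filename)
    with open(path, "wb") as fh:
        fh.write(blob)
    print(filename, "->", sniff_format(path))
```

The zlib check is only a heuristic, because unrelated data can pass it by chance; the other signatures are fixed by their formats.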

Extracting BZ2 compressed folder using Python

I am trying to extract a bz2 compressed folder to a specific location.
I can see the data inside with:
handler = bz2.BZ2File(path, 'r')
print(handler.read())
But I wish to extract all the files in this compressed folder into a location (specified by the user), maintaining the internal directory structure of the folder.
I am fairly new to this language. Please help.

Like gzip, BZ2 is only a compressor for single files; it cannot archive a directory structure. What I suspect you have is an archive that was first created by a program like tar and then compressed with BZ2. To recover the "full directory structure", first decompress the BZ2 file, then un-tar (or equivalent) the result.
Fortunately, the Python tarfile module supports a bz2 option, so you can do this process in one shot.
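A minimal sketch of that one-shot extraction in Python 3, assuming the file really is a bz2-compressed tar archive. The function name extract_tar_bz2 and the demo paths are made up for illustration.

```python
import os
import tarfile
import tempfile


def extract_tar_bz2(archive_path, dest_dir):
    """Unpack a .tar.bz2 archive into dest_dir, keeping its directory layout."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe members (for example
            # paths that escape dest_dir); use that filter when available.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)


# Demo: build a small archive with a nested folder, then unpack it elsewhere.
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "folder", "sub"))
with open(os.path.join(src, "folder", "sub", "note.txt"), "w") as fh:
    fh.write("hello")
archive = os.path.join(src, "folder.tar.bz2")
with tarfile.open(archive, "w:bz2") as tar:
    tar.add(os.path.join(src, "folder"), arcname="folder")

dest = tempfile.mkdtemp()  # stands in for the user-specified location
extract_tar_bz2(archive, dest)
print(os.path.exists(os.path.join(dest, "folder", "sub", "note.txt")))
```

Archives from untrusted sources deserve care, since member paths are chosen by whoever built the archive; that is why the sketch opts into the "data" filter where the running Python provides it.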

bzip2 is a data compression system which compresses one entire file. It does not bundle files and compress them like PKZip does. Therefore handler in your example has one and only one file in it and there is no "internal directory structure".
If, on the other hand, your file is actually a compressed tar-file, you should look at the tarfile module of Python which will handle decompression for you.

You need to use the tarfile module to uncompress a .tar.bz2 file. Adapted from the docs, here is how you can do it:
import tarfile
tar = tarfile.open(path, "r:bz2")
for tarinfo in tar:
    print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
    if tarinfo.isreg():
        print("a regular file.")
        # read the file
        f = tar.extractfile(tarinfo)
        print(f.read())
    elif tarinfo.isdir():
        print("a directory.")
    else:
        print("something else.")
tar.close()
