Python gzip: is there a way to decompress from a string? - python

I've read this SO post around the problem to no avail.
I am trying to decompress a .gz file coming from a URL.
url_file_handle=StringIO( gz_data )
gzip_file_handle=gzip.open(url_file_handle,"r")
decompressed_data = gzip_file_handle.read()
gzip_file_handle.close()
... but I get TypeError: coercing to Unicode: need string or buffer, cStringIO.StringI found
What's going on?
Traceback (most recent call last):
File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 2974, in _HandleRequest
base_env_dict=env_dict)
File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 411, in Dispatch
base_env_dict=base_env_dict)
File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 2243, in Dispatch
self._module_dict)
File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 2161, in ExecuteCGI
reset_modules = exec_script(handler_path, cgi_path, hook)
File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 2057, in ExecuteOrImportScript
exec module_code in script_module.__dict__
File "/home/jldupont/workspace/jldupont/trunk/site/app/server/tasks/debian/repo_fetcher.py", line 36, in <module>
main()
File "/home/jldupont/workspace/jldupont/trunk/site/app/server/tasks/debian/repo_fetcher.py", line 30, in main
gziph=gzip.open(fh,'r')
File "/usr/lib/python2.5/gzip.py", line 49, in open
return GzipFile(filename, mode, compresslevel)
File "/usr/lib/python2.5/gzip.py", line 95, in __init__
fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
TypeError: coercing to Unicode: need string or buffer, cStringIO.StringI found

If your data is already in a string, try zlib, which claims to be fully gzip compatible:
import zlib
decompressed_data = zlib.decompress(gz_data, 16+zlib.MAX_WBITS)
Read more: http://docs.python.org/library/zlib.html
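As a minimal Python 3 sketch of the zlib approach above: the sample payload is made up and compressed in place only so the snippet runs on its own. The wbits argument is what selects the container format; 16 + zlib.MAX_WBITS expects a gzip header, while 32 + zlib.MAX_WBITS auto-detects gzip or zlib.

```python
import gzip
import zlib

# Hypothetical sample payload, compressed here only so the sketch is self-contained.
original = b"hello from a gzip stream"
gz_data = gzip.compress(original)

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
decompressed_data = zlib.decompress(gz_data, 16 + zlib.MAX_WBITS)

# wbits = 32 + MAX_WBITS auto-detects a gzip or zlib header.
auto_detected = zlib.decompress(gz_data, 32 + zlib.MAX_WBITS)
```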

In the Python 2 gzip module shown in the traceback, gzip.open is shorthand for opening a file by name. What you want is gzip.GzipFile, which accepts a file object through its fileobj argument:
open(filename, mode='rb', compresslevel=9)
#Shorthand for GzipFile(filename, mode, compresslevel).
vs
class GzipFile
__init__(self, filename=None, mode=None, compresslevel=9, fileobj=None)
# At least one of fileobj and filename must be given a non-trivial value.
So this should work for you:
gzip_file_handle = gzip.GzipFile(fileobj=url_file_handle)
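In Python 3 the same idea would use io.BytesIO, since compressed data is bytes rather than str. The following is a sketch under that assumption; the sample payload is hypothetical and stands in for the bytes fetched from the URL.

```python
import gzip
import io

# Hypothetical stand-in for the bytes downloaded from the URL.
gz_data = gzip.compress(b"Package: example\n")

# Wrap the bytes in a file-like object and hand it over via fileobj.
url_file_handle = io.BytesIO(gz_data)
with gzip.GzipFile(fileobj=url_file_handle, mode="rb") as gzip_file_handle:
    decompressed_data = gzip_file_handle.read()
```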

You can use gzip.decompress from the gzip module in the Python standard library (available in Python 3.2+).
Example of decompressing bytes:
import gzip
gzip.decompress(gzip_data)
Documentation
https://docs.python.org/3.5/library/gzip.html#gzip.decompress

Consider using gzip.GzipFile if you don't like passing obscure arguments to zlib.decompress.
When you deal with a urllib2.urlopen response that can be either gzip-compressed or uncompressed:
import gzip
from StringIO import StringIO
# response = urllib2.urlopen(...
content_raw = response.read()
# getheader returns None when the header is absent, so default to ''
if 'gzip' in response.info().getheader('Content-Encoding', ''):
    content = gzip.GzipFile(fileobj=StringIO(content_raw)).read()
else:
    content = content_raw
When you deal with a file that can store either gzip-compressed or uncompressed data:
import gzip
# some_file = open(...
try:
    content = gzip.GzipFile(fileobj=some_file).read()
except IOError:
    # Not gzip data: rewind and read the raw bytes
    some_file.seek(0)
    content = some_file.read()
The examples above are for Python 2.7.
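For Python 3, where urllib2 and the StringIO module no longer exist, a similar "maybe compressed" check can be sketched directly on the raw bytes. The helper name maybe_gunzip is hypothetical. It relies on the fact that gzip data starts with the two bytes 0x1f 0x8b, so uncompressed input that happens to start with those bytes would still raise an error.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def maybe_gunzip(data: bytes) -> bytes:
    """Return decompressed bytes if data looks like gzip, else data unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data
```

Usage would look like content = maybe_gunzip(response.read()), with response being whatever object supplied the bytes.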

Related

Decompress a gzip compressed dictionary object using python

I need to decompress this "H4sIAAAAAAAA/6tWKkktLjFUsjI00lEAs42UrCAMpVoAbyLr+R0AAAA=", which is the compressed form of {"test1":12, "test2": "test"}. In Python I'm using the gzip library and getting the response below:
>>> import gzip
>>> gzip.decompress("H4sIAAAAAAAA/6tWKkktLjFUsjI00lEAs42UrCAMpVoAbyLr+R0AAAA=".encode("UTF-8"))
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/data/python-3.8.10/lib/python3.8/gzip.py", line 548, in decompress
return f.read()
File "/data/python-3.8.10/lib/python3.8/gzip.py", line 292, in read
return self._buffer.read(size)
File "/data/python-3.8.10/lib/python3.8/gzip.py", line 479, in read
if not self._read_gzip_header():
File "/data/python-3.8.10/lib/python3.8/gzip.py", line 427, in _read_gzip_header
raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'H4')
Is there any way to decompress the string in Python?
The string is Base64-encoded, so decode it before decompressing:
import gzip
import base64
b = base64.b64decode('H4sIAAAAAAAA/6tWKkktLjFUsjI00lEAs42UrCAMpVoAbyLr+R0AAAA=')
r = gzip.decompress(b)
print(r.decode())
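Since the payload in the question is described as a dictionary, the decode steps can be chained and finished with json.loads. This is a sketch, assuming the decompressed text is valid JSON. The helper name decode_b64_gzip_json is hypothetical, and the sample is built in place so that the snippet does not depend on the string above.

```python
import base64
import gzip
import json


def decode_b64_gzip_json(text: str):
    """Base64-decode, gunzip, then parse the result as JSON."""
    raw = base64.b64decode(text)
    return json.loads(gzip.decompress(raw).decode("utf-8"))


# Round trip with a payload built here, so the sketch is self-contained.
sample = {"test1": 12, "test2": "test"}
encoded = base64.b64encode(
    gzip.compress(json.dumps(sample).encode("utf-8"))
).decode("ascii")
decoded = decode_b64_gzip_json(encoded)
```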

How to un-tar in-memory data using Python3?

I've got some tar data in bytes, and want to read it without writing it to the file system.
Writing it to the file system works:
with open('out.tar', 'wb') as f:
f.write(data)
then, in the shell: tar -xzvf out.tar
But the following raises an error:
import tarfile
tarfile.open(data, 'r')
'''
File ".../lib/python3.7/tarfile.py", line 1591, in open
return func(name, filemode, fileobj, **kwargs)
File ".../lib/python3.7/tarfile.py", line 1638, in gzopen
fileobj = gzip.GzipFile(name, mode + "b", compresslevel, fileobj)
File ".../lib/python3.7/gzip.py", line 163, in __init__
fileobj = self.myfileobj = builtins.open(fil
'''
What is the right way to read the tar in memory?
Update
The following works:
from io import BytesIO
tarfile.open(fileobj=BytesIO(data), mode='r')
Why?
tarfile.open is supposed to be able to work with bytes. Converting the bytes to a file-like object myself and then telling tarfile.open to use the file-like object works, but why is the transformation necessary? When does the raw bytes-based API work vs. not work?
You can wrap the bytes in a BytesIO stream and pass it to tarfile.open as fileobj, then read each member from the archive:
import tarfile
from io import BytesIO
# data holds the raw bytes of the tar archive
with tarfile.open(fileobj=BytesIO(data)) as tar:
    for tar_file in tar:
        if tar_file.isfile():
            inner_data = tar.extractfile(tar_file).read().decode('utf-8')
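As for the "why": the first positional parameter of tarfile.open is name, a filesystem path, so bytes passed there are treated as a filename rather than as archive contents. In-memory data goes through fileobj instead. The sketch below builds a small archive in memory and reads it back; the member name hello.txt and its contents are made up for illustration.

```python
import io
import tarfile

# Build a small gzip-compressed tar archive in memory (hypothetical sample member).
payload = b"hello from inside the archive\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
data = buffer.getvalue()

# Read it back without touching the file system.
# mode="r:*" lets tarfile detect the compression transparently.
contents = {}
with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
    for member in tar:
        if member.isfile():
            contents[member.name] = tar.extractfile(member).read()
```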

botocore s3 put has issue hashing file due to encoding?

I'm having trouble figuring out why the file, the contents of which are "DELETE ME LATER", which is loaded with encoding utf-8 causes an exception in botocore when it's being hashed.
with io.open('deleteme','r', encoding='utf-8') as f:
    try:
        resp=client.put_object(
            Body=f,
            Bucket='s3-bucket-actual-name-for-real',
            Key='testing/a/put'
        )
        print('deleteme exists')
        print(resp)
    except:
        print('deleteme could not put')
        raise
Produces:
deleteme could not put
Traceback (most recent call last):
File "./test_operator.py", line 41, in
Key='testing/a/put'
File "/Users/lamblin/VEnvs/awscli/lib/python3.6/site-packages/botocore/client.py", line 312, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/Users/lamblin/VEnvs/awscli/lib/python3.6/site-packages/botocore/client.py", line 582, in _make_api_call
request_signer=self._request_signer, context=request_context)
File "/Users/lamblin/VEnvs/awscli/lib/python3.6/site-packages/botocore/hooks.py", line 242, in emit_until_response
responses = self._emit(event_name, kwargs, stop_on_response=True)
File "/Users/lamblin/VEnvs/awscli/lib/python3.6/site-packages/botocore/hooks.py", line 210, in _emit
response = handler(**kwargs)
File "/Users/lamblin/VEnvs/awscli/lib/python3.6/site-packages/botocore/handlers.py", line 201, in conditionally_calculate_md5
calculate_md5(params, **kwargs)
File "/Users/lamblin/VEnvs/awscli/lib/python3.6/site-packages/botocore/handlers.py", line 179, in calculate_md5
binary_md5 = _calculate_md5_from_file(body)
File "/Users/lamblin/VEnvs/awscli/lib/python3.6/site-packages/botocore/handlers.py", line 193, in _calculate_md5_from_file
md5.update(chunk)
TypeError: Unicode-objects must be encoded before hashing
Now this can be avoided by opening the file with 'rb' but, isn't the file object f clearly using an encoding?
The encoding specified to io.open in mode='r' is used to decode the content. So when you iterate f, the content has already been converted from bytes to str (text) by Python.
To interface with botocore directly, open your file with mode 'rb' and drop the encoding kwarg. There is no point in decoding the content to text when the first thing botocore has to do to transport it is encode it back into bytes.
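The difference can be reproduced without botocore or S3. The sketch below writes the sample contents to a temporary file and feeds it to hashlib both ways. It only illustrates the str versus bytes distinction and is not botocore's own code path.

```python
import hashlib
import os
import tempfile

# Write the sample contents from the question to a temporary file.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("DELETE ME LATER")
    path = tmp.name

try:
    # Text mode: read() yields str, which hashlib refuses to hash.
    with open(path, "r", encoding="utf-8") as f:
        try:
            hashlib.md5().update(f.read())
            text_mode_error = None
        except TypeError as exc:
            text_mode_error = exc

    # Binary mode: read() yields bytes, which hash without complaint.
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
finally:
    os.remove(path)
```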

Python lib execute error

I made this Python library, which has a function that uses urllib and urllib2, but when I execute the library's functions from the Python shell I get this error:
>>> from sabermanlib import geturl
>>> geturl("roblox.com","ggg.html")
Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
geturl("roblox.com","ggg.html")
File "sabermanlib.py", line 21, in geturl
urllib.urlretrieve(Address,File)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 94, in urlretrieve
return _urlopener.retrieve(url, filename, reporthook, data)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 240, in retrieve
fp = self.open(url, data)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 208, in open
return getattr(self, name)(url)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 463, in open_file
return self.open_local_file(url)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 477, in open_local_file
raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] The system cannot find the file specified: 'roblox.com'
>>>
And here's the code for the library I made:
import urllib
import urllib2
def geturl(Address,File):
    urllib.urlretrieve(Address,File)
EDIT 2
I can't understand why I get this error in the Python shell when executing:
geturl(Address,File)
urllib.urlretrieve itself is not the problem: it takes a URL and a filename. The error occurs because 'roblox.com' has no scheme, so urllib treats it as a local file path (note open_local_file in the traceback). Pass the full URL, including http://. If you would rather read the data yourself, use urllib.urlopen:
>>> help(urllib.urlopen)
urlopen(url, data=None, proxies=None)
Create a file-like object for the specified URL to read from.
Additionally, if you want to download and save a document yourself, you can write geturl like this:
def geturl(Address, FileName):
    html_data = urllib.urlopen(Address).read() # Open the URL
    with open(FileName, 'wb') as f: # Open the file
        f.write(html_data) # Write data from URL to file
geturl(u'http://roblox.com', 'ggg.html') # URLs must contain the full URI, including http://
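The note about including http:// can be turned into a small guard. This is a Python 3 sketch (the code above is Python 2), and ensure_scheme is a hypothetical helper name, not part of urllib. It only prefixes a default scheme when the address has none, and performs no network access.

```python
from urllib.parse import urlsplit


def ensure_scheme(address: str, default_scheme: str = "http") -> str:
    """Prefix address with default_scheme:// when it has no scheme of its own."""
    if "://" in address:
        return address
    return f"{default_scheme}://{address}"


fixed = ensure_scheme("roblox.com")
parts = urlsplit(fixed)  # parts.netloc now holds the host name
```

The fixed address could then be handed to urllib.request.urlopen or urllib.request.urlretrieve, the Python 3 counterparts of the calls above.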

How to upload binary file with ftplib in Python?

My python2 script uploads files nicely using this method but python3 is presenting problems and I'm stuck as to where to go next (googling hasn't helped).
from ftplib import FTP
ftp = FTP(ftp_host, ftp_user, ftp_pass)
ftp.storbinary('STOR myfile.txt', open('myfile.txt'))
The error I get is
Traceback (most recent call last):
File "/Library/WebServer/CGI-Executables/rob3/functions/cli_f.py", line 12, in upload
ftp.storlines('STOR myfile.txt', open('myfile.txt'))
File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/ftplib.py", line 454, in storbinary
conn.sendall(buf)
TypeError: must be bytes or buffer, not str
I tried altering the code to
from ftplib import FTP
ftp = FTP(ftp_host, ftp_user, ftp_pass)
ftp.storbinary('STOR myfile.txt'.encode('utf-8'), open('myfile.txt'))
But instead I got this
Traceback (most recent call last):
File "/Library/WebServer/CGI-Executables/rob3/functions/cli_f.py", line 12, in upload
ftp.storbinary('STOR myfile.txt'.encode('utf-8'), open('myfile.txt'))
File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/ftplib.py", line 450, in storbinary
conn = self.transfercmd(cmd)
File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/ftplib.py", line 358, in transfercmd
return self.ntransfercmd(cmd, rest)[0]
File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/ftplib.py", line 329, in ntransfercmd
resp = self.sendcmd(cmd)
File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/ftplib.py", line 244, in sendcmd
self.putcmd(cmd)
File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/ftplib.py", line 179, in putcmd
self.putline(line)
File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/ftplib.py", line 172, in putline
line = line + CRLF
TypeError: can't concat bytes to str
Can anybody point me in the right direction?
The issue is not with the command argument but with the file object. Since you're storing binary data, you need to open the file with the 'rb' flag:
>>> ftp.storbinary('STOR myfile.txt', open('myfile.txt', 'rb'))
'226 File receive OK.'
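The same rule applies to data that is already in memory: storbinary reads from any binary file-like object, so bytes can be wrapped in io.BytesIO. This is a sketch that assumes an already connected and logged-in FTP object; upload_bytes is a hypothetical helper name.

```python
import io
from ftplib import FTP


def upload_bytes(ftp: FTP, remote_name: str, payload: bytes) -> str:
    """Send payload to remote_name over an already logged-in FTP connection."""
    # storbinary calls read() on the object, so it must yield bytes, not str.
    return ftp.storbinary(f"STOR {remote_name}", io.BytesIO(payload))
```

A call would look like upload_bytes(ftp, 'myfile.txt', data), and it returns the server's reply string.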
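To append to a file over FTP, use the APPE command.
Note: this is plain FTP, not SFTP.
import ftplib
ftp = ftplib.FTP('localhost')
ftp.login('user', 'password')
fin = open('foo.txt', 'rb')  # binary mode, as storbinary requires bytes
ftp.storbinary('APPE foo2.txt', fin, 1)  # third argument is the block size in bytes
Ref: Thanks to Noah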
