I got the following code from my question on how to convert a tar.gz file to a zip file.
import tarfile, zipfile
tarf = tarfile.open(name='sample.tar.gz', mode='r|gz' )
zipf = zipfile.ZipFile.open( name='myzip.zip', mode='a', compress_type=ZIP_DEFLATED )
for m in tarf.getmembers():
    f = tarf.extractfile( m )
    fl = f.read()
    fn = m.name
    zipf.writestr( fn, fl )
tarf.close()
zipf.close()
But when I run it I get this error:
NameError: name 'ZIP_DEFLATED' is not defined
What should I change in the code to make it work?
ZIP_DEFLATED is a name defined by the zipfile module; reference it from there:
zipf = zipfile.ZipFile(
    'myzip.zip', mode='a',
    compression=zipfile.ZIP_DEFLATED)
Note that you don't use the ZipFile.open() method here; you are not opening members in the archive, you are writing to the object.
Also, the correct ZipFile class signature names the 3rd argument compression. compress_type is only used as an attribute on ZipInfo objects and for the ZipFile.writestr() method. The first argument is not named name either; it's file, but you normally would just pass in the value as a positional argument.
Next, you can't seek in a gzip-compressed tarfile, so you'll have issues accessing members in order if you use tarf.getmembers(). This method has to do a full scan to find all members to build a list, and then you can't go back to read the file data anymore.
Instead, iterate directly over the object, and you'll get member objects in order at a point you can still read the file data too:
for m in tarf:
    f = tarf.extractfile( m )
    fl = f.read()
    fn = m.name
    zipf.writestr( fn, fl )
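Putting the points above together, a minimal sketch of the whole conversion might look like the following. The function name tar_gz_to_zip is hypothetical, and two choices are assumptions rather than part of the answer: the zip is opened in 'w' mode so a fresh archive is created on each run, and members that are not regular files (directories, links) are skipped.

```python
import tarfile
import zipfile


def tar_gz_to_zip(tar_path, zip_path):
    """Copy every regular file from a .tar.gz archive into a new .zip archive."""
    # 'r|gz' reads the archive as a forward-only stream, so members have to
    # be consumed in the order in which they are encountered.
    with tarfile.open(tar_path, mode="r|gz") as tarf, \
            zipfile.ZipFile(zip_path, mode="w",
                            compression=zipfile.ZIP_DEFLATED) as zipf:
        for member in tarf:
            # Assumption: only regular files are wanted in the zip.
            if not member.isfile():
                continue
            fileobj = tarf.extractfile(member)
            zipf.writestr(member.name, fileobj.read())
```

Calling tar_gz_to_zip('sample.tar.gz', 'myzip.zip') would then mirror the file names used in the question.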
Based on this answer, is there an option to rename a file when extracting it? If not, what is the best way to do so?
I didn't find anything in the documentation.
I found two methods:
Using ZipFile.read()
You can get the data from the zip file using ZipFile.read() and write it under a new name using the standard open() and write().
import zipfile
z = zipfile.ZipFile('image.zip')
for f in z.infolist():
    data = z.read(f)
    with open('new_name.png', 'wb') as fh:
        fh.write(data)
Using ZipFile.extract() with ZipInfo
You can change the name before calling extract()
import zipfile
z = zipfile.ZipFile('image.zip')
for f in z.infolist():
    #print(f.filename)
    #print(f.orig_filename)
    f.filename = 'new_name.png'
    z.extract(f)
This version can automatically create subfolders if you use
f.filename = 'folder/subfolder/new_name.png'
z.extract(f)
or pass the target folder as the second argument:
f.filename = 'new_name.png'
z.extract(f, 'folder/subfolder')
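A third variant, sketched here under assumptions, streams a single member to a new path with ZipFile.open() and shutil.copyfileobj(), so a large file does not have to be read into memory in one go. The helper name extract_as and its parameters are hypothetical.

```python
import os
import shutil
import zipfile


def extract_as(zip_path, member_name, target_path):
    """Extract one archive member and save it under target_path."""
    # Create the destination folder first if the target path has one.
    target_dir = os.path.dirname(target_path)
    if target_dir:
        os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as z:
        # ZipFile.open() returns a readable file-like object for the member,
        # so the data is copied in chunks instead of loaded all at once.
        with z.open(member_name) as src, open(target_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
```

Unlike assigning to f.filename, this leaves the ZipInfo objects of the archive untouched.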
I added sections and their values to an ini file, but configparser doesn't print the sections I have. Here is what I've done:
import configparser
import os
# creating path
current_path = os.getcwd()
path = 'ini'
try:
    os.mkdir(path)
except OSError:
    print("Creation of the directory %s failed" % path)
# add section and its values
config = configparser.ConfigParser()
config['section-1'] = {'somekey' : 'somevalue'}
file = open(f'ini/inifile.ini', 'a')
with file as f:
    config.write(f)
file.close()
# get sections
config = configparser.ConfigParser()
file = open(f'ini/inifile.ini')
with file as f:
    config.read(f)
    print(config.sections())
file.close()
This prints
[]
Similar code is in the documentation, but it doesn't work here. What am I doing wrong, and how can I solve this problem?
From the docs, config.read() takes a filename (or a list of them), not a file object:
read(filenames, encoding=None)
Attempt to read and parse an iterable of filenames, returning a list of filenames which were successfully parsed.
If filenames is a string, a bytes object or a path-like object, it is treated as a single filename. ...
If none of the named files exist, the ConfigParser instance will contain an empty dataset. ...
A file object is an iterable of strings, so basically the config parser is trying to read each string in the file as a filename. Which is sort of interesting and silly, because if you passed it a file that contained the filename of your actual config...it would work.
Anyways, you should pass the filename directly to config.read(), i.e.
config.read("ini/inifile.ini")
Or, if you want to use a file object instead, simply use config.read_file(f). Read the docs for read_file() for more information.
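The two options can be compared side by side in a small sketch. The temporary directory and the variable names are assumptions made for illustration; the section and key come from the question.

```python
import configparser
import os
import tempfile

# Write a small INI file into a temporary directory (illustrative path).
ini_path = os.path.join(tempfile.mkdtemp(), "inifile.ini")
writer = configparser.ConfigParser()
writer["section-1"] = {"somekey": "somevalue"}
with open(ini_path, "w") as f:
    writer.write(f)

# Option 1: read() takes a filename (or an iterable of filenames) and
# returns the list of files it managed to parse.
by_name = configparser.ConfigParser()
parsed = by_name.read(ini_path)

# Option 2: read_file() takes an already open file object.
by_file = configparser.ConfigParser()
with open(ini_path) as f:
    by_file.read_file(f)

print(by_name.sections())
print(by_file.sections())
```

Checking the return value of read() is a cheap way to notice that no file was parsed, since a missing file does not raise an error.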
As an aside, you are duplicating some of the work the context manager is doing for no gain. You can use the with block without creating the object explicitly first or closing it after (it will get closed automatically). Keep it simple:
with open("path/to/file.txt") as f:
    do_stuff_with_file(f)
I have a python script that gets input file names from the command prompt. I created a list to store all the input files and pass that to a function to create a new file with all the input files merged at once. Now, I pass this newly written file as an input to another function. I am getting an error message
TypeError: coercing to Unicode: need string or buffer, list found
Code:
file_list = []
for arg in range(1,len(sys.argv)-2):
    file_list.append(sys.argv[arg])
process_name = sys.argv[len(sys.argv)-1]
integrate_files(file_list,process_name)
def integrate_files(file_list,process_name):
    with open('result.log', 'w' ) as result:
        for file_ in file_list:
            for line in open( file_, 'r' ):
                result.write( line )
    start_process(result,process_name)
def start_process(result,process_name):
    with open(result,'r') as mainFile:
        content = mainFile.readlines()
The error is reported at the lines containing with open(). I tried to print the abspath of the result.log file, and it printed closed file 'result.log', mode 'w' at 0x000000000227578. Where am I going wrong? How should I create a new file and pass it to a function?
Your problem is that result is a closed file object:
start_process(result,process_name)
I think you want
start_process('result.log', process_name)
You could clean the script up a bit with
import shutil
import sys

def integrate_files(file_list,process_name):
    with open('result.log', 'w' ) as result:
        for file_ in file_list:
            with open(file_) as infile:
                shutil.copyfileobj(infile, result)
    # result is closed here, so pass the file name instead
    start_process('result.log',process_name)

def start_process(result,process_name):
    with open(result,'r') as mainFile:
        content = mainFile.readlines()

file_list = sys.argv[1:-1]
process_name = sys.argv[-1]
integrate_files(file_list,process_name)
The issue is here:
with open('result.log', 'w' ) as result:
    # ...
start_process(result,process_name)
Since you reopen your file in start_process, you should just pass the name:
start_process(result.name, process_name)
Or just be explicit:
start_process('result.log', process_name)
When you write with open('result.log', 'w') as result:, you make result be an object representing the actual file on disk. That is different from the name of the file.
You certainly can pass that result to another function. But since it will be the actual file object, and not a file name, you can't pass that to open - open expects a name of a file, and looks for the file with that name, in order to create a new file object.
You can call methods on that file object, but none of them will actually re-open the file. Instead, the simplest thing is to remember and pass the file name, so that start_process can open it again.
As shown in @matsjoyce's answer, the file object remembers the original file name. So you could pass the object, and have start_process get the name. But that's messy. Really, just pass the name. (You could, like mats showed, pass result.name explicitly instead of making your own name variable first.) Passing file objects around is usually not what you want; do it only when you want to split the reading/writing work across functions (and have a good reason for that).
In this:
with open('result.log', 'w' ) as result:
The file object bound to result is only open inside the with block. Once the block ends, the file is closed, so start_process receives a closed file object rather than a file name.
So either change start_process to:
with open('result.log','r') as mainFile:
Or you could pass the string result.log into start_process instead of the variable result:
file_list = []
for arg in range(1,len(sys.argv)-2):
    file_list.append(sys.argv[arg])
process_name = sys.argv[len(sys.argv)-1]
integrate_files(file_list,process_name)
def integrate_files(file_list,process_name):
    with open('result.log', 'w' ) as result:
        for file_ in file_list:
            for line in open( file_, 'r' ):
                result.write( line )
    start_process('result.log',process_name)
def start_process(result,process_name):
    with open(result,'r') as mainFile:
        content = mainFile.readlines()
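The pattern the answers agree on, write the merged file, let the with block close it, then hand over the name, can be sketched as below. This is one possible arrangement and not the poster's exact program: the out_path parameter and the return values are assumptions added so the pieces can be tried in isolation, and process_name is left out.

```python
import shutil


def integrate_files(file_list, out_path="result.log"):
    """Concatenate the input files into out_path and return that path."""
    with open(out_path, "w") as result:
        for name in file_list:
            with open(name) as infile:
                shutil.copyfileobj(infile, result)
    # The with block has ended, so the file is closed and fully written.
    return out_path


def start_process(path):
    """Reopen the merged file by name and return its lines."""
    with open(path) as main_file:
        return main_file.readlines()
```

A caller would chain the two as start_process(integrate_files(file_list)), so only a string ever crosses the function boundary.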
I have a zip archive: my_zip.zip. Inside it is one txt file, the name of which I do not know. I was taking a look at Python's zipfile module ( http://docs.python.org/library/zipfile.html ), but couldn't make too much sense of what I'm trying to do.
How would I do the equivalent of 'double-clicking' the zip file to get the txt file and then use the txt file so I can do:
>>> f = open('my_txt_file.txt','r')
>>> contents = f.read()
What you need is ZipFile.namelist(), which gives you a list of all the contents of the archive. You can then call zip.open('filename_you_discover') to get the contents of that file.
import zipfile
# zip file handler
zip = zipfile.ZipFile('filename.zip')
# list available files in the container
print (zip.namelist())
# extract a specific file from the zip container
f = zip.open("file_inside_zip.txt")
# save the extracted file
content = f.read()
f = open('file_inside_zip.extracted.txt', 'wb')
f.write(content)
f.close()
import zipfile
zip=zipfile.ZipFile('my_zip.zip')
f=zip.open('my_txt_file.txt')
contents=f.read()
f.close()
You can see the documentation here. In particular, the namelist() method will give you the names of the zip file members.
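Since the question says the archive holds exactly one text file whose name is unknown, the two steps can be combined. The following is a sketch only: the helper name read_single_member is hypothetical, and decoding the content as UTF-8 is an assumption about the file.

```python
import zipfile


def read_single_member(zip_path):
    """Return (name, text) for the one file stored in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Ignore directory entries, whose names end with "/".
        names = [n for n in archive.namelist() if not n.endswith("/")]
        if len(names) != 1:
            raise ValueError("expected exactly one file, found %d" % len(names))
        name = names[0]
        # read() returns bytes; decoding as UTF-8 is an assumption.
        return name, archive.read(name).decode("utf-8")
```

With the archive from the question, name, contents = read_single_member('my_zip.zip') would give the discovered file name and its text without writing anything to disk.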
Hello,
I get an error when generating a zip file. Can you tell me what I should do?
main.py", line 2289, in get
buf=zipf.read(2048)
NameError: global name 'zipf' is not defined
The complete code is as follows:
def addFile(self,zipstream,url,fname):
    # get the contents
    result = urlfetch.fetch(url)
    # store the contents in a stream
    f=StringIO.StringIO(result.content)
    length = result.headers['Content-Length']
    f.seek(0)
    # write the contents to the zip file
    while True:
        buff = f.read(int(length))
        if buff=="":break
        zipstream.writestr(fname,buff)
    return zipstream

def get(self):
    self.response.headers["Cache-Control"] = "public,max-age=%s" % 86400
    start=datetime.datetime.now()-timedelta(days=20)
    count = int(self.request.get('count')) if not self.request.get('count')=='' else 1000
    from google.appengine.api import memcache
    memcache_key = "ads"
    data = memcache.get(memcache_key)
    if data is None:
        a= Ad.all().filter("modified >", start).filter("url IN", ['www.koolbusiness.com']).filter("published =", True).order("-modified").fetch(count)
        memcache.set("ads", a)
    else:
        a = data
    dispatch='templates/kml.html'
    template_values = {'a': a , 'request':self.request,}
    path = os.path.join(os.path.dirname(__file__), dispatch)
    output = template.render(path, template_values)
    self.response.headers['Content-Length'] = len(output)
    zipstream=StringIO.StringIO()
    file = zipfile.ZipFile(zipstream,"w")
    url = 'http://www.koolbusiness.com/list.kml'
    # repeat this for every URL that should be added to the zipfile
    file =self.addFile(file,url,"list.kml")
    # we have finished with the zip so package it up and write the directory
    file.close()
    zipstream.seek(0)
    # create and return the output stream
    self.response.headers['Content-Type'] ='application/zip'
    self.response.headers['Content-Disposition'] = 'attachment; filename="list.kmz"'
    while True:
        buf=zipf.read(2048)
        if buf=="": break
        self.response.out.write(buf)
That is probably zipstream and not zipf. So replace that with zipstream and it might work.
I don't see where you declare zipf.
Did you mean zipfile? Senthil Kumaran is probably right about zipstream, since you call seek(0) on zipstream just before the while loop that reads chunks from the mystery variable.
edit:
Almost certainly the variable is zipstream.
zipfile docs:
class zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])
Open a ZIP file, where file can be either a path to a file (a string) or a file-like object. The mode parameter should be 'r' to read an existing file, 'w' to truncate and write a new file, or 'a' to append to an existing file. If mode is 'a' and file refers to an existing ZIP file, then additional files are added to it. If file does not refer to a ZIP file, then a new ZIP archive is appended to the file. This is meant for adding a ZIP archive to another file (such as python.exe).
your code:
zipstream=StringIO.StringIO()
creates a file-like object using StringIO, which is essentially a "memory file" (read more in the docs).
file = zipfile.ZipFile(zipstream,"w")
opens the zipfile with the zipstream file-like object in 'w' mode.
url = 'http://www.koolbusiness.com/list.kml'
# repeat this for every URL that should be added to the zipfile
file =self.addFile(file,url,"list.kml")
# we have finished with the zip so package it up and write the directory
file.close()
uses the addFile method to retrieve and write the retrieved data to the file-like object and returns it. The variables are slightly confusing because you pass a zipfile to the addFile method which aliases as zipstream (confusing because we are using zipstream as a StringIO file-like object). Anyways, the zipfile is returned, and closed to make sure everything is "written".
It was written to our "memory file", which we now seek to index 0
zipstream.seek(0)
and after doing some header stuff, we finally reach the while loop that will read our "memory-file" in chunks
while True:
    buf=zipstream.read(2048)
    if buf=="": break
    self.response.out.write(buf)
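The same flow, building the zip in a "memory file", rewinding it, and reading it back in chunks, can be sketched outside of App Engine. The thread's code is Python 2 and uses StringIO; this sketch assumes Python 3, where zip data is bytes and io.BytesIO takes that role. The function names and the {name: bytes} input are hypothetical.

```python
import io
import zipfile


def build_zip_in_memory(files):
    """Return a BytesIO holding a zip built from a {name: bytes} mapping."""
    zipstream = io.BytesIO()
    with zipfile.ZipFile(zipstream, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    # Leaving the with block closes the archive, which writes the central
    # directory; rewind so the caller can read from the start.
    zipstream.seek(0)
    return zipstream


def iter_chunks(stream, size=2048):
    """Yield successive chunks from stream until it is exhausted."""
    while True:
        buf = stream.read(size)
        if not buf:
            break
        yield buf
```

A handler could then write each chunk from iter_chunks(build_zip_in_memory(...)) to its response, reading from the stream object rather than from an undefined name.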
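If zipf is meant to be a module-level variable that get() assigns to, you need to declare:
global zipf
right after your
def get(self):
line. Python needs the declaration only when a function assigns to a global name. It does not create the variable, so zipf must still be defined somewhere, which the posted code never does.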