zipfile.BadZipFile: Bad offset for central directory - python

I have designed a webpage that allows the user to upload a zip file. What I want to do is store this zip file directly into my sqlite database as a large binary object, then be able to read this binary object as a zipfile using the zipfile package. Unfortunately this doesn't work because attempting to pass the file as a binary string in io.BytesIO into zipfile.ZipFile gives the error detailed in the title.
For my MWE, I exclude the database to better demonstrate my issue.
import io
import zipfile

from flask import Blueprint, request

views = Blueprint('views', __name__)

@views.route("/upload", methods=["GET", "POST"])
def upload():
    # Assume that file in request is a zip file (checked already)
    f = request.files['file']
    zip_content = f.read()
    # Store in database
    # ...
    # at some point retrieve the file from database
    archive = zipfile.ZipFile(io.BytesIO(zip_content))
    return ""
I have searched for days on end for a fix to this issue, without success. I have even printed out zip_content and the contents of io.BytesIO(zip_content) after applying .read(), and they are exactly the same string.
What am I doing wrong?

Solved. Using f.read() only gets the name of the zip file. I needed to use f.getvalue() instead to get the full file contents.
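For what it's worth, read() on an uploaded file normally returns the bytes from the current stream position to the end, not the file name. One plausible explanation for the behaviour above (an assumption, since the validation code is not shown) is that the earlier zip check had already consumed part of the stream, so read() returned only the remainder. The sketch below uses a plain io.BytesIO as a stand-in for the upload; make_zip_bytes and read_all are hypothetical helper names.

```python
import io
import zipfile


def make_zip_bytes():
    """Build a small zip archive in memory and return its raw bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


def read_all(stream):
    """Rewind a seekable stream and return its full contents."""
    stream.seek(0)
    return stream.read()


# Stand-in for the uploaded file's stream (assumption: it is seekable).
upload = io.BytesIO(make_zip_bytes())

# Simulate an earlier check that read the first four bytes (the zip signature).
signature = upload.read(4)

# read() now returns only what is left after the current position.
remainder = upload.read()

# Rewinding first yields the complete archive, which zipfile can open.
full = read_all(upload)
with zipfile.ZipFile(io.BytesIO(full)) as archive:
    names = archive.namelist()
```

In the Flask view the equivalent would presumably be f.seek(0) before f.read(). As far as I know, getvalue() is only available when the underlying stream happens to be a BytesIO, so rewinding is likely the more portable choice.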

Related

Django Python save database blog posts to a zipfile?

I have a django blog and want to download a backup zipfile with all the entries. The blog post text content is stored in the database.
I have written this code with the goal of saving a bunch of .txt files in the main zip directory, but all it does is output a single corrupted zip file. The file cannot be unzipped, but for some reason it can be opened in Word, where it shows all of the blog post text mashed together.
def download_backups(request):
    zip_filename = "test.zip"
    s = BytesIO()
    zf = zipfile.ZipFile(s, "w")
    blogposts = Blog.objects.all()
    for blogpost in blogposts:
        filename = blogpost.title + ".txt"
        zf.writestr(filename, blogpost.content)
    resp = HttpResponse(s.getvalue())
    resp['Content-Disposition'] = 'attachment; filename=%s' % zip_filename
    return resp
Any help is appreciated.
Based on this answer to another question, you may be having an issue with the read mode. You'll also need to call zf.close(), either explicitly or implicitly, before the file will actually be complete.
I think there's a simpler way of handling this using a temporary file, which should have the advantage of not needing to fit all of the file's contents in memory.
from tempfile import TemporaryFile
from zipfile import ZipFile

with TemporaryFile() as tf:
    with ZipFile(tf, mode="w") as zf:
        zf.writestr("file1.txt", "The first file")
        zf.writestr("file2.txt", "A second file")
    tf.seek(0)
    print(tf.read())
The with blocks here will result in your temp file going out of scope and being deleted, and zf.close being called implicitly before you attempt to read the file.
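If holding the archive in memory is acceptable, the original BytesIO approach should also work once the ZipFile is closed before getvalue() is called. A minimal sketch, where build_backup_zip is a hypothetical helper and the (title, content) pairs stand in for the Blog queryset:

```python
import io
import zipfile


def build_backup_zip(posts):
    """Return the bytes of a zip with one .txt file per (title, content) pair."""
    buf = io.BytesIO()
    # Leaving the with block closes the archive, which writes the central
    # directory; without that the zip is incomplete and cannot be opened.
    with zipfile.ZipFile(buf, "w") as zf:
        for title, content in posts:
            zf.writestr(title + ".txt", content)
    return buf.getvalue()


# Example data standing in for Blog.objects.all().
data = build_backup_zip([("first", "Hello"), ("second", "World")])
```

In the view, the returned bytes could then be passed to HttpResponse(data, content_type='application/zip') before the Content-Disposition header is set.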
If the goal here is just to back up the data rather than using this specific format, though, I'd suggest using the built-in dumpdata management command. You can call it from code if you want to serve the results through a view like this.

Read-in Files from Flask request module

I am trying to read-in a file from a Python request, form data. All I want to do is read-in the incoming file in the request body, parse the contents and return the contents as a json body. I see many examples out there like: if 'filename' in request.files:, however this never works for me. I know that the file does in fact live within the ImmutableMultiDict type. Here is my working code example:
if 'my_file.xls' in request.files:
    # do something
else:
    # return error
if 'file' in request.files:
This is looking for the field name 'file' which corresponds to the name attribute you set in the form:
<input type='file' name='file'>
You then need to do something like this to assign the FileStorage object to the variable mem:
mem = request.files['file']
See my recent answer for more details of how and why.
You can then access the filename itself with:
mem.filename # should give back 'my_file.xls'
To actually read the stream data:
mem.read()
The official flask docs have further info on this, and how to save to disk with secure_filename() etc. Probably worth a read.
You wrote: "All I want to do is read-in the incoming file in the request body, parse the contents and return the contents as a json body."
If you actually want to read the contents of that Excel file, then you'll need to use a library that supports the format, such as xlrd. This answer demonstrates how to open a workbook, passing it as a stream. Note that they have used fileobj as the variable name, instead of mem.

How to create a PDF from a binary string?

A request has been made to the server using Python's requests module:
requests.get('myserver/pdf', headers=headers)
It returned a status-200 response whose response.content contains the PDF binary data.
Question
How does one create a PDF file from the response.content?
You can create an empty PDF and then write the binary data to it like this:
from reportlab.pdfgen import canvas
import requests

# Example of path. This file has not been created yet but we
# will use this as the location and name of the pdf in question
path_to_create_pdf_with_name_of_pdf = r'C:/User/Oleg/MyDownloadablePdf.pdf'

# Anything you used before making the request. Since you did not
# provide code I did not know what you used
.....
request = requests.get('myserver/pdf', headers=headers)

# Actually creates the empty pdf that we will use to write the binary data to
pdf_file = canvas.Canvas(path_to_create_pdf_with_name_of_pdf)

# Open the empty pdf that we created above and write the binary data to it.
# The with block closes the file automatically, so no explicit close() is needed.
with open(path_to_create_pdf_with_name_of_pdf, 'wb') as f:
    f.write(request.content)
The reportlab.pdfgen allows you to make a new pdf by specifying the path you want to save the pdf in along with the name of the pdf using the canvas.Canvas method. As stated in my answer you need to provide the path to do this.
Once you have an empty pdf, you can open the pdf file as wb (write binary) and write the content of the pdf from the request to the file and close the file.
When choosing the path, make sure the name does not match an existing file, since opening it in 'wb' mode overwrites whatever is there. As the comments point out, reusing the name of another file risks losing that file's data. If you are doing this in a loop, for example, you will need to give the path a new name on each iteration so that each PDF gets its own file. For a one-off download there is no such risk, so long as the name is not that of another file.
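As far as I can tell, open(path, 'wb') creates the file on its own, so the write step can also be sketched without reportlab. The version below uses mode 'xb' (exclusive creation) so that an existing file is never overwritten; save_bytes is a hypothetical helper, and the byte string stands in for response.content.

```python
import os
import tempfile


def save_bytes(path, data):
    """Write data to a new file at path; raise FileExistsError if it exists."""
    # 'x' = create exclusively, 'b' = binary, so the bytes are written unchanged.
    with open(path, "xb") as f:
        f.write(data)


# Placeholder bytes standing in for response.content.
pdf_bytes = b"%PDF-1.4\n% placeholder content\n"

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "downloaded.pdf")
    save_bytes(target, pdf_bytes)
    with open(target, "rb") as f:
        round_trip = f.read()
    try:
        save_bytes(target, pdf_bytes)  # second write to the same name
        overwrote = True
    except FileExistsError:
        overwrote = False
```

Catching FileExistsError gives the caller a chance to pick a different name instead of silently replacing an earlier download.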

Python basics - request data from API and write to a file

I am trying to use "requests" package and retrieve info from Github, like the Requests doc page explains:
import requests
r = requests.get('https://api.github.com/events')
And this:
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)
I have to say I don't understand the second code block.
filename - in what form do I provide the path to the file if created? where will it be saved if not?
'wb' - what is this variable? (shouldn't second parameter be 'mode'?)
following two lines probably iterate over data retrieved with request and write to the file
Python docs explanation also not helping much.
EDIT: What I am trying to do:
use Requests to connect to an API (Github and later Facebook GraphAPI)
retrieve data into a variable
write this into a file (later, as I get more familiar with Python, into my local MySQL database)
Filename
When using open, a relative path is resolved against your current working directory. So open('file.txt', 'w') creates a new file named file.txt in the directory you ran the script from, which is often (but not always) the folder containing the script. You can also specify an absolute path, for example /home/user/file.txt on Linux. If a file named 'file.txt' already exists, its contents will be completely overwritten.
Mode
The 'wb' option is indeed the mode. The 'w' means write and the 'b' means binary. You use 'w' when you want to write to (rather than read from) a file, and you use 'b' for binary data (rather than text). The content here is text, but iter_content yields bytes by default, so in Python 3 the file must be opened in binary mode for that loop. Plain 'w' is the right choice when you write a str such as r.text. Read more on the modes in the docs for open.
The Loop
This part is using the iter_content method from requests, which is intended for use with large files that you may not want in memory all at once. This is unnecessary in this case, since the page in question is only 89 KB. See the requests library docs for more info.
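The chunked pattern can be tried without a network call. In this sketch fake_iter_content is a hypothetical stand-in for r.iter_content(chunk_size) that yields bytes, which is why the file is opened in binary mode; write_chunks is likewise an illustrative name.

```python
import os
import tempfile


def fake_iter_content(data, chunk_size):
    """Yield data in pieces of at most chunk_size bytes."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def write_chunks(path, chunks):
    """Write an iterable of bytes chunks to path and return the byte count."""
    written = 0
    with open(path, "wb") as fd:
        for chunk in chunks:
            written += fd.write(chunk)
    return written


# Made-up payload standing in for a response body.
payload = b'[{"type": "PushEvent"}]' * 100

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "events.json")
    total = write_chunks(out_path, fake_iter_content(payload, 128))
    with open(out_path, "rb") as fd:
        saved = fd.read()
```

Only one chunk is held in memory at a time, which is the point of the pattern for large downloads.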
Conclusion
The example you are looking at is meant to handle the most general case, in which the remote file might be binary and too big to be in memory. However, we can make your code more readable and easy to understand if you are only accessing small webpages containing text:
import requests

r = requests.get('https://api.github.com/events')
with open('events.txt', 'w') as fd:
    fd.write(r.text)
filename is a string containing the path you want to save to. It accepts either a relative or an absolute path, so you can just have filename = 'example.html'
wb stands for WRITE & BYTES, learn more here
The for loop goes over the entire returned content (in chunks, in case it is too large to hold in memory at once) and writes each chunk until there are no more. This is useful for large files, but for a single webpage you could just do:
# just 'w' because we are writing text (r.text) rather than bytes
with open(filename, 'w') as fd:
    fd.write(r.text)

Can't read file from secure_filename(f.filename)

I need to get a binary file from wtforms and store it as bytea in PostgreSQL. I don't need to store it permanently as a file. From my understanding of the official Flask docs, I should be able to access the filename through either request.files['myfile'].filename or secure_filename(f.filename). However, both of them give me an error: IOError: [Errno 2] No such file or directory: u'myuploadpdf.pdf'
f = request.files['myfile']
if f and allowed_file(f.filename):
    #filename = secure_filename(f.filename)
    data = open(f.filename, 'rb').read()
    #data = open(filename , 'rb').read()
    binary = psycopg2.Binary(data)
open() expects a pathname to the file. Since the file hasn't been saved to disk, no such path exists. :)
What you actually want to do is call f.read() directly. Reading incoming files is covered here.
Also, definitely use secure_filename() if you work with anything on disk. Don't want to open yourself to any directory traversal attacks down the line.
The objects in request.files are FileStorage objects and they have the same methods as normal file objects in python. So to get the contents of the file as binary, try doing this:
data = request.files['myfile'].read()
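To illustrate storing the bytes without touching the disk, here is a minimal sketch. It deliberately swaps in the standard-library sqlite3 module for PostgreSQL/psycopg2 so that it runs without a server, and it uses io.BytesIO as a stand-in for the uploaded FileStorage; the table and function names are hypothetical.

```python
import io
import sqlite3


def store_upload(conn, name, stream):
    """Read the whole stream and insert it as a BLOB; return the new row id."""
    data = stream.read()  # bytes straight from the in-memory upload
    cur = conn.execute(
        "INSERT INTO uploads (name, data) VALUES (?, ?)", (name, data)
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (id INTEGER PRIMARY KEY, name TEXT, data BLOB)")

# Stand-in for request.files['myfile'] (assumption: a readable binary stream).
fake_upload = io.BytesIO(b"%PDF-1.4 placeholder")
row_id = store_upload(conn, "myuploadpdf.pdf", fake_upload)
stored = conn.execute(
    "SELECT data FROM uploads WHERE id = ?", (row_id,)
).fetchone()[0]
```

With psycopg2 the bytes would instead be wrapped as psycopg2.Binary(data) before being passed to the INSERT, as in the question.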
