I am trying to upload a zip file from a web2py form and then read its contents:
form = FORM(TABLE(
    TR(TD('Upload File:', INPUT(_type='file',
                                _name='myfile',
                                id='myfile',
                                requires=IS_NOT_EMPTY()))),
    TR(TD(INPUT(_type='submit', _value='Submit')))
))
if form.accepts(request.vars):
    data = StringIO.StringIO(request.vars.myfile)
    import zipfile
    zfile = zipfile.ZipFile(data)
For some reason this code does not work: it complains that the file is not a zip file, although the uploaded file is a zip file.
I am new to web2py. How can the uploaded data be read as a zip file?
web2py form field uploads are already cgi.FieldStorage objects; you can get the raw uploaded bytes using:
data = request.vars.myfile.value
For a file-like object, StringIO is not needed; use:
filelike = request.vars.myfile.file
zip = zipfile.ZipFile(filelike)
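Putting that together, a minimal sketch of a controller action (assuming the form from the question lives in a web2py controller; the action name and the flash message are just illustrative):

import zipfile

def upload():
    form = FORM(TABLE(
        TR(TD('Upload File:', INPUT(_type='file', _name='myfile',
                                    requires=IS_NOT_EMPTY()))),
        TR(TD(INPUT(_type='submit', _value='Submit')))
    ))
    if form.accepts(request.vars):
        # request.vars.myfile is a cgi.FieldStorage; .file is a seekable file-like object
        zfile = zipfile.ZipFile(request.vars.myfile.file)
        response.flash = 'Archive contains: ' + ', '.join(zfile.namelist())
    return dict(form=form)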
HTTP uploads aren't just raw binary; they're multipart/form-data encoded. Write request.vars.myfile out to disk and you'll see it looks something like:
------------------BlahBlahBoundary
Content-Disposition: form-data; name="myfile"
Content-Type: application/octet-stream
<binary data>
------------------BlahBlahBoundary--
The naive solution for this is to use cgi.FieldStorage(). The example I provide uses wsgi.input, which is part of the WSGI environment.
form = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
raw_file = cStringIO.StringIO(form['myfile'].file.read())
Two things to point out here:
Always use cStringIO if you have it; it'll be faster than StringIO.
If you allow uploads like this, you're streaming the file into RAM, so however big the file is is how much RAM you'll be using - this does NOT scale. I had to write my own custom MIME stream parser to stream files to disk through Python to avoid this. But if you're learning or this is a proof of concept, you should be fine.
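Under the same assumptions (a WSGI environ and the 'myfile' field name from the example above), a small sketch of reading the zip without copying it into a second in-memory buffer: FieldStorage has already spooled the upload to memory or a temp file, and that buffer is seekable, so zipfile can use it directly.

import cgi
import zipfile

# environ comes from the surrounding WSGI application, as in the example above
form = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)

# Pass the already-buffered upload straight to ZipFile
zfile = zipfile.ZipFile(form['myfile'].file)
print(zfile.namelist())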
I am currently developing a little backend project for myself with the Python framework FastAPI. I made an endpoint where the user should be able to upload 2 files: the first is a zip file (which contains X .xml files) and the second a normal .xml file.
The code is as follows:
import zipfile
from typing import List
from fastapi import File, UploadFile

@router.post("/sendxmlinzip/")
def create_upload_files_with_zip(files: List[UploadFile] = File(...)):
    if not len(files) == 2:
        raise Httpex.EXPECTEDTWOFILES
    my_file = files[0].file
    zfile = zipfile.ZipFile(my_file, 'r')
    filelist = []
    for finfo in zfile.infolist():
        print(finfo)
        ifile = zfile.open(finfo)
        line_list = ifile.readlines()
        print(line_list)
This should print the content of the files that are in the .zip file, but it raises the exception
AttributeError: 'SpooledTemporaryFile' object has no attribute 'seekable'
in the line ifile = zfile.open(finfo).
After approximately 3 days of research, with a lot of trial and error involved trying different functions such as .read() or .extract(), I gave up, because the Python docs literally state that this should be possible in this way...
For those who do not know FastAPI: it's a backend framework for RESTful web services and uses the Starlette data structure for UploadFile. Please forgive me if I have overlooked something VERY obvious, but I literally tried to check every corner that may have been the cause of the error, such as:
Check whether another implementation is possible
Check that the .zip file is correct
Check that I attach the correct file (lol)
Debug to see whether the actual data that reaches the backend is indeed the .zip file
This is a known Python bug:
SpooledTemporaryFile does not fully satisfy the abstract for IOBase.
Namely, seekable, readable, and writable are missing.
This was discovered when seeking a SpooledTemporaryFile-backed lzma
file.
As @larsks suggested in his comment, I would try writing the contents of the spooled file to a new TemporaryFile, and then operate on that. As long as your files aren't too large, that should work just as well.
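A minimal sketch of that idea (files[0] is the UploadFile from the question; everything else here is just illustration):

import shutil
import tempfile
import zipfile

# Copy the SpooledTemporaryFile into a plain TemporaryFile, which fully
# supports seekable()/readable() and therefore works with zipfile.
upload = files[0].file
tmp = tempfile.TemporaryFile()
shutil.copyfileobj(upload, tmp)
tmp.seek(0)

zfile = zipfile.ZipFile(tmp, 'r')
for finfo in zfile.infolist():
    print(finfo.filename)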
This is my workaround
with zipfile.ZipFile(io.BytesIO(file.read()), 'r') as zip:
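Put into context, a rough sketch of the endpoint from the question with this workaround applied (router and Httpex are assumed from the original code):

import io
import zipfile
from typing import List
from fastapi import File, UploadFile

@router.post("/sendxmlinzip/")
def create_upload_files_with_zip(files: List[UploadFile] = File(...)):
    if not len(files) == 2:
        raise Httpex.EXPECTEDTWOFILES
    # Read the SpooledTemporaryFile fully into memory and wrap it in BytesIO,
    # which zipfile can seek on without problems.
    with zipfile.ZipFile(io.BytesIO(files[0].file.read()), 'r') as zf:
        for finfo in zf.infolist():
            with zf.open(finfo) as ifile:
                print(ifile.readlines())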
I am currently writing json files to disk using
print('writing to disk .... ')
f = open('mypath/myfile', 'wb')
f.write(getjsondata.read())
f.close()
This works perfectly, except that the JSON files are very large and I would like to compress them. How can I do that automatically as I write them?
Thanks!
Python's standard library includes the zlib module, which can compress and decompress data for you. You can use it directly on your data and write (and read) a custom format, or use the gzip module, which wraps the inner workings of zlib to read and write gzip-compatible files while automatically compressing or decompressing the data so that it looks like an ordinary file object.
It thus neatly replaces the default open for interacting with files, and all you need is this:
import gzip
print('writing to disk .... ')
with gzip.open('mypath/myfile', 'wb') as f:
    f.write(getjsondata.read())
(with a change in the open line because I highly recommend using the with syntax to handle file objects.)
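Reading the file back works the same way; a small sketch (the path is the one from the question):

import gzip

# gzip.open transparently decompresses on read
with gzip.open('mypath/myfile', 'rb') as f:
    data = f.read()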
I am trying to use the "requests" package to retrieve info from GitHub, like the Requests doc page explains:
import requests
r = requests.get('https://api.github.com/events')
And this:
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)
I have to say I don't understand the second code block.
filename - in what form do I provide the path if I create the file? Where will it be saved if I don't provide a path?
'wb' - what is this variable? (Shouldn't the second parameter be 'mode'?)
The following two lines probably iterate over the data retrieved with the request and write it to the file.
The Python docs explanation also isn't helping much.
EDIT: What I am trying to do:
use Requests to connect to an API (Github and later Facebook GraphAPI)
retrieve data into a variable
write this into a file (later, as I get more familiar with Python, into my local MySQL database)
Filename
When using open, the path is relative to your current working directory. So if you said open('file.txt','w') it would create a new file named file.txt in whatever folder your Python script is running from. You can also specify an absolute path, for example /home/user/file.txt on Linux. If a file named 'file.txt' already exists, its contents will be completely overwritten.
Mode
The 'wb' option is indeed the mode. The 'w' means write and the 'b' means bytes. You use 'w' when you want to write to (rather than read from) a file, and you use 'b' for binary files (rather than text files). It is actually a little odd to use 'b' in this case, as the content you are writing is text. Specifying just 'w' would work equally well here. Read more on the modes in the docs for open.
The Loop
This part is using the iter_content method from requests, which is intended for use with large files that you may not want in memory all at once. This is unnecessary in this case, since the page in question is only 89 KB. See the requests library docs for more info.
Conclusion
The example you are looking at is meant to handle the most general case, in which the remote file might be binary and too big to fit in memory. However, we can make your code more readable and easier to understand if you are only accessing small webpages containing text:
import requests
r = requests.get('https://api.github.com/events')
with open('events.txt', 'w') as fd:
    fd.write(r.text)
filename is a string of the path you want to save it at. It accepts either a relative or an absolute path, so you can just have filename = 'example.html'
wb stands for WRITE & BYTES; see the documentation for open to learn more
The for loop goes over the entire returned content (in chunks, in case it is too large for proper memory handling) and writes it until there is no more. Useful for large files, but for a single webpage you could just do:
# just 'w' because we are not writing bytes anymore, just text
with open(filename, 'w') as fd:
    fd.write(r.text)
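If you do end up fetching something large, the chunked form from the question is the one to use. A sketch (the output file name and the 8 KB chunk size are arbitrary); stream=True tells requests not to load the whole body into memory at once:

import requests

r = requests.get('https://api.github.com/events', stream=True)

with open('events.json', 'wb') as fd:
    # iter_content yields the body piece by piece instead of all at once
    for chunk in r.iter_content(chunk_size=8192):
        fd.write(chunk)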
Is it possible to append to a gzipped text file on the fly using Python ?
Basically I am doing this:-
import gzip
content = "Lots of content here"
f = gzip.open('file.txt.gz', 'a', 9)
f.write(content)
f.close()
A line is appended (note "appended") to the file every 6 seconds or so, but the resulting file is just as big as a standard uncompressed file (roughly 1MB when done).
Explicitly specifying the compression level does not seem to make a difference either.
If I gzip an existing uncompressed file afterwards, its size comes down to roughly 80 KB.
I'm guessing it's not possible to "append" to a gzip file on the fly and have it compress?
Is this a case of writing to a StringIO buffer and then flushing to a gzip file when done?
That works in the sense of creating and maintaining a valid gzip file, since the gzip format permits concatenated gzip streams.
However it doesn't work in the sense that you get lousy compression, since you are giving each instance of gzip compression so little data to work with. Compression depends on taking advantage of the history of previous data, but here gzip has been given essentially none.
You could either a) accumulate at least a few K of data, many of your lines, before invoking gzip to add another gzip stream to the file, or b) do something much more sophisticated that appends to a single gzip stream, leaving a valid gzip stream each time and permitting efficient compression of the data.
You can find an example of b) in C, in gzlog.h and gzlog.c. I do not believe that Python has all of the interfaces to zlib needed to implement gzlog directly in Python, but you could interface to the C code from Python.
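A rough sketch of option a), assuming each incoming line is already a byte string as in the question (the class name, the 64 KB threshold, and the file name are just illustrative):

import gzip

class GzipLineLogger(object):
    """Buffer incoming lines and append them to a gzip file in batches."""

    def __init__(self, path, threshold=64 * 1024):
        self.path = path
        self.threshold = threshold  # flush once this many bytes are buffered
        self.buffered = []
        self.size = 0

    def append_line(self, line):
        self.buffered.append(line)
        self.size += len(line)
        if self.size >= self.threshold:
            self.flush()

    def flush(self):
        if not self.buffered:
            return
        # 'ab' appends a new gzip stream; concatenated streams are still a valid gzip file
        with gzip.open(self.path, 'ab') as f:
            f.write(b''.join(self.buffered))
        self.buffered = []
        self.size = 0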
I am trying to upload large CSV files to GAE inside a zip, using XML & HTTP POST.
Steps:
CSV is zipped & base64 encoded and sent to GAE via XML/HTTP POST
GAE - using minidom to parse XML
GAE - Base64 decode ZIP
GAE - Get CSV from Zip file.
I have tried using zipfile but can't figure out how to create a zipfile object from the base64-decoded string.
I get: TypeError: unbound method read() must be called with ZipFile instance as first argument (got str instance instead)
myZipFile = base64.decodestring(base64ZipFile)
objZip = zipfile.ZipFile(myZipFile,'r')
strCSV = zipfile.ZipFile.read(objZip,'list.csv')
As Rob mentioned, ZipFile requires a file-like object. You can use StringIO to provide a file-like interface to a string.
For example:
import base64
import StringIO
import zipfile

myZipFile = base64.decodestring(base64ZipFile)
objZip = zipfile.ZipFile(StringIO.StringIO(myZipFile), 'r')
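From there, the read that failed in the question becomes an ordinary method call on the ZipFile instance ('list.csv' is the member name from the question):

# Call read() on the ZipFile instance, not on the class
print(objZip.namelist())          # e.g. ['list.csv']
strCSV = objZip.read('list.csv')  # contents of the CSV as a string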
Yes you can. In fact, I wrote a blog post that describes how to do exactly that.
A simple approach might be to upload the zipped csv to the blobstore using the blob upload API, and process the zip file from there. You'd need to fake a form post, but life might be simpler for you on the appengine side.
There's an example of how to process zipped data in AppEngine MapReduce. See the BlobstoreZipInputReader class.
ZipFile does not take the raw zip data as a string; it needs a file-like object (or a filename).
One solution is creating a tempfile, writing the string to it, and then passing that to ZipFile:
import tempfile
import zipfile
tmp = tempfile.TemporaryFile()
tmp.write(myZipFile) # myZipFile is your decoded string containing the zip-data
objZip = zipfile.ZipFile(tmp)