Large file upload fails - python

I'm in the process of writing a Python module to POST files to a server. I can upload files of up to 500 MB, but when I tried to upload a 1 GB file the upload failed; if I use something like cURL, it doesn't fail. I got the code after googling how to upload multipart form data using Python; the code can be found here. I just ran that code, and the error I'm getting is this:
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
opener.open("http://127.0.0.1/test_server/upload",params)
File "C:\Python27\lib\urllib2.py", line 392, in open
req = meth(req)
File "C:\Python27\MultipartPostHandler.py", line 35, in http_request
boundary, data = self.multipart_encode(v_vars, v_files)
File "C:\Python27\MultipartPostHandler.py", line 63, in multipart_encode
buffer += '\r\n' + fd.read() + '\r\n'
MemoryError
I'm new to Python and having a hard time grasping it. I also came across another program here; I'll be honest, I don't know how to run it. I tried running it by guessing based on the function names, but that didn't work.

The script in question isn't very smart: it builds the whole POST body in memory.
Thus, to POST a 1 GB file, you'll need over 1 GB of memory just to hold that data, plus room for the HTTP headers, the boundaries, and Python itself.
You'd have to rework the script to use mmap instead: first construct the whole body in a temp file, then wrap that file in an mmap.mmap value and pass it to request.add_data.
See Python: HTTP Post a large file with streaming for hints on how to achieve that.
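A minimal sketch of that approach for Python 2's urllib2 follows; the boundary string, field name, and file name are illustrative assumptions, not taken from the linked code:
import mmap
import tempfile
import urllib2

boundary = '----pythonformboundary'  # any string guaranteed not to occur in the payload

# Build the multipart body in a temp file, copying the big file in 1 MB
# chunks so it is never held in memory all at once.
body = tempfile.TemporaryFile()
body.write('--%s\r\n' % boundary)
body.write('Content-Disposition: form-data; name="file"; filename="big.bin"\r\n')
body.write('Content-Type: application/octet-stream\r\n\r\n')
src = open('big.bin', 'rb')
for chunk in iter(lambda: src.read(1024 * 1024), ''):
    body.write(chunk)
src.close()
body.write('\r\n--%s--\r\n' % boundary)
body.flush()

# Wrap the temp file in an mmap: urllib2 takes len() of it for the
# Content-Length header, and httplib streams it in small blocks via read()
# instead of holding one huge string.
request = urllib2.Request('http://127.0.0.1/test_server/upload')
request.add_header('Content-Type', 'multipart/form-data; boundary=%s' % boundary)
request.add_data(mmap.mmap(body.fileno(), 0, access=mmap.ACCESS_READ))
response = urllib2.urlopen(request)
This trades RAM for temporary disk space; a tool like cURL avoids even that by streaming the file directly, but this sketch keeps the urllib2-based flow of the original script.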

Related

Python, running into memory error when parsing a 30 MB file (already downloaded to my local computer)

Here is my download address; the file name is 'kosarak':
http://fimi.uantwerpen.be/data/
My parsing code is:
parsedDat = [line.split() for line in open('kosarak.dat').readlines()]
I need this data as a whole to run some method on it, so reading it one line at a time and operating on each line separately doesn't fit my case.
The file is only 30 MB, and my computer has at least 10 GB of memory free and 30+ GB of hard drive space, so I guess there shouldn't be any resource problem.
FYI: my Python version is 2.7 and I am running it inside Spyder. My OS is Windows 10.
PS: You don't need to use my parsing code/method to do the job; as long as you can get the data from the file into my Python environment, that would be perfect.
Perhaps this may help.
with open('kosarak.dat', 'r') as f:  # Or 'rb' for binary data.
    parsed_data = [line.split() for line in f]
The difference is that your approach reads all of the lines in the file at once and then processes each one (effectively requiring 2x the memory: once for the raw file data and once again for the parsed data, all of which must be stored in memory at the same time), whereas this approach reads the file line by line and only needs memory for the resulting parsed_data.
In addition, your method did not explicitly close the file (although you may just not have shown that portion of your code). This method uses a context manager (with expression [as variable]:) which will close the object automatically once the with block terminates, even following an error. See PEP 343.

Decompress zip file with password fails - bug in Python?

I get a strange error in Python. When I try to extract a password-protected file using the zipfile module, I get an exception when setting "oy" as the password. Everything else seems to work. Is this a bug in the ZipFile module?
import zipfile
zip = zipfile.ZipFile("file.zip", "r")
zip.setpassword("oy".encode('utf-8'))
zip.extractall() #Above password "oy" generates the error here
zip.close()
This is the exception I get:
Traceback (most recent call last):
File "unzip.py", line 4, in <module>
zip.extractall()
File "C:\Program Files\Python32\lib\zipfile.py", line 1002, in extrac
l
self.extract(zipinfo, path, pwd)
File "C:\Program Files\Python32\lib\zipfile.py", line 990, in extract
return self._extract_member(member, path, pwd)
File "C:\Program Files\Python32\lib\zipfile.py", line 1035, in _extra
member
shutil.copyfileobj(source, target)
File "C:\Program Files\Python32\lib\shutil.py", line 65, in copyfileo
buf = fsrc.read(length)
File "C:\Program Files\Python32\lib\zipfile.py", line 581, in read
data = self.read1(n - len(buf))
File "C:\Program Files\Python32\lib\zipfile.py", line 633, in read1
max(n - len_readbuffer, self.MIN_READ_SIZE)
zlib.error: Error -3 while decompressing: invalid block type
If I use UTF-16 as encoding I get this error:
zlib.error: Error -3 while decompressing: invalid distance too far back
EDIT
I have now tested on a virtual Linux machine with the following setup:
Python version: 2.6.5
I created a password-protected zip file with zip -e file.zip hello.txt
Now it seems the problem is something else: I can extract the zip file even if the password is wrong!
try:
    zip.setpassword("ks")  # "ks" is the wrong password, but it still extracts the zip
    zip.extractall()
except RuntimeError:  # note: Python has RuntimeError, not RuntimeException
    print "wrong!"
Sometimes I can extract the zip file with an incorrect password. The file (inside the zip file) is then extracted, but when I try to open it the contents appear to be corrupted (wrongly decrypted).
If there's a problem with the password, usually you get the following exception:
RuntimeError: ('Bad password for file', <zipfile.ZipInfo object at 0xb76dec2c>)
Since your exception complains about the block type, most probably your .zip archive is corrupted. Have you tried to unpack it with a standalone unzip utility?
Or maybe you have used something unusual, like 7zip, to create it, which can produce incompatible .zip archives.
You don't provide enough information (OS version? Python version? ZIP archive creator and contents? Are there many files in those archives or a single file in a single archive? Do all those files give the same errors, or can you unpack some of them?), so here's a quick Q&A section, which should help you find and remedy the problem.
Q1. Is this a bug in Python?
A1. Unlikely.
Q2. What might cause this behaviour?
A2. Broken zip files or incompatible zip compressors -- since you don't tell us anything, it's hard to point to the exact cause.
Q3. How to find the cause?
A3. Try to isolate the problem: find the file which gives you an error, try zip.testzip() and/or decompress that particular file with a different unzip utility, and share the results. Only you have access to the problematic files, so nobody can help you unless you try to do something yourself.
Q4. How to fix this?
A4. You cannot. Use a different zip extractor; ZipFile won't work.
Try using the testzip() method to check the file's integrity before extracting files.
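A minimal sketch, reusing the file and password from the question:
import zipfile

zf = zipfile.ZipFile("file.zip", "r")
zf.setpassword("oy".encode('utf-8'))
bad = zf.testzip()  # reads every member; returns the first corrupt name, or None
if bad is None:
    zf.extractall()
else:
    print("Corrupt member:", bad)
zf.close()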
It could possibly be a bug in zipfile, or a bug in your zip implementation. I noted that your line numbers do not match mine, so I guess this is a Python 3.2 release earlier than the current 3.2.3 I have.
Now, as to your code: it works for me on Python 3.2.3 on Linux. I suggest you update to the latest 3.2.x, as there have been a number of bug fixes related to zipfile and zlib, including fixes for crashes.

SFTP error with Python/Paramiko writing a file

I'm trying to write a small Python script that will get query results from a database, write them to a file, and then sftp the file to a different server. The pieces work just fine but I'm getting a weird error when trying to sftp the file immediately after it's written.
The error I'm getting is
File "/usr/lib/python2.4/site-packages/paramiko/sftp_client.py", line 558, in put
file_size = os.stat(localpath).st_size
TypeError: coercing to Unicode: need string or buffer, file found
The offending line of code is just
sftp.put(outputfile, sftpoutputfile)
I tried using a copy of the output file instead of the one being written by the script, and that worked exactly as it's supposed to. I'm calling file.close() after the file is written (and before setting up the SFTP connection), so it seems like the file should be, well, closed and usable after that. Can someone tell me what I'm doing wrong? I can post more of the code if that would be helpful. Thank you very much.
The error message is telling you that it (in this case, os.stat) wants a stringlike object, and you're giving it the file instead.
Looking at the source of sftp_client.py in my copy of paramiko, we see
def put(self, localpath, remotepath, callback=None, confirm=True):
    [...]
    file_size = os.stat(localpath).st_size
    fl = file(localpath, 'rb')
    try:
        fr = self.file(remotepath, 'wb')
        fr.set_pipelined(True)
so I'm pretty sure that it wants the filename, not the file itself.
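A sketch of the corresponding fix on the asker's side; the path and the write step are placeholders, since the original code wasn't posted:
localpath = '/tmp/results.csv'       # hypothetical output location
outputfile = open(localpath, 'w')
outputfile.write(query_results)      # hypothetical query results string
outputfile.close()                   # close before transferring

sftp.put(localpath, sftpoutputfile)  # pass the path string, not the file object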

Errors during downloading data from Google App Engine using bulkloader

I am trying to download some data from the datastore using the following command:
appcfg.py download_data --config_file=bulkloader.yaml --application=myappname --kind=mykindname --filename=myappname_mykindname.csv --url=http://myappname.appspot.com/_ah/remote_api
When I didn't have much data in this particular kind/table I could
download the data in one shot - occasionally running into the
following error:
.................................[ERROR   ] [Thread-11] ExportProgressThread:
Traceback (most recent call last):
File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 1448, in run
self.PerformWork()
File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 2216, in PerformWork
item.key_end)
File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 2011, in StoreKeys
(STATE_READ, unicode(kind), unicode(key_start), unicode(key_end)))
OperationalError: unable to open database file
This is what I see in the server log:
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/remote_api/handler.py", line 277, in post
response_data = self.ExecuteRequest(request)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/remote_api/handler.py", line 308, in ExecuteRequest
response_data)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 86, in MakeSyncCall
return stubmap.MakeSyncCall(service, call, request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 286, in MakeSyncCall
rpc.CheckSuccess()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 126, in CheckSuccess
raise self.exception
ApplicationError: ApplicationError: 4 no matching index found.
When that error appeared I would simply re-run the download and things
would work out well.
Of late, I am noticing that as the size of my kind increases, the download tool fails much more often. For instance, with a kind of ~3500 entities I had to run the command 5 times, and only the last attempt succeeded. Is there a way around this error? Previously, my only worry was that I wouldn't be able to automate downloads in a script because of the occasional failures; now I am scared I won't be able to get my data out at all.
This issue was discussed previously here, but the post is old and I am not sure what the suggested flag does - hence posting my similar query again.
Some additional details.
As mentioned here, I tried the suggestion to resume interrupted downloads (in the section Downloading Data from App Engine). When I resume after the interruption, I get no errors, but the number of rows downloaded is less than the entity count the datastore admin shows me. This is the message I get:
[INFO ] Have 3220 entities, 3220 previously transferred
[INFO ] 3220 entities (1003 bytes) transferred in 2.9 seconds
The datastore admin tells me this particular kind has ~4300 entities. Why aren't the remaining entities getting downloaded?
Thanks!
I am going to make a completely uneducated guess at this, based purely on the fact that I saw the word "unicode" in the first error: I had an issue that was related to my data being user-generated from the web. A user put in a few unicode characters and a whole load of stuff started breaking - probably my fault, as I had implemented pretty-looking repr functions and a load of other stuff. If you can, take a quick scan of your data via the console utility in your live app; since it's only ~4k records, try converting all of the data to ASCII strings to find any that don't conform (a sketch follows below).
And after that, I started "sanitising" user inputs (sorry, players, but my "public handle" field needs to be ASCII only!).
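A rough sketch of such a scan using the old db API, run from the remote console; the model name and property are hypothetical, so adapt them to your kind:
from myapp.models import MyKind  # hypothetical import

for entity in MyKind.all():
    try:
        unicode(entity.name).encode('ascii')  # raises on non-ASCII characters
    except UnicodeEncodeError:
        print 'Non-ASCII value in %s: %r' % (entity.key(), entity.name)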

WindowsError: privileged instruction when saving a FreeImagePy image in a script, works in IDLE

I'm working on a program to do some image wrangling in Python for work. I'm using FreeImagePy because PIL doesn't support multi-page TIFFs. Whenever I try to save a file with it from my program I get this error message (or something similar depending on which way I try to save):
Error returned. TIFF FreeImage_Save: failed to open file C:/OCRtmp/ocr page0
Traceback (most recent call last):
File "C:\Python25\Projects\OCRPageUnzipper\PageUnzipper.py", line 102, in <module>
OCRBox.convertToPages("C:/OCRtmp/ocr page", FIPY.FIF_TIFF)
File "C:\Python25\lib\site-packages\FreeImagePy\FreeImagePy\FreeImagePy.py", line 2080, in convertToPages
self.Save(FIF, dib, fileNameOut, flags)
File "C:\Python25\lib\site-packages\FreeImagePy\FreeImagePy\FreeImagePy.py", line 187, in Save
return self.__lib.Save(typ, bitmap, fileName, flags)
WindowsError: exception: priviledged instruction
When I try and do the same things from IDLE, it works fine.
Looks like a permissions issue. Make sure you don't have the file open in another application, and that you have write permission for the location you're trying to write to.
That's what I thought too, but I figured it out a couple of hours ago. Apparently, if the directory I'm trying to write to doesn't exist, FreeImagePy isn't smart enough to create it (most of the time; creating a new multi-page image seems to work fine), but I guess when running within IDLE, IDLE figures it out and takes care of it somehow. I managed to work around it by using os.mkdir to explicitly make sure the directories I need exist.
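A minimal sketch of that workaround; the directory comes from the traceback above, and OCRBox and FIPY are from the asker's own code:
import os

out_dir = "C:/OCRtmp"
if not os.path.isdir(out_dir):
    os.mkdir(out_dir)  # os.makedirs() if intermediate directories may be missing too

OCRBox.convertToPages(os.path.join(out_dir, "ocr page"), FIPY.FIF_TIFF)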
