I'm getting
IOError: [Errno 22] Invalid argument
when I try to write a large bytestring to disk with f.write(), where f was opened with mode wb.
I've seen lots of people online getting this error when using a Windows network drive, but I'm on OSX (10.7 when I originally asked the question but 10.8 now, with a standard HFS+ local filesystem). I'm using Python 3.2.2 (happens on both a python.org binary and a homebrew install). I don't see this problem with the system Python 2.7.2.
I also tried mode w+b based on this Windows bug workaround, but of course that didn't help.
The data is coming from a large numpy array (almost 4GB of floats). It works fine if I manually loop over the string and write it out in chunks. But because I can't write it all in one pass, np.save and np.savez fail -- since they just use f.write(ary.tostring()). I get a similar error when I try to save it into an existing HDF5 file with h5py.
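For reference, the chunked-write workaround looks roughly like this (the chunk size is an arbitrary choice of mine, and ary / filename stand for the array and output path from the question):
# Write the ~4 GB bytestring in fixed-size chunks instead of one f.write() call.
CHUNK = 2 ** 20  # 1 MiB, an arbitrary size well below where the single write fails

data = ary.tostring()
with open(filename, 'wb') as f:
    for start in range(0, len(data), CHUNK):
        f.write(data[start:start + CHUNK])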
Note that I get the same problem when reading a file opened with file(filename, 'rb'): f.read() gives this IOError, while f.read(chunk_size) for reasonable chunk_size works.
Any thoughts?
This appears to be a general OSX bug with fread / fwrite and so isn't really fixable by a Python user. See numpy #3858, this torch7 commit, this SO question/answer, and others.
Supposedly it's been fixed in Mavericks, but I'm still seeing the issue.
Python 2 may have worked around this or its io module may have always buffered large reads/writes; I haven't investigated thoroughly.
Perhaps try not opening with the b flag; I didn't think that was supported on all OSes / filesystems.
I am trying to save a sklearn model on a Windows server using sklearn.joblib.dump and then joblib.load the same file on a Linux server (CentOS 7.1). I get the error below:
ValueError: non-string names in Numpy dtype unpickling
This is what I have tried:
Tried both Python 2.7 and Python 3.5
Tried the built-in open() with 'wb' and 'rb' arguments
I really don't care how the file is moved, I just need to be able to move and load it in a reasonable amount of time.
Python pickle should work between Windows and Linux. There may be incompatibilities if:
the Python versions on the two hosts are different (if so, try installing the same version of Python on both hosts; a quick version check is sketched below); and/or
one machine is 32-bit and the other is 64-bit (I don't know of any fix for this so far).
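A quick diagnostic (my sketch, not part of the original answer) is to run the following on both hosts and compare the output:
# Print interpreter and library versions; any mismatch between the Windows
# and Linux hosts is a likely cause of unpickling errors.
import sys
import numpy
import sklearn

print(sys.version)          # Python version and 32/64-bit build
print(numpy.__version__)
print(sklearn.__version__)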
Here is my code.
Line number 65
ma = cv2.imread(str(files1[x]), 1)
The result of cv2.imread() is always None.
I have done all the basic checks.
The file which I'm trying to read exists.
There are no other variables or functions called cv2 or imread.
I'm using the same version of Python in all cases.
I have only one version of opencv installed.
I'm using Ubuntu 14.04 and the folder has read and write permissions for all users.
I have also tested it in PyCharm with a new Python file and it works.
I only have this problem with this program.
Please let me know if you have any ideas about this problem.
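One way to narrow this down (a diagnostic sketch of my own; the variable names mirror line 65 above and the checks are a suggestion, not from the question):
# Print exactly what cv2.imread() receives and whether that path resolves
# from the script's working directory.
import os
import cv2

path = str(files1[x])
print(repr(path))            # watch for stray whitespace or a relative path
print(os.path.isfile(path))  # False means the path does not resolve from here
ma = cv2.imread(path, 1)
print(ma is None)            # None usually means the file is unreadable or unsupported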
Is there a way to distribute a Python script that can unpack a .tar.xz file?
Specifically:
This needs to run on other people's machines, not mine, so I can't require any extra modules to have been installed.
I can get away with assuming the presence of Python 2.7, but not 3.x.
So that seems to amount to asking whether out-of-the-box Python 2.7 has such a feature, and as far as I can tell the answer is no, but is there anything I'm missing?
First decompress the xz file into tar data and then extract the tar data:
import lzma      # in the standard library from Python 3.3 onwards
import tarfile

# Decompress the .xz stream, then hand the decompressed tar data to tarfile.
with lzma.open("file.tar.xz") as fd:
    with tarfile.open(fileobj=fd) as tar:
        tar.extractall('/path/to/extract/to')  # extractall() returns None
For Python 2.7 you need to install pylzma (e.g. pip2.7 install pylzma).
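If a third-party install is acceptable, another option (my assumption, not from the original answer) is the backports.lzma package, which mirrors the Python 3 lzma API, so essentially the same code runs on 2.7:
# Python 2.7 sketch; assumes `pip install backports.lzma` (not in the 2.7 stdlib).
import tarfile
from backports import lzma

with lzma.open("file.tar.xz") as fd:
    with tarfile.open(fileobj=fd) as tar:
        tar.extractall('/path/to/extract/to')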
I'm working in a memory-constrained environment and use a Python script with the tarfile library (http://docs.python.org/2/library/tarfile.html) to continuously make backups of log files.
As the number of log files has grown (~74 000), I noticed that the system effectively kills this backup process when it runs now. It consumes an awful lot of memory (~192 MB before it gets killed by the OS).
I can make a gzip tar archive ($ tar -czf) of the log files without a problem or high memory usage.
Code:
import tarfile
t = tarfile.open('asdf.tar.gz', 'w:gz')
t.add('asdf')
t.close()
The dir "asdf" consists of 74407 files with filenames of length 73.
Is it not recommended to use Python's tarfile when you have a huge number of files?
I'm running Ubuntu 12.04.3 LTS and Python 2.7.3 (tarfile version seems to be "$Revision: 85213 $").
I did some digging in the source code and it seems that tarfile is storing all files in a list of TarInfo objects (http://docs.python.org/2/library/tarfile.html#tarfile.TarFile.getmembers), causing the ever-increasing memory footprint with many and long file names.
The caching of these TarInfo objects was optimized significantly in a commit from 2008, http://bugs.python.org/issue2058, but from what I can see it was only merged into the py3k branch, i.e. Python 3.
One could reset the members list again and again, as in http://blogs.it.ox.ac.uk/inapickle/2011/06/20/high-memory-usage-when-using-pythons-tarfile-module/ (a sketch of that workaround is below), however I'm not sure what internal tarfile functionality one misses then, so I went with a system-level call instead: os.system('tar -czf asdf.tar asdf/').
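The workaround from that blog post, as a rough sketch (the directory and archive names are placeholders, and the caveat above about bypassing tarfile internals still applies):
# Add files one at a time and clear the cached TarInfo list after each add,
# so memory use stays flat instead of growing with the number of files.
import os
import tarfile

tar = tarfile.open('asdf.tar.gz', 'w:gz')
for name in sorted(os.listdir('asdf')):
    tar.add(os.path.join('asdf', name))
    tar.members = []  # drop the cached TarInfo objects
tar.close()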
Two ways to solve this: if your VM does not have swap, add some and try again. I had 13 GB of files to tar into one big bundle and it was consistently failing (the process was killed by the OS); adding 4 GB of swap helped.
If you are using a Kubernetes pod or a Docker container, one quick workaround is to add swap on the host; the SYS_ADMIN capability or privileged mode lets the container use the host's swap.
If you need a streaming tarfile to avoid the memory overhead, check out: https://gist.github.com/leth/6adb9d30f2fdcb8802532a87dfbeff77
How to write to another process's address space using python under Ubuntu Linux?
My attempts:
1) Using the virtual file /proc/$PID/mem and seeking to the address. I have successfully used it to read memory, but attempting to write causes an IOError:
fd = open("/proc/" + pid + "/mem", "r+")
fd.seek(address, 0)
fd.write("ABC")
Output:
IOError: [Errno 22] Invalid argument
2) Attempting to use the python-ptrace library as suggested in other threads. However, I cannot find good documentation or example code.
Note: this is not a permissions issue, running as root produces the same behaviour.
Found a solution here: http://tito.googlecode.com/svn-history/r2/trunk/draft/fakefs.py
It uses the ctypes package to load libc, then libc.ptrace with the POKEDATA option to write the bytes.
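A rough sketch of that ctypes/ptrace approach (my reconstruction, not the linked code verbatim; the constants assume Linux, and pid, address, and the written word are placeholders):
# Attach to the target process, poke one machine word at `address`, then detach.
import ctypes
import ctypes.util
import os

PTRACE_ATTACH, PTRACE_DETACH, PTRACE_POKEDATA = 16, 17, 5

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.ptrace.restype = ctypes.c_long
libc.ptrace.argtypes = (ctypes.c_long, ctypes.c_long,
                        ctypes.c_void_p, ctypes.c_void_p)

def poke(pid, address, word):
    libc.ptrace(PTRACE_ATTACH, pid, None, None)
    os.waitpid(pid, 0)                       # wait until the target has stopped
    libc.ptrace(PTRACE_POKEDATA, pid, address, word)
    libc.ptrace(PTRACE_DETACH, pid, None, None)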