python open('file','r+') giving weird result - python

After reading some posts, it seems like you can open a file for both reading and writing with the mode of 'r+' or 'w+'. However, trying to use these modes always give me weird results:
If I use 'r+', call file.read(), and then call file.write('str'),
there'll be an error of "IOError: [Errno 0] Error"
If I use 'r+', call file.write('str'), and then call file.read(),
it'll return unexpected and very long content(looks like the inside
of some object)
If I use 'w+', calling file.read() will return empty string
What I'm trying to do is open a file, read the content, modify it, and write back. Currently I'm opening it with 'r', change the content, and open it again with 'w' and write back. Is this a good way of doing it?
There's an example at http://snipt.org/zglJ0
I'm using window 7 and python 2.7.2

You have to flush() when switching between reading and writing a file that's been opened in an update mode. Or I think you can also seek(). This is caused by some weird behavior in the Windows file implementation in Python 2.x; they fixed it in 3.x.

Related

Reading a dynamically updated log file via readline

I'm trying to read a log file, written line by line, via readline.
I'm surprised to observe the following behaviour (code executed in the interpreter, but same happens when variations are executed from a file):
f = open('myfile.log')
line = readline()
while line:
print(line)
line = f.readline()
# --> This displays all lines the file contains so far, as expected
# At this point, I open the log file with a text editor (Vim),
# add a line, save and close the editor.
line = f.readline()
print(line)
# --> I'm expecting to see the new line, but this does not print anything!
Is this behaviour standard? Am I missing something?
Note: I know there are better way to deal with an updated file for instance with generators as pointed here: Reading from a frequently updated file. I'm just interested in understanding the issue with this precise use case.
For your specific use case, the explanation is that Vim uses a write-to-temp strategy. This means that all writing operations are performed on a temporary file.
On the contrary, your scripts reads from the original file, so it does not see any change on it.
To further test, instead of Vim, you can try to directly write on the file using:
echo "Hello World" >> myfile.log
You should see the new line from python.
for following your file, you can use this code:
f = open('myfile.log')
while True:
line = readline()
if not line:
print(line)

python 2.7 cpickle.load adds \r to strings in dictionaries [duplicate]

I got a pickled object (a list with a few numpy arrays in it) that was created on Windows and apparently saved to a file loaded as text, not in binary mode (ie. with open(filename, 'w') instead of open(filename, 'wb')). Result is that now I can't unpickle it (not even on Windows) because it's infected with \r characters (and possibly more)? The main complaint is
ImportError: No module named multiarray
supposedly because it's looking for numpy.core.multiarray\r, which of course doesn't exist. Simply removing the \r characters didn't do the trick (tried both sed -e 's/\r//g' and, in python s = file.read().replace('\r', ''), but both break the file and yield a cPickle.UnpicklingError later on)
Problem is that I really need to get the data out of the objects. Any ideas how to fix the files?
Edit: On request, the first few hundred bytes of my file, Octal:
\x80\x02]q\x01(}q\x02(U\r\ntotal_timeq\x03G?\x90\x15r\xc9(s\x00U\rreaction_timeq\x04NU\x0ejump_directionq\x05cnumpy.core.multiarray\r\nscalar\r\nq\x06cnumpy\r\ndtype\r\nq\x07U\x02f8K\x00K\x01\x87Rq\x08(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\x025\x9d\x13\xfc#\xc8?\x86Rq\tU\x14normalised_directionq\r\nh\x06h\x08U\x08\xf0\xf9,\x0eA\x18\xf8?\x86Rq\x0bU\rjump_distanceq\x0ch\x06h\x08U\x08\x13\x14\xea&\xb0\x9b\x1a#\x86Rq\rU\x04jumpq\x0ecnumpy.core.multiarray\r\n_reconstruct\r\nq\x0fcnumpy\r\nndarray\r\nq\x10K\x00\x85U\x01b\x87Rq\x11(K\x01K\x02\x85h\x08\x89U\x10\x87\x16\xdaEG\xf4\xf3?\x06`OC\xe7"\x1a#tbU\x0emovement_speedq\x12h\x06h\x08U\x08\\p\xf5[2\xc2\xef?\x86Rq\x13U\x0ctrial_lengthq\x14G#\t\x98\x87\xf8\x1a\xb4\xbaU\tconditionq\x15U\x0bhigh_mentalq\x16U\x07subjectq\x17K\x02U\x12movement_directionq\x18h\x06h\x08U\x08\xde\x06\xcf\x1c50\xfd?\x86Rq\x19U\x08positionq\x1ah\x0fh\x10K\x00\x85U\x01b\x87Rq\x1b(K\x01K\x02\x85h\x08\x89U\x10K\xb7\xb4\x07q=\x1e\xc0\xf2\xc2YI\xb7U&\xc0tbU\x04typeq\x1ch\x0eU\x08movementq\x1dh\x0fh\x10K\x00\x85U\x01b\x87Rq\x1e(K\x01K\x02\x85h\x08\x89U\x10\xad8\x9c9\x10\xb5\xee\xbf\xffa\xa2hWR\xcf?tbu}q\x1f(h\x03G#\t\xba\xbc\xb8\xad\xc8\x14h\x04G?\xd9\x99%]\xadV\x00h\x05h\x06h\x08U\x08\xe3X\xa9=\xc1\xb1\xeb?\x86Rq h\r\nh\x06h\x08U\x08\x88\xf7\xb9\xc1\t\xd6\xff?\x86Rq!h\x0ch\x06h\x08U\x08v\x7f\xeb\x11\xea5\r#\x86Rq"h\x0eh\x0fh\x10K\x00\x85U\x01b\x87Rq#(K\x01K\x02\x85h\x08\x89U\x10\xcd\xd9\x92\x9a\x94=\x06#]C\xaf\xef\xeb\xef\x02#tbh\x12h\x06h\x08U\x08-\x9c&\x185\xfd\xef?\x86Rq$h\x14G#\r\xb8W\xb2`V\xach\x15h\x16h\x17K\x02h\x18h\x06h\x08U\x08\x8e\x87\xd1\xc2
You may also download the whole file (22k).
Presuming that the file was created with the default protocol=0 ASCII-compatible method, you should be able to load it anywhere by using open('pickled_file', 'rU') i.e. universal newlines.
If this doesn't work, show us the first few hundred bytes: print repr(open('pickled_file', 'rb').read(200)) and paste the results into an edit of your question.
Update after file contents were published:
Your file starts with '\x80\x02'; it was dumped with protocol 2, the latest/best. Protocols 1 and 2 are binary protocols. Your file was written in text mode on Windows. This has resulted in each '\n' being converted to '\r\n' by the C runtime. Files should be opened in binary mode like this:
with open('result.pickle', 'wb') as f: # b for binary
pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
with open('result.pickle', 'rb') as f: # b for binary
obj = pickle.load(f)
Docs are here. This code will work portably on both Windows and non-Windows systems.
You can recover the original pickle image by reading the file in binary mode and then reversing the damage by replacing all occurrences of '\r\n' by '\n'. Note: This recovery procedure is necessary whether you are trying to read it on Windows or not.
Newlines in Windows aren't just '\r', it's CRLF, or '\r\n'.
Give file.read().replace('\r\n', '\n') a try. You were previously deleting carriage returns that may not have actually been part of newlines.
Can't you -- on Windows -- just open the file in text mode, the same way it was written, read it in and then write it out to another file opened properly in binary mode?
Have you tried unpickling in text mode? That is,
x = pickle.load(open(filename, 'r'))
(On Windows, of course.)

python [Errno 22] invalid mode ('wb') or filename: u'Escuela Sab\xe1tica Part 2.doc'

I've seen many similar questions, but none that fix my problem. I'm trying to open a file with a unicode filename, but get the error:
[Errno 22] invalid mode ('wb') or filename: u'Escuela Sab\xe1tica Part 2.doc'
I've tried using open, codecs.open, and io.open to open the file (which I didn't think would matter for a binary file, but whatever). No dice. I think the clue might be the filename:
u'Escuela Sab\xe1tica Part 2.doc'
When printed, this file name works fine:
Escuela Sabática Part 2.doc
But I think it's weird the error prints it as u'...\xe1...' instead of u'...\uxxx...'. I'm still not comfortable with unicode, so that's my suspicion. I've tried encoding and decoding the filename ('utf-8') before opening with no success.
Edit: Version is python 2.7.3. Code snippet:
with open(to_path, "wb") as to_file:
to_file.write(f.read())
The error traces back to the 'with open' line, and the code works for files without unicode in the file name.
This works fine for me on Python 2.7.10:
with open(u'Escuela Sab\xe1tica Part 2.txt', 'wb') as f:
f.write('test')
>>> ls Esc*
Escuela Sabática Part 2.doc (I'm using text instead of code sample to show the accent).
>>> cat Esc*
test
Perhaps you are reading instead of writing?

Python fails to open 11gb csv in r+ mode but opens in r mode

I'm having problems with some code that loops through a bunch of .csvs and deletes the final line if there's nothing in it (i.e. files that end with the \n newline character)
My code works successfully on all files except one, which is the largest file in the directory at 11gb. The second largest file is 4.5gb.
The line it fails on is simply:
with open(path_str,"r+") as my_file:
and I get the following message:
IOError: [Errno 22] invalid mode ('r+') or filename: 'F:\\Shapefiles\\ab_premium\\processed_csvs\\a.csv'
The path_str I create using os.file.join to avoid errors, and I tried renaming the file to a.csv just to make sure there wasn't anything odd going on with the filename. This made no difference.
Even more strangely, the file is happy to open in r mode. I.e. the following code works fine:
with open(path_str,"r") as my_file:
I have tried navigating around the file in read mode, and it's happy to read characters at the start, end, and in the middle of the file.
Does anyone know of any limits on the size of file that Python can deal with or why I might be getting this error? I'm on Windows 7 64bit and have 16gb of RAM.
The default I/O stack in Python 2 is layered over CRT FILE streams. On Windows these are built on top of a POSIX emulation API that uses file descriptors (which in turn is layered over the user-mode Windows API, which is layered over the kernel-mode I/O system, which itself is a deeply layered system based on I/O request packets; the hardware is down there somewhere...). In the POSIX layer, opening a file with _O_RDWR | _O_TEXT mode (as in "r+"), requires seeking to the end of the file to remove CTRL+Z, if it's present. Here's a quote from the CRT's fopen documentation:
Open in text (translated) mode. In this mode, CTRL+Z is interpreted as
an end-of-file character on input. In files opened for reading/writing
with "a+", fopen checks for a CTRL+Z at the end of the file and
removes it, if possible. This is done because using fseek and ftell to
move within a file that ends with a CTRL+Z, may cause fseek to behave
improperly near the end of the file.
The problem here is that the above check calls the 32-bit _lseek (bear in mind that sizeof long is 4 bytes on 64-bit Windows, unlike most other 64-bit platforms), instead of _lseeki64. Obviously this fails for an 11 GB file. Specifically, SetFilePointer fails because it gets called with a NULL value for lpDistanceToMoveHigh. Here's the return value and LastErrorValue for the latter call:
0:000> kc 2
Call Site
KERNELBASE!SetFilePointer
MSVCR90!lseek_nolock
0:000> r rax
rax=00000000ffffffff
0:000> dt _TEB #$teb LastErrorValue
ntdll!_TEB
+0x068 LastErrorValue : 0x57
The error code 0x57 is ERROR_INVALID_PARAMETER. This is referring to lpDistanceToMoveHigh being NULL when trying to seek from the end of a large file.
To work around this problem with CRT FILE streams, I recommend opening the file using io.open instead. This is a backported implementation of Python 3's I/O stack. It always opens files in raw binary mode (_O_BINARY), and it implements its own buffering and text-mode layers on top of the raw layer.
>>> import io
>>> f = io.open('a.csv', 'r+')
>>> f
<_io.TextIOWrapper name='a.csv' encoding='cp1252'>
>>> f.buffer
<_io.BufferedRandom name='a.csv'>
>>> f.buffer.raw
<_io.FileIO name='a.csv' mode='rb+'>
>>> f.seek(0, os.SEEK_END)
11811160064L

Is it possible to get writing access to raw devices using python with windows?

This is sort of a follow-up to this question. I want to know if you can access raw devices (i.e. \\.\PhysicalDriveN) in writing mode and if this should be the case, how.
Using Linux, write access can simply be achieved by using e.g. open("/dev/sdd", "w+") (provided that the script is running with root permissions). I assume that Mac OS behaves similar (with /dev/diskN as input file).
When trying the same command under Windows (with the corresponding path), it fails with the following error:
IOError: [Errno 22] invalid mode ('w+') or filename: '\\\\.\\PhysicalDrive3'
However, when trying to read from the PhysicalDrive, it does work (even the correct data is read). The shell is running with administrator permissions under Windows 7.
Is there any other way to accomplish this task using python while still keeping the script as platform-independent as possible?
Edit:
I looked a bit further into what methods python provides for file handling and stumbled across os.open. Opening the PhysicalDrive using os.open(drive_string, os.O_WRONLY|os.O_BINARY) returns no error. So far, so good. Now I have either the choice to write directly to this file-descriptor using os.write, or use os.fdopen to get a file-object and write to it in the regular way.
Sadly, none of these possibilities works. In the first case (os.write()), I get this:
>>> os.write(os.open("\\\\.\\PhysicalDrive3", os.O_WRONLY|os.O_BINARY), "test")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
In the second case, I can create a file object with write permissions, but the writing itself fails (well, after enforcing its execution using .flush()):
>>> g = os.fdopen(os.open("\\\\.\\PhysicalDrive3", os.O_WRONLY|os.O_BINARY), "wb")
>>> g.write("test")
>>> g.flush()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IOError: [Errno 22] Invalid argument
As eryksun and agf pointed out in the comments (but I didn't really get it at first), the solution is rather simple: you have to open the device in the rb+ mode, which opens the device for updating (as I have found out now..) without trying to replace it with a new file (which wouldn't work because the file is in fact a physical drive).
When writing, you have to write always a whole sector at a time (i.e. multiples of 512-byte), otherwise it fails.
In addition, the .seek() command can also jump only sector-wise. If you try to seek a position inside a sector (e.g. position 621), the file object will jump to the beginning of the sector where your requested position is (i.e. to the beginning of the second sector, byte 512).
Possibly in Win 7 you have to do something more extreme, such as locking the volume(s) for the disk beforehand with DeviceIoControl(hVol, FSCTL_LOCK_VOLUME, ...)
In Win 7 you don't have to do that; opening and writing with 'rb+' mode works fine.

Categories