I am new to python and I am trying to use unpack like this:
data = f.read(4)
AAA=len(data)
BBB=struct.calcsize(cformat)
print AAA
print BBB
value = struct.unpack(cformat, data)
return value[0]
This runs fine as long as AAA == BBB but sometimes, f.read only reads 3 bytes and then I get an error. The actual value in the file that I am trying to read is 26. It reads all of the values from 1-221 except for 26 where it errors because f.read(size) only reads three bytes
Assuming the question is "How should I read a 26 without an error?"
First check the arguments to the open() that produces f. Under Windows, unless you open a file in binary mode (f = open(filename, "rb")), Python assumes that the file is a text file. Windows treats byte value 26 (Ctrl+Z) in a text file as an end-of-file marker, a quirk that it inherited from CP/M.
You have opened a binary file in text mode, and you are using an operating system where the distinction matters. Try adding b to the mode parameter when you open the file:
f = open("my_input_file.bin", "rb")
Related
I have a bin file which contains binary data stored in bytes.
When trying to read them in python, the output is something like this \xb5D\xbe"jSUk\xe75\x18}#\'%\x89oRqR\xfb\xe9\xe9\
How can I print file contents as Base 2 binary ?
For example 10000000 01000000 11000000 , etc
Here is an example reading 8 bytes at a time and formatting them in the way that you describe.
Note that you probably already have system utilities that will do a similar task, for example the od program on Unix-like systems.
with open("your_binary_file", "rb") as f:
while True:
data = f.read(8)
if not data:
break
print(" ".join(f"{byte:08b}" for byte in data))
I have been working on a project where it is necessary to program a binary file, of a certain kind, to a AT28C256 chip. The specifics are not important beyond the fact that the file needs to be 32,768 bytes in size (exactly).
I have some "minimal problem" code here:
o = open("images.bin", "wb")
c = 0
for i in range(256):
for j in range(128):
c += 1
o.write(chr(0).encode('utf-8'))
print(c)
This, to me, would appear to write 32,768 bytes to a file (the split into i,j is necessary because I need to write an image to the device) as 128*256 = 32768. And the output of c is 32768!
But the file it creates is 28672 bytes long! The fact that this is 7000 in hex has not passed me by but I'm not sure why this is happening. Any ideas?
You should call o.close() to flush the write buffer and close the file properly.
I got a pickled object (a list with a few numpy arrays in it) that was created on Windows and apparently saved to a file loaded as text, not in binary mode (ie. with open(filename, 'w') instead of open(filename, 'wb')). Result is that now I can't unpickle it (not even on Windows) because it's infected with \r characters (and possibly more)? The main complaint is
ImportError: No module named multiarray
supposedly because it's looking for numpy.core.multiarray\r, which of course doesn't exist. Simply removing the \r characters didn't do the trick (tried both sed -e 's/\r//g' and, in python s = file.read().replace('\r', ''), but both break the file and yield a cPickle.UnpicklingError later on)
Problem is that I really need to get the data out of the objects. Any ideas how to fix the files?
Edit: On request, the first few hundred bytes of my file, Octal:
\x80\x02]q\x01(}q\x02(U\r\ntotal_timeq\x03G?\x90\x15r\xc9(s\x00U\rreaction_timeq\x04NU\x0ejump_directionq\x05cnumpy.core.multiarray\r\nscalar\r\nq\x06cnumpy\r\ndtype\r\nq\x07U\x02f8K\x00K\x01\x87Rq\x08(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\x025\x9d\x13\xfc#\xc8?\x86Rq\tU\x14normalised_directionq\r\nh\x06h\x08U\x08\xf0\xf9,\x0eA\x18\xf8?\x86Rq\x0bU\rjump_distanceq\x0ch\x06h\x08U\x08\x13\x14\xea&\xb0\x9b\x1a#\x86Rq\rU\x04jumpq\x0ecnumpy.core.multiarray\r\n_reconstruct\r\nq\x0fcnumpy\r\nndarray\r\nq\x10K\x00\x85U\x01b\x87Rq\x11(K\x01K\x02\x85h\x08\x89U\x10\x87\x16\xdaEG\xf4\xf3?\x06`OC\xe7"\x1a#tbU\x0emovement_speedq\x12h\x06h\x08U\x08\\p\xf5[2\xc2\xef?\x86Rq\x13U\x0ctrial_lengthq\x14G#\t\x98\x87\xf8\x1a\xb4\xbaU\tconditionq\x15U\x0bhigh_mentalq\x16U\x07subjectq\x17K\x02U\x12movement_directionq\x18h\x06h\x08U\x08\xde\x06\xcf\x1c50\xfd?\x86Rq\x19U\x08positionq\x1ah\x0fh\x10K\x00\x85U\x01b\x87Rq\x1b(K\x01K\x02\x85h\x08\x89U\x10K\xb7\xb4\x07q=\x1e\xc0\xf2\xc2YI\xb7U&\xc0tbU\x04typeq\x1ch\x0eU\x08movementq\x1dh\x0fh\x10K\x00\x85U\x01b\x87Rq\x1e(K\x01K\x02\x85h\x08\x89U\x10\xad8\x9c9\x10\xb5\xee\xbf\xffa\xa2hWR\xcf?tbu}q\x1f(h\x03G#\t\xba\xbc\xb8\xad\xc8\x14h\x04G?\xd9\x99%]\xadV\x00h\x05h\x06h\x08U\x08\xe3X\xa9=\xc1\xb1\xeb?\x86Rq h\r\nh\x06h\x08U\x08\x88\xf7\xb9\xc1\t\xd6\xff?\x86Rq!h\x0ch\x06h\x08U\x08v\x7f\xeb\x11\xea5\r#\x86Rq"h\x0eh\x0fh\x10K\x00\x85U\x01b\x87Rq#(K\x01K\x02\x85h\x08\x89U\x10\xcd\xd9\x92\x9a\x94=\x06#]C\xaf\xef\xeb\xef\x02#tbh\x12h\x06h\x08U\x08-\x9c&\x185\xfd\xef?\x86Rq$h\x14G#\r\xb8W\xb2`V\xach\x15h\x16h\x17K\x02h\x18h\x06h\x08U\x08\x8e\x87\xd1\xc2
You may also download the whole file (22k).
Presuming that the file was created with the default protocol=0 ASCII-compatible method, you should be able to load it anywhere by using open('pickled_file', 'rU') i.e. universal newlines.
If this doesn't work, show us the first few hundred bytes: print repr(open('pickled_file', 'rb').read(200)) and paste the results into an edit of your question.
Update after file contents were published:
Your file starts with '\x80\x02'; it was dumped with protocol 2, the latest/best. Protocols 1 and 2 are binary protocols. Your file was written in text mode on Windows. This has resulted in each '\n' being converted to '\r\n' by the C runtime. Files should be opened in binary mode like this:
with open('result.pickle', 'wb') as f: # b for binary
pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
with open('result.pickle', 'rb') as f: # b for binary
obj = pickle.load(f)
Docs are here. This code will work portably on both Windows and non-Windows systems.
You can recover the original pickle image by reading the file in binary mode and then reversing the damage by replacing all occurrences of '\r\n' by '\n'. Note: This recovery procedure is necessary whether you are trying to read it on Windows or not.
Newlines in Windows aren't just '\r', it's CRLF, or '\r\n'.
Give file.read().replace('\r\n', '\n') a try. You were previously deleting carriage returns that may not have actually been part of newlines.
Can't you -- on Windows -- just open the file in text mode, the same way it was written, read it in and then write it out to another file opened properly in binary mode?
Have you tried unpickling in text mode? That is,
x = pickle.load(open(filename, 'r'))
(On Windows, of course.)
I'm using OpenCV Python library to extract descriptors and write them to file. Each descriptor is 32 bytes and I only save 80 of them. Meaning that, the final file must be exactly 2560 bytes. But it's 2571 bytes.
I also have another file which had been written using the same Python script (Not on Windows but I guess it was on Linux) and it's exactly 2560 bytes.
Using WinMerge, I tried to compare them and it gave me a warning that the carriage return is different in two files and asked me if I wanted to treat them equally. If I say "yes", then both files are identical but if I say "no" then they are different.
I was wondering if there is anyway in Python to write binary files which produce identical result on both Windows and Linux?
Not to mention this is the relevant part of the script:
f = open("something", "w+")
f.write(descriptors)
f.close()
Yes, there's a way to open a file in binary mode - just put the b character into the open.
f = open("something", "wb+")
If you don't do that in Windows, every linefeed '\n' will be converted to the two-character line ending sequence that is used by Windows, '\r\n'.
I want to read in a non text file. It has an extension ".map" but can be opened by notepad. How should I open this file through python?
file = open("path-to-file","r") doesn't work for me. It returns No such file or directory: error.
Here's what my file looks like:
111 + gi|89106884|ref|AC_000091.1| 725803 TCGAGATCGACCATGTTGCCCGCCT IIIIIIIIIIIIIIIIIIIIIIIII 0 14:A>G
457 + gi|89106884|ref|AC_000091.1| 32629 CCGTGTCCACCGACTACGACACCTC IIIIIIIIIIIIIIIIIIIIIIIII 0 4:C>G,22:T>C
779 + gi|89106884|ref|AC_000091.1| 483582 GATCACCCACGCAAAGATGGGGCGA IIIIIIIIIIIIIIIIIIIIIIIII 0 15:A>G,18:C>G
784 + gi|89106884|ref|AC_000091.1| 226200 ACCGATAGTGAACCAGTACCGTGAG IIIIIIIIIIIIIIIIIIIIIIIII 1
If I do the follwing:
file = open("D:\bowtie-0.12.7-win32\bowtie-0.12.7\output_635\results_NC_000117.fna.1.ebwt.map","rb")
It still gives me No such file or directory: 'D:\x08owtie-0.12.7-win32\x08owtie-0.12.7\\output_635\results_NC_000117.fna.1.ebwt.map' error. Is this because the file isn't binary or I don't have some permissions?
Would apppreciate help with this!
Binary files should use a binary mode.
f = open("path-to-file","rb")
But that won't help if you don't have the appropriate permissions or don't know the format of the file itself.
EDIT:
Obviously you didn't bother reading the error message, or you would have noticed that the filename it is using is not the one you expected.
f = open("D:\\bowtie-0.12.7-win32\\bowtie-0.12.7\\output_635\\results_NC_000117.fna.1.ebwt.map","rb")
f = open(r"D:\bowtie-0.12.7-win32\bowtie-0.12.7\output_635\results_NC_000117.fna.1.ebwt.map","rb")
You have hit upon a minor difference between Unix and Windows here.
Since you mentioned Notepad, you must be running this on Windows. In DOS/Windows land, opening a binary file requires specifying attribute 'b' for binary, as others have already indicated. Unix/Linux are a bit more relaxed about this. Omitting attribute 'b' will still open a binary file.
The same behavior is exhibited in the C library's fopen() call.
If its a non-text file you could try opening it using binary format. Try this -
with open("path-to-file", "rb") as f:
byte = f.read(1)
while byte != "":
byte = f.read(1) # Do stuff with byte.
The with statement handles opening and closing the file, including if an exception is raised in the inner block.
Of course since the format is binary you need to know what you are going to do after you read. Also, here I read 1 byte at a time, you can define bigger chunk sizes too.
UPDATE: Maybe this is not a binary file. You might be having problems with file encoding, the characters might not be ascii or they might belong to unicode charset. Try this -
import codecs
f = codecs.open(u'path-to-file','r','utf-8')
print f.read()
f.close()
If you print this out in the terminal, you might still get gibberish since the terminal might not support this charset. I would advise, go ahead & process the text assuming its properly opened.
Source