I have a binary file containing the position of 8000 particles.
I know that each particle value should look like "-24.6151...", but I don't know with which precision the values are written by my program. I guess it is double precision.
But when I try to read the file with this code:
with open('.//results0epsilon/energybinary/energy_00004.dat', 'br') as f:
    buffer = f.read()
    print ("Lenght of buffer is %d" % len(buffer))
    for i in buffer:
        print(int(i))
I get as output:
Lenght of buffer is 64000
10
168
179
43
...
I have skipped the rest of the list, but as you can see these values are far from what I expect. I think I have some kind of decoding error.
I would appreciate any kind of help :)
What you are printing now are the individual bytes that make up your floating-point data, so they make no sense as numerical values on their own.
Of course, there's no 100% sure answer since we didn't see your data, but I'll try to guess:
You have 8000 values to read and the file size is 64000 bytes, so you probably have IEEE 754 double-precision values (8 bytes each). If it's not IEEE, then you're toast.
In that case you could try the following:
import struct
with open('.//results0epsilon/energybinary/energy_00004.dat', 'br') as f:
    buffer = f.read()
    print ("Length of buffer is %d" % len(buffer))
    data = struct.unpack("=8000d",buffer)
If the printed data looks bogus, it's probably an endianness problem. In that case, replace the = in "=8000d" with < (little-endian) or > (big-endian).
For reference, the packing/unpacking formats are documented here: https://docs.python.org/3/library/struct.html
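If the number of values is not known ahead of time, it can be inferred from the buffer length instead of hard-coding 8000. Here is a minimal sketch of that idea, assuming the file holds nothing but 8-byte doubles; the helper name read_doubles is made up for illustration, and the sample bytes are generated in place of the real file:

```python
import struct

def read_doubles(buffer, byte_order="<"):
    """Decode a bytes object as a flat sequence of 8-byte doubles.

    byte_order is "<" (little-endian) or ">" (big-endian). Which one is
    right is an assumption about how the file was written.
    """
    item_size = struct.calcsize(byte_order + "d")  # 8 bytes per double
    if len(buffer) % item_size:
        raise ValueError("buffer length is not a multiple of %d" % item_size)
    count = len(buffer) // item_size
    return struct.unpack("%s%dd" % (byte_order, count), buffer)

# Stand-in for the file contents: three doubles packed little-endian.
sample = struct.pack("<3d", -24.6151, 0.5, 3.0)
values = read_doubles(sample)
print(values)
```

With the real file, you would pass the result of f.read() instead of sample and try the other byte order if the numbers look wrong.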
I am trying to read data from this png image, and then place the image length at the start of the data, and pad it a given number of spaces defined by my header variable. However, once I do that, the image length increases drastically for a reason beyond my knowledge. Please can someone inform me of what is happening? I must be missing something since I am still fairly new to this field.
import os
HEADER = 10
PATH = os.path.abspath("penguin.png")
print(PATH)
with open(PATH,"rb") as f:
    imgbin = f.read()
print(len(imgbin))
imgbin = f"{len(imgbin):<{HEADER}}"+str(imgbin)
print(len(imgbin))
When I first print the length of the data, I get 163287; on the second print, I get 463797.
This is because str() turns the bytes object you read from the file into its printable text representation (the b'...' literal, with \x escapes for non-printable bytes), which is much longer than the raw data:
len(imgbin), len(str(imgbin))
>>> (189255, 545639)
(Note that I use a different image, so the numbers are different.) You can solve this issue by encoding the header to bytes before prepending it, like so:
with open(PATH,"rb") as f:
    imgbin = f.read()
print(len(imgbin))
imgbin = f"{len(imgbin):<{HEADER}}".encode('utf-8')+imgbin
print(len(imgbin))
>>> 189245
>>> 189255
You can find out more about binary strings here.
For reference, it is worth noting that the bytes of a PNG file are unsigned 8-bit values (i.e. 0-255), which is exactly what a Python bytes object stores. Prepending the header works because its digits and spaces are ASCII, so encoding them as UTF-8 gives one byte per character. If you want to work on the values numerically, it might be worth using something like numpy, where you have uint8 as a data type.
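The length-prefix idea can be sketched as a pair of helpers, one that adds the header and one that reads it back. The names add_header and split_header are hypothetical, and the payload below is a stand-in for the image bytes:

```python
HEADER = 10  # width of the ASCII length field, as in the question

def add_header(payload: bytes, header: int = HEADER) -> bytes:
    """Prefix payload with its length, left-aligned in `header` ASCII bytes."""
    prefix = f"{len(payload):<{header}}".encode("ascii")
    if len(prefix) != header:
        raise ValueError("payload length does not fit in the header")
    return prefix + payload

def split_header(message: bytes, header: int = HEADER) -> bytes:
    """Read the length field back and return exactly that many payload bytes."""
    length = int(message[:header].decode("ascii"))  # int() ignores the padding spaces
    return message[header:header + length]

payload = bytes(range(256)) * 4          # stand-in for the image bytes
message = add_header(payload)
print(len(payload), len(message))
```

Because everything stays a bytes object, the total length grows by exactly HEADER bytes.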
I want to read a binary file, get the content four bytes by four bytes and perform int operations on these packets.
Using a dummy binary file, opened this way:
with open('MEM_10001000_0000B000.mem', 'br') as f:
    for byte in f.read():
        print (hex(byte))
I want to perform an encryption with a 4 byte long key, 0x9485A347 for example.
Is there a simple way I can read my files 4 bytes at a time and get them as int or do I need to put them in a temporary result using a counter?
My original idea is the following:
current_tmp = []
for byte in data:
    current_tmp.append(int(byte))
    if (len(current_tmp) == 4):
        print (current_tmp)
        # but current_tmp is an array not a single int
        current_tmp = []
In my example, instead of having [132, 4, 240, 215] I would rather have 0x8404f0d7
Just pass a size argument to read to get 4 bytes at a time, and use the from_bytes constructor of Python 3's int to turn them into a number:
with open('MEM_10001000_0000B000.mem', 'br') as f:
    data = f.read(4)
    while data:
        number = int.from_bytes(data, "big")
        ...
        data = f.read(4)
If you are not using Python 3 yet for some reason, int won't have a from_bytes method. In that case you can fall back on the struct module:
import struct
...
number = struct.unpack(">i", data)[0]
...
These methods, however, are fine for a few iterations but could get slow for a large file. Python offers a way to fill an array of 4-byte integers directly in memory from an open file, which is more likely what you should be using:
import array, os
numbers = array.array("i")
with open('MEM_10001000_0000B000.mem', 'br') as f:
    numbers.fromfile(f, os.stat('MEM_10001000_0000B000.mem').st_size // numbers.itemsize)
numbers.byteswap()  # file is big-endian; the swap assumes a little-endian machine
This will give you a sequence with all the numbers in your file read in as 4-byte, big-endian, signed ints.
Once you have the array, you can fold its values together with the key using XOR, with something like
from functools import reduce  # not needed in Python 2.7, where reduce is a builtin
result = reduce(lambda acc, value: acc ^ value, numbers, key)
If your file size is not a multiple of 4 bytes, the first two methods might need some adjustment; fixing the while condition will be enough.
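If the goal is to XOR each 4-byte word with the key and get bytes back (rather than folding everything into one number), a minimal Python 3 sketch could look like the following. The function name xor_words is hypothetical, and copying trailing bytes unchanged is an assumption about what the format needs:

```python
def xor_words(data: bytes, key: int) -> bytes:
    """XOR every complete big-endian 4-byte word of data with key.

    Trailing bytes that do not fill a whole word are copied unchanged;
    that policy is an assumption, pick whatever your format requires.
    """
    whole = len(data) - len(data) % 4
    out = bytearray()
    for offset in range(0, whole, 4):
        word = int.from_bytes(data[offset:offset + 4], "big")
        out += (word ^ key).to_bytes(4, "big")
    out += data[whole:]
    return bytes(out)

key = 0x9485A347
plain = bytes([132, 4, 240, 215]) + b"\x01\x02"   # 0x8404f0d7 plus two spare bytes
cipher = xor_words(plain, key)
print(cipher.hex())
```

Since XOR is its own inverse, calling xor_words again with the same key restores the original bytes.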
I'm new to Python but I need to get this project done in it. I'm using telnetlib to get some raw data from a device, and this is what the data looks like (this is only part of the output I get; the real one is about 10x bigger):
\xc2\xb2\xdd\x0f\xc2\xb2x/\xc2\xb2\x08\xb2M\xcf\xc2\xb2\xc5S\xc2\xb2\xd6[\xc2\xb2qw\xc2\xb1\xafK\xc2\xb1n+\xc2\xb2?\x83\xc2\xb1\xe3\xb7\xc2\xb0\xe8\x87\xc2\xb0\xf1\x8f\xc2\xb1x\xbf\xc2\xb1\xcbO\xc2\xb1\x98\x93\xc2\xb1\xd4\xc3\xc2\xb1\xf7\x9f\xc2\xb1\xb3\x97\xc2\xb1\xe7;\xc2\xb2\x97\xcb\xc2\xb2\xd3\xf3\xc2\xb2f\x8b\xc2\xb1\xc6\xdb\xc2\xb1\xadC\xc2\xb1t\xcf\xc2\xb1\x9c\xdf\xc2\xb1\xb7\x1b\xc2\xb1\xa3\xc2\xb1\t_\xc2\xb1v\xc3\xc2\xb1\xeb
The documentation of the device says that this is
raw data: binary. An array of float values in big-endian format (not as a string).
The question is: how can I convert this data into an array of float numbers?
The code:
import telnetlib
tn = telnetlib.Telnet(hostIP)
tn.read_until("connected")
tn.write("getData\r\n")
data = tn.read_until("\r\n")
print data
When I execute this script from the terminal I get some binary "garbage":
²\f²▒▒²▒V²▒²▒
³▒▒³u▒³:v³▒>³;>²W▒²O^²Xf²▒▒±▒▒²P▒²▒j²▒²▒³Pv³▒▒²▒n²:Z²▒±▒F±▒±7▒±#▒±t^±▒▒±▒▒²5:±▒"±▒~±ю±±*±▒°▒▒°{n°a▒°▒:°Q▒°[°cj°0▒¯▒▒¯▒▒r¯ޒ°▒°▒¯▒▒¯a▒¯▒°E▒°▒r°q*¯▒¯▒
If I do the same from the Python shell I get the \xc2\xb2\xdd\x0f\xc2... values.
You need to know the number of elements in the array in advance, or somehow infer the count, e.g. by dividing the number of bytes by the float size. You then use the struct module to unpack the binary data.
import struct
if (len(data) % 8) > 0:
    # a bare assert on a string always passes, so raise explicitly
    raise ValueError("Data length not a multiple of 8")
L = []
for i in range(0, len(data), 8):
    # '>d' is a big-endian 8-byte double; unpack returns a 1-tuple
    L.append(struct.unpack('>d', data[i:i+8])[0])
Complementing @vz0's answer, there is also struct.iter_unpack(), which the docs describe as follows:
Iteratively unpack from the buffer buffer according to the format string format.
(See the docs at https://docs.python.org/3/library/struct.html#struct.iter_unpack)
So we can convert without any trouble:
import struct
import numpy as np
# Choose operators from https://docs.python.org/3/library/struct.html#format-strings
Byte_Order = '<' # little-endian (use '>' for big-endian data, as in the question)
Format_Characters = 'f' # float (4 bytes)
data_format = Byte_Order + Format_Characters
r = np.array(list(struct.iter_unpack(data_format, data)), dtype=float)
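For the big-endian case described by the device documentation, the same idea can be sketched without numpy. This assumes 4-byte floats ('>f'); if the device sends 8-byte doubles, '>d' would be the format to pass. The function name decode_floats is made up, and the raw bytes are generated locally as a stand-in for the device output:

```python
import struct

def decode_floats(data: bytes, fmt: str = ">f") -> list:
    """Decode data as a flat list of numbers, one per `fmt` item."""
    size = struct.calcsize(fmt)
    if len(data) % size:
        raise ValueError("data length is not a multiple of %d" % size)
    # iter_unpack yields 1-tuples, so take element 0 of each.
    return [item[0] for item in struct.iter_unpack(fmt, data)]

raw = struct.pack(">3f", 1.0, -2.5, 0.25)  # stand-in for the device output
print(decode_floats(raw))
```

Taking element 0 of each tuple gives a flat list rather than a list of 1-tuples.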
I am aware that there are a lot of almost identical questions, but none seems to really target the general case.
So assume I want to open a file, read it in memory, possibly do some operations on the respective bitstring and write the result back to file.
The following is what seems straightforward to me, but it results in completely different output. Note that for simplicity I only copy the file here:
file = open('INPUT','rb')
data = file.read()
data_16 = data.encode('hex')
data_2 = bin(int(data_16,16))
OUT = open('OUTPUT','wb')
i = 0
while i < len(data_2) / 8:
    byte = int(data_2[i*8 : (i+1)*8], 2)
    OUT.write('%c' % byte)
    i += 1
OUT.close()
I looked at data, data_16 and data_2. The transformations make sense as far as I can see.
As expected, the output file has exactly the same size in bits as the input file.
EDIT: I considered the possibility that the leading '0b' has to be cut. See the following:
>>> data[:100]
'BMFU"\x00\x00\x00\x00\x006\x00\x00\x00(\x00\x00\x00\xe8\x03\x00\x00\xee\x02\x00\x00\x01\x00\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x12\x0b\x00\x00\x12\x0b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05=o\xce\xf4^\x16\xe0\x80\x92\x00\x00\x00\x01I\x02\x1d\xb5\x81\xcaN\xcb\xb8\x91\xc3\xc6T\xef\xcb\xe1j\x06\xc3;\x0c*\xb9Q\xbc\xff\xf6\xff\xff\xf7\xed\xdf'
>>> data_16[:100]
'424d46552200000000003600000028000000e8030000ee020000010018000000000000000000120b0000120b000000000000'
>>> data_2[:100]
'0b10000100100110101000110010101010010001000000000000000000000000000000000000000000011011000000000000'
>>> data_2[1]
'b'
Maybe the BMFU" part should be cut from data?
>>> bin(25)
'0b11001'
Note two things:
The "0b" at the beginning. This means that your slicing will be off by 2 bits.
The lack of padding to 8 bits. This will corrupt your data every time unless it happens to mesh up with point 1.
Process the file byte by byte instead of attempting to process it in one big gulp like this. If you find your code too slow then you need to find a faster way of working byte by byte, not switch to an irreparably flawed method such as this one.
You could simply write the data variable back out and you'd have a successful round trip.
But it looks like you intend to work on the file as a string of 0 and 1 characters. Nothing wrong with that (though it's rarely necessary), but your code takes a very roundabout way of converting the data to that form. Instead of building a monster integer and converting it to a bit string, just do so for one byte at a time:
data = file.read()
data_2 = "".join( bin(ord(c))[2:].zfill(8) for c in data )
data_2 is now a sequence of zeros and ones, with every byte padded to 8 digits. (In a single string, same as you have it; but if you'll be making changes, I'd keep the bitstrings in a list). The reverse conversion is also best done byte by byte:
newdata = "".join(chr(int("".join(byte), 2)) for byte in grouper(long_bitstring, 8, "0"))
This uses the grouper recipe from the itertools documentation.
from itertools import izip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
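The code in this thread is Python 2. For comparison, here is a minimal Python 3 sketch of the same byte-by-byte round trip, where iterating over a bytes object already yields integers. The helper names to_bits and from_bits are made up for illustration:

```python
def to_bits(data: bytes) -> str:
    """Render each byte as exactly 8 binary digits."""
    return "".join(format(byte, "08b") for byte in data)

def from_bits(bits: str) -> bytes:
    """Inverse of to_bits; the bit count must be a multiple of 8."""
    if len(bits) % 8:
        raise ValueError("bit string length is not a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

data = b"BM\x00\x19"
bits = to_bits(data)
print(bits)
```

The "08b" format handles the zero padding, so small byte values such as 0x00 and 0x19 keep their full 8 digits and the round trip is lossless.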
You can use the struct module to read and write binary data. (Link to the doc here.)
EDIT
Sorry, I was misled by your title. I've just understood that you write binary data in a text file instead of writing binary data directly.
Ok, thanks to alexis and being aware of Ignacio's warning about the padding, I found a way to do what I wanted to do, that is read data into a binary representation and write a binary representation to file:
def padd(bitstring):
    padding = ''
    for i in range(8-len(bitstring)):
        padding += '0'
    bitstring = padding + bitstring
    return bitstring
file = open('INPUT','rb')
data = file.read()
data_2 = "".join( padd(bin(ord(c))[2:]) for c in data )
OUT = open('OUTPUT','wb')
i = 0
while i < len(data_2) / 8:
    byte = int(data_2[i*8 : (i+1)*8], 2)
    OUT.write('%c' % byte)
    i += 1
OUT.close()
If I did not do it exactly the way proposed by alexis then that is because it did not work. Of course this is terribly slow but now that I can do the simplest thing, I can optimize it further.
I have a binary output file from a FORTRAN code. Want to read it in python. (Reading with FORTRAN and outputting text to read for python is not an option. Long story.) I can read the first record in a simplistic manner:
>>> binfile=open('myfile','rb')
>>> pad1=struct.unpack('i',binfile.read(4))[0]
>>> ver=struct.unpack('d',binfile.read(8))[0]
>>> pad2=struct.unpack('i',binfile.read(4))[0]
>>> pad1,ver,pad2
(8,3.13,8)
Just fine. But this is a big file and I need to do this more efficiently. So I try:
>>> (pad1,ver,pad2)=struct.unpack('idi',binfile.read(16))
This won't run. Gives me an error and tells me that unpack needs an argument with a length of 20. This makes no sense to me since the last time I checked, 4+8+4=16. When I give in and replace the 16 with 20, it runs, but the three numbers are populated with numerical junk. Does anyone see what I am doing wrong? Thanks!
The size you get is due to alignment, try struct.calcsize('idi') to verify the size is actually 20 after alignment. To use the native byte-order without alignment, specify struct.calcsize('=idi') and adapt it to your example.
For more info on the struct module, check http://docs.python.org/2/library/struct.html
The struct module is mainly intended to interoperate with C structures and because of this it aligns the data members. idi corresponds to the following C structure:
struct
{
    int int1;
    double double1;
    int int2;
};
double entries require 8-byte alignment in order to function efficiently (or even correctly) with most CPU load operations. That's why 4 bytes of padding are added between int1 and double1, which brings the size computed by the struct module to 20 bytes. (A C compiler would typically also pad the end of the structure; the struct module does not add trailing padding.) The struct module inserts this padding unless you suppress it by adding < (on little endian machines) or > (on big endian machines), or simply = at the beginning of the format string:
>>> struct.unpack('idi', d)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
struct.error: unpack requires a string argument of length 20
>>> struct.unpack('<idi', d)
(-1345385859, 2038.0682530887993, 428226400)
>>> struct.unpack('=idi', d)
(-1345385859, 2038.0682530887993, 428226400)
(d is a string of 16 random chars.)
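The size difference can be checked directly with struct.calcsize. A small sketch, where the packed record is generated locally to mimic the first record from the question (marker 8, value 3.13, marker 8); the native size is printed rather than assumed, since it depends on the platform:

```python
import struct

native = struct.calcsize("idi")     # native sizes with alignment padding
packed = struct.calcsize("=idi")    # standard sizes, no padding
print(native, packed)

# Stand-in for the 16 bytes read from the file.
record = struct.pack("=idi", 8, 3.13, 8)
pad1, ver, pad2 = struct.unpack("=idi", record)
print(pad1, ver, pad2)
```

With the "=" prefix the format always describes 4 + 8 + 4 = 16 bytes, which matches what binfile.read(16) returns.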
I recommend using arrays to read a file that was written by FORTRAN with UNFORMATTED, SEQUENTIAL.
Your specific example using arrays, would be as follows:
import array
binfile=open('myfile','rb')
pad = array.array('i')
ver = array.array('d')
pad.fromfile(binfile,1) # read the length of the record
ver.fromfile(binfile,1) # read the actual data written by FORTRAN
pad.fromfile(binfile,1) # read the length of the record
If you have FORTRAN records that write arrays of integers and doubles, which is very common, your python would look something like this:
import array
binfile=open('myfile','rb')
pad = array.array('i')
my_integers = array.array('i')
my_floats = array.array('d')
number_of_integers = 1000 # replace with how many you need to read
number_of_floats = 10000 # replace with how many you need to read
pad.fromfile(binfile,1) # read the length of the record
my_integers.fromfile(binfile,number_of_integers) # read the integer data
my_floats.fromfile(binfile,number_of_floats) # read the double data
pad.fromfile(binfile,1) # read the length of the record
A final comment: if you have characters in the file, you can read those into an array as well, and then decode it into a string. Something like this:
import array
binfile=open('myfile','rb')
pad = array.array('i')
my_characters = array.array('B')
number_of_characters = 63 # replace with number of characters to read
pad.fromfile(binfile,1) # read the length of the record
my_characters.fromfile(binfile,number_of_characters ) # read the data
my_string = my_characters.tobytes().decode(encoding='utf_8')
pad.fromfile(binfile,1) # read the length of the record
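The leading and trailing length markers can also be used to read a record without knowing its size up front. A minimal sketch, assuming 4-byte little-endian markers (both the marker size and the byte order depend on the compiler and platform); read_record is a hypothetical helper, and an in-memory buffer stands in for the real file:

```python
import io
import struct

def read_record(f, marker_fmt="<i"):
    """Return the payload of the next record, or None at end of file.

    Assumes each record is framed by a leading and a trailing length
    marker of the same format; that is an assumption about the writer.
    """
    marker_size = struct.calcsize(marker_fmt)
    head = f.read(marker_size)
    if not head:
        return None
    (length,) = struct.unpack(marker_fmt, head)
    payload = f.read(length)
    (tail,) = struct.unpack(marker_fmt, f.read(marker_size))
    if tail != length or len(payload) != length:
        raise ValueError("record markers do not match")
    return payload

# Stand-in for the file: one record holding a single double (3.13).
body = struct.pack("<d", 3.13)
fake_file = io.BytesIO(struct.pack("<i", 8) + body + struct.pack("<i", 8))
payload = read_record(fake_file)
print(struct.unpack("<d", payload)[0])
```

Comparing the two markers is a cheap sanity check that the marker size and byte order were guessed correctly.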