How floats are stored in Python

I have been using Python for my assignments for the past few days, and I noticed something strange:
When I convert a string to a float, it prints exactly the same digits that were in the string.
When I write this number to a file using struct.pack() as a 4-byte float and read it back using struct.unpack(), I get a number that is not exactly the same but a longer string of digits, which is what I would expect from floating-point storage.
Example:
String - 0.931973
Float num - 0.931973
From file - 0.931972980499 (after struct pack and unpack into 4 bytes)
So I am unable to understand how Python actually stored my number when I first read it from the string.
EDIT
Writing the float (I think in Python 2.7 on Ubuntu it's the other way around: d is double and f is float):
    buf = struct.pack("f", float(self.dataArray[i]))
    fout.write(buf)
Query:
    buf = struct.pack("f", dataPoint)
    dataPoint = struct.unpack("f", buf)[0]
    node = root
    while(node.isBPlusNodeLeaf()) == False:
        node = node.findNextNode(dataPoint)
findNextNode:
    def findNextNode(self, num):
        i = 0
        for d in self.dataArray:
            if float(num) > float(d):
                i = i + 1
                continue
            else:
                break
        ptr = self.pointerArray[i]
        # open the node before passing on the pointer to it
        out, tptr = self.isNodeAlive(ptr)
        if out == False:
            node = BPlusNode(name = ptr)
            node.readBPlusNode(ptr)
            return node
        else:
            return BPlusNode.allNodes[tptr]
Once I reach a leaf, it reads the leaf and checks whether the data point exists there:
    for data in node.dataArray:
        if data == dataPoint:
            return True
    return False
In this case it reports an unsuccessful search for the data point 0.931972980499, even though it is there.
The following code works fine, however:
    for data in node.dataArray:
        if round(float(data), 6) == dataPoint:
            return True
    return False
I am not able to understand why this is happening.

float in Python is actually what C programmers call double, i.e. it is 64 bits (or perhaps even wider on some platforms). So when you store it in 4 bytes (32 bits), you lose precision: the value is rounded to the nearest 32-bit float, and that slightly different value is what you get back from unpack.
If you use the d format instead of f, you should see the results you expect:
    >>> struct.unpack('d', struct.pack('d', float('0.931973')))
    (0.931973,)
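To make the precision loss concrete, here is a minimal sketch that round-trips the number from the question through both formats. The helper name to_float32 is made up for illustration. It also shows two possible ways of comparing against values that were stored as 4-byte floats: narrowing the query the same way, or rounding the stored value as the question does.

```python
import struct

text = "0.931973"
as_double = float(text)  # Python floats are 64-bit doubles

# Round-trip through 8 bytes ("d") and through 4 bytes ("f").
via_d = struct.unpack("d", struct.pack("d", as_double))[0]
via_f = struct.unpack("f", struct.pack("f", as_double))[0]

print(repr(via_d))  # the same double comes back
print(repr(via_f))  # the nearest 32-bit value, widened back to a double


def to_float32(x):
    """Round x to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Option 1: narrow the query value exactly like the stored data was narrowed.
same_after_narrowing = to_float32(as_double) == via_f
# Option 2: round the stored value back to the digits of the original string.
same_after_rounding = round(via_f, 6) == as_double
print(same_after_narrowing, same_after_rounding)
```

Whichever option is used, both sides of the == have to go through the same conversion. Comparing a 64-bit value directly with one that has passed through 32 bits will usually fail.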

Related

Python C# datatypes for Sockets

I am creating a socket server to connect and speak with a C# program over TCP. Currently I am trying to create a way to convert the bytes sent over the TCP socket to specific variables (the variable type will be in the packet header, and yes, I do know TCP is a stream and is not technically sending packets, but I am designing it like this). Currently I have all of the C# integral numeric types converting to and from bytearrays and integers correctly via the code below (all of the different types are the same, with a couple of edits to fit the C# type):
    ## SBYTE Type Class definition
    ## C#/Unity "sbyte" can be from -128 to 127
    ##
    ## Usage:
    ##
    ## Constructor
    ## variable = sbyte(integer)
    ## variable = sbyte(bytearray)
    ##
    ## Variables
    ## sbyte.integer (Returns integer representation)
    ## sbyte.bytes (Returns bytearray representation)
    class sbyte:
        def __init__(self, input):
            if type(input) == type(int()):
                self.integer = input
                self.bytes = self.__toBytes(input)
            elif type(input) == type(bytearray()):
                self.bytes = input
                self.integer = self.__toInt(input)
            else:
                raise TypeError(f"sbyte constructor can take integer or bytearray type not {type(input)}")

        ## Return Integer from Bytes Array
        def __toInt(self, byteArray):
            ## Check that there is only 1 byte
            if len(byteArray) != 1:
                raise OverflowError(f"sbyte.__toInt length can only be 1 byte not {len(byteArray)} bytes")
            ## Return signed integer
            return int.from_bytes(byteArray, byteorder='little', signed=True)

        ## Return Bytes Array from Integer
        def __toBytes(self, integer):
            ## Check that the passed integer is not larger than 127 and is not smaller than -128
            if integer > 127 or integer < -128:
                raise ValueError(f"sbyte.__toBytes can only take an integer less than or equal to 127, and greater than or equal to -128, not \"{integer}\"")
            ## Convert the passed integer to Bytes
            return integer.to_bytes(1, byteorder='little', signed=True)
This is working for all the types I have currently implemented, but I do wonder if there is a better way to handle this, such as using ctypes or some other Python library. Since this will be a socket server with potentially many connections, handling this as fast as possible is best. If there is anything else you see that I can improve, I would love to know.
If all you want is an integer value from a byte array, simply index the byte array (this gives the unsigned value, 0 to 255):
    >>> b = bytearray.fromhex('1E')
    >>> b[0]
    30
After testing the differences between from_bytes, struct.unpack, and numpy.frombuffer with the following code:
    import timeit

    setup1 = """\
    byteArray = bytearray.fromhex('1E')
    """
    setup2 = """\
    import struct
    byteArray = bytearray.fromhex('1E')
    """
    setup3 = """\
    import numpy as np
    type = np.dtype(np.byte)
    byteArray = bytearray.fromhex('1E')
    """
    stmt1 = "int.from_bytes(byteArray, byteorder='little', signed=True)"
    stmt2 = "struct.unpack('b', byteArray)"
    stmt3 = "np.frombuffer(byteArray, type)"
    print(f"First statement execution time = {timeit.timeit(stmt=stmt1, setup=setup1, number=10**8)}")
    print(f"Second statement execution time = {timeit.timeit(stmt=stmt2, setup=setup2, number=10**8)}")
    print(f"Third statement execution time = {timeit.timeit(stmt=stmt3, setup=setup3, number=10**8)}")
Results:
    First statement execution time = 14.456886599999999
    Second statement execution time = 6.671141799999999
    Third statement execution time = 21.8327342
From the initial results it looks like struct is the fastest way to accomplish this, unless there are other libraries I am missing.
EDIT:
Per AKX's suggestion I added the following test for a signed byte:
    stmt4 = """\
    if byteArray[0] <= 127:
        byteArray[0]
    else:
        byteArray[0] - 256
    """
and got the following execution time:
    Fourth statement execution time = 4.581732600000002
This route is the fastest, although only slightly faster than using struct. I will have to test each type to find the fastest way to convert to and from bytes, but this question gave me 4 different ways to test for each one now. Thanks!
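As a sanity check on the approaches timed above, the following sketch runs all three over every possible byte value and confirms that they agree. The function names are made up for illustration. The one detail worth checking is that mapping an unsigned byte (0 to 255) to a signed one (-128 to 127) means subtracting 256 from any value above 127.

```python
import struct


def sbyte_from_index(buf):
    """Index the buffer, then fold 128..255 down to -128..-1."""
    value = buf[0]
    return value if value <= 127 else value - 256


def sbyte_from_struct(buf):
    """Let struct interpret the single byte as a signed char ('b')."""
    return struct.unpack("b", buf)[0]


def sbyte_from_int(buf):
    """Use int.from_bytes with signed=True, as in the question."""
    return int.from_bytes(buf, byteorder="little", signed=True)


# All three conversions should agree for every possible byte value.
for raw in range(256):
    buf = bytearray([raw])
    assert sbyte_from_index(buf) == sbyte_from_struct(buf) == sbyte_from_int(buf)

print(sbyte_from_index(bytearray.fromhex("1E")))
print(sbyte_from_index(bytearray.fromhex("FF")))
```

Running an exhaustive check like this before benchmarking is cheap for 1-byte types, and it catches off-by-one mistakes in the hand-written branch.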

Reading binary file into different hex "types" (8bit, 16bit, 32bit, ...)

I have a file which contains binary data. The content of this file is just one long line.
Example: 010101000011101010101
Originally the content was an array of C++ objects with the following data types:
    // Careful: pseudocode, just for visualisation
    int64 var1;
    int32 var2[50];
    int08 var3;
I want to skip var1 and var3 and extract only the values of var2 as readable decimal values. My idea was to read the file byte by byte and convert the bytes into hex values. In the next step I thought I could "combine" (append) 4 of those hex values to get one int32 value.
Example: 0x10 0xAA 0x00 0x50 -> 0x10AA0050
My code so far:
    def append_hex(a, b):
        return (a << 4) | b

    with open("file.dat", "rb") as f:
        counter = 0
        tickdifcounter = 0
        current_byte=" "
        while True:
            if (counter >= 8) and (counter < 208):
                tickdifcounter+=1
                if (tickdifcounter <= 4):
                    current_byte = append_hex(current_byte, f.read(1))
                    if (not current_byte):
                        break
                    val = ord(current_byte)
                if (tickdifcounter > 4):
                    print hex(val)
                    tickdifcounter = 0
                    current_byte=""
            counter+=1
            if(counter == 209): #209 bytes = int64 + (int32*50) + int08
                counter = 0
                print
Now I have the problem that my append_hex is not working: the variables are strings, so the bit shift does not work.
I am new to Python, so please give me hints where I can do something in a better way.
You can use the struct module for reading binary files.
This question may help you: "Reading a binary file into a struct in Python".
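A minimal sketch of the struct approach for the 209-byte record described in the question. The byte order and the absence of padding are assumptions, since the question does not say how the C++ objects were written; the "<" prefix selects little-endian with no alignment padding. An in-memory io.BytesIO stands in for the real file here.

```python
import io
import struct

# Assumed layout: int64, 50 x int32, int8, little-endian, no padding bytes.
RECORD = struct.Struct("<q50ib")


def read_var2(stream):
    """Yield the 50 int32 values of var2 from each 209-byte record."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of file (or a truncated trailing record)
        fields = RECORD.unpack(chunk)
        yield list(fields[1:51])  # skip var1 (index 0) and var3 (index 51)


# Stand-in for open("file.dat", "rb"): two records built in memory.
sample = io.BytesIO(
    RECORD.pack(*([1] + list(range(50)) + [7]))
    + RECORD.pack(*([2] + list(range(100, 150)) + [-1]))
)
records = list(read_var2(sample))
print(records[0][:4])
```

If the file was written on a big-endian machine, ">" replaces "<". If the writer dumped the structs with compiler padding, the record is larger than 209 bytes and the format string needs explicit pad bytes ("x").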
A character can be converted to an int using the ord(x) function. To get the integer value of a multi-byte number, shift each byte left by its position. Note the parentheses: in Python, + binds more tightly than <<. For example, from an earlier project:
    def parseNumber(string, index):
        return (ord(string[index]) << 24) + (ord(string[index+1]) << 16) + \
               (ord(string[index+2]) << 8) + ord(string[index+3])
Note that this code assumes big-endian data; you will need to reverse the indices to parse little-endian data.
If you know exactly what the size of the struct is going to be (or can easily calculate it based on the size of the file), you are probably better off using the "struct" module.
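For comparison, here is a small Python 3 sketch that combines the four example bytes from the question by hand and with the standard library. The function name parse_u32_be is made up for illustration. In Python 3, indexing a bytes object already yields integers, so ord() is not needed.

```python
import struct

raw = bytes([0x10, 0xAA, 0x00, 0x50])  # the example bytes from the question


def parse_u32_be(data, index=0):
    """Combine four bytes, most significant first, into one unsigned int."""
    b = data[index:index + 4]
    return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]


manual = parse_u32_be(raw)
with_struct = struct.unpack(">I", raw)[0]      # ">" means big-endian
with_int = int.from_bytes(raw, byteorder="big")
little = struct.unpack("<I", raw)[0]           # same bytes read little-endian

print(hex(manual), hex(with_struct), hex(with_int), hex(little))
```

For signed int32 values, as in the question's var2, the struct format character would be "i" rather than "I".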

Implementing DES in Python, cannot understand part of the code

I'm learning how to write the code for DES encryption in Python. I came across this code on Github (link: https://github.com/RobinDavid/pydes/blob/master/pydes.py) but I'm not able to understand a part of the code. (See line 123 in the Github code, also given below:)
    def binvalue(val, bitsize): #Return the binary value as a string of the given size
        binval = bin(val)[2:] if isinstance(val, int) else bin(ord(val))[2:] # this is line 124 I'm not getting
        if len(binval) > bitsize:
            raise "binary value larger than the expected size"
        while len(binval) < bitsize:
            binval = "0"+binval #Add as many 0 as needed to get the wanted size
        return binval
I understand what the function does (as written: #Return the binary value as a string of the given size) but I don't understand how it does it, I don't understand line 124. Thanks for answering.
    binval = bin(val)[2:] if isinstance(val, int) else bin(ord(val))[2:]
This line is a conditional (ternary) expression. It returns the binary representation of val if val is an integer; otherwise it does the same with the character code of val, obtained via ord().
This is one way (among others) to be compatible with both Python 2 and Python 3.
In Python 3, val is an integer when it comes from a bytes object, whereas in Python 2 val is a length-1 string taken from a str, because Python 2 makes no distinction between binary data and strings.
In a nutshell, this is a portable way of converting a byte/character to its binary representation as a string.
Note that the padding loop
    while len(binval) < bitsize:
        binval = "0"+binval
could be replaced by binval = binval.zfill(bitsize).
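Putting these remarks together, here is one possible sketch of the function using zfill. It is a variation on the code discussed above, not the repository's actual code, and it raises ValueError where the original raises a string (which is not a valid exception).

```python
def binvalue(val, bitsize):
    """Return val as a binary string, left-padded with zeros to bitsize digits."""
    # Integers (e.g. items of a Python 3 bytes object) are used directly;
    # one-character strings are converted to their character code first.
    number = val if isinstance(val, int) else ord(val)
    binval = bin(number)[2:]  # strip the "0b" prefix
    if len(binval) > bitsize:
        raise ValueError("binary value larger than the expected size")
    return binval.zfill(bitsize)


print(binvalue(5, 8))
print(binvalue("A", 8))
print([binvalue(b, 8) for b in b"AB"])
```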

Using strings and byte-like objects compatibly in code to run in both Python 2 & 3

I'm trying to modify the code shown far below, which works in Python 2.7.x, so it will also work unchanged in Python 3.x. However I'm encountering the following problem I can't solve in the first function, bin_to_float() as shown by the output below:
    float_to_bin(0.000000): '0'
    Traceback (most recent call last):
      File "binary-to-a-float-number.py", line 36, in <module>
        float = bin_to_float(binary)
      File "binary-to-a-float-number.py", line 9, in bin_to_float
        return struct.unpack('>d', bf)[0]
    TypeError: a bytes-like object is required, not 'str'
I tried to fix that by adding a bf = bytes(bf) right before the call to struct.unpack(), but doing so produced its own TypeError:
    TypeError: string argument without an encoding
So my questions are: is it possible to fix this issue and achieve my goal? And if so, how? Preferably in a way that would work in both versions of Python.
Here's the code that works in Python 2:
    import struct

    def bin_to_float(b):
        """ Convert binary string to a float. """
        bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
        return struct.unpack('>d', bf)[0]

    def int_to_bytes(n, minlen=0):  # helper function
        """ Int/long to byte string. """
        nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
        nbytes = (nbits+7) // 8  # number of whole bytes
        bytes = []
        for _ in range(nbytes):
            bytes.append(chr(n & 0xff))
            n >>= 8
        if minlen > 0 and len(bytes) < minlen:  # zero pad?
            bytes.extend((minlen-len(bytes)) * '0')
        return ''.join(reversed(bytes))  # high bytes at beginning

    # tests

    def float_to_bin(f):
        """ Convert a float into a binary string. """
        ba = struct.pack('>d', f)
        ba = bytearray(ba)
        s = ''.join('{:08b}'.format(b) for b in ba)
        s = s.lstrip('0')  # strip leading zeros
        return s if s else '0'  # but leave at least one

    for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
        binary = float_to_bin(f)
        print('float_to_bin(%f): %r' % (f, binary))
        float = bin_to_float(binary)
        print('bin_to_float(%r): %f' % (binary, float))
        print('')
To write portable code that handles bytes in both Python 2 and 3, using libraries that expect different data types in the two versions, you need to declare the type of every string explicitly with the appropriate literal prefix (or add from __future__ import unicode_literals to the top of every module that does this). This step ensures that the data types inside your code are correct.
Secondly, decide to target Python 3 going forward, with fallbacks specific to Python 2. This means overriding str with unicode, and replacing any methods/functions that do not return the same type in both Python versions with ones that return the correct type (that is, the Python 3 type). Do note that bytes is the name of a built-in type, so don't use it as a variable name.
Putting this together, your code will look something like this:
    import struct
    import sys

    if sys.version_info < (3, 0):
        str = unicode
        chr = unichr

    def bin_to_float(b):
        """ Convert binary string to a float. """
        bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
        return struct.unpack(b'>d', bf)[0]

    def int_to_bytes(n, minlen=0):  # helper function
        """ Int/long to byte string. """
        nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
        nbytes = (nbits+7) // 8  # number of whole bytes
        ba = bytearray(b'')
        for _ in range(nbytes):
            ba.append(n & 0xff)
            n >>= 8
        if minlen > 0 and len(ba) < minlen:  # zero pad with NUL bytes, not the character '0'
            ba.extend((minlen-len(ba)) * b'\x00')
        return u''.join(str(chr(b)) for b in reversed(ba)).encode('latin1')  # high bytes at beginning

    # tests

    def float_to_bin(f):
        """ Convert a float into a binary string. """
        ba = struct.pack(b'>d', f)
        ba = bytearray(ba)
        s = u''.join(u'{:08b}'.format(b) for b in ba)
        s = s.lstrip(u'0')  # strip leading zeros
        return (s if s else u'0').encode('latin1')  # but leave at least one

    for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
        binary = float_to_bin(f)
        print(u'float_to_bin(%f): %r' % (f, binary))
        float = bin_to_float(binary)
        print(u'bin_to_float(%r): %f' % (binary, float))
        print(u'')
I used the latin1 codec simply because it maps code points 0-255 one-to-one onto byte values, and it seems to work:
$ python2 foo.py
float_to_bin(0.000000): '0'
bin_to_float('0'): 0.000000
float_to_bin(1.000000): '11111111110000000000000000000000000000000000000000000000000000'
bin_to_float('11111111110000000000000000000000000000000000000000000000000000'): 1.000000
float_to_bin(-14.000000): '1100000000101100000000000000000000000000000000000000000000000000'
bin_to_float('1100000000101100000000000000000000000000000000000000000000000000'): -14.000000
float_to_bin(12.546000): '100000000101001000101111000110101001111110111110011101101100100'
bin_to_float('100000000101001000101111000110101001111110111110011101101100100'): 12.546000
float_to_bin(3.141593): '100000000001001001000011111101110000010110000101011110101111111'
bin_to_float('100000000001001001000011111101110000010110000101011110101111111'): 3.141593
Again, but this time under Python 3.5:
$ python3 foo.py
float_to_bin(0.000000): b'0'
bin_to_float(b'0'): 0.000000
float_to_bin(1.000000): b'11111111110000000000000000000000000000000000000000000000000000'
bin_to_float(b'11111111110000000000000000000000000000000000000000000000000000'): 1.000000
float_to_bin(-14.000000): b'1100000000101100000000000000000000000000000000000000000000000000'
bin_to_float(b'1100000000101100000000000000000000000000000000000000000000000000'): -14.000000
float_to_bin(12.546000): b'100000000101001000101111000110101001111110111110011101101100100'
bin_to_float(b'100000000101001000101111000110101001111110111110011101101100100'): 12.546000
float_to_bin(3.141593): b'100000000001001001000011111101110000010110000101011110101111111'
bin_to_float(b'100000000001001001000011111101110000010110000101011110101111111'): 3.141593
It's a lot more work, but in Python 3 you can see more clearly that the types are handled as proper bytes. I also changed your bytes = [] to a bytearray to express more clearly what you were trying to do.
I took a different approach from @metatoaster's answer. I just modified int_to_bytes to use and return a bytearray:
    def int_to_bytes(n, minlen=0):  # helper function
        """ Int/long to byte string. """
        nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
        nbytes = (nbits+7) // 8  # number of whole bytes
        b = bytearray()
        for _ in range(nbytes):
            b.append(n & 0xff)
            n >>= 8
        if minlen > 0 and len(b) < minlen:  # zero pad?
            b.extend([0] * (minlen-len(b)))
        return bytearray(reversed(b))  # high bytes at beginning
This seems to work without any other modifications under both Python 2.7.11 and Python 3.5.1.
Note that I zero padded with 0 instead of '0'. I didn't do much testing, but surely that's what you meant?
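The difference between padding with 0 and with '0' is easy to miss, because the test loop prints with %f. A short sketch of what each padding decodes to, using only struct (the variable names are made up for illustration):

```python
import struct

ascii_padding = b"0" * 8      # eight bytes of 0x30, the character '0'
null_padding = b"\x00" * 8    # eight bytes of 0x00

from_ascii = struct.unpack(">d", ascii_padding)[0]
from_null = struct.unpack(">d", null_padding)[0]

print(repr(from_null))    # a true zero
print(repr(from_ascii))   # a tiny positive number, not zero
print("%f" % from_ascii)  # %f hides the difference
```

So the 0.0 test case appears to pass with either padding when formatted with %f, but only the NUL-byte padding reproduces the original value exactly.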
In Python 3, integers have a to_bytes() method that can perform the conversion in a single call. However, since you asked for a solution that works on Python 2 and 3 unmodified, here's an alternative approach.
If you take a detour via hexadecimal representation, the function int_to_bytes() becomes very simple:
    import codecs

    def int_to_bytes(n, minlen=0):
        hex_str = format(n, "0{}x".format(2 * minlen))
        return codecs.decode(hex_str, "hex")
You might need some special case handling to deal with the case when the hex string gets an odd number of characters.
Note that I'm not sure this works with all versions of Python 3. I remember that pseudo-encodings weren't supported in some 3.x version, but I don't remember the details. I tested the code with Python 3.5.
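For readers who only need Python 3, the to_bytes() method mentioned above removes the helper function entirely. A minimal Python 3 only sketch follows; it does not run on Python 2, so it does not meet the question's portability requirement.

```python
import struct


def float_to_bin(f):
    """Return the IEEE 754 binary64 bits of f without leading zeros."""
    packed = struct.pack(">d", f)
    bits = "".join("{:08b}".format(b) for b in packed)
    return bits.lstrip("0") or "0"  # leave at least one digit


def bin_to_float(b):
    """Inverse of float_to_bin: parse the bit string back into a float."""
    raw = int(b, 2).to_bytes(8, byteorder="big")  # always 8 bytes, NUL-padded
    return struct.unpack(">d", raw)[0]


for value in (0.0, 1.0, -14.0, 12.546, 3.141593):
    binary = float_to_bin(value)
    print("%r -> %r -> %r" % (value, binary, bin_to_float(binary)))
```

Because to_bytes(8, ...) always produces exactly eight bytes, the zero-padding question does not arise here.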

How can I convert a byte array to an integer more elegantly in Python

I'm receiving a byte array via serial communication and converting part of the byte array to an integer. The code is as follows:
    data = conn.recv(40)
    print(data)
    command = data[0:7]
    if(command == b'FORWARD' and data[7] == 3):
        value = 0
        counter = 8
        while (data[counter] != 4):
            value = value * 10 + int(data[counter] - 48)
            counter = counter + 1
In short, I unpack the bytearray data starting at location 8 and going until I hit a delimiter of b'\x03'. So I'm unpacking an integer of 1 to 3 digits, and putting the numeric value into value.
This brute force method works. But is there a more elegant way to do it in Python? I'm new to the language and would like to learn better ways of doing some of these things.
You can find the delimiter, decode the digits before it to a string, and then convert that to an int. Here's a little function to do that:
    def intToDelim( ba, delim ):
        i=ba.find( delim )
        return int(ba[0:i].decode('ascii'))
which you can invoke with
    value = intToDelim( data[8:], b'\x04' )
(or with b'\x03' if that's your delimiter). Decoding explicitly, rather than calling str() on the slice, matters in Python 3, where str() of a bytes object returns its repr (such as "b'120'"), which int() rejects.
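A self-contained Python 3 sketch of the same idea, applied to a made-up sample frame. The function name parse_forward and the frame contents are assumptions for illustration; the marker bytes 3 and 4 are taken from the code in the question. In Python 3, int() accepts ASCII digits in a bytes object directly.

```python
def parse_forward(data):
    """Return the integer after b'FORWARD\\x03', up to the b'\\x04' terminator.

    Returns None if the frame does not match the expected layout.
    """
    if data[0:7] != b"FORWARD" or data[7:8] != b"\x03":
        return None
    end = data.find(b"\x04", 8)  # search for the terminator from offset 8
    if end == -1:
        return None
    return int(data[8:end])  # int() parses ASCII digits in bytes


frame = b"FORWARD\x03120\x04"  # made-up sample frame
print(parse_forward(frame))
```

Slicing with data[7:8] instead of indexing data[7] avoids an IndexError on frames shorter than eight bytes. A frame with no digits between the markers still raises ValueError from int().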
