Prevent Rounding to Zero in Python

I have a program meant to approximate pi using the Chudnovsky Algorithm, but a term in my equation that is very small keeps being rounded to zero.
Here is the algorithm:
import math
from decimal import *
getcontext().prec = 100
pi = Decimal(0.0)
C = Decimal(12/(math.sqrt(640320**3)))
k = 0
x = Decimal(0.0)
result = Decimal(0.0)
sign = 1
while k < 10:
    r = Decimal(math.factorial(6*k)/((math.factorial(k)**3)*math.factorial(3*k)))
    s = Decimal((13591409+545140134*k)/((640320**3)**k))
    x += Decimal(sign*r*s)
    sign = sign*(-1)
    k += 1
result = Decimal(C*x)
pi = Decimal(1/result)
print Decimal(pi)
The equations may be clearer without the Decimal wrappers.
import math
pi = 0.0
C = 12/(math.sqrt(640320**3))
k = 0
x = 0.0
result = 0.0
sign = 1
while k < 10:
    r = math.factorial(6*k)/((math.factorial(k)**3)*math.factorial(3*k))
    s = (13591409+545140134*k)/((640320**3)**k)
    x += sign*r*s
    sign = sign*(-1)
    k += 1
result = C*x
pi = 1/result
print pi
The issue is with the "s" variable. For k>0 it always comes out to zero; e.g. at k=1, s should be about 2.1e-9, but instead it is just zero. Because of this, all of my terms after the first are 0. How do I get Python to calculate the exact value of s instead of rounding it down to 0?

Try:
s = Decimal((13591409+545140134*k)) / Decimal(((640320**3)**k))
The arithmetic you're doing is native Python; by letting the Decimal objects perform the division, you eliminate the truncation.
You can do the same when computing r.
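For example, a sketch of both terms with Decimal performing the divisions (illustrative, with k fixed at 1 rather than the full loop):
import math
from decimal import Decimal, getcontext
getcontext().prec = 100

k = 1
# Decimal / Decimal avoids both integer truncation and 53-bit float rounding
r = Decimal(math.factorial(6*k)) / (Decimal(math.factorial(k))**3 * Decimal(math.factorial(3*k)))
s = Decimal(13591409 + 545140134*k) / Decimal((640320**3)**k)
print(s)  # roughly 2.1E-9 instead of 0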

A couple of comments.
If you are using Python 2.x, / between two integers returns an integer result. If you want a Decimal result, convert at least one side to Decimal first.
math.sqrt() only returns ~16 digits of precision. Since your value for C will only be accurate to ~16 digits, your final result will only be accurate to ~16 digits.
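One way to keep C at full precision is to let Decimal take the square root too; Decimal.sqrt() honors the context precision (a sketch, reusing the 100-digit context from the question):
from decimal import Decimal, getcontext
getcontext().prec = 100
C = 12 / Decimal(640320**3).sqrt()  # good to ~100 digits instead of ~16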

If you're doing maths in Python 2.x, you should probably be putting this line into every module:
from __future__ import division
This changes the meaning of the division operator so that it will return a floating point number if needed to give a (closer to) precise answer. The historical behaviour is for x / y to return an int if both x and y are ints, which usually forces the answer to be rounded down.
Returning a float if necessary is generally regarded as a better way to handle division in a language like Python where duck typing is encouraged, since you can just worry about the value of your numbers rather than getting different behaviour for different types.
In Python 3 this is in fact the default, but since old programs relied on the historical behaviour of the division operator it was felt the change was too backwards-incompatible to be made in Python 2. This is why you have to explicitly turn it on with the __future__ import. I would recommend always adding that import in any module that might be doing any mathematics (or just any module at all, if you can be bothered). You'll almost never be upset that it's there, but not having it there has been the cause of a number of obscure bugs I've had to chase.
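A two-line demonstration (on Python 3 the import is a no-op, since true division is already the default):
from __future__ import division
print(1 / 2)   # 0.5 -- true division
print(1 // 2)  # 0   -- floor division is still spelled //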

I feel that the problem with s is that all the terms are integers, so you are doing integer maths. A very simple workaround is to write the exponent as 3.0 in the denominator: it only takes one float in the calculation to get a float back.
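For illustration (a sketch; note this caps the result at float's ~16 significant digits, so it fixes the zero but is not sufficient for a 100-digit pi):
k = 1
s_int = (13591409 + 545140134*k) / ((640320**3)**k)      # 0 on Python 2: integer division
s_float = (13591409 + 545140134*k) / ((640320**3.0)**k)  # ~2.1e-09: one float operand promotes the result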

Related

Python integer division gives wrong result

I'm not sure if this is a bug or if I'm just misunderstanding how integer division is supposed to work.
Consider the following code:
import math
a = 18000
b = 5500
c = (a-b)/9 # c = 1388.(8)
d = (a-b)/c
e = (a-b)//c
f = math.floor(d)
print(f"(a-b)/c={d}, (a-b)//c={e}, floor={f}") # outputs (a-b)/c=9.0, (a-b)//c=8.0, floor=9
Why is e different from d? As far as I understand, num1//num2 should be equal to math.floor(num1/num2).
Using Python 3.8.10 32bit on Windows 10 Pro.
The difference here is the use of intermediate calculations.
See this question for more details on floating point representation.
(a-b)/9 is not exactly representable in floating point. In Python it is stored as 1388.888888888888914152630604803562164306640625
from decimal import Decimal
print(Decimal(c)) # 1388.888888888888914152630604803562164306640625
Observe that c is actually a little larger than the true value of (a-b)/9. Because of this, the true value of (a-b)/c is slightly less than 9. Floor division in Python gives the floor of the true value of the division, therefore (a-b)//c correctly evaluates to 8.
On the other hand, (a-b)/c results in another floating point precision error. Though the true value is slightly less than 9, the closest value that can be represented is exactly 9. Applying the floor operation to exactly 9 results in 9, as expected.
Because c is a non-integer float, floor division // can give unintuitive results. It is best to use // only when dividing two integers, with the purpose of producing an integer-valued result.
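To make the intermediate error visible, one can inspect the exact values with fractions.Fraction (an illustrative check, not part of the answer above):
from fractions import Fraction

a, b = 18000, 5500
c = (a - b) / 9
print(Fraction(c) > Fraction(12500, 9))  # True: the stored c is slightly above the true quotient
print((a - b) // c)                      # 8.0: the floor of a true quotient slightly below 9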

Python division returning incorrect result

I want to write a function that takes in 2 values and calculates whether the division is an integer, but I am having trouble handling the case when, say, 5.1 and 0.1 are entered: the result is not 51 as expected.
def is_integer_division(a, b):
    return (a / b).is_integer()
An alternative version I tried converts the values to Decimals as well, but it still has the same issue:
from decimal import Decimal

def is_integer_division(a, b):
    return Decimal(a) % Decimal(b) == 0
This is a problem of arithmetic precision in Python (and other languages that use floating point in computers). You can do one of two things: accept that there will be some limits and deal with them, or use a specialized library that tries to deal with those problems.
A good compromise could be to use limited precision in your divisor. For example, you could assume that any divisor b passed to your function has at most 3 digits after the decimal point (anything from 5.999 down to 0.001); then you can multiply both the dividend a and the divisor b by 1000 and do the division there.
Let's rewrite your function (assuming you are using Python 3):
def is_division_integer(a, b):
    a = 1000*a
    b = 1000*b
    d = a/b
    if type(d) == int:
        return True
    return d.is_integer()
The other alternative could be using a library like numpy, mpmath or others but probably you don't need them for general simple cases.
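If the inputs are short decimal literals and exactness matters, the standard-library fractions module is another option (my suggestion, not part of the answer above):
from fractions import Fraction

def is_division_integer_exact(a, b):
    # Fraction(str(x)) parses the decimal text, sidestepping binary float error
    return (Fraction(str(a)) / Fraction(str(b))).denominator == 1

print(is_division_integer_exact(5.1, 0.1))  # True: the quotient is exactly 51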

Why is math.sqrt() incorrect for large numbers?

Why does the math module return the wrong result?
First test
A = 12345678917
print 'A =',A
B = sqrt(A**2)
print 'B =',int(B)
Result
A = 12345678917
B = 12345678917
Here, the result is correct.
Second test
A = 123456758365483459347856
print 'A =',A
B = sqrt(A**2)
print 'B =',int(B)
Result
A = 123456758365483459347856
B = 123456758365483467538432
Here the result is incorrect.
Why is that the case?
Because math.sqrt(..) first casts the number to a floating point, and floating points have a limited mantissa: they can only represent part of the number correctly. So float(A**2) is not equal to A**2. Next it calculates math.sqrt, which is also only approximately correct.
Most functions working with floating points will never be fully correct to their integer counterparts. Floating point calculations are almost inherently approximative.
If one calculates A**2 one gets:
>>> 12345678917**2
152415787921658292889L
Now if one converts it to a float(..), one gets:
>>> float(12345678917**2)
1.5241578792165828e+20
But if you now ask whether the two are equal:
>>> float(12345678917**2) == 12345678917**2
False
So information has been lost while converting it to a float.
You can read more about how floats work and why these are approximative in the Wikipedia article about IEEE-754, the formal definition on how floating points work.
The documentation for the math module states "It provides access to the mathematical functions defined by the C standard." It also states "Except when explicitly noted otherwise, all return values are floats."
Those together mean that the parameter to the square root function is a float value. In most systems that means a floating point value that fits into 8 bytes, which is called "double" in the C language. Your code converts your integer value into such a value before calculating the square root, then returns such a value.
However, the 8-byte floating point value can store at most 15 to 17 significant decimal digits. That is what you are getting in your results.
If you want better precision in your square roots, use a function that is guaranteed to give full precision for an integer argument. Just do a web search and you will find several. Those usually do a variation of the Newton-Raphson method to iterate and eventually end at the correct answer. Be aware that this is significantly slower than the math module's sqrt function.
Here is a routine that I modified from the internet. I can't cite the source right now. This version also works for non-integer arguments but just returns the integer part of the square root.
def isqrt(x):
    """Return the integer part of the square root of x, even for very
    large values."""
    if x < 0:
        raise ValueError('square root not defined for negative numbers')
    n = int(x)
    if n == 0:
        return 0
    a, b = divmod(n.bit_length(), 2)
    x = (1 << (a+b)) - 1
    while True:
        y = (x + n//x) // 2
        if y >= x:
            return x
        x = y
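A quick check against the failing case from the question (note that Python 3.8+ also ships an exact math.isqrt in the standard library):
A = 123456758365483459347856
print(isqrt(A**2) == A)  # True: exact, where math.sqrt(A**2) was not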
If you want to calculate sqrt of really large numbers and you need exact results, you can use sympy:
import sympy
num = sympy.Integer(123456758365483459347856)
print(int(num) == int(sympy.sqrt(num**2)))
The way floating-point numbers are stored in memory makes calculations with them prone to slight errors that can nevertheless be significant when exact results are needed. As mentioned in one of the comments, the decimal library can help you here:
>>> from decimal import Decimal
>>> A = Decimal(123456758365483459347856)
>>> A
Decimal('123456758365483459347856')
>>> B = A.sqrt()**2
>>> B
Decimal('123456758365483459347856.0000')
>>> A == B
True
>>> int(B)
123456758365483459347856
I use version 3.6, which has no hardcoded limit on the size of integers. I don't know if, in 2.7, casting B as an int would cause overflow, but decimal is incredibly useful regardless.

Convert float to string in positional format (without scientific notation and false precision)

I want to print some floating point numbers so that they're always written in decimal form (e.g. 12345000000000000000000.0 or 0.000000000000012345), not in scientific notation, yet I'd want the result to have up to the ~15.7 significant figures of an IEEE 754 double, and no more.
What I want is ideally so that the result is the shortest string in positional decimal format that still results in the same value when converted to a float.
It is well-known that the repr of a float is written in scientific notation if the exponent is greater than 15, or less than -4:
>>> n = 0.000000054321654321
>>> n
5.4321654321e-08 # scientific notation
If str is used, the resulting string again is in scientific notation:
>>> str(n)
'5.4321654321e-08'
It has been suggested that I can use format with f flag and sufficient precision to get rid of the scientific notation:
>>> format(0.00000005, '.20f')
'0.00000005000000000000'
It works for that number, though it has some extra trailing zeroes. But then the same format fails for .1, which gives decimal digits beyond the actual machine precision of float:
>>> format(0.1, '.20f')
'0.10000000000000000555'
And if my number is 4.5678e-20, using .20f would still lose relative precision:
>>> format(4.5678e-20, '.20f')
'0.00000000000000000005'
Thus these approaches do not match my requirements.
This leads to the question: what is the easiest and also well-performing way to print an arbitrary floating point number in decimal format, having the same digits as in repr(n) (or str(n) on Python 3), but always using the decimal format, not the scientific notation?
That is, a function or operation that for example converts the float value 0.00000005 to string '0.00000005'; 0.1 to '0.1'; 420000000000000000.0 to '420000000000000000.0' or 420000000000000000 and formats the float value -4.5678e-5 as '-0.000045678'.
After the bounty period: It seems that there are at least 2 viable approaches, as Karin demonstrated that using string manipulation one can achieve a significant speed boost compared to my initial algorithm on Python 2.
Thus,
If performance is important and Python 2 compatibility is required; or if the decimal module cannot be used for some reason, then Karin's approach using string manipulation is the way to do it.
On Python 3, my somewhat shorter code will also be faster.
Since I am primarily developing on Python 3, I will accept my own answer, and shall award Karin the bounty.
Unfortunately it seems that not even the new-style formatting with float.__format__ supports this. The default formatting of floats is the same as with repr; and with f flag there are 6 fractional digits by default:
>>> format(0.0000000005, 'f')
'0.000000'
However there is a hack to get the desired result - not the fastest one, but relatively simple:
first the float is converted to a string using str() or repr()
then a new Decimal instance is created from that string.
Decimal.__format__ supports f flag which gives the desired result, and, unlike floats it prints the actual precision instead of default precision.
Thus we can make a simple utility function float_to_str:
import decimal

# create a new context for this task
ctx = decimal.Context()
# 20 digits should be enough for everyone :D
ctx.prec = 20

def float_to_str(f):
    """
    Convert the given float to a string,
    without resorting to scientific notation
    """
    d1 = ctx.create_decimal(repr(f))
    return format(d1, 'f')
Care must be taken to not use the global decimal context, so a new context is constructed for this function. This is the fastest way; another way would be to use decimal.local_context but it would be slower, creating a new thread-local context and a context manager for each conversion.
This function now returns the string with all possible digits from mantissa, rounded to the shortest equivalent representation:
>>> float_to_str(0.1)
'0.1'
>>> float_to_str(0.00000005)
'0.00000005'
>>> float_to_str(420000000000000000.0)
'420000000000000000'
>>> float_to_str(0.000000000123123123123123123123)
'0.00000000012312312312312313'
The last result is rounded at the last digit.
As @Karin noted, float_to_str(420000000000000000.0) does not strictly match the format expected; it returns 420000000000000000 without the trailing .0.
If you are satisfied with the precision in scientific notation, then could we just take a simple string manipulation approach? Maybe it's not terribly clever, but it seems to work (passes all of the use cases you've presented), and I think it's fairly understandable:
def float_to_str(f):
    float_string = repr(f)
    if 'e' in float_string:  # detect scientific notation
        digits, exp = float_string.split('e')
        digits = digits.replace('.', '').replace('-', '')
        exp = int(exp)
        zero_padding = '0' * (abs(int(exp)) - 1)  # minus 1 for decimal point in the sci notation
        sign = '-' if f < 0 else ''
        if exp > 0:
            float_string = '{}{}{}.0'.format(sign, digits, zero_padding)
        else:
            float_string = '{}0.{}{}'.format(sign, zero_padding, digits)
    return float_string
n = 0.000000054321654321
assert(float_to_str(n) == '0.000000054321654321')
n = 0.00000005
assert(float_to_str(n) == '0.00000005')
n = 420000000000000000.0
assert(float_to_str(n) == '420000000000000000.0')
n = 4.5678e-5
assert(float_to_str(n) == '0.000045678')
n = 1.1
assert(float_to_str(n) == '1.1')
n = -4.5678e-5
assert(float_to_str(n) == '-0.000045678')
Performance:
I was worried this approach may be too slow, so I ran timeit and compared with the OP's solution of decimal contexts. It appears the string manipulation is actually quite a bit faster. Edit: It appears to only be much faster in Python 2. In Python 3, the results were similar, but with the decimal approach slightly faster.
Result:
Python 2: using ctx.create_decimal(): 2.43655490875
Python 2: using string manipulation: 0.305557966232
Python 3: using ctx.create_decimal(): 0.19519368198234588
Python 3: using string manipulation: 0.2661344590014778
Here is the timing code:
from timeit import timeit

CODE_TO_TIME = '''
float_to_str(0.000000054321654321)
float_to_str(0.00000005)
float_to_str(420000000000000000.0)
float_to_str(4.5678e-5)
float_to_str(1.1)
float_to_str(-0.000045678)
'''
SETUP_1 = '''
import decimal

# create a new context for this task
ctx = decimal.Context()
# 20 digits should be enough for everyone :D
ctx.prec = 20

def float_to_str(f):
    """
    Convert the given float to a string,
    without resorting to scientific notation
    """
    d1 = ctx.create_decimal(repr(f))
    return format(d1, 'f')
'''
SETUP_2 = '''
def float_to_str(f):
    float_string = repr(f)
    if 'e' in float_string:  # detect scientific notation
        digits, exp = float_string.split('e')
        digits = digits.replace('.', '').replace('-', '')
        exp = int(exp)
        zero_padding = '0' * (abs(int(exp)) - 1)  # minus 1 for decimal point in the sci notation
        sign = '-' if f < 0 else ''
        if exp > 0:
            float_string = '{}{}{}.0'.format(sign, digits, zero_padding)
        else:
            float_string = '{}0.{}{}'.format(sign, zero_padding, digits)
    return float_string
'''

print(timeit(CODE_TO_TIME, setup=SETUP_1, number=10000))
print(timeit(CODE_TO_TIME, setup=SETUP_2, number=10000))
As of NumPy 1.14.0, you can just use numpy.format_float_positional. For example, running against the inputs from your question:
>>> numpy.format_float_positional(0.000000054321654321)
'0.000000054321654321'
>>> numpy.format_float_positional(0.00000005)
'0.00000005'
>>> numpy.format_float_positional(0.1)
'0.1'
>>> numpy.format_float_positional(4.5678e-20)
'0.000000000000000000045678'
numpy.format_float_positional uses the Dragon4 algorithm to produce the shortest decimal representation in positional format that round-trips back to the original float input. There's also numpy.format_float_scientific for scientific notation, and both functions offer optional arguments to customize things like rounding and trimming of zeros.
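The optional precision argument caps the number of digits printed; for instance (a small illustration of that documented parameter):
>>> numpy.format_float_positional(1/3, precision=4)
'0.3333'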
If you are ready to lose precision arbitrarily by calling str() on the float number, then this is the way to go:
import decimal

def float_to_string(number, precision=20):
    return '{0:.{prec}f}'.format(
        decimal.Context(prec=100).create_decimal(str(number)),
        prec=precision,
    ).rstrip('0').rstrip('.') or '0'
It doesn't include global variables and allows you to choose the precision yourself. Decimal precision 100 is chosen as an upper bound for str(float) length. The actual supremum is much lower. The or '0' part is for the situation with small numbers and zero precision.
Note that it still has its consequences:
>>> float_to_string(0.10101010101010101010101010101)
'0.10101010101'
Otherwise, if the precision is important, format is just fine:
import decimal

def float_to_string(number, precision=20):
    return '{0:.{prec}f}'.format(
        number, prec=precision,
    ).rstrip('0').rstrip('.') or '0'
It doesn't lose precision by calling str(f):
>>> float_to_string(0.1, precision=10)
'0.1'
>>> float_to_string(0.1)
'0.10000000000000000555'
>>> float_to_string(0.1, precision=40)
'0.1000000000000000055511151231257827021182'
>>> float_to_string(4.5678e-5)
'0.000045678'
>>> float_to_string(4.5678e-5, precision=1)
'0'
Anyway, maximum decimal places are limited, since the float type itself has finite precision and cannot express really long fractions:
>>> float_to_string(0.1, precision=10000)
'0.1000000000000000055511151231257827021181583404541015625'
Also, whole numbers are formatted as-is.
>>> float_to_string(100)
'100'
I think rstrip can get the job done.
a=5.4321654321e-08
'{0:.40f}'.format(a).rstrip("0") # float number and delete the zeros on the right
# '0.0000000543216543210000004442039220863003' # there's roundoff error though
Let me know if that works for you.
Interesting question. To add a little bit more content, here's a little test comparing the outputs of @Antti Haapala's and @Harold's solutions:
import decimal
import math

ctx = decimal.Context()

def f1(number, prec=20):
    ctx.prec = prec
    return format(ctx.create_decimal(str(number)), 'f')

def f2(number, prec=20):
    return '{0:.{prec}f}'.format(
        number, prec=prec,
    ).rstrip('0').rstrip('.')

k = 2*8
for i in range(-2**8, 2**8):
    if i < 0:
        value = -k*math.sqrt(math.sqrt(-i))
    else:
        value = k*math.sqrt(math.sqrt(i))
    value_s = '{0:.{prec}E}'.format(value, prec=10)
    n = 10
    print ' | '.join([str(value), value_s])
    for f in [f1, f2]:
        test = [f(value, prec=p) for p in range(n)]
        print '\t{0}'.format(test)
Neither of them gives "consistent" results for all cases.
With Antti's you'll see strings like '-000' or '000'.
With Harold's you'll see strings like ''.
I'd prefer consistency even if I'm sacrificing a little bit of speed. Depends which tradeoffs you want to assume for your use-case.
Using format(float, '.<n>f') with a computed precision:
old = 0.00000000000000000000123
if 'e-' in str(old):  # repr is in scientific notation?
    float_length = str(old)[-2:]  # the exponent digits (assumes a two-digit exponent)
    new = format(old, '.' + str(float_length) + 'f')
    print(old)
    print(new)

Increment a Python floating point value by the smallest possible amount

How can I increment a floating point value in python by the smallest possible amount?
Background: I'm using floating point values as dictionary keys.
Occasionally, very occasionally (and perhaps never, but not certainly never), there will be collisions. I would like to resolve these by incrementing the floating point value by as small an amount as possible. How can I do this?
In C, I would twiddle the bits of the mantissa to achieve this, but I assume that isn't possible in Python.
Since Python 3.9 there is math.nextafter in the stdlib. Read on for alternatives in older Python versions.
The nextafter(x,y) functions return the next discretely different representable floating-point value following x in the direction of y. The nextafter() functions are guaranteed to work on the platform or to return a sensible value to indicate that the next value is not possible.
The nextafter() functions are part of the POSIX and ISO C99 standards, and the function is _nextafter() in Visual C. C99-compliant standard math libraries, Visual C, C++, Boost and Java all implement the IEEE-recommended nextafter() functions or methods. (I do not honestly know if .NET has nextafter(); Microsoft does not care much about C99 or POSIX.)
None of the bit twiddling functions here fully or correctly deal with the edge cases, such as values going though 0.0, negative 0.0, subnormals, infinities, negative values, over or underflows, etc. Here is a reference implementation of nextafter() in C to give an idea of how to do the correct bit twiddling if that is your direction.
There are two solid workarounds to get nextafter() or other excluded POSIX math functions in Python < 3.9:
Use Numpy:
>>> import numpy
>>> numpy.nextafter(0,1)
4.9406564584124654e-324
>>> numpy.nextafter(.1, 1)
0.10000000000000002
>>> numpy.nextafter(1e6, -1)
999999.99999999988
>>> numpy.nextafter(-.1, 1)
-0.099999999999999992
Link directly to the system math DLL:
import ctypes
import sys
from sys import platform as _platform

if _platform == "linux" or _platform == "linux2":
    _libm = ctypes.cdll.LoadLibrary('libm.so.6')
    _funcname = 'nextafter'
elif _platform == "darwin":
    _libm = ctypes.cdll.LoadLibrary('libSystem.dylib')
    _funcname = 'nextafter'
elif _platform == "win32":
    _libm = ctypes.cdll.LoadLibrary('msvcrt.dll')
    _funcname = '_nextafter'
else:
    # these are the ones I have access to...
    # fill in library and function name for your system math dll
    print("Platform", repr(_platform), "is not supported")
    sys.exit(0)

_nextafter = getattr(_libm, _funcname)
_nextafter.restype = ctypes.c_double
_nextafter.argtypes = [ctypes.c_double, ctypes.c_double]

def nextafter(x, y):
    "Returns the next floating-point number after x in the direction of y."
    return _nextafter(x, y)

assert nextafter(0, 1) - nextafter(0, 1) == 0
assert 0.0 + nextafter(0, 1) > 0.0
And if you really really want a pure Python solution:
# handles edge cases correctly on MY computer
# not extensively QA'd...
import math

# 'double' means IEEE 754 double precision -- C 'double'
epsilon = math.ldexp(1.0, -53)  # smallest double such that 0.5 + epsilon != 0.5
maxDouble = float(2**1024 - 2**971)  # from the IEEE 754 standard
minDouble = math.ldexp(1.0, -1022)  # min positive normalized double
smallEpsilon = math.ldexp(1.0, -1074)  # smallest increment for doubles < minDouble
infinity = math.ldexp(1.0, 1023) * 2

def nextafter(x, y):
    """returns the next IEEE double after x in the direction of y if possible"""
    if y == x:
        return y  # if x == y, no increment
    # handle NaN
    if x != x or y != y:
        return x + y
    if x >= infinity:
        return infinity
    if x <= -infinity:
        return -infinity
    if -minDouble < x < minDouble:
        if y > x:
            return x + smallEpsilon
        else:
            return x - smallEpsilon
    m, e = math.frexp(x)
    if y > x:
        m += epsilon
    else:
        m -= epsilon
    return math.ldexp(m, e)
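A quick sanity check of the pure-Python sketch above:
print(nextafter(0.0, 1.0))  # 5e-324, the smallest positive subnormal
print(nextafter(1.0, 2.0))  # 1.0000000000000002, one ulp above 1.0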
Or, use Mark Dickinson's excellent solution
Obviously the Numpy solution is the easiest.
Python 3.9 and above
Starting with Python 3.9, released 2020-10-05, you can use the math.nextafter function:
math.nextafter(x, y)
Return the next floating-point value after x towards y.
If x is equal to y, return y.
Examples:
math.nextafter(x, math.inf) goes up: towards positive infinity.
math.nextafter(x, -math.inf) goes down: towards minus infinity.
math.nextafter(x, 0.0) goes towards zero.
math.nextafter(x, math.copysign(math.inf, x)) goes away from zero.
See also math.ulp().
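For example:
>>> import math
>>> math.nextafter(1.0, math.inf)
1.0000000000000002
>>> math.nextafter(0.0, 1.0)
5e-324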
First, this "respond to a collision" is a pretty bad idea.
If they collide, the values in the dictionary should have been lists of items with a common key, not individual items.
Your "hash probing" algorithm will have to loop through more than one "tiny increments" to resolve collisions.
And sequential hash probes are known to be inefficient.
Read this: http://en.wikipedia.org/wiki/Quadratic_probing
Second, use math.frexp and sys.float_info.epsilon to fiddle with mantissa and exponent separately.
>>> import math, sys
>>> m, e = math.frexp(4.0)
>>> (m + sys.float_info.epsilon) * 2**e
4.0000000000000018
Forgetting about why we would want to increment a floating point value for a moment, I would have to say I think Autopulated's own answer is probably correct.
But for the problem domain, I share the misgivings of most of the responders about using floats as dictionary keys. If the objection to using Decimal (as proposed in the main comments) is that it is a "heavyweight" solution, I suggest a do-it-yourself compromise: figure out the practical resolution of the timestamps, pick enough digits to adequately cover it, then multiply all the timestamps by the necessary amount so that you can use integers as the keys. If you can afford an extra digit or two beyond the timer precision, you can be even more confident that there will be few or no collisions, and that if there are collisions, you can just add 1 (instead of some rigamarole to find the next floating point value).
I recommend against assuming that floats (or timestamps) will be unique if at all possible. Use a counting iterator, database sequence or other service to issue unique identifiers.
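A sketch of the integer-key idea (the 10**6 microsecond resolution is an assumption for illustration; pick it to match your timer):
def to_key(timestamp, resolution=10**6):
    # scale to integer ticks; resolution is hypothetical, not from the answer above
    return int(round(timestamp * resolution))

d = {}
d[to_key(1234.000001)] = 'value'
# colliding integer keys can now be resolved by simply adding 1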
Instead of incrementing the value, just use a tuple for the colliding key. If you need to keep them in order, every key should be a tuple, not just the duplicates.
A better answer (now I'm just doing this for fun...), motivated by twiddling the bits. Handling the carry and overflow between parts of the number, and negative values, is somewhat tricky.
import struct

def floatToieee754Bits(f):
    return struct.unpack('<Q', struct.pack('<d', f))[0]

def ieee754BitsToFloat(i):
    return struct.unpack('<d', struct.pack('<Q', i))[0]

def incrementFloat(f):
    i = floatToieee754Bits(f)
    if f >= 0:
        return ieee754BitsToFloat(i + 1)
    else:
        raise Exception('f not >= 0: unsolved problem!')
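For example:
print(incrementFloat(1.0))  # 1.0000000000000002, one ulp above 1.0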
Instead of resolving the collisions by changing the key, how about collecting the collisions? I.e.:
bag = {}
bag[1234.] = 'something'
becomes
import collections
bag = collections.defaultdict(list)
bag[1234.].append('something')
would that work?
For colliding key k, add: k / 2**50
Interesting problem. The amount you need to add obviously depends on the magnitude of the colliding value, so that a normalized add will affect only the least significant bits.
It's not necessary to determine the smallest value that can be added. All you need to do is approximate it. The FPU format provides 52 mantissa bits plus a hidden bit for 53 bits of precision. No physical constant is known to anywhere near this level of precision. No sensor is able to measure anything near it. So you don't have a hard problem.
In most cases, for key k, you would be able to add k / 2**53, because of that 52-bit fraction plus the hidden bit.
But it's not necessary to risk triggering library bugs or exploring rounding issues by shooting for the very last bit or anything near it.
So I would say, for colliding key k, just add k / 2**50 and call it a day. [1]
[1] Possibly more than once until it doesn't collide any more, at least to foil any diabolical unit test authors.
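A quick check that the nudge really changes the key (k/2**50 is far larger than the ~2**-52 relative spacing of doubles):
k = 1234.000001
assert k + k / 2**50 != k  # the low-order bits change; the value barely moves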
>>> import sys
>>> sys.float_info.epsilon
2.220446049250313e-16
Instead of modifying your float timestamp, use a tuple for every key as Mark Ransom suggests where the tuple (x,y) is composed of x=your_unmodified_time_stamp and y=(extremely unlikely to be a same value twice).
So:
1. x just is the unmodified timestamp and can be the same value many times;
2. for y you can use:
2.1 a random integer number from a large range,
2.2 a serial integer (0, 1, 2, etc.),
2.3 a UUID.
While 2.1 (a random int from a large range) works great for Ethernet, I would use 2.2 (a serial integer) or 2.3 (a UUID). Easy, fast, bulletproof. For 2.2 and 2.3 you don't even need collision detection (though you might still want it for 2.1, as Ethernet does).
The advantage of 2.2 is that you can also tell, and sort, data elements that have the same float time stamp.
Then just extract x from the tuple for any sorting type operations and the tuple itself is a collision free key for the hash / dictionary.
Edit
I guess example code will help:
#!/usr/bin/env python
import time
import sys
import random

# generator for ints from 0 to sys.maxint on the system:
serializer = (sn for sn in xrange(0, sys.maxint))

# a list with guaranteed collisions:
times = []
for c in range(0, 35):
    t = time.clock()
    for i in range(0, random.choice(range(0, 4))):
        times.append(t)

print len(set(times)), "unique items in a list of", len(times)

# dictionary of tuples; no possibility of collisions:
di = {}
for time in times:
    sn = serializer.next()
    di[(time, sn)] = 'Element {}'.format(sn)

# for tuples of multiple numbers, Python sorts
# as you expect: first by t[0] then t[1], until t[n]
for key in sorted(di.keys()):
    print "{:>15}:{}".format(key, di[key])
Output:
26 unique items in a list of 55
(0.042289, 0):Element 0
(0.042289, 1):Element 1
(0.042289, 2):Element 2
(0.042305, 3):Element 3
(0.042305, 4):Element 4
(0.042317, 5):Element 5
# and so on until Element n...
Here is part of it. This is dirty and slow, but maybe that is how you like it. It is missing several corner cases, but maybe it gets someone else close.
The idea is to get the hex string of a floating point number. That gives you a string with the mantissa and exponent bits to twiddle. The twiddling is a pain since you have to do it all manually and keep converting to/from strings. Anyway, you add (subtract) 1 to (from) the last digit for positive (negative) numbers. Make sure you carry through to the exponent if you overflow. Negative numbers are a little more tricky, so that you don't waste any bits.
def increment(f):
    h = f.hex()
    # decide if we need to increment up or down
    if f > 0:
        sign = '+'
        inc = 1
    else:
        sign = '-'
        inc = -1
    # pull the string apart
    h = h.split('0x')[-1]
    h, e = h.split('p')
    h = ''.join(h.split('.'))
    h2 = shift(h, inc)
    # increase the exponent if we added a digit
    h2 = '%s0x%s.%sp%s' % (sign, h2[0], h2[1:], e)
    return float.fromhex(h2)

def shift(s, num):
    if not s:
        return ''
    right = s[-1]
    right = int(right, 16) + num
    if right > 15:
        num = right // 16
        right = right % 16
    elif right < 0:
        right = 0
        num = -1
    else:
        num = 0
    # drop the leading 0x
    right = hex(right)[2:]
    return shift(s[:-1], num) + right

a = 1.4e4
print increment(a) - a
a = -1.4e4
print increment(a) - a
a = 1.4
print increment(a) - a
I think you mean "by as small an amount possible to avoid a hash collision", since for example the next-highest-float may already be a key! =)
while toInsert.key in myDict:  # assumed to be positive
    toInsert.key *= 1.000000000001
myDict[toInsert.key] = toInsert
That said you probably don't want to be using timestamps as keys.
After looking at Autopulated's answer I came up with a slightly different answer:
import math, sys

def incrementFloatValue(value):
    if value == 0:
        return sys.float_info.min
    mant, exponent = math.frexp(value)
    epsilonAtValue = math.ldexp(1, exponent - sys.float_info.mant_dig)
    return math.fsum([value, epsilonAtValue])
Disclaimer: I'm really not as great at maths as I think I am ;) Please verify this is correct before using it. Also, I'm not sure about performance.
some notes:
epsilonAtValue calculates how many bits are used for the mantissa (the maximum minus what is used for the exponent).
I'm not sure if the math.fsum() is needed but hey it doesn't seem to hurt.
It turns out that this is actually quite complicated (maybe why seven people have answered without actually providing an answer yet...).
I think this is the right solution, it certainly seems to handle 0 and positive values correctly:
import math
import sys

def incrementFloat(f):
    if f == 0.0:
        return sys.float_info.min
    m, e = math.frexp(f)
    return math.ldexp(m + sys.float_info.epsilon / 2, e)
