I am currently working with very small numbers in my Python program, e.g.
x = 200 + 2e-26
One solution is to work with logarithmic values, which would increase the range of my float values. The problem is that I also have to run an FFT on these values, so the logarithmic approach is not usable (and neither is the Decimal module). Is there another way to solve this problem?
Edit: My problem with the Decimal module is: how can I handle imaginary values? I tried a = Decimal(1e-26) + Decimal(1e-26*1j) and a = Decimal(1e-26) + Decimal(1e-26)*1j, and both ways failed (error on request).
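For reference, a minimal illustration of the failure mode: Decimal supports only real values, so both constructing it from a complex number and multiplying it by one raise a TypeError.
>>> from decimal import Decimal
>>> Decimal(1e-26 * 1j)      # TypeError: can't convert complex to Decimal
>>> Decimal(1e-26) * 1j      # TypeError: unsupported operand types for *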
Consider trying out the mpmath package.
>>> from mpmath import mpf, mpc, mp
>>> mp.dps = 40
>>> mpf(200) + mpf(2e-26) + mpc(1j)
mpc(real='200.0000000000000000000000000200000000000007', imag='1.0')
It is accurate to the working precision and can handle complex numbers; more details are in the documentation.
While numpy supports wider float types (and complex versions of them), they don't help here:
>>> import numpy
>>> numpy.longfloat
<type 'numpy.float128'>
>>> a = numpy.array([200, 2e-26], dtype=numpy.longfloat)
>>> a
array([ 200.0, 2e-26], dtype=float128)
>>> a.sum()
200.0
>>> a = numpy.array([200, 2e-26], dtype=numpy.longdouble)
>>> a.sum()
200.0
The reason is explained here: internally, numpy's longdouble is (on x86) the 80-bit extended format with a 63-bit mantissa, which gives only about 19 significant decimal digits, far short of the roughly 29 needed to keep 2e-26 alongside 200.
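You can check what your platform's longdouble actually provides (output shown here is for x86 extended precision; it varies by platform):
>>> numpy.finfo(numpy.longdouble).nmant       # mantissa bits
63
>>> numpy.finfo(numpy.longdouble).precision   # approximate decimal digits
18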
What you need is a true 128-bit float type, like the one from Boost.
Try the Boost.Python module, which might give you access to this type. If that doesn't work, you'll have to write your own wrapper class in C++, as explained here.
I am going over the description of SHA-512. It is mentioned that the initial hash value consists of the sequence of eight 64-bit words obtained by taking the first 64 bits of the fractional parts of the square roots of the first eight primes. I am trying to replicate these values in Python, but I am not getting the same results. To include more digits, I am using the mpmath library.
from mpmath import *
mp.dps = 50
sqrt(2)
# mpf('1.4142135623730950488016887242096980785696718753769468')
mpf(0.4142135623730950488016887242096980785696718753769468 * 2 ** 64)
# mpf('7640891576956012544.0')
hex(7640891576956012544)
# '0x6a09e667f3bcc800'
However, the description indicates this value must be 6a09e667f3bcc908. As can be seen, the result I get differs in the last three digits from what I should be getting according to the description. I was wondering why that is the case, and what the correct approach is.
I have come across a similar question, but adjusting it for a 64-bit word would yield:
import math
hex(int(math.modf(math.sqrt(2))[0]*(1<<64)))
# '0x6a09e667f3bcd000'
which actually differs in the last four digits.
As a comment already explained, you're actually only using 53 bits in your calculations: the long literal passed to mpf() is first parsed as a native CPython float, so mpmath's higher precision never comes into play.
Here's an easy way to reproduce the result you're apparently after:
>>> import decimal
>>> x = decimal.getcontext().sqrt(2) - 1
>>> x
Decimal('0.414213562373095048801688724')
>>> hex(int(x * 2**64))
'0x6a09e667f3bcc908'
Nothing really magical about decimal. It just happens to use enough precision by default. You could certainly do the same with mpmath.
For example,
>>> import mpmath
>>> mpmath.mp.prec = 80
>>> hex(int(mpmath.frac(mpmath.sqrt(2)) * 2**64))
'0x6a09e667f3bcc908'
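Not part of the original answer, but the same idea extends to all eight SHA-512 initial values; a sketch with decimal, with the precision raised so that the first 64 fractional bits are safe:
>>> import decimal
>>> decimal.getcontext().prec = 40
>>> for p in [2, 3, 5, 7, 11, 13, 17, 19]:
...     frac = decimal.getcontext().sqrt(p) % 1   # fractional part of sqrt(p)
...     print(hex(int(frac * 2**64)))
...
0x6a09e667f3bcc908
0xbb67ae8584caa73b
0x3c6ef372fe94f82b
0xa54ff53a5f1d36f1
0x510e527fade682d1
0x9b05688c2b3e6c1f
0x1f83d9abfb41bd6b
0x5be0cd19137e2179
These match the constants listed in the SHA-512 specification.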
I have two ways of deriving the probability of a normally (say) distributed random variable being within an interval. The first and most straightforward is the following:
import scipy.stats
print scipy.stats.norm.cdf(6) - scipy.stats.norm.cdf(5)
# 2.85664984223e-07
And the second is by integrating the pdf:
import scipy.integrate
print scipy.integrate.quad(scipy.stats.norm.pdf, 5, 6)[0]
# 2.85664984234e-07
The difference in this case is really tiny, but it doesn't mean it can't grow larger for other distributions or integration limits. Can you tell which is more accurate and why?
By the way, the first alternative seems to be at least 10 times faster, so if it is also more accurate (which would be my guess, since it is somewhat specialized), then it is perfect.
In this particular case, given those particular numbers, the quad approach will actually be more accurate. The CDF itself can be computed quickly and accurately, of course, but look at the actual numbers:
>>> scipy.stats.norm.cdf(6), scipy.stats.norm.cdf(5)
(0.9999999990134123, 0.99999971334842808)
When you're differencing two very similar quantities, you lose accuracy. Similar problems can be mitigated somewhat during integration if the coders are careful with their summations.
Anyway, we can check this against a high-resolution calculation using mpmath:
>>> via_cdf = scipy.stats.norm.cdf(6)-scipy.stats.norm.cdf(5)
>>> via_quad = scipy.integrate.quad(scipy.stats.norm.pdf, 5, 6)[0]
>>> import mpmath
>>> mpmath.mp.dps = 100
>>> def cdf(x): return 0.5 * (1 + mpmath.erf(x/mpmath.sqrt(2)))
>>> highres = cdf(6)-cdf(5)
>>> highres
mpf('0.0000002856649842341562135330514687422473118357532223619105443630157837185833042478210791954518847897468442097')
>>> float((highres - via_quad)/highres)
-2.3824773334590333e-16
>>> float((highres - via_cdf)/highres)
3.86659439572868e-11
The first calls an implementation of the CDF included in scipy.special; the latter actually does the integration. The CDF implementation itself is very accurate per evaluation, but as the numbers above show, subtracting two nearly equal CDF values cancels away most of that accuracy, which is why quad wins in this particular case. In practice, unless you need results that are good to better than 6 decimal places, you're probably fine with either.
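Incidentally (not in the original answer), you can keep the fast CDF machinery and still avoid the cancellation by differencing the survival function sf, which computes the upper tail 1 - cdf(x) directly, so both terms are small and of comparable magnitude:
import scipy.stats
# sf(5) ~ 2.9e-7 and sf(6) ~ 9.9e-10: no catastrophic cancellation here
print(scipy.stats.norm.sf(5) - scipy.stats.norm.sf(6))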
Is there a way to get the exact Tangent/Cosine/Sine of an angle (in radians)?
math.tan()/math.sin()/math.cos() do not give exact results for some angles:
>>> from math import *
>>> from decimal import Decimal
>>> sin(pi) # should be 0
1.2246467991473532e-16
>>> sin(2*pi) # should be 0
-2.4492935982947064e-16
>>> cos(pi/2) # should be 0
6.123233995736766e-17
>>> cos(3*pi/2) # should be 0
-1.8369701987210297e-16
>>> tan(pi/2) # undefined; the float result is merely huge
1.633123935319537e+16
>>> tan(3*pi/2) # also undefined
5443746451065123.0
>>> tan(2*pi) # should be 0
-2.4492935982947064e-16
>>> tan(pi) # should be 0
-1.2246467991473532e-16
I tried using Decimal(), but this does not help either:
>>> tan(Decimal(pi)*2)
-2.4492935982947064e-16
numpy.sin(x) and the other trigonometric functions also have the same issue.
Alternatively, I could always create a new function with a dictionary of values such as:
import math

def new_sin(x):
    sin_values = {math.pi: 0, 2*math.pi: 0}
    return sin_values[x] if x in sin_values else math.sin(x)
However, this seems like a cheap way to get around it. Is there any other way? Thanks!
It is impossible to store the exact numerical value of pi in a computer. math.pi is the closest approximation to pi that can be stored in a Python float. math.sin(math.pi) returns the correct result for the approximate input.
To avoid this, you need to use a library that supports symbolic arithmetic. For example, with sympy:
>>> from sympy import *
>>> sin(pi)
0
>>> pi
pi
sympy will operate on an object that represents pi and can give exact results.
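A few more examples of the exact values sympy returns (zoo is sympy's complex infinity, signalling that the tangent is undefined there):
>>> sin(pi/4)
sqrt(2)/2
>>> cos(3*pi/2)
0
>>> tan(pi/2)
zoo
When you eventually need a numeric value, N() or .evalf() converts the exact expression at whatever precision you ask for.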
When you're dealing with inexact numbers, you need to deal with error explicitly. math.pi (or numpy.pi) isn't exactly π; it's the closest 53-bit binary floating-point value to π. And the sin of that number is not 0.
But it is very close to 0. And likewise, tan(pi/2) is not infinity (or NaN), but huge, and asin(1)/pi is very close to 0.5.
So, even if the algorithms were somehow exact, the results still wouldn't be exact.
If you've never read What Every Computer Scientist Should Know About Floating-Point Arithmetic, you should do so now.
The way to deal with this is to use epsilon-comparisons rather than exact comparisons everywhere, and explicitly round things when printing them out, and so on.
Using decimal.Decimal numbers instead of floats makes this easier. First, you probably think in decimal rather than binary, so it's easier for you to understand and make decisions about the error. Second, you can explicitly set the precision and other context information on Decimal values, while Python floats are always IEEE doubles.
The right way to do it is to do full error analysis on your algorithms, propagate the errors appropriately, and use that information where it's needed. The simple way is to just pick some explicit absolute or relative epsilon (and the equivalent for infinity) that's "good enough" for your application, and use that everywhere. (You'll probably also want to use the appropriate domain-specific knowledge to treat some values as multiples of pi instead of just raw values.)
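For instance, a minimal sketch of the explicit-epsilon approach using just the standard library (math.isclose, available since Python 3.5; an absolute tolerance is needed because the default relative tolerance can never match against exactly 0):
>>> import math
>>> x = math.sin(math.pi)
>>> x == 0
False
>>> math.isclose(x, 0.0, abs_tol=1e-12)
True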
My research on pattern identification requires identifying the repeating patterns in the binary representations of fractions of rational numbers. bin(2**24/n) strips the leading zeros, e.g. bin(2**24/11) -> 0b101110100010111010001 instead of 0b000101110100010111010001. The number of leading zeros is variable, of course. The obvious pattern here is 0001011101...
I am a newbie with Python, still on the learning curve. Is there a Python-appropriate way to approach this?
This can be done with string formatting, in 2.6+:
>>> '{0:024b}'.format(23)
'000000000000000000010111'
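Applied to the example from the question (with integer division, as in the original Python 2 code):
>>> '{0:024b}'.format(2**24 // 11)
'000101110100010111010001'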
You might find the bitstring module useful if you have more advanced needs than string formatting can provide.
>>> from bitstring import BitArray
>>> a = BitArray(24) # 24 zero bits
>>> a.uint = 2**24/11 # set the unsigned integer property
>>> a.bin # get the binary property
'000101110100010111010001'
It will never cut off leading zero bits, and it can do a few other useful tricks:
>>> a.uint /= 2
>>> a.bin
'000010111010001011101000'
>>> list(a.findall('0b1011'))
[4, 14]
>>> a *= 2 # concatenation
>>> a.bin
'000010111010001011101000000010111010001011101000'
>>> a.replace('0b00001', '0xe')
2 # 2 replacements made
>>> a.bin
'1110011101000101110100011100111010001011101000'
I'm not sure of what your exact needs are, so all this could be overkill and you might not want to use an external library in any case, but Python in-built support for bit arrays is a little basic.
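As for the underlying pattern question: for odd n, the length of the repeating block in the binary expansion of 1/n is the multiplicative order of 2 modulo n, which is cheap to compute directly. A sketch (not from the original answer):
def binary_period(n):
    """Length of the repeating binary expansion of 1/n, for odd n > 1."""
    k, r = 1, 2 % n
    while r != 1:
        r = (r * 2) % n
        k += 1
    return k

print(binary_period(11))  # 10, matching the repeating block 0001011101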
Can anyone explain the following? I'm using Python 2.5
Consider 1*3*5*7*9*11 ... *49. If you type all that into the IPython(x,y) interactive console, you'll get 58435841445947272053455474390625L, which is correct. (Why odd numbers: just the way I did it originally.)
numpy's multiply.reduce() or prod() should yield the same result for the equivalent range. And it does, up to a certain point. Here, it is already wrong:
>>> k = range(1, 50, 2)
>>> multiply.reduce(k)
-108792223
Using prod(k) will also generate -108792223 as the result. Other incorrect results start to appear for equivalent ranges of length 12 (that is, k = range(1,24,2)).
I'm not sure why. Can anyone help?
This is because numpy.multiply.reduce() converts the range list to an array of type numpy.int32, and the reduce operation overflows what can be stored in 32 bits at some point:
>>> type(numpy.multiply.reduce(range(1, 50, 2)))
<type 'numpy.int32'>
As Mike Graham says, you can use the dtype parameter to use Python integers instead of the default:
>>> res = numpy.multiply.reduce(range(1, 50, 2), dtype=object)
>>> res
58435841445947272053455474390625L
>>> type(res)
<type 'long'>
But using numpy to work with Python objects is pointless in this case; the best solution is KennyTM's:
>>> import functools, operator
>>> functools.reduce(operator.mul, range(1, 50, 2))
58435841445947272053455474390625L
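On modern Python (3.8 and later), the standard library's math.prod does the same thing with ordinary arbitrary-precision ints:
>>> import math
>>> math.prod(range(1, 50, 2))
58435841445947272053455474390625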
The CPU doesn't multiply arbitrarily large numbers; it only performs specific operations on fixed-width binary values.
Python's '*' handles large integers perfectly, using an arbitrary-precision representation and special code that goes beyond the CPU's or FPU's multiply instructions.
This is actually unusual as languages go.
In most other languages, a number is usually represented as a fixed array of bits. For example, in C or SQL you could choose an 8-bit integer that can represent 0 to 255 (or -128 to +127), or a 16-bit integer that can represent up to 2^16-1, which is 65535. When only a limited range of numbers can be represented, going past the limit with an operation like * or + can have an undesirable effect, such as getting a negative number. You may have encountered exactly this problem here, since numpy is implemented natively in C, not Python.
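To see both behaviours side by side (a sketch; forcing int32 reproduces the question's wraparound on any platform):
import functools, operator
import numpy

odds = list(range(1, 50, 2))
# fixed-width 32-bit ints silently wrap around modulo 2**32
print(numpy.multiply.reduce(numpy.array(odds, dtype=numpy.int32)))
# Python ints grow as needed, so the product is exact
print(functools.reduce(operator.mul, odds))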