Using 'Decimal' numbers with scipy?

I have numbers too large for Python's built-in float type, and as such am using the decimal library. I want to use scipy.optimize.brentq with a function that operates on Decimals, but when the function returns a Decimal it obviously cannot be used with the optimisation function's float-based internals. How can I get around this? That is, how can I use scipy optimisation techniques with the Decimal class for big numbers?

You can't. Scipy relies heavily on numerical algorithms that only deal with native machine number types, and it can't work with the Decimal class.
As a general rule of thumb: if your problem is well-defined and well-conditioned (that's something numerical mathematicians define), you can just rescale it so that it fits into normal Python floats, and then you can apply scipy functionality to it.
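For instance, a hedged sketch of rescaling (f, SCALE, and the bracket [0, 10] are all made up for illustration): substitute x = SCALE * u and hand brentq the rescaled function, whose values and root sit comfortably in float range.

from scipy.optimize import brentq

SCALE = 1e40  # hypothetical magnitude of the roots

def f(x):
    # made-up function whose root (3e40) is far outside "comfortable" range
    return x - 3e40

# solve the rescaled problem g(u) = f(SCALE * u) / SCALE instead
root_u = brentq(lambda u: f(SCALE * u) / SCALE, 0.0, 10.0)
print(root_u * SCALE)  # ~3e+40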
If, however, your problem involves both very small numbers and numbers that can't fit into a float, there is usually little you can do numerically: it's hard to find a good solution.
If, however, your function only returns values that would fit into float, then you could just use
lambda x: float(your_function(x))
instead of your_function in brentq.
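A minimal sketch of that approach, assuming a made-up Decimal-valued function whose values fit into float near the root:

from decimal import Decimal
from scipy.optimize import brentq

def your_function(x):
    # hypothetical function that does its arithmetic in Decimal
    return Decimal(x) ** 2 - Decimal(2)

# brentq only ever sees plain floats going in and coming out
root = brentq(lambda x: float(your_function(x)), 0.0, 2.0)
print(root)  # ~1.4142135623730951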

Related

Numpy array_equal and float exact equality check

I know that similar precision questions have been asked here; however, I am reading the code of a project that does an exact equality comparison among floats, and it is puzzling me.
Assume that x1 and x2 are of type numpy.ndarray and of dtype np.float32. These two variables have been computed by the same code executed on the same data but x1 has been computed by one machine and x2 by another (this is done on an AWS cluster which communicates with MPI).
Then the values are compared as follows
numpy.array_equal(x1, x2)
Hence, exact equality (no tolerance) is crucial for this program to work, and it seems to work fine. This is confusing me. How can one compare two np.float32 values computed on different machines and face no precision issues? When can two (or more) such floats be equal?
The arithmetic specified by IEEE-754 is deterministic given certain constraints discussed in its clause 11 (2008 version), including suitable rules for expression evaluation, such as an unambiguous translation from expressions in a programming language to IEEE-754 operations (a+b+c must give (a+b)+c, not a+(b+c)).
If parallelism is not used or is constructed suitably, such as always partitioning a job into the same pieces and combining their results in the same way regardless of order of completion of computations, then obtaining identical results is not surprising.
Some factors that prevent reproducibility include varying parallelism, using different math libraries (with different implementations of functions such as pow), and using languages that are not strict about floating-point evaluation (such as permitting, but not requiring, extra precision).
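A small illustration of the expression-evaluation point: the two groupings below are different IEEE-754 computations and give different bits, yet each one, re-run anywhere under strict IEEE-754 double arithmetic, reproduces its own result exactly.

left = (0.1 + 0.2) + 0.3          # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)         # 0.6
print(left == right)              # False: grouping changed the rounding steps
print(left == (0.1 + 0.2) + 0.3)  # True: same operations, same bits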

Do sklearn algorithms internally work with double precision?

I am using sklearn for machine learning purposes. If I understand correctly, the float type in Python works with double precision. Does sklearn work with the same precision internally? I pass data to sklearn in lists/numpy arrays filled with floats (is this even relevant?).
Do I have to be worried about error propagation? I guess I do not, if double precision is used.
Just want to make sure.
sklearn does not seem to specify how it works internally regarding data types. However, it probably makes sense to assume it retains at least the precision of the input data type. So, to be on the safe side, specify the dtype as double in your data.
In practice, error propagation should not be an issue, since most algorithms are approximate in nature, and some of them depend far more on the random initial conditions than on accuracy. Recently there has even been the suggestion that we should limit precision to save resources, since the impact on results is small. See for example
https://arxiv.org/pdf/1502.02551.pdf
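For example (LinearRegression and the toy data here are just placeholders), you can be explicit about the dtype rather than letting the conversion decide:

import numpy as np
from sklearn.linear_model import LinearRegression

# be explicit: hand sklearn float64 (double) arrays
X = np.asarray([[0.0], [1.0], [2.0], [3.0]], dtype=np.float64)
y = np.asarray([0.0, 1.1, 1.9, 3.2], dtype=np.float64)

model = LinearRegression().fit(X, y)
print(model.coef_.dtype)  # float64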

Pure Python way of calculating trig functions?

I have a small Python script that I want to convert into a single file executable using pyinstaller.
The script essentially takes an input file, manipulates it and writes an output file.
However, I need to calculate the tangent function in the script, and when I do this using either numpy.tan() or math.tan(), the final executable ends up being ridiculously huge because pyinstaller bundles either the whole numpy or math modules in the executable.
Thus, my question is, is there a pure python method to calculate trig functions?
My first thought was that it must be possible to define sin, cos and tan purely mathematically, but I could not find a way to do this.
http://www.efunda.com/math/taylor_series/trig.cfm
The Taylor series expansion is the usual numerical method for evaluating these functions to arbitrary precision.
By the way, you should take care of the cyclic property of the trig functions, reducing the argument modulo 2π first, since the truncation error grows with (some power of) the size of the input if you cannot rely on mostly well-behaved usage.
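A pure-Python sketch of that idea, with range reduction and no imports at all; PI is hard-coded to double precision and 'terms' is an arbitrary cut-off chosen for illustration, not a tuned accuracy guarantee:

PI = 3.141592653589793

def _reduce(x):
    # map x into [-pi, pi] so the series converges quickly
    x = x % (2.0 * PI)
    return x - 2.0 * PI if x > PI else x

def sin(x, terms=12):
    x = _reduce(x)
    term = total = x
    for n in range(1, terms):
        # next term = previous * -x^2 / ((2n)(2n+1))
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
    return total

def cos(x, terms=12):
    x = _reduce(x)
    term = total = 1.0
    for n in range(1, terms):
        # next term = previous * -x^2 / ((2n-1)(2n))
        term *= -x * x / ((2 * n - 1) * (2 * n))
        total += term
    return total

def tan(x):
    return sin(x) / cos(x)

print(tan(1.0))  # ~1.5574077246549023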
It's possible to express trigonometric functions with complex numbers: Euler's formula.
However, this requires you to perform complex math, and evaluating the complex exponential would normally mean importing cmath, so you would need to implement it yourself if you want to go this way.
Otherwise, a simple approximation can be obtained by evaluating a Taylor series. Again, as import math is not an option, this requires you to implement some fairly fundamental pieces, such as factorials, on your own.
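A sketch of the Euler-formula route, leaning on the fact that the complex type itself is built in (only cmath would need importing): evaluate exp(ix) by its own Taylor series, then tan(x) = Im(e^ix) / Re(e^ix). There is no range reduction here, so it is only reasonable for moderate |x|, and 'terms' is again an arbitrary cut-off.

def cexp(z, terms=20):
    # exp(z) via Taylor series, using only built-in complex arithmetic
    total = term = 1 + 0j
    for n in range(1, terms):
        term *= z / n      # term_n = z**n / n!
        total += term
    return total

def tan(x):
    e = cexp(1j * x)       # e = cos(x) + i*sin(x) by Euler's formula
    return e.imag / e.real

print(tan(1.0))  # ~1.5574077246549023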

Python's decimal module and the table maker's dilemma

According to the documentation, the .exp() operation in
Python's decimal module "is correctly rounded using ...".
Because of the table maker's dilemma, I hope that's not guaranteed, since I'd prefer a guarantee
that its computation on a normal-looking input with moderately low precision won't take, e.g., a year.
How does Python address this?
(Is it different between versions?)
The exp() and pow() functions are different.
The "table-maker's dilemma" explanation you link to states that xy cannot be correctly rounded by any known algorithm with a bounded amount of time. However, this is obviously not the case for all subsets of its domain. If we restrict the domain to x=3 and y=2, then I can tell you what the correctly-rounded answer is.
A quick Google search turns up Correctly-Rounded Exponential Function in Double-Precision Arithmetic, by David Defour, Florent de Dinechin, Jean-Michel Muller (CiteSeer, PDF). The article provides an algorithm for computing a correctly-rounded exp() and provides the worst-case bound on its running time.
This is not the radix=10 case, but it shows how the table-maker's dilemma does not necessarily apply to the exp() function.
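You can watch decimal's correctly-rounded exp() in action at whatever working precision you set (25 digits below is an arbitrary choice):

from decimal import Decimal, getcontext

getcontext().prec = 25
print(Decimal(1).exp())   # 2.718281828459045235360287

getcontext().prec = 50
print(Decimal(1).exp())   # e correctly rounded to 50 digits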

Fixed-point arithmetic

Does anyone know of a library to do fixed point arithmetic in Python?
Or does anyone have sample code?
If you are interested in doing fixed point arithmetic, the Python Standard Library has a decimal module that can do it.
In fact, it has more flexible floating-point abilities than the built-in float type, too. By flexible I mean that it:
Has "signals" for various exceptional conditions (these can be set to do a variety of things on signaling)
Has positive and negative
infinities, as well as NaN (not a
number)
Can differentiate between positive
and negative 0
Allows you to set different rounding
schemes.
Allows you to set your own min and
max values.
All in all, it is handy for a million household uses.
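For fixed-point-style use, the usual idiom is to quantize results to a fixed number of decimal places (the currency example below is illustrative):

from decimal import Decimal, ROUND_HALF_UP

TWOPLACES = Decimal("0.01")  # two fractional digits, currency-style

price = Decimal("19.99")
total = (price * 3).quantize(TWOPLACES, rounding=ROUND_HALF_UP)
print(total)  # 59.97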
The deModel package sounds like what you're looking for.
Another option worth considering, if you want to simulate the behaviour of binary fixed-point numbers beyond simple arithmetic operations, is the spfpm module. It lets you calculate square roots, powers, logarithms and trigonometric functions using a fixed number of bits. It's a pure-Python module, so it doesn't offer the ultimate performance, but it can do hundreds of thousands of arithmetic operations per second on 256-bit numbers.
Recently I have been working on a similar project: https://numfi.readthedocs.io/en/latest/
>>> import numpy as np
>>> from numfi import numfi
>>> x = numfi(0.68751,1,6,3)
>>> x + 1/3
numfi([1.125]) s7/3-r/s
>>> np.sin(x)
numfi([0.625 ]) s6/3-r/s
