I am using a function that multiplies probabilities, thereby creating very small values. I am using the decimal.Decimal module to handle them, and when the computation is complete I convert the decimal to a log-odds value using math.log. But below a certain probability, Python cannot convert these very small probabilities to a log2 or log10 likelihood ratio.
I am getting ValueError: math domain error
So, I printed the value before the traceback started and it seems to be this number:
2.4876626750969332485460767406646530276378975654773588506772125620858727319570054153525540357327805722211631386444621446226193195409521079089382667946955357511114536197822067973513019098983691433561051610219726750413489309980667312714519374641433925197450250314924925500181809328656811236486523523785835600132361529950090E-366
Other small numbers like this one are handled by math.log in the same program, though:
5.0495856951184114023890172277484001329118412629157526209503867218204386939259819037402424581363918720565886924655927609161379229574865468595907661385853201472751861413845827437245978577896538019445515183910587509474989069747817303700894727201121392323641965506674606552182934813779310061601566189062725979740753305935661E-31
Is this true? Is there any way to fix it? I know I could take the log of the probabilities and then sum them along the way, but when I tried that, it seemed I would have to update several places in my program, which could take hours or days, and there is another process to convert the result back to a decimal.
Thanks,
If you want to take logarithms of Decimal objects, use the ln or log10 methods. Aside from a weird special case for huge ints, math.log casts its input to float.
whatever_decimal.ln()
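For example, with the value from the question (abbreviated here for brevity):

```python
from decimal import Decimal, getcontext
import math

getcontext().prec = 50  # plenty of precision for the logarithm

# A probability far below the smallest positive float (~5e-324):
p = Decimal("2.48766e-366")

# math.log would fail here, because the cast to float underflows to 0.0:
# math.log(float(p))  ->  ValueError: math domain error

# Decimal.ln() and Decimal.log10() work entirely in decimal arithmetic:
natural_log = p.ln()
log10_value = p.log10()
log2_value = p.ln() / Decimal(2).ln()  # no log2 method, so convert bases

print(natural_log)
print(log10_value)
print(log2_value)
```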
There is a point in my program where I compare a number with zero, and if it is zero I execute a series of commands. For some conditions the calculated number should be zero, but it comes out as a really small number (on the order of 10**(-17)) instead of exactly zero. So I am planning to use format to rectify that by making very small numbers zero. I test-ran the code below, and this simple snippet did not work.
I avoided using round because that might hinder the precision I want in my code.
Is there a way to implement format so that it works how I want it to?
Here:
a = '{:.5f}'.format(0.00000056)
print(a)
if a == 0.:
    print('success')
output:
0.00000
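For what it's worth, the snippet fails because format returns a string, so the comparison with a float is always False. A sketch of a working check, using an explicit tolerance instead of formatting (the 1e-5 tolerance is my choice, matching the five formatted digits):

```python
import math

x = 0.00000056  # "should be zero" but came out tiny

# '{:.5f}'.format(x) returns the *string* '0.00000'; comparing that
# string with the float 0. is always False, which is why the snippet
# above never prints 'success'.  Compare the number itself against an
# explicit tolerance instead:
is_zero = math.isclose(x, 0.0, abs_tol=1e-5)
print('success' if is_zero else 'not zero')
```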
As I found out, decimal is more precise, at the cost of processing power.
I also found out that
getcontext().prec = 2
Decimal(number)
counts the digits before the decimal point as well. In my case I need to calculate with only two digits after the point, no matter how big the number is, because sometimes I get numbers like 12345.15 and sometimes numbers like 2.53. And what if the numbers are 5 or 298.1?
I'm a bit confused by all these differences between float, decimal, rounding and truncating.
My main question is:
How can I calculate with a number like 254.12 or 15.35 at the lowest resource cost? Maybe it is even possible to fake these numbers? The rounding doesn't matter, but calculating with floats carrying 8 digits after the point and then truncating them seems like a waste of resources to me. Please correct me if I'm wrong.
I also know how to do Benchmarks with
import time
start_time = time.clock()
main()
print time.clock() - start_time, "seconds"
But I'm sure there are enough things I don't know about. Since I'm quite new to programming, I would be very happy if someone could give me a few hints, along with a piece of code to work and learn with. Thank you for taking the time to read this! :)
First, please be aware that floating-point operations are not necessarily as expensive as you might fear. This depends on the CPU you are using, but on many desktop CPUs a single floating-point operation mixed into a mainly integer program costs about as much as an integer operation, thanks to pipelining: the integer units have a queue of work, while the otherwise idle floating-point unit can pick up the occasional job immediately.
On the other hand, low-power CPUs may not even include floating point support at all, making any float operation hideously expensive! So before you get all judgy about whether you should use float or integer operations, do some profiling. You mention using time.clock and comparing start with end times. You should have a look at the timeit module shipped with python.
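For instance, a minimal timeit sketch comparing float and Decimal addition (the operand values are just placeholders):

```python
import timeit

# timeit picks a suitable clock and repeats the statement for you,
# which is more reliable than subtracting two time.clock() readings.
float_time = timeit.timeit(
    "a + b",
    setup="a = 254.12; b = 15.35",
    number=100_000)

decimal_time = timeit.timeit(
    "a + b",
    setup="from decimal import Decimal; a = Decimal('254.12'); b = Decimal('15.35')",
    number=100_000)

print(f"float:   {float_time:.4f} s")
print(f"Decimal: {decimal_time:.4f} s")
```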
Worse than bad performance, though, is the fact that floats don't always represent the number you want. Regardless of decimal point, if a number is large enough, or if you do the wrong operation to it, you can end up with a float that "approximates" your result without storing it exactly.
If you know that your application requires two digits beyond the decimal, I'd suggest that you write a class to implement that behavior using integer numbers. Python's integers automatically convert to big numbers when they get large, and the precision is exact. So there's a performance penalty (bignum ops are slower than integer or float ops on top-end hardware). But you can guarantee whatever behavior you want.
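A minimal sketch of such a class, storing each value as an exact integer count of hundredths (the class name and rounding choices are mine, not a standard API):

```python
class Fixed2:
    """Fixed-point number with exactly two digits after the point,
    stored internally as an integer count of hundredths."""

    def __init__(self, value):
        # round() absorbs the tiny float error in e.g. 2.53 * 100
        self.cents = round(value * 100)

    def __add__(self, other):
        out = Fixed2(0)
        out.cents = self.cents + other.cents
        return out

    def __mul__(self, other):
        out = Fixed2(0)
        # product of hundredths is ten-thousandths; rescale and round
        out.cents = round(self.cents * other.cents / 100)
        return out

    def __repr__(self):
        return f"{self.cents / 100:.2f}"

print(Fixed2(254.12) + Fixed2(15.35))   # 269.47
print(Fixed2(2.53) * Fixed2(2))         # 5.06
```

All arithmetic happens on exact Python integers, so precision is guaranteed at the cost of the explicit rescaling in multiplication.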
If your application is financial, please be aware that you are going to have to spend some time dealing with rounding issues. Everybody saw Superman 3, and now they think you're stealing their .00001 cents...
In Python, all floats are the same size, regardless of precision, because they are all represented by a single 'double' type. This means that either way you will be 'wasting' memory (24 bytes is a tiny amount, really). Using sys.getsizeof shows this:
>>> import sys
>>> sys.getsizeof(8.13333333)
24
>>> sys.getsizeof(8.13)
24
This is also shown by the fact that if an int is too big (ints have no max value) it can't be converted into a float - you get an OverflowError:
>>> 2**1024 + 0.5
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to float
Using Decimal is even less efficient, even when the precision is set up right, because class instances take up a lot of space on their own, regardless of their content:
>>> import decimal
>>> sys.getsizeof(decimal.Decimal(0))
80
The prec setting actually affects the total number of digits, not only the ones after the decimal point. Regarding that, the documentation says
The significance of a new Decimal is determined solely by the number
of digits input. Context precision and rounding only come into play
during arithmetic operations.
and also
The quantize() method rounds a number to a fixed exponent. This method
is useful for monetary applications that often round results to a
fixed number of places
So these are a few things you can look for if you need to work with a fixed number of digits after the point.
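For instance, quantize pins a Decimal to two digits after the point regardless of how many digits precede it:

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")

# Works the same for big and small magnitudes, unlike getcontext().prec:
for text in ("12345.15", "2.53", "5", "298.1"):
    d = Decimal(text).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)
    print(d)   # prints 12345.15, 2.53, 5.00, 298.10
```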
Regarding performance, it seems to me that you are prematurely optimizing. You generally don't need to worry about the fastest way to do a calculation that will take less than a microsecond (unless, of course, you need to do something on the order of millions of such calculations per second). On a quick benchmark, a sum of two numbers takes 48 nanoseconds for floats and 82 nanoseconds for Decimals. The impact of that difference should be very little for most applications.
This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 7 years ago.
This might seem really silly, but I am new to Python, I like to use equality conditions in my programs, and I have hit a very surprising road block. The practical issue here is that the last condition r == r_max is not satisfied, so I miss an iteration of the loop, but that is not what is worrying me.
Rather than (trivial) workarounds, can someone explain to me what is going on in simple terms? (Also, why do the numbers turn out the same no matter how many times I run this loop? It appears to be something systematic, not probabilistic.)
And what is a proper way to make sure this does not happen? By proper I mean a programming practice I should adopt in all my coding, so that such unintentional discrepancies never occur again. Loops like this are omnipresent in my code, and it makes me worried.
It seems I cannot trust numbers in Python, which would make it useless as a computational tool.
PS: I am working on a scientific computing project with NumPy.
>>> r, r_step, r_max = 2.4, 0.1, 4.0   # starting values inferred from the output below
>>> while r <= r_max:
...     print repr(r)
...     r = r + r_step
...
2.4
2.5
2.6
2.7
2.8000000000000003
2.9000000000000004
3.0000000000000004
3.1000000000000005
3.2000000000000006
3.3000000000000007
3.400000000000001
3.500000000000001
3.600000000000001
3.700000000000001
3.800000000000001
3.9000000000000012
The simple answer is that floating-point numbers, as usually represented in computing, aren't exact, and you can't treat them as if they were exact. Treat them as if they're fuzzy; see if they're within a certain range, not whether they "equal" something.
The numbers turn out the same because it's all deterministic. The calculations are being done in exactly the same way each time. This isn't a case of random errors, it's a case of the machine representing floating-point numbers in an inexact way.
Python has some exact datatypes you can use instead of inexact floats; see the decimal and fractions modules.
There's a classic article called "What Every Computer Scientist Should Know About Floating-Point Arithmetic"; google it and pick any link you like.
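A sketch of that "fuzzy" practice applied to the loop in the question (the starting values are inferred from its output; math.isclose is available since Python 3.5):

```python
import math

r, r_step, r_max = 2.4, 0.1, 4.0  # values inferred from the question's output
values = []

# Accept r as "equal to r_max" when it is within a relative tolerance,
# so the final, inexactly accumulated value is not silently dropped:
while r <= r_max or math.isclose(r, r_max, rel_tol=1e-9):
    values.append(r)
    r = r + r_step

print(len(values))   # the final iteration is no longer missed
print(math.isclose(values[-1], r_max, rel_tol=1e-9))
```

An even more robust pattern is to loop over exact integers (for i in range(n)) and compute r = 2.4 + i * 0.1 from each, so no error accumulates at all; numpy.linspace does the same thing for arrays.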
Pierre G. is correct. Because the computer calculates numbers in binary, it cannot represent many float numbers exactly. But they should be precise to within a certain number of digits, depending on the data type you use.
For your case, I think you could use the round(number, digits) function to get a rounded number and then compare it.
It's just a matter of float arithmetic. If you know how floating-point numbers are represented in computer memory, you'll see that even numbers we consider exact are often not stored exactly; they are stored only to a certain precision (a certain number of digits after the decimal point). This problem will always be present in mathematical programming. I suggest you use a comparison that accepts a tolerance value. NumPy has such methods, which facilitate this kind of comparison.
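A quick sketch of such tolerance-based comparison with numpy.isclose (math.isclose in the standard library behaves similarly for scalars):

```python
import numpy as np

a = 0.1 + 0.2            # 0.30000000000000004, not 0.3
print(a == 0.3)          # False: exact equality fails

# numpy.isclose compares with a relative and absolute tolerance:
print(np.isclose(a, 0.3))          # True

# The array version works element-wise, handy for grids of values:
r = np.arange(2.4, 4.05, 0.1)
print(np.isclose(r, 3.0).any())    # True even though r holds 3.0000000000000004
```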
The title is self-explanatory. What is going on here? How can I get this not to happen? Do I really have to change all of my units (it's a physics problem) just so that I can get a big enough answer that Python doesn't round 1-x to 1?
code:
import numpy as np
import math
vel=np.array([5e-30,5e-30,5e-30])
c=9.7156e-12
def mag(V):
    return math.sqrt(V[0]**2 + V[1]**2 + V[2]**2)
gam=(1-(mag(vel)/c)**2)**(-1/2)
print(mag(vel))
print(mag(vel)**2)
print(mag(vel)**2/(c**2))
print(1-mag(vel)**2/(c**2))
print(gam)
output:
>>> (executing lines 1 to 17 of "<tmp 1>")
8.660254037844386e-30
7.499999999999998e-59
7.945514251743055e-37
1.0
1.0
>>>
In Python, decimal may work, and maybe mpmath, as discussed in this SO article.
If you are willing to use Java instead of Python, you might be able to use BigDecimal, apfloat or JScience.
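A sketch with the standard-library decimal module, reusing the numbers from the question (mpmath would look much the same, with mp.dps playing the role of prec):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # far more digits than a 64-bit float's ~16

v = Decimal("8.660254037844386E-30")   # mag(vel) from the question
c = Decimal("9.7156e-12")

beta2 = (v / c) ** 2                   # ~7.9e-37, survives in Decimal
one_minus = Decimal(1) - beta2         # no longer collapses to 1.0
gamma = 1 / one_minus.sqrt()

print(one_minus)
print(gamma)
```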
8.66e-30 uses only 3 significant figures, but representing 1 minus that number would require more than 30. For anything beyond about 16 significant figures you will need to represent digits using something else, like very long strings, though it's difficult to do math with long strings. You could also perform binary computations on very long arrays of byte values, where the bytes represent a very large integer modified by a scale factor of your choice. So if you can support an integer larger than 1E60, you can alternatively scale the value so that you can represent 1E-60 with a maximum value of 1. You can probably do that with about 200 bits, or 25 bytes, and with 400 bits you should be able to precisely represent the entire range from 1E60 to 1E-60. There may already be utilities out there that perform calculations of this type, used by people in math or security who may want to represent pi to a thousand places, for instance, which you can't do with a double.
The other useful trick is to use scale factors. That is, in your original coordinate space you cannot do the subtraction because the digits will not be able to represent the values. But, if you make the assumption that if you are making small adjustments you do not simultaneously care about large adjustments, then you can perform a transform on the data. So for instance you subtract 1 from your numbers. Then you could represent 1-1E-60 as -1E-60. You could do as many operations very precisely in your transform space, but knowing full well that if you attempt to convert them back from your transform space they will be lost as irrelevant. This sort of tactic is useful when zooming in on a map. Making adjustments on the scale of micrometers in units of latitude and longitude for your single precision floating point DirectX calculations won't work. But you could temporarily change your scale while you are zoomed in so that the operations will work normally.
So complicated numbers can then be represented by a big number plus a second number that represents the small scale adjustment. So for instance, if you have 16 digits in a double, you can use the first number to represent the large portion of the value, like from 1 to 1E16, and the second double to represent the additional small portion. Except that using 16 digits might be flirting with errors from the double's ability to represent the big value accurately so you might use only 15 or 14 or so just to be safe.
1234567890.1234567890
becomes
1.234567890E9 + 1.23456789E-1.
and basically the bigger your precision the more terms your complex number gets. But while this sort of thing works pretty well when each term is more or less mathematically independent, in cases where you have to do lots of rigorous calculations that operate across the scales, doing the book-keeping between these values would likely be more of a pain than it would be worth.
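A toy sketch of that two-term idea (not a production "double-double" library, just an illustration of why the split helps):

```python
# Represent a value as (big, small), where the true value is big + small
# and 'small' holds digits that the double 'big' cannot carry.
big, small = 1.234567890e9, 1.23456789e-1

# A tiny correction only touches the small term:
small += 1e-12

# Plain double addition loses the correction entirely, because 1e-12 is
# far below the spacing between adjacent doubles near 1.2e9:
lost = (big + 1e-12) - big
print(lost)        # 0.0 - the correction vanished in double precision
print(small)       # the two-term form still carries it
```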
I think you won't get the result you are expecting because you are running into the limits of computer arithmetic. The thing about this kind of calculation is that nobody can avoid the error entirely, unless you make or find a model with (theoretically) infinite decimals that you can operate on. If that is too much for the problem you are trying to solve, you just have to be careful and handle these errors in your calculations.
There is a lot of literature out there with many different approaches for handling calculation errors, which help not to avoid but to minimize them.
Hope my answer helps and doesn't disappoint you.
This question already has answers here:
Possible Duplicate:
Python rounding error with float numbers
Closed 10 years ago.
I created an array with numpy as a = numpy.arange(0,1e5,1,dtype=int). a[18645] is 18645 as expected. When I create another array b=a*10e-15, b[18645] is 186.4999999999e-12. b[18644] is 186.44e-12. Why does Python create these trailing 9s?
This issue came up when I was trying to search for an element in the array with numpy.where. With the trailing 9s, the numpy.where function failed to find 184.45e-12 in b.
That's because it is converting to floating point, which isn't exact. Due to rounding errors, the result you get isn't 186.44 - it's apparently a number slightly less than 186.5, hence all the 9s being printed out.
There are actually several sources of error here. First, 1e-15 cannot be exactly represented as a floating-point number. Second, the multiplication may introduce further errors. Lastly, the result has to be converted back to decimal for display, and Python helpfully truncates it when printing.
Some trivia - 1e-15 converted to a double is exactly 0.00000000000000100000000000000007770539987666107923830718560119501514549256171449087560176849365234375
Multiplying this number by 18644 gives 0.000000000018644000000000001740618010232243529321338737503310767351649701595306396484375
As you can see, this is still fairly accurate. It appears that NumPy may be using single-precision floats here, which would greatly magnify the error.
This is caused by the way floating-point numbers are represented.
This issue came up when I was trying to search for an element in the
array with numpy.where
You don't test for equality on floating points; you test whether the difference is smaller than a given precision. And you do this precisely because floating-point operations may give unexpected results.
As a matter of fact, NumPy can be built on a BLAS such as ATLAS, which is able to choose from a range of implementations for a specific operation (based on the state of your machine). Therefore, if you run the same program twice, you may obtain different results (if you print the full representation of the floats and look at the last digits).
This is just an example to show that equality testing will almost never work the expected way on floats.