floats inside tuples changing values when accessed - python

So I have a list of tuples of two floats each. Each tuple represents a range. I am going through another list of floats which represent values to be fit into the ranges. All of these floats are < 1 but positive, so precision matters. One of my tests to determine if a value fits into a range is failing when it should pass. If I print the value and the range that is causing problems I can tell this much:
curValue = 0.00145000000671
range = (0.0014500000067055225, 0.0020968749796738849)
The conditional that is failing is:
if curValue > range[0] and ... blah :
# do some stuff
From the values given by curValue and range, the test should clearly pass (don't worry about what is in the conditional). Now, if I print explicitly what the value of range[0] is I get:
range[0] = 0.00145000000671
Which would explain why the test is failing. So my question then, is why is the float changing when it is accessed. It has decimal values available up to a certain precision when part of a tuple, and a different precision when accessed. Why would this be? What can I do to ensure my data maintains a consistent amount of precision across my calculations?

The float doesn't change. The built-in numeric types are all immutable. The cause for what you're observing is that:
print range[0] uses str on the float, which (up until very recent versions of Python) printed fewer digits of a float.
Printing a tuple (be it with repr or str) uses repr on the individual items, which gives a much more accurate representation (again, this isn't true anymore in recent releases which use a better algorithm for both).
As for why the condition doesn't work out the way you expect, it's probably the usual culprit, the limited precision of floats. Try print repr(curValue), repr(range[0]) to see what Python decided was the closest possible representation of your float literal.
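A minimal sketch of the display effect, written for Python 3, where str() and repr() of a float already agree. As an assumption for illustration, format(x, '.12g') stands in for what Python 2's print showed; the variable names are not from the question.

```python
# The lower bound of the range, as the tuple's repr displayed it.
low = 0.0014500000067055225

# Python 2's str() kept about 12 significant digits; '.12g' imitates that.
short = format(low, '.12g')
print(short)        # 0.00145000000671
print(repr(low))    # the full, round-trippable digits

# A value equal to the bound is not strictly greater than it,
# even though the short display hides that they are the same number.
cur_value = low
print(cur_value > low)       # False

# Parsing the shortened text gives a different, slightly larger float.
print(float(short) > low)    # True
```

The stored value never changes; only the number of digits shown differs between the two ways of printing.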

Floats aren't infinitely precise. Even if you enter pi as a constant to 100 decimals, only the first 15 to 17 significant digits are kept. The same is happening to you. Python floats are 64-bit IEEE 754 doubles, which give you 53 bits of mantissa; that limits your precision (and in unexpected ways, because it's in base 2).
Please note, 0.00145000000671 isn't the exact value as stored by Python. Python only displays a few decimals of the complete stored float if you use print. If you want to see exactly how Python stores the float, use repr.
If you want better precision use the decimal module.
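As a rough illustration, assuming a typical CPython build where float is a C double, the limits can be inspected through sys.float_info and compared with the decimal module:

```python
import sys
from decimal import Decimal

# Size of the float mantissa and the number of decimal digits
# that are guaranteed to survive a round trip.
print(sys.float_info.mant_dig)   # 53
print(sys.float_info.dig)        # 15

# Binary floats cannot store 0.1 exactly, so small errors show up.
float_equal = (0.1 + 0.2 == 0.3)
print(float_equal)               # False

# Decimal values built from strings keep the decimal digits exactly.
decimal_equal = (Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))
print(decimal_equal)             # True
```

Note that Decimal is built from strings here; building it from a float would carry the binary rounding error along.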

It isn't changing per se. Python is doing its best to store the data as a float, but that number is too precise for float, so Python modifies it before it is even accessed (in the very process of storing it). Funny how something so small is such a big pain.
You need to use an arbitrary-precision fixed point module like Simple Python Fixed Point or the decimal module.

Not sure it would work in this case, because I don't know if Python's limiting in the output or in the storage itself, but you could try doing:
if curValue - range[0] > 0 and...

Related

Getting error in python: Value Error: invalid literal for int() with base 10: '470.21'

I want to add and subtract amounts formatted like $12,587.30 and get the answer back in the same format. How can I do this?
Here is my code example:
print(int(col_ammount2.lstrip('$'))-int(col_ammount.lstrip('$')))
I removed the $ sign and converted the value to int, but it gives me the base 10 error.
You mentioned you want to do arithmetic operations to the numbers (addition/subtraction) so you probably want them in float instead. The difference between an integer (int) and float is that integers do not carry decimal points.
Additionally, as @officialaimm mentioned, you need to remove the commas too, for example
float('$3,333.33'.replace('$', '').replace(',', ''))
will give you
3333.33
So putting it into your code
print(float(col_ammount2.lstrip('$').replace(',', ''))
- float(col_ammount.lstrip('$').replace(',', '')))
An additional note for when you parse a floating point number (same applies to integers too), you may want to watch out for empty values, i.e.
float('')
is bad. One thing you can do, in case col_ammount or col_ammount2 may be empty at some point, is default them to 0 when that happens
float(col_ammount.lstrip(...).replace(...) or 0)
You may also want to read this to learn about workarounds for problems you may face with floating point arithmetic: https://docs.python.org/3/tutorial/floatingpoint.html
There are two things you are missing here. Firstly, Python's int(...) cannot parse numbers with commas, so you will need to remove the commas as well by using .replace(',',''). Secondly, int() cannot parse strings containing a decimal point; you will have to use float(...) first and then, if needed, convert the result to an integer with int, math.ceil or math.floor, as your needs dictate.
Maybe something like this will solve your problem:
col_ammount2='$1,587.30'
col_ammount = '$2,567.67'
print(int(float(col_ammount2.lstrip('$').replace(',','')))-int(float(col_ammount.lstrip('$').replace(',',''))))
If you are doing these sorts of things quite often in your code, making a function as such might be handy:
integerify_currency = lambda x:int(float(x.lstrip('$').replace(',','')))
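Since the question asks for the answer back in the same $12,587.30 format, here is one possible sketch that keeps the cents exact by using the decimal module instead of float. The helper names parse_currency and format_currency are hypothetical, and it assumes US-style amounts with a leading $ and comma separators.

```python
from decimal import Decimal

def parse_currency(text):
    """Turn a string such as '$12,587.30' into a Decimal (hypothetical helper)."""
    cleaned = text.strip().replace('$', '').replace(',', '')
    # Treat an empty field as zero instead of raising an error.
    return Decimal(cleaned or '0')

def format_currency(amount):
    """Format a Decimal back into the '$12,587.30' style (hypothetical helper)."""
    sign = '-' if amount < 0 else ''
    return '{}${:,.2f}'.format(sign, abs(amount))

col_ammount = '$2,567.67'
col_ammount2 = '$1,587.30'
difference = parse_currency(col_ammount) - parse_currency(col_ammount2)
print(format_currency(difference))   # $980.37
```

Decimal avoids the binary rounding issues that can creep into money calculations done with float.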

How does Python know which number type to use in order to Multiply arbitrary two numbers?

In C, I have to set proper type, such as int, float, long for a simple arithmetic for multiplying two numbers. Otherwise, it will give me an incorrect answer.
But in Python, basically it can automatically give me the correct answer.
I have tried to debug a simple 987*456 calculation to see the source code.
I set a break point at that line in PyCharm, but I cannot step into the source code, it just finished right away.
How can I see the source code? Is it possible? Or how does Python do that multiplication?
I mean, how does Python handle the different number types in the results of 98*76 or 987654321*123457789? Does Python detect some out-of-range error and try another number type?
I mean, how does Python handle the different number types in the results of 98*76 or 987654321*123457789? Does Python detect some out-of-range error and try another number type?
Pretty much. In Python 2, the source code for integer multiplication can be found in intobject.c. It multiplies the integers as C longs, then casts the longs to doubles and multiplies those. If the results are close, the long multiplication didn't overflow. If the results are very different, it switches to Python longs, which use a bignum representation.
The type promotion for mixed arithmetic is:
integer -> long -> float
The narrower type is converted to the wider type, and the multiplication is carried out.
https://docs.python.org/2/library/stdtypes.html#numeric-types-int-float-long-complex
Some examples to see what happens:
987*456 = 450072
987*456L = 450072L
987*456.0 = 450072.0
I hope I understood your question.
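The examples above use Python 2 syntax (the L suffix). As a small sketch for Python 3, where int and long were merged into a single arbitrary-precision int, the same behaviour can be checked like this (variable names are illustrative):

```python
small = 98 * 76
large = 987654321 * 123457789

# Both results have the same type; there is no visible switch to "long".
print(type(small), type(large))

# The large product needs more than 32 bits, yet nothing overflows.
print(large.bit_length())
print(large // 123457789 == 987654321)   # True

# Mixing an int with a float still promotes the result to float.
mixed = 987 * 456.0
print(type(mixed), mixed)                # <class 'float'> 450072.0
```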
Variables are nothing but reserved memory locations to store values. This means that when you create a variable you reserve some space in memory.
Based on the data type of a variable, the interpreter allocates memory and decides what can be stored in the reserved memory. Therefore, by assigning different data types to variables, you can store integers, decimals or characters in these variables.
Python variables do not have to be explicitly declared to reserve memory space. The declaration happens automatically when you assign a value to a variable. The equal sign (=) is used to assign values to variables.
The operand to the left of the = operator is the name of the variable and the operand to the right of the = operator is the value stored in the variable.

Loss of precision float in python

I have a list called scores of varying -log probabilities.
when I call this function:
maxState = scores.pop(scores.index(max(scores)))
and print maxState, I realize that the maxState loses its precision as a float. Is there a way I can get the maxState without losing precision?
ex: I print out the list scores: [-35.7971525669589, -34.67875545008369]
and print maxState, I get this: -34.6787554501
(You can see it's rounded)
You are confusing string presentation with actual contents. Nowhere is precision lost, only the string produced to write to your console is using a rounded value rather than show you all digits. And always remember that float numbers are digital approximations, not precise values.
Python floats are formatted differently when using the str() and repr() functions; in a list or other container, repr() is used, but print it directly and str() is used.
If you don't like either option, format it explicitly with the format() function and specifying a precision:
print format(maxState, '.12f')
to print it with 12 decimals, for example.
Demo:
>>> maxState = -34.67875545008369
>>> repr(maxState)
'-34.67875545008369'
>>> str(maxState)
'-34.6787554501'
>>> format(maxState, '.8f')
'-34.67875545'
>>> format(maxState, '.12f')
'-34.678755450084'
The repr() output is roughly equivalent to using '.17g' as the format, while str() is equivalent to '.12g'; here the precision denotes when to use scientific notation (e) and when to display in floating point notation (f).
I say roughly because the repr() output aims to give you round-trippable output; see the change notes for Python 3.1 on float() representation, which were backported to Python 2.7:
What is new is how the number gets displayed. Formerly, Python used a simple approach. The value of repr(1.1) was computed as format(1.1, '.17g') which evaluated to '1.1000000000000001'. The advantage of using 17 digits was that it relied on IEEE-754 guarantees to assure that eval(repr(1.1)) would round-trip exactly to its original value. The disadvantage is that many people found the output to be confusing (mistaking intrinsic limitations of binary floating point representation as being a problem with Python itself).
The new algorithm for repr(1.1) is smarter and returns '1.1'. Effectively, it searches all equivalent string representations (ones that get stored with the same underlying float value) and returns the shortest representation.
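A short sketch of the round-trip property described in those notes, runnable on Python 3 (or 2.7 with minor print changes):

```python
x = 1.1

old_style = format(x, '.17g')   # what repr() used to produce
new_style = repr(x)             # shortest string that round-trips

print(old_style)   # 1.1000000000000001
print(new_style)   # 1.1

# Both strings convert back to exactly the same float.
print(float(old_style) == x)    # True
print(float(new_style) == x)    # True
```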

Numpy too weak to calculate a precise mean value

This question is very similar to this post - but not exactly
I have some data in a .csv file. The data has precision to the 4th digit (#.####).
Calculating the mean in Excel or SAS gives a result with precision to 5th digit (#.#####) but using numpy gives:
import numpy as np
data = np.recfromcsv(path2file, delimiter=';', names=['measurements'], dtype=np.float64)
rawD = data['measurements']
print np.average(rawD)
gives a number like this
#.#####999999999994
Clearly something is wrong..
using
from math import fsum
print fsum(rawD.ravel())/rawD.size
gives
#.#####
Is there anything in the np.average that I set wrong?
BONUS info:
I'm only working with 200 data points in the array
UPDATE
I thought I should make my case more clear.
I have numbers like 4.2730 in my csv (giving a 4 decimal precision - even though the 4th always is zero [not part of the subject so don't mind that])
Calculating an average/mean by numpy gives me this
4.2516499999999994
Which gives a print by
>>>print "%.4f" % np.average(rawD)
4.2516
Doing the same thing in Excel or SAS gives me this:
4.2517
Which I actually believe as being the true average value because it finds it to be 4.25165.
This code also illustrate it:
answer = 0
for number in rawD:
    answer += int(number*1000)
print answer/2
425165
So how do I tell np.average() to calculate this value?
I'm a bit surprised that numpy did this to me... I thought that I only needed to worry if I was dealing with 16 digits numbers. Didn't expect a round off on the 4 decimal place would be influenced by this..
I know I could use
fsum(rawD.ravel())/rawD.size
But I also have other things (like std) I want to calculate with the same precision
UPDATE 2
I thought I could make a temp solution by
>>>print "%.4f" % np.float64("%.5f" % np.mean(rawD))
4.2416
Which did not solve the case. Then I tried
>>>print "%.4f" % float("4.24165")
4.2416
AHA! There is a bug in the formatter: Issue 5118
To be honest I don't care if python stores 4.24165 as 4.241649999... It's still a round off error - NO MATTER WHAT.
If the interpreter can figure out how to display the number
>>>print float("4.24165")
4.24165
then so should the formatter, and it should deal with that number when rounding.
It still doesn't change the fact that I have a round off problem (now both with the formatter and numpy)
In case you need some numbers to help me out then I have made this modified .csv file:
Download it from here
(I'm aware that this file does not have the number of digits I explained earlier and that the average gives ..9988 at the end instead of ..9994 - it's modified)
Guess my question boils down to how do I get a string output like the one Excel gives me if I use =average()
and have it round off correctly if I choose to show only 4 digits
I know that this might seem strange for some.. But I have my reasons for wanting to reproduce the behavior of Excel.
Any help would be appreciated, thank you.
To get exact decimal numbers, you need to use decimal arithmetic instead of binary. Python provides the decimal module for this.
If you want to continue to use numpy for the calculations and simply round the result, you can still do this with decimal. You do it in two steps, rounding to a large number of digits to eliminate the accumulated error, then rounding to the desired precision. The quantize method is used for rounding.
from decimal import Decimal,ROUND_HALF_UP
ten_places = Decimal('0.0000000001')
four_places = Decimal('0.0001')
mean = 4.2516499999999994
print Decimal(mean).quantize(ten_places).quantize(four_places, rounding=ROUND_HALF_UP)
4.2517
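The two-step rounding can be wrapped in a small function. This is only a sketch in Python 3 syntax; the name round_half_up and the default of 10 guard digits are assumptions chosen for illustration, not part of the decimal module.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places=4, guard_places=10):
    """Hypothetical helper: clean up float noise, then round half up."""
    guard = Decimal(10) ** -guard_places    # e.g. 1E-10
    target = Decimal(10) ** -places         # e.g. 0.0001
    # Step 1: round to many digits to drop the accumulated binary error.
    cleaned = Decimal(value).quantize(guard)
    # Step 2: round to the precision that should be displayed.
    return cleaned.quantize(target, rounding=ROUND_HALF_UP)

mean = 4.2516499999999994
print('%.4f' % mean)          # 4.2516
print(round_half_up(mean))    # 4.2517
```

The guard step is a judgment call: it assumes the true value has far fewer meaningful digits than guard_places, which holds for data recorded to four decimals.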
The result value of average is a double. When you print out a double, by default all digits are printed. What you see here is the result of limited digital precision, which is not a problem of numpy, but a general computing problem. When you care about the presentation of your float value, use "%.4f" % avg_val. There is also a package for rational numbers, to avoid representing fractions as real numbers, but I guess that's not what you're looking for.
For your second statement, summarizing all the values by hand and then dividing it, I suppose you're using python 2.7 and all your input values are integer. In that way, you would have an integer division, which truncates everything after the dot, resulting in another integer value.

Why do Python's math.ceil() and math.floor() operations return floats instead of integers?

Can someone explain this (straight from the docs- emphasis mine):
math.ceil(x) Return the ceiling of x as a float, the smallest integer value greater than or equal to x.
math.floor(x) Return the floor of x as a float, the largest integer value less than or equal to x.
Why would .ceil and .floor return floats when they are by definition supposed to calculate integers?
EDIT:
Well this got some very good arguments as to why they should return floats, and I was just getting used to the idea, when @jcollado pointed out that they in fact do return ints in Python 3...
As pointed out by other answers, in python they return floats probably because of historical reasons to prevent overflow problems. However, they return integers in python 3.
>>> import math
>>> type(math.floor(3.1))
<class 'int'>
>>> type(math.ceil(3.1))
<class 'int'>
You can find more information in PEP 3141.
The range of floating point numbers usually exceeds the range of integers. By returning a floating point value, the functions can return a sensible value for input values that lie outside the representable range of integers.
Consider: If floor() returned an integer, what should floor(1.0e30) return?
Now, while Python's integers are now arbitrary precision, it wasn't always this way. The standard library functions are thin wrappers around the equivalent C library functions.
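For comparison, a small sketch of what Python 3 does with the cases discussed here, now that integers are arbitrary precision:

```python
import math

# In Python 3, floor() returns an int, even for very large floats.
print(type(math.floor(3.7)))    # <class 'int'>
big = math.floor(1.0e30)
print(type(big), big == int(1.0e30))

# NaN and infinity have no integer equivalent, so floor() raises.
errors = []
for special in (float('nan'), float('inf')):
    try:
        math.floor(special)
    except (ValueError, OverflowError) as exc:
        errors.append(type(exc).__name__)
print(errors)   # ['ValueError', 'OverflowError']
```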
Because python's math library is a thin wrapper around the C math library which returns floats.
The source of your confusion is evident in your comment:
The whole point of ceil/floor operations is to convert floats to integers!
The point of the ceil and floor operations is to round floating-point data to integral values. Not to do a type conversion. Users who need to get integer values can do an explicit conversion following the operation.
Note that it would not be possible to implement a round to integral value as trivially if all you had available were a ceil or floor operation that returned an integer. You would need to first check that the input is within the representable integer range, then call the function; you would need to handle NaN and infinities in a separate code path.
Additionally, you must have versions of ceil and floor which return floating-point numbers if you want to conform to IEEE 754.
Before Python 2.4, an integer couldn't hold the full range of truncated real numbers.
http://docs.python.org/whatsnew/2.4.html#pep-237-unifying-long-integers-and-integers
Because the range for floats is greater than that of integers -- returning an integer could overflow
This is a very interesting question! As a float requires some bits to store the exponent (=bits_for_exponent) any floating point number greater than 2**(float_size - bits_for_exponent) will always be an integral value! At the other extreme a float with a negative exponent will give one of 1, 0 or -1. This makes the discussion of integer range versus float range moot because these functions will simply return the original number whenever the number is outside the range of the integer type. The Python functions are wrappers of the C functions, and so this is really a deficiency of the C functions, where they should have returned an integer and forced the programmer to do the range/NaN/Inf check before calling ceil/floor.
Thus the logical answer is the only time these functions are useful they would return a value within integer range and so the fact they return a float is a mistake and you are very smart for realizing this!
Maybe because other languages do this as well, so it is generally-accepted behavior. (For good reasons, as shown in the other answers)
This totally caught me off guard recently. This is because I've programmed in C since the 1970's and I'm only now learning the fine details of Python. Like this curious behavior of math.floor().
The math library of Python is how you access the C standard math library. And the C standard math library is a collection of floating point numerical functions, like sin(), cos(), and sqrt(). The floor() function in the context of numerical calculations has ALWAYS returned a float. For 50 YEARS now. It's part of the standards for numerical computation. For those of us familiar with the math library of C, we don't understand it to be just "math functions". We understand it to be a collection of floating-point algorithms. It would be better named something like NFPAL - Numerical Floating Point Algorithms Library. :)
Those of us that understand the history instantly see the python math module as just a wrapper for the long-established C floating-point library. So we expect without a second thought, that math.floor() is the same function as the C standard library floor() which takes a float argument and returns a float value.
The use of floor() as a numerical math concept goes back to 1798 per the Wikipedia page on the subject: https://en.wikipedia.org/wiki/Floor_and_ceiling_functions#Notation
It never has been a computer science "convert floating-point to integer storage format" function, even though logically it's a similar concept.
The floor() function in this context has always been a floating-point numerical calculation, like (almost) all the functions in the math library. Floating-point goes beyond what integers can do. It includes the special values +inf, -inf, and NaN (not a number), which are all well defined as to how they propagate through floating-point numerical calculations. floor() has always CORRECTLY preserved values like NaN, +inf and -inf in numerical calculations. If floor() returns an int, it totally breaks the entire concept of what the numerical floor() function was meant to do. math.floor(float("nan")) must return "nan" if it is to be a true floating-point numerical floor() function.
When I recently saw a Python education video telling us to use:
i = math.floor(12.34/3)
to get an integer I laughed to myself at how clueless the instructor was. But before writing a snarkish comment, I did some testing and to my shock, I found the numerical algorithms library in Python was returning an int. And even stranger, what I thought was the obvious answer to getting an int from a divide, was to use:
i = 12.34 // 3
Why not use the built-in integer divide to get the integer you are looking for! From my C background, it was the obvious right answer. But lo and behold, integer divide in Python returns a FLOAT in this case! Wow! What a strange upside-down world Python can be.
A better answer in Python is that if you really NEED an int type, you should just be explicit and ask for int in python:
i = int(12.34/3)
Keeping in mind however that floor() rounds towards negative infinity and int() rounds towards zero so they give different answers for negative numbers. So if negative values are possible, you must use the function that gives the results you need for your application.
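A quick sketch, in Python 3, of how the three options differ once negative values are involved (the numbers are just the ones used above):

```python
import math

positive = 12.34 / 3     # about 4.11
negative = -12.34 / 3    # about -4.11

# floor() rounds toward negative infinity, int() truncates toward zero.
print(math.floor(positive), int(positive))   # 4 4
print(math.floor(negative), int(negative))   # -5 -4

# Floor division on floats floors like math.floor() but stays a float.
print(12.34 // 3)     # 4.0
print(-12.34 // 3)    # -5.0
```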
Python however is a different beast for good reasons. It's trying to address a different problem set than C. The dynamic typing of Python is great for fast prototyping and development, but it can create some very complex and hard to find bugs when code that was tested with one type of objects, like floats, fails in subtle and hard to find ways when passed an int argument. And because of this, a lot of interesting choices were made for Python that put the need to minimize surprise errors above other historic norms.
Changing the divide to always return a float (or some form of non int) was a move in the right direction for this. And in this same light, it's logical to make // be a floor(a/b) function, and not an "int divide".
Making float divide by zero a fatal error instead of returning float("inf") is likewise wise because, in MOST python code, a divide by zero is not a numerical calculation but a programming bug where the math is wrong or there is an off by one error. It's more important for average Python code to catch that bug when it happens, instead of propagating a hidden error in the form of an "inf" which causes a blow-up miles away from the actual bug.
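To make the contrast concrete, here is a small sketch of the two behaviours in plain Python 3: division by zero fails immediately, while an explicitly created infinity propagates silently through later arithmetic.

```python
import math

# Float division by zero raises right where the mistake happens.
try:
    ratio = 1.0 / 0.0
except ZeroDivisionError as exc:
    ratio = None
    print('caught:', exc)

# An infinity created on purpose flows through calculations quietly
# and can turn into NaN much later.
inf = float('inf')
print(inf + 1 == inf)           # True
print(math.isnan(inf - inf))    # True
```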
And as long as the rest of the language is doing a good job of casting ints to floats when needed, such as in divide, or math.sqrt(), it's logical to have math.floor() return an int, because if it is needed as a float later, it will be converted correctly back to a float. And if the programmer needed an int, well then the function gave them what they needed. math.floor(a/b) and a//b should act the same way, but the fact that they don't I guess is just a matter of history not yet adjusted for consistency. And maybe too hard to "fix" due to backward compatibility issues. And maybe not that important???
In Python, if you want to write hard-core numerical algorithms, the correct answer is to use NumPy and SciPy, not the built-in Python math module.
import numpy as np
nan = np.float64(0.0) / 0.0 # gives a warning and returns float64 nan
nan = np.floor(nan) # returns float64 nan
Python is different, for good reasons, and it takes a bit of time to understand it. And we can see in this case, the OP, who didn't understand the history of the numerical floor() function, needed and expected it to return an int from their thinking about mathematical integers and reals. Now Python is doing what our mathematical (vs computer science) training implies. Which makes it more likely to do what a beginner expects it to do while still covering all the more complex needs of advanced numerical algorithms with NumPy and SciPy. I'm constantly impressed with how Python has evolved, even if at times I'm totally caught off guard.
