I have a numpy array of floats in Python.
When I print the array, the first value is:
[7.14519700e+04, ....
If, however, I print out just the first value on its own, the printout reads:
71451.9699799
Obviously these numbers should be identical, so I wondered: is the array just showing me a rounded version of the element? The second number here has 12 significant figures, and the first only has 9.
Why are these numbers different?
It's just the printing, not the storage. The only confusion might arise because the first example uses numpy's print precision settings and the second uses plain Python's.
You can adjust numpy's print precision with
numpy.set_printoptions(precision=20)
print(myarray)
(adjust precision to your needs), or select the number of significant figures with standard Python formatted printing:
print('%.20f' % myarray[0])
The internal representation of the number is always the same.
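For example (a minimal sketch; the two-element array here is made-up data standing in for your myarray):

import numpy

myarray = numpy.array([71451.9699799, 1.0e8])  # stand-in data; the large second value forces scientific notation
print(myarray)                       # uses numpy's print settings (precision=8 by default)
print(myarray[0])                    # uses Python's own float formatting for the scalar
print('%.20f' % myarray[0])          # explicit format: more digits than a float64 actually holds
print(myarray[0] == 71451.9699799)   # True: the stored value is the same in every case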
The types in a numpy array are well defined. You can see how the values are stored by inspecting the dtype attribute of the array.
For example:
import numpy
a = numpy.zeros(10)
print(a.dtype)
will show float64, that is a 64-bit floating point number.
You can specify the type of the array explicitly, either with the commonly used dtype argument or with one of numpy's scalar type constructors:
a = numpy.zeros(10, dtype='complex64')  # 64-bit complex numbers (two 32-bit floats each)
b = numpy.longdouble(numpy.zeros(10))   # a long-double (extended precision) array, built via the type constructor
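As a quick check (a minimal sketch; the name numpy reports for long double, e.g. float128 or float96, and its item size depend on the platform):

print(a.dtype, a.itemsize)  # complex64 8
print(b.dtype, b.itemsize)  # float128 16 on most 64-bit Linux builds; may differ elsewhere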
Regarding the printing, this is just a formatting issue. You can twiddle how numpy prints an array using numpy.set_printoptions:
>>> a = numpy.random.randn(4)  # for interest, randn annoyingly doesn't support the dtype arg
>>> print(a)
[ 0.12584756 0.73540009 -0.17108244 -0.96818512]
>>> numpy.set_printoptions(precision=3)
>>> print(a)
[ 0.126 0.735 -0.171 -0.968]
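Since set_printoptions changes global state, a tidier option (a sketch, assuming a numpy recent enough to provide the numpy.printoptions context manager) is to scope the change:

with numpy.printoptions(precision=10):
    print(a)   # printed with 10 digits inside the block
print(a)       # back to the previous settings afterwards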
Inserting a nan into a complex numpy array gives some (for me) unexpected behavior:
import numpy as np

a = np.array([5 + 6*1j])
print(a)
# [5.+6.j]
a[0] = np.nan
print(a)
# [nan+0.j]
I expected Python to write nan+nanj. For analyses it often might not matter, since np.isnan is True for any complex number whose real and/or imaginary part is nan. However, I did not know about this behavior, and when plotting the real and imaginary parts of my array it gave me the impression that I had information in the imaginary part (when in fact there is none). A workaround is to write a[0] = np.nan + np.nan*1j. Can somebody explain the reason for this behavior to me?
The issue here is that when you create an array with complex values:
a = np.array([5+6*1j])
You've created an array of dtype complex:
a.dtype
# dtype('complex128')
So when you assign a value that has only a real part, it is converted to a complex value, and you are thus inserting a number whose imaginary component equals 0j:
complex(np.nan)
# (nan+0j)
Which explains the behaviour:
a[0] = np.array([np.nan])
print(a)
# [nan+0.j]
It probably has to do with numpy's representation of nan:
NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic
(IEEE 754). This means that Not a Number is not equivalent to
infinity.
Essentially, np.nan is a float. By setting a[0] = np.nan you are assigning a "real" float (without changing the dtype of the array, which remains complex), so the imaginary part remains untouched as 0j.
That also explains why you can change the imaginary part by assigning np.nan * 0j, which itself evaluates to nan+nanj (multiplying nan by a complex zero propagates nan into both parts).
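A minimal sketch of both the pitfall and the workarounds (complex(np.nan, np.nan) is an equivalent, arguably more explicit, spelling of the np.nan + np.nan*1j form from the question):

import numpy as np

a = np.array([5 + 6*1j])
a[0] = np.nan                   # a real float: only the real part becomes nan
print(a)                        # [nan+0.j]

a[0] = np.nan + np.nan * 1j     # the workaround from the question
print(a)                        # [nan+nanj]

a[0] = complex(np.nan, np.nan)  # equivalent
print(np.isnan(a[0]))           # True in every case above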
I have a function which takes an array-like argument and a value argument as inputs. During the unit tests of this function (I use hypothesis), if a very large value is passed (one that cannot be handled by np.float128), the function fails.
What is a good way to detect such values and handle them properly?
Below is the code for my function:
import numpy as np

def find_nearest(my_array, value):
    """Find the nearest value in an unsorted array."""
    # Convert to a numpy array and drop NaN values.
    my_array = np.array(my_array, copy=False, dtype=np.float128)
    my_array = my_array[~np.isnan(my_array)]
    return my_array[(np.abs(my_array - value)).argmin()]
Example which gives a wrong result:
find_nearest([0.0, 1.0], 1.8446744073709556e+19)
Returns 0.0, but the correct answer is 1.0.
If I cannot return the correct answer, I would at least like to raise an exception. The problem is that right now I do not know how to identify bad inputs. A more general answer that fits other cases is preferable, as I see this as a recurring issue.
Beware: float128 isn't actually 128 bits of precision! It is in fact a longdouble implementation: https://en.wikipedia.org/wiki/Extended_precision. This storage type carries 63 fraction bits, which is why it starts to fail around 1e+19: 2**63 is roughly 9.2e18. Of course, if the differences in your array are larger than 1, it can still distinguish them at that magnitude; it simply means that any difference you want it to resolve must be larger than about 1/2**63 of your input value.
See "What is the internal precision of numpy.float128?" for an old answer that elaborates on the same thing. I've run my own test and confirmed that np.float128 is exactly a long double with 63 fraction bits.
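You can check this yourself (a minimal sketch; the numbers in the comments are what x86 extended precision reports, and other platforms may differ):

import numpy as np

info = np.finfo(np.longdouble)  # np.float128 is an alias for longdouble
print(info.nmant)               # 63 fraction bits on x86 (versus 52 for float64)
print(info.eps)                 # ~1.08e-19: the relative spacing between adjacent values
big = np.longdouble(1.8446744073709556e+19)
print(big + 1 == big)           # True on x86: at ~1.8e19 adjacent long doubles are 2 apart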
I suggest you set a maximum for value, and if your value is larger than that, either:
reduce the value to that number, on the premise that everything in your array is going to be smaller than that number.
Throw an error.
like this:
VALUE_MAX = 1e18

def find_nearest(my_array, value):
    if value > VALUE_MAX:
        value = VALUE_MAX
    ...
Alternatively, you can choose a more scientific approach, such as actually comparing your value to the maximum of the array:
def find_nearest(my_array, value):
    my_array = np.array(my_array, dtype=np.float128)
    if value > np.amax(my_array):
        value = np.amax(my_array)
    elif value < np.amin(my_array):
        value = np.amin(my_array)
    ...
This way you can be sure you never run into this problem, since value will never be larger than the maximum of your array or smaller than its minimum.
The problem here doesn't seem to be that a float128 can't handle 1.844...e+19, but rather that you can't add or subtract two floating point numbers with such radically different scales and expect accurate results:
In [1]: 1.8446744073709556e+19 - 1.0 == 1.8446744073709556e+19
Out[1]: True
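You can see how coarse the grid of representable float64 values is at that magnitude (a minimal sketch; np.spacing gives the distance from a value to the next representable float):

import numpy as np

big = 1.8446744073709556e+19
print(np.spacing(big))   # 4096.0: adjacent float64 values are thousands apart here
print(big - 1.0 == big)  # True: subtracting 1 cannot change the value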
Your best bet, if you really need this amount of accuracy, would be to use Decimal objects and put them into a numpy array as dtype 'object':
In [1]: from decimal import Decimal
In [2]: big_num = Decimal(1.8446744073709556e+19)
In [3]: big_num  # Note the slight inaccuracies due to floating point conversion
Out[3]: Decimal('18446744073709555712')
In [4]: a = np.array([Decimal(0.0), Decimal(1.0)], dtype='object')
In [5]: a[np.abs(a - big_num).argmin()]
Out[5]: Decimal('1')
Note that this will be MUCH slower than typical numpy operations, because it has to fall back to Python for each computation rather than leveraging numpy's own optimized libraries (since numpy doesn't have a Decimal type).
EDIT:
If you don't need this solution and just want to know if your current code will fail, I suggest the very scientific approach of "just try":
fails = len(set(my_array)) != len(set(my_array - value))
This checks that, when you subtract value from each unique number X in my_array, you still get unique results. That is always true for exact subtraction, so if the check fails it is because the floating point arithmetic isn't precise enough to keep the differences value - X distinct from one another.
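One way to wire that check into the function from the question (a sketch; the decision to raise, and the wording of the error, are my own choices, and I've used float64 here for simplicity):

import numpy as np

def find_nearest_checked(my_array, value):
    my_array = np.asarray(my_array, dtype=np.float64)
    my_array = my_array[~np.isnan(my_array)]
    # If subtracting value collapses distinct elements, the result can't be trusted.
    if len(set(my_array)) != len(set(my_array - value)):
        raise ValueError("value is too large relative to the array for float64 precision")
    return my_array[np.abs(my_array - value).argmin()]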
I have a matrix with float values, and I am trying to compute the sums of its columns and rows. The matrix is symmetric.
>>> np.sum(n2[1,:]) #summing second row
0.80822400592582844
>>> np.sum(n2[:,1]) #summing second col
0.80822400592582844
>>> np.sum(n2, axis=0)[1]
0.80822400592582899
>>> np.sum(n2, axis=1)[1]
0.80822400592582844
It gives different results. Why?
The numbers numpy uses are doubles, with roughly 16 significant decimal digits of accuracy. It looks like the differences happen around the 16th digit, with the rest of the digits being equal. If you don't need this accuracy, you could use the rounding function np.around(), or you could try the np.longdouble type to get a higher degree of accuracy.
You can check the accuracy of the types using np.finfo:
>>> print(np.finfo(np.double).precision)
15
I believe some numpy functions won't accept long doubles and will cast them down to doubles, truncating the extra digits (see also: Numpy precision).
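As a small illustration of why the two sums can differ in the last digits (a sketch with made-up data, assuming numpy >= 1.17 for default_rng; np.sum over an axis may accumulate the elements in a different order, and float addition is not associative):

import numpy as np

rng = np.random.default_rng(0)
m = rng.random((500, 500))
m = (m + m.T) / 2        # make it symmetric, like the matrix in the question

a = np.sum(m[1, :])      # row sum, accumulated on its own
b = np.sum(m, axis=0)[1] # column sums computed together; may accumulate differently
print(a == b)            # may be False: the two reductions can round differently
print(np.isclose(a, b))  # True: they still agree to ~15 significant digits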
I have a dataset to which I'm trying to apply some arithmetic method.
The thing is, it gives me relatively large numbers, and when I do it with numpy, they're stored as 0.
The weird thing is that when I compute the numbers separately, they have an int value; they only become zeros when I compute them using numpy.
x = np.array([18,30,31,31,15])
10*150**x[0]/x[0]
Out[1]: 36298069767006890
vector = 10*150**x/x
vector
Out[2]: array([0, 0, 0, 0, 0])
I have of course checked their types:
type(10*150**x[0]/x[0]) == type(vector[0])
Out[3]:True
How can I compute these large numbers using numpy without seeing them turned into zeros?
Note that if we remove the factor of 10 at the beginning, the problem changes slightly (but I think it might have a similar cause):
x = np.array([18,30,31,31,15])
150**x[0]/x[0]
Out[4]: 311075541538526549
vector = 150**x/x
vector
Out[5]: array([-329406144173384851, -230584300921369396, 224960293581823801,
-224960293581823801, -368934881474191033])
The negative numbers indicate that the largest value of the int64 type in Python has been exceeded, don't they?
As Nils Werner already mentioned, numpy's native C integer types cannot hold numbers that large, but Python itself can, since its int objects use an arbitrary-precision implementation.
So what you can do is tell numpy not to convert the numbers to its native types but to use the Python objects instead. This will be slower, but it will work.
In [14]: x = np.array([18,30,31,31,15], dtype=object)
In [15]: 150**x
Out[15]:
array([1477891880035400390625000000000000000000L,
191751059232884086668491363525390625000000000000000000000000000000L,
28762658884932613000273704528808593750000000000000000000000000000000L,
28762658884932613000273704528808593750000000000000000000000000000000L,
437893890380859375000000000000000L], dtype=object)
In this case the numpy array will not store the numbers themselves but references to the corresponding int objects. When you perform arithmetic operations they won't be performed on the numpy array but on the objects behind the references.
I think you're still able to use most of the numpy functions with this workaround but they will definitely be a lot slower than usual.
But that's what you get when you're dealing with numbers that large :D
Maybe somewhere out there is a library that can deal with this issue a little better.
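To compute the full expression from the question this way, note the floor division // to stay in exact Python integers (a sketch; a true division / would fall back to floats and lose precision):

import numpy as np

x = np.array([18, 30, 31, 31, 15], dtype=object)
vector = 10 * 150**x // x   # element-wise, carried out on Python ints
print(vector[0])            # the exact 39-digit integer, with no overflow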
Just for completeness, if precision is not an issue, you can also use floats:
In [19]: x = np.array([18,30,31,31,15], dtype=np.float64)
In [20]: 150**x
Out[20]:
array([ 1.47789188e+39, 1.91751059e+65, 2.87626589e+67,
2.87626589e+67, 4.37893890e+32])
Even the smallest exponent here, 150 ** 15, is way beyond what an int64 variable can represent (it's in the ballpark of 4e32, while the maximum possible value of an unsigned int64 is roughly 1.8e19).
Python may be using an arbitrary length integer implementation, but NumPy doesn't.
As you deduced correctly, negative numbers are a symptom of an int overflow.
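A quick way to see the limit and the silent wraparound (a sketch; the default integer dtype and whether numpy warns about the overflow can vary by platform and version):

import numpy as np

print(np.iinfo(np.int64).max)   # 9223372036854775807, about 9.2e18
x = np.array([18, 30, 31, 31, 15])
print((150 ** x).dtype)         # still an integer dtype: the power silently wraps around
print(150 ** 15)                # Python int, computed exactly: 437893890380859375000000000000000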
I have some math operations that produce a numpy array of results with about 8 significant figures. When I use tolist() on my array y_axis, it creates what I assume are 32-bit numbers.
However, I wonder if the extra digits are just garbage. I assume they are, but the conversion seems intelligent enough to adjust the last digit so that rounding makes sense.
print "y_axis:",y_axis
y_axis = y_axis.tolist()
print "y_axis:",y_axis
y_axis: [-0.99636686 0.08357361 -0.01638707]
y_axis: [-0.9963668578012771, 0.08357361233570479, -0.01638706796138937]
So my question is: if this is not garbage, does using tolist() actually improve the accuracy of my calculations, or is Python always using the full number and just not displaying it?
When you call print(y_axis) on a numpy array, you get a truncated version of the numbers that numpy is actually storing internally. How it is truncated depends on how numpy's print options are set.
>>> arr = np.array([22/7, 1/13]) # init array
>>> arr # np.array default printing
array([ 3.14285714, 0.07692308])
>>> arr[0] # float scalar default printing
3.1428571428571428
>>> np.set_printoptions(precision=24) # increase np.array print "precision"
>>> arr # np.array high-"precision" print
array([ 3.142857142857142793701541, 0.076923076923076927347012])
>>> float.hex(arr[0]) # actual underlying representation
'0x1.9249249249249p+1'
The reason it looks like you're "gaining accuracy" when you print out the .tolist()ed form of y_axis is that by default, more digits are printed when you call print on a list than when you call print on a numpy array.
In actuality, the numbers stored internally by either a list or a numpy array should be identical (and should correspond to the last line above, generated with float.hex(arr[0])), since numpy uses numpy.float64 by default, and python float objects are also 64 bits by default.
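A quick way to convince yourself of that (a sketch; float.hex exposes the exact bits of each value, so identical output means identical stored numbers):

import numpy as np

arr = np.array([22/7, 1/13])
lst = arr.tolist()
print(float.hex(lst[0]))   # '0x1.9249249249249p+1'
print(float.hex(arr[0]))   # the same string: list and array hold the same 64-bit value
print(lst[0] == arr[0])    # True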
My understanding is that numpy is not showing you the full precision so that the matrices lay out consistently. The list shouldn't have any more precision than its numpy.array counterpart:
>>> import numpy
>>> v = -0.9963668578012771
>>> a = numpy.array([v])
>>> a
array([-0.99636686])
>>> a.tolist()
[-0.9963668578012771]
>>> a[0] == v
True
>>> a.tolist()[0] == v
True