Inserting a nan in Python into a complex numpy array gives some (for me) unexpected behavior:
a = np.array([5+6*1j])
print a
array([5.+6.j])
a[0] = np.nan
print a
array([nan+0.j])
I expected Python to write nan+nanj. For analyses it often might not matter, since np.isnan of any complex with either real and/or imaginary parts is True. However, I did not know the behavior and when plotting the real and imaginary parts of my array it gave me the impression I had info on the imaginary (however there is none). A workaround is to write a[0] = np.nan + np.nan*1j. Can somebody explain the reason for this behavior to me?
The issue here is that when you create an array with complex values:
a = np.array([5+6*1j])
You've created an array of dtype complex:
a.dtype
# dtype('complex128')
So by adding a value which only contains real part, it will be converted to a complex value, and you will thus be inserting a number with a complex component equal to 0j, so:
np.complex(np.nan)
# (nan+0j)
Which explains the behaviour:
a[0] = np.array([np.nan])
print(a)
# [nan+0.j]
It probably hast to do with numpy representation of nan:
NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic
(IEEE 754). This means that Not a Number is not equivalent to
infinity.
Essentially np.nan is a float. By setting x[0] = np.nan you are setting its value to a "real" float (but not changing the dtype of the array, which remains complex), so the imaginary part remains untouched as 0j.
That also explains why you can change the imaginary part by doing np.nan * 0j
Related
I have a function which takes an array-like argument an a value argument as inputs. During the unit tests of this function (I use hypothesis), if a very large value is thrown (one that cannot be handled by np.float128), the function fails.
What is a good way to detect such values and handle them properly?
Below is the code for my function:
def find_nearest(my_array, value):
""" Find the nearest value in an unsorted array.
"""
# Convert to numpy array and drop NaN values.
my_array = np.array(my_array, copy=False, dtype=np.float128)
my_array = my_array[~np.isnan(my_array)]
return my_array[(np.abs(my_array - value)).argmin()]
Example which throws an error:
find_nearest([0.0, 1.0], 1.8446744073709556e+19)
Throws: 0.0, but the correct answer is 1.0.
If I cannot throw the correct answer, at least I would like to be able to throw an exception. The problem is that now I do not know how to identify bad inputs. A more general answer that would fit other cases is preferable, as I see this as a recurring issue.
Beware, float128 isn't actually 128 bit precision! It's in fact a longdouble implementation: https://en.wikipedia.org/wiki/Extended_precision. The precision of this type of storage is 63 bits - this is why it fails around 1e+19, because that's 63 binary bits for you. Of course, if the differences in your array is more than 1, it will be able to distinguish that on that number, it simply means that whatever difference you're trying to make it distinguish must be larger than 1/2**63 of your input value.
What is the internal precision of numpy.float128? Here's an old answer that elaborate the same thing. I've done my test and have confirmed that np.float128 is exactly a longdouble with 63 bits of precision.
I suggest you set a maximum for value, and if your value is larger than that, either:
reduce the value to that number, on the premise that everything in your array is going to be smaller than that number.
Throw an error.
like this:
VALUE_MAX = 1e18
def find_nearest(my_array, value):
if value > VALUE_MAX:
value = VALUE_MAX
...
Alternatively, you can choose more scientific approach such as actually comparing your value to the maximum of the array:
def find_nearest(my_array, value):
my_array = np.array(my_array, dtype=np.float128)
if value > np.amax(my_array):
value = np.amax(my_array)
elif value < np.amin(my_array):
value = np.amin(my_array)
...
This way you'll be sure that you never run into this problem - since your value will always be at most as large as the maximum of your array or at minimum as minimum of your array.
The problem here doesn't seem to be that a float128 can't handle 1.844...e+19, but rather that you probably can't add two floating point numbers with such radically different scales and expect to get accurate results:
In [1]: 1.8446744073709556e+19 - 1.0 == 1.8446744073709556e+19
Out[1]: True
Your best bet, if you really need this amount of accuracy, would be to use Decimal objects and put them into a numpy array as dtype 'object':
In [1]: from decimal import Decimal
In [2]: big_num = Decimal(1.8446744073709556e+19)
In [3]: big_num # Note the slight innaccuracies due to floating point conversion
Out[3]: Decimal('18446744073709555712')
In [4]: a = np.array([Decimal(0.0), Decimal(1.0)], dtype='object')
In [5]: a[np.abs(a - big_num).argmin()]
Out[5]: Decimal('1')
Note that this will be MUCH slower than typical Numpy operations, because it has to revert to Python for each computation rather than being able to leverage its own optimized libraries (since numpy doesn't have a Decimal type).
EDIT:
If you don't need this solution and just want to know if your current code will fail, I suggest the very scientific approach of "just try":
fails = len(set(my_array)) == len(set(my_array - value))
This makes sure that, when you subtract value and a unique number X in my_array, you get a unique result. This is a generally true fact about subtraction, and if it fails then it's because the floating point arithmetic isn't precise enough to handle value - X as a number distinct from value or X.
In numpy, why does subtraction of integers sometimes produce floating point numbers?
>>> x = np.int64(2) - np.uint64(1)
>>> x
1.0
>>> x.dtype
dtype('float64')
This seems to only occur when using multiple different integer types (e.g. signed and unsigned), and when no larger integer type is available.
This is a conscious design decision by the numpy authors. When deciding on the resulting type, only the types of the operands are considered, not their actual values. And for the operation you perform, there is a risk of having a result outside the valid range, e.g. if you subtract a very large uint64 number, the result would not fit in an int64. The safe selection is thus to convert to float64, which certainly will fit the result (possibly with reduced precision, though).
Compare with an example of x = np.int32(2) - np.uint32(1). This can always be safely represented as an int64, therefore that type is chosen. The same would be true for x = np.int64(2) - np.uint32(1). This will also yield an int64.
The alternative would be to follow e.g. the c rules, which would cast everything to uint64. But that could, of course, lead to very strange results with over/underflows.
If you want to know ahead of time what type you will end up with, look into np.result_type(), np.can_cast(), and np.promote_types(). Reading about this in the docs might also help you understand the issue a bit better.
I'm no expert on numpy, however, I suspect that since float64 is the smallest data type that can fit both the domain of int64 and uint64 that the subtraction converts both operands into a float64 so that the operation always succeeds.
For example, in a with int8 and uint8: +128 - (256) cannot fit in a int8 since -128 is not valid in int8, as it can only fit back to -127. Similarly, we can't use a uint8 since we obviously need the sign in this case. Hence, we settle on a float/double as it can fit both directions fine.
I have a dataset on which I'm trying to apply some arithmetical method.
The thing is it gives me relatively large numbers, and when I do it with numpy, they're stocked as 0.
The weird thing is, when I compute the numbers appart, they have an int value, they only become zeros when I compute them using numpy.
x = np.array([18,30,31,31,15])
10*150**x[0]/x[0]
Out[1]:36298069767006890
vector = 10*150**x/x
vector
Out[2]: array([0, 0, 0, 0, 0])
I have off course checked their types:
type(10*150**x[0]/x[0]) == type(vector[0])
Out[3]:True
How can I compute this large numbers using numpy without seeing them turned into zeros?
Note that if we remove the factor 10 at the beggining the problem slitghly changes (but I think it might be a similar reason):
x = np.array([18,30,31,31,15])
150**x[0]/x[0]
Out[4]:311075541538526549
vector = 150**x/x
vector
Out[5]: array([-329406144173384851, -230584300921369396, 224960293581823801,
-224960293581823801, -368934881474191033])
The negative numbers indicate the largest numbers of the int64 type in python as been crossed don't they?
As Nils Werner already mentioned, numpy's native ctypes cannot save numbers that large, but python itself can since the int objects use an arbitrary length implementation.
So what you can do is tell numpy not to convert the numbers to ctypes but use the python objects instead. This will be slower, but it will work.
In [14]: x = np.array([18,30,31,31,15], dtype=object)
In [15]: 150**x
Out[15]:
array([1477891880035400390625000000000000000000L,
191751059232884086668491363525390625000000000000000000000000000000L,
28762658884932613000273704528808593750000000000000000000000000000000L,
28762658884932613000273704528808593750000000000000000000000000000000L,
437893890380859375000000000000000L], dtype=object)
In this case the numpy array will not store the numbers themselves but references to the corresponding int objects. When you perform arithmetic operations they won't be performed on the numpy array but on the objects behind the references.
I think you're still able to use most of the numpy functions with this workaround but they will definitely be a lot slower than usual.
But that's what you get when you're dealing with numbers that large :D
Maybe somewhere out there is a library that can deal with this issue a little better.
Just for completeness, if precision is not an issue, you can also use floats:
In [19]: x = np.array([18,30,31,31,15], dtype=np.float64)
In [20]: 150**x
Out[20]:
array([ 1.47789188e+39, 1.91751059e+65, 2.87626589e+67,
2.87626589e+67, 4.37893890e+32])
150 ** 28 is way beyond what an int64 variable can represent (it's in the ballpark of 8e60 while the maximum possible value of an unsigned int64 is roughly 18e18).
Python may be using an arbitrary length integer implementation, but NumPy doesn't.
As you deduced correctly, negative numbers are a symptom of an int overflow.
I need to use a module that does some math on integers, however my input is in floats.
What I want to achieve is to convert a generic float value into a corresponding integer value and loose as little data as possible.
For example:
val : 1.28827339907e-08
result : 128827339906934
Which is achieved after multiplying by 1e22.
Unfortunately the range of values can change, so I cannot always multiply them by the same constant. Any ideas?
ADDED
To put it in other words, I have a matrix of values < 1, let's say from 1.323224e-8 to 3.457782e-6.
I want to convert them all into integers and loose as little data as possible.
The answers that suggest multiplying by a power of ten cause unnecessary rounding.
Multiplication by a power of the base used in the floating-point representation has no error in IEEE 754 arithmetic (the most common floating-point implementation) as long as there is no overflow or underflow.
Thus, for binary floating-point, you may be able to achieve your goal by multiplying the floating-point number by a power of two and rounding the result to the nearest integer. The multiplication will have no error. The rounding to integer may have an error up to .5, obviously.
You might select a power of two that is as large as possible without causing any of your numbers to exceed the bounds of the integer type you are using.
The most common conversion of floating-point to integer truncates, so that 3.75 becomes 3. I am not sure about Python semantics. To round instead of truncating, you might use a function such as round before converting to integer.
If you want to preserve the values for operations on matrices I would choose some value to multiply them all by.
For Example:
1.23423
2.32423
4.2324534
Multiply them all by 10000000 and you get
12342300
23242300
42324534
You can perform you multiplications, additions etc with your matrices. Once you have performed all your calculations you can convert them back to floats by dividing them all by the appropriate value depending on the operation you performed.
Mathematically it makes sense because
(Scalar multiplication)
M1` = M1 * 10000000
M2` = M2 * 10000000
Result = M1`.M2`
Result = (M1 x 10000000).(M2 x 10000000)
Result = (10000000 x 10000000) x (M1.M2)
So in the case of multiplication you would divide your result by 10000000 x 10000000.
If its addition / subtraction then you simply divide by 10000000.
You can either choose the value to multiply by through your knowledge of what decimals you expect to find or by scanning the floats and generating the value yourself at runtime.
Hope that helps.
EDIT: If you are worried about going over the maximum capacity of integers - then you would be happy to know that python automatically (and silently) converts integers to longs when it notices overflow is going to occur. You can see for yourself in a python console:
>>> i = 3423
>>> type(i)
<type 'int'>
>>> i *= 100000
>>> type(i)
<type 'int'>
>>> i *= 100000
>>> type(i)
<type 'long'>
If you are still worried about overflow, you can always choose a lower constant with a compromise for slightly less accuracy (since you will be losing some digits towards then end of the decimal point).
Also, the method proposed by Eric Postpischil seems to make sense - but I have not tried it out myself. I gave you a solution from a more mathematical perspective which also seems to be more "pythonic"
Perhaps consider counting the number of places after the decimal for each value to determine the value (x) of your exponent (1ex). Roughly something like what's addressed here. Cheers!
Here's one solution:
def to_int(val):
return int(repr(val).replace('.', '').split('e')[0])
Usage:
>>> to_int(1.28827339907e-08)
128827339907
I have an numpy array of floats in Python.
When I print the array, the first value is:
[7.14519700e+04, ....
If, however, I print out just the first value on it's own, the print out reads:
71451.9699799
Obviously these numbers should be identical, so I just wondered, is the array just showing me a rounded version of the element? The second number here has 12 significant figures, and the first only has 9.
I guess I just wonder why these numbers are different?
It's just in the printing, not in the storage. The only confusion might occur because the first example uses numpy's print precision settings, the second example general python's print settings.
You can adjust the numpy precision and print by
numpy.set_printoptions(precision=20)
print myarray`
(adjust precision to your needs), or select the number of significant figures in standard python formatted print:
print ('%.20f' % myarray[0])
The internal representation of the number is always the same.
The types in a numpy array are well defined. You can get how they are stored by inspecting the numpy.dtype property of an array.
For example:
import numpy
a = numpy.zeros(10)
print a.dtype
will show float64, that is a 64-bit floating point number.
You can specify the type of the array explicitly using either the commonly accepted dtype argument, or the dtype type object (that is, the thing that makes the dtype).
a = numpy.zeros(10, dtype='complex32') # a 32-bit floating point
b = numpy.longdouble(a) # create a long-double array from a
Regarding the printing, this is just a formatting issue. You can twiddle how numpy prints an array using numpy.set_printoptions:
>>> a = numpy.random.randn(3) # for interest, randn annoyingly doesn't support the dtype arg
>>> print a
[ 0.12584756 0.73540009 -0.17108244 -0.96818512]
>>> numpy.set_printoptions(precision=3)
>>> print a
[ 0.126 0.735 -0.171 -0.968]