I have this function:
import numpy as np
def unhot(vec):
    """ Takes a one-hot vector and returns the corresponding integer. """
    assert np.sum(vec) == 1  # this assertion shouldn't fail, but it did...
    return list(vec).index(1)
that I call on the output of a call to:
numpy.random.multinomial(1, coe)
and I got an assertion error at some point when I ran it. How is this possible? Isn't the output of numpy.random.multinomial guaranteed to be a one-hot vector?
Then I removed the assertion error, and now I have:
ValueError: 1 is not in list
Is there some fine-print I am missing, or is this just broken?
Well, this is the problem, and I should've realized, because I've encountered it before:
np.random.multinomial(1, np.array([0., 0., np.nan, 0.]))
returns
array([0, 0, -9223372036854775807, 0])
I was using an unstable softmax implementation that produced the NaNs.
Now, I was trying to ensure that the parameters I passed to multinomial had a sum <= 1, but I did it like this:
coe = softmax(coeffs)
while np.sum(coe) > 1 - 1e-9:
    coe /= (1 + 1e-5)
and with NaNs in there, the while condition is never true (comparisons against NaN are always False), so the rescaling never even gets triggered.
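In hindsight, a guard like the following sketch would have caught the bad input early. The wrapper and its name are mine, for illustration; it just checks the probabilities for NaN/inf before multinomial can silently produce garbage counts:
import numpy as np

def safe_one_hot_draw(coe):
    # Hypothetical defensive wrapper, not part of the original code.
    coe = np.asarray(coe, dtype=np.float64)
    assert np.all(np.isfinite(coe)), "probabilities contain NaN or inf"
    assert np.all(coe >= 0), "probabilities must be non-negative"
    # Rescale so the sum stays strictly below 1, as in the while loop above.
    vec = np.random.multinomial(1, coe / (coe.sum() + 1e-9))
    return int(np.argmax(vec))  # sturdier than list(vec).index(1)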
Related:
import numpy
......
# Prediction
predictions = model.predict(X_test)
# round predictions
rounded = [round(x) for x in predictions]
print(rounded)
"predictions" is a list of decimals between [0,1] with sigmoid output.
Why does it always report this error:
File "/home/abigail/workspace/ml/src/network.py", line 41, in <listcomp>
rounded = [round(x) for x in predictions]
TypeError: type numpy.ndarray doesn't define __round__ method
If I don't use round, it prints the decimals correctly. This round should be the Python built-in function. Why does it have anything to do with NumPy?
Edited:
for x in predictions:
    print(x, end=' ')
The output is:
[ 0.79361773] [ 0.10443521] [ 0.90862566] [ 0.10312044] [ 0.80714297]
[ 0.23282401] [ 0.1730803] [ 0.55674052] [ 0.94095331] [ 0.11699325]
[ 0.1609294]
TypeError: type numpy.ndarray doesn't define __round__ method
You tried applying round to numpy.ndarray. Apparently, this isn't supported.
Try this, use numpy.round:
rounded = [numpy.round(x) for x in predictions]
x is a numpy array. You can also flatten it within the comprehension (note the loop order):
rounded = [round(y) for x in predictions for y in x]
What is model? From what module? It looks like predictions is a 2d array. What is predictions.shape? The error indicates that the x in [x for x in predictions] is an array. It may be a single-element array, but it is nevertheless an array. You could try [x.shape for x in predictions] to see the shape of each element (row) of predictions.
I haven't had much occasion to use round, but evidently the Python function delegates the action to a .__round__ method (much as + delegates to __add__).
In [932]: round?
Docstring:
round(number[, ndigits]) -> number
Round a number to a given precision in decimal digits (default 0 digits).
This returns an int when called with one argument, otherwise the
same type as the number. ndigits may be negative.
Type: builtin_function_or_method
In [933]: x=12.34
In [934]: x.__round__?
Docstring:
Return the Integral closest to x, rounding half toward even.
When an argument is passed, work like built-in round(x, ndigits).
Type: builtin_function_or_method
In [935]: y=12
In [936]: y.__round__?
Docstring:
Rounding an Integral returns itself.
Rounding with an ndigits argument also returns an integer.
Type: builtin_function_or_method
Python integers have a different implementation than Python floats.
Python lists and strings don't have a definition for this, so round([1, 2, 3]) raises TypeError: type list doesn't define __round__ method.
The same goes for an ndarray. But NumPy defines a np.round function, and a numpy array has a .round method.
In [942]: np.array([1.23,3,34.34]).round()
Out[942]: array([ 1., 3., 34.])
In [943]: np.round(np.array([1.23,3,34.34]))
Out[943]: array([ 1., 3., 34.])
help(np.around) gives the fullest documentation of the numpy version(s).
===================
From your last print I can reconstruct part of your predictions as:
In [955]: arr = np.array([[ 0.79361773], [ 0.10443521], [ 0.90862566]])
In [956]: arr
Out[956]:
array([[ 0.79361773],
[ 0.10443521],
[ 0.90862566]])
In [957]: for x in arr:
...: print(x, end=' ')
...:
[ 0.79361773] [ 0.10443521] [ 0.90862566]
arr.shape is (3,1) - a 2d array with 1 column.
np.round works fine, without needing the iteration:
In [958]: np.round(arr)
Out[958]:
array([[ 1.],
[ 0.],
[ 1.]])
The iteration, though, reproduces your error:
In [959]: [round(x) for x in arr]
TypeError: type numpy.ndarray doesn't define __round__ method
I encountered the same error when I was trying the tutorial of Keras.
At first, I tried
rounded = [numpy.round(x) for x in predictions]
but it showed the result like this:
[array([1.], dtype=float32), array([0.], dtype=float32), ...]
then I tried this:
rounded = [float(numpy.round(x)) for x in predictions]
it showed the right outputs.
I think numpy.round(x) still returns a single-element ndarray (which is why the dtype shows up in the output), but the values themselves are correct. So converting each element of the list to float shows the output the same way as the tutorial.
My machine is Linux Mint 17.3 (Ubuntu 14.04) x64, the Python interpreter is Python 3.5.2, Anaconda3 (4.1.1), NumPy 1.11.2.
You're using a function that uses NumPy to store values. Instead of being a regular Python list, it is actually a NumPy array. This is generally because with machine learning, NumPy does a much better job of storing massive amounts of data than an ordinary Python list does. You can refer to the following documentation to convert to a regular list, which you can then run a comprehension over:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tolist.html
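For example, a minimal sketch of that conversion, using sample values from the question's output:
import numpy as np

predictions = np.array([[0.79361773], [0.10443521], [0.90862566]])

# tolist() turns the ndarray and its rows into plain Python lists of floats,
# so the built-in round works on each element.
rounded = [round(row[0]) for row in predictions.tolist()]
print(rounded)  # [1, 0, 1]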
Edit:
What happens if you try:
for x in predictions:
    for y in x:
        print(y, end=' ')
This was driving me nuts too. I had stored a reference to a scipy function with type <class 'scipy.interpolate.interpolate.interp1d'>. This was returning a single value of type <class 'numpy.ndarray'> containing a single float. I had assumed this was actually a float and it propagated back up through my library code until round produced the same error described above.
It was a case of debugging the call stack to check what actual type was being passed on after each function return. I then cast the return value from my original function call along the lines of result = float(interp1d_reference(x)). Then my code behaved as I had expected/wanted.
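A minimal sketch of the situation I hit (the sample points here are made up for illustration):
import numpy as np
from scipy.interpolate import interp1d

f = interp1d([0.0, 1.0, 2.0], [0.0, 1.0, 4.0])  # hypothetical data

y = f(1.5)       # a 0-d ndarray, not a float
print(type(y))   # <class 'numpy.ndarray'>
# round(y) raises the TypeError above on the NumPy versions discussed here
y = float(y)     # the explicit cast: result = float(interp1d_reference(x))
print(round(y))  # 2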
For writing “piecewise functions” in Python, I'd normally use if (in either the control-flow or ternary-operator form).
def spam(x):
    return x+1 if x>=0 else 1/(1-x)
Now, with NumPy, the mantra is to avoid working on single values in favour of vectorisation, for performance. So I reckon something like this would be preferred (edit: as Leon remarks, the following is wrong):
def eggs(x):
    y = np.zeros_like(x)
    positive = x >= 0
    y[positive] = x + 1
    y[np.logical_not(positive)] = 1/(1-x)
    return y
(Correct me if I've missed something here, because frankly I find this very ugly.)
Now, of course eggs will only work if x is actually a NumPy array, because otherwise x>=0 simply yields a single boolean, which can't be used for indexing (at least doesn't do the right thing).
Is there a good way to write code that looks more like spam but works idiomatically on NumPy arrays, or should I just use vectorize(spam)?
Use np.where. You'll get an array as the output even for plain number input, though.
def eggs(x):
    y = np.asarray(x)
    return np.where(y >= 0, y + 1, 1/(1 - y))
This works for both arrays and plain numbers:
>>> eggs(5)
array(6.0)
>>> eggs(-3)
array(0.25)
>>> eggs(np.arange(-3, 3))
/home/praveen/.virtualenvs/numpy3-mkl/bin/ipython3:2: RuntimeWarning: divide by zero encountered in true_divide
array([ 0.25 , 0.33333333, 0.5 , 1. , 2. , 3. ])
>>> eggs(1)
/home/praveen/.virtualenvs/numpy3-mkl/bin/ipython3:3: RuntimeWarning: divide by zero encountered in long_scalars
# -*- coding: utf-8 -*-
array(2.0)
As ayhan remarks, this raises a warning, since 1/(1-x) gets evaluated for the whole range. But a warning is just that: a warning. If you know what you're doing, you can ignore the warning. In this case, you're only choosing 1/(1-x) from indices where it can never be inf, so you're safe.
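If the warning noise still bothers you, one option (my suggestion, not part of the answer above) is to silence it locally with np.errstate:
import numpy as np

def eggs(x):
    y = np.asarray(x)
    # Both branches of np.where are evaluated, so ignore the expected
    # divide-by-zero warning only inside this block.
    with np.errstate(divide='ignore'):
        return np.where(y >= 0, y + 1, 1/(1 - y))

print(eggs(np.arange(-3, 3)))  # [0.25 0.333... 0.5 1. 2. 3.], no warning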
I would use numpy.asarray (which is a no-op if the argument is already a NumPy array) if I wanted to handle both numbers and NumPy arrays:
def eggs(x):
    x = np.asfarray(x)
    m = x >= 0
    x[m] = x[m] + 1
    x[~m] = 1 / (1 - x[~m])
    return x
(here I used asfarray to enforce a floating-point type, since your function requires floating-point computations).
This is less efficient than your spam function for single inputs, and arguably uglier. However it seems to be the easiest choice.
EDIT: If you want to ensure that x is not modified (as pointed out by Leon), you can replace np.asfarray(x) with np.array(x, dtype=np.float64); the array constructor copies by default.
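A quick sketch of why the copy matters (my illustration, not from the answer): asfarray returns the input object itself when it is already a float array, so the in-place assignments in eggs would modify the caller's data:
import numpy as np

a = np.array([-1.0, 2.0])
b = np.asfarray(a)  # no copy for float input: b is the same object as a
print(b is a)       # True -> in-place edits inside eggs would change a too

c = np.array(a, dtype=np.float64)  # the constructor copies by default
print(c is a)       # False -> safe to modify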
I'm getting some strange behavior from scipy/numpy that I suspect is a bug, but someone may know better. I've got a pair of long arrays which I'm breaking into frames of length 2-4 for debugging purposes. I want to normalize each pair of frames and take the dot product. The code that does it (with some debugging output) is:
tf = numpy.copy(t_frame) / norm(t_frame)
pf = numpy.copy(p_frame) / norm(p_frame)
print "OPF:"
print p_frame
print "PF: "
print pf
print "TF norm is: " + str(norm(tf))
print "PF norm is: " + str(norm(pf))
print numpy.dot(tf, pf)
return numpy.dot(tf, pf)
This does what I'd expect for a while (specifically giving a norm of 1 for tf and pf) but then I start to see lines like this:
OPF:
[ -91 -119 -137 -132]
PF:
[ nan nan nan nan]
What?? This can be normalized fine in a new Python window:
>>> p = [ -91, -119, -137, -132]
>>> p / norm(p)
array([-0.37580532, -0.49143773, -0.56577285, -0.54512421])
For what it's worth, I've tried numpy.linalg.norm, scipy.linalg.norm, and defining a function to return the square root of the dot product.
Any ideas?
UPDATE:
Thanks for the suggestions! I tried switching the dtype to float128 and am sadly getting similar behavior. I'm actually inclined to believe that it's a bug in Python rather than numpy at this point:
If it were a straightforward overflow issue, it seems like I'd get it consistently with a given list. But the norm computes fine if I do it in a new Python session.
I tried rolling my own:
def norm(v):
    return (sum(numpy.array(v) * numpy.array(v)))**0.5
This only uses numpy to represent the arrays. I still get the same issue, but later in the data set (and no runtime warnings). It's doing about 37000 of these computations.
I'm actually computing the norm on two frames, a t_frame and a p_frame. The computation of one chokes if and only if the computation for the other one does.
Put together, I think there's some weird buffer overflow somewhere in the bowels of Python (2.7.9)??? I ultimately need these computations to be fast as well; so I'm thinking of just switching over to Cython for that computation.
Update 2:
I tried really rolling my own:
def norm(v):
    sum = float(0)
    for i in range(len(v)):
        sum += v[i]**2
    return sum**0.5
and the problem disappears. So I would guess that it is a bug in numpy (1.9.0 on Gentoo Linux).
It looks like this is a bug in numpy. I can reproduce the problem if the data type of the array is np.int16:
In [1]: np.__version__
Out[1]: '1.9.2'
In [2]: x = np.array([ -91, -119, -137, -132], dtype=np.int16)
In [3]: x
Out[3]: array([ -91, -119, -137, -132], dtype=int16)
In [4]: np.linalg.norm(x)
/Users/warren/anaconda/lib/python2.7/site-packages/numpy/linalg/linalg.py:2061: RuntimeWarning: invalid value encountered in sqrt
return sqrt(sqnorm)
Out[4]: nan
The problem also occurs in the master branch of the development version of numpy. I created an issue here: https://github.com/numpy/numpy/issues/6128
If p_frame is, in fact, a 16 bit integer array, a simple work-around is something like:
x = np.asarray(p_frame, dtype=np.float64)
pf = x / norm(x)
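Continuing that example, the cast makes the norm finite again and reproduces the values from the fresh session:
import numpy as np

x = np.array([-91, -119, -137, -132], dtype=np.int16)
x64 = np.asarray(x, dtype=np.float64)
print(np.linalg.norm(x64))        # ~242.15, finite this time
print(x64 / np.linalg.norm(x64))  # [-0.3758... -0.4914... -0.5658... -0.5451...]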
Following one of Warren's links, I get this warning:
In [1016]: np.linalg.norm(100000*np.ones(2).astype('int16'))
/usr/local/lib/python2.7/site-packages/numpy/linalg/linalg.py:2051: RuntimeWarning: invalid value encountered in sqrt
return sqrt(add.reduce((x.conj() * x).real, axis=None))
For this x2, the inner expression is negative: the result of overflow in a small dtype.
In [1040]: x2=100000*np.ones(2).astype('int16')
In [1041]: np.add.reduce((x2.conj()*x2).real,axis=None)
Out[1041]: -1474836480
similarly with an x1:
In [1042]: x1
Out[1042]: array([ -9100, -11900, -13700, -13200], dtype=int16)
In [1043]: np.add.reduce((x1.conj()*x1).real,axis=None)
Out[1043]: -66128
If the sum of the 'dot' becomes too large for the dtype, it can be negative, producing a nan when passed through sqrt.
(I'm using 1.8.2 and 1.9.0 under linux).
I have two numpy arrays and I am trying to divide one by the other; at the same time, I want to make sure that the entries where the divisor is 0 are simply replaced with 0.
So, I do something like:
log_norm_images = np.where(b_0 > 0, np.divide(diff_images, b_0), 0)
This gives me a run time warning of:
RuntimeWarning: invalid value encountered in true_divide
Now, I wanted to see what was going on and I did the following:
xx = np.isfinite(diff_images)
print (xx[xx == False])
xx = np.isfinite(b_0)
print (xx[xx == False])
However, both of these return empty arrays meaning that all the values in the arrays are finite. So, I am not sure where the invalid value is coming from. I am assuming checking b_0 > 0 in the np.where function takes care of the divide by 0.
The shape of the two arrays are (96, 96, 55, 64) and (96, 96, 55, 1)
You may have a NAN, INF, or NINF floating around somewhere. Try this:
np.isfinite(diff_images).all()
np.isfinite(b_0).all()
If one or both of those returns False, that's likely the cause of the runtime warning.
The reason you get the runtime warning when running this:
log_norm_images = np.where(b_0 > 0, np.divide(diff_images, b_0), 0)
is that the inner expression
np.divide(diff_images, b_0)
gets evaluated first, and is run on all elements of diff_images and b_0 (even though you end up ignoring the elements that involve division-by-zero). In other words, the warning happens before the code that ignores those elements. That is why it's a warning and not an error: there are legitimate cases like this one where the division-by-zero is not a problem because it's being handled in a later operation.
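One way to skip the division at the zero entries altogether (my suggestion, using the out/where parameters that NumPy ufuncs accept) is sketched below with small stand-in arrays:
import numpy as np

diff_images = np.array([1.0, 2.0, 3.0])  # stand-ins for the real arrays
b_0 = np.array([2.0, 0.0, 4.0])

# out supplies the default (0) for masked-out slots; where restricts the
# division to entries with a nonzero divisor, so no warning is raised.
log_norm_images = np.divide(diff_images, b_0,
                            out=np.zeros_like(diff_images),
                            where=b_0 > 0)
print(log_norm_images)  # [0.5  0.  0.75]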
Another useful NumPy function is nan_to_num(diff_images).
By default it replaces, in a NumPy array, NaN with zero, -inf with a very large negative number, and +inf with a very large positive number.
You can change the defaults; see https://numpy.org/doc/stable/reference/generated/numpy.nan_to_num.html
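For instance:
import numpy as np

x = np.array([np.nan, np.inf, -np.inf, 2.0])
print(np.nan_to_num(x))
# NaN -> 0.0, +/-inf -> +/-1.7976931348623157e+308 (the largest float64)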
As @drammock pointed out, the cause of the warning is that some of the values in b_0 are 0, and the runtime warning is generated before the np.where is evaluated. While @Luca's suggestion of wrapping the expression in with np.errstate(invalid='ignore', divide='ignore'): will prevent the warning in this case, there may be other legitimate cases where this warning should be surfaced. For instance, if corresponding elements of b_0 and diff_images were both np.inf, the division would return np.nan.
So to prevent warnings for known cases (i.e. b_0 = 0) and allow for warnings of unknown cases, evaluate the np.where first then evaluate the arithmetic:
#First, create log_norm_images
log_norm_images = np.zeros(b_0.shape)
#Now get the valid indexes
valid = np.where(b_0 > 0)
#Lastly, evaluate the division problem at the valid indexes
log_norm_images[valid] = np.divide(diff_images[valid], b_0[valid])
num = np.array([1,2,3,4,5])
den = np.array([1,1,0,1,1])
res = np.array([None]*5)
ix = (den!=0)
res[ix] = np.divide( num[ix], den[ix] )
print(res)
[1.0 2.0 None 4.0 5.0]
I have three arrays that are processed with a mathematical function to get a final result array. Some of the arrays contain NaNs and some contain 0. A division by zero logically raises a warning, and a calculation with NaN gives NaN. So I'd like to do certain operations on certain parts of the arrays where zeros are involved:
r=numpy.array([3,3,3])
k=numpy.array([numpy.nan,0,numpy.nan])
n=numpy.array([numpy.nan,0,0])
1.0*n*numpy.exp(r*(1-(n/k)))
e.g. in cases where k == 0, I'd like to get as a result 0. In all other cases I'd to calculate the function above. So what is the way to do such calculations on parts of the array (via indexing) to get a final single result array?
import numpy
r=numpy.array([3,3,3])
k=numpy.array([numpy.nan,0,numpy.nan])
n=numpy.array([numpy.nan,0,0])
indxZeros=numpy.where(k==0)
indxNonZeros=numpy.where(k!=0)
d=numpy.empty(k.shape)
d[indxZeros]=0
d[indxNonZeros]=n[indxNonZeros]/k[indxNonZeros]
print d
Is the following what you need?
>>> rv = 1.0*n*numpy.exp(r*(1-(n/k)))
>>> rv[k==0] = 0
>>> rv
array([ nan, 0., nan])
So, you may think that the solution to this problem is to use numpy.where, but the following:
numpy.where(k==0, 0, 1.0*n*numpy.exp(r*(1-(n/k))))
still gives a warning, as the expression is actually evaluated for the cases where k is zero, even if those results aren't used.
If this really bothers you, you can use numexpr for this expression, which will actually branch on the where statement and not evaluate the k==0 case:
import numexpr
numexpr.evaluate('where(k==0, 0, 1.0*n*exp(r*(1-(n/k))))')
Another way, based on indexing as you asked for, involves a little loss of legibility:
result = numpy.zeros_like(k)
good = k != 0
result[good] = 1.0*n[good]*numpy.exp(r[good]*(1-(n[good]/k[good])))
This can be mitigated somewhat by defining the expression as a function:
def gaussian(r, k, n):
    return 1.0*n*numpy.exp(r*(1-(n/k)))
result = numpy.zeros_like(k)
good = k != 0
result[good] = gaussian(r[good], k[good], n[good])