How to numpy.nan*0=0 - python

I would like to have python make the product
numpy.nan*0
return 0 (instead of nan), but e.g.
numpy.nan*4
still return nan.
My application: I have some numpy matrices which I am multiplying with one another. These contain many nan entries, and plenty of zeros.
The nans always represent unknown, but finite values which are known to become zero when multiplied with zero.
So I would like A*B to return [[1, nan], [nan, 1]] in the following example:
import numpy as np
A=np.matrix('1 0; 0 1')
B=np.matrix([[1, np.nan],[np.nan, 1]])
Is this possible?
Many thanks

You can use the numpy function numpy.nan_to_num()
import numpy as np
A = np.matrix('1 0; 0 1')
B = np.matrix([[1, np.nan],[np.nan, 1]])
C = np.nan_to_num(A) * np.nan_to_num(B)
The outcome will be [[1., 0.], [0., 1.]].

I don't think it is possible to override the behavior of nan * 0 in numpy directly because that multiplication is performed at a very low level.
However, you can provide your own Python class with the desired multiplication behavior, but be warned: This will seriously kill performance.
import numpy as np
class MyNumber(float):
    def __mul__(self, other):
        if other == 0 and np.isnan(self) or self == 0 and np.isnan(other):
            return 0.0
        return float(self) * other
def convert(x):
    x = np.asmatrix(x, dtype=object) # use Python objects as matrix elements
    x.flat = [MyNumber(i) for i in x.flat] # convert each element to MyNumber
    return x
A = convert([[1, 0], [0, 1]])
B = convert([[1, np.nan], [np.nan, 1]])
print(A * B)
# [[1.0 nan]
# [nan 1.0]]
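If the object-dtype approach is too slow, the same rule can be applied to ordinary float arrays by forming all pairwise products and zeroing those that have an exact-zero factor before summing. This is only a minimal sketch, assuming 2D inputs small enough that an (n, k, m) intermediate fits in memory; the name nan_zero_matmul is made up for this illustration and is not a NumPy function.

```python
import numpy as np

def nan_zero_matmul(A, B):
    """Matrix product in which nan * 0 (and 0 * nan) counts as 0."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # All pairwise products A[i, k] * B[k, j], shape (n, k, m).
    prod = A[:, :, None] * B[None, :, :]
    # A product with an exact-zero factor is forced to 0,
    # even when the other factor is nan.
    zero_factor = (A[:, :, None] == 0) | (B[None, :, :] == 0)
    prod[zero_factor] = 0.0
    # Summing over k gives the usual matrix product.
    return prod.sum(axis=1)

A = np.array([[1, 0], [0, 1]])
B = np.array([[1, np.nan], [np.nan, 1]])
C = nan_zero_matmul(A, B)
print(C)
```

Note that this rule also turns inf * 0 into 0, which may or may not be what you want.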

Related

numpy vectorize use on (2,) array

I have a numpy array of (m, 2) and I want to transform it to shape of (m, 1) using a function below.
def func(x):
    if x == [1., 1.]:
        return 0.
    if x == [-1., 1.] or x == [-1., -1.]:
        return 1.
    if x == [1., -1.]:
        return 2.
I want this to be applied to each (2,) vector inside the (m, 2) array, resulting in an (m, 1) array. I tried to use numpy.vectorize, but it seems the function gets applied to each element of the array (which makes sense in the general-purpose case), so I have failed to apply it.
My intention is to avoid a for loop. Can anyone help me with this? Thanks.
import numpy as np
def f(a, b):
    return a + b
F = np.vectorize(f)
x = np.asarray([[1, 2], [3, 4], [5, 6]]).T
print(F(*x))
Output:
[ 3  7 11]
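For the specific mapping in the question, the comparison can also be done column-wise, without np.vectorize. The following is a sketch, not the answerer's method: the name classify_rows is hypothetical, and returning nan for rows that match none of the cases is an assumption (the original func would return None there).

```python
import numpy as np

def classify_rows(x):
    """Map each (2,) row of an (m, 2) array to 0., 1. or 2. as in func."""
    x = np.asarray(x, dtype=float)
    first, second = x[:, 0], x[:, 1]
    labels = np.select(
        [
            (first == 1) & (second == 1),           # [1, 1]            -> 0
            (first == -1) & (np.abs(second) == 1),  # [-1, 1], [-1, -1] -> 1
            (first == 1) & (second == -1),          # [1, -1]           -> 2
        ],
        [0.0, 1.0, 2.0],
        default=np.nan,  # assumption: rows matching no case become nan
    )
    return labels.reshape(-1, 1)  # shape (m, 1)

x = np.array([[1., 1.], [-1., 1.], [-1., -1.], [1., -1.]])
result = classify_rows(x)
print(result.ravel())
```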

Python- Numpy array in function with conditions

I have a numpy array (say xs) and I am writing a function to create another array (say ys) that equals xs in the first half and is twice xs in the second half. For example, if xs=[0,1,2,3,4,5], the required output is [0,1,2,6,8,10].
I wrote the following function:
import numpy as np
xs=np.arange(0,6,1)
def step(xs):
    ys1=np.array([]);ys2=np.array([])
    if xs.all() <=2:
        ys1=xs
    else:
        ys2=xs*2
    return np.concatenate((ys1,ys2))
print(xs,step(xs))
This produces the output array([0., 1., 2., 3., 4., 5.]), i.e. the second condition is not executed. Does anybody know how to fix it? Thanks in advance.
You can use vectorised operations instead of Python-level iteration. With the below method, we first copy the array and then multiply the second half of the array by 2.
import numpy as np
xs = np.arange(0,6,1)
def step(xs):
    arr = xs.copy()
    arr[len(arr) // 2:] *= 2  # double the second half in place
    return arr
print(xs, step(xs))
[0 1 2 3 4 5] [ 0 1 2 6 8 10]
import numpy as np
xs=np.arange(0,6,1)
def f(a):
    it = np.nditer([a, None])
    for x, y in it:
        y[...] = x if x <= 2 else x * 2
    return it.operands[1]
print(f(xs))
[ 0 1 2 6 8 10]
Sorry, I did not find your bug, but I felt it can be implemented differently.
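As for the bug itself: xs.all() collapses the whole array to a single boolean before the comparison. Here xs contains 0, so xs.all() is False, and False <= 2 is True, which means only the first branch ever runs. An element-wise condition avoids this. A minimal sketch using np.where, keeping the question's threshold of 2 (which selects by value, not by position as the slicing answer does):

```python
import numpy as np

def step(xs):
    xs = np.asarray(xs)
    # Element-wise choice: keep x where x <= 2, otherwise use 2 * x.
    return np.where(xs <= 2, xs, xs * 2)

xs = np.arange(0, 6, 1)
ys = step(xs)
print(xs, ys)
```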

how to calculate geometric average value with nans?

I would like to calculate the geometric mean of some data (including NaN), how can I do it?
I know how to calculate the mean value with NaNs, we can use the following code:
import numpy as np
M = np.nanmean(data, axis=2)
So how can I do the same for the geometric mean?
You could use the identity (I only found it in the German Wikipedia, but there are probably other sources as well):
\bar{x}_{\mathrm{geom}} = a^{\frac{1}{n} \sum_{i=1}^{n} \log_a x_i}
This identity can be constructed using the "logarithm rules" on the normal definition of the geometric mean:
\bar{x}_{\mathrm{geom}} = \sqrt[n]{\prod_{i=1}^{n} x_i}
The base a can be chosen arbitrarily, so you could use np.log (and np.exp as the inverse operation):
import numpy as np
def nangmean(arr, axis=None):
    arr = np.asarray(arr)
    inverse_valids = 1. / np.sum(~np.isnan(arr), axis=axis) # could be a problem for all-nan-axis
    rhs = inverse_valids * np.nansum(np.log(arr), axis=axis)
    return np.exp(rhs)
And it seems to work:
>>> l = [[1, 2, 3], [1, np.nan, 3], [np.nan, 2, np.nan]]
>>> nangmean(l)
1.8171205928321397
>>> nangmean(l, axis=1)
array([ 1.81712059, 1.73205081, 2. ])
>>> nangmean(l, axis=0)
array([ 1., 2., 3.])
In NumPy 1.10 also np.nanprod was added, so you could also use the normal definition:
import numpy as np
def nangmean(arr, axis=None):
    arr = np.asarray(arr)
    valids = np.sum(~np.isnan(arr), axis=axis)
    prod = np.nanprod(arr, axis=axis)
    return np.power(prod, 1. / valids)
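Both versions divide by the number of valid entries, which is zero for an all-nan slice. One possible way to handle that case is sketched below. It assumes strictly positive data, and returning nan for an all-nan slice is a design choice made for this example; the name nangmean_safe is hypothetical.

```python
import numpy as np

def nangmean_safe(arr, axis=None):
    """Geometric mean ignoring nans; all-nan slices give nan."""
    arr = np.asarray(arr, dtype=float)
    valid = ~np.isnan(arr)
    counts = valid.sum(axis=axis)
    # Replace nans by 1 so that they contribute log(1) = 0 to the sum.
    log_sum = np.log(np.where(valid, arr, 1.0)).sum(axis=axis)
    # 0 / 0 for all-nan slices is silenced here and mapped to nan below.
    with np.errstate(divide="ignore", invalid="ignore"):
        mean_log = log_sum / counts
    return np.where(counts > 0, np.exp(mean_log), np.nan)

data = [[1, 2, 3], [1, np.nan, 3], [np.nan, np.nan, np.nan]]
row_means = nangmean_safe(data, axis=1)
print(row_means)
```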

python numpy weighted average with nans

First things first: this is not a duplicate of "NumPy: calculate averages with NaNs removed"; I'll explain why.
Suppose I have an array
import numpy as np
a = np.array([1, 2, 3, 4])
and I want to average over it with the weights
weights = [4, 3, 2, 1]
output = np.average(a, weights=weights)
print(output)
2.0
OK, so this is pretty straightforward. But now I have something like this:
a = np.array([1, 2, np.nan, 4])
Calculating the average with the usual method of course yields nan. Can I avoid this?
In principle I want to ignore the nans, so I'd like to have something like this:
a = np.array([1, 2, 4])
weights = [4, 3, 1]
output = np.average(a, weights=weights)
print(output)
1.75
Alternatively, you can use a MaskedArray as such:
>>> import numpy as np
>>> a = np.array([1,2,np.nan,4])
>>> weights = np.array([4,3,2,1])
>>> ma = np.ma.MaskedArray(a, mask=np.isnan(a))
>>> np.ma.average(ma, weights=weights)
1.75
First find out indices where the items are not nan, and then pass the filtered versions of a and weights to numpy.average:
>>> import numpy as np
>>> a = np.array([1,2,np.nan,4])
>>> weights = np.array([4,3,2,1])
>>> indices = np.where(np.logical_not(np.isnan(a)))[0]
>>> np.average(a[indices], weights=weights[indices])
1.75
As suggested by @mtrw in the comments, it would be cleaner to use a boolean mask here instead of an index array:
>>> indices = ~np.isnan(a)
>>> np.average(a[indices], weights=weights[indices])
1.75
I would offer another solution, which scales better to more dimensions (e.g. when averaging over a different axis). The code below works with a 2D array that may contain nans and takes the average over axis=0.
a = np.random.randint(5, size=(3,2)) # let's generate some random 2D array
# make weights matrix with zero weights at nan's in a
w_vec = np.arange(1, a.shape[0]+1)
w_vec = w_vec.reshape(-1, 1)
w_mtx = np.repeat(w_vec, a.shape[1], axis=1)
w_mtx *= (~np.isnan(a))
# take average as (weighted_elements_sum / weights_sum)
w_a = a * w_mtx
a_sum_vec = np.nansum(w_a, axis=0)
w_sum_vec = np.nansum(w_mtx, axis=0)
mean_vec = a_sum_vec / w_sum_vec
# mean_vec is vector with weighted nan-averages of array a taken along axis=0
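The same per-column computation can also be written with a masked array, which takes care of zeroing the weights at the nan positions. This is a sketch on a small hand-made array (the values are chosen for illustration only), with one weight per row as in the code above:

```python
import numpy as np

a = np.array([[1.0, np.nan],
              [2.0, 4.0],
              [np.nan, 6.0]])
w_vec = np.array([1.0, 2.0, 3.0])  # one weight per row

masked = np.ma.masked_invalid(a)  # mask the nan entries
# Repeat the row weights across the columns so they match a.shape.
weights = np.repeat(w_vec[:, None], a.shape[1], axis=1)
# Masked entries are left out of both the weighted sum and the weight sum.
mean_vec = np.ma.average(masked, weights=weights, axis=0).filled(np.nan)
print(mean_vec)
```

Here the first column averages 1 and 2 with weights 1 and 2, and the second averages 4 and 6 with weights 2 and 3.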
Expanding on @Ashwini's and @Nicolas' answers, here is a version that also handles the edge case where all the data values are np.nan, and that is designed to work with a pandas DataFrame without type-related issues:
import numpy as np
import pandas as pd
from typing import List, Union
def calc_wa_ignore_nan(df: pd.DataFrame, measures: List[str],
                       weights: List[Union[float, int]]) -> np.ndarray:
    """ Calculates the weighted average of `measures`' values, ex-nans.
    When nans are present in `measures`' values,
    the weights are recalculated based only on the weights for non-nan measures.
    Note:
        The calculation used is NOT the same as just ignoring nans.
        For example, if we had data and weights:
            data = [2, 3, np.nan]
            weights = [0.5, 0.2, 0.3]
        calc_wa_ignore_nan approach:
            (2*(0.5/(0.5+0.2))) + (3*(0.2/(0.5+0.2))) == 2.285714285714286
        The ignoring nans approach:
            (2*0.5) + (3*0.2) == 1.6
    Args:
        df: Multiple rows of numeric data values with `measures` as column headers.
        measures: The str names of the columns to select from `df`.
        weights: The numeric weights associated with `measures`.
    Example:
        >>> df = pd.DataFrame({"meas1": [1, 1],
        ...                    "meas2": [2, 2],
        ...                    "meas3": [3, 3],
        ...                    "meas4": [np.nan, 0],
        ...                    "meas5": [5, 5]})
        >>> measures = ["meas2", "meas3", "meas4"]
        >>> weights = [0.5, 0.2, 0.3]
        >>> calc_wa_ignore_nan(df, measures, weights)
        array([2.28571429, 1.6])
    """
    assert not df.empty, "Nothing to calculate weighted average for: `df` is empty."
    # Need to coerce type to np.float64 instead of python's float
    # to avoid "ufunc 'isnan' not supported for the input types ..." error
    data = np.array(df[measures].values, dtype=np.float64)
    # Make a 2d array with the same weights for each row
    # cast for safety and better errors
    weights = np.array([weights, ] * data.shape[0], dtype=np.float64)
    mask = np.isnan(data)
    masked_data = np.ma.masked_array(data, mask=mask)
    masked_weights = np.ma.masked_array(weights, mask=mask)
    # np.nanmean doesn't support weights
    weighted_avgs = np.average(masked_data, weights=masked_weights, axis=1)
    # Replace masked elements with np.nan
    # otherwise those elements will be interpreted as 0 when read into a pd.DataFrame
    weighted_avgs = weighted_avgs.filled(np.nan)
    return weighted_avgs
All the solutions above are very good, but they don't handle the case where there are nans in the weights. To handle that, using pandas:
def weighted_average_ignoring_nan(df, col_value, col_weight):
    den = 0
    num = 0
    for index, row in df.iterrows():
        # skip the row if either the value or the weight is nan
        if not np.isnan(row[col_weight]) and not np.isnan(row[col_value]):
            den = den + row[col_weight]
            num = num + row[col_weight]*row[col_value]
    return num/den
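The same rule (skip an entry when either its value or its weight is nan) can be sketched without a Python-level loop. The version below works on plain arrays rather than a DataFrame; that, and the name nan_weighted_average, are assumptions made for this illustration. With a DataFrame you could pass df[col_value] and df[col_weight].

```python
import numpy as np

def nan_weighted_average(values, weights):
    """Weighted average over entries where value and weight are both non-nan."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Keep only positions where neither the value nor the weight is nan.
    valid = ~(np.isnan(values) | np.isnan(weights))
    return np.sum(values[valid] * weights[valid]) / np.sum(weights[valid])

values = [3.0, 2.0, np.nan, 4.0]
weights = [1.0, np.nan, 2.0, 3.0]
result = nan_weighted_average(values, weights)
print(result)
```

Only the first and last entries survive the filter here, so the result is (3*1 + 4*3) / (1 + 3).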
Since you're looking for the mean another idea is to simply replace all the nan values with 0's:
>>> import numpy as np
>>> a = np.array([[ 3., 2., 5.], [np.nan, 4., np.nan], [np.nan, np.nan, np.nan]])
>>> w = np.array([[ 1., 2., 3.], [np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan]])
>>> a[np.isnan(a)] = 0
>>> w[np.isnan(w)] = 0
>>> np.average(a, weights=w)
3.6666666666666665
This can be used with the axis argument of the average function, but be careful that your weights don't sum to 0.

Vectorized (partial) inverse of an N*M*M tensor with numpy

I'm almost exactly in a similar situation as the asker here over a year ago:
fast way to invert or dot kxnxn matrix
So I have a tensor with indices a[n,i,j] of dimensions (N,M,M) and I want to invert the M*M square matrix part for each n in N.
For example, suppose I have
In [1]: a = np.arange(12)
        a.shape = (3,2,2)
        a
Out[1]: array([[[ 0,  1],
                [ 2,  3]],
               [[ 4,  5],
                [ 6,  7]],
               [[ 8,  9],
                [10, 11]]])
Then a for loop inversion would go like this:
In [2]: inv_a = np.zeros([3,2,2])
        for m in range(0,3):
            inv_a[m] = np.linalg.inv(a[m])
        inv_a
Out[2]: array([[[-1.5,  0.5],
                [ 1. ,  0. ]],
               [[-3.5,  2.5],
                [ 3. , -2. ]],
               [[-5.5,  4.5],
                [ 5. , -4. ]]])
This will apparently be implemented in NumPy 2.0, according to this issue on github...
I guess I need to install the dev version as seberg noted in the github issue thread, but is there another way to do this in vectorized manner right now?
Update:
In NumPy 1.8 and later, the functions in numpy.linalg are generalized universal functions.
Meaning that you can now do something like this:
import numpy as np
a = np.random.rand(12, 3, 3)
np.linalg.inv(a)
This will invert each 3x3 array and return the result as a 12x3x3 array.
See the numpy 1.8 release notes.
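A quick way to convince yourself that the stacked call matches the per-matrix loop is to compare the two. This is a sketch only; the diagonal shift is an assumption added here so that the random matrices are safely invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random 3x3 matrices; adding 3*I makes them strictly diagonally dominant.
a = rng.random((12, 3, 3)) + 3.0 * np.eye(3)

batched = np.linalg.inv(a)                        # one call over the stack
looped = np.stack([np.linalg.inv(m) for m in a])  # explicit loop for comparison

same = np.allclose(batched, looped)
identity_check = np.allclose(a @ batched, np.eye(3))
print(same, identity_check)
```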
Original Answer:
Since N (the size of each matrix, A.shape[1] in the code below) is relatively small, how about we compute the LU decomposition manually for all the matrices at once.
This ensures that the for loops involved are relatively short.
Here's how this can be done with normal NumPy syntax:
import numpy as np
from numpy.random import rand
def pylu3d(A):
    N = A.shape[1]
    for j in range(N-1):
        for i in range(j+1,N):
            #change to L
            A[:,i,j] /= A[:,j,j]
            #change to U
            A[:,i,j+1:] -= A[:,i,j:j+1] * A[:,j,j+1:]
def pylusolve(A, B):
    N = A.shape[1]
    # forward substitution with the unit lower triangular factor
    for j in range(N-1):
        for i in range(j+1,N):
            B[:,i] -= A[:,i,j] * B[:,j]
    # back substitution with the upper triangular factor
    for j in range(N-1,-1,-1):
        B[:,j] /= A[:,j,j]
        for i in range(j):
            B[:,i] -= A[:,i,j] * B[:,j]
#usage
A = rand(1000000,3,3)
b = rand(3)
b = np.tile(b,(1000000,1))
pylu3d(A)
# A has been replaced with the LU decompositions
pylusolve(A, b)
# b has been replaced to the solutions of
# A[i] x = b[i] for each A[i] and b[i]
As I have written it, pylu3d modifies A in place to compute the LU decomposition.
After replacing each NxN matrix with its LU decomposition, pylusolve can be used to solve for a 2D array b that holds one length-N right-hand side per matrix.
It modifies b in place and does the proper back substitutions to solve the system.
As it is written, this implementation does not include pivoting, so it isn't numerically stable, but it should work well enough in most cases.
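If the lack of pivoting is a concern and your NumPy is 1.8 or later, the stacked np.linalg.solve is an alternative that goes through LAPACK, which uses partial pivoting. This is a sketch with made-up sizes, not a drop-in replacement for the functions above; the diagonal shift is an assumption to keep the random systems well conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)
n_systems, size = 1000, 3
# Random matrices made strictly diagonally dominant so they are invertible.
A = rng.random((n_systems, size, size)) + size * np.eye(size)
# The same right-hand side repeated for every system, as in the usage above.
b = np.tile(rng.random(size), (n_systems, 1))

# Add a trailing axis so each b[i] is treated as a column vector,
# then drop it again to get one solution row per system.
x = np.linalg.solve(A, b[..., None])[..., 0]

# Check A[i] @ x[i] == b[i] for every system.
residual_ok = np.allclose(np.einsum("nij,nj->ni", A, x), b)
print(residual_ok)
```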
Depending on how your array is arranged in memory, it is probably still a good bit faster to use Cython.
Here are two Cython functions that do the same thing, but they iterate along M first.
It's not vectorized, but it is relatively fast.
from numpy cimport ndarray as ar
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def lu3d(ar[double,ndim=3] A):
    cdef int n, i, j, k, N=A.shape[0], h=A.shape[1], w=A.shape[2]
    for n in range(N):
        for j in range(h-1):
            for i in range(j+1,h):
                #change to L
                A[n,i,j] /= A[n,j,j]
                #change to U
                for k in range(j+1,w):
                    A[n,i,k] -= A[n,i,j] * A[n,j,k]
@cython.boundscheck(False)
@cython.wraparound(False)
def lusolve(ar[double,ndim=3] A, ar[double,ndim=2] b):
    cdef int n, i, j, N=A.shape[0], h=A.shape[1]
    for n in range(N):
        for j in range(h-1):
            for i in range(j+1,h):
                b[n,i] -= A[n,i,j] * b[n,j]
        for j in range(h-1,-1,-1):
            b[n,j] /= A[n,j,j]
            for i in range(j):
                b[n,i] -= A[n,i,j] * b[n,j]
You could also try using Numba, though I couldn't get it to run as fast as Cython in this case.
