Is there a quick way of replacing all NaN values in a numpy array with (say) the linearly interpolated values?
For example,
[1 1 1 nan nan 2 2 nan 0]
would be converted into
[1 1 1 1.3 1.6 2 2 1 0]
Let's first define a simple helper function in order to make it more straightforward to handle indices and logical indices of NaNs:
import numpy as np

def nan_helper(y):
    """Helper to handle indices and logical indices of NaNs.

    Input:
        - y, 1d numpy array with possible NaNs
    Output:
        - nans, logical indices of NaNs
        - index, a function, with signature indices = index(logical_indices),
          to convert logical indices of NaNs to 'equivalent' indices
    Example:
        >>> # linear interpolation of NaNs
        >>> nans, x = nan_helper(y)
        >>> y[nans] = np.interp(x(nans), x(~nans), y[~nans])
    """
    return np.isnan(y), lambda z: z.nonzero()[0]
Now nan_helper(.) can be utilized like:
>>> y = np.array([1, 1, 1, np.nan, np.nan, 2, 2, np.nan, 0], dtype=float)
>>>
>>> nans, x = nan_helper(y)
>>> y[nans] = np.interp(x(nans), x(~nans), y[~nans])
>>>
>>> print(y.round(2))
[ 1. 1. 1. 1.33 1.67 2. 2. 1. 0. ]
---
Although it may at first seem a little bit like overkill to specify a separate function just to do things like this:
>>> nans, x= np.isnan(y), lambda z: z.nonzero()[0]
it will eventually pay dividends.
So, whenever you are working with NaN-related data, just encapsulate all the (new NaN-related) functionality needed under some specific helper function(s). Your code base will be more coherent and readable, because it follows easily understandable idioms.
Interpolation, indeed, is a nice context to see how NaN handling is done, but similar techniques are utilized in various other contexts as well.
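For instance, here is a small sketch (my own illustration, not part of the helper itself) reusing the same nans mask for a different replacement policy, a constant fill with the mean of the valid values:

>>> nans, x = nan_helper(y)
>>> y[nans] = y[~nans].mean()  # constant fill instead of interpolation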
I came up with this code:
import numpy as np

nan = np.nan
A = np.array([1, nan, nan, 2, 2, nan, 0])

ok = ~np.isnan(A)              # boolean mask of the good values
xp = ok.nonzero()[0]           # indices of the good values
fp = A[ok]                     # the good values themselves
x = (~ok).nonzero()[0]         # indices of the NaNs

A[~ok] = np.interp(x, xp, fp)
print(A)
It prints
[ 1. 1.33333333 1.66666667 2. 2. 1. 0. ]
Just use numpy logical indexing and the where statement to apply a 1D interpolation.
import numpy as np
from scipy import interpolate

def fill_nan(A):
    '''
    interpolate to fill nan values
    '''
    inds = np.arange(A.shape[0])
    good = np.where(np.isfinite(A))
    f = interpolate.interp1d(inds[good], A[good], bounds_error=False)
    B = np.where(np.isfinite(A), A, f(inds))
    return B
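A minimal usage sketch (input values chosen here just for illustration):

>>> fill_nan(np.array([1., np.nan, np.nan, 2., np.nan, 0.]))
array([1.        , 1.33333333, 1.66666667, 2.        , 1.        , 0.        ])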
For two-dimensional data, SciPy's griddata works fairly well for me:
>>> import numpy as np
>>> from scipy.interpolate import griddata
>>>
>>> # SETUP
>>> a = np.arange(25).reshape((5, 5)).astype(float)
>>> a
array([[ 0., 1., 2., 3., 4.],
[ 5., 6., 7., 8., 9.],
[ 10., 11., 12., 13., 14.],
[ 15., 16., 17., 18., 19.],
[ 20., 21., 22., 23., 24.]])
>>> a[np.random.randint(2, size=(5, 5)).astype(bool)] = np.NaN
>>> a
array([[ nan, nan, nan, 3., 4.],
[ nan, 6., 7., nan, nan],
[ 10., nan, nan, 13., nan],
[ 15., 16., 17., nan, 19.],
[ nan, nan, 22., 23., nan]])
>>>
>>> # THE INTERPOLATION
>>> x, y = np.indices(a.shape)
>>> interp = np.array(a)
>>> interp[np.isnan(interp)] = griddata(
... (x[~np.isnan(a)], y[~np.isnan(a)]), # points we know
... a[~np.isnan(a)], # values we know
... (x[np.isnan(a)], y[np.isnan(a)])) # points to interpolate
>>> interp
array([[ nan, nan, nan, 3., 4.],
[ nan, 6., 7., 8., 9.],
[ 10., 11., 12., 13., 14.],
[ 15., 16., 17., 18., 19.],
[ nan, nan, 22., 23., nan]])
I am using it on 3D images, operating on 2D slices (4000 slices of 350x350). The whole operation still takes about an hour :/
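Note that griddata with the default linear method cannot extrapolate, which is why the border cells above remain NaN. One possible follow-up, as a sketch (not part of the original answer), is a second pass with method='nearest' for whatever is left:

>>> still_nan = np.isnan(interp)
>>> interp[still_nan] = griddata(
...     (x[~np.isnan(a)], y[~np.isnan(a)]),
...     a[~np.isnan(a)],
...     (x[still_nan], y[still_nan]),
...     method='nearest')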
Or building on Winston's answer
def pad(data):
    bad_indexes = np.isnan(data)
    good_indexes = np.logical_not(bad_indexes)
    good_data = data[good_indexes]
    interpolated = np.interp(bad_indexes.nonzero()[0], good_indexes.nonzero()[0], good_data)
    data[bad_indexes] = interpolated
    return data

A = np.array([[1, 20, 300],
              [nan, nan, nan],
              [3, 40, 500]])

A = np.apply_along_axis(pad, 0, A)
print(A)
Result
[[ 1. 20. 300.]
[ 2. 30. 400.]
[ 3. 40. 500.]]
It might be easier to change how the data is being generated in the first place, but if not:
bad_indexes = np.isnan(data)
Create a boolean array indicating where the nans are
good_indexes = np.logical_not(bad_indexes)
Create a boolean array indicating where the good values are
good_data = data[good_indexes]
A restricted version of the original data excluding the nans
interpolated = np.interp(bad_indexes.nonzero()[0], good_indexes.nonzero()[0], good_data)
Run all the bad indexes through interpolation
data[bad_indexes] = interpolated
Replace the original data with the interpolated values.
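Putting those steps together into one runnable sketch (with a made-up input array):

import numpy as np

data = np.array([1., np.nan, np.nan, 2., 2., np.nan, 0.])
bad_indexes = np.isnan(data)
good_indexes = np.logical_not(bad_indexes)
good_data = data[good_indexes]
interpolated = np.interp(bad_indexes.nonzero()[0], good_indexes.nonzero()[0], good_data)
data[bad_indexes] = interpolated
print(data)  # [1.         1.33333333 1.66666667 2.         2.         1.         0.        ]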
I use interpolation to replace all NaN values.
A = np.array([1, nan, nan, 2, 2, nan, 0])
np.interp(np.arange(len(A)),
          np.arange(len(A))[~np.isnan(A)],
          A[~np.isnan(A)])
Output :
array([1. , 1.33333333, 1.66666667, 2. , 2. , 1. , 0. ])
I needed an approach that would also fill in NaNs at the start or end of the data, which the main answer does not appear to do.
The function I came up with uses a linear regression to fill in the NaNs. This overcomes my problem:
import numpy as np

def linearly_interpolate_nans(y):
    # Fit a linear regression to the non-nan y values

    # Create X matrix for linreg with an intercept and an index
    X = np.vstack((np.ones(len(y)), np.arange(len(y))))

    # Get the non-NaN values of X and y
    X_fit = X[:, ~np.isnan(y)]
    y_fit = y[~np.isnan(y)].reshape(-1, 1)

    # Estimate the coefficients of the linear regression
    beta = np.linalg.lstsq(X_fit.T, y_fit, rcond=None)[0]

    # Fill in all the nan values using the predicted coefficients
    y.flat[np.isnan(y)] = np.dot(X[:, np.isnan(y)].T, beta)
    return y
Here's an example usage case:
# Make an array according to some linear function
y = np.arange(12) * 1.5 + 10.

# First and last value are NaN
y[0] = np.nan
y[-1] = np.nan

# 30% of other values are NaN
for i in range(len(y)):
    if np.random.rand() > 0.7:
        y[i] = np.nan

# NaN's are filled in!
print(y)
print(linearly_interpolate_nans(y))
A slightly optimized version based on the response of BRYAN WOODS. It handles starting and ending values of the source data correctly, and it is 25-30% faster than the original version. Also, you may use different kinds of interpolation (see the scipy.interpolate.interp1d documentation for details).
import numpy as np
from scipy.interpolate import interp1d

def fill_nans_scipy1(padata, pkind='linear'):
    """
    Interpolates data to fill nan values

    Parameters:
        padata : nd array
            source data with np.NaN values
    Returns:
        nd array
            resulting data with interpolated values instead of nans
    """
    aindexes = np.arange(padata.shape[0])
    agood_indexes, = np.where(np.isfinite(padata))
    f = interp1d(agood_indexes,
                 padata[agood_indexes],
                 bounds_error=False,
                 copy=False,
                 fill_value="extrapolate",
                 kind=pkind)
    return f(aindexes)
In [17]: adata = np.array([1, 2, np.NaN, 4])

In [18]: adata
Out[18]: array([ 1.,  2., nan,  4.])

In [19]: fill_nans_scipy1(adata)
Out[19]: array([1., 2., 3., 4.])
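As a sketch of the pkind parameter (with made-up data sampled from x**2, since cubic interpolation needs at least four good points):

In [20]: bdata = np.array([0., 1., np.NaN, 9., 16., 25.])

In [21]: fill_nans_scipy1(bdata)
Out[21]: array([ 0.,  1.,  5.,  9., 16., 25.])

In [22]: fill_nans_scipy1(bdata, pkind='cubic')  # should recover ~4. in the gap for this quadratic data, instead of the linear midpoint 5.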
Building on the answer by Bryan Woods, I modified his code to also convert lists consisting only of NaN to a list of zeros:
import numpy as np
from scipy.interpolate import interp1d

def fill_nan(A):
    '''
    interpolate to fill nan values
    '''
    inds = np.arange(A.shape[0])
    good = np.where(np.isfinite(A))
    if len(good[0]) == 0:
        return np.nan_to_num(A)
    f = interp1d(inds[good], A[good], bounds_error=False)
    B = np.where(np.isfinite(A), A, f(inds))
    return B
Simple addition, I hope it will be of use to someone.
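A quick sketch of the all-NaN branch:

>>> fill_nan(np.array([np.nan, np.nan, np.nan]))
array([0., 0., 0.])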
Interpolation and extrapolation with padding keywords
The following solution interpolates the nan values in an array by np.interp, if a finite value is present on both sides. Nan values at the borders are handled by np.pad with modes like constant or reflect.
import numpy as np
import matplotlib.pyplot as plt

def extrainterpolate_nans_1d(
        arr, kws_pad=({'mode': 'edge'}, {'mode': 'edge'})
        ):
    """Interpolates and extrapolates nan values.

    Interpolation is linear, compare np.interp(..).
    Extrapolation works with pad keywords, compare np.pad(..).

    Parameters
    ----------
    arr : np.ndarray, shape (N,)
        Array to replace nans in.
    kws_pad : dict or (dict, dict)
        kwargs for np.pad on left and right side

    Returns
    -------
    np.ndarray
        Array with nan values interpolated and extrapolated.

    See Also
    --------
    https://numpy.org/doc/stable/reference/generated/numpy.interp.html
    https://numpy.org/doc/stable/reference/generated/numpy.pad.html
    https://stackoverflow.com/a/43821453/7128154
    """
    assert arr.ndim == 1
    if isinstance(kws_pad, dict):
        kws_pad_left = kws_pad
        kws_pad_right = kws_pad
    else:
        assert len(kws_pad) == 2
        assert isinstance(kws_pad[0], dict)
        assert isinstance(kws_pad[1], dict)
        kws_pad_left = kws_pad[0]
        kws_pad_right = kws_pad[1]

    arr_ip = arr.copy()
    # interpolation
    inds = np.arange(len(arr_ip))
    nan_msk = np.isnan(arr_ip)
    arr_ip[nan_msk] = np.interp(inds[nan_msk], inds[~nan_msk], arr[~nan_msk])
    # determine pad range (first and last finite value; the (0,) default guards all-nan input)
    i0 = next(
        (ids for ids, val in np.ndenumerate(arr) if not np.isnan(val)), (0,))[0]
    i1 = next(
        (ids for ids, val in np.ndenumerate(arr[::-1]) if not np.isnan(val)), (0,))[0]
    i1 = len(arr) - i1
    # pad
    arr_pad = np.pad(
        arr_ip[i0:], pad_width=[(i0, 0)], **kws_pad_left)
    arr_pad = np.pad(
        arr_pad[:i1], pad_width=[(0, len(arr) - i1)], **kws_pad_right)
    return arr_pad
# setup data
ys = np.arange(30, dtype=float)**2/20
ys[:5] = np.nan
ys[20:] = 20
ys[28:] = np.nan
ys[[7, 13, 14, 18, 22]] = np.nan
ys_ie0 = extrainterpolate_nans_1d(ys)
kws_pad_sym = {'mode': 'symmetric'}
kws_pad_const7 = {'mode': 'constant', 'constant_values':7.}
ys_ie1 = extrainterpolate_nans_1d(ys, kws_pad=(kws_pad_sym, kws_pad_const7))
ys_ie2 = extrainterpolate_nans_1d(ys, kws_pad=(kws_pad_const7, kws_pad_sym))
fig, ax = plt.subplots()
ax.scatter(np.arange(len(ys)), ys, s=15**2, label='ys')
ax.scatter(np.arange(len(ys)), ys_ie0, s=8**2, label='ys_ie0, left_pad edge, right_pad edge')
ax.scatter(np.arange(len(ys)), ys_ie1, s=6**2, label='ys_ie1, left_pad symmetric, right_pad 7')
ax.scatter(np.arange(len(ys)), ys_ie2, s=4**2, label='ys_ie2, left_pad 7, right_pad symmetric')
ax.legend()
As suggested by an earlier comment, the best way to do this is to use a peer reviewed implementation. The pandas library has an interpolation method for 1d data, which interpolates np.nan values in Series or DataFrame:
pandas.Series.interpolate or pandas.DataFrame.interpolate
The documentation is very concise; I recommend reading through it! My implementation:
import pandas as pd

magnitudes_series = pd.Series(magnitudes)  # Convert np.array to pd.Series
magnitudes_series.interpolate(
    # I used "akima" because the second derivative of my data has frequent drops to 0
    method="akima",
    # Interpolate from both sides of the sequence, up to you (made sense for my data)
    limit_direction="both",
    # Interpolate only interior np.nan runs, i.e. those bounded by numbers on both sides
    limit_area="inside",
    inplace=True,
)

# I chose to remove np.nan at the tails of the data sequence
magnitudes_series.dropna(inplace=True)

result_in_numpy_array = magnitudes_series.values
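For a fully self-contained sketch with made-up values (using plain linear interpolation instead of akima):

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 2.0, np.nan, 0.0])
print(s.interpolate(method="linear", limit_area="inside").values)
# [1.         1.33333333 1.66666667 2.         1.         0.        ]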
Importing scipy looks like overkill to me. Here's a simple way using numpy and maintaining the same conventions as np.interp
import numpy as np

def interp_nans(x: [float], left=None, right=None, period=None) -> [float]:
    """
    e.g. [1 1 1 nan nan 2 2 nan 0] -> [1 1 1 1.3 1.6 2 2 1 0]
    """
    xp = [i for i, yi in enumerate(x) if np.isfinite(yi)]
    fp = [yi for i, yi in enumerate(x) if np.isfinite(yi)]
    return list(np.interp(x=list(range(len(x))), xp=xp, fp=fp,
                          left=left, right=right, period=period))
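Usage, matching the docstring example:

>>> np.round(interp_nans([1, 1, 1, np.nan, np.nan, 2, 2, np.nan, 0]), 2)
array([1.  , 1.  , 1.  , 1.33, 1.67, 2.  , 2.  , 1.  , 0.  ])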
---
I need the inverse Fourier transform of a complex array. ifft should return a real array, but it returns another complex array.
In MATLAB,
a=ifft(fft(a)), but in Python it does not work like that.
import numpy as np
from numpy.fft import fft, ifft

a = np.arange(6)
m = ifft(fft(a))
m  # Google says m should = a, but m is complex
Output :
array([0.+0.00000000e+00j, 1.+3.70074342e-16j, 2.+0.00000000e+00j,
3.-5.68396583e-17j, 4.+0.00000000e+00j, 5.-3.13234683e-16j])
The imaginary part is the result of floating point precision error in the calculation. If it is very small, it can simply be dropped.
NumPy has a built-in function, real_if_close, to do so:
>>> np.real_if_close(np.fft.ifft(np.fft.fft(a)))
array([0., 1., 2., 3., 4., 5.])
You can read about floating system limitations here:
https://docs.python.org/3.8/tutorial/floatingpoint.html
If the imaginary part is close to zero, you can discard it:
import numpy as np
arr = np.array(
[
0.0 + 0.00000000e00j,
1.0 + 3.70074342e-16j,
2.0 + 0.00000000e00j,
3.0 - 5.68396583e-17j,
4.0 + 0.00000000e00j,
5.0 - 3.13234683e-16j,
]
)
if all(np.isclose(arr.imag, 0)):
    arr = arr.real

# [ 0. 1. 2. 3. 4. 5.]
(that's what real_if_close does in one line as in R2RT's answer).
You can test like this:
import numpy as np
from numpy import fft
a = np.arange(6)
print(a)
f = np.fft.fft(a)
print(f)
m = np.fft.ifft(f)
print(m)
[0 1 2 3 4 5]
[15.+0.j -3.+5.19615242j -3.+1.73205081j -3.+0.j
-3.-1.73205081j -3.-5.19615242j]
[0.+0.j 1.+0.j 2.+0.j 3.+0.j 4.+0.j 5.+0.j]
To get the real part only you can use:
print(m.real) # [0. 1. 2. 3. 4. 5.]
You are mistaken that "ifft should return a real array". If you want a real-valued output (i.e. you have the fft of real data and now want to perform the ifft), you should use irfft.
See this example from the docs:
>>> np.fft.ifft([1, -1j, -1, 1j])
array([ 0.+0.j, 1.+0.j, 0.+0.j, 0.+0.j]) #Output is complex which is correct
>>> np.fft.irfft([1, -1j, -1])
array([ 0., 1., 0., 0.]) #Output is real valued
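For completeness, a sketch of the rfft/irfft round trip for real data (passing n=len(a) matters for odd-length inputs, since irfft otherwise assumes an even original length):

import numpy as np

a = np.arange(6)
roundtrip = np.fft.irfft(np.fft.rfft(a), n=len(a))
print(roundtrip)  # [0. 1. 2. 3. 4. 5.], real valued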
This is one of the first things I have tried to code in Python (or any programming language) and my first question here, so I hope I provide everything necessary to help me.
I have an upper triangular matrix and I need to solve the system of equations Wx=y, where W (a 3x3 matrix) and y (a vector) are given. I cannot use numpy.linalg functions, so I am trying to implement this myself, working backwards of course.
After several failed attempts, I limited my task to a 3x3 matrix. Without a loop, the code looks like this:
x[0,2]=y[2]/W[2,2]
x[0,1]=(y[1]-W[1,2]*x[0,2])/W[1,1]
x[0,0]=(y[0]-W[0,2]*x[0,2]-W[0,1]*x[0,1])/W[0,0]
Now, every new sum contains more elements, which follow a pattern, but nevertheless need to be defined somehow. I suppose there must be a sum function in numpy (though not in linalg) which does such things, but I cannot find it.
My newest, partial "attempt" begins with something like this:
n = 3
for k in range(n):
    for i in range(n-k-1):
        x[0,n-k-1] = y[n-k-1]/W[n-k-1,n-k-1]
Which, of course, contains only first element of each sum.
I would be thankful for any assistance.
Example I am working on:
y=np.array([ 0.80064077, 2.64300842, -0.74912957])
W=np.array([[6.244998,2.88230677,-5.44435723],[0.,2.94827198,2.26990852],[0.,0.,0.45441135]])
n=W.shape[1]
x=np.zeros((1,n), dtype=float)
Proper solution should look like:
[-2.30857143 2.16571429 -1.64857143]
Here's one approach for generic n, using one loop -
def one_loop(y, W, n):
    out = np.zeros((1,n))
    for i in range(n-1,-1,-1):
        sums = (W[i,i+1:]*out[0,i+1:]).sum()
        out[0,i] = (y[i] - sums)/W[i,i]
    return out
For performance, we can replace that sum-reduction step with a dot-product. Thus, sums could alternatively be computed like so -
sums = W[i,i+1:].dot(out[0,i+1:])
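Spelled out, the one-loop version with the dot-product step would read (same logic, just substituted in):

def one_loop_dot(y, W, n):
    out = np.zeros((1, n))
    for i in range(n-1, -1, -1):
        # back-substitution: subtract the already-solved part, divide by the diagonal
        out[0, i] = (y[i] - W[i, i+1:].dot(out[0, i+1:])) / W[i, i]
    return out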
Sample runs
1) n = 3 :
In [149]: y
Out[149]: array([ 5., 8., 7.])
In [150]: W
Out[150]:
array([[ 6., 6., 2.],
[ 3., 3., 3.],
[ 4., 8., 5.]])
In [151]: x = np.zeros((1,3))
...: x[0,2]=y[2]/W[2,2]
...: x[0,1]=(y[1]-W[1,2]*x[0,2])/W[1,1]
...: x[0,0]=(y[0]-W[0,2]*x[0,2]-W[0,1]*x[0,1])/W[0,0]
...:
In [152]: x
Out[152]: array([[-0.9 , 1.26666667, 1.4 ]])
In [154]: one_loop(y, W, n=3)
Out[154]: array([[-0.9 , 1.26666667, 1.4 ]])
2) n = 4 :
In [156]: y
Out[156]: array([ 5., 8., 7., 6.])
In [157]: W
Out[157]:
array([[ 6., 2., 3., 3.],
[ 3., 4., 8., 5.],
[ 8., 6., 6., 4.],
[ 8., 4., 2., 2.]])
In [158]: x = np.zeros((1,4))
...: x[0,3]=y[3]/W[3,3]
...: x[0,2]=(y[2]-W[2,3]*x[0,3])/W[2,2]
...: x[0,1]=(y[1]-W[1,3]*x[0,3]-W[1,2]*x[0,2])/W[1,1]
...: x[0,0]=(y[0]-W[0,3]*x[0,3]-W[0,2]*x[0,2]-W[0,1]*x[0,1])/W[0,0]
...:
In [159]: x
Out[159]: array([[-0.22222222, -0.08333333, -0.83333333, 3. ]])
In [160]: one_loop(y, W, n=4)
Out[160]: array([[-0.22222222, -0.08333333, -0.83333333, 3. ]])
One more take (now updated to the state-of-the-art provided by Divakar in another answer):
import numpy as np

y = np.array([ 0.80064077, 2.64300842, -0.74912957])
W = np.array([[6.244998,2.88230677,-5.44435723],[0.,2.94827198,2.26990852],[0.,0.,0.45441135]])
n = W.shape[1]
x = np.zeros((1,n), dtype=float)

for i in range(n-1, -1, -1):
    x[0,i] = (y[i]-W[i,i+1:].dot(x[0,i+1:]))/W[i,i]

print(x)
gives:
[[-2.30857143 2.16571429 -1.64857143]]
My take
n = 3
for k in range(n):
    print("s=y[%d]" % (n-k-1))
    s = y[n-k-1]
    for i in range(0,k):
        print("s - W[%d,%d]*x[0,%d]" % (n-k-1, n-i-1, n-i-1))
        s = s - W[n-k-1,n-i-1]*x[0,n-i-1]
    print("x[0,%d] = s/W[%d,%d]" % (n-k-1,n-k-1,n-k-1))
    x[0,n-k-1] = s/W[n-k-1,n-k-1]
print(x)
and without print statements
n = 3
for k in range(n):
    s = y[n-k-1]
    for i in range(0,k):
        s = s - W[n-k-1,n-i-1]*x[0,n-i-1]
    x[0,n-k-1] = s/W[n-k-1,n-k-1]
print(x)
Output
s=y[2]
x[0,2] = s/W[2,2]
s=y[1]
s - W[1,2]*x[0,2]
x[0,1] = s/W[1,1]
s=y[0]
s - W[0,2]*x[0,2]
s - W[0,1]*x[0,1]
x[0,0] = s/W[0,0]
[[-2.30857143 2.16571429 -1.64857143]]
I need to obtain a "W" matrix of multiple matrix multiplications (all multiplications result in column vectors).
from numpy import matrix
from numpy import transpose
from numpy import matmul
from numpy import dot

# Iterative matrix multiplication
def iterativeMultiplication(X, Y):
    W = []  # Matrix of matricial products
    X = matrix(X)  # same number of rows
    Y = matrix(Y)  # same number of rows
    h = 0
    while (h < X.shape[1]):
        W.append([])
        W[h] = dot(transpose(X), Y)  # using "dot" function
        h += 1
    return W
But, unexpectedly, I obtain a list of objects with their respective data types.
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
Y = [[-0.2], [1.1], [5.9], [12.3]] # Edit Y column
iterativeMultiplication( X, Y )
Results in:
[array([[37.5],[73.3],[60.8]]),
array([[37.5],[73.3],[60.8]]),
array([[37.5],[73.3],[60.8]])]
I need some method to obtain only the numerical values for the matrix conversion.
W = matrix(W) # Results in error
It is the same when using the "matmul" function. Thanks for your time.
If you want to stack multiple matrices, you can use numpy.vstack:
W = numpy.vstack(W)
Edit: There seems to be a discrepancy between your function, X and Y versus the "result" list in your question. But based on your comments below, what you're actually looking for is numpy.hstack (horizontal stack) which will give you the desired 3x3 matrix based on your "result" list.
W = numpy.hstack(W)
Of course you are going to get a list. You initialize W as a list, and append the same calculation to it 3 times.
But your 3 element arrays don't make sense with this data, array([[ 3.36877336],[ 3.97112615],[ 3.8092797 ]]).
If I make Xm=np.matrix(X), etc:
In [162]: Xm
Out[162]:
matrix([[ 0., 0., 1.],
[ 1., 0., 0.],
[ 2., 2., 2.],
[ 2., 5., 4.]])
In [163]: Ym
Out[163]:
matrix([[ 0.1, -0.2],
[ 0.9, 1.1],
[ 6.2, 5.9],
[ 11.9, 12.3]])
In [164]: Xm.T.dot(Ym)
Out[164]:
matrix([[ 37.1, 37.5],
[ 71.9, 73.3],
[ 60.1, 60.8]])
In [165]: Xm.T*Ym # matrix interprets * as .dot
Out[165]:
matrix([[ 37.1, 37.5],
[ 71.9, 73.3],
[ 60.1, 60.8]])
You need to edit the question, to have both valid Python code (missing def and :), and results that match the inputs.
===============
In [173]: Y = [[-0.2], [1.1], [5.9], [12.3]]

In [174]: Ym = np.matrix(Y)

In [176]: Xm.T * Ym
Out[176]:
matrix([[ 37.5],
        [ 73.3],
        [ 60.8]])
=====================
This iteration is clumsy:
h = 0
while (h < X.shape[1]):
    W.append([])
    W[h] = dot(transpose(X), Y)  # using "dot" function
    h += 1
A more Pythonic approach
for h in range(X.shape[1]):
    W.append(np.dot(...))
Or even
W = [np.dot(....) for h in range(X.shape[1])]
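Combining the list comprehension with the hstack suggestion from the other answer, a hypothetical rewrite of the whole function (keeping the same repeated product as in the question) could look like:

import numpy as np

def iterative_multiplication(X, Y):
    X, Y = np.asarray(X), np.asarray(Y)
    cols = [X.T.dot(Y) for _ in range(X.shape[1])]  # same column vector repeated, as in the question
    return np.hstack(cols)                          # stack the column vectors into one matrix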
Is there a filter similar to ndimage's generic_filter that supports vector output? I did not manage to make scipy.ndimage.filters.generic_filter return more than a scalar. Uncomment the line in the code below to get the error: TypeError: only length-1 arrays can be converted to Python scalars.
I'm looking for a generic filter that processes 2D or 3D arrays and returns a vector at each point. Thus the output would have one added dimension. For the example below I'd expect something like this:
m.shape # (10,10)
res.shape # (10,10,2)
Example Code
import numpy as np
from scipy import ndimage

a = np.ones((10, 10)) * np.arange(10)

footprint = np.array([[1,1,1],
                      [1,0,1],
                      [1,1,1]])

def myfunc(x):
    r = sum(x)
    #r = np.array([1,1])  # uncomment this
    return r

res = ndimage.generic_filter(a, myfunc, footprint=footprint)
The generic_filter expects myfunc to return a scalar, never a vector.
However, there is nothing that precludes myfunc from also adding information
to, say, a list which is passed to myfunc as an extra argument.
Instead of using the array returned by generic_filter, we can generate our vector-valued array by reshaping this list.
For example,
import numpy as np
from scipy import ndimage

a = np.ones((10, 10)) * np.arange(10)

footprint = np.array([[1,1,1],
                      [1,0,1],
                      [1,1,1]])

ndim = 2
def myfunc(x, out):
    r = np.arange(ndim, dtype='float64')
    out.extend(r)
    return 0

result = []
ndimage.generic_filter(
    a, myfunc, footprint=footprint, extra_arguments=(result,))
result = np.array(result).reshape(a.shape+(ndim,))
I think I get what you're asking, but I'm not completely sure how ndimage.generic_filter works (how abstruse the source is!).
Here's just a simple wrapper function. It takes in an array plus all the parameters ndimage.generic_filter needs, and returns an array where each element of the former array is now represented by an array of shape (2,); the result of the filter function is stored as the second element of that array.
def generic_expand_filter(inarr, func, **kwargs):
    shape = inarr.shape
    res = np.empty(shape + (2,))
    temp = ndimage.generic_filter(inarr, func, **kwargs)
    for row in range(shape[0]):
        for val in range(shape[1]):
            res[row][val][0] = inarr[row][val]
            res[row][val][1] = temp[row][val]
    return res
Output of this function, where res denotes the result of generic_filter alone and res2 the result of generic_expand_filter:
>>> a.shape #same as res.shape
(10, 10)
>>> res2.shape
(10, 10, 2)
>>> a[0]
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> res[0]
array([ 3., 8., 16., 24., 32., 40., 48., 56., 64., 69.])
>>> print(*res2[0], sep=", ") #this is just to avoid the vertical default output
[ 0. 3.], [ 1. 8.], [ 2. 16.], [ 3. 24.], [ 4. 32.], [ 5. 40.], [ 6. 48.], [ 7. 56.], [ 8. 64.], [ 9. 69.]
>>> a[0][0]
0.0
>>> res[0][0]
3.0
>>> res2[0][0]
array([ 0., 3.])
Of course you probably don't want to save the old array, but instead have both fields as new results. Except I don't know exactly what you had in mind. If the two values you want stored are unrelated, just add a temp2 and func2 and call another generic_filter with the same **kwargs, and store that as the first value.
However, if you want an actual vector quantity that is calculated using multiple inarr elements, meaning that the two newly created fields aren't independent, you will just have to write that kind of function: one that takes in an array and idx, idy indices, and returns a tuple/list/array value which you can then unpack and assign to the result.
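As a hypothetical sketch of that last idea, here is an index-based function computing a two-component forward-difference vector at each point, assembled with plain loops instead of generic_filter:

import numpy as np

def vector_at(arr, idx, idy):
    # forward differences in each direction, clamped at the borders
    dx = arr[min(idx+1, arr.shape[0]-1), idy] - arr[idx, idy]
    dy = arr[idx, min(idy+1, arr.shape[1]-1)] - arr[idx, idy]
    return np.array([dx, dy])

a = np.ones((10, 10)) * np.arange(10)
res = np.empty(a.shape + (2,))
for ix in range(a.shape[0]):
    for iy in range(a.shape[1]):
        res[ix, iy] = vector_at(a, ix, iy)
# res.shape == (10, 10, 2)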
I'm relatively new to python but I'm trying to understand something which seems basic.
Create a vector:
x = np.linspace(0,2,3)
Out[38]: array([ 0., 1., 2.])
now why isn't x[:,0] a valid argument?
IndexError: invalid index
It must be x[0]. I have a function I am calling which calculates:
np.sqrt(x[:,0]**2 + x[:,1]**2 + x[:,2]**2)
Why can't what I have just be true regardless of the input? In many other languages, it is independent of there being other rows in the array. Perhaps I misunderstand something fundamental - sorry if so. I'd like to avoid putting:
if len(x) == 1:
    norm = np.sqrt(x[0]**2 + x[1]**2 + x[2]**2)
else:
    norm = np.sqrt(x[:,0]**2 + x[:,1]**2 + x[:,2]**2)
everywhere. Surely there is a way around this... thanks.
Edit: An example of it working in another language is Matlab:
>> b = [1,2,3]
b =
1 2 3
>> b(:,1)
ans =
1
>> b(1)
ans =
1
Perhaps you are looking for this:
np.sqrt(x[...,0]**2 + x[...,1]**2 + x[...,2]**2)
There can be any number of dimensions in place of the ellipsis ...
See also What does the Python Ellipsis object do?, and the docs of NumPy basic slicing
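A quick sketch of how the same expression then covers both shapes:

import numpy as np

def norm(x):
    # works for shape (3,) and shape (..., 3) alike
    return np.sqrt(x[..., 0]**2 + x[..., 1]**2 + x[..., 2]**2)

print(norm(np.array([1.0, 2.0, 3.0])))        # 3.7416573867739413
print(norm(np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])))      # [3.74165739 8.77496439]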
It looks like the ellipsis as described by #JanneKarila has answered your question, but I'd like to point out how you might make your code a bit more "numpythonic". It appears you want to handle an n-dimensional array with the shape (d_1, d_2, ..., d_{n-1}, 3), and compute the magnitudes of this collection of three-dimensional vectors, resulting in an (n-1)-dimensional array with shape (d_1, d_2, ..., d_{n-1}). One simple way to do that is to square all the elements, then sum along the last axis, and then take the square root. If x is the array, that calculation can be written np.sqrt(np.sum(x**2, axis=-1)). The following shows a few examples.
x is 1-D, with shape (3,):
In [31]: x = np.array([1.0, 2.0, 3.0])
In [32]: np.sqrt(np.sum(x**2, axis=-1))
Out[32]: 3.7416573867739413
x is 2-D, with shape (2, 3):
In [33]: x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
In [34]: x
Out[34]:
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
In [35]: np.sqrt(np.sum(x**2, axis=-1))
Out[35]: array([ 3.74165739, 8.77496439])
x is 3-D, with shape (2, 2, 3):
In [36]: x = np.arange(1.0, 13.0).reshape(2,2,3)
In [37]: x
Out[37]:
array([[[ 1., 2., 3.],
[ 4., 5., 6.]],
[[ 7., 8., 9.],
[ 10., 11., 12.]]])
In [38]: np.sqrt(np.sum(x**2, axis=-1))
Out[38]:
array([[ 3.74165739, 8.77496439],
[ 13.92838828, 19.10497317]])
I tend to solve this by writing
x = np.atleast_2d(x)
norm = np.sqrt(x[:,0]**2 + x[:,1]**2 + x[:,2]**2)
Matlab doesn't have 1D arrays, so b=[1 2 3] is still a 2D array and indexing with two dimensions makes sense. It can be a novel concept for you, but they're quite useful in fact (you can stop worrying whether you need to multiply by the transpose, insert a row or a column in another array...)
By the way, you could write a fancier, more general norm like this:
x = np.atleast_2d(x)
norm = np.sqrt((x**2).sum(axis=1))
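For example (a quick sketch; note the result is then always an array, even for a single vector):

x = np.atleast_2d(np.array([1.0, 2.0, 3.0]))  # shape (1, 3)
np.sqrt((x**2).sum(axis=1))                   # array([3.74165739])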
The problem is that x[:,0] in Python isn't the same as in Matlab.
If you want to extract the first element in the single row vector you should go with
x[:1]
This is called a "slice". In this example it means that you take everything in the array from the first element to the element with index 1 (not included).
Remember that Python has zero-based numbering.
Another example may be:
x[0:2]
which would return the first and the second element of the array.