Python - construction of meshgrid (irregular grid) with numpy - python

My aim is to interpolate some data. To do that i have to create a meshgrid.
To do this step, i got an array with my 2D coordinate "coord" (first column : element number, second : X and third : Y).
I do a meshgrid with np.meshgrid as you can see below.
But my results seem to be strange, so i would like to know if i have done
a mistake...Must i have to reorganize my data before meshgrid step?
import numpy as np
coord = np.array([[ 1. , -1.38888667, -1.94444333],
[ 2. , -1.94444333, -1.38888667],
[ 3. , 0.27777667, -1.94444333],
[ 4. , -0.27777667, -1.38888667],
[ 5. , 1.94444333, -1.94444333],
[ 6. , 1.38888667, -1.38888667],
[ 7. , -1.38888667, -0.27777667],
[ 8. , -1.94444333, 0.27777667],
[ 9. , 0.27777667, -0.27777667],
[ 10. , -0.27777667, 0.27777667],
[ 11. , 1.94444333, -0.27777667],
[ 12. , 1.38888667, 0.27777667],
[ 13. , -1.38888667, 1.38888667],
[ 14. , -1.94444333, 1.94444333],
[ 15. , 0.27777667, 1.38888667],
[ 16. , -0.27777667, 1.94444333],
[ 17. , 1.94444333, 1.38888667],
[ 18. , 1.38888667, 1.94444333]])
[Y,X]=np.meshgrid(coord[:,2],coord[:,1])
If i plot Y, i got that :
plt.imshow(Y);plt.colorbar();plt.show()
---- EDIT LATER -----
I m wondering (for example) if the coordinates with meshgrid have to be strictly increasing? if there is a better way when i have some coordinates not organized?
For the interpolation, i would like to use :
def interpolate(values, tri,uv,d=2):
simplex = tri.find_simplex(uv)
vertices = np.take(tri.simplices, simplex, axis=0)
temp = np.take(tri.transform, simplex, axis=0)
delta = uv- temp[:, d]
bary = np.einsum('njk,nk->nj', temp[:, :d, :], delta)
return np.einsum('nj,nj->n', np.take(values, vertices), np.hstack((bary, 1.0 - bary.sum(axis=1, keepdims=True))))
which was used in Stack before Speedup scipy griddata for multiple interpolations between two irregular grids allowing to limit the calculation time

Related

Different results on using function and it's content

I was trying to understand the working of the function fast_knn of impyute library. So, I tried to execute it line by line in order to understand the working. Here it is:
import numpy as np
from scipy.spatial import KDTree
def shepards(distances, power=2):
return to_percentage(1/np.power(distances, power))
def to_percentage(vec):
return vec/np.sum(vec)
data_temp = np.arange(25).reshape((5, 5)).astype(np.float)
data_temp[0][2] = np.nan
k=4
eps=0
p=2
distance_upper_bound=np.inf
leafsize=10
idw_fn=shepards
init_impute_fn=mean
nan_xy = np.argwhere(np.isnan(data_temp))
data_temp_c = init_impute_fn(data_temp)
kdtree = KDTree(data_temp_c, leafsize=leafsize)
for x_i, y_i in nan_xy:
distances, indices = kdtree.query(data_temp_c[x_i], k=k+1, eps=eps,
p=p, distance_upper_bound=distance_upper_bound)
# Will always return itself in the first index. Delete it.
distances, indices = distances[1:], indices[1:]
# Add small constant to distances to avoid division by 0
distances += 1e-3
weights = idw_fn(distances)
# Assign missing value the weighted average of `k` nearest neighbours
data_temp[x_i][y_i] = np.dot(weights, [data_temp_c[ind][y_i] for ind in indices])
data_temp
This outputs:
array([[ 0. , 1. , 10.06569379, 3. , 4. ],
[ 5. , 6. , 7. , 8. , 9. ],
[10. , 11. , 12. , 13. , 14. ],
[15. , 16. , 17. , 18. , 19. ],
[20. , 21. , 22. , 23. , 24. ]])
whereas the function has a different output. The code :
from impyute import fast_knn
import numpy as np
data_temp = np.arange(25).reshape((5, 5)).astype(np.float)
data_temp[0][2] = np.nan
fast_knn(data_temp, k=4)
and the output
array([[ 0. , 1. , 16.78451885, 3. , 4. ],
[ 5. , 6. , 7. , 8. , 9. ],
[10. , 11. , 12. , 13. , 14. ],
[15. , 16. , 17. , 18. , 19. ],
[20. , 21. , 22. , 23. , 24. ]])
``
There seems to be discrepancies with the GitHub repository code and library source code ( the repository has not been updated). The following is the library source code :
def fast_knn(data, k=3, eps=0, p=2, distance_upper_bound=np.inf, leafsize=10, **kwargs):
null_xy = find_null(data)
data_c = mean(data)
kdtree = KDTree(data_c, leafsize=leafsize)
for x_i, y_i in null_xy:
distances, indices = kdtree.query(data_c[x_i], k=k+1, eps=eps,
p=p, distance_upper_bound=distance_upper_bound)
# Will always return itself in the first index. Delete it.
distances, indices = distances[1:], indices[1:]
weights = distances/np.sum(distances)
# Assign missing value the weighted average of `k` nearest neighbours
data[x_i][y_i] = np.dot(weights, [data_c[ind][y_i] for ind in indices])
return data
The weights are computed in a different manner (not using the shepards function). Hence, the difference in outputs.
Maybe you used the code on the current master branch of impyute. But the impyute package version you used maybe v0.0.8 — the current recent version — whose code is at the release/0.0.8 branch.
The difference in the definition of fast_knn is below.
On the current master branch:
# Will always return itself in the first index. Delete it.
distances, indices = distances[1:], indices[1:]
# Add small constant to distances to avoid division by 0
distances += 1e-3
weights = idw_fn(distances)
On release/0.0.8 branch:
# Will always return itself in the first index. Delete it.
distances, indices = distances[1:], indices[1:]
weights = distances/np.sum(distances)
If you use the code in the release/0.0.8 branch, you will get the same result as you use the impyute package.

Vectorize an index-based matrix operation in numpy

How can I vectorize the following loop?
def my_fnc():
m = np.arange(27.).reshape((3,3,3))
ret = np.empty_like(m)
it = np.nditer(m, flags=['multi_index'])
for x in it:
i,j,k = it.multi_index
ret[i,j,k] = x / m[i,j,i]
return ret
Basically I'm dividing each value in m by something similar to a diagonal. Not all values in m will be different, the arange is just an example.
Thanks in advance! ~
P.S.: here's the output of the function above, don't mind the nans :)
array([[[ nan, inf, inf],
[ 1. , 1.33333333, 1.66666667],
[ 1. , 1.16666667, 1.33333333]],
[[ 0.9 , 1. , 1.1 ],
[ 0.92307692, 1. , 1.07692308],
[ 0.9375 , 1. , 1.0625 ]],
[[ 0.9 , 0.95 , 1. ],
[ 0.91304348, 0.95652174, 1. ],
[ 0.92307692, 0.96153846, 1. ]]])
Use advanced-indexing to get the m[i,j,i] equivalent in one go and then simply divide input array by it -
r = np.arange(len(m))
ret = m/m[r,:,r,None] # Add new axis with None to allow for broadcasting

Numpy calculate gradients accross matrices

I am using the following to calculate the running gradients between data in the same indexes across multiple matrices:
import numpy as np
array_1 = np.array([[1,2,3], [4,5,6]])
array_2 = np.array([[2,3,4], [5,6,7]])
array_3 = np.array([[1,8,9], [9,6,7]])
flat_1 = array_1.flatten()
flat_2 = array_2.flatten()
flat_3 = array_3.flatten()
print('flat_1: {0}'.format(flat_1))
print('flat_2: {0}'.format(flat_2))
print('flat_3: {0}'.format(flat_3))
data = []
gradient_list = []
for item in zip(flat_1,flat_2,flat_3):
data.append(list(item))
print('items: {0}'.format(list(item)))
grads = np.gradient(list(item))
print('grads: {0}'.format(grads))
gradient_list.append(grads)
grad_array=np.array(gradient_list)
print('grad_array: {0}'.format(grad_array))
This doesn't look like an optimal way of doing this - is there a vectorized way of calculating gradients between data in 2d arrays?
numpy.gradient takes axis as parameter, so you might just stack the arrays, and then calcualte the gradient along a certain axis; For instance, use np.dstack with axis=2; If you need a different shape as result, just use reshape method:
np.gradient(np.dstack((array_1, array_2, array_3)), axis=2)
#array([[[ 1. , 0. , -1. ],
# [ 1. , 3. , 5. ],
# [ 1. , 3. , 5. ]],
# [[ 1. , 2.5, 4. ],
# [ 1. , 0.5, 0. ],
# [ 1. , 0.5, 0. ]]])
Or if flatten the arrays first:
np.gradient(np.column_stack((array_1.ravel(), array_2.ravel(), array_3.ravel())), axis=1)
#array([[ 1. , 0. , -1. ],
# [ 1. , 3. , 5. ],
# [ 1. , 3. , 5. ],
# [ 1. , 2.5, 4. ],
# [ 1. , 0.5, 0. ],
# [ 1. , 0.5, 0. ]])

Building histogram from a dict without having to iterate over the keys

I have a dict containing numpy array of varying length:
MyDcit= {0:array([[ 15. , 3.89678216],
[ 36. , 9.49245167],
[ 53. , 3.82997799],
[ 83. , 5.25727272],
[ 86. , 8.76663208]]),
1:array([[ 4. , 4.1171155 ],
[ 16. , 12.68122196],
[ 31. , 8.64805222],
[ 37. , 6.07202959]]),
2:array([]),...,
90:array([[ 1. , 1. ],
[ 24. , 8.14221573],
[ 27. , 7.36309862]])}
I would like to obtain an histogram of all the values in the dict. The solution I have now is to iterate over the keys in the dict and fill a numpy array with an histogram of fixed length:
for KeysElements in MyDict.keys():
hist,bins = numpy.histogram(np.asarray(MyDict[KeysElements])[:,1],50)
numpy_hist[KeysElements,:] = hist
I then sum up all the histograms over the fist dimension of the numpy array to obtain the histogram of all the keys of the initial dict:
Total_hist = numpy.sum(numpy_hist,axis=0)
The problems with this solutions is that I do not knwo how to handle the bins which change for each iteration, so my question is: are there any possibilities to achieve this without having to built histograms in a loop?
Thanks for any advices or links.
Greg
You don't seem to use the MyDict index values or the 0th values in the 2nd axis of your np arrays. If this is the case then you could add all the numpy arrays together and do the histogram on that
import numpy as np
MyDict = {0:np.array([[ 15. , 3.89678216],
[ 36. , 9.49245167],
[ 53. , 3.82997799],
[ 83. , 5.25727272],
[ 86. , 8.76663208]]),
1:np.array([[ 4. , 4.1171155 ],
[ 16. , 12.68122196],
[ 31. , 8.64805222],
[ 37. , 6.07202959]]),
2:np.array([]),
90:np.array([[ 1. , 1. ],
[ 24. , 8.14221573],
[ 27. , 7.36309862]])}
np_array = np.array([]).reshape(0,2)
for i in MyDict:
a = MyDict[i]
if len(a.shape) == 2 and a.shape[1] == 2:
np_array = np.append(np_array, MyDict[i], axis=0)
print(np.histogram(np_array, 50))

Difference between scipy pairwise distance and X.X+Y.Y - X.Y^t

Let's imagine we have data as
d1 = np.random.uniform(low=0, high=2, size=(3,2))
d2 = np.random.uniform(low=3, high=5, size=(3,2))
X = np.vstack((d1,d2))
X
array([[ 1.4930674 , 1.64890721],
[ 0.40456265, 0.62262546],
[ 0.86893397, 1.3590808 ],
[ 4.04177045, 4.40938126],
[ 3.01396153, 4.60005842],
[ 3.2144552 , 4.65539323]])
I want to compare two methods for generating the pairwise distances:
assuming that X and Y are the same:
(X-Y)^2 = X.X + Y.Y - 2*X.Y^t
Here is the first method as it is used in scikit-learn for computing the pairwise distance, and later for kernel matrix.
import numpy as np
def cal_pdist1(X):
Y = X
XX = np.einsum('ij,ij->i', X, X)[np.newaxis, :]
YY = XX.T
distances = -2*np.dot(X, Y.T)
distances += XX
distances += YY
return(distances)
cal_pdist1(X)
array([[ 0. , 2.2380968 , 0.47354188, 14.11610424,
11.02241244, 12.00213414],
[ 2.2380968 , 0. , 0.75800718, 27.56880003,
22.62893544, 24.15871196],
[ 0.47354188, 0.75800718, 0. , 19.37122424,
15.1050792 , 16.36714548],
[ 14.11610424, 27.56880003, 19.37122424, 0. ,
1.09274896, 0.74497242],
[ 11.02241244, 22.62893544, 15.1050792 , 1.09274896,
0. , 0.04325965],
[ 12.00213414, 24.15871196, 16.36714548, 0.74497242,
0.04325965, 0. ]])
Now, if I use scipy pairwise distance function as below, I get
import scipy, scipy.spatial
pd_sparse = scipy.spatial.distance.pdist(X, metric='seuclidean')
scipy.spatial.distance.squareform(pd_sparse)
array([[ 0. , 0.92916653, 0.45646989, 2.29444795, 1.89740167,
2.00059442],
[ 0.92916653, 0. , 0.50798432, 3.22211357, 2.78788236,
2.90062103],
[ 0.45646989, 0.50798432, 0. , 2.72720831, 2.28001564,
2.39338343],
[ 2.29444795, 3.22211357, 2.72720831, 0. , 0.71411943,
0.58399694],
[ 1.89740167, 2.78788236, 2.28001564, 0.71411943, 0. ,
0.14102567],
[ 2.00059442, 2.90062103, 2.39338343, 0.58399694, 0.14102567,
0. ]])
The results are completely different! Shouldn't they be the same?
pdist(..., metric='seuclidean') computes the standardized Euclidean distance, not the squared Euclidean distance (which is what cal_pdist returns).
From the docs:
Y = pdist(X, 'seuclidean', V=None)
Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors u and v is
__________________
√∑(ui−vi)^2 / V[xi]
V is the variance vector; V[i] is the variance computed over all the i’th components of the points. If not passed, it is automatically computed.
Try passing metric='sqeuclidean', and you will see that both functions return the same result to within rounding error.

Categories