Implementing a PCA (Eigenvector based) in Python

I am trying to implement PCA in Python. My goal is to create a version that behaves similarly to Matlab's PCA implementation. However, I think I am missing a crucial point, as my tests partly produce results with the wrong sign (+/-).
Can you find a mistake in the algorithm? Why are the signs sometimes different?
An implementation of PCA based on eigenvectors:
import numpy as np

def pca(A, new_array_rank=4):
    # Center the data
    A_mean = np.mean(A, axis=0)
    A = A - A_mean
    # Covariance matrix of the features
    covariance_matrix = np.cov(A.T)
    eigen_values, eigen_vectors = np.linalg.eig(covariance_matrix)
    # Sort eigenpairs by descending eigenvalue
    new_index = np.argsort(eigen_values)[::-1]
    eigen_vectors = eigen_vectors[:, new_index]
    eigen_values = eigen_values[new_index]
    # Keep only the leading new_array_rank components
    eigen_vectors = eigen_vectors[:, :new_array_rank]
    # Project the centered data onto the principal axes
    return np.dot(eigen_vectors.T, A.T).T
My test values:
array([[ 0.13298325, 0.2896928 , 0.53589224, 0.58164269, 0.66202221,
0.95414116, 0.03040784, 0.26290471, 0.40823539, 0.37783385],
[ 0.90521267, 0.86275498, 0.52696221, 0.15243867, 0.20894357,
0.19900414, 0.50607341, 0.53995902, 0.32014539, 0.98744942],
[ 0.87689087, 0.04307512, 0.45065793, 0.29415066, 0.04908066,
0.98635538, 0.52091338, 0.76291385, 0.97213094, 0.48815925],
[ 0.75136801, 0.85946751, 0.10508436, 0.04656418, 0.08164919,
0.88129981, 0.39666754, 0.86325704, 0.56718669, 0.76346602],
[ 0.93319721, 0.5897521 , 0.75065047, 0.63916306, 0.78810679,
0.92909485, 0.23751963, 0.87552313, 0.37663086, 0.69010429],
[ 0.53189229, 0.68984247, 0.46164066, 0.29953259, 0.10826334,
0.47944168, 0.93935082, 0.40331874, 0.18541041, 0.35594587],
[ 0.36399075, 0.00698617, 0.61030608, 0.51136309, 0.54185601,
0.81383604, 0.50003674, 0.75414875, 0.54689801, 0.9957493 ],
[ 0.27815017, 0.65417397, 0.57207255, 0.54388744, 0.89128334,
0.3512483 , 0.94441934, 0.05305929, 0.77389942, 0.93125228],
[ 0.80409485, 0.2749575 , 0.22270875, 0.91869706, 0.54683128,
0.61501493, 0.7830902 , 0.72055598, 0.09363186, 0.05103846],
[ 0.12357816, 0.29758902, 0.87807485, 0.94348706, 0.60896429,
0.33899019, 0.36310027, 0.02380186, 0.67207071, 0.28638936]])
My result of the PCA with eigenvectors:
array([[ 5.09548931e-01, -3.97079651e-01, -1.47555867e-01,
-3.55343967e-02, -4.92125732e-01, -1.78191399e-01,
-3.29543974e-02, 3.71406504e-03, 1.06404170e-01,
-1.66533454e-16],
[ -5.15879041e-01, 6.40833419e-01, -7.54601587e-02,
-2.00776798e-01, -7.07247669e-02, 2.68582368e-01,
-1.66124362e-01, 1.03414828e-01, 7.76738500e-02,
5.55111512e-17],
[ -4.42659342e-01, -5.13297786e-01, -1.65477203e-01,
5.33670847e-01, 2.00194213e-01, 2.06176265e-01,
1.31558875e-01, -2.81699724e-02, 6.19571305e-02,
-8.32667268e-17],
[ -8.50397468e-01, 5.14319846e-02, -1.46289906e-01,
6.51133920e-02, -2.83887201e-01, -1.90516618e-01,
1.45748370e-01, 9.49464768e-02, -1.05989648e-01,
4.16333634e-17],
[ -1.61040296e-01, -3.47929944e-01, -1.19871598e-01,
-6.48965493e-01, 7.53188055e-02, 1.31730340e-01,
1.33229858e-01, -1.43587499e-01, -2.20913989e-02,
-3.40005801e-16],
[ -1.70017435e-01, 4.22573148e-01, 4.81511942e-01,
2.42170125e-01, -1.18575764e-01, -6.87250591e-02,
-1.20660307e-01, -2.22865482e-01, -1.73666882e-02,
-1.52655666e-16],
[ 6.90841779e-02, -2.86233901e-01, -4.16612350e-01,
9.38935057e-03, 3.02325120e-01, -1.61783482e-01,
-3.55465509e-01, 1.15323059e-02, -5.04619674e-02,
4.71844785e-16],
[ 5.26189089e-01, 6.81324113e-01, -2.89960115e-01,
2.01781673e-02, 3.03159463e-01, -2.11777986e-01,
2.25937548e-01, -5.49219872e-05, 3.66268329e-02,
-1.11022302e-16],
[ 6.68680313e-02, -2.99715813e-01, 8.53428694e-01,
-1.30066853e-01, 2.31410283e-01, -1.02860624e-01,
1.95449586e-02, 1.30218425e-01, 1.68059569e-02,
2.22044605e-16],
[ 9.68303353e-01, 4.80944309e-02, 2.62865615e-02,
1.44821658e-01, -1.47094421e-01, 3.07366196e-01,
1.91849667e-02, 5.08517759e-02, -1.03558238e-01,
1.38777878e-16]])
Test result of the same data using Matlab's PCA function:
array([[ -5.09548931e-01, 3.97079651e-01, 1.47555867e-01,
3.55343967e-02, -4.92125732e-01, -1.78191399e-01,
-3.29543974e-02, -3.71406504e-03, -1.06404170e-01,
-0.00000000e+00],
[ 5.15879041e-01, -6.40833419e-01, 7.54601587e-02,
2.00776798e-01, -7.07247669e-02, 2.68582368e-01,
-1.66124362e-01, -1.03414828e-01, -7.76738500e-02,
-0.00000000e+00],
[ 4.42659342e-01, 5.13297786e-01, 1.65477203e-01,
-5.33670847e-01, 2.00194213e-01, 2.06176265e-01,
1.31558875e-01, 2.81699724e-02, -6.19571305e-02,
-0.00000000e+00],
[ 8.50397468e-01, -5.14319846e-02, 1.46289906e-01,
-6.51133920e-02, -2.83887201e-01, -1.90516618e-01,
1.45748370e-01, -9.49464768e-02, 1.05989648e-01,
-0.00000000e+00],
[ 1.61040296e-01, 3.47929944e-01, 1.19871598e-01,
6.48965493e-01, 7.53188055e-02, 1.31730340e-01,
1.33229858e-01, 1.43587499e-01, 2.20913989e-02,
-0.00000000e+00],
[ 1.70017435e-01, -4.22573148e-01, -4.81511942e-01,
-2.42170125e-01, -1.18575764e-01, -6.87250591e-02,
-1.20660307e-01, 2.22865482e-01, 1.73666882e-02,
-0.00000000e+00],
[ -6.90841779e-02, 2.86233901e-01, 4.16612350e-01,
-9.38935057e-03, 3.02325120e-01, -1.61783482e-01,
-3.55465509e-01, -1.15323059e-02, 5.04619674e-02,
-0.00000000e+00],
[ -5.26189089e-01, -6.81324113e-01, 2.89960115e-01,
-2.01781673e-02, 3.03159463e-01, -2.11777986e-01,
2.25937548e-01, 5.49219872e-05, -3.66268329e-02,
-0.00000000e+00],
[ -6.68680313e-02, 2.99715813e-01, -8.53428694e-01,
1.30066853e-01, 2.31410283e-01, -1.02860624e-01,
1.95449586e-02, -1.30218425e-01, -1.68059569e-02,
-0.00000000e+00],
[ -9.68303353e-01, -4.80944309e-02, -2.62865615e-02,
-1.44821658e-01, -1.47094421e-01, 3.07366196e-01,
1.91849667e-02, -5.08517759e-02, 1.03558238e-01,
-0.00000000e+00]])

The sign and other normalization choices for eigenvectors are arbitrary. Matlab and numpy normalize the eigenvectors in the same way (to unit length), but the sign is arbitrary and can depend on details of the underlying linear algebra library.
When I wrote the numpy equivalent of Matlab's princomp, I simply normalized the sign of the eigenvectors before comparing them to Matlab's in my unit tests.
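One common convention for that normalization, sketched below, is to flip each eigenvector so that its largest-magnitude component is positive (the helper name `normalize_signs` is just for illustration, not part of numpy or Matlab):

```python
import numpy as np

def normalize_signs(eigen_vectors):
    """Flip each column so that its largest-magnitude entry is positive."""
    # Row index of the largest-magnitude entry in each column
    max_abs_rows = np.argmax(np.abs(eigen_vectors), axis=0)
    # Sign of that entry, per column
    signs = np.sign(eigen_vectors[max_abs_rows, np.arange(eigen_vectors.shape[1])])
    return eigen_vectors * signs

V = np.array([[ 0.6, -0.8],
              [-0.8, -0.6]])
print(normalize_signs(V))  # both columns flipped: [[-0.6, 0.8], [0.8, 0.6]]
```

Applying the same normalization to both your result and Matlab's makes the sign differences disappear in comparisons.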

Numpy dot product between a 3d matrix and 2d matrix

I have a 3d array that has shape (2, 10, 3) and a 2d array that has shape (2, 3) like this:
print(t) #2d array
Output:
[[1.003 2.32 3.11 ]
[1.214 5.32 2.13241]]
print(normal) #3d array
Output:
[[[0.69908573 0.0826756 0.84485978]
[0.51058213 0.4052637 0.5068118 ]
[0.45974276 0.25819549 0.10780089]
[0.27484999 0.33367648 0.128262 ]
[0.35963389 0.77600065 0.89393939]
[0.46937506 0.59291623 0.06620307]
[0.87603987 0.44414505 0.83394174]
[0.83186093 0.62491876 0.38160734]
[0.96819897 0.80183442 0.75102768]
[0.54182908 0.19403844 0.07925769]]
[[2.82248573 3.2341756 0.96825978]
[2.63398213 3.5567637 0.6302118 ]
[2.58314276 3.40969549 0.23120089]
[2.39824999 3.48517648 0.251662 ]
[2.48303389 3.92750065 1.01733939]
[2.59277506 3.74441623 0.18960307]
[2.99943987 3.59564505 0.95734174]
[2.95526093 3.77641876 0.50500734]
[3.09159897 3.95333442 0.87442768]
[2.66522908 3.34553844 0.20265769]]]
How can I compute, for each of the two matrices in the 3d array normal, the dot products of its 10 rows with the corresponding row of the 2d array t, so that I end up with an array of shape (2, 10)?
[0.62096458 0.62618459 0.37528887 0.5728386 1.19634398 0.79620507
1.997884 0.75229492 1.2236496 0.4210626 ]
[2.96347746 3.30738892 3.50596579 4.93082295 5.33811805 4.44872493
7.33480393 4.19173472 4.7406248 7.83229689]
You can use numpy.einsum (np.einsum('ijk,ik->ij', normal, t), keeping the question's naming where normal is the 3d array and t the 2d array) to get this result:
import numpy as np
t = np.array([
    [1.003, 2.32, 3.11],
    [1.214, 5.32, 2.13241]
])
normal = np.array([
    [
        [0.69908573, 0.0826756, 0.84485978],
        [0.51058213, 0.4052637, 0.5068118],
        [0.45974276, 0.25819549, 0.10780089],
        [0.27484999, 0.33367648, 0.128262],
        [0.35963389, 0.77600065, 0.89393939],
        [0.46937506, 0.59291623, 0.06620307],
        [0.87603987, 0.44414505, 0.83394174],
        [0.83186093, 0.62491876, 0.38160734],
        [0.96819897, 0.80183442, 0.75102768],
        [0.54182908, 0.19403844, 0.07925769]
    ],
    [
        [2.82248573, 3.2341756, 0.96825978],
        [2.63398213, 3.5567637, 0.6302118],
        [2.58314276, 3.40969549, 0.23120089],
        [2.39824999, 3.48517648, 0.251662],
        [2.48303389, 3.92750065, 1.01733939],
        [2.59277506, 3.74441623, 0.18960307],
        [2.99943987, 3.59564505, 0.95734174],
        [2.95526093, 3.77641876, 0.50500734],
        [3.09159897, 3.95333442, 0.87442768],
        [2.66522908, 3.34553844, 0.20265769]
    ]
])
np.einsum('ijk,ik->ij', normal, t)
This results in
array([[ 3.52050429,  3.02851036,  1.39539629,  1.44869879,  4.9411858 ,
         2.05224039,  4.50264332,  3.47096686,  5.16705551,  1.24011516],
       [22.69703871, 23.46350713, 21.76853041, 21.98926093, 26.07809129,
        23.47223475, 24.81159677, 24.75511727, 26.64957859, 21.46600189]])
Which is the same as computing the two matrix-vector products separately:
normal[0] @ t[0]
normal[1] @ t[1]
Gives the two:
array([ 3.52050429,  3.02851036,  1.39539629,  1.44869879,  4.9411858 ,
        2.05224039,  4.50264332,  3.47096686,  5.16705551,  1.24011516])
array([22.69703871, 23.46350713, 21.76853041, 21.98926093, 26.07809129,
       23.47223475, 24.81159677, 24.75511727, 26.64957859, 21.46600189])
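If you prefer matmul over einsum, the same result comes from broadcasting batch matrix-vector products; a sketch with stand-in random data of the same shapes (normal is the 3d (2, 10, 3) array, t the 2d (2, 3) array, as in the question):

```python
import numpy as np

rng = np.random.default_rng(42)
normal = rng.random((2, 10, 3))  # stand-in 3d array
t = rng.random((2, 3))           # stand-in 2d array

# Add a trailing axis so each (10, 3) matrix multiplies a (3, 1) column,
# then drop that axis again: the result has shape (2, 10).
res = (normal @ t[:, :, None])[..., 0]

# Identical to the einsum formulation
print(np.allclose(res, np.einsum('ijk,ik->ij', normal, t)))
```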

finding eigen vectors and eigen values using np.linalg.svd()?

I am trying to find eigenvectors and eigenvalues of my covariance matrix for PCA.
My code:
values, vectors = np.linalg.eigh(covariance_matrix)
This is the output:
Eigen Vectors:
[[ 0.26199559 0.72101681 -0.37231836 0.52237162]
[-0.12413481 -0.24203288 -0.92555649 -0.26335492]
[-0.80115427 -0.14089226 -0.02109478 0.58125401]
[ 0.52354627 -0.6338014 -0.06541577 0.56561105]]
Eigen Values:
[0.02074601 0.14834223 0.92740362 2.93035378]
Then I found that np.linalg.svd() also returns the same.
U, S, V = np.linalg.svd(standardized_x.T)
print(U)
print(S)
print(V)
[[-0.52237162 -0.37231836 0.72101681 0.26199559]
[ 0.26335492 -0.92555649 -0.24203288 -0.12413481]
[-0.58125401 -0.02109478 -0.14089226 -0.80115427]
[-0.56561105 -0.06541577 -0.6338014 0.52354627]]
[20.89551896 11.75513248 4.7013819 1.75816839]
[[ 1.08374515e-01 9.98503796e-02 1.13323362e-01 ... -7.27833114e-02
-6.58701606e-02 -4.59092965e-02]
[-4.30198387e-02 5.57547718e-02 2.70926177e-02 ... -2.26960075e-02
-8.64611208e-02 1.89567788e-03]
[ 2.59377669e-02 4.83370288e-02 -1.09498919e-02 ... -3.81328738e-02
-1.98113038e-01 -1.12476331e-01]
...
[ 5.42576376e-02 5.32189412e-03 2.76010922e-02 ... 9.89545817e-01
-1.40226565e-02 -7.86338250e-04]
[ 1.60581494e-03 8.56651825e-02 1.78415121e-01 ... -1.24233079e-02
9.52228601e-01 -2.19591161e-02]
[ 2.27770498e-03 6.44405862e-03 1.49430370e-01 ... -6.58105858e-04
-2.32385318e-02 9.77215825e-01]]
The resulting U (eigenvectors) is the same for both np.linalg.eigh() and svd(), but the S values (variances/eigenvalues) are not the same.
Am I missing something?
Can anyone explain what U, S and V stand for in the np.linalg.svd() function?
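For what it's worth, the two decompositions are directly related: for standardized data X with n rows, the singular values S of X.T satisfy S**2 / (n - 1) = eigenvalues of the covariance matrix, and the columns of U are the covariance eigenvectors (up to sign and ordering), while V relates to the per-sample scores. A sketch checking this on synthetic data (shapes chosen to mimic the question):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize

covariance_matrix = np.cov(X.T)            # divides by n - 1
values, vectors = np.linalg.eigh(covariance_matrix)

U, S, V = np.linalg.svd(X.T, full_matrices=False)

# Squared singular values scaled by n - 1 reproduce the eigenvalues
print(np.allclose(np.sort(S**2 / (X.shape[0] - 1)), np.sort(values)))
```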

Slice a 3D tensor, based on the given sequence length array in tensorflow

I want a tensorflow function which accepts a 3D tensor and an array (the array's length equals the first dimension of the 3D tensor) and slices the elements from each 2D matrix inside the 3D tensor based on the given array. The equivalent numpy looks as follows. The basic idea is to pick all hidden states of each input in a batch (avoiding the padded ones) in a dynamic rnn.
import numpy as np
import tensorflow as tf

a = np.random.uniform(-1, 1, (3, 5, 7))
a_length = np.random.randint(5, size=(3))
a_tf = tf.convert_to_tensor(a)
a_length_tf = tf.convert_to_tensor(a_length)
res = []
for index, length_ in enumerate(a_length):
    res.extend(a[index, :length_, :])
res = np.array(res)
Output
print(a_length)
array([1, 4, 4])
print(res)
array([[-0.060161 , 0.36000953, 0.46160677, -0.66576281, 0.28562044,
-0.60026872, 0.08034777],
[ 0.04776443, 0.38018207, -0.73352382, 0.61847258, -0.89731857,
0.57264147, -0.88192537],
[ 0.92657628, 0.6236141 , 0.41977008, 0.88720247, 0.44639323,
0.26165976, 0.2678753 ],
[-0.78125831, 0.76756136, -0.05716537, -0.64696257, 0.48918477,
0.15376225, -0.41974593],
[-0.625326 , 0.3509537 , -0.7884495 , 0.11773297, 0.23713942,
0.30296786, 0.12932378],
[ 0.88413986, -0.10958306, 0.9745586 , 0.8975006 , 0.23023047,
-0.89991669, -0.60032688],
[ 0.33462775, 0.62883724, -0.81839566, -0.70312966, -0.00246936,
-0.95542994, -0.33035891],
[-0.26355579, -0.58104982, -0.54748412, -0.30236209, -0.74270132,
0.46329941, 0.34277915],
[ 0.92837516, -0.06748299, 0.32837354, -0.62863672, 0.86226447,
0.63604586, 0.0905248 ]])
print(a)
array([[[-0.060161 , 0.36000953, 0.46160677, -0.66576281,
0.28562044, -0.60026872, 0.08034777],
[ 0.26379226, 0.67066755, -0.90139221, -0.86862163,
0.36405595, 0.71342926, -0.1265208 ],
[ 0.15007877, 0.82065234, 0.03984378, -0.20038364,
-0.09945102, 0.71605241, -0.55865999],
[ 0.27132257, -0.84289149, -0.15493576, 0.74683429,
-0.71159896, 0.50397217, -0.99025404],
[ 0.51546368, 0.45460343, 0.87519031, 0.0332339 ,
-0.53474897, -0.01733648, -0.02886814]],
[[ 0.04776443, 0.38018207, -0.73352382, 0.61847258,
-0.89731857, 0.57264147, -0.88192537],
[ 0.92657628, 0.6236141 , 0.41977008, 0.88720247,
0.44639323, 0.26165976, 0.2678753 ],
[-0.78125831, 0.76756136, -0.05716537, -0.64696257,
0.48918477, 0.15376225, -0.41974593],
[-0.625326 , 0.3509537 , -0.7884495 , 0.11773297,
0.23713942, 0.30296786, 0.12932378],
[ 0.44550219, -0.38828221, 0.35684203, 0.789946 ,
-0.8763921 , 0.90155917, -0.75549455]],
[[ 0.88413986, -0.10958306, 0.9745586 , 0.8975006 ,
0.23023047, -0.89991669, -0.60032688],
[ 0.33462775, 0.62883724, -0.81839566, -0.70312966,
-0.00246936, -0.95542994, -0.33035891],
[-0.26355579, -0.58104982, -0.54748412, -0.30236209,
-0.74270132, 0.46329941, 0.34277915],
[ 0.92837516, -0.06748299, 0.32837354, -0.62863672,
0.86226447, 0.63604586, 0.0905248 ],
[ 0.70272633, 0.17122912, -0.58209965, 0.55557024,
-0.46295566, -0.33845157, -0.62254313]]])
Here is a way to do that using tf.boolean_mask:
import tensorflow as tf
import numpy as np

# NumPy/Python implementation
a = np.random.uniform(-1, 1, (3, 5, 7)).astype(np.float32)
a_length = np.random.randint(5, size=(3)).astype(np.int32)
res = []
for index, length_ in enumerate(a_length):
    res.extend(a[index, :length_, :])
res = np.array(res)

# TensorFlow implementation
a_tf = tf.convert_to_tensor(a)
a_length_tf = tf.convert_to_tensor(a_length)
# Make a mask for all wanted elements
mask = tf.range(tf.shape(a_tf)[1]) < a_length_tf[:, tf.newaxis]
# Apply mask
res_tf = tf.boolean_mask(a_tf, mask)

# Test
with tf.Session() as sess:
    print(np.allclose(sess.run(res_tf), res))
Output:
True
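The same length-mask trick also works directly in NumPy via broadcasting, which is convenient for checking results outside a session (a sketch with the shapes used above):

```python
import numpy as np

a = np.random.uniform(-1, 1, (3, 5, 7))
a_length = np.array([1, 4, 4])

# Broadcast (5,) < (3, 1) into a boolean mask of shape (3, 5)
mask = np.arange(a.shape[1]) < a_length[:, None]
res = a[mask]  # keeps the first a_length[i] rows of each 2d matrix

# Matches the explicit loop from the question
expected = np.concatenate([a[i, :n] for i, n in enumerate(a_length)])
print(np.allclose(res, expected))
```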

mod.predict gives more columns than expected

I am using MXNet on the IRIS dataset, which has 4 features, and it classifies the flowers as 'setosa', 'versicolor' and 'virginica'. My training data has 89 rows. My label data is a row vector of 89 columns. I encoded the flower names as the numbers 0, 1, 2, since mx.io.NDArrayIter does not seem to accept numpy ndarrays with string values. Then I tried to predict using
re = mod.predict(test_iter)
I get a result which has the shape 14 * 10.
Why am I getting 10 columns when I have only 3 labels and how do I map these results to my labels. The result of predict is shown below:
[[ 0.11760861  0.12082944  0.1207106   0.09154381  0.09155304  0.09155869
   0.09154817  0.09155204  0.09154914  0.09154641]
 [ 0.1176083   0.12082954  0.12071151  0.09154379  0.09155323  0.09155825
   0.0915481   0.09155164  0.09154923  0.09154641]
 [ 0.11760829  0.1208293   0.12071083  0.09154385  0.09155313  0.09155875
   0.09154838  0.09155186  0.09154932  0.09154625]
 [ 0.11760861  0.12082901  0.12071037  0.09154388  0.09155303  0.09155875
   0.09154829  0.09155209  0.09154959  0.09154641]
 [ 0.11760896  0.12082863  0.12070955  0.09154405  0.09155299  0.09155875
   0.09154839  0.09155225  0.09154996  0.09154646]
 [ 0.1176089   0.1208287   0.1207095   0.09154407  0.09155297  0.09155882
   0.09154844  0.09155232  0.09154989  0.0915464 ]
 [ 0.11760896  0.12082864  0.12070941  0.09154408  0.09155297  0.09155882
   0.09154844  0.09155234  0.09154993  0.09154642]
 [ 0.1176088   0.12082874  0.12070983  0.09154399  0.09155302  0.09155872
   0.09154837  0.09155215  0.09154984  0.09154641]
 [ 0.11760852  0.12082904  0.12071032  0.09154394  0.09155304  0.09155876
   0.09154835  0.09155209  0.09154959  0.09154631]
 [ 0.11760963  0.12082832  0.12070873  0.09154428  0.09155257  0.09155893
   0.09154856  0.09155177  0.09155051  0.09154671]
 [ 0.11760966  0.12082829  0.12070868  0.09154429  0.09155258  0.09155892
   0.09154858  0.0915518   0.09155052  0.09154672]
 [ 0.11760949  0.1208282   0.12070852  0.09154446  0.09155259  0.09155893
   0.09154854  0.09155205  0.0915506   0.09154666]
 [ 0.11760952  0.12082817  0.12070853  0.0915444   0.09155261  0.09155891
   0.09154853  0.09155206  0.09155057  0.09154668]
 [ 0.1176096   0.1208283   0.12070892  0.09154423  0.09155267  0.09155882
   0.09154859  0.09155172  0.09155044  0.09154676]]
If you use y = mod.predict(val_iter, num_batch=1) instead of y = mod.predict(val_iter), you will get only one batch of predictions. For example, if your batch_size is 10, you will get only 10 rows.
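However many columns predict returns, mapping per-row class scores back to labels is usually done with argmax over the columns; a minimal sketch with made-up scores for the three flower classes:

```python
import numpy as np

class_names = ['setosa', 'versicolor', 'virginica']

# Hypothetical per-row class scores (one column per label)
scores = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.5, 0.3]])

indices = scores.argmax(axis=1)                # index of the best class per row
predicted = [class_names[i] for i in indices]
print(predicted)  # ['setosa', 'virginica', 'versicolor']
```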

Iterated Interpolation: First interpolate grids, then interpolate value

I want to interpolate from x onto z, but there's a caveat: depending on a state y, I have a different xGrid, which I also need to interpolate.
I have a grid for y, yGrid. Say yGrid=[0,1]. And xGrid is given by
1 10
2 20
3 30
The corresponding zGrid, is
100 1000
200 2000
300 3000
This means that for y=0, [1,2,3] is the proper grid for x, and for y=1, [10,20,30] is the proper grid. And similar for z.
Everything is linear and even-spaced for demonstration of the problem, but it is not in the actual data.
In words,
if y=0, x=1.5, z is the interpolation of [1,2,3] onto [100, 200, 300] at 1.5 - which is 150.
If y=1, x=10, z=1000
Here's the problem: What if y=0.5? In this simple case, I want the interpolated grids to be [5.5, 11, 16.5] and [550, 1100, 1650], so x=10 would map to something close to 1000.
It appears to me that I need to interpolate 3 times:
twice to get the correct xGrid and zGrid, and
once to interpolate xGrid -> zGrid
This is part of a bottleneck and efficiency is vital. How do I code this most efficiently?
Here is how I can code it quite inefficiently:
import numpy as np
from scipy import interpolate

xGrid = np.array([[1, 10], [2, 20], [3, 30]])
zGrid = np.array([[100, 1000], [200, 2000], [300, 3000]])
yGrid = np.array([0, 1])
yValue = 0.5
xInterpolated = np.zeros(xGrid.shape[0])
zInterpolated = np.zeros(zGrid.shape[0])
for i in np.arange(xGrid.shape[0]):
    f1 = interpolate.interp1d(yGrid, xGrid[i, :])
    f2 = interpolate.interp1d(yGrid, zGrid[i, :])
    xInterpolated[i] = f1(yValue)
    zInterpolated[i] = f2(yValue)
f3 = interpolate.interp1d(xInterpolated, zInterpolated)
And the output is
In[73]: xInterpolated, zInterpolated
Out[73]: (array([ 5.5, 11. , 16.5]), array([ 550., 1100., 1650.]))
In[75]: f3(10)
Out[75]: array(1000.0)
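Since yGrid has only two entries here, the per-row interp1d loop can be collapsed into one vectorized weighted average; a sketch that assumes exactly two y points:

```python
import numpy as np

xGrid = np.array([[1, 10], [2, 20], [3, 30]])
zGrid = np.array([[100, 1000], [200, 2000], [300, 3000]])
yGrid = np.array([0, 1])
yValue = 0.5

# Linear weight of the second column (assumes a two-entry yGrid)
w = (yValue - yGrid[0]) / (yGrid[1] - yGrid[0])
xInterpolated = (1 - w) * xGrid[:, 0] + w * xGrid[:, 1]
zInterpolated = (1 - w) * zGrid[:, 0] + w * zGrid[:, 1]

# Final x -> z interpolation without constructing interp1d objects
print(np.interp(10, xInterpolated, zInterpolated))  # 1000.0
```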
Actual use-case data
xGrid:
array([[ 0.30213582, 0.42091889, 0.48596506, 0.55045007,
0.61479495, 0.67906768, 0.74328653, 0.8074641 ,
0.8716093 , 0.93572867, 0.99982708, 1.06390825,
1.12797508, 1.19202984, 1.25607435, 1.32011008,
1.38413823, 1.44815978, 1.51217558, 1.57618631],
[ 1.09945362, 1.17100971, 1.23588956, 1.30034354,
1.36467675, 1.42894086, 1.49315319, 1.55732567,
1.62146685, 1.68558297, 1.74967873, 1.8137577 ,
1.87782269, 1.94187589, 2.00591907, 2.06995365,
2.1339808 , 2.1980015 , 2.26201653, 2.32602659],
[ 1.96474476, 2.03281806, 2.09757883, 2.16200519,
2.22632562, 2.29058026, 2.35478537, 2.41895223,
2.48308893, 2.54720144, 2.61129424, 2.67537076,
2.73943368, 2.80348513, 2.86752681, 2.93156011,
2.99558615, 3.05960586, 3.12362004, 3.18762935],
[ 2.97271432, 3.03917779, 3.10382629, 3.16822546,
3.23253177, 3.29677589, 3.36097295, 3.42513351,
3.48926519, 3.55337363, 3.61746308, 3.68153682,
3.74559741, 3.80964688, 3.87368686, 3.93771869,
4.00174345, 4.06576206, 4.12977526, 4.1937837 ],
[ 4.17324037, 4.23880534, 4.30336811, 4.36773934,
4.43202986, 4.49626215, 4.56045011, 4.62460351,
4.68872947, 4.75283326, 4.81691888, 4.88098942,
4.94504732, 5.0090945 , 5.07313252, 5.13716266,
5.20118595, 5.26520326, 5.32921533, 5.39322276],
[ 5.64337535, 5.70841895, 5.77290336, 5.83724805,
5.90152063, 5.96573939, 6.02991687, 6.094062 ,
6.15818132, 6.22227969, 6.28636083, 6.35042763,
6.41448236, 6.47852685, 6.54256256, 6.60659069,
6.67061223, 6.73462802, 6.79863874, 6.86264497],
[ 7.51378714, 7.57851747, 7.6429358 , 7.70725236,
7.77150412, 7.83570702, 7.89987216, 7.9640075 ,
8.0281189 , 8.09221078, 8.15628654, 8.22034883,
8.28439974, 8.34844097, 8.41247386, 8.47649955,
8.54051897, 8.60453289, 8.66854195, 8.73254673],
[ 10.03324294, 10.09777483, 10.162134 , 10.22641722,
10.29064401, 10.35482771, 10.41897777, 10.48310105,
10.54720264, 10.61128646, 10.67535549, 10.73941211,
10.80345821, 10.8674953 , 10.93152463, 10.99554722,
11.05956392, 11.12357544, 11.1875824 , 11.25158529],
[ 13.77079831, 13.83519161, 13.89949459, 13.96373623,
14.02793138, 14.09209044, 14.15622093, 14.2203284 ,
14.28441705, 14.34849012, 14.41255015, 14.47659914,
14.54063872, 14.6046702 , 14.66869465, 14.73271299,
14.79672596, 14.86073419, 14.92473821, 14.9887385 ],
[ 20.60440125, 20.66868421, 20.7329108 , 20.79709436,
20.8612443 , 20.92536747, 20.98946899, 21.05355274,
21.11762172, 21.1816783 , 21.24572435, 21.30976141,
21.37379071, 21.43781328, 21.50182995, 21.56584146,
21.6298484 , 21.69385127, 21.75785053, 21.82184654]])
zGrid:
array([[ 0.30213582, 0.42091889, 0.48596506, 0.55045007, 0.61479495,
0.67906768, 0.74328653, 0.8074641 , 0.8716093 , 0.93572867,
0.99982708, 1.06390825, 1.12797508, 1.19202984, 1.25607435,
1.32011008, 1.38413823, 1.44815978, 1.51217558, 1.57618631],
[ 0.35871288, 0.43026897, 0.49514882, 0.5596028 , 0.62393601,
0.68820012, 0.75241245, 0.81658493, 0.88072611, 0.94484223,
1.00893799, 1.07301696, 1.13708195, 1.20113515, 1.26517833,
1.32921291, 1.39324006, 1.45726076, 1.52127579, 1.58528585],
[ 0.37285697, 0.44093027, 0.50569104, 0.5701174 , 0.63443782,
0.69869247, 0.76289758, 0.82706444, 0.89120114, 0.95531365,
1.01940644, 1.08348296, 1.14754589, 1.21159734, 1.27563902,
1.33967232, 1.40369835, 1.46771807, 1.53173225, 1.59574155],
[ 0.38688189, 0.45334537, 0.51799386, 0.58239303, 0.64669934,
0.71094347, 0.77514053, 0.83930108, 0.90343277, 0.96754121,
1.03163066, 1.0957044 , 1.15976498, 1.22381445, 1.28785443,
1.35188626, 1.41591103, 1.47992963, 1.54394284, 1.60795127],
[ 0.40252392, 0.46808889, 0.53265166, 0.59702289, 0.66131341,
0.7255457 , 0.78973366, 0.85388706, 0.91801302, 0.98211681,
1.04620243, 1.11027297, 1.17433087, 1.23837805, 1.30241607,
1.36644621, 1.4304695 , 1.49448681, 1.55849888, 1.62250631],
[ 0.42106765, 0.48611125, 0.55059566, 0.61494035, 0.67921293,
0.74343169, 0.80760917, 0.87175431, 0.93587362, 0.99997199,
1.06405313, 1.12811993, 1.19217466, 1.25621915, 1.32025486,
1.38428299, 1.44830454, 1.51232032, 1.57633104, 1.64033728],
[ 0.4442679 , 0.50899823, 0.57341657, 0.63773312, 0.70198488,
0.76618779, 0.83035293, 0.89448826, 0.95859966, 1.02269154,
1.08676731, 1.15082959, 1.21488051, 1.27892173, 1.34295463,
1.40698032, 1.47099973, 1.53501365, 1.59902272, 1.66302749],
[ 0.47525152, 0.53978341, 0.60414258, 0.6684258 , 0.73265259,
0.79683629, 0.86098635, 0.92510963, 0.98921122, 1.05329504,
1.11736407, 1.18142069, 1.24546679, 1.30950388, 1.37353321,
1.4375558 , 1.5015725 , 1.56558403, 1.62959098, 1.69359387],
[ 0.52099935, 0.58539265, 0.64969564, 0.71393728, 0.77813242,
0.84229149, 0.90642197, 0.97052944, 1.03461809, 1.09869116,
1.16275119, 1.22680018, 1.29083976, 1.35487124, 1.4188957 ,
1.48291403, 1.546927 , 1.61093523, 1.67493926, 1.73893954],
[ 0.60440125, 0.66868421, 0.7329108 , 0.79709436, 0.8612443 ,
0.92536747, 0.98946899, 1.05355274, 1.11762172, 1.1816783 ,
1.24572435, 1.30976141, 1.37379071, 1.43781328, 1.50182995,
1.56584146, 1.6298484 , 1.69385127, 1.75785053, 1.82184654]])
yGrid:
array([ 1. , 6.21052632, 11.42105263, 16.63157895,
21.84210526, 27.05263158, 32.26315789, 37.47368421,
42.68421053, 47.89473684, 53.10526316, 58.31578947,
63.52631579, 68.73684211, 73.94736842, 79.15789474,
84.36842105, 89.57894737, 94.78947368, 100. ])
I've created the interpolator following the given answer, and then interpolated some points:
yGrid = yGrid + np.zeros(xGrid.shape)
f3 = interpolate.interp2d(xGrid,yGrid,zGrid,kind='linear')
import matplotlib.pyplot as plt
plt.plot(np.linspace(0.001, 5, 100), [f3(y, 2) for y in np.linspace(0.001, 5, 100)])
plt.plot(xGrid[:, 1], zGrid[:, 1])
plt.plot(xGrid[:, 0], zGrid[:, 0])
And here's the output:
The blue line is the interpolated one. I am worried that for very small values of x, it should be tilted downwards a bit (following the weighted average of the two functions), but it is not at all.
You're actually looking at 2d interpolation: you need z(x,y) with interpolated values of x and y. The only subtlety is that you need to broadcast yGrid to have the same shape as the x and z data:
import numpy as np
import scipy.interpolate as interpolate

xGrid = np.array([[1, 10], [2, 20], [3, 30]])
zGrid = np.array([[100, 1000], [200, 2000], [300, 3000]])
yGrid = np.array([0, 1]) + np.zeros(xGrid.shape)
yValue = 0.5
f3 = interpolate.interp2d(xGrid, yGrid, zGrid, kind='linear')
This is a bivariate function, you can call it as
In [372]: f3(10,yValue)
Out[372]: array([ 1000.])
You can turn it into a univariate function returning a scalar by using a lambda:
f4 = lambda x, y=yValue: f3(x, y)[0]
This returns a single value for your (presumably) single y value, which is fixed to yValue at the moment the lambda is defined. Use it like so:
In [376]: f4(10)
Out[376]: 1000.0
However, the general f3 function might be more suited to your problem, as you can dynamically change the value of y according to your needs, and can use array input to obtain array output for z.
Update
For oddly shaped x,y data, interp2d might give unsatisfactory results, especially at the borders of the grid. Another approach is to use interpolate.LinearNDInterpolator instead, which is based on a triangulation of the input data and therefore gives a local piecewise-linear approximation:
f4 = interpolate.LinearNDInterpolator((xGrid.flatten(),yGrid.flatten()),zGrid.flatten())
With your update data set:
plt.figure()
plt.plot(np.linspace(0.001, 5, 100), f4(np.linspace(0.001, 5, 100), 2))
plt.plot(xGrid[:, 0], zGrid[:, 0])
plt.plot(xGrid[:, 1], zGrid[:, 1])
Note that this interpolation also has its drawbacks. I suggest plotting both interpolated functions as a surface and looking at how they are distorted compared to your original data:
from mpl_toolkits.mplot3d import Axes3D
xx,yy=(np.linspace(0,10,20),np.linspace(0,20,40))
xxs,yys=np.meshgrid(xx,yy)
zz3=f3(xx,yy) #from interp2d
zz4=f4(xxs,yys) #from LinearNDInterpolator
#plot raw data
hf=plt.figure()
ax=hf.add_subplot(111,projection='3d')
ax.plot_surface(xGrid,yGrid,zGrid,rstride=1,cstride=1)
plt.draw()
#plot interp2d case
hf=plt.figure()
ax=hf.add_subplot(111,projection='3d')
ax.plot_surface(xxs,yys,zz3,rstride=1,cstride=1)
plt.draw()
#plot LinearNDInterpolator case
hf=plt.figure()
ax=hf.add_subplot(111,projection='3d')
ax.plot_surface(xxs,yys,f4(xxs,yys),rstride=1,cstride=1)
plt.draw()
This will allow you to rotate the surfaces around and see what kind of artifacts they contain (with an appropriate backend).
