*Update* Creating an array for distance between two 2-D arrays - python

So I have two arrays that contain x, y, z coordinates. I'm just trying to apply the 3-D distance formula. The problem is that I can't find a post that deals with arrays holding several values per row and that returns an array of distances.
print MW_FirstsubPos1
[[ 51618.7265625 106197.7578125 69647.6484375 ]
[ 33864.1953125 11757.29882812 11849.90332031]
[ 12750.09863281 58954.91015625 38067.0859375 ]
...,
[ 99002.6640625 96021.0546875 18798.44726562]
[ 27180.83984375 74350.421875 78075.78125 ]
[ 19297.88476562 82161.140625 1204.53503418]]
print MW_SecondsubPos1
[[ 51850.9140625 106004.0078125 69536.5234375 ]
[ 33989.9375 11847.11425781 12255.80859375]
[ 12526.203125 58372.3046875 37641.34765625]
...,
[ 98823.2734375 95837.1796875 18758.7734375 ]
[ 27047.19140625 74242.859375 78166.703125 ]
[ 19353.97851562 82375.8515625 1147.07556152]]
Yes, they are the same shape.
My attempt,
import numpy as np
xs1,ys1,zs1 = zip(*MW_FirstsubPos1)
xs11,ys11,zs11 = zip(*MW_SecondsubPos1)
squared_dist1 = (xs11 - xs1)**2 + (ys11 - ys1)**2 + (zs11 - zs1)**2
dist1 = np.sqrt(squared_dist1)
print dist1
This returns:
TypeError: unsupported operand type(s) for -: 'tuple' and 'tuple'
I just want to get back a 1-D array of distances, one per row.
* --------------------- Update --------------------- *
Using what Sнаđошƒаӽ said,
Distance1 = []
for Fir1, Sec1 in zip(MW_FirstsubVel1, MW_SecondsubPos1):
    dist1 = 0
    for i in range(3):
        dist1 += (Fir1[i] - Sec1[i])**2
    Distance1.append(dist1**0.5)
But when I compare it with the distance formula applied to a single element, as in my original post,
squared_dist1 = (xs11[0] - xs1[0])**2 + (ys11[0] - ys1[0])**2 + (zs11[0] - zs1[0])**2
dist1 = np.sqrt(squared_dist1)
print dist1
returns 322.178309762
while
result = []
for a, b in zip(MW_FirstsubVel1, MW_SecondsubPos1):
    dist = 0
    for i in range(3):
        dist += (a[i] - b[i])**2
    result.append(dist**0.5)
print result[0]
returns 137163.203004
What's wrong here?

Your solutions look good to me.
A better idea is to use the linear algebra module in the scipy package, as it scales to multi-dimensional data. Here is my code:
import scipy.linalg as LA
dist1 = LA.norm(MW_FirstsubPos1 - MW_SecondsubPos1, axis=1)
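For reference, NumPy's own norm gives the same row-wise result, so scipy is not strictly required; a minimal sketch with small made-up coordinate arrays (the names first and second are placeholders, not the asker's data):
import numpy as np

# Two small (n, 3) arrays of x, y, z coordinates (made-up values).
first = np.array([[0.0, 0.0, 0.0],
                  [1.0, 2.0, 2.0]])
second = np.array([[1.0, 0.0, 0.0],
                   [1.0, 2.0, 0.0]])

# Row-wise Euclidean distance: one value per pair of points.
print(np.linalg.norm(first - second, axis=1))  # [1. 2.]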

See if this works, assuming that aaa and bbb are normal Python lists of lists holding the x, y and z coordinates (or that you can convert to such, using tolist or something like that perhaps). result will hold the 1-D array you are looking for.
Edit: aaa and bbb are Python lists of lists. Only code for printing the output has been added.
aaa = [[51618.7265625, 106197.7578125, 69647.6484375],
[33864.1953125, 11757.29882812, 11849.90332031],
[12750.09863281, 58954.91015625, 38067.0859375],
[99002.6640625, 96021.0546875, 18798.44726562],
[27180.83984375, 74350.421875, 78075.78125],
[19297.88476562, 82161.140625, 1204.53503418]]
bbb = [[51850.9140625, 106004.0078125, 69536.5234375],
[33989.9375, 11847.11425781, 12255.80859375],
[12526.203125, 58372.3046875, 37641.34765625],
[98823.2734375, 95837.1796875, 18758.7734375],
[27047.19140625, 74242.859375, 78166.703125],
[19353.97851562, 82375.8515625, 1147.07556152]]
result = []
for a, b in zip(aaa, bbb):
    dist = 0
    for i in range(3):
        dist += (a[i] - b[i])**2
    result.append(dist**0.5)
for elem in result:
    print(elem)
Output:
322.178309762234
434.32361222259755
755.5206249710258
259.9327309143388
194.16071591842936
229.23543894772612
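The same distances can also be computed without the explicit Python loop by converting the lists to arrays and letting NumPy broadcast the arithmetic; a small sketch, assuming aaa and bbb are the lists defined above:
import numpy as np

# Element-wise difference, square, row-wise sum, square root:
# one distance per row, matching the loop output above.
result_vec = np.sqrt(((np.asarray(aaa) - np.asarray(bbb)) ** 2).sum(axis=1))
print(result_vec)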

Here's a vectorized approach using np.einsum -
diffs = MW_FirstsubPos1 - MW_SecondsubPos1
dists = np.sqrt(np.einsum('ij,ij->i',diffs,diffs))
Sample run -
In [233]: MW_FirstsubPos1
Out[233]:
array([[2, 0, 0],
[8, 6, 1],
[0, 2, 8],
[7, 6, 3],
[3, 1, 7]])
In [234]: MW_SecondsubPos1
Out[234]:
array([[3, 4, 7],
[0, 8, 4],
[4, 7, 4],
[2, 5, 6],
[5, 0, 6]])
In [235]: diffs = MW_FirstsubPos1 - MW_SecondsubPos1
In [236]: np.sqrt(np.einsum('ij,ij->i',diffs,diffs))
Out[236]: array([ 8.1240384 , 8.77496439, 7.54983444, 5.91607978, 2.44948974])


How to perform a vectorized function on a 2D numpy array?

import numpy as np

vecs = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
def find_len(vector):
    return (vector[0] ** 2 + vector[1] ** 2 + vector[2] ** 2) ** 0.5
vec_len = np.vectorize(find_len)
I want to apply find_len to every vector in the 2d array and create a new numpy array with the values returned. How can I do this?
try this
res = []
for i in range(vecs.shape[0]):
    res.append(find_len(vecs[i]))
res = np.array(res)
results in
array([ 3.74165739, 8.77496439, 13.92838828])
you can also make this in one line:
res = np.array([find_len(x) for x in vecs[range(vecs.shape[0])]])
Are you just looking for this result:
array([ 3.74165739, 8.77496439, 13.92838828])
because you can achieve that without vectorize, just use:
(vecs**2).sum(axis=1)**0.5
This also has the advantage of not being specific to vectors of length 3.
Operations are already applied element-wise, so you can handle the squaring and square rooting normally. sum(axis=1) sums across each row (over the columns), giving one value per row.
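As a side note, the same row-wise length can be obtained with NumPy's built-in norm, which avoids spelling out the square/sum/root by hand; a minimal sketch using the vecs array from the question:
import numpy as np

vecs = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# Euclidean length of each row vector.
print(np.linalg.norm(vecs, axis=1))
# [ 3.74165739  8.77496439 13.92838828]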

Optimize the python function with numpy without using the for loop

I have the following python function:
def npnearest(u: np.ndarray, X: np.ndarray, Y: np.ndarray, distance: 'callable' = npdistance):
    '''
    Finds x1 so that x1 is in X and u and x1 have a minimal distance (according to the
    provided distance function) compared to all other data points in X. Returns the label of x1.
    Args:
        u (np.ndarray): The vector (ndim=1) we want to classify
        X (np.ndarray): A matrix (ndim=2) with training data points (vectors)
        Y (np.ndarray): A vector containing the label of each data point in X
        distance (callable): A function that receives two inputs and defines the distance function used
    Returns:
        int: The label of the data point which is closest to `u`
    '''
    xbest = None
    ybest = None
    dbest = float('inf')
    for x, y in zip(X, Y):
        d = distance(u, x)
        if d < dbest:
            ybest = y
            xbest = x
            dbest = d
    return ybest
Here npdistance simply gives the (squared) distance between two points, i.e.
def npdistance(x1, x2):
    return np.sum((x1 - x2)**2)
I want to optimize npnearest by performing nearest neighbor search directly in numpy. This means that the function cannot use for/while loops.
Thanks
Since you don't need to use that exact function, you can simply change the sum to work over a particular axis. This returns an array of per-row calculations, and you can call argmin on it to get the index of the minimum value. Use that index to look up your label:
import numpy as np

def npdistance_idx(x1, x2):
    return np.argmin(np.sum((x1 - x2)**2, axis=1))

Y = ["label 0", "label 1", "label 2", "label 3"]
u = np.array([[1, 5.5]])
X = np.array([[1, 2], [1, 5], [0, 0], [7, 7]])
idx = npdistance_idx(X, u)
print(Y[idx])  # label 1
NumPy supports vectorized operations (broadcasting).
This means you can pass in whole arrays and the operations will be applied to them in an optimized way (SIMD: single instruction, multiple data).
You can then get the index of the array minimum using .argmin().
Hope this helps.
In [9]: numbers = np.arange(10); numbers
Out[9]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [10]: numbers -= 5; numbers
Out[10]: array([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4])
In [11]: numbers = np.power(numbers, 2); numbers
Out[11]: array([25, 16, 9, 4, 1, 0, 1, 4, 9, 16])
In [12]: numbers.argmin()
Out[12]: 5
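Putting broadcasting and argmin together, a loop-free version of npnearest could look like the sketch below; it assumes the squared-Euclidean npdistance from the question, and the name npnearest_vec is just illustrative:
import numpy as np

def npnearest_vec(u, X, Y):
    # Broadcast u against every row of X, square the differences, and sum
    # along each row to get one squared distance per training point.
    d = np.sum((X - u) ** 2, axis=1)
    # Return the label of the closest training point.
    return Y[np.argmin(d)]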

Iterating and making decision through a NumPy array

I've been trying to iterate through a NumPy array. I'm trying to check whether every element of the array is greater than or equal to 0.1. If an element is greater than or equal to 0.1, the code should append to a list.
The array holds numbers that were processed in another function. Example:
[[-0.68454815]
[-0.6868374 ]
[-0.72553124]
[-0.72324855]
[-0.69258814]
[ 0.30578739]
[ 0.2679637 ]
[ 0.27038732]
[-0.62690676]
[ 0.372456 ]
[ 0.32854353]
[ 0.33191556]
[-0.6357395 ]
[ 0.3649385 ]
[ 0.31940787]
[ 0.32325424]
[-0.71096214]
[ 0.29032854]
[ 0.2589025 ]
[ 0.26576582]
[-0.71774566]
[ 0.28527439]
[ 0.25350313]
[ 0.26095643]
[-0.65131719]
[ 0.35093124]
[ 0.30984058]]
I'm using this to create a graph from the contents of counter_list, to show convergence.
I've checked several threads on here on how to do it and I've come up with something like this:
def looper(self):
    rows = error.shape[0]
    for x in range(0, rows):
        counter_list = list()
        if error(rows) >= 0.2:
            counter += 1
            counter_list.append(counter)
    print("Amount:", counter_list)
However, I'm getting this error:
TypeError: 'numpy.ndarray' object is not callable
I know that the error TypeError: 'numpy.ndarray' object is not callable means that you tried to call a numpy array as a function, however I still don't know how to fix it.
If the array is one-dimensional, you can just use NumPy's boolean indexing.
return data[data >= 0.1]
This returns a np.array in which every item of data that is >= 0.1 is kept.
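For the column-shaped array in the question, a small sketch of how that mask could be applied (the values are just the first few from the question's example):
import numpy as np

error = np.array([[-0.68454815],
                  [ 0.30578739],
                  [ 0.2679637 ]])

kept = error[error >= 0.1]  # 1-D array of the values that pass the test
print(kept)                 # [0.30578739 0.2679637 ]
print(kept.size)            # how many passed: 2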
Easier way to check that condition:
import numpy as np
def check(a):
    if np.all(a >= 0.1):
        print("All of them do!")
        return True
    else:
        print("Someone doesnt :'(")
        return False
a = np.array([1, 2, 3, 4, 5, 0.02])
b = np.array([1, 2, 3, 4, 5, 6])
check(a)
check(b)
Output:
Someone doesnt :'(
All of them do!
EDIT
To know how many, there is a similarly easy way to do it:
print(np.count_nonzero(a < 0.1))  # prints 1
print(np.count_nonzero(b < 0.1))  # prints 0
Then you can do something like:
e = np.arange(1, np.count_nonzero(a < 0.1) + 1) # e is [1]
This also works with multidimensional arrays, e.g.:
a = np.array([[1, 2, 0.04, 4, 5, 3],[1, 2, 3, 4, 5, 0.02]])
check(a)
print(np.count_nonzero(a < 0.1))
e = np.arange(1, np.count_nonzero(a < 0.1) + 1)
print(e)
Output:
Someone doesnt :'(
2
[1 2]
EDIT 2
To plot the running count of errors (an error meaning a value less than 0.1), the most elegant way I can think of is with a cumulative sum:
import matplotlib.pyplot as plt
a = np.array([1, 2, 0.03, 4, 5, 0.06, 7, 8])
err = np.cumsum(a < 0.1)
plt.plot(np.arange(1, err.shape[0] + 1), err)
plt.savefig('image.png')
plt.show()
Let me know if this is what you wanted!

Sorting an Array Alongside a 2d Array

So I'm using NumPy's linear algebra routines to do some basic computational quantum mechanics. Say I have a matrix, hamiltonian, and I want its eigenvalues and eigenvectors
import numpy as np
from numpy import linalg as la
hamiltonian = np.zeros((N, N)) # N is some constant I have defined
# fill up hamiltonian here
energies, states = la.eig(hamiltonian)
Now, I want to sort the energies in increasing order, and I want to sort the states along with them. For example, if I do:
groundStateEnergy = min(energies)
groundStateIndex = np.where(energies == groundStateEnergy)
groundState = states[groundStateIndex, :]
I correctly plot the ground state (eigenvector with the lowest eigenvalue). However, if I try something like this:
energies, states = zip(*sorted(zip(energies, states)))
or even
energies, states = zip(*sorted(zip(energies, states), key=lambda pair: pair[0]))
plotting in the same way no longer plots the correct state. So how can I sort states alongside energies, but only by row? (I.e., I want to associate each row of states with a value in energies, and rearrange the rows so that their order corresponds to the sorted order of the values in energies.)
You can use argsort as follows:
>>> x = np.random.random(10)
>>> x
array([ 0.69719108, 0.75828237, 0.79944838, 0.68245968, 0.36232211,
0.46565445, 0.76552493, 0.94967472, 0.43531813, 0.22913607])
>>> y = np.random.random((10))
>>> y
array([ 0.64332275, 0.34984653, 0.55240204, 0.31019789, 0.96354724,
0.76723872, 0.25721343, 0.51629662, 0.13096252, 0.86220311])
>>> idx = np.argsort(x)
>>> idx
array([9, 4, 8, 5, 3, 0, 1, 6, 2, 7])
>>> xsorted= x[idx]
>>> xsorted
array([ 0.22913607, 0.36232211, 0.43531813, 0.46565445, 0.68245968,
0.69719108, 0.75828237, 0.76552493, 0.79944838, 0.94967472])
>>> ysordedbyx = y[idx]
>>> ysordedbyx
array([ 0.86220311, 0.96354724, 0.13096252, 0.76723872, 0.31019789,
0.64332275, 0.34984653, 0.25721343, 0.55240204, 0.51629662])
And, as suggested in the comments, an example where we sort a 2-D array by its first column:
>>> x=np.random.random((10,2))
>>> x
array([[ 0.72789275, 0.29404982],
[ 0.05149693, 0.24411234],
[ 0.34863983, 0.58950756],
[ 0.81916424, 0.32032827],
[ 0.52958012, 0.00417253],
[ 0.41587698, 0.32733306],
[ 0.79918377, 0.18465189],
[ 0.678948 , 0.55039723],
[ 0.8287709 , 0.54735691],
[ 0.74044999, 0.70688683]])
>>> idx = np.argsort(x[:,0])
>>> idx
array([1, 2, 5, 4, 7, 0, 9, 6, 3, 8])
>>> xsorted = x[idx,:]
>>> xsorted
array([[ 0.05149693, 0.24411234],
[ 0.34863983, 0.58950756],
[ 0.41587698, 0.32733306],
[ 0.52958012, 0.00417253],
[ 0.678948 , 0.55039723],
[ 0.72789275, 0.29404982],
[ 0.74044999, 0.70688683],
[ 0.79918377, 0.18465189],
[ 0.81916424, 0.32032827],
[ 0.8287709 , 0.54735691]])
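Applied to the eigenvalue problem from the question, that could look like the sketch below. One caveat: numpy.linalg.eig returns eigenvectors as the columns of states, so if that is how states was produced, it is the columns (not the rows) that should be reordered:
import numpy as np
from numpy import linalg as la

hamiltonian = np.diag([3.0, 1.0, 2.0])  # toy stand-in for the real matrix
energies, states = la.eig(hamiltonian)

idx = np.argsort(energies)              # indices that sort the eigenvalues
energies = energies[idx]                # eigenvalues in increasing order
states = states[:, idx]                 # columns reordered to match
groundState = states[:, 0]              # eigenvector of the lowest eigenvalue
For a Hermitian hamiltonian, numpy.linalg.eigh already returns the eigenvalues in ascending order, which sidesteps the sorting entirely.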

python all possible products between columns

I have a numpy matrix X and I would like to add to this matrix as new variables all the possible products between 2 columns.
So if X=(x1,x2,x3) I want X=(x1,x2,x3,x1x2,x2x3,x1x3)
Is there an elegant way to do that?
I think a combination of numpy and itertools should work
EDIT:
Very good answers, but do they consider that X is a matrix, so x1, x2, ..., x3 can themselves be arrays?
EDIT:
A real example:
a=array([[1,2,3],[4,5,6]])
Itertools should be the answer here.
import itertools
a = [1, 2, 3]
p = (x * y for x, y in itertools.combinations(a, 2))
print list(itertools.chain(a, p))
Result:
[1, 2, 3, 2, 3, 6] # 1, 2, 3, 2 x 1, 3 x 1, 3 x 2
I think Samy's solution is pretty good. If you need to use numpy, you could transform it a little like this:
from itertools import combinations
from numpy import prod
x = [1, 2, 3]
print x + map(prod, combinations(x, 2))
Gives the same output as Samy's solution:
[1, 2, 3, 2, 3, 6]
If your arrays are small, then Samy's pure-Python solution using itertools.combinations should be fine:
from itertools import combinations, chain
def all_products1(a):
    p = (x * y for x, y in combinations(a, 2))
    return list(chain(a, p))
But if your arrays are large, then you'll get a substantial speedup by fully vectorizing the computation, using numpy.triu_indices, like this:
import numpy as np
def all_products2(a):
    x, y = np.triu_indices(len(a), 1)
    return np.r_[a, a[x] * a[y]]
Let's compare these:
>>> from timeit import timeit
>>> data = np.random.uniform(0, 100, (10000,))
>>> timeit(lambda:all_products1(data), number=1)
53.745754408999346
>>> timeit(lambda:all_products2(data), number=1)
12.26144006299728
The solution using numpy.triu_indices also works for multi-dimensional data:
>>> np.random.uniform(0, 100, (3,2))
array([[ 63.75071196, 15.19461254],
[ 94.33972762, 50.76916376],
[ 88.24056878, 90.36136808]])
>>> all_products2(_)
array([[ 63.75071196, 15.19461254],
[ 94.33972762, 50.76916376],
[ 88.24056878, 90.36136808],
[ 6014.22480172, 771.41777239],
[ 5625.39908354, 1373.00597677],
[ 8324.59122432, 4587.57109368]])
If you want to operate on columns rather than rows, use:
def all_products3(a):
    x, y = np.triu_indices(a.shape[1], 1)
    return np.c_[a, a[:, x] * a[:, y]]
For example:
>>> np.random.uniform(0, 100, (2,3))
array([[ 33.0062385 , 28.17575024, 20.42504351],
[ 40.84235995, 61.12417428, 58.74835028]])
>>> all_products3(_)
array([[ 33.0062385 , 28.17575024, 20.42504351, 929.97553238,
674.15385734, 575.4909246 ],
[ 40.84235995, 61.12417428, 58.74835028, 2496.45552756,
2399.42126888, 3590.94440122]])
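Applied to the asker's real example a = array([[1, 2, 3], [4, 5, 6]]), with x1, x2, x3 read as columns, the column-wise version gives the following (the function is repeated here only to make the sketch self-contained):
import numpy as np

def all_products3(a):
    x, y = np.triu_indices(a.shape[1], 1)
    return np.c_[a, a[:, x] * a[:, y]]

a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(all_products3(a))
# [[ 1  2  3  2  3  6]
#  [ 4  5  6 20 24 30]]
# appended columns are x1*x2, x1*x3, x2*x3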
