I have the following numpy.ndarray
S = np.array([[[ -0.6,  -0.2,   0. ],
               [-60. ,   2. ,   0. ],
               [  6. , -20. ,   0. ]],
              [[ -0.4,  -0.8,   0. ],
               [-40. ,   8. ,   0. ],
               [  4. , -80. ,   0. ]]])
I want to find all possible combinations of the element-wise sum of each row of S[0,:,:] (excluding the last column) with each row of S[1,:,:], i.e., my desired result is (order does not matter):
array([[-1, -1],
       [-40.6, 7.8],
       [3.4, -80.2],
       [-60.4, 1.2],
       [-100, 10],
       [-56, -78],
       [5.6, -20.8],
       [-34, -12],
       [10, -100]])
which is a 9-by-2 array resulting from 9 possible combinations of S[0,:,:] and S[1,:,:]. Although I have used a particular shape of S here, the shape may vary, i.e., for
x,y,z = np.shape(S)
in the above problem, x=2, y=3, and z=3, but these values may vary. Therefore, I am seeking a generalized solution.
Your help will be highly appreciated. Thank you for your time!
(Please no for loops if possible. It is pretty trivial then.)
You can use broadcasting like this:
(S[0,:,None, :-1] + S[1,None,:,:-1]).reshape(-1,2)
Output:
array([[  -1. ,   -1. ],
       [ -40.6,    7.8],
       [   3.4,  -80.2],
       [ -60.4,    1.2],
       [-100. ,   10. ],
       [ -56. ,  -78. ],
       [   5.6,  -20.8],
       [ -34. ,  -12. ],
       [  10. , -100. ]])
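Regarding the generalized version: the only hard-coded piece above is the reshape width, which you can take from S itself. A minimal sketch, assuming (as in your example) you always drop the last column and combine S[0] with S[1]:
x, y, z = S.shape
result = (S[0, :, None, :-1] + S[1, None, :, :-1]).reshape(-1, z - 1)   # shape (y*y, z-1)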
Related
Input:
array([[  1. ,   5. ,   1. ],
       [ 10. ,   7. ,   1.5],
       [  6.9,   5. ,   1. ],
       [ 19. ,   9. , 100. ],
       [ 11. ,  11. ,  11. ]])
Expected Output:
array([[ 19. ,   9. , 100. ],
       [ 11. ,  11. ,  11. ],
       [ 10. ,   7. ,   1.5],
       [  6.9,   5. ,   1. ],
       [  1. ,   5. ,   1. ]])
I tried doing the below:
for i in M:
    ls = i.mean()
    x = np.append(i, ls)
    print(x)  # found the mean
After this I am unable to arrange the rows based on the mean value of each row. All I can do is arrange each row in descending order, but that is not what I wanted.
You can do this:
In [405]: row_idxs = np.argsort(np.mean(a * -1, axis=1))
In [406]: a[row_idxs, :]
Out[406]:
array([[ 19. ,   9. , 100. ],
       [ 11. ,  11. ,  11. ],
       [ 10. ,   7. ,   1.5],
       [  6.9,   5. ,   1. ],
       [  1. ,   5. ,   1. ]])
argsort returns the indices that would sort the array of row means; multiplying by -1 gives you descending order.
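An equivalent variation, if you'd rather not multiply by -1, is to sort ascending and then reverse the index order:
row_idxs = np.argsort(a.mean(axis=1))[::-1]   # indices of rows, largest mean first
a[row_idxs, :]                                # same descending-by-mean result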
I have a 2D array and I want to delete a point out of it, but suppose it's so big that I can't specify an index and just grab it, and the values of the array are floats.
How can I delete this point, with a loop and without a loop? The following is the 2D array, and I want to delete [ 32.9, 23. ]:
[[   1. ,   -1.4],
 [  -2.9,   -1.5],
 [  -3.6,   -2. ],
 [   1.5,    1. ],
 [  24. ,   11. ],
 [  -1. ,    1.4],
 [   2.9,    1.5],
 [   3.6,    2. ],
 [  -1.5,   -1. ],
 [ -24. ,  -11. ],
 [  32.9,   23. ],
 [-440. ,  310. ]]
I tried this, but it doesn't work:
this_point = np.asarray([ 32.9, 23.])
[x for x in y if x == point]
del datapoints[this_point]
np.delete(datapoints, len(datapoints), axis=0)
for this_point in datapoints:
    del this_point
When I do this, this_point is still there when I print all the points. What should I do?
Python can remove a list element by value, but NumPy deletes only by index. So, use np.where to find the index of the matching row:
import numpy as np
a = np.array([[   1. ,   -1.4],
              [  -2.9,   -1.5],
              [  -3.6,   -2. ],
              [   1.5,    1. ],
              [  24. ,   11. ],
              [  -1. ,    1.4],
              [   2.9,    1.5],
              [   3.6,    2. ],
              [  -1.5,   -1. ],
              [ -24. ,  -11. ],
              [  32.9,   23. ],
              [-440. ,  310. ]])
find = np.array([32.9,23.])
row = np.where( (a == find).all(axis=1))
print( row )
print(np.delete( a, row, axis=0 ) )
Output:
(array([10], dtype=int64),)
[[   1.    -1.4]
 [  -2.9   -1.5]
 [  -3.6   -2. ]
 [   1.5    1. ]
 [  24.    11. ]
 [  -1.     1.4]
 [   2.9    1.5]
 [   3.6    2. ]
 [  -1.5   -1. ]
 [ -24.   -11. ]
 [-440.   310. ]]
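Since the values are floats, exact equality (a == find) can be fragile; a variant of the same idea using np.isclose with a tolerance would be:
mask = ~np.isclose(a, find).all(axis=1)   # True for rows that do not match [32.9, 23.] within tolerance
print(a[mask])                            # same rows as the np.delete result above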
I have vectors of this form:
test = np.linspace(0, 1, 10)
I want to stack them horizontally in order to make a matrix.
The problem is that I define them in a loop, so the first stack is between an empty matrix and the first column vector, which gives the following error:
ValueError: all the input arrays must have same number of dimensions
Bottom line: I have a for loop that creates a vector p1 on every iteration, and I want to add it to a final matrix of the form
[p1 p2 p3 p4], which I could then do matrix operations on, such as multiplying by the transpose, etc.
If you've got a list of 1D arrays that you want horizontally stacked, you could convert them all to columns first, but it's probably easier to just vertically stack them and then transpose:
In [6]: vector_list = [np.linspace(0, 1, 10) for _ in range(3)]
In [7]: np.vstack(vector_list).T
Out[7]:
array([[0.        , 0.        , 0.        ],
       [0.11111111, 0.11111111, 0.11111111],
       [0.22222222, 0.22222222, 0.22222222],
       [0.33333333, 0.33333333, 0.33333333],
       [0.44444444, 0.44444444, 0.44444444],
       [0.55555556, 0.55555556, 0.55555556],
       [0.66666667, 0.66666667, 0.66666667],
       [0.77777778, 0.77777778, 0.77777778],
       [0.88888889, 0.88888889, 0.88888889],
       [1.        , 1.        , 1.        ]])
How did you get this dimension error? What does the empty array have to do with it?
A list of arrays of the same length:
In [610]: alist = [np.linspace(0,1,6), np.linspace(10,11,6)]
In [611]: alist
Out[611]:
[array([0. , 0.2, 0.4, 0.6, 0.8, 1. ]),
array([10. , 10.2, 10.4, 10.6, 10.8, 11. ])]
Several ways of making an array from them:
In [612]: np.array(alist)
Out[612]:
array([[ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ],
       [10. , 10.2, 10.4, 10.6, 10.8, 11. ]])
In [614]: np.stack(alist)
Out[614]:
array([[ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ],
       [10. , 10.2, 10.4, 10.6, 10.8, 11. ]])
If you want to join them in columns, you can transpose one of the above, or use:
In [615]: np.stack(alist, axis=1)
Out[615]:
array([[ 0. , 10. ],
       [ 0.2, 10.2],
       [ 0.4, 10.4],
       [ 0.6, 10.6],
       [ 0.8, 10.8],
       [ 1. , 11. ]])
np.column_stack is also handy.
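For instance, as a quick check, it gives the same column-wise result as np.stack(alist, axis=1) above:
np.column_stack(alist)    # shape (6, 2), one column per input vector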
In newer numpy versions you can do:
In [617]: np.linspace((0,10),(1,11),6)
Out[617]:
array([[ 0. , 10. ],
       [ 0.2, 10.2],
       [ 0.4, 10.4],
       [ 0.6, 10.6],
       [ 0.8, 10.8],
       [ 1. , 11. ]])
You don't specify how you create the 'empty array' or how you attempt to stack. I can't exactly recreate the error message (a full traceback would have helped). But given that message, did you check the number of dimensions of the inputs? Did they match?
Array stacking in a loop is tricky. You have to pay close attention to the shapes, especially of the initial 'empty' array. There isn't a close analog to the empty list []: np.array([]) is 1d with shape (0,), and np.empty((0, 6)) is 2d with shape (0, 6). Also, all the stacking functions create a new array with each call (none operate in-place), so they are inefficient compared to appending to a list.
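Given all that, a common pattern is to skip the 'empty array' entirely: append each vector to a plain Python list inside the loop and stack once at the end. A minimal sketch (the loop body is just a stand-in for however you build each p):
import numpy as np

cols = []                               # a plain list is cheap to append to
for k in range(4):                      # stand-in for your loop over p1, p2, p3, p4
    p = np.linspace(0, 1, 10) + k       # hypothetical column vector built this iteration
    cols.append(p)

M = np.stack(cols, axis=1)              # shape (10, 4), one column per vector
print(M.T @ M)                          # e.g. multiply by the transpose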
There is scipy.misc.imresize for resampling the first two dimensions of 3D arrays. It also supports bilinear interpolation. However, there does not seem to be an existing function for resizing all dimensions of arrays with any number of dimensions. How can I resample any array given a new shape of the same rank, using multi-linear interpolation?
You want scipy.ndimage.zoom, which can be used as follows:
>>> x = np.arange(8, dtype=np.float_).reshape(2, 2, 2)
>>> scipy.ndimage.zoom(x, 1.5, order=1)
array([[[ 0. ,  0.5,  1. ],
        [ 1. ,  1.5,  2. ],
        [ 2. ,  2.5,  3. ]],
       [[ 2. ,  2.5,  3. ],
        [ 3. ,  3.5,  4. ],
        [ 4. ,  4.5,  5. ]],
       [[ 4. ,  4.5,  5. ],
        [ 5. ,  5.5,  6. ],
        [ 6. ,  6.5,  7. ]]])
Note that this function always preserves the boundaries of the image, essentially resampling a mesh with a node at each pixel center. You might want to look at other functions in scipy.ndimage if you need more control over exactly where the resampling occurs.
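If you are starting from a target shape rather than a zoom factor, you can compute per-axis factors yourself. A minimal sketch (resample_to_shape is my own helper name; note that zoom rounds the output sizes, so a sanity check against the requested shape doesn't hurt):
import numpy as np
import scipy.ndimage

def resample_to_shape(a, new_shape, order=1):
    # one zoom factor per axis: target size / current size
    factors = [n / o for n, o in zip(new_shape, a.shape)]
    return scipy.ndimage.zoom(a, factors, order=order)   # order=1 -> multi-linear

x = np.arange(8, dtype=float).reshape(2, 2, 2)
print(resample_to_shape(x, (3, 3, 3)).shape)   # (3, 3, 3)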
I'm hoping to delete columns in my arrays that have repeat entries in row 1, as shown below (row 1 has repeats of the values 1 and 2.5, so one of each of those values has been deleted, together with the column each deleted value lies within).
initial_array =
row 0 [[ 1, 1, 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2.5, 2, 1, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 3, 2.5, 1.5, 4,]
row 3 [228, 314, 173, 452, 168, 351, 300, 396]]
final_array =
row 0 [[ 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 2.5, 4,]
row 3 [228, 314, 173, 452, 351, 396]]
Ways I was thinking of included using some function that checked for repeats, giving a True response for the second (or more) time a value turned up in the dataset, then using that response to delete the column. That, or possibly using the return_index option of numpy.unique. I just can't quite find a way through it or find the right function, though.
If I could find a way to put, in row 3, the mean value of the retained repeat and the deleted one, that would be even better (see below).
final_array_averaged =
row 0 [[ 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 2.5, 4,]
row 3 [228, 307, 170.5, 452, 351, 396]]
Thanks in advance for any help you can give to a beginner who is stumped!
You can use the optional arguments of np.unique and then np.bincount with the last row as weights to get the final averaged output, like so -
# unique values of row 1: first-occurrence indices, inverse mapping, and counts
_, unqID, tag, C = np.unique(arr[1], return_index=1, return_inverse=1, return_counts=1)
out = arr[:, unqID]                       # keep one column per unique value in row 1
out[-1] = np.bincount(tag, arr[3]) / C    # group-wise mean of the last row
Sample run -
In [212]: arr
Out[212]:
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2.5,    2. ,    1. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    3. ,    2.5,    1.5,    4. ],
       [ 228. ,  314. ,  173. ,  452. ,  168. ,  351. ,  300. ,  396. ]])
In [213]: out
Out[213]:
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2. ,    2.5,    3.5,    4. ],
       [   1. ,    1.5,    2.5,    3. ,    4. ,    4.5],
       [ 228. ,  307. ,  351. ,  170.5,  396. ,  452. ]])
As can be seen, the output is now ordered with the second row sorted. If you are looking to keep the order as it was originally, use np.argsort of unqID, like so -
In [221]: out[:,unqID.argsort()]
Out[221]:
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    2.5,    4. ],
       [ 228. ,  307. ,  170.5,  452. ,  351. ,  396. ]])
You can find the indices of the wanted columns using np.unique:
>>> indices = np.sort(np.unique(A[1], return_index=True)[1])
Then use simple indexing to get the desired columns:
>>> A[:,indices]
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    2.5,    4. ],
       [ 228. ,  314. ,  173. ,  452. ,  351. ,  396. ]])
This is a typical grouping problem, which can be solved elegantly and efficiently using the numpy_indexed package (disclaimer: I am its author):
import numpy_indexed as npi
unique, final_array = npi.group_by(initial_array[1]).mean(initial_array, axis=1)
Note that there are many other reductions than mean; if you want the original behavior you described, you could replace 'mean' with 'first', for instance.