Delete columns based on repeat value in one row in numpy array

I'm hoping to delete columns in my arrays that have repeat entries in row 1, as shown below (row 1 has repeats of the values 1 and 2.5, so one of each of those values has been deleted, together with the column each deleted value lies within).
initial_array =
row 0 [[ 1, 1, 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2.5, 2, 1, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 3, 2.5, 1.5, 4,]
row 3 [228, 314, 173, 452, 168, 351, 300, 396]]
final_array =
row 0 [[ 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 2.5, 4,]
row 3 [228, 314, 173, 452, 351, 396]]
Ways I was thinking of included using some function that checked for repeats, giving a True response for the second (or later) time a value turned up in the dataset, then using that response to delete the column. That, or possibly using the return_index option of numpy.unique. I just can't quite find a way through it or find the right function, though.
If I could find a way to return a mean value in row 3 of the retained repeat and the deleted one, that would be even better (see below).
final_array_averaged =
row 0 [[ 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 2.5, 4,]
row 3 [228, 307, 170.5, 452, 351, 396]]
Thanks in advance for any help you can give to a beginner who is stumped!

You can use the optional arguments that come with np.unique and then use np.bincount to use the last row as weights to get the final averaged output, like so -
_,unqID,tag,C = np.unique(arr[1],return_index=1,return_inverse=1,return_counts=1)
out = arr[:,unqID]
out[-1] = np.bincount(tag,arr[3])/C
Sample run -
In [212]: arr
Out[212]:
array([[ 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2.5, 4. , 2.5, 2. , 1. , 3.5],
[ 1. , 1.5, 3. , 4.5, 3. , 2.5, 1.5, 4. ],
[ 228. , 314. , 173. , 452. , 168. , 351. , 300. , 396. ]])
In [213]: out
Out[213]:
array([[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2. , 2.5, 3.5, 4. ],
[ 1. , 1.5, 2.5, 3. , 4. , 4.5],
[ 228. , 307. , 351. , 170.5, 396. , 452. ]])
As can be seen, the output is now ordered so that the second row is sorted. If you want to keep the original order, use np.argsort on unqID, like so -
In [221]: out[:,unqID.argsort()]
Out[221]:
array([[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2.5, 4. , 2. , 3.5],
[ 1. , 1.5, 3. , 4.5, 2.5, 4. ],
[ 228. , 307. , 170.5, 452. , 351. , 396. ]])
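The steps above can be collected into one self-contained sketch. The helper name dedupe_columns_mean and its key_row / mean_row parameters are made up for illustration; the sketch assumes the key values compare exactly equal (no floating-point tolerance is applied).

```python
import numpy as np

arr = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0.5, 1, 2.5, 4, 2.5, 2, 1, 3.5],
    [1, 1.5, 3, 4.5, 3, 2.5, 1.5, 4],
    [228, 314, 173, 452, 168, 351, 300, 396],
], dtype=float)


def dedupe_columns_mean(a, key_row=1, mean_row=3):
    """Keep the first column per distinct key; average mean_row over repeats.

    Hypothetical helper, not part of NumPy.
    """
    _, first_idx, inverse, counts = np.unique(
        a[key_row], return_index=True, return_inverse=True, return_counts=True
    )
    out = a[:, first_idx].astype(float)  # fancy indexing already returns a copy
    # Sum mean_row per group of equal keys, then divide by the group size.
    out[mean_row] = np.bincount(inverse, weights=a[mean_row]) / counts
    # np.unique sorts by key; reorder by first occurrence to restore the
    # original column order.
    return out[:, np.argsort(first_idx)]


result = dedupe_columns_mean(arr)
print(result)
```

If nearly equal keys (for example 2.5 and 2.5000001) should count as repeats, the key row would need rounding first; that is outside what this sketch does.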

You can find the indices of wanted columns using unique:
>>> indices = np.sort(np.unique(A[1], return_index=True)[1])
Then use simple indexing to get the desired columns:
>>> A[:,indices]
array([[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2.5, 4. , 2. , 3.5],
[ 1. , 1.5, 3. , 4.5, 2.5, 4. ],
[ 228. , 314. , 173. , 452. , 351. , 396. ]])
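The question mentions wanting a True flag for the second or later occurrence of a value. A minimal sketch of that idea, built on the same return_index output, could look like this (the names is_repeat and deduped are illustrative only):

```python
import numpy as np

A = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0.5, 1, 2.5, 4, 2.5, 2, 1, 3.5],
    [1, 1.5, 3, 4.5, 3, 2.5, 1.5, 4],
    [228, 314, 173, 452, 168, 351, 300, 396],
], dtype=float)

# Indices of the first occurrence of each distinct value in row 1.
_, first_idx = np.unique(A[1], return_index=True)

# Start by flagging every column as a repeat, then clear the first occurrences.
is_repeat = np.ones(A.shape[1], dtype=bool)
is_repeat[first_idx] = False

# Boolean indexing keeps the surviving columns in their original order.
deduped = A[:, ~is_repeat]
print(is_repeat)
print(deduped)
```

Because a boolean mask selects columns in place, no extra sort of the indices is needed here.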

This is a typical grouping problem, which can be solved elegantly and efficiently using the numpy_indexed package (disclaimer: I am its author):
import numpy_indexed as npi
unique, final_array = npi.group_by(initial_array[1]).mean(initial_array, axis=1)
Note that there are many other reductions than mean; if you want the original behavior you described, you could replace 'mean' with 'first', for instance.

Related

Insert values between other values in numpy array

Let's say I start with this array:
start_array = np.array([[1.48, 1.79, 2.10, 2.80],
                        [63, 60, 57, 60]])
I want to take the values in this second array:
second_array = np.array([2.3,3.42, 4.47])
and insert them in between the values in the first row, with a 1 in another row to code that something occurred there. The remaining places should be filled with zeros.
Result:
result = np.array([[1.48, 1.79, 2.10, 2.3, 2.80, 3.42, 4.47],
[63., 60., 57., 0, 60., 0, 0],
[0, 0, 0, 1, 0, 1, 1]
])

Here's a numpy based approach:
# flatten start, and searchsorted to see where to insert
start_array_view = start_array.ravel()
ixs = np.searchsorted(start_array_view, second_array) + np.arange(len(second_array))
# construct output array
x,y = start_array.shape
out = np.zeros((x,y+len(ixs)))
# insert values from second array
z_pad = [0]*(len(ixs)*out.shape[0]-len(second_array))
out[:,ixs] = np.r_[second_array,z_pad ].reshape(out.shape[0],-1)
# insert values from start array
ar = np.arange(out.shape[1])
ixs_start = ar[~np.isin(ar, ixs)]
out[:,ixs_start] = start_array
# add indicator row
z = np.zeros(out.shape[1])
z[ixs] = 1
out = np.vstack([out,z])
print(out)
array([[ 1.48, 1.79, 2.1 , 2.3 , 2.8 , 3.42, 4.47],
[63. , 60. , 57. , 0. , 60. , 0. , 0. ],
[ 0. , 0. , 0. , 1. , 0. , 1. , 1. ]])
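As an alternative sketch of the same result, np.searchsorted can be applied to the first row only and the new columns placed with np.insert. This swaps in a different technique from the answer above, and it assumes both the first row of start_array and second_array are sorted in ascending order.

```python
import numpy as np

start_array = np.array([[1.48, 1.79, 2.10, 2.80],
                        [63, 60, 57, 60]], dtype=float)
second_array = np.array([2.3, 3.42, 4.47])

# Column positions (in the original array) before which each new value goes.
pos = np.searchsorted(start_array[0], second_array)

# New columns: the inserted value in row 0, zeros in every other row.
new_cols = np.zeros((start_array.shape[0], second_array.size))
new_cols[0] = second_array
merged = np.insert(start_array, pos, new_cols, axis=1)

# Indicator row: 1 where a value was inserted, 0 elsewhere.
flags = np.insert(np.zeros(start_array.shape[1]), pos, 1.0)

result = np.vstack([merged, flags])
print(result)
```

Searching only the first row avoids relying on the values of the other rows being larger than everything in second_array.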

Combination of rows in numpy.ndarray

I have the following numpy.ndarray
S=np.array([[[ -0.6, -0.2, 0. ],
[-60. , 2. , 0. ],
[ 6. , -20. , 0. ]],
[[ -0.4, -0.8, 0. ],
[-40. , 8. , 0. ],
[ 4. , -80. , 0. ]]])
I want to find all the possible combinations of sum of each row (sum of individual elements of a row except the last column) of S[0,:,:] with each row of S[1,:,:], i.e., my desired result is (order does not matter):
array([[-1, -1],
[-40.6, 7.8],
[3.4, -80.2],
[-60.4, 1.2],
[-100, 10],
[-56, -78],
[5.6, -20.8],
[-34, -12],
[10, -100]])
which is a 9-by-2 array resulting from 9 possible combinations of S[0,:,:] and S[1,:,:]. Although I have used a particular shape of S here, the shape may vary, i.e., for
x,y,z = np.shape(S)
in the above problem, x=2, y=3, and z=3, but these values may vary. Therefore, I am seeking a generalized version.
Your help will be highly appreciated. Thank you for your time!
(Please no for loops if possible. It is pretty trivial then.)

You can use broadcasting like this:
(S[0,:,None, :-1] + S[1,None,:,:-1]).reshape(-1,2)
Output:
array([[ -1. , -1. ],
[ -40.6, 7.8],
[ 3.4, -80.2],
[ -60.4, 1.2],
[-100. , 10. ],
[ -56. , -78. ],
[ 5.6, -20.8],
[ -34. , -12. ],
[ 10. , -100. ]])
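To see why this works and to avoid hard-coding the 2 in the reshape, here is a small sketch that spells out the intermediate shapes. It still assumes x=2 (exactly two blocks to combine); the names left, right and pair_sums are illustrative.

```python
import numpy as np

S = np.array([[[-0.6, -0.2, 0.],
               [-60., 2., 0.],
               [6., -20., 0.]],
              [[-0.4, -0.8, 0.],
               [-40., 8., 0.],
               [4., -80., 0.]]])

x, y, z = S.shape

left = S[0, :, None, :-1]    # shape (y, 1, z-1): rows of S[0] down axis 0
right = S[1, None, :, :-1]   # shape (1, y, z-1): rows of S[1] along axis 1

# Broadcasting pairs every row of S[0] with every row of S[1]: (y, y, z-1).
pair_sums = (left + right).reshape(-1, z - 1)
print(pair_sums.shape)
print(pair_sums)
```

Using z - 1 in the reshape keeps the expression valid when the number of columns changes.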

turning a list of numpy.ndarray to a matrix in order to perform multiplication

I have vectors of this form:
test=np.linspace(0,1,10)
I want to stack them horizontally in order to make a matrix.
The problem is that I define them in a loop, so the first stack is between an empty matrix and the first column vector, which gives the following error:
ValueError: all the input arrays must have same number of dimensions
Bottom line: I have a for loop that creates a vector p1 on every iteration, and I want to add it to a final matrix of the form
[p1 p2 p3 p4], on which I could then do matrix operations such as multiplying by the transpose, etc.

If you've got a list of 1D arrays that you want horizontally stacked, you could convert them all to columns first, but it's probably easier to just vertically stack them and then transpose:
In [6]: vector_list = [np.linspace(0, 1, 10) for _ in range(3)]
In [7]: np.vstack(vector_list).T
Out[7]:
array([[0. , 0. , 0. ],
[0.11111111, 0.11111111, 0.11111111],
[0.22222222, 0.22222222, 0.22222222],
[0.33333333, 0.33333333, 0.33333333],
[0.44444444, 0.44444444, 0.44444444],
[0.55555556, 0.55555556, 0.55555556],
[0.66666667, 0.66666667, 0.66666667],
[0.77777778, 0.77777778, 0.77777778],
[0.88888889, 0.88888889, 0.88888889],
[1. , 1. , 1. ]])

How did you get this dimension error? What does an empty array have to do with it?
A list of arrays of the same length:
In [610]: alist = [np.linspace(0,1,6), np.linspace(10,11,6)]
In [611]: alist
Out[611]:
[array([0. , 0.2, 0.4, 0.6, 0.8, 1. ]),
array([10. , 10.2, 10.4, 10.6, 10.8, 11. ])]
Several ways of making an array from them:
In [612]: np.array(alist)
Out[612]:
array([[ 0. , 0.2, 0.4, 0.6, 0.8, 1. ],
[10. , 10.2, 10.4, 10.6, 10.8, 11. ]])
In [614]: np.stack(alist)
Out[614]:
array([[ 0. , 0.2, 0.4, 0.6, 0.8, 1. ],
[10. , 10.2, 10.4, 10.6, 10.8, 11. ]])
If you want to join them in columns, you can transpose one of the above, or use:
In [615]: np.stack(alist, axis=1)
Out[615]:
array([[ 0. , 10. ],
[ 0.2, 10.2],
[ 0.4, 10.4],
[ 0.6, 10.6],
[ 0.8, 10.8],
[ 1. , 11. ]])
np.column_stack is also handy.
In newer numpy versions you can do:
In [617]: np.linspace((0,10),(1,11),6)
Out[617]:
array([[ 0. , 10. ],
[ 0.2, 10.2],
[ 0.4, 10.4],
[ 0.6, 10.6],
[ 0.8, 10.8],
[ 1. , 11. ]])
You don't specify how you create the 'empty array' and how you attempt to stack. I can't exactly recreate the error message (full traceback would have helped). But given that message did you check the number of dimensions of the inputs? Did they match?
Array stacking in a loop is tricky. You have to pay close attention to the shapes, especially of the initial 'empty' array. There isn't a close analog to the empty list []. np.array([]) is 1d with shape (0,). np.empty((0,6)) is 2d with shape (0,6). Also, all the stacking functions create a new array with each call (none operate in-place), so they are inefficient (compared to list append).
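A minimal sketch of the list-append pattern, with the shapes of the two 'empty' arrays printed for reference. The per-iteration vector p is a made-up stand-in for whatever the loop in the question computes.

```python
import numpy as np

# Shapes of the two kinds of "empty" array mentioned above.
print(np.array([]).shape)      # (0,)
print(np.empty((0, 6)).shape)  # (0, 6)

# Collect the vectors in a plain list and build the matrix once at the end.
columns = []
for k in range(4):
    p = np.linspace(0, 1, 10) * (k + 1)  # hypothetical per-iteration vector
    columns.append(p)

M = np.column_stack(columns)  # each 1D vector becomes one column: shape (10, 4)
gram = M.T @ M                # transpose times the matrix: shape (4, 4)
print(M.shape, gram.shape)
```

Building the array once avoids both the shape mismatch with an initial empty array and the repeated copying of stacking inside the loop.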

Concatenate and sort data in python

I have a few numpy arrays like so:
import numpy as np
a = np.array([[1, 2, 3, 4, 5], [14, 16, 17, 27, 38]])
b = np.array([[1, 2, 3, 4, 5], [.4, .2, .5, .1, .6]])
I'd like to be able to 1. copy these arrays into a new single array and 2. sort the data so that the result is as follows:
data = [[1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [14, .4, 16, .2, 17, .5, 27, .1, 38, .6]]
Or, in other words, I need all columns from the original array to be the same, just in an ascending order. I tried this:
data = np.hstack((a,b))
Which gave me the appended data, but I'm not sure how to sort it. I tried np.sort() but it didn't keep the columns the same. Thanks!

Stack those horizontally (as you already did), then get argsort indices for sorting first row and use those to sort all columns in the stacked array.
Thus, we need to add one more step, like so -
ab = np.hstack((a,b))
out = ab[:,ab[0].argsort()]
Sample run -
In [370]: a
Out[370]:
array([[ 1, 2, 3, 4, 5],
[14, 16, 17, 27, 38]])
In [371]: b
Out[371]:
array([[ 1. , 2. , 3. , 4. , 5. ],
[ 0.4, 0.2, 0.5, 0.1, 0.6]])
In [372]: ab = np.hstack((a,b))
In [373]: print(ab[:,ab[0].argsort()])
[[ 1. 1. 2. 2. 3. 3. 4. 4. 5. 5. ]
[ 14. 0.4 16. 0.2 17. 0.5 27. 0.1 38. 0.6]]
Please note that to keep the order for identical elements, we need to use kind='mergesort' with argsort, as described in the docs.
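A short sketch of the stable variant on the question's data. It uses kind='stable', which NumPy accepts alongside 'mergesort' for argsort; columns with equal keys then keep the order they had after stacking (a first, b second).

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [14, 16, 17, 27, 38]])
b = np.array([[1, 2, 3, 4, 5], [.4, .2, .5, .1, .6]])

ab = np.hstack((a, b))

# A stable sort keeps equal keys in their original relative order.
order = ab[0].argsort(kind='stable')
out = ab[:, order]
print(order)
print(out)
```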

If you like something short.
np.array(list(zip(*sorted(zip(*np.hstack((a,b)))))))
>>> array([[ 1. , 1. , 2. , 2. , 3. , 3. , 4. , 4. , 5. , 5. ],
[ 0.4, 14. , 0.2, 16. , 0.5, 17. , 0.1, 27. , 0.6, 38. ]])
A version that preserves the original order of the second elements:
np.array(list(zip(*sorted(zip(*np.hstack((a,b))), key=lambda x: x[0]))))
>>>array([[ 1. , 1. , 2. , 2. , 3. , 3. , 4. , 4. , 5. , 5. ],
[ 14. , 0.4, 16. , 0.2, 17. , 0.5, 27. , 0.1, 38. ,0.6]])

matlab ismember function in python

Although similar questions have been raised a couple of times, still I cannot make a function similar to the matlab ismember function in Python. In particular, I want to use this function in a loop, and compare in each iteration a whole matrix to an element of another matrix. Where the same value is occurring, I want to print 1 and in any other case 0.
Let's say that I have the following matrices
d = np.reshape(np.array([ 2.25, 1.25, 1.5 , 1. , 0. , 1.25, 1.75, 0. , 1.5 , 0. ]),(1,10))
d_unique = np.unique(d)
then I have
d_unique
array([ 0. , 1. , 1.25, 1.5 , 1.75, 2.25])
Now I want to iterate like
J = np.zeros(np.size(d_unique))
for i in range(len(d_unique)):
    J[i] = np.sum(ismember(d,d_unique[i]))
so as to take as an output:
J = [3,1,2,2,1,1]
Does anybody have any idea? Many thanks in advance.

In contrast to other answers, numpy has the built-in numpy.in1d for doing that.
Usage in your case:
bool_array = numpy.in1d(array1, array2)
Note: It also accepts lists as inputs.
EDIT (2021):
NumPy now recommends using np.isin instead of np.in1d. np.isin preserves the shape of the input array, while np.in1d returns a flattened output.
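As a sketch of how np.isin could stand in for ismember in the question's loop (the names mask and J are illustrative):

```python
import numpy as np

d = np.reshape(np.array([2.25, 1.25, 1.5, 1., 0., 1.25, 1.75, 0., 1.5, 0.]), (1, 10))
d_unique = np.unique(d)

# np.isin keeps the (1, 10) shape of d in its boolean result.
mask = np.isin(d, [0., 1.5])
print(mask.shape)

# Count how often each unique value occurs, as in the question's loop.
J = np.array([np.isin(d, value).sum() for value in d_unique])
print(J)
```

The comparison is exact, so values that differ only by floating-point rounding would not be treated as members.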

To answer your question, I guess you could define an ismember similar to:
def ismember(d, k):
    return [1 if (i == k) else 0 for i in d]
But I am not familiar with numpy, so a little adjustment may be in order.
I guess you could also use Counter from collections:
>>> from collections import Counter
>>> a = [2.25, 1.25, 1.5, 1., 0., 1.25, 1.75, 0., 1.5, 0. ]
>>> Counter(a)
Counter({0.0: 3, 1.25: 2, 1.5: 2, 2.25: 1, 1.0: 1, 1.75: 1})
>>> Counter(a).keys()
[2.25, 1.25, 0.0, 1.0, 1.5, 1.75]
>>> c =Counter(a)
>>> [c[i] for i in sorted(c.keys())]
[3, 1, 2, 2, 1, 1]
Once again, not numpy, you will probably have to do some list(d) somewhere.

Try the following function:
def ismember(A, B):
    return [ np.sum(a == B) for a in A ]
This should behave very much like the corresponding MATLAB function.
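For the counting task in the question specifically, the loop can be avoided. The sketch below shows two loop-free options, np.unique with return_counts and a broadcast comparison; the names J_counts and J_broadcast are illustrative.

```python
import numpy as np

d = np.reshape(np.array([2.25, 1.25, 1.5, 1., 0., 1.25, 1.75, 0., 1.5, 0.]), (1, 10))

# Option 1: np.unique can return the counts together with the unique values.
d_unique, J_counts = np.unique(d, return_counts=True)

# Option 2: compare every element of d against every unique value at once.
# The comparison has shape (10, 6); summing over axis 0 counts the matches.
J_broadcast = (d.ravel()[:, None] == d_unique).sum(axis=0)

print(d_unique)
print(J_counts)
print(J_broadcast)
```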

Try the ismember library from pypi.
pip install ismember
Example:
# Import libraries
import numpy as np
from ismember import ismember
# data (NumPy arrays, shaped as in the question, so that the indexing below works)
d = np.array([[ 2.25, 1.25, 1.5 , 1. , 0. , 1.25, 1.75, 0. , 1.5 , 0. ]])
d_unique = np.array([ 0. , 1. , 1.25, 1.5 , 1.75, 2.25])
# Lookup
Iloc,idx = ismember(d, d_unique)
# Iloc is a boolean array marking which elements of d exist in d_unique
print(Iloc)
# [[True True True True True True True True True True]]
# indexes of d_unique that exist in d
print(idx)
# array([5, 2, 3, 1, 0, 2, 4, 0, 3, 0], dtype=int64)
print(d_unique[idx])
# array([2.25, 1.25, 1.5 , 1. , 0. , 1.25, 1.75, 0. , 1.5 , 0. ])
print(d[Iloc])
# array([2.25, 1.25, 1.5 , 1. , 0. , 1.25, 1.75, 0. , 1.5 , 0. ])
# These vectors will match
d[Iloc]==d_unique[idx]
