How to select first rows when column changes value in numpy.array - python

Is there a nice quick way to do the following selection for numpy arrays?
>>> A=np.array([[1,2], [2,2], [3,5], [4,5]])
>>> A
array([[1, 2],
[2, 2],
[3, 5],
[4, 5]])
I would like to select the first rows when the second column changes value. For the above array, the result would be:
array([[1, 2],
[3, 5]])

>>> xs = np.array([[1,2], [2,2], [3,5], [4,5]])
>>> j = scipy.r_[True, xs[:-1,1] != xs[1:,1]] # or np.concatenate here
>>> xs[j,:]
array([[1, 2],
[3, 5]])

Related

How to reshape numpy array with reorder?

I have a 1 x 2 x 3 array:
>>> a = np.array([[[1,2,3],[4,5,6]]])
>>> a
array([[[1, 2, 3],
[4, 5, 6]]])
>>> a.shape
(1, 2, 3)
I want to reshape it to (3,1,2), but so that the elements along original dim 3 are now along dim 1. I want the result to look like this:
>>> new_a
array([[[1, 4]],
[[2, 5]],
[[3, 6]]])
and when I just use reshape, I get the the right shape, but the elements are in the same order, not what I want:
>>> a.reshape((3,1,2))
array([[[1, 2]],
[[3, 4]],
[[5, 6]]])
How can I achieve this?
Simply use np.transpose -
a.transpose(2,0,1)
Sample run -
In [347]: a
Out[347]:
array([[[1, 2, 3],
[4, 5, 6]]])
In [348]: a.transpose(2,0,1)
Out[348]:
array([[[1, 4]],
[[2, 5]],
[[3, 6]]])
Alternatively :
With np.moveaxis -
np.moveaxis(a,2,0)
With np.rollaxis -
np.rollaxis(a,2,0)
There are a few ways, but transpose() is probably the easiest:
array.transpose(2,0,1)
import einops
einops.rearrange(x, 'x y z -> z x y')
And better use some meaningful axes names instead of x, y, z (like width, height, etc.)

What does x=x[class_id] do when used on NumPy arrays

I am learning Python and solving a machine learning problem.
class_ids=np.arange(self.x.shape[0])
np.random.shuffle(class_ids)
self.x=self.x[class_ids]
This is a shuffle function in NumPy but I can't understand what self.x=self.x[class_ids] means. because I think it gives the value of the array to a variable.
It's a very complicated way to shuffle the first dimension of your self.x. For example:
>>> x = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
>>> x
array([[1, 1],
[2, 2],
[3, 3],
[4, 4],
[5, 5]])
Then using the mentioned approach
>>> class_ids=np.arange(x.shape[0]) # create an array [0, 1, 2, 3, 4]
>>> np.random.shuffle(class_ids) # shuffle the array
>>> x[class_ids] # use integer array indexing to shuffle x
array([[5, 5],
[3, 3],
[1, 1],
[4, 4],
[2, 2]])
Note that the same could be achieved just by using np.random.shuffle because the docstring explicitly mentions:
This function only shuffles the array along the first axis of a multi-dimensional array. The order of sub-arrays is changed but their contents remains the same.
>>> np.random.shuffle(x)
>>> x
array([[5, 5],
[3, 3],
[1, 1],
[2, 2],
[4, 4]])
or by using np.random.permutation:
>>> class_ids = np.random.permutation(x.shape[0]) # shuffle the first dimensions indices
>>> x[class_ids]
array([[2, 2],
[4, 4],
[3, 3],
[5, 5],
[1, 1]])
Assuming self.x is a numpy array:
class_ids is a 1-d numpy array that is being used as an integer array index in the expression: x[class_ids]. Because the previous line shuffled class_ids, x[class_ids] evaluates to self.x shuffled by rows.
The assignment self.x=self.x[class_ids] assigns the shuffled array to self.x

Numpy merge two arrays can i make a numpy array like this?

I am trying two array like this. It's different from column_stack so I am not able to find how to do it from documentation or google search.
I have arrays a and b. How can I make c from them ?
a = [[1, 2],[3, 4]]
b = [[5 , 6]]
c = [[[1, 2],[5]],
[3, 4],[6]]]
I need this to input the values to theanets.
In [54]:
a = np.array([[1, 2],[3, 4]])
a
Out[54]:
array([[1, 2],
[3, 4]])
In [55]:
b = np.array([[5 , 6]])
b
Out[55]:
array([[5, 6]])
In [96]:
c = [[a[n].tolist() , b[:,n].tolist()] for n in range(len(a))]
c
Out[96]:
[[[1, 2], [5]], [[3, 4], [6]]]

Updating a NumPy array with another

Seemingly simple question: I have an array with two columns, the first represents an ID and the second a count. I'd like to update it with another, similar array such that
import numpy as np
a = np.array([[1, 2],
[2, 2],
[3, 1],
[4, 5]])
b = np.array([[2, 2],
[3, 1],
[4, 0],
[5, 3]])
a.update(b) # ????
>>> np.array([[1, 2],
[2, 4],
[3, 2],
[4, 5],
[5, 3]])
Is there a way to do this with indexing/slicing such that I don't simply have to iterate over each row?
Generic case
Approach #1: You can use np.add.at to do such an ID-based adding operation like so -
# First column of output array as the union of first columns of a,b
out_id = np.union1d(a[:,0],b[:,0])
# Initialize second column of output array
out_count = np.zeros_like(out_id)
# Find indices where the first columns of a,b are placed in out_id
_,a_idx = np.where(a[:,None,0]==out_id)
_,b_idx = np.where(b[:,None,0]==out_id)
# Place second column of a into out_id & add in second column of b
out_count[a_idx] = a[:,1]
np.add.at(out_count, b_idx,b[:,1])
# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
To find a_idx and b_idx, as probably a faster alternative, np.searchsorted could be used like so -
a_idx = np.searchsorted(out_id, a[:,0], side='left')
b_idx = np.searchsorted(out_id, b[:,0], side='left')
Sample input-output :
In [538]: a
Out[538]:
array([[1, 2],
[4, 2],
[3, 1],
[5, 5]])
In [539]: b
Out[539]:
array([[3, 7],
[1, 1],
[4, 0],
[2, 3],
[6, 2]])
In [540]: out
Out[540]:
array([[1, 3],
[2, 3],
[3, 8],
[4, 2],
[5, 5],
[6, 2]])
Approach #2: You can use np.bincount to do the same ID based adding -
# First column of output array as the union of first columns of a,b
out_id = np.union1d(a[:,0],b[:,0])
# Get all IDs and counts in a single arrays
id_arr = np.concatenate((a[:,0],b[:,0]))
count_arr = np.concatenate((a[:,1],b[:,1]))
# Get binned summations
summed_vals = np.bincount(id_arr,count_arr)
# Get mask of valid bins
mask = np.in1d(np.arange(np.max(out_id)+1),out_id)
# Mask valid summed bins for final counts array output
out_count = summed_vals[mask]
# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
Specific case
If the ID columns in a and b are sorted, it becomes easier, as we can just use masks with np.in1d to index into the output ID array created with np.union like so -
# First column of output array as the union of first columns of a,b
out_id = np.union1d(a[:,0],b[:,0])
# Masks of first columns of a and b matches in the output ID array
mask1 = np.in1d(out_id,a[:,0])
mask2 = np.in1d(out_id,b[:,0])
# Initialize second column of output array
out_count = np.zeros_like(out_id)
# Place second column of a into out_id & add in second column of b
out_count[mask1] = a[:,1]
np.add.at(out_count, np.where(mask2)[0],b[:,1])
# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
Sample run -
In [552]: a
Out[552]:
array([[1, 2],
[2, 2],
[3, 1],
[4, 5],
[8, 5]])
In [553]: b
Out[553]:
array([[2, 2],
[3, 1],
[4, 0],
[5, 3],
[6, 2],
[8, 2]])
In [554]: out
Out[554]:
array([[1, 2],
[2, 4],
[3, 2],
[4, 5],
[5, 3],
[6, 2],
[8, 7]])
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
array([[1, 2],
[2, 2],
[3, 1],
[4, 5],
[5, 3]])
Note that if you want the result become sorted you can use np.lexsort :
result[np.lexsort((result[:,0],result[:,0]))]
Explanation :
First you can find the unique ids with following command :
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> col
array([1, 2, 3, 4, 5])
Then find the different between the ids if a and all of ids :
>>> dif=np.setdiff1d(col,a[:,0])
>>> dif
array([5])
Then find the items within b with the ids in diff :
>>> val=b[np.in1d(b[:,0],dif)]
>>> val
array([[5, 3]])
And at last concatenate the result with list a:
>>> np.concatenate((a,val))
consider another example with sorting :
>>> a = np.array([[1, 2],
... [2, 2],
... [3, 1],
... [7, 5]])
>>>
>>> b = np.array([[2, 2],
... [3, 1],
... [4, 0],
... [5, 3]])
>>>
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
>>> result[np.lexsort((result[:,0],result[:,0]))]
array([[1, 2],
[2, 2],
[3, 1],
[4, 0],
[5, 3],
[7, 5]])
That's an old question but here is a solution with pandas (that could be generalized for other aggregation functions than sum). Also sorting will occur automatically:
import pandas as pd
import numpy as np
a = np.array([[1, 2],
[2, 2],
[3, 1],
[4, 5]])
b = np.array([[2, 2],
[3, 1],
[4, 0],
[5, 3]])
print((pd.DataFrame(a[:, 1], index=a[:, 0])
.add(pd.DataFrame(b[:, 1], index=b[:, 0]), fill_value=0)
.astype(int))
.reset_index()
.to_numpy())
Output:
[[1 2]
[2 4]
[3 2]
[4 5]
[5 3]]

Combining an array using Python and NumPy

I have two arrays of the form:
a = np.array([1,2,3])
b = np.array([4,5,6])
Is there a NumPy function which I can apply to these arrays to get the followng output?
[[1,4],[2,5][3,6]]
np.vstack((a,b)).T
returns
array([[1, 4],
[2, 5],
[3, 6]])
and
np.vstack((a,b)).T.tolist()
returns exactly what you need:
[[1, 4], [2, 5], [3, 6]]

Categories