Let's say I have two NumPy arrays:
import numpy as np
x = np.array([1,2,3])
y = np.array([1,2,3,4])
From these, I want to create a 2-dimensional array that pairs every element of x with every element of y, as below.
Is there any method available to directly achieve this?
Your problem is about computing the Cartesian product. In NumPy, you can build it using repeat and tile:
out = np.c_[np.repeat(x, len(y)), np.tile(y, len(x))]
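To see why this works, it may help to look at the two intermediate arrays. A minimal sketch using the x and y from the question (the names left and right are only for illustration):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3, 4])

# repeat holds each element of x for len(y) consecutive positions
left = np.repeat(x, len(y))
print(left.tolist())    # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]

# tile cycles through the whole of y, len(x) times
right = np.tile(y, len(x))
print(right.tolist())   # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]

# np.c_ places the two 1-D arrays side by side as columns
out = np.c_[left, right]
print(out.shape)        # (12, 2)
```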
Python's built-in itertools module has a function designed for this, product:
from itertools import product
out = np.array(list(product(x,y)))
Output:
array([[1, 1],
[1, 2],
[1, 3],
[1, 4],
[2, 1],
[2, 2],
[2, 3],
[2, 4],
[3, 1],
[3, 2],
[3, 3],
[3, 4]])
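A third option, not mentioned above, is np.meshgrid. The following is only a sketch under the assumption that you want the same row order as the output shown (x varying slowest), which is what indexing="ij" gives:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3, 4])

# With indexing="ij", xx[i, j] == x[i] and yy[i, j] == y[j]; both have shape (3, 4)
xx, yy = np.meshgrid(x, y, indexing="ij")

# Flatten both grids in row-major order and pair them up as two columns
out = np.stack((xx.ravel(), yy.ravel()), axis=1)
print(out.shape)  # (12, 2)
```

This stays entirely in NumPy, so unlike the itertools version it does not build an intermediate Python list.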
I've tried and searched for a few days. I've come closer, but I need your help.
I have a 3D array in Python:
shape(files)
>> (31,2049,2)
which corresponds to 31 input files, each with 2 columns of data in 2048 rows plus a header row.
I'd like to sort this array based on the header, which is a number, in each file.
I tried to follow "NumPy: sorting 3D array but keeping 2nd dimension assigned to first", but I'm incredibly confused.
First I tried to get my headers for the argsort. I thought I could do
sortval=files[:][0][0]
but this does not work.
Then I simply used a for loop to iterate and collect my headers:
sortval = []
for i in xrange(shape(files)[0]):
    sortval.append(files[i][0][0])
Then
sortedIdx = np.argsort(sortval)
This works; however, I don't understand what's happening in the last line:
files = files[np.arange(len(deck))[:,np.newaxis],sortedIdx]
Help would be appreciated.
Another way to do this is with np.take:
header = a[:,0,0]
# named sorted_a to avoid shadowing the built-in sorted()
sorted_a = np.take(a, np.argsort(header), axis=0)
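As a quick check of the np.take approach, here is a minimal sketch on a small hand-written 3x3x2 array, where a[i, 0, 0] plays the role of the header of block i (the data is made up for illustration):

```python
import numpy as np

a = np.array([[[3, 1], [3, 7], [0, 3]],
              [[2, 9], [1, 0], [9, 2]],
              [[9, 2], [8, 8], [8, 0]]])

header = a[:, 0, 0]                    # one header per 2D block: 3, 2, 9
order = np.argsort(header)             # positions that sort the headers: 1, 0, 2

# Reorder whole 2D blocks along the first axis
sorted_a = np.take(a, order, axis=0)

# Plain integer-array indexing on the first axis gives the same result
assert np.array_equal(sorted_a, a[order])
print(sorted_a[:, 0, 0].tolist())      # [2, 3, 9]
```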
We can use a simple example to demonstrate what your code is doing.
First we create a random 3D NumPy array:
a = (np.random.rand(3,3,2)*10).astype(int)
array([[[3, 1],
        [3, 7],
        [0, 3]],

       [[2, 9],
        [1, 0],
        [9, 2]],

       [[9, 2],
        [8, 8],
        [8, 0]]])
a[:] simply returns a itself, so a[:][0] is the first 2D array in a, and a[:][0][0] is just the first row of that 2D array:
a[:][0]
# array([[3, 1],
# [3, 7],
# [0, 3]])
a[:][0][0]
# array([3, 1])
What you want are the headers, which are 3, 2 and 9 in this example, so we can use a[:, 0, 0] to extract them:
a[:,0,0]
# array([3, 2, 9])
Now we argsort the headers to get an index array:
np.argsort(a[:,0,0])
# array([1, 0, 2])
To rearrange the entire 3D array, we need to index it in the correct order. np.arange(len(a))[:,np.newaxis] is equivalent to np.arange(len(a)).reshape(-1,1), which creates a sequential 2D (column) index array:
np.arange(len(a))[:,np.newaxis]
# array([[0],
# [1],
# [2]])
Without the 2D index array, the two index arrays are paired element by element and the result has only 2 dimensions:
a[np.arange(3), np.argsort(a[:,0,0])]
# array([[3, 7],
# [2, 9],
# [8, 0]])
With the 2D index array, the two index arrays broadcast against each other, so the indexing keeps the 3D shape:
a[np.arange(3).reshape(-1,1), np.argsort(a[:,0,0])]
array([[[3, 7],
        [3, 1],
        [0, 3]],

       [[1, 0],
        [2, 9],
        [9, 2]],

       [[8, 8],
        [9, 2],
        [8, 0]]])
The above is the final result you want.
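The shapes involved can be checked directly. The sketch below reuses the sample array from this answer and shows that the (3, 1) and (3,) index arrays broadcast to (3, 3), which means this form reorders the rows inside each 2D block while leaving the blocks themselves in place (rows, order and picked are names chosen here):

```python
import numpy as np

a = np.array([[[3, 1], [3, 7], [0, 3]],
              [[2, 9], [1, 0], [9, 2]],
              [[9, 2], [8, 8], [8, 0]]])

order = np.argsort(a[:, 0, 0])            # shape (3,)
rows = np.arange(len(a))[:, np.newaxis]   # shape (3, 1)

# The two index arrays broadcast against each other
print(np.broadcast(rows, order).shape)    # (3, 3)

# picked[i, j] is a[i, order[j]]; the last axis is carried along untouched
picked = a[rows, order]

# The same selection written with a slice on the first axis
assert np.array_equal(picked, a[:, order])
print(picked.shape)                       # (3, 3, 2)
```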
Edit:
To rearrange the 2D arrays themselves (that is, to reorder along the first axis), one could use:
a[np.argsort(a[:,0,0])]
array([[[2, 9],
        [1, 0],
        [9, 2]],

       [[3, 1],
        [3, 7],
        [0, 3]],

       [[9, 2],
        [8, 8],
        [8, 0]]])
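Applied to an array shaped like the one in the question, this might look as follows. This is only a sketch: the data is random, the headers are invented, and it assumes the header of each file sits at position [i, 0, 0]. kind="stable" is optional and only matters if two files share a header value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the asker's data:
# 31 files, 1 header row + 2048 data rows, 2 columns
files = rng.random((31, 2049, 2))
files[:, 0, 0] = rng.permutation(31)       # made-up numeric headers, one per file

# Sort whole files by their header; a stable sort keeps ties in original order
order = np.argsort(files[:, 0, 0], kind="stable")
sorted_files = files[order]

print(sorted_files.shape)                  # (31, 2049, 2)
```

Each file's rows stay attached to its header, because the index array is applied to the first axis only.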
Seemingly simple question: I have an array with two columns, the first represents an ID and the second a count. I'd like to update it with another, similar array such that
import numpy as np
a = np.array([[1, 2],
[2, 2],
[3, 1],
[4, 5]])
b = np.array([[2, 2],
[3, 1],
[4, 0],
[5, 3]])
a.update(b) # ????
>>> np.array([[1, 2],
[2, 4],
[3, 2],
[4, 5],
[5, 3]])
Is there a way to do this with indexing/slicing such that I don't simply have to iterate over each row?
Generic case
Approach #1: You can use np.add.at to do such an ID-based adding operation like so -
# First column of output array as the union of first columns of a,b
out_id = np.union1d(a[:,0],b[:,0])
# Initialize second column of output array
out_count = np.zeros_like(out_id)
# Find indices where the first columns of a,b are placed in out_id
_,a_idx = np.where(a[:,None,0]==out_id)
_,b_idx = np.where(b[:,None,0]==out_id)
# Place second column of a into out_id & add in second column of b
out_count[a_idx] = a[:,1]
np.add.at(out_count, b_idx,b[:,1])
# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
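The reason for np.add.at rather than a plain out_count[b_idx] += b[:,1] is that the in-place form applies only one update per distinct index, which would matter if an ID appeared more than once in b. A small sketch with made-up values:

```python
import numpy as np

idx = np.array([0, 0, 2])        # index 0 appears twice
vals = np.array([5, 5, 3])

buffered = np.zeros(3, dtype=int)
buffered[idx] += vals            # the repeated index is only updated once

unbuffered = np.zeros(3, dtype=int)
np.add.at(unbuffered, idx, vals) # every occurrence is accumulated

print(buffered.tolist())         # [5, 0, 3]
print(unbuffered.tolist())       # [10, 0, 3]
```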
To find a_idx and b_idx, np.searchsorted could be used as a probably faster alternative, like so -
a_idx = np.searchsorted(out_id, a[:,0], side='left')
b_idx = np.searchsorted(out_id, b[:,0], side='left')
Sample input-output :
In [538]: a
Out[538]:
array([[1, 2],
[4, 2],
[3, 1],
[5, 5]])
In [539]: b
Out[539]:
array([[3, 7],
[1, 1],
[4, 0],
[2, 3],
[6, 2]])
In [540]: out
Out[540]:
array([[1, 3],
[2, 3],
[3, 8],
[4, 2],
[5, 5],
[6, 2]])
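For reference, the searchsorted variant can be put together end to end on the sample input above. This is a sketch that assumes, as in Approach #1, that each ID occurs at most once in a:

```python
import numpy as np

a = np.array([[1, 2], [4, 2], [3, 1], [5, 5]])
b = np.array([[3, 7], [1, 1], [4, 0], [2, 3], [6, 2]])

# union1d returns the sorted unique IDs, which is what searchsorted requires
out_id = np.union1d(a[:, 0], b[:, 0])

# Every ID in a and b is present in out_id, so searchsorted returns its exact position
a_idx = np.searchsorted(out_id, a[:, 0], side='left')
b_idx = np.searchsorted(out_id, b[:, 0], side='left')

out_count = np.zeros_like(out_id)
out_count[a_idx] = a[:, 1]                 # place the counts from a
np.add.at(out_count, b_idx, b[:, 1])       # accumulate the counts from b

out = np.column_stack((out_id, out_count))
print(out.tolist())
# [[1, 3], [2, 3], [3, 8], [4, 2], [5, 5], [6, 2]]
```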
Approach #2: You can use np.bincount to do the same ID based adding -
# First column of output array as the union of first columns of a,b
out_id = np.union1d(a[:,0],b[:,0])
# Get all IDs and counts in single arrays
id_arr = np.concatenate((a[:,0],b[:,0]))
count_arr = np.concatenate((a[:,1],b[:,1]))
# Get binned summations
summed_vals = np.bincount(id_arr,count_arr)
# Get mask of valid bins
mask = np.in1d(np.arange(np.max(out_id)+1),out_id)
# Mask valid summed bins for final counts array output
out_count = summed_vals[mask]
# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
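Two things are worth knowing about this approach: np.bincount needs non-negative integer IDs, and when weights are given it returns floats, so out ends up as a float array. A sketch on the arrays from the question that casts the counts back to the input dtype (indexing summed_vals with out_id here is an equivalent shortcut for the mask):

```python
import numpy as np

a = np.array([[1, 2], [2, 2], [3, 1], [4, 5]])
b = np.array([[2, 2], [3, 1], [4, 0], [5, 3]])

out_id = np.union1d(a[:, 0], b[:, 0])
id_arr = np.concatenate((a[:, 0], b[:, 0]))
count_arr = np.concatenate((a[:, 1], b[:, 1]))

# One bin per integer from 0 to max(id_arr); weighted sums come back as floats
summed_vals = np.bincount(id_arr, weights=count_arr)
print(summed_vals.dtype)                   # float64

# Keep only the bins of IDs that actually occur and restore the integer dtype
out_count = summed_vals[out_id].astype(a.dtype)

out = np.column_stack((out_id, out_count))
print(out.tolist())
# [[1, 2], [2, 4], [3, 2], [4, 5], [5, 3]]
```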
Specific case
If the ID columns in a and b are sorted, it becomes easier, as we can just use masks with np.in1d to index into the output ID array created with np.union1d like so -
# First column of output array as the union of first columns of a,b
out_id = np.union1d(a[:,0],b[:,0])
# Masks of first columns of a and b matches in the output ID array
mask1 = np.in1d(out_id,a[:,0])
mask2 = np.in1d(out_id,b[:,0])
# Initialize second column of output array
out_count = np.zeros_like(out_id)
# Place second column of a into out_id & add in second column of b
out_count[mask1] = a[:,1]
np.add.at(out_count, np.where(mask2)[0],b[:,1])
# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
Sample run -
In [552]: a
Out[552]:
array([[1, 2],
[2, 2],
[3, 1],
[4, 5],
[8, 5]])
In [553]: b
Out[553]:
array([[2, 2],
[3, 1],
[4, 0],
[5, 3],
[6, 2],
[8, 2]])
In [554]: out
Out[554]:
array([[1, 2],
[2, 4],
[3, 2],
[4, 5],
[5, 3],
[6, 2],
[8, 7]])
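A variation that is not part of the approaches above, offered only as a sketch: np.unique with return_inverse=True maps every ID to a dense position, so np.bincount can sum the counts without a mask and without requiring sorted, small or non-negative IDs. The name stacked is chosen here for illustration.

```python
import numpy as np

a = np.array([[1, 2], [2, 2], [3, 1], [4, 5], [8, 5]])
b = np.array([[2, 2], [3, 1], [4, 0], [5, 3], [6, 2], [8, 2]])

stacked = np.concatenate((a, b))           # all (ID, count) rows together

# inverse[i] is the position of stacked[i, 0] within the sorted unique IDs
out_id, inverse = np.unique(stacked[:, 0], return_inverse=True)

# Sum the counts per position; the bins are dense, so no mask is needed
out_count = np.bincount(inverse, weights=stacked[:, 1]).astype(stacked.dtype)

out = np.column_stack((out_id, out_count))
print(out.tolist())
# [[1, 2], [2, 4], [3, 2], [4, 5], [5, 3], [6, 2], [8, 7]]
```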
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
array([[1, 2],
[2, 2],
[3, 1],
[4, 5],
[5, 3]])
Note that if you want the result to be sorted, you can use np.lexsort:
result[np.lexsort((result[:,0],result[:,0]))]
Explanation:
First, you can find the unique ids with the following command:
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> col
array([1, 2, 3, 4, 5])
Then find the difference between all of the ids and the ids of a:
>>> dif=np.setdiff1d(col,a[:,0])
>>> dif
array([5])
Then find the rows of b whose ids are in dif:
>>> val=b[np.in1d(b[:,0],dif)]
>>> val
array([[5, 3]])
Finally, concatenate array a with the result:
>>> np.concatenate((a,val))
Consider another example with sorting:
>>> a = np.array([[1, 2],
... [2, 2],
... [3, 1],
... [7, 5]])
>>>
>>> b = np.array([[2, 2],
... [3, 1],
... [4, 0],
... [5, 3]])
>>>
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
>>> result[np.lexsort((result[:,0],result[:,0]))]
array([[1, 2],
[2, 2],
[3, 1],
[4, 0],
[5, 3],
[7, 5]])
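Since both lexsort keys in this answer are the same column, sorting on that column alone with np.argsort should give the same order. A sketch using the unsorted result of the second example (the rows of a followed by the rows of b with ids 4 and 5):

```python
import numpy as np

# Unsorted result of np.concatenate((a, val)) in the second example
result = np.array([[1, 2], [2, 2], [3, 1], [7, 5], [4, 0], [5, 3]])

by_lexsort = result[np.lexsort((result[:, 0], result[:, 0]))]
by_argsort = result[np.argsort(result[:, 0])]   # sort rows by the id column only

assert np.array_equal(by_lexsort, by_argsort)
print(by_argsort.tolist())
# [[1, 2], [2, 2], [3, 1], [4, 0], [5, 3], [7, 5]]
```

Note that this answer keeps the counts from a and only appends the rows of b whose ids are missing from a; it does not add counts together.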
That's an old question, but here is a solution with pandas (which could be generalized to aggregation functions other than sum). Sorting also happens automatically:
import pandas as pd
import numpy as np
a = np.array([[1, 2],
[2, 2],
[3, 1],
[4, 5]])
b = np.array([[2, 2],
[3, 1],
[4, 0],
[5, 3]])
print((pd.DataFrame(a[:, 1], index=a[:, 0])
.add(pd.DataFrame(b[:, 1], index=b[:, 0]), fill_value=0)
.astype(int))
.reset_index()
.to_numpy())
Output:
[[1 2]
[2 4]
[3 2]
[4 5]
[5 3]]
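One way the generalization to other aggregation functions might look is to stack both arrays and group by ID. This is a sketch rather than the method above; the column names "id" and "count" are chosen here for illustration.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2], [2, 2], [3, 1], [4, 5]])
b = np.array([[2, 2], [3, 1], [4, 0], [5, 3]])

# "id" and "count" are hypothetical column names for this sketch
df = pd.DataFrame(np.vstack((a, b)), columns=["id", "count"])

# groupby sorts by the group key by default, so the IDs come out in order
summed = df.groupby("id", as_index=False)["count"].sum().to_numpy()
largest = df.groupby("id", as_index=False)["count"].max().to_numpy()

print(summed.tolist())    # [[1, 2], [2, 4], [3, 2], [4, 5], [5, 3]]
print(largest.tolist())   # [[1, 2], [2, 2], [3, 1], [4, 5], [5, 3]]
```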