How to merge two 3D arrays along the 2nd dimension efficiently? - Python

Let's say I have two 3-dimensional arrays (a & b) of shape (1,000,000, ???, 50) (for ???, see below).
How can I merge them
so that the result has shape (1,000,000, {a's second dimension + b's second dimension}, 50)?
Here are the samples (np.arrays are also possible):
EDIT: added usable code, please scroll down.
[ #a
[
],
[
[1 2 3]
],
[
[0 2 7]
[1 NaN 3]
],
[
[10 0 3]
[NaN 9 9]
[10 NaN 3]
],
[
[8 2 0]
[2 2 3]
[8 1 3]
[1 2 3]
],
[
[0 2 3]
[1 2 9]
[1 2 3]
[1 0 3]
[1 2 3]
]
]
[#b
[
[7 2 3]
[1 2 9]
[1 2 3]
[8 0 3]
[1 7 3]
],
[
[3 9 0]
[2 2 3]
[8 1 3]
[0 2 3]
],
[
[10 0 3]
[0 NaN 9]
[10 NaN 3]
],
[
[0 2 NaN]
[1 NaN 3]
],
[
[1 2 NaN]
],
[
]
]
import numpy as np
a = [ [ ],
[ [1, 2, 3] ],
[ [0, 2, 7], [1,np.nan,3] ],
[
[10,0,3], [np.nan,9,9], [10,np.nan,3]
],
[
[8,2,0], [2,2,3], [8,1,3], [1,2,3]
],
[
[0,2,3], [1,2,9], [1,2,3], [1,0,3], [1,2,3]
]
]
b = [
[
[7,2,3], [1,2,9], [1,2,3], [8,0,3], [1,7,3]
],
[
[3,9,0], [2,2,3], [8,1,3], [0,2,3]
],
[
[10,0,3], [0,np.nan,9], [10,np.nan,3]
],
[
[0,2,np.nan], [1,np.nan,3]
],
[
[1,2,np.nan]
],
[
]
]
Expected outcome:
[
[ [7 2 3]# from b
[1 2 9]# from b
[1 2 3]# from b
[8 0 3]# from b
[1 7 3]# from b
],
[
[1 2 3]
[3 9 0]# from b
[2 2 3]# from b
[8 1 3]# from b
[0 2 3]# from b
],
[
[0 2 7]
[1 NaN 3]
[10 0 3]# from b
[0 NaN 9]# from b
[10 NaN 3]# from b
],
[
[10 0 3]
[NaN 9 9]
[10 NaN 3]
[0 2 NaN]# from b
[1 NaN 3]# from b
],
[
[8 2 0]
[2 2 3]
[8 1 3]
[1 2 3]
[1 2 NaN]# from b
],
[
[0 2 3]
[1 2 9]
[1 2 3]
[1 0 3]
[1 2 3]
]
]
Do you know a way to do that efficiently?
EDIT: tried np.concatenate (didn't work):
import numpy as np
import pandas as pd
DF_LEN, COL_LEN, cols = 20,5,['A', 'B']
a = np.asarray(pd.DataFrame(1, index=range(DF_LEN), columns=cols))
a = list((map(lambda i: a[:i], range(1,a.shape[0]+1))))
b = np.asarray(pd.DataFrame(np.nan, index=range(DF_LEN), columns=cols))
b = list((map(lambda i: b[:i], range(1,b.shape[0]+1))))
b = b[::-1]
a_first = a[0]; del a[0]
b_last = b[-1]; del b[-1]
result = np.concatenate([a, b], axis=1)
>>>AxisError: axis 1 is out of bounds for array of dimension 1

You cannot have an array with a variable length along one dimension. a and b are most likely lists of lists, not arrays. You can use a list comprehension along with zip:
np.array([x+y for x,y in zip(a,b)])
EDIT: or, based on the comment, if a and b are lists of arrays:
np.array([np.vstack((x,y)) for x,y in zip(a,b)])
The output for your example looks like:
[[[ 7.  2.  3.]
  [ 1.  2.  9.]
  [ 1.  2.  3.]
  [ 8.  0.  3.]
  [ 1.  7.  3.]]
 [[ 1.  2.  3.]
  [ 3.  9.  0.]
  [ 2.  2.  3.]
  [ 8.  1.  3.]
  [ 0.  2.  3.]]
 [[ 0.  2.  7.]
  [ 1. nan  3.]
  [10.  0.  3.]
  [ 0. nan  9.]
  [10. nan  3.]]
 [[10.  0.  3.]
  [nan  9.  9.]
  [10. nan  3.]
  [ 0.  2. nan]
  [ 1. nan  3.]]
 [[ 8.  2.  0.]
  [ 2.  2.  3.]
  [ 8.  1.  3.]
  [ 1.  2.  3.]
  [ 1.  2. nan]]
 [[ 0.  2.  3.]
  [ 1.  2.  9.]
  [ 1.  2.  3.]
  [ 1.  0.  3.]
  [ 1.  2.  3.]]]
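Since the question asks about efficiency for 1,000,000 pairs, one option is to preallocate the result and copy each pair into it, instead of first building concatenated Python lists. A minimal sketch, assuming (as in the sample) that len(a[i]) + len(b[i]) is the same for every i; merge_rows is a hypothetical helper name, and this has not been benchmarked against the list comprehension:

```python
import numpy as np

nan = np.nan

# Sample data from the question: a[i] has i rows, b[i] has 5 - i rows.
a = [[], [[1, 2, 3]], [[0, 2, 7], [1, nan, 3]],
     [[10, 0, 3], [nan, 9, 9], [10, nan, 3]],
     [[8, 2, 0], [2, 2, 3], [8, 1, 3], [1, 2, 3]],
     [[0, 2, 3], [1, 2, 9], [1, 2, 3], [1, 0, 3], [1, 2, 3]]]
b = [[[7, 2, 3], [1, 2, 9], [1, 2, 3], [8, 0, 3], [1, 7, 3]],
     [[3, 9, 0], [2, 2, 3], [8, 1, 3], [0, 2, 3]],
     [[10, 0, 3], [0, nan, 9], [10, nan, 3]],
     [[0, 2, nan], [1, nan, 3]],
     [[1, 2, nan]],
     []]


def merge_rows(a, b, n_features):
    """Hypothetical helper: put the rows of b[i] under the rows of a[i].

    Assumes len(a) == len(b) and that len(a[i]) + len(b[i]) is the same
    for every i, so the merged result is a regular 3-D array.
    """
    total = len(a[0]) + len(b[0])
    out = np.empty((len(a), total, n_features), dtype=float)
    for i, (x, y) in enumerate(zip(a, b)):
        k = len(x)
        if k + len(y) != total:
            raise ValueError(f"pair {i} has {k + len(y)} rows, expected {total}")
        # Skip empty blocks (e.g. a[0] and b[5]); there is nothing to copy.
        if len(x):
            out[i, :k] = x
        if len(y):
            out[i, k:] = y
    return out


result = merge_rows(a, b, n_features=3)
print(result.shape)  # (6, 5, 3)
```

The loop still runs once per pair in Python; the saving is that each pair is written straight into one preallocated float array rather than into intermediate lists.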

To perform your concatenation, run:
result = np.concatenate([a, b], axis=1)
To test this code, I created a and b as:
a = np.stack([ np.full((2, 3), i) for i in range(1, 6)], axis=1)
b = np.stack([ np.full((2, 3), i + 10) for i in range(1, 4)], axis=1)
So they contain:
a:
array([[[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3],
        [4, 4, 4],
        [5, 5, 5]],
       [[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3],
        [4, 4, 4],
        [5, 5, 5]]])
b:
array([[[11, 11, 11],
        [12, 12, 12],
        [13, 13, 13]],
       [[11, 11, 11],
        [12, 12, 12],
        [13, 13, 13]]])
and their shapes are: (2, 5, 3) and (2, 3, 3)
The result of my concatenation is:
array([[[ 1,  1,  1],
        [ 2,  2,  2],
        [ 3,  3,  3],
        [ 4,  4,  4],
        [ 5,  5,  5],
        [11, 11, 11],
        [12, 12, 12],
        [13, 13, 13]],
       [[ 1,  1,  1],
        [ 2,  2,  2],
        [ 3,  3,  3],
        [ 4,  4,  4],
        [ 5,  5,  5],
        [11, 11, 11],
        [12, 12, 12],
        [13, 13, 13]]])
and the shape is (2, 8, 3), just as it should be.
Edit following the comment as of 19:56Z
I tried the code from your comment.
After you executed a = list((map(lambda i: a[:i], range(1,a.shape[0]+1)))),
the result is:
[array([[1, 1]], dtype=int64),
array([[1, 1],
[1, 1]], dtype=int64),
array([[1, 1],
[1, 1],
[1, 1]], dtype=int64),
array([[1, 1],
[1, 1],
[1, 1],
[1, 1]], dtype=int64),
array([[1, 1],
[1, 1],
[1, 1],
[1, 1],
[1, 1]], dtype=int64),
...
so a is a list of arrays of varying sizes.
There is something wrong in the way you construct your data.
First check that both arrays are 3-D and that their shapes differ
only in axis 1. Only then can you run my code on them.
For now, both a and b are plain Python lists, not NumPy arrays!
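A small sketch of that check. can_concat_axis1 is a hypothetical helper, not a NumPy function; it only inspects types and shapes, so it is cheap to run before calling np.concatenate:

```python
import numpy as np


def can_concat_axis1(a, b):
    """Hypothetical helper: True if a and b are 3-D NumPy arrays whose shapes
    differ at most in axis 1, so np.concatenate([a, b], axis=1) can work."""
    if not (isinstance(a, np.ndarray) and isinstance(b, np.ndarray)):
        return False
    if a.ndim != 3 or b.ndim != 3:
        return False
    return a.shape[0] == b.shape[0] and a.shape[2] == b.shape[2]


# Arrays built as in the test above: shapes (2, 5, 3) and (2, 3, 3).
a = np.stack([np.full((2, 3), i) for i in range(1, 6)], axis=1)
b = np.stack([np.full((2, 3), i + 10) for i in range(1, 4)], axis=1)
print(can_concat_axis1(a, b))                # True
print(np.concatenate([a, b], axis=1).shape)  # (2, 8, 3)

# A plain list of arrays with varying sizes, like the one built in the comment.
ragged = [np.ones((i, 2)) for i in range(1, 4)]
print(can_concat_axis1(ragged, ragged))      # False
```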

Related

Get batched indices from stacked matrices - Python Jax

I would like to extract columns from stacked matrices using a batch of indices.
Let's say we have an array a of shape (3, 2, 4), i.e. three matrices of shape (2, 4), and an array of indices idx of shape (3, 2).
import jax
import jax.numpy as jnp

def get_cols(x, idx):
    x = x[:, idx]
    return x

idx = jnp.array([[0,1],[2,3],[1,2]])
a = jnp.array([[[1,2,3,4],
                [3,2,2,4]],
               [[100,20,3,50],
                [5,5,2,4]],
               [[1,2,3,4],
                [3,2,2,4]]
               ])
e = jax.vmap(get_cols, in_axes=(None,0))(a,idx)
I want to extract the columns of the different matrices given a batch of indices. I expect the following result:
e = [[[[1,2],
[3,2]],
[[100,20],
[5,5]],
[[1,2],
[3,2]]],
[[[3,4],
[2,4]],
[[3,50],
[2,4]],
[[3,4],
[2,4]]],
[[[2,3],
[2,2]],
[[20,3],
[5,2]],
[[2,3],
[2,2]]]]
What am I missing?
It looks like you're interested in a double vmap over the inputs; e.g. something like this:
e = jax.vmap(jax.vmap(get_cols, in_axes=(0, None)), in_axes=(None, 0))(a, idx)
print(e)
[[[[ 1 2]
[ 3 2]]
[[100 20]
[ 5 5]]
[[ 1 2]
[ 3 2]]]
[[[ 3 4]
[ 2 4]]
[[ 3 50]
[ 2 4]]
[[ 3 4]
[ 2 4]]]
[[[ 2 3]
[ 2 2]]
[[ 20 3]
[ 5 2]]
[[ 2 3]
[ 2 2]]]]
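If you would rather avoid vmap, the same gather can be written as one advanced-indexing step plus a transpose. The sketch below uses plain NumPy so it runs without JAX; jax.numpy supports the same integer-array indexing and jnp.transpose, so replacing np with jnp should give the same result, but treat that as an assumption to check on your JAX version:

```python
import numpy as np

a = np.array([[[1, 2, 3, 4], [3, 2, 2, 4]],
              [[100, 20, 3, 50], [5, 5, 2, 4]],
              [[1, 2, 3, 4], [3, 2, 2, 4]]])
idx = np.array([[0, 1], [2, 3], [1, 2]])

# gathered[k, r, i, c] == a[k, r, idx[i, c]]; shape (3, 2, 3, 2)
gathered = a[:, :, idx]
# Reorder to (index row i, matrix k, row r, column c) to match the double-vmap output.
e = np.transpose(gathered, (2, 0, 1, 3))
print(e.shape)  # (3, 3, 2, 2)
```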

How to change content of numpy array when indexing with a list?

Could anyone explain why indexing an array with a list and indexing it with a slice [x:y] lead to very different results when modifying NumPy arrays?
Example:
import numpy as np
a = np.array([[1,2,3,4],[3,4,5,5],[4,5,6,3], [1,2,5,5], [1, 2, 3, 4]])
print(a, '\n')
print(a[[3, 4]][:1][:, 1])
a[[3, 4]][:1][:, 1] = 99
print(a, '\n')
print(a[3:4][:1][:, 1])
a[3:4][:1][:, 1] = 99
print(a, '\n')
Output:
[[1 2 3 4]
[3 4 5 5]
[4 5 6 3]
[1 2 5 5]
[1 2 3 4]]
[2]
[[1 2 3 4]
[3 4 5 5]
[4 5 6 3]
[1 2 5 5]
[1 2 3 4]]
[2]
[[ 1 2 3 4]
[ 3 4 5 5]
[ 4 5 6 3]
[ 1 99 5 5]
[ 1 2 3 4]]
Is there a way to modify the array when indexing with a list?
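Indexing with a list (advanced indexing) returns a copy, so a[[3, 4]][:1][:, 1] = 99 writes into a temporary copy and leaves a unchanged. Basic slicing such as a[3:4] returns a view, so the chained assignment writes through to a. To modify a, create an index that selects the desired elements in one step, without chaining: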
In [114]: a[[3,4],1]=90
In [115]: a
Out[115]:
array([[ 1, 2, 3, 4],
[ 3, 4, 5, 5],
[ 4, 5, 6, 3],
[ 1, 90, 5, 5],
[ 1, 90, 3, 4]])
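The copy-versus-view difference can be checked directly with np.shares_memory. A minimal sketch using the array from the question (the names fancy and sliced are only for illustration):

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [3, 4, 5, 5], [4, 5, 6, 3], [1, 2, 5, 5], [1, 2, 3, 4]])

fancy = a[[3, 4]]   # advanced (list) indexing: returns a new array (a copy)
sliced = a[3:5]     # basic slicing: returns a view onto a's data
print(np.shares_memory(a, fancy))   # False
print(np.shares_memory(a, sliced))  # True

# Writing into the copy leaves a alone; writing through the view changes a.
fancy[0, 1] = 99
sliced[1, 1] = 77
print(a[3, 1], a[4, 1])  # 2 77

# A single indexing operation on a itself calls a.__setitem__ directly.
a[[3, 4], 1] = 90
print(a[3, 1], a[4, 1])  # 90 90
```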

Python finding min. value in every column in 2D array

I have a 2D array, and I would like to find the minimum value in every column and subtract it from every element of that column.
For example,
array = [
[1, 2, 4],
[2, 4, 6],
[5, 7, 9]]
The smallest values in columns are 1, 2, 4.
I would like the result to be
array = [
[0, 0, 0],
[1, 2, 2],
[4, 5, 5]]
How can I achieve this?
If you use a real numpy.array or pandas.DataFrame, then arr.min(axis=0) gives the column minima and arr - arr.min(axis=0) subtracts them from every row.
For numpy.array
import numpy as np
data = [
[1, 2, 4],
[2, 4, 6],
[5, 7, 9]
]
arr = np.array(data)
print( arr.min(axis=0) )
print( arr - arr.min(axis=0) )
Result
[1 2 4]
[[0 0 0]
[1 2 2]
[4 5 5]]
Similar for pandas.DataFrame
import pandas as pd
data = [
[1, 2, 4],
[2, 4, 6],
[5, 7, 9]
]
df = pd.DataFrame(data)
print( df.min(axis=0) )
print( df - df.min(axis=0) )
Result
0    1
1    2
2    4
dtype: int64
   0  1  2
0  0  0  0
1  1  2  2
2  4  5  5
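Both versions rely on broadcasting: the column minima have shape (3,), which lines up with the last axis of the (3, 3) array and is repeated down every row. For comparison, a sketch of the row-wise analogue, where keepdims=True is needed so the minima keep shape (3, 1) and are repeated across columns instead:

```python
import numpy as np

arr = np.array([[1, 2, 4], [2, 4, 6], [5, 7, 9]])

col_min = arr.min(axis=0)                # shape (3,): one minimum per column
print(arr - col_min)                     # (3, 3) - (3,) broadcasts down the rows

row_min = arr.min(axis=1, keepdims=True) # shape (3, 1): one minimum per row
print(arr - row_min)                     # (3, 3) - (3, 1) broadcasts across columns
```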

numpy nansum across first index

I have an example 2 x 2 x 2 array:
np.array([[[ 1, 2],
[ 3, 4]],
[[ 5, 6],
[ 7 , 8]]])
I want the nansum of the array across the first index as follows:
Sum all values in:
[[ 1, 2],
[ 3, 4]]
and
[[ 5, 6],
[ 7 , 8]]
The sum of the first array would be 10 and the second would be 26
i.e.
array([10, 26])
I think you are looking for this:
import numpy as np
a = np.array([[[ 1, 2],
[ 3, 4]],
[[ 5, 6],
[ 7 , 8]]])
np.nansum(a,axis=(1,2))
# array([10, 26])
This sums over axes 1 and 2 only, giving one number per index along axis 0.
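The "nan" in nansum only matters when the array contains NaN, which the example does not. A small sketch with one hypothetical NaN inserted, showing that nansum skips it while a plain sum propagates it, plus an equivalent reshape-based form:

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]], dtype=float)
a[1, 0, 0] = np.nan  # hypothetical missing value, only to show what nansum does

# NaNs are treated as 0, so block 1 sums to 6 + 7 + 8 = 21 instead of 26.
print(np.nansum(a, axis=(1, 2)))  # [10. 21.]

# Equivalent: flatten everything after axis 0, then sum each row.
print(np.nansum(a.reshape(a.shape[0], -1), axis=1))  # [10. 21.]

# A plain sum propagates the NaN instead.
print(a.sum(axis=(1, 2)))  # [10. nan]
```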

Find Maximum of 3D np.array along Axis = 0

I have a 3D numpy array that looks like this:
X = [[[10 1] [ 2 10] [-5 3]]
[[-1 10] [ 0 2] [ 3 10]]
[[ 0 3] [10 3] [ 1 2]]
[[ 0 2] [ 0 0] [10 0]]]
First I want the maximum along axis 0 with X.max(axis=0),
which gives me:
[[10 10] [10 10] [10 10]]
The next step is my problem: I would like to use the location of each 10 to create a new 2D array from another 3D array that has the same dimensions as X.
For example, the array with the same dimensions looks like this:
Y = [[[11 2] [ 3 11] [-4 100]]
[[ 0 11] [ 100 3] [ 4 11]]
[[ 1 4] [11 100] [ 2 3]]
[[ 100 3] [ 1 1] [11 1]]]
I want to find the location of the maximum in X and create a 2D array from the numbers and location in Y.
the answer in this case should then be:
[[11 11] [11 11] [11 11]]
Thank you for your help in advance :)
You can do this with numpy.argmax and numpy.indices:
import numpy as np
X = np.array([[[10, 1],[ 2,10],[-5, 3]],
[[-1,10],[ 0, 2],[ 3,10]],
[[ 0, 3],[10, 3],[ 1, 2]],
[[ 0, 2],[ 0, 0],[10, 0]]])
Y = np.array([[[11, 2],[ 3,11],[-4, 100]],
[[ 0,11],[ 100, 3],[ 4,11]],
[[ 1, 4],[11, 100],[ 2, 3]],
[[ 100, 3],[ 1, 1],[11, 1]]])
ind = X.argmax(axis=0)
a1,a2=np.indices(ind.shape)
print(X[ind,a1,a2])
# [[10 10]
# [10 10]
# [10 10]]
print(Y[ind,a1,a2])
# [[11 11]
# [11 11]
# [11 11]]
The answer here provided the inspiration for this
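You could try
Y[X==X.max(axis=0)].reshape(X.max(axis=0).shape)
but be careful: boolean-mask indexing returns the matches in C order, with axis 0 varying slowest, so in general the values end up in the wrong (row, column) positions, and the reshape fails outright if a maximum is tied along axis 0. It only looks right here because every selected value in Y is 11.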
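A sketch using np.take_along_axis, which picks Y's value at the argmax position for each (row, column) without building index grids by hand. pick_at_argmax is a hypothetical helper name; on ties, argmax returns the first maximum. The second part uses an array Z whose values are all distinct, which makes the positional difference from the boolean-mask one-liner visible:

```python
import numpy as np

X = np.array([[[10, 1], [2, 10], [-5, 3]],
              [[-1, 10], [0, 2], [3, 10]],
              [[0, 3], [10, 3], [1, 2]],
              [[0, 2], [0, 0], [10, 0]]])
Y = np.array([[[11, 2], [3, 11], [-4, 100]],
              [[0, 11], [100, 3], [4, 11]],
              [[1, 4], [11, 100], [2, 3]],
              [[100, 3], [1, 1], [11, 1]]])


def pick_at_argmax(X, Y):
    """Hypothetical helper: Y's value at the axis-0 position of each maximum of X."""
    ind = X.argmax(axis=0)  # shape X.shape[1:]; the first maximum wins on ties
    # take_along_axis wants indices with the same ndim as Y, so restore axis 0.
    return np.take_along_axis(Y, ind[np.newaxis], axis=0)[0]


print(pick_at_argmax(X, Y))  # every entry is 11

# With distinct values the positions can be checked: Z[i, j, k] == 6*i + 2*j + k.
Z = np.arange(24).reshape(4, 3, 2)
print(pick_at_argmax(X, Z).tolist())                                # [[0, 7], [14, 3], [22, 11]]
print(Z[X == X.max(axis=0)].reshape(X.max(axis=0).shape).tolist())  # [[0, 3], [7, 11], [14, 22]]
```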
