How to select item from 2 arrays? - python

I have 2 arrays a and b that contain some data. I also have an array select that I wish to use to select from either a or b. I was just wondering if there is a pythonic way to do so. Here is my current implementation which puts each row of a and b into a list then selects from it.
import numpy as np

a = np.zeros(shape=(10,2,1,3,4))
b = np.ones(shape=(10,2,1,3,4))
select = [1,1,1,0,1,0,1,0,1,0]
c = []
for a1,b1,select1 in zip(a,b,select):
    a1b1 = [a1,b1]
    c.append(a1b1[select1])  # index with the per-row flag, not the whole list

If you choose only from two arrays, you can use select as a weight:
w = np.array(select)
c = (a.T * w + b.T * (1 - w)).T
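This hack does not work if you want to combine more than two arrays. Note that it takes a where select is 1 and b where select is 0, which is the reverse of the loop in the question; swap a and b to reproduce the loop exactly.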

In [512]: a = np.zeros(shape=(10,2,1,3,4))
...: b = np.ones(shape=(10,2,1,3,4))
...:
...: select = [1,1,1,0,1,0,1,0,1,0]
Using transpose in the same way that @DYZ does:
In [513]: res = np.where(select, a.T, b.T).T
In [514]: res.shape
Out[514]: (10, 2, 1, 3, 4)
If you don't want to transpose, you could add dimensions to select so that it broadcasts to the same shape as a and b:
In [516]: res1 = np.where(np.array(select)[:,None,None,None,None],a,b)
In [517]: res1.shape
Out[517]: (10, 2, 1, 3, 4)
In [518]: np.allclose(res,res1)
Out[518]: True
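One detail worth checking against the question: np.where(cond, x, y) takes x where cond is true, so the calls above pick a where select is 1, while the original loop picks b. A minimal sketch that reproduces the loop, assuming select holds only 0s and 1s (the names cond and expected are mine, not from the question):

```python
import numpy as np

a = np.zeros(shape=(10, 2, 1, 3, 4))
b = np.ones(shape=(10, 2, 1, 3, 4))
select = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]

# Reference: the loop from the question (0 -> row of a, 1 -> row of b).
expected = np.array([(a1, b1)[s] for a1, b1, s in zip(a, b, select)])

# Boolean condition shaped (10, 1, 1, 1, 1) so it broadcasts over the trailing axes.
cond = np.asarray(select, dtype=bool).reshape((-1,) + (1,) * (a.ndim - 1))

# np.where(cond, x, y) takes x where cond is True, so b goes first here.
res = np.where(cond, b, a)
```

Building the reshape from a.ndim avoids hard-coding the number of None entries when the arrays have a different rank.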

np.where is a very efficient solution but only works for up to two arrays.
I'd do it this way:
np.stack((a, b))[select, np.arange(len(a))]
This should scale to any number of arrays, e.g.:
a = ...
b = ...
# ...
z = ...
select = [0, 13, 2, 5, 25, ...]
np.stack((a, b, ... z))[select, np.arange(len(a))]
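For a concrete check of the stacking approach, here is a small sketch with three made-up arrays (the shapes, the random data and the names stacked, rows and picked are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical inputs: three arrays with the same shape.
a, b, c = (rng.normal(size=(6, 2, 3)) for _ in range(3))
select = np.array([0, 2, 1, 1, 0, 2])  # which array each row comes from

stacked = np.stack((a, b, c))      # shape (3, 6, 2, 3)
rows = np.arange(len(a))
picked = stacked[select, rows]     # shape (6, 2, 3)

# Reference result built with a plain Python loop.
sources = (a, b, c)
expected = np.array([sources[s][i] for i, s in enumerate(select)])
```

Keep in mind that np.stack copies every input into one new array, so this trades memory for convenience when the inputs are large.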

Related

Merge two arrays from a given index - PYTHON

I am wondering if there is an easy way to 'append' new elements to an array, but not at the end
Imagine I have a vector
a = np.array([1,2,3,4,5,6])
I want to append a new vector
b = np.array([1,1,1,1])
to a, starting from element 3 so that the new array would be
c = np.array([1,2,3,5,6,7,1])
That is, the last 3 elements of a are added to the first 3 elements of b, while the remaining element of b is simply appended to c.
Any ideas?
THX
I tried just append!
Using numpy with pad:
a = np.array([1,2,3,4,5,6])
b = np.array([1,1,1,1])
# or
# a = [1,2,3,4,5,6]
# b = [1,1,1,1]
n = 3
extra = len(b)-len(a)+n
c = np.pad(a, (0, extra))
c[n:] += b
Output:
array([1, 2, 3, 5, 6, 7, 1])
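The snippet above assumes that b reaches past the end of a; if it does not, extra becomes negative and np.pad rejects a negative pad width. A possible generalisation, wrapped in a helper whose name (add_from_index) is my own and which assumes both inputs are 1-d with compatible dtypes:

```python
import numpy as np

def add_from_index(a, b, n):
    """Add b onto a starting at index n, growing the result if b overhangs."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Number of trailing slots needed; zero when b fits inside a.
    extra = max(0, n + len(b) - len(a))
    c = np.pad(a, (0, extra))
    # Only touch the slice that b actually covers.
    c[n:n + len(b)] += b
    return c

c = add_from_index([1, 2, 3, 4, 5, 6], [1, 1, 1, 1], 3)
inside = add_from_index([1, 2, 3, 4, 5, 6], [1, 1], 1)
```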

How can I construct an if statement for numpy arrays' elements comparison, to produce a new array with same dimension?

import numpy as np
a = np.array([[3.5,6,8,2]])
b = np.array([[6,2,8,2]])
c = np.array([[2,3,7,5]])
d = np.array([[3,2,5,1]])
if a > b:
    e = 2*a+6*c
else:
    e = 3*c + 4*d
print(e)
then I got
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
and if I type in print(e), I got
[2, 3, 7, 5, 2, 3, 7, 5, 2, 3, 7, 5, 3, 2, 5, 1, 3, 2, 5, 1, 3, 2, 5, 1, 3, 2, 5, 1]
The e I want to construct is an array with the same dimensions as a, b, c and d, where the if statement decides which equation is used for each element.
In other words, for the first elements of a and b: 3.5 < 6, so e = 3*c + 4*d = 3*2 + 4*3 = 18
For the second elements: 6 > 2, so e = 2*a + 6*c = 2*6 + 6*3 = 30
Third: 8 = 8, so e = 3*c + 4*d = 3*7 + 4*5 = 41
Fourth: 2 = 2, so e = 3*c + 4*d = 3*5 + 4*1 = 19
e = [18,30,41,19]
I tried to find someone who asked about constructing a script doing such things, but I could find none, and all numpy questions about if statement(or equivalent) did not help. Thanks.(It seems that a.all or a.any from the python recommendation did not help as well.)
Use numpy.where:
Return elements chosen from x or y depending on condition.
e = np.where(a > b, 2*a+6*c, 3*c + 4*d)
In [370]: a = np.array([[3.5,6,8,2]])
...: b = np.array([[6,2,8,2]])
...: c = np.array([[2,3,7,5]])
...: d = np.array([[3,2,5,1]])
...:
In [371]: a.shape
Out[371]: (1, 4)
In [372]: a[0].shape
Out[372]: (4,)
The problem with the if is that a>b is an array. There's no one True/False value to "switch" on:
In [373]: a>b
Out[373]: array([[False, True, False, False]])
where does the array "switch"; an equivalent way is:
In [376]: mask = a>b
In [377]: e = 3*c + 4*d
In [378]: e
Out[378]: array([[18, 17, 41, 19]])
In [379]: e[mask] = 2*a[mask] + 6*c[mask]
In [380]: e
Out[380]: array([[18, 30, 41, 19]])
np.where itself does not iterate (many pandas users seem to assume it does). It works with whole arrays: the condition array (my mask) and the full arrays of candidate values.
To use the if, we have to work with scalar values, not arrays. Wrap the if/else in a loop. For example:
In [381]: alist = []
In [382]: for i,j,k,l in zip(a[0],b[0],c[0],d[0]):
...:     if i>j:
...:         f = 2*i+6*k
...:     else:
...:         f = 3*k+4*l
...:     alist.append(f)
...:
In [383]: alist
Out[383]: [18, 30.0, 41, 19]
This works because i and j are single numbers, not arrays.
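As a quick cross-check, the vectorised and the scalar versions can be run side by side. This is only a sketch; the names e_where and e_loop are mine:

```python
import numpy as np

a = np.array([[3.5, 6, 8, 2]])
b = np.array([[6, 2, 8, 2]])
c = np.array([[2, 3, 7, 5]])
d = np.array([[3, 2, 5, 1]])

# Vectorised: both branches are computed in full, then chosen per element.
e_where = np.where(a > b, 2*a + 6*c, 3*c + 4*d)

# Scalar loop: one if/else per element, rebuilt into the original shape.
e_loop = np.array([
    2*i + 6*k if i > j else 3*k + 4*l
    for i, j, k, l in zip(a.ravel(), b.ravel(), c.ravel(), d.ravel())
]).reshape(a.shape)
```

Both results keep the (1, 4) shape of the inputs. Because np.where evaluates both formulas for every element, it is not a way to skip an expensive or invalid computation on the unselected branch.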

How to perform matrix multiplication between two 3D tensors along the first dimension?

I wish to compute the dot product between two 3D tensors along the first dimension. I tried the following einsum notation:
import numpy as np
a = np.random.randn(30).reshape(3, 5, 2)
b = np.random.randn(30).reshape(3, 2, 5)
# Expecting shape: (3, 5, 5)
np.einsum("ijk,ikj->ijj", a, b)
Sadly it returns this error:
ValueError: einstein sum subscripts string includes output subscript 'j' multiple times
I went with Einstein sum after I failed at it with np.tensordot. Ideas and follow up questions are highly welcome!
Your two dimensions of size 5 and 5 do not correspond to the same axes. As such you need to use two different subscripts to designate them. For example, you can do:
>>> res = np.einsum('ijk,ilm->ijm', a, b)
>>> res.shape
(3, 5, 5)
Notice that you are also required to change the subscript for the axes of size 2 and 2. This is because you are computing the batched outer product (i.e. the two axes are iterated independently), not a dot product (i.e. the two axes are iterated simultaneously).
Outer product:
>>> np.einsum('ijk,ilm->ijm', a, b)
Dot product over subscript k, which is axis=2 of a and axis=1 of b:
>>> np.einsum('ijk,ikm->ijm', a, b)
which is equivalent to a@b.
"dot product ... along the first dimension" is a bit unclear. Is the first dimension a 'batch' dimension, with 3 dots computed on the rest? Or something else?
In [103]: a = np.random.randn(30).reshape(3, 5, 2)
...: b = np.random.randn(30).reshape(3, 2, 5)
In [104]: (a@b).shape
Out[104]: (3, 5, 5)
In [105]: np.einsum('ijk,ikl->ijl',a,b).shape
Out[105]: (3, 5, 5)
@Ivan's answer is different:
In [106]: np.einsum('ijk,ilm->ijm', a, b).shape
Out[106]: (3, 5, 5)
In [107]: np.allclose(np.einsum('ijk,ilm->ijm', a, b), a@b)
Out[107]: False
In [108]: np.allclose(np.einsum('ijk,ikl->ijl', a, b), a@b)
Out[108]: True
Ivan's version sums over the k dimension of one array and the l dimension of the other, and then does a broadcasted elementwise multiplication. That is not matrix multiplication:
In [109]: (a.sum(axis=-1,keepdims=True)* b.sum(axis=1,keepdims=True)).shape
Out[109]: (3, 5, 5)
In [110]: np.allclose((a.sum(axis=-1,keepdims=True)* b.sum(axis=1,keepdims=True)),np.einsum('ijk,ilm->ijm', a, b))
Out[110]: True
Another test of the batch processing:
In [112]: res=np.zeros((3,5,5))
...: for i in range(3):
...:     res[i] = a[i]@b[i]
...: np.allclose(res, a@b)
Out[112]: True
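Since the question mentions giving up on np.tensordot, here is a sketch of why it is awkward for this case: tensordot contracts the chosen axes but has no notion of a shared batch axis, so it pairs every batch of a with every batch of b. The names full, idx and batched are mine, and the random data is only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(3, 5, 2))
b = rng.normal(size=(3, 2, 5))

# tensordot contracts k but pairs every batch of a with every batch of b.
full = np.tensordot(a, b, axes=([2], [1]))   # shape (3, 5, 3, 5)

# The batched product is the "diagonal" over the two batch axes.
idx = np.arange(a.shape[0])
batched = full[idx, :, idx, :]               # shape (3, 5, 5)

# Same result from einsum with a shared batch subscript i.
via_einsum = np.einsum('ijk,ikm->ijm', a, b)
```

The tensordot route computes all 3x3 batch pairs and throws most of them away, so a@b or the einsum form is the more natural choice here.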

Numpy array insert every second element from second array

I have two arrays of the same shape and now want to combine them by making every odd element and 0 one of the first array and every even one of the second array in the same order.
E.g.:
a = ([0,1,3,5])
b = ([2,4,6])
c = ([0,1,2,3,4,5,6])
I tried something including modulo to identify uneven indices:
a = ([0,1,3,5])
b = ([2,4,6])
c = a
i = 0
j = 2
l = 0
for i in range(1,22):
    k = (i+j) % 2
    if k > 0:
        c = np.insert(c, i, b[l])
        l+=1
    else:
        continue
I guess there is some easier/faster slicing option, but can't figure it out.
np.insert would work well:
>>> A = np.array([1, 3, 5, 7])
>>> B = np.array([2, 4, 6, 8])
>>> np.insert(B, np.arange(len(A)), A)
array([1, 2, 3, 4, 5, 6, 7, 8])
However, if you don't rely on sorted values, try this:
>>> A = np.array([5, 3, 1])
>>> B = np.array([1, 2, 3])
>>> C = [ ]
>>> for element in zip(A, B):
...     C.extend(element)
...
>>> C
[5, 1, 3, 2, 1, 3]
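Applied to the arrays from the question, the np.insert idea could look like the sketch below. The positions array is my own construction; it relies on np.insert interpreting the indices relative to the original array:

```python
import numpy as np

a = np.array([0, 1, 3, 5])
b = np.array([2, 4, 6])

# Positions refer to the original a: b[i] goes in front of a[i + 2].
# A position equal to len(a) appends at the end.
positions = np.arange(2, 2 + len(b))
c = np.insert(a, positions, b)
```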
Read the documentation of range:
for i in range(0,10,2):
    print(i)
will print 0, 2, 4, 6 and 8, one per line.
From what I understand, the first element of a always comes first and the rest are just interleaved. If that is the case, then some clever use of stacking and reshaping is probably enough.
a = np.array([0,1,3,5])
b = np.array([2,4,6])
c = np.hstack([a[:1], np.vstack([a[1:], b]).T.reshape((-1, ))])
You could try something like this
import numpy as np
A = [0,1,3,5]
B = [2,4,6]
lst = np.zeros(len(A)+len(B))
lst[0]=A[0]
lst[1::2] = A[1:]
lst[2::2] = B
Even though I don't understand why you would make it so complicated
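The slice-assignment version can be wrapped into a small helper that keeps the integer dtype (np.zeros defaults to float) and checks the lengths. The function name and the length check are my additions, and it assumes a is exactly one element longer than b, as in the question:

```python
import numpy as np

def interleave_after_first(a, b):
    """Keep a[0] first, then alternate a[1], b[0], a[2], b[1], ..."""
    a = np.asarray(a)
    b = np.asarray(b)
    if len(a) != len(b) + 1:
        raise ValueError("expected len(a) == len(b) + 1")
    # Allocate with a dtype that can hold values from both inputs.
    out = np.empty(len(a) + len(b), dtype=np.result_type(a, b))
    out[0] = a[0]
    out[1::2] = a[1:]   # odd positions
    out[2::2] = b       # even positions from 2 onwards
    return out

c = interleave_after_first([0, 1, 3, 5], [2, 4, 6])
```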

what is the most pythonic way to split a 2d array to arrays of each row?

I have a function foo that returns an array with the shape (1000, 2).
How can I split it into two arrays a(1000) and b(1000)?
I'm looking for something like this:
a;b = foo()
I'm looking for an answer that can easily generalize to the case in which the shape is (1000, 5) or so.
The zip(*...) idiom transposes a traditional multi-dimensional Python list:
x = [[1,2], [3,4], [5,6]]
# get columns
a, b = zip(*x) # zip(*foo())
# a, b = map(list, zip(*x)) # if you prefer lists over tuples
a
# (1, 3, 5)
# get rows
a, b, c = x
a
# [1, 2]
Transpose and unpack?
a, b = foo().T
>>> a, b = np.arange(20).reshape(-1, 2).T
>>> a
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
>>> b
array([ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19])
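The same unpacking extends to the (1000, 5) case from the question. In this sketch foo is a hypothetical stand-in for the asker's function, and the variable names are mine:

```python
import numpy as np

def foo(n_rows=1000, n_cols=5):
    """Hypothetical stand-in for the asker's foo(); returns shape (n_rows, n_cols)."""
    return np.arange(n_rows * n_cols).reshape(n_rows, n_cols)

# Iterating over the transpose yields one 1-d column per step.
a, b, c, d, e = foo().T

# Starred unpacking keeps the first columns by name and collects the rest in a list.
first, second, *rest = foo().T
```

The unpacked columns are views into the array returned by foo, so no data is copied; call .copy() on them if they need to be modified independently.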
You can use numpy.hsplit.
x = np.arange(12).reshape((3, 4))
np.hsplit(x, x.shape[1])
This returns a list of subarrays. Note that for a 2d input, each subarray has shape (n, 1), unless you wrap a function around it to squeeze them to 1d:
def split_1d(arr_2d):
    """Split 2d NumPy array on its columns."""
    split = np.hsplit(arr_2d, arr_2d.shape[1])
    split = [np.squeeze(arr) for arr in split]
    return split
a, b, c, d = split_1d(x)
a
# array([0, 4, 8])
d
# array([ 3, 7, 11])
You could just use list comprehensions, e.g.
(a,b)=([i[0] for i in mylist],[i[1] for i in mylist])
To generalise you could use a comprehension within a comprehension:
(a,b,c,d,e)=([row[i] for row in mylist] for i in range(5))
You can do this simply by using zip function like:
def foo(mylist):
    return zip(*mylist)
Now call foo on mylist, whatever its number of columns, and unpack the result:
mylist = [[1, 2], [3, 4], [5, 6]]
a, b = foo(mylist)
# a = (1, 3, 5)
# b = (2, 4, 6)
So this is a little nuts, but if you want to assign different letters to each sub-array in your array, and do so for any number of sub-arrays (up to 26 because alphabet), you could do:
import string
letters = list(string.ascii_lowercase) # get all of the lower-case letters
arr_dict = {k: v for k, v in zip(letters, foo())}
or more simply (for the last line):
arr_dict = dict(zip(letters, foo()))
Then you can access each individual element as arr_dict['a'] or arr_dict['b']. This feels a little mad-scientist-ey to me, but I thought it was fun.
