Convert a numpy array to an array of numpy arrays - python

How can I convert numpy array a to numpy array b in a (num)pythonic way? The solution should ideally work for arbitrary dimensions and array lengths.
import numpy as np
a=np.arange(12).reshape(2,3,2)
b=np.empty((2,3),dtype=object)
b[0,0]=np.array([0,1])
b[0,1]=np.array([2,3])
b[0,2]=np.array([4,5])
b[1,0]=np.array([6,7])
b[1,1]=np.array([8,9])
b[1,2]=np.array([10,11])

For a start:
In [638]: a=np.arange(12).reshape(2,3,2)
In [639]: b=np.empty((2,3),dtype=object)
In [640]: for index in np.ndindex(b.shape):
   .....:     b[index]=a[index]
   .....:
In [641]: b
Out[641]:
array([[array([0, 1]), array([2, 3]), array([4, 5])],
[array([6, 7]), array([8, 9]), array([10, 11])]], dtype=object)
It's not ideal since it uses iteration. But I wonder whether it is even possible to access the elements of b in any other way. By using dtype=object you break the basic vectorization that numpy is known for. b is essentially a list with numpy multiarray shape overlay. dtype=object puts an impenetrable wall around those size 2 arrays.
For example, a[:,:,0] gives me all the even numbers, in a (2,3) array. I can't get those numbers from b with just indexing. I have to use iteration:
[b[index][0] for index in np.ndindex(b.shape)]
# [0, 2, 4, 6, 8, 10]
np.array tries to make the highest dimension array that it can, given the regularity of the data. To fool it into making an array of objects, we have to give an irregular list of lists or objects. For example we could:
mylist = list(a.reshape(-1,2)) # list of arrays
mylist.append([]) # make the list irregular
b = np.array(mylist, dtype=object) # array of objects (recent NumPy raises on ragged input without dtype=object)
b = b[:-1].reshape(2,3) # cleanup
The last solution suggests that my first one can be cleaned up a bit:
b = np.empty((6,),dtype=object)
b[:] = list(a.reshape(-1,2))
b = b.reshape(2,3)
I suspect that under the covers, the list() call does an iteration like
[x for x in a.reshape(-1,2)]
So time wise it might not be much different from the ndindex time.
One thing that I wasn't expecting about b is that I can do math on it, with nearly the same generality as on a:
b-10
b += 10
b *= 2
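The arithmetic works because, for an object array, NumPy hands the operator to each stored object in turn, so every inner array gets its own elementwise operation. A minimal sketch of that (the name c is just for illustration):

```python
import numpy as np

a = np.arange(12).reshape(2, 3, 2)

# Build the (2, 3) object array, one inner array per cell
b = np.empty((2, 3), dtype=object)
for index in np.ndindex(b.shape):
    b[index] = a[index]

# The subtraction is applied to each inner array separately
c = b - 10

print(c.dtype)            # object
print(c[0, 0].tolist())   # [-10, -9]
print(c[1, 2].tolist())   # [0, 1]
```

The result is still an object array of the same (2, 3) shape, so the usual fast paths for numeric dtypes do not apply here.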
An alternative to an object dtype would be a structured dtype, e.g.
In [785]: b1=np.zeros((2,3),dtype=[('f0',int,(2,))])
In [786]: b1['f0'][:]=a
In [787]: b1
Out[787]:
array([[([0, 1],), ([2, 3],), ([4, 5],)],
[([6, 7],), ([8, 9],), ([10, 11],)]],
dtype=[('f0', '<i4', (2,))])
In [788]: b1['f0']
Out[788]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]]])
In [789]: b1[1,1]['f0']
Out[789]: array([8, 9])
And b and b1 can be added: b+b1 (producing an object dtype). Curiouser and curiouser!

Based on hpaulj's answer, here is a slightly more generic solution. a is an array of dimension N that is converted to an array b of dimension N1 with dtype object, holding arrays of dimension (N-N1).
In the example N equals 5 and N1 equals 3.
import numpy as np
N=5
N1=3
#create array a with dimension N
a=np.random.random(np.random.randint(2,20,size=N))
a_shape=a.shape
b_shape=a_shape[:N1] # shape of array b
b_arr_shape=a_shape[N1:] # shape of arrays in b
#Solution 1 with list() method (faster)
b=np.empty(np.prod(b_shape),dtype=object) #init b
b[:]=list(a.reshape((-1,)+b_arr_shape))
b=b.reshape(b_shape)
print("Dimension of b: {}".format(len(b.shape))) # dim of b
print("Dimension of array in b: {}".format(len(b[0,0,0].shape))) # dim of arrays in b
#Solution 2 with ndindex loop (slower)
b=np.empty(b_shape,dtype=object)
for index in np.ndindex(b_shape):
    b[index]=a[index]
print("Dimension of b: {}".format(len(b.shape))) # dim of b
print("Dimension of array in b: {}".format(len(b[0,0,0].shape))) # dim of arrays in b
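If you need this in more than one place, the ndindex variant can be wrapped in a small helper. This is only a sketch: the function name to_object_array and its n_outer parameter are made up for illustration, and it uses the loop form rather than the faster list() form.

```python
import numpy as np


def to_object_array(a, n_outer):
    """Return an object array whose first n_outer axes mirror a's,
    with each cell holding the remaining sub-array of a."""
    outer_shape = a.shape[:n_outer]
    b = np.empty(outer_shape, dtype=object)
    for index in np.ndindex(outer_shape):
        b[index] = a[index]   # stores a view of a, not a copy
    return b


a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
b = to_object_array(a, 2)

print(b.shape)         # (2, 3)
print(b[1, 2].shape)   # (4, 5)
```

Note that the cells hold views into a, so modifying a inner array in place also changes a; call a[index].copy() inside the loop if that is not wanted.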

Related

Generating an array of arrays in Python

I want to multiply each element of B to the whole array A to obtain P. The current and desired outputs are attached. The desired output is basically an array consisting of 2 arrays since there are two elements in B.
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P=B*A
print(P)
It currently produces an error:
ValueError: operands could not be broadcast together with shapes (2,) (3,3)
The desired output is
array(([[0.02109, 0.04218, 0.06327],
[0.08436, 0.10545, 0.12654],
[0.14763, 0.16872, 0.18981]]),
([[0.00775858, 0.01551716, 0.02327574],
[0.03103432, 0.0387929 , 0.04655148],
[0.05431006, 0.06206864, 0.06982722]]))
You can do this by:
B.reshape(-1, 1, 1) * A
or
B[:, None, None] * A
where -1 or : refer to B.shape[0] which was 2 and 1, 1 or None, None add two additional dimensions to B to get the desired result shape which was (2, 3, 3).
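To see what the added axes do, it can help to print the shapes at each step. A small sketch using the arrays from the question (B3 is just an illustrative name):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
B = 0.02109 * np.exp(-np.linspace(0, 1, 2))

B3 = B[:, None, None]      # shape (2, 1, 1)
P = B3 * A                 # (2, 1, 1) * (3, 3) broadcasts to (2, 3, 3)

print(B.shape, B3.shape, P.shape)   # (2,) (2, 1, 1) (2, 3, 3)
```

Each P[k] is then simply B[k] * A.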
The easiest way I can think of is to use a list comprehension and then convert back to numpy.ndarray:
np.asarray([A*i for i in B])
Output:
array([[[0.02109 , 0.04218 , 0.06327 ],
[0.08436 , 0.10545 , 0.12654 ],
[0.14763 , 0.16872 , 0.18981 ]],
[[0.00775858, 0.01551715, 0.02327573],
[0.03103431, 0.03879289, 0.04655146],
[0.05431004, 0.06206862, 0.0698272 ]]])
There are many possible ways to do this.
Here is an overview of their runtimes for the given array (bear in mind these will change for bigger arrays):
reshape: 0.000174 sec
tensordot: 0.000550 sec
einsum: 0.000196 sec
manual loop: 0.000326 sec
See the implementation for each of these:
numpy reshape
Gives a new shape to an array without changing its data.
Here we reshape the array B so we can later multiply it:
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P = B.reshape(-1, 1, 1) * A
print(P)
numpy tensordot
Given two tensors, a and b, and an array_like object containing two
array_like objects, (a_axes, b_axes), sum the products of a’s and b’s
elements (components) over the axes specified by a_axes and b_axes.
The third argument can be a single non-negative integer_like scalar,
N; if it is such, then the last N dimensions of a and the first N
dimensions of b are summed over.
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P = np.tensordot(B, A, 0)
print(P)
numpy einsum (Einstein summation)
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
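P = np.einsum('ij,k->kij', A, B)
print(P)
Note: A has two dimensions, so we assign ij to its indices. B has one dimension, so we assign k to its index. The explicit output ->kij puts B's axis first, giving shape (2, 3, 3); without it, einsum orders the output axes alphabetically (ijk), giving shape (3, 3, 2).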
manual loop
Another simple approach would be a loop (it is faster than tensordot for the given input). This approach could be made "numpy free" if you don't want to use numpy for some reason. Here is the version with numpy:
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
products = []
for b in B:
    products.append(b*A)
P = np.array(products)
print(P)
#or the same as one-liner: np.asarray([A * elem for elem in B])
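If you want to reproduce the comparison yourself, something along these lines should do. It is a rough sketch, not the script that produced the numbers above: the variants dict and the choice of 1000 repetitions are assumptions, and the einsum call spells out the output order so that all four results have shape (2, 3, 3).

```python
import timeit

import numpy as np

A = np.arange(1, 10).reshape(3, 3)
B = 0.02109 * np.exp(-np.linspace(0, 1, 2))

# One zero-argument callable per approach
variants = {
    "reshape": lambda: B.reshape(-1, 1, 1) * A,
    "tensordot": lambda: np.tensordot(B, A, 0),
    "einsum": lambda: np.einsum("ij,k->kij", A, B),
    "manual loop": lambda: np.array([b * A for b in B]),
}

reference = variants["reshape"]()
results = {name: fn() for name, fn in variants.items()}

for name, fn in variants.items():
    # Check agreement before trusting any timing
    assert np.allclose(results[name], reference)
    seconds = timeit.timeit(fn, number=1000)
    print(f"{name}: {seconds:.6f} sec per 1000 calls")
```

Timings on arrays this small are dominated by call overhead, so rerun with realistic sizes before drawing conclusions.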

Getting a column index in numpy

I'm pretty new to NumPy and I'm looking for a way to get the index of a current column I'm iterating over in a matrix.
import numpy as np
#sum of elements in each column
def p_b(mtrx):
    b = []
    for c in mtrx.T:
        summ = 0
        for i in c:
            summ += i
        b.append(summ)
    return b
#return modified matrix where each element is equal to itself divided by
#the sum of the current column in the original matrix
def a_div_b(mtrx):
    for c in mtrx:
        for i in c:
            #change i to be i/p_b(mtrx)[index_of_a_current_column]
            pass
    return mtrx
For the input ([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) the result would be
([[1/12, 2/12, 3/12], [4/15, 5/15, 6/15], [7/18, 8/18, 9/18]]).
Any ideas about how I can achieve that?
You don't need those functions and loops to do that; they will not be efficient. When using numpy, go for vectorized operations whenever possible (which is most cases). numpy's broadcasting rules let you perform mathematical operations between arrays of different dimensions, so you can use vectorization, which is much more efficient than python loops.
In your case, say that your array arr is:
arr = np.arange(1, 10)
arr.shape = (3, 3)
#arr is:
>>> arr
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
you can achieve the desired result with:
res = (arr.T / arr.sum(axis=0)).T
>>> res
array([[0.08333333, 0.16666667, 0.25 ],
[0.26666667, 0.33333333, 0.4 ],
[0.38888889, 0.44444444, 0.5 ]])
numpy sum allows you to sum your array along a given axis if the axis parameter is given. axis=0 is the first axis (the rows), so summing along it collapses the rows and gives one sum per column, which is what you want here.
.T gives the transposed matrix. You need to transpose to perform the division on the correct axis and then transpose back.
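If the double transpose feels awkward, the same thing can be written by turning the sums into a column vector so that broadcasting lines them up with the rows. A small sketch (res_t and res_b are illustrative names):

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)
col_sums = arr.sum(axis=0)           # 12, 15, 18

# Transpose-based version from the answer
res_t = (arr.T / col_sums).T
# Same result without transposing: reshape the sums to (3, 1)
res_b = arr / col_sums[:, None]

print(np.allclose(res_t, res_b))     # True
```

Both versions divide row i by the i-th column sum, which is what the expected output in the question asks for.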

Find indices of grouped-item matches between two arrays

a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array ([3,4,7,8,1,3])
I have two lists of integers that each is grouped by every 2 consecutive items (ie indices [0, 1], [2, 3] and so on).
The pairs of items cannot be found as duplicates in either list, neither in the same or the reverse order.
One list is significantly larger and inclusive of the other.
I am trying to figure out an efficient way to get the indices
of the larger list's grouped items that are also in the smaller one.
The desired output in the example above should be:
[2,3,6,7,10,11] #indices
Notice that, as an example, the first group ([3,4]) should not get indices 11,12 as a match because in that case 3 is the second element of [1,3] and 4 the first element of [4,7].
Since you are grouping your arrays by pairs, you can reshape them into 2 columns for comparison. You can then compare each of the elements in the shorter array to the longer array, and reduce the boolean arrays. From there it is a simple matter to get the indices using a reshaped np.arange.
import numpy as np
from functools import reduce
a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array ([3,4,7,8,1,3])
# reshape a and b into columns
a2 = a.reshape((-1,2))
b2 = b.reshape((-1,2))
# create a generator of bools for the row of a2 that holds b2
b_in_a_generator = (np.all(a2==row, axis=1) for row in b2)
# reduce the generator to get an array of boolean that is True for each row
# of a2 that equals one of the rows of b2
ix_bool = reduce(lambda x,y: x+y, b_in_a_generator)
# grab the indices by slicing a reshaped np.arange array
ix = np.arange(len(a)).reshape((-1,2))[ix_bool]
ix
# returns:
array([[ 2, 3],
[ 6, 7],
[10, 11]])
If you want a flat array, simply ravel ix
ix.ravel()
# returns
array([ 2, 3, 6, 7, 10, 11])
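The generator plus reduce can also be collapsed into a single broadcast comparison. This is a sketch of the same idea rather than a different algorithm; pair_equal and row_mask are illustrative names, and it builds a temporary boolean array of size len(a2) x len(b2) x 2, so it suits moderately sized inputs.

```python
import numpy as np

a = np.array([5, 8, 3, 4, 2, 5, 7, 8, 1, 9, 1, 3, 4, 7])
b = np.array([3, 4, 7, 8, 1, 3])

a2 = a.reshape(-1, 2)
b2 = b.reshape(-1, 2)

# (len(a2), 1, 2) == (1, len(b2), 2) -> (len(a2), len(b2), 2)
pair_equal = (a2[:, None, :] == b2[None, :, :]).all(axis=-1)
# True for each row of a2 that equals some row of b2
row_mask = pair_equal.any(axis=1)

ix = np.arange(len(a)).reshape(-1, 2)[row_mask].ravel()
print(ix.tolist())   # [2, 3, 6, 7, 10, 11]
```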
Here's one approach making use of NumPy view of group of elements -
# Taken from https://stackoverflow.com/a/45313353/
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
def grouped_indices(a, b):
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    sidx = a0v.argsort()
    idx = sidx[np.searchsorted(a0v,b0v, sorter=sidx)]
    return ((idx*2)[:,None] + [0,1]).ravel()
If there isn't a membership between any group from b in a, we could filter that out using a mask : a0v[idx] == b0v.
Sample run -
In [345]: a
Out[345]: array([5, 8, 3, 4, 2, 5, 7, 8, 1, 9, 1, 3, 4, 7])
In [346]: b
Out[346]: array([3, 4, 7, 8, 1, 3])
In [347]: grouped_indices(a, b)
Out[347]: array([ 2, 3, 6, 7, 10, 11])
Another one using np.in1d to replace np.searchsorted -
def grouped_indices_v2(a, b):
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    return (np.flatnonzero(np.in1d(a0v, b0v))[:,None]*2 + [0,1]).ravel()

Forming matrix from 2 vectors in Numpy, with repetition of 1 vector

Using numpy arrays I want to create such a matrix most economically:
given
from numpy import array
a = array([a1,a2,a3,...,an])
b = array([b1,...,bm])
shall be processed to matrix M:
M = array([[a1,a2,b1,...,an],
           ...     ...,
           [a1,a2,bm,...,an]])
I am aware of numpy array's broadcasting methods but couldn't figure out a good way.
Any help would be much appreciated,
cheers,
Rob
You can use numpy.resize on a first and then add b's items at the required indices using numpy.insert on the re-sized array:
In [101]: a = np.arange(1, 4)
In [102]: b = np.arange(4, 6)
In [103]: np.insert(np.resize(a, (b.shape[0], a.shape[0])), 2, b, axis=1)
Out[103]:
array([[1, 2, 4, 3],
[1, 2, 5, 3]])
You can use a combination of the numpy.tile and numpy.hstack functions (N is the number of rows you want):
M = numpy.tile(numpy.hstack((a, b)), (N, 1))
I'm not sure I understand your target matrix, though.
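Reading the target matrix as "every row is a, with one value of b slotted in at a fixed column", tile and hstack can be combined as below. This is a sketch under that assumption; pos (the column where b's values go) is a made-up name, set to 2 to match the a1,a2,b1,... layout in the question.

```python
import numpy as np

a = np.arange(1, 4)   # [1, 2, 3]
b = np.arange(4, 6)   # [4, 5]
pos = 2               # hypothetical: column at which b's values are inserted

rows = len(b)
M = np.hstack([
    np.tile(a[:pos], (rows, 1)),   # part of a before the inserted column
    b[:, None],                    # b as a column, one value per row
    np.tile(a[pos:], (rows, 1)),   # rest of a
])
print(M.tolist())   # [[1, 2, 4, 3], [1, 2, 5, 3]]
```

For this input it gives the same matrix as the numpy.insert / numpy.resize one-liner.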

Euclidean distances between several images and one base image

I have a matrix X of dimensions (30x8100) and another one Y of dimensions (1x8100). I want to generate an array containing the difference between them (X[1]-Y, X[2]-Y,..., X[30]-Y)
Can anyone help?
All you need for that is
X - Y
Since several people have offered answers that seem to try to make the shapes match manually, I should explain:
Numpy will automatically expand Y's shape so that it matches that of X. This is called broadcasting, and it usually does a very good job of guessing what should be done. In ambiguous cases, you can insert an explicit length-1 axis (with None/np.newaxis or reshape) to tell it which direction to do things. Here, since Y has a dimension of length 1, that is the axis that is expanded to length 30 to match X's shape.
For example,
In [87]: import numpy as np
In [88]: n, m = 3, 5
In [89]: x = np.arange(n*m).reshape(n,m)
In [90]: y = np.arange(m)[None,...]
In [91]: x.shape
Out[91]: (3, 5)
In [92]: y.shape
Out[92]: (1, 5)
In [93]: (x-y).shape
Out[93]: (3, 5)
In [106]: x
Out[106]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [107]: y
Out[107]: array([[0, 1, 2, 3, 4]])
In [108]: x-y
Out[108]:
array([[ 0, 0, 0, 0, 0],
[ 5, 5, 5, 5, 5],
[10, 10, 10, 10, 10]])
But this is not really a euclidean distance, as your title seems to suggest you want:
df = np.asarray(x - y) # the difference between the images
dst = np.sqrt(np.sum(df**2, axis=1)) # their euclidean distances
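The same distances can also be obtained with np.linalg.norm along axis 1. A small sketch using the x and y from the example above (dst_manual and dst_norm are illustrative names):

```python
import numpy as np

n, m = 3, 5
x = np.arange(n * m).reshape(n, m)
y = np.arange(m)[None, ...]

diff = x - y                                    # broadcast difference, shape (3, 5)
dst_manual = np.sqrt(np.sum(diff**2, axis=1))   # explicit formula
dst_norm = np.linalg.norm(diff, axis=1)         # same thing via the 2-norm

print(np.allclose(dst_manual, dst_norm))        # True
```

Either way you get one distance per row of x, so for the (30x8100) case the result has 30 entries.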
Use a numpy array and numpy broadcasting to subtract Y from it.
Initialize the matrix:
>>> from numpy import *
>>> a = array([[1,2,3],[4,5,6]])
Accessing the second row in a:
>>> a[1]
array([4, 5, 6])
Subtract Y from the array:
>>> Y = array([3,9,0])
>>> a - Y
array([[-2, -7, 3],
[ 1, -4, 6]])
Just iterate rows from your numpy array and you can actually just subtract them and numpy will make a new array with the differences!
import numpy as np
final_array = []
#X is a numpy array that is 30X8100 and Y is a numpy array that is 1X8100
for row in X:
    output = row - Y
    final_array.append(output)
On each pass, output is the difference between the current row of X and Y. final_array will then be a list of 30 arrays, each holding the values of X-Y that you need. Simple as that. Just make sure you convert your matrices to numpy arrays first.
Edit: Since numpy broadcasting will do the iteration, all you need is one line once you have your two arrays:
final_array = X - Y
And then that is your array with the differences!
import numpy
a1 = numpy.array(X) #make sure you have a numpy array like [[1,2,3],[4,5,6],...]
a2 = numpy.array(Y).ravel() #make sure you have a 1d numpy array like [1,2,3,...]
a2 = numpy.array([a2] * len(a1)) #repeat a2 once per row of a1 (a2 is now same shape as a1)
print(a1-a2) #difference between a1 and a2 (or X and Y)
