Generating an array of arrays in Python

Generating an array of arrays in Python - python

I want to multiply each element of B to the whole array A to obtain P. The current and desired outputs are attached. The desired output is basically an array consisting of 2 arrays since there are two elements in B.
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P=B*A
print(P)
It currently produces an error:
ValueError: operands could not be broadcast together with shapes (2,) (3,3)
The desired output is
array(([[0.02109, 0.04218, 0.06327],
[0.08436, 0.10545, 0.12654],
[0.14763, 0.16872, 0.18981]]),
([[0.00775858, 0.01551716, 0.02327574],
[0.03103432, 0.0387929 , 0.04655148],
[0.05431006, 0.06206864, 0.06982722]]))

You can do this by:
B.reshape(-1, 1, 1) * A
or
B[:, None, None] * A
where -1 or : refer to B.shape[0] which was 2 and 1, 1 or None, None add two additional dimensions to B to get the desired result shape which was (2, 3, 3).

The easiest way i can think of is using list comprehension and then casting back to numpy.ndarray
np.asarray([A*i for i in B])
Answer :
array([[[0.02109 , 0.04218 , 0.06327 ],
[0.08436 , 0.10545 , 0.12654 ],
[0.14763 , 0.16872 , 0.18981 ]],
[[0.00775858, 0.01551715, 0.02327573],
[0.03103431, 0.03879289, 0.04655146],
[0.05431004, 0.06206862, 0.0698272 ]]])

There are many possible ways for this:
Here is an overview on their runtime for the given array (bare in mind these will change for bigger arrays):
reshape: 0.000174 sec
tensordot: 0.000550 sec
einsum: 0.000196 sec
manual loop: 0.000326 sec
See the implementation for each of these:
numpy reshape
Find documentation here:
Link
Gives a new shape to an array without changing its data.
Here we reshape the array B so we can later multiply it:
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P = B.reshape(-1, 1, 1) * A
print(P)
numpy tensordot
Find documentation here:
Link
Given two tensors, a and b, and an array_like object containing two
array_like objects, (a_axes, b_axes), sum the products of a’s and b’s
elements (components) over the axes specified by a_axes and b_axes.
The third argument can be a single non-negative integer_like scalar,
N; if it is such, then the last N dimensions of a and the first N
dimensions of b are summed over.
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P = np.tensordot(B, A, 0)
print(P)
numpy einsum (Einstein summation)
Find documentation here:
Link
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P = np.einsum('ij,k', A, B)
print(P)
Note: A has two dimensions, we assign ij for their indexes. B has one dimension, we assign k to its index
manual loop
Another simple approach would be a loop (is faster than tensordot for the given input). This approach could be made "numpy free" if you dont want to use numpy for some reason. Here is the version with numpy:
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
products = []
for b in B:
products.append(b*A)
P = np.array(products)
print(P)
#or the same as one-liner: np.asarray([A * elem for elem in B])

Related

Getting a column index in numpy

I'm pretty new to NumPy and I'm looking for a way to get the index of a current column I'm iterating over in a matrix.
import numpy as np
#sum of elements in each column
def p_b(mtrx):
b = []
for c in mtrx.T:
summ = 0
for i in c:
summ += i
b.append(summ)
return b
#return modified matrix where each element is equal to itself divided by
#the sum of the current column in the original matrix
def a_div_b(mtrx):
for c in mtrx:
for i in c:
#change i to be i/p_b(mtrx)[index_of_a_current_column]
return mtrx
For the input ([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) the result would be
([[1/12, 2/12, 3/12], [4/15, 5/15, 6/15], [7/18, 8/18, 9/18]]).
Any ideas about how I can achieve that?

You don't need those functions and loops to do that. Those will not be efficient. When using numpy, go for vectorized operations whenever is possible (in most cases it is possible). numpy broadcasting rules are used to perform mathematical operation between arrays of different dimensions, when possible, such that you can use vectorization, which is much more efficient than python loops.
In your case, say that your array arr is:
arr = np.arange(1, 10)
arr.shape = (3, 3)
#arr is:
>>> arr
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
you can achieve the desired result with:
res = (arr.T / arr.sum(axis=0)).T
>>> res
array([[0.08333333, 0.16666667, 0.25 ],
[0.26666667, 0.33333333, 0.4 ],
[0.38888889, 0.44444444, 0.5 ]])
numpy sum allows you to sum your array along a given axis if the axis parameter is given. 0 is the inner axis, the one you want to sum.
.T gives the transposed matrix. You need to transpose to perform the division on the correct axis and then transpose back.

find indeces of grouped-item matches between two arrays

a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array ([3,4,7,8,1,3])
I have two lists of integers that each is grouped by every 2 consecutive items (ie indices [0, 1], [2, 3] and so on).
The pairs of items cannot be found as duplicates in either list, neither in the same or the reverse order.
One list is significantly larger and inclusive of the other.
I am trying to figure out an efficient way to get the indices
of the larger list's grouped items that are also in the smaller one.
The desired output in the example above should be:
[2,3,6,7,10,11] #indices
Notice that, as an example, the first group ([3,4]) should not get indices 11,12 as a match because in that case 3 is the second element of [1,3] and 4 the first element of [4,7].

Since you are grouping your arrays by pairs, you can reshape them into 2 columns for comparison. You can then compare each of the elements in the shorter array to the longer array, and reduce the boolean arrays. From there it is a simple matter to get the indices using a reshaped np.arange.
import numpy as np
from functools import reduce
a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array ([3,4,7,8,1,3])
# reshape a and b into columns
a2 = a.reshape((-1,2))
b2 = b.reshape((-1,2))
# create a generator of bools for the row of a2 that holds b2
b_in_a_generator = (np.all(a2==row, axis=1) for row in b2)
# reduce the generator to get an array of boolean that is True for each row
# of a2 that equals one of the rows of b2
ix_bool = reduce(lambda x,y: x+y, b_in_a_generator)
# grab the indices by slicing a reshaped np.arange array
ix = np.arange(len(a)).reshape((-1,2))[ix_bool]
ix
# returns:
array([[ 2, 3],
[ 6, 7],
[10, 11]])
If you want a flat array, simply ravel ix
ix.ravel()
# returns
array([ 2, 3, 6, 7, 10, 11])

Here's one approach making use of NumPy view of group of elements -
# Taken from https://stackoverflow.com/a/45313353/
def view1D(a, b): # a, b are arrays
a = np.ascontiguousarray(a)
void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
return a.view(void_dt).ravel(), b.view(void_dt).ravel()
def grouped_indices(a, b):
a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
sidx = a0v.argsort()
idx = sidx[np.searchsorted(a0v,b0v, sorter=sidx)]
return ((idx*2)[:,None] + [0,1]).ravel()
If there isn't a membership between any group from b in a, we could filter that out using a mask : a0v[idx] == b0v.
Sample run -
In [345]: a
Out[345]: array([5, 8, 3, 4, 2, 5, 7, 8, 1, 9, 1, 3, 4, 7])
In [346]: b
Out[346]: array([3, 4, 7, 8, 1, 3])
In [347]: grouped_indices(a, b)
Out[347]: array([ 2, 3, 6, 7, 10, 11])
Another one using np.in1d to replace np.searchsorted -
def grouped_indices_v2(a, b):
a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
return (np.flatnonzero(np.in1d(a0v, b0v))[:,None]*2 + [0,1]).ravel()

Efficently multiply a matrix with itself after offsetting it by one in numpy

I am trying to write a function that takes a matrix A, then offsets it by one, and does element wise matrix multiplication on the shared area. Perhaps an example will help. Suppose I have the matrix:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
What i'd like returned is:
(1*2) + (4*5) + (7*8) = 78
The following code does it, but inefficently:
import numpy as np
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
Height = A.shape[0]
Width = A.shape[1]
Sum1 = 0
for y in range(0, Height):
for x in range(0,Width-2):
Sum1 = Sum1 + \
A.item(y,x)*A.item(y,x+1)
print("%d * %d"%( A.item(y,x),A.item(y,x+1)))
print(Sum1)
With output:
1 * 2
4 * 5
7 * 8
78
Here is my attempt to write the code more efficently with numpy:
import numpy as np
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(np.sum(np.multiply(A[:,0:-1], A[:,1:])))
Unfortunately, this time I get 186. I am at a loss where did I go wrong. i'd love someone to either correcty me or offer another way to implement this.
Thank you.

In this 3 column case, you are just multiplying the 1st 2 columns, and taking the sum:
A[:,:2].prod(1).sum()
Out[36]: 78
Same as (A[:,0]*A[:,1]).sum()
Now just how does that generalize to more columns?
In your original loop, you can cut out the row iteration by taking the sum of this list:
[A[:,x]*A[:,x+1] for x in range(0,A.shape[1]-2)]
Out[40]: [array([ 2, 20, 56])]
Your description talks about multiplying the shared area; what direction are you doing the offset? From the calculation it looks like the offset is negative.
A[:,:-1]
Out[47]:
array([[1, 2],
[4, 5],
[7, 8]])
If that is the offset logic, than I could rewrite my calculation as
A[:,:-1].prod(1).sum()
which should work for many more columns.
===================
Your 2nd try:
In [3]: [A[:,:-1],A[:,1:]]
Out[3]:
[array([[1, 2],
[4, 5],
[7, 8]]),
array([[2, 3],
[5, 6],
[8, 9]])]
In [6]: A[:,:-1]*A[:,1:]
Out[6]:
array([[ 2, 6],
[20, 30],
[56, 72]])
In [7]: _.sum()
Out[7]: 186
In other words instead of 1*2, you are calculating [1,2]*[2*3]=[2,6]. Nothing wrong with that, if that's you you really intend. The key is being clear about 'offset' and 'overlap'.

Way of easily finding the average of every nth element over a window of size k in a pandas.Series? (not the rolling mean)

The motivation here is to take a time series and get the average activity throughout a sub-period (day, week).
It is possible to reshape an array and take the mean over the y axis to achieve this, similar to this answer (but using axis=2):
Averaging over every n elements of a numpy array
but I'm looking for something which can handle arrays of length N%k != 0 and does not solve the issue by reshaping and padding with ones or zeros (e.g numpy.resize), i.e takes the average over the existing data only.
E.g Start with a sequence [2,2,3,2,2,3,2,2,3,6] of length N=10 which is not divisible by k=3. What I want is to take the average over columns of a reshaped array with mis-matched dimensions:
In: [[2,2,3],
[2,2,3],
[2,2,3],
[6]], k =3
Out: [3,2,3]
Instead of:
In: [[2,2,3],
[2,2,3],
[2,2,3],
[6,0,0]], k =3
Out: [3,1.5,2.25]
Thank you.

You can use a masked array to pad with special values that are ignored when finding the mean, instead of summing.
k = 3
# how long the array needs to be to be divisible by 3
padded_len = (len(in_arr) + (k - 1)) // k * k
# create a np.ma.MaskedArray with padded entries masked
padded = np.ma.empty(padded_len)
padded[:len(in_arr)] = in_arr
padded[len(in_arr):] = np.ma.masked
# now we can treat it an array divisible by k:
mean = padded.reshape((-1, k)).mean(axis=0)
# if you need to remove the masked-ness
assert not np.ma.is_masked(mean), "in_arr was too short to calculate all means"
mean = mean.data

You can easily do it by padding, reshaping and calculating by how many elements to divide each row:
>>> import numpy as np
>>> a = np.array([2,2,3,2,2,3,2,2,3,6])
>>> k = 3
Pad data
>>> b = np.pad(a, (0, k - a.size%k), mode='constant').reshape(-1, k)
>>> b
array([[2, 2, 3],
[2, 2, 3],
[2, 2, 3],
[6, 0, 0]])
Then create a mask:
>>> c = a.size // k # 3
>>> d = (np.arange(k) + c * k) < a.size # [True, False, False]
The first part of d will create an array that contains [9, 10, 11], and compare it to the size of a (10), generating the mentioned boolean mask.
And divide it:
>>> b.sum(0) / (c + 1.0 * d)
array([ 3., 2., 3.])
The above will divide the first column by 4 (c + 1 * True) and the rest by 3. This is vectorized numpy, thus, it scales very well to large arrays.
Everything can be written shorter, I just show all the steps to make it more clear.

Flatten the list In by unpacking and chaining. Create a new list that arranges the flattened list lst by columns, then use the map function to calculate the average of each column:
from itertools import chain
In = [[2, 2, 3], [2, 2, 3], [2, 2, 3], [6]]
lst = chain(*In)
k = 3
In_by_cols = [lst[i::k] for i in range(k)]
# [[2, 2, 2, 6], [2, 2, 2], [3, 3, 3]]
Out = map(lambda x: sum(x)/ float(len(x)), In_by_cols)
# [3.0, 2.0, 3.0]
Using float on the length of each sublist will provide a more accurate result on python 2.x as it won't do integer truncation.

Convert a numpy array to an array of numpy arrays

How can I convert numpy array a to numpy array b in a (num)pythonic way. Solution should ideally work for arbitrary dimensions and array lengths.
import numpy as np
a=np.arange(12).reshape(2,3,2)
b=np.empty((2,3),dtype=object)
b[0,0]=np.array([0,1])
b[0,1]=np.array([2,3])
b[0,2]=np.array([4,5])
b[1,0]=np.array([6,7])
b[1,1]=np.array([8,9])
b[1,2]=np.array([10,11])

For a start:
In [638]: a=np.arange(12).reshape(2,3,2)
In [639]: b=np.empty((2,3),dtype=object)
In [640]: for index in np.ndindex(b.shape):
b[index]=a[index]
.....:
In [641]: b
Out[641]:
array([[array([0, 1]), array([2, 3]), array([4, 5])],
[array([6, 7]), array([8, 9]), array([10, 11])]], dtype=object)
It's not ideal since it uses iteration. But I wonder whether it is even possible to access the elements of b in any other way. By using dtype=object you break the basic vectorization that numpy is known for. b is essentially a list with numpy multiarray shape overlay. dtype=object puts an impenetrable wall around those size 2 arrays.
For example, a[:,:,0] gives me all the even numbers, in a (2,3) array. I can't get those numbers from b with just indexing. I have to use iteration:
[b[index][0] for index in np.ndindex(b.shape)]
# [0, 2, 4, 6, 8, 10]
np.array tries to make the highest dimension array that it can, given the regularity of the data. To fool it into making an array of objects, we have to give an irregular list of lists or objects. For example we could:
mylist = list(a.reshape(-1,2)) # list of arrays
mylist.append([]) # make the list irregular
b = np.array(mylist) # array of objects
b = b[:-1].reshape(2,3) # cleanup
The last solution suggests that my first one can be cleaned up a bit:
b = np.empty((6,),dtype=object)
b[:] = list(a.reshape(-1,2))
b = b.reshape(2,3)
I suspect that under the covers, the list() call does an iteration like
[x for x in a.reshape(-1,2)]
So time wise it might not be much different from the ndindex time.
One thing that I wasn't expecting about b is that I can do math on it, with nearly the same generality as on a:
b-10
b += 10
b *= 2
An alternative to an object dtype would be a structured dtype, e.g.
In [785]: b1=np.zeros((2,3),dtype=[('f0',int,(2,))])
In [786]: b1['f0'][:]=a
In [787]: b1
Out[787]:
array([[([0, 1],), ([2, 3],), ([4, 5],)],
[([6, 7],), ([8, 9],), ([10, 11],)]],
dtype=[('f0', '<i4', (2,))])
In [788]: b1['f0']
Out[788]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]]])
In [789]: b1[1,1]['f0']
Out[789]: array([8, 9])
And b and b1 can be added: b+b1 (producing an object dtype). Curiouser and curiouser!

Based on hpaulj I provide a litte more generic solution. a is an array of dimension N which shall be converted to an array b of dimension N1 with dtype object holding arrays of dimension (N-N1).
In the example N equals 5 and N1 equals 3.
import numpy as np
N=5
N1=3
#create array a with dimension N
a=np.random.random(np.random.randint(2,20,size=N))
a_shape=a.shape
b_shape=a_shape[:N1] # shape of array b
b_arr_shape=a_shape[N1:] # shape of arrays in b
#Solution 1 with list() method (faster)
b=np.empty(np.prod(b_shape),dtype=object) #init b
b[:]=list(a.reshape((-1,)+b_arr_shape))
b=b.reshape(b_shape)
print "Dimension of b: {}".format(len(b.shape)) # dim of b
print "Dimension of array in b: {}".format(len(b[0,0,0].shape)) # dim of arrays in b
#Solution 2 with ndindex loop (slower)
b=np.empty(b_shape,dtype=object)
for index in np.ndindex(b_shape):
b[index]=a[index]
print "Dimension of b: {}".format(len(b.shape)) # dim of b
print "Dimension of array in b: {}".format(len(b[0,0,0].shape)) # dim of arrays in b

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Generating an array of arrays in Python - python

You can do this by: B.reshape(-1, 1, 1) * A or B[:, None, None] * A where -1 or : refer to B.shape[0] which was 2 and 1, 1 or None, None add two additional dimensions to B to get the desired result shape which was (2, 3, 3).

Related

Getting a column index in numpy

find indeces of grouped-item matches between two arrays

Efficently multiply a matrix with itself after offsetting it by one in numpy

Way of easily finding the average of every nth element over a window of size k in a pandas.Series? (not the rolling mean)

Convert a numpy array to an array of numpy arrays

Categories

Resources