Getting a column index in numpy - python

I'm pretty new to NumPy and I'm looking for a way to get the index of a current column I'm iterating over in a matrix.
import numpy as np
#sum of elements in each column
def p_b(mtrx):
    b = []
    for c in mtrx.T:
        summ = 0
        for i in c:
            summ += i
        b.append(summ)
    return b
#return modified matrix where each element is equal to itself divided by
#the sum of the current column in the original matrix
def a_div_b(mtrx):
    for c in mtrx:
        for i in c:
            #change i to be i/p_b(mtrx)[index_of_a_current_column]
    return mtrx
For the input ([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) the result would be
([[1/12, 2/12, 3/12], [4/15, 5/15, 6/15], [7/18, 8/18, 9/18]]).
Any ideas about how I can achieve that?

You don't need those functions and loops, and they will not be efficient. When using NumPy, prefer vectorized operations whenever possible (in most cases it is). NumPy's broadcasting rules let you do arithmetic between arrays of different shapes, as long as the shapes are compatible, so you can stay vectorized, which is much faster than Python loops.
In your case, say that your array arr is:
arr = np.arange(1, 10)
arr.shape = (3, 3)
#arr is:
>>> arr
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
you can achieve the desired result with:
res = (arr.T / arr.sum(axis=0)).T
>>> res
array([[0.08333333, 0.16666667, 0.25      ],
       [0.26666667, 0.33333333, 0.4       ],
       [0.38888889, 0.44444444, 0.5       ]])
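numpy sum reduces your array along a given axis when the axis parameter is given. axis=0 is the first axis: summing along it adds up the rows, so you get one sum per column ([12, 15, 18] here).
.T gives the transposed matrix. Transposing first makes broadcasting divide row i of the original array by the i-th sum, which is the output you listed; transposing back restores the original layout.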
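As a sketch of what the double transpose is doing, the same result can be written by adding a trailing axis to the sums instead. The names col_sums, res_rows and res_cols are only illustrative. The last line shows the other reading of the question (each element divided by the sum of its own column), which is plain broadcasting without any transpose.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)
col_sums = arr.sum(axis=0)          # one sum per column: 12, 15, 18

# Equivalent to (arr.T / col_sums).T: row i is divided by col_sums[i].
res_rows = arr / col_sums[:, None]

# Plain broadcasting divides column j by col_sums[j] instead,
# so every column of the result sums to 1.
res_cols = arr / col_sums

print(res_rows)
print(res_cols)
```

Which of the two you want depends on whether the divisor should follow the row or the column of each element.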

Related

Generating an array of arrays in Python

I want to multiply each element of B to the whole array A to obtain P. The current and desired outputs are attached. The desired output is basically an array consisting of 2 arrays since there are two elements in B.
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P=B*A
print(P)
It currently produces an error:
ValueError: operands could not be broadcast together with shapes (2,) (3,3)
The desired output is
array(([[0.02109, 0.04218, 0.06327],
[0.08436, 0.10545, 0.12654],
[0.14763, 0.16872, 0.18981]]),
([[0.00775858, 0.01551716, 0.02327574],
[0.03103432, 0.0387929 , 0.04655148],
[0.05431006, 0.06206864, 0.06982722]]))
You can do this by:
B.reshape(-1, 1, 1) * A
or
B[:, None, None] * A
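Here -1 (in reshape) or : (in indexing) keeps B's own length, B.shape[0], which is 2, and 1, 1 or None, None add two trailing axes of length 1. B then has shape (2, 1, 1), which broadcasts against A's shape (3, 3) to give the desired result shape (2, 3, 3).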
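A short check of the shapes involved may make the broadcasting step easier to follow. This is only a sketch; B3 is an illustrative name for the reshaped B.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
B = 0.02109 * np.exp(-np.linspace(0, 1, 2))

B3 = B[:, None, None]      # add two trailing axes of length 1
print(B3.shape)            # (2, 1, 1)

P = B3 * A                 # (2, 1, 1) broadcast against (3, 3)
print(P.shape)             # (2, 3, 3)

# Each slice P[i] is the whole of A scaled by B[i].
print(np.allclose(P[1], B[1] * A))   # True
```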
The easiest way I can think of is using a list comprehension and then casting back to numpy.ndarray:
np.asarray([A*i for i in B])
Output:
array([[[0.02109 , 0.04218 , 0.06327 ],
[0.08436 , 0.10545 , 0.12654 ],
[0.14763 , 0.16872 , 0.18981 ]],
[[0.00775858, 0.01551715, 0.02327573],
[0.03103431, 0.03879289, 0.04655146],
[0.05431004, 0.06206862, 0.0698272 ]]])
There are several ways to do this.
Here is an overview of their runtimes for the given arrays (bear in mind these will change for bigger arrays):
reshape: 0.000174 sec
tensordot: 0.000550 sec
einsum: 0.000196 sec
manual loop: 0.000326 sec
See the implementation for each of these:
numpy reshape
See the numpy.reshape documentation:
Gives a new shape to an array without changing its data.
Here we reshape the array B so we can later multiply it:
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P = B.reshape(-1, 1, 1) * A
print(P)
numpy tensordot
See the numpy.tensordot documentation:
Given two tensors, a and b, and an array_like object containing two
array_like objects, (a_axes, b_axes), sum the products of a’s and b’s
elements (components) over the axes specified by a_axes and b_axes.
The third argument can be a single non-negative integer_like scalar,
N; if it is such, then the last N dimensions of a and the first N
dimensions of b are summed over.
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P = np.tensordot(B, A, 0)
print(P)
numpy einsum (Einstein summation)
See the numpy.einsum documentation.
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
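P = np.einsum('k,ij->kij', B, A)
print(P)
Note: A has two dimensions, so we assign ij to its indices. B has one dimension, so we assign k to its index. The explicit output kij puts B's axis first, giving shape (2, 3, 3); without it, einsum orders the output axes alphabetically (ijk) and returns shape (3, 3, 2).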
manual loop
Another simple approach is a loop (faster than tensordot for the given input). This approach could be made "numpy free" if you don't want to use numpy for some reason. Here is the version with numpy:
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
products = []
for b in B:
    products.append(b*A)
P = np.array(products)
print(P)
#or the same as one-liner: np.asarray([A * elem for elem in B])
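The timings quoted above depend on the machine and the array sizes, so it is worth re-measuring. The following is a minimal sketch of how such a comparison could be run with the standard timeit module; the dictionary candidates and the run count are assumptions made for illustration, and einsum is written with explicit output subscripts so that all four variants return shape (2, 3, 3).

```python
import timeit
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
B = 0.02109 * np.exp(-np.linspace(0, 1, 2))

# The four approaches, wrapped so they can be timed uniformly.
candidates = {
    "reshape": lambda: B.reshape(-1, 1, 1) * A,
    "tensordot": lambda: np.tensordot(B, A, 0),
    "einsum": lambda: np.einsum("k,ij->kij", B, A),
    "manual loop": lambda: np.array([b * A for b in B]),
}

reference = candidates["reshape"]()
runs = 1000
for name, fn in candidates.items():
    # Make sure every variant computes the same array before timing it.
    assert np.allclose(fn(), reference)
    per_call = timeit.timeit(fn, number=runs) / runs
    print(f"{name}: {per_call:.6f} sec per call")
```

For arrays this small the numbers mostly reflect call overhead, so the ranking can change for larger inputs.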

Efficiently multiply a matrix with itself after offsetting it by one in numpy

I am trying to write a function that takes a matrix A, then offsets it by one, and does element wise matrix multiplication on the shared area. Perhaps an example will help. Suppose I have the matrix:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
What I'd like returned is:
(1*2) + (4*5) + (7*8) = 78
The following code does it, but inefficiently:
import numpy as np
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
Height = A.shape[0]
Width = A.shape[1]
Sum1 = 0
for y in range(0, Height):
    for x in range(0,Width-2):
        Sum1 = Sum1 + \
            A.item(y,x)*A.item(y,x+1)
        print("%d * %d"%( A.item(y,x),A.item(y,x+1)))
print(Sum1)
With output:
1 * 2
4 * 5
7 * 8
78
Here is my attempt to write the code more efficiently with numpy:
import numpy as np
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(np.sum(np.multiply(A[:,0:-1], A[:,1:])))
Unfortunately, this time I get 186. I am at a loss as to where I went wrong. I'd love someone to either correct me or offer another way to implement this.
Thank you.
In this 3 column case, you are just multiplying the 1st 2 columns, and taking the sum:
A[:,:2].prod(1).sum()
Out[36]: 78
Same as (A[:,0]*A[:,1]).sum()
Now just how does that generalize to more columns?
In your original loop, you can cut out the row iteration by taking the sum of this list:
[A[:,x]*A[:,x+1] for x in range(0,A.shape[1]-2)]
Out[40]: [array([ 2, 20, 56])]
Your description talks about multiplying the shared area; what direction are you doing the offset? From the calculation it looks like the offset is negative.
A[:,:-1]
Out[47]:
array([[1, 2],
       [4, 5],
       [7, 8]])
If that is the offset logic, then I could rewrite my calculation as
A[:,:-1].prod(1).sum()
which should work for many more columns.
===================
Your 2nd try:
In [3]: [A[:,:-1],A[:,1:]]
Out[3]:
[array([[1, 2],
        [4, 5],
        [7, 8]]),
 array([[2, 3],
        [5, 6],
        [8, 9]])]
In [6]: A[:,:-1]*A[:,1:]
Out[6]:
array([[ 2,  6],
       [20, 30],
       [56, 72]])
In [7]: _.sum()
Out[7]: 186
In other words, instead of 1*2, you are calculating [1,2]*[2,3]=[2,6]. Nothing wrong with that, if that's what you really intend. The key is being clear about 'offset' and 'overlap'.
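To make the difference concrete, here is a small sketch that computes both readings side by side. The variable names are illustrative. It also shows that the original loop only reaches 78 because its inner range stops at Width-2; letting x run to Width-1 covers every adjacent pair of columns and reproduces 186.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Only the first two columns, as in the question's loop output.
first_pair = (A[:, 0] * A[:, 1]).sum()

# Every column times its right-hand neighbour (the sliced version).
all_pairs = (A[:, :-1] * A[:, 1:]).sum()

# Loop equivalent of all_pairs: note range(Width - 1), not range(Width - 2).
loop_sum = 0
for y in range(A.shape[0]):
    for x in range(A.shape[1] - 1):
        loop_sum += A[y, x] * A[y, x + 1]

print(first_pair, all_pairs, loop_sum)   # 78 186 186
```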

Way of easily finding the average of every nth element over a window of size k in a pandas.Series? (not the rolling mean)

The motivation here is to take a time series and get the average activity throughout a sub-period (day, week).
It is possible to reshape an array and take the mean over the y axis to achieve this, similar to this answer (but using axis=2):
Averaging over every n elements of a numpy array
but I'm looking for something which can handle arrays of length N%k != 0 and does not solve the issue by reshaping and padding with ones or zeros (e.g numpy.resize), i.e takes the average over the existing data only.
E.g Start with a sequence [2,2,3,2,2,3,2,2,3,6] of length N=10 which is not divisible by k=3. What I want is to take the average over columns of a reshaped array with mis-matched dimensions:
In: [[2,2,3],
     [2,2,3],
     [2,2,3],
     [6]], k =3
Out: [3,2,3]
Instead of:
In: [[2,2,3],
     [2,2,3],
     [2,2,3],
     [6,0,0]], k =3
Out: [3,1.5,2.25]
Thank you.
You can use a masked array: pad with masked entries, which are ignored when taking the mean, instead of padding with real values.
import numpy as np
in_arr = np.array([2, 2, 3, 2, 2, 3, 2, 2, 3, 6])  # example input from the question
k = 3
# how long the array needs to be to be divisible by k
padded_len = (len(in_arr) + (k - 1)) // k * k
# create a np.ma.MaskedArray with padded entries masked
padded = np.ma.empty(padded_len)
padded[:len(in_arr)] = in_arr
padded[len(in_arr):] = np.ma.masked
# now we can treat it as an array whose length is divisible by k:
mean = padded.reshape((-1, k)).mean(axis=0)
# if you need to remove the masked-ness
assert not np.ma.is_masked(mean), "in_arr was too short to calculate all means"
mean = mean.data
You can easily do it by padding, reshaping and calculating by how many elements to divide each row:
>>> import numpy as np
>>> a = np.array([2,2,3,2,2,3,2,2,3,6])
>>> k = 3
Pad data
>>> b = np.pad(a, (0, k - a.size%k), mode='constant').reshape(-1, k)
>>> b
array([[2, 2, 3],
[2, 2, 3],
[2, 2, 3],
[6, 0, 0]])
Then create a mask:
>>> c = a.size // k # 3
>>> d = (np.arange(k) + c * k) < a.size # [True, False, False]
The first part of d will create an array that contains [9, 10, 11], and compare it to the size of a (10), generating the mentioned boolean mask.
And divide it:
>>> b.sum(0) / (c + 1.0 * d)
array([ 3., 2., 3.])
The above will divide the first column by 4 (c + 1 * True) and the rest by 3. This is vectorized numpy, thus, it scales very well to large arrays.
Everything can be written shorter, I just show all the steps to make it more clear.
Flatten the list In by unpacking and chaining. Create a new list that arranges the flattened list lst by columns, then use the map function to calculate the average of each column:
from itertools import chain
In = [[2, 2, 3], [2, 2, 3], [2, 2, 3], [6]]
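lst = list(chain(*In))  # chain returns an iterator; make it a list so it can be sliced
k = 3
In_by_cols = [lst[i::k] for i in range(k)]
# [[2, 2, 2, 6], [2, 2, 2], [3, 3, 3]]
Out = list(map(lambda x: sum(x)/ float(len(x)), In_by_cols))  # list() is needed on Python 3, where map is lazy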
# [3.0, 2.0, 3.0]
Using float on the length of each sublist will provide a more accurate result on python 2.x as it won't do integer truncation.
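For comparison, the same per-position average can be sketched in NumPy without any padding, by grouping elements on their index modulo k. This is an alternative to the answers above, not a restatement of them; the names pos, sums and counts are illustrative, and the sketch assumes the sequence has at least k elements so that no count is zero.

```python
import numpy as np

a = np.array([2, 2, 3, 2, 2, 3, 2, 2, 3, 6])
k = 3

# Position of each element inside its window of size k: 0, 1, 2, 0, 1, 2, ...
pos = np.arange(a.size) % k

# Sum and count the elements that share a position, then divide.
sums = np.bincount(pos, weights=a, minlength=k)
counts = np.bincount(pos, minlength=k)
means = sums / counts

print(means)   # [3. 2. 3.]
```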

Convert a numpy array to an array of numpy arrays

How can I convert numpy array a to numpy array b in a (num)pythonic way. Solution should ideally work for arbitrary dimensions and array lengths.
import numpy as np
a=np.arange(12).reshape(2,3,2)
b=np.empty((2,3),dtype=object)
b[0,0]=np.array([0,1])
b[0,1]=np.array([2,3])
b[0,2]=np.array([4,5])
b[1,0]=np.array([6,7])
b[1,1]=np.array([8,9])
b[1,2]=np.array([10,11])
For a start:
In [638]: a=np.arange(12).reshape(2,3,2)
In [639]: b=np.empty((2,3),dtype=object)
In [640]: for index in np.ndindex(b.shape):
   .....:     b[index]=a[index]
   .....:
In [641]: b
Out[641]:
array([[array([0, 1]), array([2, 3]), array([4, 5])],
[array([6, 7]), array([8, 9]), array([10, 11])]], dtype=object)
It's not ideal since it uses iteration. But I wonder whether it is even possible to access the elements of b in any other way. By using dtype=object you break the basic vectorization that numpy is known for. b is essentially a list with numpy multiarray shape overlay. dtype=object puts an impenetrable wall around those size 2 arrays.
For example, a[:,:,0] gives me all the even numbers, in a (2,3) array. I can't get those numbers from b with just indexing. I have to use iteration:
[b[index][0] for index in np.ndindex(b.shape)]
# [0, 2, 4, 6, 8, 10]
np.array tries to make the highest dimension array that it can, given the regularity of the data. To fool it into making an array of objects, we have to give an irregular list of lists or objects. For example we could:
mylist = list(a.reshape(-1,2)) # list of arrays
mylist.append([]) # make the list irregular
b = np.array(mylist, dtype=object) # array of objects; recent NumPy versions raise on irregular input unless dtype=object is given
b = b[:-1].reshape(2,3) # cleanup
The last solution suggests that my first one can be cleaned up a bit:
b = np.empty((6,),dtype=object)
b[:] = list(a.reshape(-1,2))
b = b.reshape(2,3)
I suspect that under the covers, the list() call does an iteration like
[x for x in a.reshape(-1,2)]
So time wise it might not be much different from the ndindex time.
One thing that I wasn't expecting about b is that I can do math on it, with nearly the same generality as on a:
b-10
b += 10
b *= 2
An alternative to an object dtype would be a structured dtype, e.g.
In [785]: b1=np.zeros((2,3),dtype=[('f0',int,(2,))])
In [786]: b1['f0'][:]=a
In [787]: b1
Out[787]:
array([[([0, 1],), ([2, 3],), ([4, 5],)],
[([6, 7],), ([8, 9],), ([10, 11],)]],
dtype=[('f0', '<i4', (2,))])
In [788]: b1['f0']
Out[788]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]]])
In [789]: b1[1,1]['f0']
Out[789]: array([8, 9])
And b and b1 can be added: b+b1 (producing an object dtype). Curiouser and curiouser!
Based on hpaulj's answer, here is a slightly more generic solution. a is an array of dimension N which shall be converted to an array b of dimension N1 with dtype object holding arrays of dimension (N-N1).
In the example N equals 5 and N1 equals 3.
import numpy as np
N=5
N1=3
#create array a with dimension N
a=np.random.random(np.random.randint(2,20,size=N))
a_shape=a.shape
b_shape=a_shape[:N1] # shape of array b
b_arr_shape=a_shape[N1:] # shape of arrays in b
#Solution 1 with list() method (faster)
b=np.empty(np.prod(b_shape),dtype=object) #init b
b[:]=list(a.reshape((-1,)+b_arr_shape))
b=b.reshape(b_shape)
print("Dimension of b: {}".format(len(b.shape))) # dim of b
print("Dimension of array in b: {}".format(len(b[0,0,0].shape))) # dim of arrays in b
#Solution 2 with ndindex loop (slower)
b=np.empty(b_shape,dtype=object)
for index in np.ndindex(b_shape):
    b[index]=a[index]
print("Dimension of b: {}".format(len(b.shape))) # dim of b
print("Dimension of array in b: {}".format(len(b[0,0,0].shape))) # dim of arrays in b
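The ndindex variant can be wrapped into a small reusable function. The sketch below is one way to do that; to_object_array and its parameter n_outer are hypothetical names introduced here, not part of NumPy. It is applied to the (2, 3, 2) example from the question.

```python
import numpy as np

def to_object_array(a, n_outer):
    """Hypothetical helper: keep the first n_outer axes of a and
    store what remains of each entry as a sub-array in an object array."""
    outer_shape = a.shape[:n_outer]
    b = np.empty(outer_shape, dtype=object)
    for index in np.ndindex(outer_shape):
        # a[index] is a view of a holding the remaining axes
        b[index] = a[index]
    return b

a = np.arange(12).reshape(2, 3, 2)
b = to_object_array(a, 2)
print(b.shape, b.dtype)   # (2, 3) object
print(b[1, 1])            # [8 9]
```

Because the stored sub-arrays are views, modifying them in place also changes a; copy with a[index].copy() if that is not wanted.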

Forming matrix from 2 vectors in Numpy, with repetition of 1 vector

Using numpy arrays I want to create such a matrix most economically:
given
from numpy import array
a = array([a1,a2,a3,...,an])
b = array([b1,...,bm])
shall be processed to matrix M:
M = array([[a1,a2,b1,...,an],
           ... ...,
           [a1,a2,bm,...,an]])
I am aware of numpy array's broadcasting methods but couldn't figure out a good way.
Any help would be much appreciated,
cheers,
Rob
You can use numpy.resize on a first and then add b's items at the required indices using numpy.insert on the re-sized array:
In [101]: a = np.arange(1, 4)
In [102]: b = np.arange(4, 6)
In [103]: np.insert(np.resize(a, (b.shape[0], a.shape[0])), 2, b, axis=1)
Out[103]:
array([[1, 2, 4, 3],
[1, 2, 5, 3]])
You can use a combination of the numpy.tile and numpy.hstack functions, where N is the number of rows you want:
M = numpy.tile(numpy.hstack((a, b)), (N, 1))
I'm not sure I understand your target matrix, though.
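If the target is what the numpy.insert answer produces (every row is a copy of a with one element of b placed at a fixed column), it can also be sketched with broadcasting assignments into a preallocated array. The function name insert_column_per_row and the parameter pos are hypothetical and only serve this illustration.

```python
import numpy as np

def insert_column_per_row(a, b, pos):
    """Hypothetical helper: one row per element of b; each row is a
    with that element of b inserted at column pos."""
    a = np.asarray(a)
    b = np.asarray(b)
    M = np.empty((b.size, a.size + 1), dtype=np.result_type(a, b))
    M[:, :pos] = a[:pos]        # left part of a, broadcast to every row
    M[:, pos] = b               # one element of b per row
    M[:, pos + 1:] = a[pos:]    # right part of a, broadcast to every row
    return M

M = insert_column_per_row(np.arange(1, 4), np.arange(4, 6), 2)
print(M)
```

For the arrays used in the numpy.insert answer this gives the same two rows, [1, 2, 4, 3] and [1, 2, 5, 3].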
