I have a numpy array
np.array(data).shape
(50,50)
Now, I want to find the cumulative sums across axis=1. The problem is cumsum creates an array of cumulative sums, but I just care about the final value of every row.
This is incorrect of course:
np.cumsum(data, axis=1)[-1]
Is there a succinct way of doing this without looping through the array?
You are almost there, but as you have it now, you are selecting just the final row. What you need is to select all rows from the last column, so your indexing at the end should be: [:,-1].
Example:
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
>>> a.cumsum(axis=1)[:,-1]
array([ 10,  35,  60,  85, 110])
Note, I'm leaving this up as I think it explains what was going wrong with your attempt, but admittedly, there are more effective ways of doing this in the other answers!
The final cumulative sum of every row is simply the sum of that row (the row-wise sum), so for an array x we can write:
>>> x.sum(axis=1)
array([ 10, 35, 60, 85, 110])
So for every row we calculate the sum over all of its columns. There is no need to generate the intermediate sums first: NumPy will most likely still keep a running total in an accumulator internally, but the partial sums are never "emitted" into an array.
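To make the equivalence concrete, here is a small sketch comparing the two approaches on the 5x5 example array used above (the variable names are only for illustration):

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

# Last column of the running sums along each row ...
last_of_cumsum = a.cumsum(axis=1)[:, -1]
# ... versus reducing each row directly, with no intermediate array.
row_sums = a.sum(axis=1)

print(row_sums)                                   # [ 10  35  60  85 110]
print(np.array_equal(last_of_cumsum, row_sums))   # True
```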
You can use numpy.ufunc.reduce if you don't need the intermediary accumulated results of any ufunc.
>>> a = np.arange(9).reshape(3,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> np.add.reduce(a, axis=1)
array([ 3, 12, 21])
However, in the case of sum, Willem's answer is clearly superior and to be preferred. Just keep in mind that in the general case, there's ufunc.reduce.
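As a rough illustration of that general case, the same reduce/accumulate pairing exists on other binary ufuncs. The sketch below uses np.maximum and np.multiply purely as examples; they are not part of the original question:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# accumulate keeps every intermediate result along the axis;
# reduce keeps only the final one.
running_max = np.maximum.accumulate(a, axis=1)
row_max = np.maximum.reduce(a, axis=1)
row_prod = np.multiply.reduce(a, axis=1)

print(row_max)    # [2 5 8]
print(row_prod)   # [  0  60 336]
```

Here row_max matches the last column of running_max, just as the row sums match the last column of cumsum.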
If I have an array of time increments, for example:
intervals = np.random.normal(loc=1,scale=0.1,size=100)
one possible way to create the corresponding array of time instants is to create a list and manually make the sum:
Sum=0.
instants=[]
for k in range(len(intervals)):
    Sum+=intervals[k]
    instants.append(Sum)
instants=np.array(instants)
So, I have just switched from an array of dt(i) to an array of t(i).
But Python usually offers elegant alternatives to for loops. Is there a better way to do it?
What you describe here is the cumulative sum. You can calculate this with .cumsum() [numpy-doc]:
intervals.cumsum()
For example:
>>> intervals
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> intervals.cumsum()
array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45])
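If you want to convince yourself that this matches the loop from the question, a quick check along these lines should do. The seed is an arbitrary choice for reproducibility, and the variable names are only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
intervals = rng.normal(loc=1, scale=0.1, size=100)

# Loop version, as in the question
total = 0.0
instants_loop = []
for dt in intervals:
    total += dt
    instants_loop.append(total)
instants_loop = np.array(instants_loop)

# Vectorised version
instants = intervals.cumsum()

print(np.allclose(instants, instants_loop))  # True
```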
This is a follow up on this question.
From a 2d array, create another 2d array composed of randomly selected values from original array (values not shared among rows) without using a loop
I am looking for a way to create a 2D array whose rows are randomly selected, non-repeating values from the corresponding row of another 2D array, without using a loop.
Here is a way to do it using a loop.
pool = np.random.randint(0, 30, size=[4,5])
seln = np.empty([4,3], int)
for i in range(0, pool.shape[0]):
    seln[i] = np.random.choice(pool[i], 3, replace=False)
print('pool = ', pool)
print('seln = ', seln)
>pool = [[ 1 11 29 4 13]
[29 1 2 3 24]
[ 0 25 17 2 14]
[20 22 18 9 29]]
seln = [[ 8 12 0]
[ 4 19 13]
[ 8 15 24]
[12 12 19]]
Here is a method that does not use a loop; however, it can select the same value multiple times in each row.
pool = np.random.randint(0, 30, size=[4,5])
print(pool)
array([[ 4, 18, 0, 15, 9],
[ 0, 9, 21, 26, 9],
[16, 28, 11, 19, 24],
[20, 6, 13, 2, 27]])
# New array shape
new_shape = (pool.shape[0],3)
# Indices where to randomly choose from
ix = np.random.choice(pool.shape[1], new_shape)
array([[0, 3, 3],
[1, 1, 4],
[2, 4, 4],
[1, 2, 1]])
ixs = (ix.T + range(0,np.prod(pool.shape),pool.shape[1])).T
array([[ 0, 3, 3],
[ 6, 6, 9],
[12, 14, 14],
[16, 17, 16]])
pool.flatten()[ixs].reshape(new_shape)
array([[ 4, 15, 15],
[ 9, 9, 9],
[11, 24, 24],
[ 6, 13, 6]])
I am looking for a method that does not use a loop and in which a value selected from a row cannot be selected again.
Here is a way without explicit looping. However, it requires generating an array of random numbers of the size of the original array. That said, the generation is done in compiled code, so it should be pretty fast. Identical random numbers are not a problem either: argsort still returns a permutation of the column indices when there are ties, and the chance of a tie is essentially zero anyway.
m,n = 4,5
pool = np.random.randint(0, 30, size=[m,n])
new_width = 3
mask = np.argsort(np.random.rand(m,n))<new_width
pool[mask].reshape(m,new_width)
How it works:
We generate a random array of floats and argsort it. By default argsort works along the last axis (axis 1 for a 2D array), so each row of the result is a random permutation of the column indices 0,...,n-1: entry (i, j) is the column of the j-th smallest value in row i.
We then mark the entries of this array that are less than new_width. Each row contains the numbers 0,...,n-1 in a random order, so exactly new_width of them are less than new_width. Each row of mask therefore has exactly new_width entries that are True, and the rest are False (a comparison between an ndarray and a scalar is applied component-wise).
Finally, the boolean mask is applied to the original data to grab new_width many entries from each row.
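One thing to be aware of: boolean masking returns the chosen entries in their original column order. If you also want the order within each row to be random, a variation on the same idea is to keep the first new_width column indices of each random permutation and gather them with np.take_along_axis. This is only a sketch of that variation, not the method above; the names order, cols and seln are illustrative, and "non-repeating" here means no position is picked twice (the pool itself may contain duplicate values).

```python
import numpy as np

rng = np.random.default_rng()
m, n, new_width = 4, 5, 3
pool = rng.integers(0, 30, size=(m, n))

# Each row of `order` is an independent random permutation of 0..n-1.
order = np.argsort(rng.random((m, n)), axis=1)

# Keep the first new_width column indices per row, then gather those columns.
cols = order[:, :new_width]
seln = np.take_along_axis(pool, cols, axis=1)

print(pool)
print(seln)
```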
You could also use np.vectorize for your loop solution, although that is just shorthand for a loop.
I have the following matrix:
M = np.matrix([[1,2,3,4,5,6,7,8,9,10],
[11,12,13,14,15,16,17,18,19,20],
[21,22,23,24,25,26,27,28,29,30]])
And I receive a vector indexing the columns of the matrix:
index = np.array([1,1,2,2,2,2,3,4,4,4])
This vector has 4 different values, so my objective is to create a list containing four new matrices so that the first matrix is made by the first two columns of M, the second matrix is made by columns 3 to 6 and so on:
M1 = np.matrix([[1,2],[11,12],[21,22]])
M2 = np.matrix([[3,4,5,6],[13,14,15,16],[23,24,25,26]])
M3 = np.matrix([[7],[17],[27]])
M4 = np.matrix([[8,9,10],[18,19,20],[28,29,30]])
l = [M1, M2, M3, M4]
I need to do this in an automated way, since the number of rows and columns of M as well as the indexing scheme are not fixed. How can I do this?
There are 3 points to note:
For a variable number of variables, as in this case, the recommended solution is to use a dictionary.
You can use simple numpy indexing for the individual case.
Unless you have a very specific reason, use numpy.array instead of numpy.matrix.
Combining these points, you can use a dictionary comprehension:
d = {k: np.array(M[:, np.where(index==k)[0]]) for k in np.unique(index)}
Result:
{1: array([[ 1, 2],
[11, 12],
[21, 22]]),
2: array([[ 3, 4, 5, 6],
[13, 14, 15, 16],
[23, 24, 25, 26]]),
3: array([[ 7],
[17],
[27]]),
4: array([[ 8, 9, 10],
[18, 19, 20],
[28, 29, 30]])}
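If you really do want a list, and the labels in index always come in contiguous runs as in the question (that is an assumption; it would not hold for something like [1, 2, 1]), a possible sketch is to split M at the columns where the label changes:

```python
import numpy as np

M = np.arange(1, 31).reshape(3, 10)   # same values as M in the question
index = np.array([1, 1, 2, 2, 2, 2, 3, 4, 4, 4])

# Column positions where the label changes: [2, 6, 7]
boundaries = np.flatnonzero(np.diff(index)) + 1

# Split along the column axis at those positions.
blocks = np.split(M, boundaries, axis=1)

for block in blocks:
    print(block.shape)   # (3, 2), (3, 4), (3, 1), (3, 3)
```

The dictionary comprehension above is the safer choice when the labels can be interleaved.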
import numpy as np
M = np.matrix([[1,2,3,4,5,6,7,8,9,10],
               [11,12,13,14,15,16,17,18,19,20],
               [21,22,23,24,25,26,27,28,29,30]])
index = np.array([1,1,2,2,2,2,3,4,4,4])
# one list of column positions per label (assumes the labels are 1..4)
m = [[],[],[],[]]
for i,c in enumerate(index):
    m[c-1].append(i)
for idx in m:
    print(M[:,idx])
This is a little hard-coded: I assumed you will always want 4 matrices. You can generalise it if needed.
I need to define a function which reads a numpy array and returns the mean of the k nearest points to a number p in the array.
Example:
array= np.array([1, 2, 3, 4, 5, 6, 7, 50, 24, 32, 9, 11, 12, 10])
p = 15 (note: this is not a number in the array; I need to find the
numbers closest to p, or p itself if it is present)
k = 3
In this case, I would need to generate the mean of [11, 12, 10],
as they are closest to p = 15.
In general, I need the mean of the k points closest to p, where p may or may not itself appear in the array.
I am new and very confused at this point and feel I have exhausted my resources. I feel this question has been asked before but the answers are much too complex for what I need.
Thanks in advance.
Given a (1d) array arr and scalar input p, here's how you could find the mean of the n nearest values:
def neighbor_mean(arr, p, n=3):
    idx = np.abs(arr - p).argsort()[:n]
    return arr[idx].mean()
arr = np.array([1, 2, 3, 4, 5, 6, 7, 50, 24, 32, 9, 11, 12, 10])
neighbor_mean(arr, p=15)
# 11.0
In the above, first you take the absolute differences:
np.abs(arr - 15)
# array([14, 13, 12, 11, 10, 9, 8, 35, 9, 17, 6, 4, 3, 5])
Then argsort() returns the indices that would sort an array. We're interested in the indices of the n smallest absolute differences; those indices are what you're really looking for, rather than the sorted differences themselves.
np.abs(arr - p).argsort()[:3]
# array([12, 11, 13])
Lastly you want to index your input array arr and take the mean of this:
arr[[12, 11, 13]]
# array([12, 11, 10]) # mean: 11.0
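For large arrays, a full sort is more work than strictly needed. A possible variation, sketched here under the assumption that k is no larger than the array length, uses np.argpartition to pick out the k smallest differences without ordering the rest. The function name neighbor_mean_partition is made up for this example:

```python
import numpy as np

def neighbor_mean_partition(arr, p, k=3):
    """Mean of the k values in arr closest to p (hypothetical helper)."""
    dist = np.abs(arr - p)
    # argpartition puts the indices of the k smallest distances in the
    # first k slots (in no particular order) without a full sort.
    idx = np.argpartition(dist, k - 1)[:k]
    return arr[idx].mean()

arr = np.array([1, 2, 3, 4, 5, 6, 7, 50, 24, 32, 9, 11, 12, 10])
print(neighbor_mean_partition(arr, p=15))  # 11.0
```

If several values are equally far from p at the cut-off, which of them gets picked is not guaranteed.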
I have a question and I could not find the answer on the internet nor on this website. I am sure it is very easy though. Let's say I have a set of 20 numbers and I have them in a 5x4 matrix:
numbers = np.arange(20).reshape(5,4)
This yields the following matrix:
[ 0, 1, 2, 3]
[ 4, 5, 6, 7]
[ 8, 9, 10, 11]
[12, 13, 14, 15]
[16, 17, 18, 19]
Now I would like to have the minimum value of each row, in this case amounting to 0,4,8,12,16. However, I would like to add that for my problem the minimum value is NOT always in the first column, it can be at a random place in the matrix (i.e. first, second, third or fourth column for each row). If someone could shed some light on this it would be greatly appreciated.
You just need to specify the axis across which you want to take the minimum. To find the minimum value in each row, you need to specify axis 1:
>>> numbers.min(axis=1)
array([ 0, 4, 8, 12, 16])
For a 2D array, numbers.min() finds the single minimum value in the array, numbers.min(axis=0) returns the minimum value for each column and numbers.min(axis=1) returns the minimum value for each row.
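To address the concern that the minimum is not always in the first column, here is a small sketch with a hand-made array (not from the question) whose rows hold the same values in a shuffled order. argmin is included only to show where each minimum sits:

```python
import numpy as np

# Same values per row as in the question, but shuffled within each row.
numbers = np.array([[ 3,  0,  2,  1],
                    [ 4,  7,  5,  6],
                    [10, 11,  8,  9],
                    [15, 13, 14, 12],
                    [17, 16, 19, 18]])

print(numbers.min())           # 0
print(numbers.min(axis=0))     # [3 0 2 1]
print(numbers.min(axis=1))     # [ 0  4  8 12 16]
print(numbers.argmin(axis=1))  # [1 0 2 3 1]
```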