I have an interesting puzzle. Suppose you have a 2D NumPy array in which each row corresponds to a measurement event and each column corresponds to a different measured variable. One additional column in this array specifies the date on which the measurement was taken. The rows are sorted by time stamp, and there are several (or many) measurements on each day. The goal is to identify the rows that start a new day and subtract their values from the subsequent rows of that day.
I approach this problem with a loop over the days: for each day I create a boolean vector that selects the matching rows, and then I subtract the first selected row. This approach works, but it feels inelegant. Are there better ways to do this?
Here is a small example. The lines below define a matrix in which the first column
is the day and the remaining two are the measured values:
before = array([[ 1, 1, 2],
[ 1, 3, 4],
[ 1, 5, 6],
[ 2, 7, 8],
[ 3, 9, 10],
[ 3, 11, 12],
[ 3, 13, 14]])
At the end of the process I expect to see the following array:
array([[1, 0, 0],
[1, 2, 2],
[1, 4, 4],
[2, 0, 0],
[3, 0, 0],
[3, 2, 2],
[3, 4, 4]])
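The loop-based code described above was not posted, so the following is only a hypothetical reconstruction of it (the function name `subtract_day_start_loop` is made up). It assumes the day is in column 0 and the rows are sorted by day, and it can serve as a baseline for checking faster versions.

```python
import numpy as np

def subtract_day_start_loop(a):
    """Hypothetical baseline: subtract each day's first row (columns 1:) from that day's rows."""
    out = a.copy()
    for day in np.unique(a[:, 0]):
        mask = a[:, 0] == day          # boolean vector selecting this day's rows
        first = a[mask][0, 1:]         # first measurement of the day (rows are sorted)
        out[mask, 1:] -= first         # leave the day column untouched
    return out

before = np.array([[1, 1, 2], [1, 3, 4], [1, 5, 6], [2, 7, 8],
                   [3, 9, 10], [3, 11, 12], [3, 13, 14]])
print(subtract_day_start_loop(before))
```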
P.S. Please help me find a better, more informative title for this post; I'm out of ideas.
numpy.searchsorted is a convenient function for this:
In : before
Out:
array([[ 1, 1, 2],
[ 1, 3, 4],
[ 1, 5, 6],
[ 2, 7, 8],
[ 3, 9, 10],
[ 3, 11, 12],
[ 3, 13, 14]])
In : diff = before[before[:,0].searchsorted(before[:,0])]
In : diff[:,0] = 0
In : before - diff
Out:
array([[1, 0, 0],
[1, 2, 2],
[1, 4, 4],
[2, 0, 0],
[3, 0, 0],
[3, 2, 2],
[3, 4, 4]])
Longer explanation
If you take the first column and search it for its own values, you get the minimum (first) index of each value. This relies on the rows being sorted by day, which searchsorted requires:
In : before
Out:
array([[ 1, 1, 2],
[ 1, 3, 4],
[ 1, 5, 6],
[ 2, 7, 8],
[ 3, 9, 10],
[ 3, 11, 12],
[ 3, 13, 14]])
In : before[:,0].searchsorted(before[:,0])
Out: array([0, 0, 0, 3, 4, 4, 4])
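To see why this gives the first row of each day, here is a small standalone check of `searchsorted` on a sorted day column. With the default `side='left'` it returns the first index at which each value occurs; `side='right'` returns one past the last. The result is only meaningful because the day column is sorted.

```python
import numpy as np

days = np.array([1, 1, 1, 2, 3, 3, 3])  # must be sorted for searchsorted
first = days.searchsorted(days)                    # first row of each day
past_last = days.searchsorted(days, side="right")  # one past the last row of each day
print(first.tolist())      # [0, 0, 0, 3, 4, 4, 4]
print(past_last.tolist())  # [3, 3, 3, 4, 7, 7, 7]
```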
You can then use these indices to build, by indexing, the matrix that you will subtract:
In : diff = before[before[:,0].searchsorted(before[:,0])]
In : diff
Out:
array([[ 1, 1, 2],
[ 1, 1, 2],
[ 1, 1, 2],
[ 2, 7, 8],
[ 3, 9, 10],
[ 3, 9, 10],
[ 3, 9, 10]])
Set the first column to 0 so that the days themselves won't be subtracted:
In : diff[:,0] = 0
In : diff
Out:
array([[ 0, 1, 2],
[ 0, 1, 2],
[ 0, 1, 2],
[ 0, 7, 8],
[ 0, 9, 10],
[ 0, 9, 10],
[ 0, 9, 10]])
Finally, subtract the two matrices to get the desired output:
In : before - diff
Out:
array([[1, 0, 0],
[1, 2, 2],
[1, 4, 4],
[2, 0, 0],
[3, 0, 0],
[3, 2, 2],
[3, 4, 4]])
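The same steps can be wrapped in a small function. This is a sketch (the name `subtract_day_start` is hypothetical); instead of zeroing the day column of `diff`, it subtracts only columns 1 onward, which leaves the day column untouched. It still assumes the rows are sorted by day.

```python
import numpy as np

def subtract_day_start(a):
    """Subtract each day's first row from that day's rows (day in column 0, rows sorted by day)."""
    first_idx = a[:, 0].searchsorted(a[:, 0])  # index of the first row of each row's day
    out = a.copy()
    out[:, 1:] -= a[first_idx, 1:]             # measured columns only
    return out

before = np.array([[1, 1, 2], [1, 3, 4], [1, 5, 6], [2, 7, 8],
                   [3, 9, 10], [3, 11, 12], [3, 13, 14]])
print(subtract_day_start(before))
```

Because `first_idx` is computed in a single vectorized call, there is no Python-level loop over the days.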
Related
There are a number of questions about reshaping matrices with NumPy here on Stack Overflow. I found one that is closely related to what I am trying to achieve, but its answer is not general enough for my application. So here we are.
I have a matrix with millions of rows (shape m x n) that looks like this:
[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5],
[6, 6, 6, 6],
[7, 7, 7, 7],
[...]]
From this I would like to get an array of shape m/2 x 2n, as shown below. To do that, the rows are taken in blocks of consecutive rows (in this example, blocks of 2 rows), and every second block is horizontally stacked onto the untouched block before it. In this example that would mean:
The first two rows stay as they are.
Take rows two and three and horizontally concatenate them to rows zero and one.
Take rows six and seven and horizontally concatenate them to rows four and five. This concatenated block then becomes rows two and three.
...
[[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7],
[...]]
What is the most efficient way (in terms of computation time) to do this with NumPy? Would it make sense to speed up the process with Numba, or is there not much to speed up?
Assuming your array's length is divisible by 4, here is one way you can do it, using numpy.hstack after creating the correct indices for selecting the rows for the "left" and "right" parts of the resulting array:
import numpy as np
# Create the array
N = 1000*4
a = np.hstack([np.arange(0, N)[:, None]]*4) #shape (4000, 4)
a
array([[ 0, 0, 0, 0],
[ 1, 1, 1, 1],
[ 2, 2, 2, 2],
...,
[3997, 3997, 3997, 3997],
[3998, 3998, 3998, 3998],
[3999, 3999, 3999, 3999]])
left_idx = np.array([np.array([0,1]) + 4*i for i in range(N//4)]).reshape(-1)
right_idx = np.array([np.array([2,3]) + 4*i for i in range(N//4)]).reshape(-1)
r = np.hstack([a[left_idx], a[right_idx]]) #shape (2000, 8)
r
array([[ 0, 0, 0, ..., 2, 2, 2],
[ 1, 1, 1, ..., 3, 3, 3],
[ 4, 4, 4, ..., 6, 6, 6],
...,
[3993, 3993, 3993, ..., 3995, 3995, 3995],
[3996, 3996, 3996, ..., 3998, 3998, 3998],
[3997, 3997, 3997, ..., 3999, 3999, 3999]])
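The list comprehensions that build `left_idx` and `right_idx` loop in Python. A sketch of a loop-free alternative, under the same assumption that the number of rows is divisible by 4, reshapes a plain `np.arange` into groups of four and slices it:

```python
import numpy as np

N = 1000 * 4
a = np.hstack([np.arange(N)[:, None]] * 4)  # shape (4000, 4)

groups = np.arange(N).reshape(-1, 4)  # each row holds the indices of one group of 4 rows
left_idx = groups[:, :2].ravel()      # 0, 1, 4, 5, 8, 9, ...
right_idx = groups[:, 2:].ravel()     # 2, 3, 6, 7, 10, 11, ...
r = np.hstack([a[left_idx], a[right_idx]])  # shape (2000, 8)
```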
Here's an application of the swapaxes answer in your link.
In [11]: x=np.array([[0, 0, 0, 0],
...: [1, 1, 1, 1],
...: [2, 2, 2, 2],
...: [3, 3, 3, 3],
...: [4, 4, 4, 4],
...: [5, 5, 5, 5],
...: [6, 6, 6, 6],
...: [7, 7, 7, 7]])
Break the array into 'groups' with a reshape, keeping the number of columns (4) unchanged:
In [17]: x.reshape(2,2,2,4)
Out[17]:
array([[[[0, 0, 0, 0],
[1, 1, 1, 1]],
[[2, 2, 2, 2],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[5, 5, 5, 5]],
[[6, 6, 6, 6],
[7, 7, 7, 7]]]])
Swap the two middle dimensions, regrouping the rows:
In [18]: x.reshape(2,2,2,4).transpose(0,2,1,3)
Out[18]:
array([[[[0, 0, 0, 0],
[2, 2, 2, 2]],
[[1, 1, 1, 1],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[6, 6, 6, 6]],
[[5, 5, 5, 5],
[7, 7, 7, 7]]]])
Then reshape back to the target shape. This final step creates a copy of the data (the previous steps were views):
In [19]: x.reshape(2,2,2,4).transpose(0,2,1,3).reshape(4,8)
Out[19]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7]])
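The three steps can be generalized to blocks of `k` rows, assuming the layout from the question (every second block of `k` rows placed beside the block before it) and a row count divisible by `2*k`. The helper name `stack_block_pairs` is hypothetical; for `k=2` on the 8 x 4 example it performs exactly the `reshape(2,2,2,4).transpose(0,2,1,3).reshape(4,8)` shown above.

```python
import numpy as np

def stack_block_pairs(x, k):
    """Place every second block of k rows to the right of the preceding block."""
    m, n = x.shape
    if m % (2 * k):
        raise ValueError("number of rows must be divisible by 2*k")
    # axes: (pair of blocks, block within pair, row within block, column)
    grouped = x.reshape(m // (2 * k), 2, k, n)
    # swap "block within pair" and "row within block", then join each row pair side by side
    return grouped.transpose(0, 2, 1, 3).reshape(m // 2, 2 * n)

x = np.repeat(np.arange(8)[:, None], 4, axis=1)  # the 8 x 4 example array
print(stack_block_pairs(x, 2))
```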
It's hard to generalize this, since there are different ways of rearranging blocks. For example, my first try produced:
In [16]: x.reshape(4,2,4).transpose(1,0,2).reshape(4,8)
Out[16]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[4, 4, 4, 4, 6, 6, 6, 6],
[1, 1, 1, 1, 3, 3, 3, 3],
[5, 5, 5, 5, 7, 7, 7, 7]])
This is likely a repost, but I'm not sure what wording to use for the title.
I'm trying to subtract the values of one array from the values of another by reshaping them so that the result is a larger array.
xn = np.array([[1,2,3],[4,5,6]])
yn = np.array([[1,2,3,4,5], [6,7,8,9,10]])
xn.shape
Out[42]: (2, 3)
yn.shape
Out[43]: (2, 5)
The functionality I want is:
yn.reshape(2,-1,1) - xn
This raises a ValueError, but the following works fine when I take the first dimension out of the picture:
yn.reshape(2,-1,1)[0] - xn[0]
Out[44]:
array([[ 0, -1, -2],
[ 1, 0, -1],
[ 2, 1, 0],
[ 3, 2, 1],
[ 4, 3, 2]])
This is the first block of the output I would expect, since xn and yn both have a first dimension of 2.
Is there a proper way to do this with the desired broadcasting?
Desired output:
array([[[ 0, -1, -2],
[ 1, 0, -1],
[ 2, 1, 0],
[ 3, 2, 1],
[ 4, 3, 2]],
[[2, 1, 0],
[3, 2, 1],
[4, 3, 2],
[5, 4, 3],
[6, 5, 4]]])
>>> x
array([[1, 2, 3],
[4, 5, 6]])
>>> y
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]])
>>> z = y.reshape(2,-1,1)
Add another axis to x:
>>> z-x[:,None,:]
array([[[ 0, -1, -2],
[ 1, 0, -1],
[ 2, 1, 0],
[ 3, 2, 1],
[ 4, 3, 2]],
[[ 2, 1, 0],
[ 3, 2, 1],
[ 4, 3, 2],
[ 5, 4, 3],
[ 6, 5, 4]]])
>>>
Or just:
>>> y[...,None] - x[:,None,:]
array([[[ 0, -1, -2],
[ 1, 0, -1],
[ 2, 1, 0],
[ 3, 2, 1],
[ 4, 3, 2]],
[[ 2, 1, 0],
[ 3, 2, 1],
[ 4, 3, 2],
[ 5, 4, 3],
[ 6, 5, 4]]])
By the broadcasting rules, two arrays can be broadcast together if, comparing their shapes element-wise starting from the trailing dimensions, each pair of dimensions is equal or one of them is 1. So adding a trailing dimension to xn and then swapping its last two dimensions makes the shapes compatible:
yn.reshape(2, -1, 1) - xn.reshape(2, -1, 1).swapaxes(-1, -2)
array([[[ 0, -1, -2],
[ 1, 0, -1],
[ 2, 1, 0],
[ 3, 2, 1],
[ 4, 3, 2]],
[[ 2, 1, 0],
[ 3, 2, 1],
[ 4, 3, 2],
[ 5, 4, 3],
[ 6, 5, 4]]])
The shape of yn.reshape(2, -1, 1) is (2, 5, 1) and the shape of xn.reshape(2, -1, 1).swapaxes(-1, -2) is (2, 1, 3). These broadcast because, comparing the shapes element-wise from the trailing dimension, each pair of dimensions is either equal or contains a 1; the result has shape (2, 5, 3).
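A quick way to check the rule without building the arrays is `np.broadcast_shapes` (available in NumPy 1.20 and later). The sketch below shows that the shapes from this answer are compatible, that the shapes from the question's first attempt are not, and that the two answers give the same result.

```python
import numpy as np

xn = np.array([[1, 2, 3], [4, 5, 6]])
yn = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(np.broadcast_shapes((2, 5, 1), (2, 1, 3)))  # (2, 5, 3)

try:
    # the question's attempt: (2, 5, 1) against (2, 3); the 5 and the 2 do not match
    np.broadcast_shapes((2, 5, 1), (2, 3))
except ValueError as err:
    print("not broadcastable:", err)

a = yn[..., None] - xn[:, None, :]
b = yn.reshape(2, -1, 1) - xn.reshape(2, -1, 1).swapaxes(-1, -2)
print(np.array_equal(a, b))  # True
```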
I have a NumPy ndarray as follows:
x = np.array([[1, 2, 3], [4, 5, 6],[4, 1, 6],[1, 5, 11],[4,3, 4]], np.int32)
I set each value to zero wherever the row index equals the column index:
rows = x.shape[0]
cols = x.shape[1]
for k in range(0, rows):
    for y in range(0, cols):
        if k == y:
            x[k, y] = 0
Expected output:
array([[ 0, 2, 3],
[ 4, 0, 6],
[ 4, 1, 0],
[ 1, 5, 11],
[ 4, 3, 4]])
Is there a simpler, more Pythonic way to achieve the same result? My actual matrix is very large.
Use np.fill_diagonal:
x = np.array([[1, 2, 3], [4, 5, 6],[4, 1, 6],[1, 5, 11],[4,3, 4]], np.int32)
np.fill_diagonal(x,0)
print(repr(x))
array([[ 0, 2, 3],
[ 4, 0, 6],
[ 4, 1, 0],
[ 1, 5, 11],
[ 4, 3, 4]], dtype=int32)
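Note that `np.fill_diagonal` modifies `x` in place and returns `None`; for a tall array like this one it zeroes only `(0, 0)`, `(1, 1)` and `(2, 2)` (its default is `wrap=False`). A sketch of two equivalent alternatives, one building a new array with a boolean mask from `np.eye` and one using integer-array indexing in place:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [4, 1, 6], [1, 5, 11], [4, 3, 4]], np.int32)

# New array: np.eye(5, 3, dtype=bool) is True exactly on the main diagonal
y = np.where(np.eye(*x.shape, dtype=bool), 0, x)

# In place: indices 0 .. min(rows, cols) - 1 address the diagonal entries
i = np.arange(min(x.shape))
x[i, i] = 0
```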
Let's say I have the following array:
import numpy as np
# dtype=object is required for ragged input like this in NumPy 1.24 and later
matrix = np.array([
[[1, 2, 3, 4], [0, 1], [2, 3, 4, 5]],
[[1, 2, 3], [4], [0, 1], [2, 0], [0, 0]],
[[2, 2], [3, 4, 0], [1, 1, 0, 0], [0]],
[[6, 3, 3, 4, 0], [4, 2, 3, 4, 5]],
[[1, 2, 3, 2], [0, 1, 2], [3, 4, 5]]], dtype=object)
As you can see, it's a jagged (ragged) array. I want to sum the elements so that the output is:
[11, 11, 15, 18, 0, 8, 9, 9, 12, 15]
I want to sum the elements in the "columns" of the matrix, but I don't know how to do it.
As mentioned by juanpa.arrivillaga in the comments, you don't have a multi-dimensional array; you have a 1-D array of lists of lists. You need to flatten the inner lists first:
>>> np.array([[z for y in x for z in y] for x in matrix])
array([[1, 2, 3, 4, 0, 1, 2, 3, 4, 5],
[1, 2, 3, 4, 0, 1, 2, 0, 0, 0],
[2, 2, 3, 4, 0, 1, 1, 0, 0, 0],
[6, 3, 3, 4, 0, 4, 2, 3, 4, 5],
[1, 2, 3, 2, 0, 1, 2, 3, 4, 5]])
It should be much easier to solve your problem now. This matrix has a shape of (5,10), and supports T for transposition and np.sum() for summing rows or columns.
You didn't write any code, so I won't solve the problem completely, but with this matrix, you're one step away from:
array([11, 11, 15, 18, 0, 8, 9, 9, 12, 15])
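For reference, a sketch of the same flattening with `itertools.chain.from_iterable`, starting from plain nested lists so that no object array is needed. It assumes every row flattens to the same length (10 here); the last step sums down the columns.

```python
import numpy as np
from itertools import chain

matrix = [
    [[1, 2, 3, 4], [0, 1], [2, 3, 4, 5]],
    [[1, 2, 3], [4], [0, 1], [2, 0], [0, 0]],
    [[2, 2], [3, 4, 0], [1, 1, 0, 0], [0]],
    [[6, 3, 3, 4, 0], [4, 2, 3, 4, 5]],
    [[1, 2, 3, 2], [0, 1, 2], [3, 4, 5]],
]

# Concatenate the inner lists of each row into one flat row
flat = np.array([list(chain.from_iterable(row)) for row in matrix])  # shape (5, 10)
col_sums = flat.sum(axis=0)
print(col_sums.tolist())  # [11, 11, 15, 18, 0, 8, 9, 9, 12, 15]
```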
If I have a NumPy array like:
x = np.array([[3, 3], [2, 2]])
I want to append the element -1 to the end of each row, like this:
x = np.array([[3, 3, -1], [2, 2, -1]])
Is there a simple way to do that?
A simple way would be with np.insert -
np.insert(x,x.shape[1],-1,axis=1)
We can also use np.column_stack -
np.column_stack((x,[-1]*x.shape[0]))
Sample run -
In [161]: x
Out[161]:
array([[0, 8, 7, 0, 1],
[0, 1, 8, 6, 8],
[3, 4, 7, 0, 2]])
In [162]: np.insert(x,x.shape[1],-1,axis=1)
Out[162]:
array([[ 0, 8, 7, 0, 1, -1],
[ 0, 1, 8, 6, 8, -1],
[ 3, 4, 7, 0, 2, -1]])
In [163]: np.column_stack((x,[-1]*x.shape[0]))
Out[163]:
array([[ 0, 8, 7, 0, 1, -1],
[ 0, 1, 8, 6, 8, -1],
[ 3, 4, 7, 0, 2, -1]])
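Two more options, sketched here as alternatives rather than as the answer's method: `np.pad` with a constant value, and `np.hstack` with a column built by `np.full` (which also keeps the dtype of `x`).

```python
import numpy as np

x = np.array([[3, 3], [2, 2]])

# Pad nothing above/below and one column on the right, filled with -1
padded = np.pad(x, ((0, 0), (0, 1)), constant_values=-1)

# Stack a (rows, 1) column of -1 to the right
stacked = np.hstack([x, np.full((x.shape[0], 1), -1, dtype=x.dtype)])

print(padded.tolist())  # [[3, 3, -1], [2, 2, -1]]
```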