Numpy: Diff on non-adjacent values, in 2D - python

I'd like to take the difference of non-adjacent values within a 2D NumPy array along axis=-1 (per row). The array can have a large number of rows.
Each row is a selection of values along a timeline from 1 to N.
For N=12, the array could look like this 3x12 example:
timeline = np.array([[ 0, 0, 0, 4, 0, 6, 0, 0, 9, 0, 11, 0],
                     [ 1, 0, 3, 4, 0, 0, 0, 0, 9, 0, 0, 12],
                     [ 0, 0, 0, 4, 0, 0, 0, 0, 9, 0, 0, 0]])
The desired result keeps the same shape, and each difference must stay at the position of the value it was computed from:
diff = np.array([[ 0, 0, 0, 4, 0, 2, 0, 0, 3, 0, 2, 0],
                 [ 1, 0, 2, 1, 0, 0, 0, 0, 5, 0, 0, 3],
                 [ 0, 0, 0, 4, 0, 0, 0, 0, 5, 0, 0, 0]])
I am aware of the 1D solution from the question "Diff on non-adjacent values":
imask = np.flatnonzero(timeline)
diff = np.zeros_like(timeline)
diff[imask] = np.diff(timeline[imask], prepend=0)
where the last line can be replaced with
diff[imask[0]] = timeline[imask[0]]
diff[imask[1:]] = timeline[imask[1:]] - timeline[imask[:-1]]
and the first line can be replaced with
imask = np.where(timeline != 0)[0]
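As a quick check that these replacement lines are interchangeable, here is a sketch running both 1D variants on the second row of timeline. One difference, as far as I can tell: the slicing variant indexes imask[0], so it assumes the row has at least one nonzero value, while the prepend=0 variant also handles an all-zero row.

```python
import numpy as np

row = np.array([1, 0, 3, 4, 0, 0, 0, 0, 9, 0, 0, 12])  # second row of timeline

# Variant 1: flatnonzero + np.diff with prepend=0
imask = np.flatnonzero(row)
diff_a = np.zeros_like(row)
diff_a[imask] = np.diff(row[imask], prepend=0)

# Variant 2: np.where for the mask, explicit slicing instead of prepend
imask = np.where(row != 0)[0]
diff_b = np.zeros_like(row)
diff_b[imask[0]] = row[imask[0]]  # first nonzero value keeps its own value
diff_b[imask[1:]] = row[imask[1:]] - row[imask[:-1]]

print(diff_a)
print(diff_b)
```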
When generalising the 1D solution, I can see that imask = np.flatnonzero(timeline) is undesirable, because it makes the rows inter-dependent. So I tried the alternative np.nonzero instead:
imask = np.nonzero(timeline)
diff = np.zeros_like(timeline)
diff[imask] = np.diff(timeline[imask], prepend=0)
However, this still connects the rows: the first nonzero value of each row is diffed against the last nonzero value of the previous row (1 - 11 = -10 and 4 - 12 = -8 below).
array([[ 0, 0, 0, 4, 0, 2, 0, 0, 3, 0, 2, 0],
[-10, 0, 2, 1, 0, 0, 0, 0, 5, 0, 0, 3],
[ 0, 0, 0, -8, 0, 0, 0, 0, 5, 0, 0, 0]])
How can I make the "prepend" restart from zero at the start of each row?

This was an interesting problem for me too.
I wrote a non_adjacent_diff function that handles a single row, then applied it to every row with np.apply_along_axis.
Try this code:
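import numpy as np
timeline = np.array([[ 0, 0, 0, 4, 0, 6, 0, 0, 9, 0, 11, 0],
                     [ 1, 0, 3, 4, 0, 0, 0, 0, 9, 0, 0, 12],
                     [ 0, 0, 0, 4, 0, 0, 0, 0, 9, 0, 0, 0]])
def non_adjacent_diff(row):
    # The row passed in can be a view of timeline, so work on a copy
    # to avoid overwriting the input array.
    row = row.copy()
    not_zero_index = np.where(row != 0)
    # Difference between each nonzero value and the previous nonzero value
    diff = row[not_zero_index][1:] - row[not_zero_index][:-1]
    # The first nonzero value keeps its own value (an implicit prepend of 0)
    np.put(row, not_zero_index[0][1:], diff)
    return row
np.apply_along_axis(non_adjacent_diff, 1, timeline)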
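A loop-free alternative, sketched here as my own suggestion rather than taken from the answer above: take the flattened diff over all nonzero entries exactly as in the question's np.nonzero attempt, then reset the first nonzero entry of each row to its own value so that no row is diffed against the previous one. It relies on np.nonzero returning indices in row-major order; the function name non_adjacent_diff_2d is hypothetical.

```python
import numpy as np

def non_adjacent_diff_2d(timeline):
    """Diff each row's nonzero values against the previous nonzero value in the same row."""
    # np.nonzero returns indices in row-major order: sorted by row, then by column.
    rows, cols = np.nonzero(timeline)
    vals = timeline[rows, cols]
    # Flattened diff over all nonzero values, as in the question's np.nonzero attempt.
    d = np.diff(vals, prepend=0)
    # Mark the first nonzero entry of every row (where the row index changes).
    first_in_row = np.ones(len(rows), dtype=bool)
    first_in_row[1:] = rows[1:] != rows[:-1]
    # Those entries must not be diffed against the previous row: keep their own value.
    d[first_in_row] = vals[first_in_row]
    out = np.zeros_like(timeline)
    out[rows, cols] = d
    return out

timeline = np.array([[0, 0, 0, 4, 0, 6, 0, 0, 9, 0, 11, 0],
                     [1, 0, 3, 4, 0, 0, 0, 0, 9, 0, 0, 12],
                     [0, 0, 0, 4, 0, 0, 0, 0, 9, 0, 0, 0]])
print(non_adjacent_diff_2d(timeline))
```

Unlike np.apply_along_axis, which calls a Python function once per row, this does a fixed number of whole-array operations, so it should scale better to many rows, although I have not benchmarked it.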

Related

Fastest way to expand the values of a numpy matrix in diagonal blocks

I'm looking for a fast way to resize a matrix in a special way, without using for loops:
I have a square matrix:
matrix = [[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9,10],
[11,12,13,14,15],
[16,17,18,19,20],
[21,22,23,24,25]]
and I want to expand it by a factor of 3 (or n), so that each value becomes a diagonal block (the value times an n x n identity) and all other entries are zero:
goal_matrix = [[ 1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5, 0, 0],
[ 0, 1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5, 0],
[ 0, 0, 1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5],
[ 6, 0, 0, 7, 0, 0, 8, 0, 0, 9, 0, 0,10, 0, 0],
[ 0, 6, 0, 0, 7, 0, 0, 8, 0, 0, 9, 0, 0,10, 0],
[ 0, 0, 6, 0, 0, 7, 0, 0, 8, 0, 0, 9, 0, 0,10],
[11, 0, 0,12, 0, 0,13, 0, 0,14, 0, 0,15, 0, 0],
[ 0,11, 0, 0,12, 0, 0,13, 0, 0,14, 0, 0,15, 0],
[ 0, 0,11, 0, 0,12, 0, 0,13, 0, 0,14, 0, 0,15],
[16, 0, 0,17, 0, 0,18, 0, 0,19, 0, 0,20, 0, 0],
[ 0,16, 0, 0,17, 0, 0,18, 0, 0,19, 0, 0,20, 0],
[ 0, 0,16, 0, 0,17, 0, 0,18, 0, 0,19, 0, 0,20],
[21, 0, 0,22, 0, 0,23, 0, 0,24, 0, 0,25, 0, 0],
[ 0,21, 0, 0,22, 0, 0,23, 0, 0,24, 0, 0,25, 0],
[ 0, 0,21, 0, 0,22, 0, 0,23, 0, 0,24, 0, 0,25]]
It should do something like this question, but without unnecessary zero padding.
Is there any mapping, padding or resizing function for doing this in a fast way?
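For reference, the target layout can be written index by index (0-based indices, an m x m input M, block size n): block (i, j) of the goal matrix G is M_ij times the n x n identity, which is the Kronecker product of M with I_n.

```latex
G_{in+k,\; jn+l} = M_{ij}\,\delta_{kl},
\qquad 0 \le i, j < m, \quad 0 \le k, l < n,
\qquad \text{i.e.} \quad G = M \otimes I_n
```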
IMO, it is a mistake to reject for loops blindly. Here is a solution without a for loop; when n is small, it performs better than the solutions by @MichaelSzczesny and @SalvatoreDanieleBianco:
def mechanic(mat, n):
    ar = np.zeros((*mat.shape, n * n), mat.dtype)
    ar[..., ::n + 1] = mat[..., None]
    return ar.reshape(
        *mat.shape,
        n,
        n
    ).transpose(0, 3, 1, 2).reshape([s * n for s in mat.shape])
This solution builds the expected output with a strided slice assignment followed by a transpose and a reshape. Because the transposed array is not contiguous, the final reshape has to make a copy, which makes it inefficient when n is large.
After a simple test, I found that the solution that just uses a for loop performs best:
def mechanic_for_loop(mat, n):
    ar = np.zeros([s * n for s in mat.shape], mat.dtype)
    for i in range(n):
        ar[i::n, i::n] = mat
    return ar
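A quick sanity check of mechanic_for_loop (copied from above) on the 5x5 matrix from the question with n = 3, assuming the goal is that block (i, j) equals matrix[i, j] times the 3x3 identity. Viewing the 15x15 result as a 5x5 grid of 3x3 blocks with reshape and transpose makes that property easy to compare.

```python
import numpy as np

def mechanic_for_loop(mat, n):
    # Write mat onto each of the n strided "diagonals":
    # rows i, i+n, i+2n, ... paired with columns i, i+n, i+2n, ...
    ar = np.zeros([s * n for s in mat.shape], mat.dtype)
    for i in range(n):
        ar[i::n, i::n] = mat
    return ar

matrix = np.arange(1, 26).reshape(5, 5)
n = 3
out = mechanic_for_loop(matrix, n)

# View the (15, 15) result as a 5x5 grid of 3x3 blocks: blocks[i, j] is block (i, j).
blocks = out.reshape(5, n, 5, n).transpose(0, 2, 1, 3)
# Each block should be matrix[i, j] times the 3x3 identity.
expected_blocks = matrix[:, :, None, None] * np.eye(n, dtype=matrix.dtype)
print(np.array_equal(blocks, expected_blocks))
```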
Next is a benchmark test using perfplot. The test functions are as follows:
import numpy as np
def mechanic(mat, n):
    ar = np.zeros((*mat.shape, n * n), mat.dtype)
    ar[..., ::n + 1] = mat[..., None]
    return ar.reshape(
        *mat.shape,
        n,
        n
    ).transpose(0, 3, 1, 2).reshape([s * n for s in mat.shape])
def mechanic_for_loop(mat, n):
    ar = np.zeros([s * n for s in mat.shape], mat.dtype)
    for i in range(n):
        ar[i::n, i::n] = mat
    return ar
def michael_szczesny(mat, n):
    return np.einsum(
        'ij,kl->ikjl',
        mat,
        np.eye(n, dtype=mat.dtype)
    ).reshape([s * n for s in mat.shape])
def salvatore_daniele_bianco(mat, n):
    repeated_matrix = mat.repeat(n, axis=0).repeat(n, axis=1)
    col_ids, row_ids = np.meshgrid(
        np.arange(repeated_matrix.shape[0]),
        np.arange(repeated_matrix.shape[1])
    )
    repeated_matrix[(col_ids % n) - (row_ids % n) != 0] = 0
    return repeated_matrix
functions = [
    mechanic,
    mechanic_for_loop,
    michael_szczesny,
    salvatore_daniele_bianco
]
Resize factor unchanged (n = 5), array size varies:
if __name__ == '__main__':
    from itertools import accumulate, repeat
    from operator import mul
    from perfplot import bench
    bench(
        functions,
        list(accumulate(repeat(2, 11), mul)),
        lambda n: (np.arange(n * n).reshape(n, n), 5),
        xlabel='ar.shape[0]'
    ).show()
Output: [benchmark plot; x-axis: ar.shape[0]]
Resize factor varies, array size unchanged (5x5):
if __name__ == '__main__':
    from itertools import accumulate, repeat
    from operator import mul
    from perfplot import bench
    ar = np.arange(25).reshape(5, 5)
    bench(
        functions,
        list(accumulate(repeat(2, 11), mul)),
        lambda n: (ar, n),
        xlabel='resize times'
    ).show()
Output: [benchmark plot; x-axis: resize times]
Input:
matrix = np.array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9,10],
[11,12,13,14,15],
[16,17,18,19,20],
[21,22,23,24,25]])
Solution:
repeated_matrix = matrix.repeat(3, axis=0).repeat(3, axis=1)
col_ids, row_ids = np.meshgrid(np.arange(repeated_matrix.shape[0]), np.arange(repeated_matrix.shape[1]))
repeated_matrix[(col_ids%3)-(row_ids%3)!=0]=0
Output (repeated_matrix):
array([[ 1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5, 0, 0],
[ 0, 1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5, 0],
[ 0, 0, 1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5],
[ 6, 0, 0, 7, 0, 0, 8, 0, 0, 9, 0, 0, 10, 0, 0],
[ 0, 6, 0, 0, 7, 0, 0, 8, 0, 0, 9, 0, 0, 10, 0],
[ 0, 0, 6, 0, 0, 7, 0, 0, 8, 0, 0, 9, 0, 0, 10],
[11, 0, 0, 12, 0, 0, 13, 0, 0, 14, 0, 0, 15, 0, 0],
[ 0, 11, 0, 0, 12, 0, 0, 13, 0, 0, 14, 0, 0, 15, 0],
[ 0, 0, 11, 0, 0, 12, 0, 0, 13, 0, 0, 14, 0, 0, 15],
[16, 0, 0, 17, 0, 0, 18, 0, 0, 19, 0, 0, 20, 0, 0],
[ 0, 16, 0, 0, 17, 0, 0, 18, 0, 0, 19, 0, 0, 20, 0],
[ 0, 0, 16, 0, 0, 17, 0, 0, 18, 0, 0, 19, 0, 0, 20],
[21, 0, 0, 22, 0, 0, 23, 0, 0, 24, 0, 0, 25, 0, 0],
[ 0, 21, 0, 0, 22, 0, 0, 23, 0, 0, 24, 0, 0, 25, 0],
[ 0, 0, 21, 0, 0, 22, 0, 0, 23, 0, 0, 24, 0, 0, 25]])
Basically, you can wrap this in a custom function that works on any matrix:
def what_you_whant(your_matrix, n_repeats):
    repeated_matrix = your_matrix.repeat(n_repeats, axis=0).repeat(n_repeats, axis=1)
    col_ids, row_ids = np.meshgrid(np.arange(repeated_matrix.shape[1]), np.arange(repeated_matrix.shape[0]))
    repeated_matrix[(col_ids%n_repeats)-(row_ids%n_repeats)!=0]=0
    return repeated_matrix
As Michael Szczesny suggested in his comment:
The fastest way is to use einsum: take the outer product of the matrix with an identity matrix of the block size, then reshape it to the expanded size:
np.einsum('ij,kl->ikjl', matrix, np.eye(3)).reshape(len(matrix) * 3, -1)
Another, more straightforward answer (but ~4x slower) is the Kronecker product, again of the matrix with the identity matrix:
np.kron(matrix, np.eye(3))
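One detail worth checking: np.eye(3) is float64 by default, so both one-liners above return a float array even for an integer matrix. A small sketch, assuming an integer result is wanted, that passes dtype=matrix.dtype to np.eye and confirms that the einsum and kron versions agree:

```python
import numpy as np

matrix = np.arange(1, 26).reshape(5, 5)  # same values as the question's input
n = 3
eye = np.eye(n, dtype=matrix.dtype)  # integer identity keeps the result integer

# einsum 'ij,kl->ikjl' builds a 4-D array whose [i, k, j, l] entry is
# matrix[i, j] * eye[k, l]; merging (i, k) and (j, l) gives the block layout.
via_einsum = np.einsum('ij,kl->ikjl', matrix, eye).reshape(len(matrix) * n, -1)
via_kron = np.kron(matrix, eye)
print(np.array_equal(via_einsum, via_kron), via_einsum.dtype)
```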

reshape tensor in a determined way

I have some tensor x3. I got it in the following way:
x = torch.tensor([0, 0, 0, 0, 1, 0, 0, 0, 0])
x2 = torch.stack(5 * [x], 0)
x2 = x2.reshape(-1)
x3 = torch.stack(4 * [x2], 0)
x3 = torch.stack(6 * [x3], -1)
x3 = torch.stack(7 * [x3], -1)
In short, this means that
x3[0, :9, 0, 0] = [0, 0, 0, 0, 1, 0, 0, 0, 0]
x3[0, 9:18, 0, 0] = [0, 0, 0, 0, 1, 0, 0, 0, 0]
and so on.
Then I want to reshape it so that every nine consecutive values along dimension 1 go into a new dimension. In other words, after the reshape I want x3[0, 0, 0, 0, :] to give me tensor([0, 0, 0, 0, 1, 0, 0, 0, 0]).
I tried to do:
x3.reshape(4, 5, 6, 7, 9)[0, 0, 0, 0, :]
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0])
x3.reshape(4, 9, 6, 7, 5).transpose(1, -1)[0, 0, 0, 0, :]
tensor([0, 0, 0, 0, 0, 0, 0, 0, 1])
As you can see, neither gives the right answer.
UPD: added x3 = torch.stack(7 * [x3], -1)
If you want to split dimension 1 and create a new dimension at the end, you first need to move that dimension to the end using permute. Something like this should do the trick:
xpermuted = x3.permute(0, 2, 3, 1)
xreshaped = xpermuted.reshape(xpermuted.shape[0], xpermuted.shape[1], xpermuted.shape[2], xpermuted.shape[3] // 9, 9)
print(xreshaped[0, 0, 0, 0, :]) # tensor([0, 0, 0, 0, 1, 0, 0, 0, 0])
print(xreshaped[0, 0, 0, 1, :]) # tensor([0, 0, 0, 0, 1, 0, 0, 0, 0])
print(xreshaped[0, 0, 0, 2, :]) # tensor([0, 0, 0, 0, 1, 0, 0, 0, 0])
After that, you can restore the initial dimension order by using permute again if you need the original order of dimensions:
xrestored = xreshaped.permute(0, 3, 1, 2, 4)
print(xrestored.shape) # torch.Size([4, 5, 6, 7, 9])
Technically, you don't have to move dimension 1 to the end first; you can also reshape first and then permute. Now that I think about it, this is better, since it needs one fewer permute:
xreshaped = x3.reshape(x3.shape[0], x3.shape[1] // 9, 9, x3.shape[2], x3.shape[3])
xrestored = xreshaped.permute(0, 1, 3, 4, 2)
print(xrestored.shape) # torch.Size([4, 5, 6, 7, 9])
print(xrestored[0, 0, 0, 0, :]) # tensor([0, 0, 0, 0, 1, 0, 0, 0, 0])
print(xrestored[0, 1, 0, 0, :]) # tensor([0, 0, 0, 0, 1, 0, 0, 0, 0])
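To see why splitting dimension 1 in place and then moving the new size-9 axis to the end works, while the question's plain reshape does not, here is the same idea as a NumPy analogue (np.broadcast_to and np.moveaxis stand in for torch.stack and permute; this is an illustration, not the answer's code). In PyTorch, x3.reshape(4, 5, 9, 6, 7).movedim(2, -1) should be the equivalent one-liner, assuming a PyTorch version that has Tensor.movedim.

```python
import numpy as np

# NumPy stand-in for the question's x3 (shape (4, 45, 6, 7)): along dimension 1,
# the 9-element pattern x is repeated 5 times back to back.
x = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0])
x3 = np.broadcast_to(np.tile(x, 5)[None, :, None, None], (4, 45, 6, 7)).copy()

# Naive reshape: it only reinterprets the row-major element order, so the last
# axis of size 9 is filled from x3's trailing dimensions (6 and 7), not from dimension 1.
naive = x3.reshape(4, 5, 6, 7, 9)

# Correct: split dimension 1 (45) into (5, 9) where it is, then move the size-9 axis last.
split = x3.reshape(4, 5, 9, 6, 7)
result = np.moveaxis(split, 2, -1)  # shape (4, 5, 6, 7, 9)

print(naive[0, 0, 0, 0])   # all zeros, as in the question
print(result[0, 0, 0, 0])  # the pattern x
```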

how to do padding a nested list

I have a nested list that contains 1002 time steps, and in each time step I have observations of 11 features. I have read the docs on padding, but I could not figure out how to add zeros at the end of each list. I found that the longest list is, for example, the 24th item in my main list, and now I want to pad all the other elements to that length (the 24th element already has the right shape). As an example:
a = [[1,2,3,4,5,6,76,7],[2,2,3,4,2,5,5,5,7,8,9,33,677,8,8,9,9],[2,3,46,7,8,9],[3,3,3,5],[2,2],[1,1],[2,2]]
a[1] = padding(a[1],len(a[2]) with zeros at the end of the list)
Here is what I tried:
import numpy as np
def pad_or_truncate(some_list, target_len):
    return some_list[:target_len] + [0]*(target_len - len(some_list))
for i in range(len(Length)):
    pad_or_truncate(Length[i],len(Length[24]))
    print(len(Length[i]))
or
for i in range(len(Length)):
    df_train_array = np.pad(Length[i],len(Length[24]),mode='constant')
and I got this error: Unable to coerce to Series, length must be 11: given 375
Solution 1
# length of the longest list
max_len = max([len(x) for x in a])
# append max_len zeros to every list (always enough)
temp = [x + [0]*max_len for x in a]
# truncate each list to max_len
[x[0:max_len] for x in temp]
Solution 2 using pandas
import pandas as pd
df = pd.DataFrame(a)
df.fillna(0).astype(int).values.tolist()
Output
[[1, 2, 3, 4, 5, 6, 76, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[2, 2, 3, 4, 2, 5, 5, 5, 7, 8, 9, 33, 677, 8, 8, 9, 9],
[2, 3, 46, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[3, 3, 3, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
...]
The following code snippet pads each list with the appropriate number of 0s (driven by the length of the longest list):
def main():
    data = [
        [1,2,3,4,5,6,76,7],
        [2,2,3,4,2,5,5,5,7,8,9,33,677,8,8,9,9],
        [2,3,46,7,8,9,],
        [3,3,3,5],
        [2,2],
        [1,1],
        [2,2]
    ]
    # find the length of the longest list
    max_length = max(map(len, data))
    # append zeros in place until every list has max_length elements
    for element in data:
        for _ in range(len(element), max_length):
            element.append(0)
    print(data)
if __name__ == '__main__':
    main()
You can use this simple line, which uses np.pad
list(map(lambda x: np.pad(x, (max(map(len, a)) - len(x), 0)).tolist(), a))
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 76, 7],
[2, 2, 3, 4, 2, 5, 5, 5, 7, 8, 9, 33, 677, 8, 8, 9, 9],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 46, 7, 8, 9],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 5],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2]]
Use this if you want to pad at the end instead:
list(map(lambda x: np.pad(x, (0, max(map(len, a)) - len(x))).tolist(), a))
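The asker's own pad_or_truncate helper also works; the loop in the question just discards its return value, so Length never changes. A sketch that keeps the result (Length stands for the question's nested list; the sample list a is used here instead):

```python
def pad_or_truncate(some_list, target_len):
    # Keep at most target_len items, then right-pad with zeros up to target_len.
    return some_list[:target_len] + [0] * (target_len - len(some_list))

a = [[1, 2, 3, 4, 5, 6, 76, 7],
     [2, 2, 3, 4, 2, 5, 5, 5, 7, 8, 9, 33, 677, 8, 8, 9, 9],
     [2, 3, 46, 7, 8, 9],
     [3, 3, 3, 5],
     [2, 2],
     [1, 1],
     [2, 2]]

target_len = max(len(row) for row in a)
# Build a new list from the return values instead of calling the helper and ignoring them.
padded = [pad_or_truncate(row, target_len) for row in a]
print(padded)
```

Since every inner list now has the same length, np.array(padded) gives a regular (7, 17) integer array.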

numpy: check for 1 every 6 element every row

I need to have something like this:
arr = array([[1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]])
Each row contains 36 elements, and every 6 consecutive elements in a row form a "hidden row". Each hidden row needs exactly one 1 and 0 everywhere else; in other words, every block of 6 entries (indices 0-5, 6-11, ...) needs exactly one 1. This is my requirement for arr.
I have a table that's going to be used to compute a "fitness" value for each row:
table = np.array([10, 5, 4, 6, 5, 1, 6, 4, 9, 7, 3, 2, 1, 8, 3,
6, 4, 6, 5, 3, 7, 2, 1, 4, 3, 2, 5, 6, 8, 7, 7, 6, 4, 1, 3, 2])
table = table.T
I'm going to multiply each row of arr with table. The result of that multiplication, a single number, will be stored as the "fitness" value of the corresponding row, UNLESS the row does not fit the requirement described above, in which case the fitness should be 0.
An example of what should be returned is
result = array([5,12,13,14,20,34])
I need a way to do this but I'm too new to numpy to know how to.
(I'm assuming you want what you've asked for in the first half.)
I believe better or more elegant solutions exist, but this is what I think can do the job:
np.all(arr[:, 6] == 1) and np.all(arr[:, :6] == 0) and np.all(arr[:, 7:] == 0)
Alternatively, you can construct the array (with 0's and 1's) and then just compare with it, say using not_equal.
I'm also not 100% sure about your question, but I'll try to answer to the best of my knowledge.
Since you're saying your matrix has "hidden rows", to check whether it is well formed, the easiest way seems to be to just reshape it:
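# First check, returns True if all elements are either 0 or 1
# (np.isin is the recommended replacement for np.in1d, deprecated in NumPy 2.0)
np.isin(arr, [0, 1]).all()
# Second check, provided the above was True, returns True if
# each "hidden row" has exactly one 1 and other 0.
# Shape after reshape: (rows, hidden rows per row, 6)
(arr.reshape(len(arr), -1, 6).sum(axis=2) == 1).all()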
Both checks return "True" for your arr.
Now, my understanding is that for each "large" row of 36 elements, you want a scalar product with your "table" vector, unless that "large" row has an ill-formed "hidden small" row. In this case, I'd do something like:
# The following computes the result, not checking for integrity
results = arr.dot(table)
# Now remove the results that are not well formed.
# First, compute "large" rows where at least one "small" subrow
# fails the condition.
mask = (arr.reshape(len(arr), -1, 6).sum(axis=2) != 1).any(axis=1)
# And set the corresponding answer to 0
results[mask] = 0
However, running this code against your data returns as answer
array([38, 31, 24, 24, 32, 20])
which is not what you mention; did I misunderstand your requirement, or was the example based on different data?
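Putting both checks and the dot product together, here is a sketch of a reusable helper. The name fitness and the block parameter are my own (hypothetical) choices; it returns arr @ table for well-formed rows and 0 elsewhere, using np.where instead of a masked assignment.

```python
import numpy as np

def fitness(arr, table, block=6):
    """Dot product of each row with table, or 0 if the row is ill-formed.

    A row is well formed when all its entries are 0 or 1 and every
    consecutive block of `block` entries contains exactly one 1.
    """
    arr = np.asarray(arr)
    n_rows, n_cols = arr.shape
    if n_cols % block:
        raise ValueError("row length must be a multiple of block")
    blocks = arr.reshape(n_rows, n_cols // block, block)
    only_binary = np.isin(arr, (0, 1)).all(axis=1)
    one_per_block = (blocks.sum(axis=2) == 1).all(axis=1)
    valid = only_binary & one_per_block
    return np.where(valid, arr @ table, 0)

# Small made-up example with 12-element rows (two hidden rows each).
t = np.arange(12)
a = np.array([[1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],    # valid: 0 + 7
              [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],    # two 1s in the first block
              [1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 1]])  # sums to 1 but not binary
print(fitness(a, t))
```

With the question's arr and table, this should reproduce the array([38, 31, 24, 24, 32, 20]) shown above, since every row passes both checks.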

how to move all non-zero elements in a python list or numpy array to one side?

I want to do the following operation on a list or NumPy array:
[0, 0, 0, 1, 0, 0, 4, 2, 0, 7, 0, 0, 0]
move all non-zero elements to the right side:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 2, 7]
How can I do this efficiently?
Thanks
Edit: sorry I didn't make it clear, I need the order of the non-zero elements to be preserved.
You could sort the list by the boolean value of its elements. All falsy values (for numbers, just zero) get pushed to the front of the list. Python's built-in sort is guaranteed to be stable, so the other values keep their relative order.
Example:
>>> a = [0, 0, 0, 1, 0, 0, 5, 2, 0, 7, 0, 0, 0]
>>> sorted(a, key=bool)
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 5, 2, 7]
Using NumPy:
>>> a = np.array([0, 0, 0, 1, 0, 0, 4, 2, 0, 7, 0, 0, 0])
>>> np.concatenate((a[a==0], a[a!=0]))
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 2, 7])
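The sorted(key=bool) trick also has a direct NumPy counterpart, sketched here as an alternative to the concatenate one-liner: a stable argsort on the boolean mask a != 0 keeps zeros and non-zeros in their original relative order.

```python
import numpy as np

a = np.array([0, 0, 0, 1, 0, 0, 4, 2, 0, 7, 0, 0, 0])
# Sort by the boolean key (False for zeros, True for non-zeros). kind="stable"
# keeps equal keys in their original order, like Python's sorted(key=bool).
order = np.argsort(a != 0, kind="stable")
result = a[order]
print(result)
```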
You can also do this in O(N) time in pure Python with a simple for loop (sorting is O(N log N)), but it takes some extra memory, which you can avoid in @grc's solution by sorting in place with a.sort(key=bool):
from collections import deque
# Using a deque
def solve_deque(lst):
    d = deque()
    append_l = d.appendleft
    append_r = d.append
    for x in lst:
        if x:
            append_r(x)
        else:
            append_l(x)
    return list(d)  # Convert to list if you want O(1) indexing.
# Using simple lists
def solve_list(lst):
    left = []
    right = []
    left_a = left.append
    right_a = right.append
    for x in lst:
        if x:
            right_a(x)
        else:
            left_a(x)
    left.extend(right)
    return left
>>> solve_list([0, 0, 0, 1, 0, 0, 4, 2, 0, 7, 0, 0, 0])
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 2, 7]
>>> solve_deque([0, 0, 0, 1, 0, 0, 4, 2, 0, 7, 0, 0, 0])
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 2, 7]
