Given the following array, I want to replace each zero with the previous value in its column, as long as the zero is surrounded (above and below) by two values greater than zero.
I am aware of np.where but it would consider the whole array instead of its columns.
I am not sure how to do it and help would be appreciated.
This is the array:
a=np.array([[4, 3, 3, 2],
[0, 0, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
and since the only zero that meets this condition is the second row/second column one,
the new array should be the following
new_a=np.array([[4, 3, 3, 2],
[0, 3, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
How do I accomplish this?
And what if I would like to extend this to longer gaps surrounded by nonzero values? For instance, the first column contains two consecutive zeros and the second column contains one zero, so the new array would be
new_a=np.array([[4, 3, 3, 2],
[4, 3, 1, 2],
[4, 4, 2, 4],
[2, 4, 3, 0]])
In short, how do I solve this when the column-wise condition is a run of N or fewer consecutive zeros?
As a generic method, I would approach this using a convolution:
import numpy as np
from scipy.signal import convolve2d
# kernel for top/down neighbors
kernel = np.array([[1],
[0],
[1]])
# is the value a zero?
m1 = a==0
# count non-zero neighbors (above and below); both must be non-zero
m2 = convolve2d(~m1, kernel, mode='same') > 1
mask = m1&m2
# replace matching values with previous row value
a[mask] = np.roll(a, 1, axis=0)[mask]
output:
array([[4, 3, 3, 2],
[0, 3, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
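If SciPy is not available, the same single-zero case can probably be handled with plain slicing. This is only a sketch: it assumes the array from the question, uses the question's "greater than zero" test for the neighbors, and writes into a copy named `out` (a name chosen here for illustration) so the input stays untouched.

```python
import numpy as np

a = np.array([[4, 3, 3, 2],
              [0, 0, 1, 2],
              [0, 4, 2, 4],
              [2, 4, 3, 0]])

out = a.copy()
above = a[:-2]    # row directly above each inner row
below = a[2:]     # row directly below each inner row
inner = out[1:-1] # view on the rows that have both neighbors

# a zero qualifies only if both vertical neighbors are > 0
mask = (a[1:-1] == 0) & (above > 0) & (below > 0)
inner[mask] = above[mask]  # writes through the view into `out`

print(out)
```

The first and last rows are never modified, because they lack one of the two neighbors.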
filling from surrounding values
Using pandas to benefit from ffill/bfill (you can forward-fill in pure numpy, but it's more complex):
import pandas as pd
df = pd.DataFrame(a)
# limit for neighbors
N = 2
# identify non-zeros
m = df.ne(0)
# mask zeros
m2 = m.where(m)
# mask for values with 2 neighbors within limits
mask = m2.ffill(limit=N) & m2.bfill(limit=N)
df.mask(mask&~m).ffill()
array([[4, 3, 3, 2],
[4, 3, 1, 2],
[4, 4, 2, 4],
[2, 4, 3, 0]])
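For reference, here is one way the gap filling might look in pure NumPy. It is a sketch, not a drop-in replacement for the pandas code: the function name `fill_short_gaps` and the parameter `max_gap` are made up for this example, "surrounded" is taken to mean a non-zero value both above and below the run of zeros, and a run is filled only if it contains at most `max_gap` zeros.

```python
import numpy as np

def fill_short_gaps(a, max_gap):
    """Column-wise: fill runs of at most `max_gap` zeros that have a
    non-zero value above and below, using the value above the run."""
    a = np.asarray(a)
    n_rows = a.shape[0]
    nonzero = a != 0
    rows = np.arange(n_rows)[:, None]

    # row index of the closest non-zero at or above each cell (-1 if none)
    prev_idx = np.maximum.accumulate(np.where(nonzero, rows, -1), axis=0)
    # row index of the closest non-zero at or below each cell (n_rows if none)
    next_idx = np.minimum.accumulate(
        np.where(nonzero, rows, n_rows)[::-1], axis=0)[::-1]

    gap_len = next_idx - prev_idx - 1
    fill = (~nonzero & (prev_idx >= 0) & (next_idx < n_rows)
            & (gap_len <= max_gap))

    out = a.copy()
    cols = np.broadcast_to(np.arange(a.shape[1]), a.shape)
    out[fill] = a[prev_idx[fill], cols[fill]]
    return out

a = np.array([[4, 3, 3, 2],
              [0, 0, 1, 2],
              [0, 4, 2, 4],
              [2, 4, 3, 0]])
print(fill_short_gaps(a, max_gap=2))
```

With `max_gap=1` only the single zero in the second column is filled; the trailing zero in the last column is never filled because nothing non-zero follows it.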
That's one solution I found. I know it's basic but I think it works.
a=np.array([[4, 3, 3, 2],
[0, 0, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
a_t = a.T
for i in range(len(a_t)):
    ar = a_t[i]
    for j in range(len(ar)-1):
        if (j>0) and (ar[j] == 0) and (ar[j+1] > 0):
            a_t[i][j] = a_t[i][j-1]
a = a_t.T
I have a two dimensional numpy array x:
import numpy as np
x = np.array([
[1, 2, 8, 4, 5, 5, 5, 3],
[0, 2, 2, 2, 2, 1, 1, 4]
])
My goal is to replace all consecutive duplicate numbers with a specific value (let's take -1), while leaving one occurrence unchanged.
I could do this as follows:
def replace_consecutive_duplicates(x):
    consec_dup = np.zeros(x.shape, dtype=bool)
    consec_dup[:, 1:] = np.diff(x, axis=1) == 0
    x[consec_dup] = -1
    return x
# current output
replace_consecutive_duplicates(x)
# array([[ 1, 2, 8, 4, 5, -1, -1, 3],
# [ 0, 2, -1, -1, -1, 1, -1, 4]])
However, in this case the one occurrence left unchanged is always the first.
My goal is to leave the middle occurrence unchanged.
So given the same x as input, the desired output of function replace_consecutive_duplicates is:
# desired output
replace_consecutive_duplicates(x)
# array([[ 1, 2, 8, 4, -1, 5, -1, 3],
# [ 0, -1, 2, -1, -1, 1, -1, 4]])
Note that for consecutive duplicate sequences with an even number of occurrences, the middle-left value should remain unchanged. So the consecutive duplicate sequence [2, 2, 2, 2] in x[1] becomes [-1, 2, -1, -1].
Also note that I'm looking for a vectorized solution for 2D numpy arrays since performance is of absolute importance in my particular use case.
I've already tried looking at things like run length encoding and using np.diff(), but I didn't manage to solve this. Hope you guys can help!
The main problem is that you need the length of each run of consecutive values. This is not easy to get with numpy, but using itertools.groupby we can solve it with the following code.
import itertools

import numpy as np
x = np.array([
[1, 2, 8, 4, 5, 5, 5, 3],
[0, 2, 2, 2, 2, 1, 1, 4]
])
def replace_row(arr: np.ndarray, new_val=-1):
    results = []
    for val, count in itertools.groupby(arr):
        k = len(list(count))
        # fillers before the kept value (middle-left for even run lengths)
        results.extend([new_val] * ((k - 1) // 2))
        results.append(val)
        # fillers after the kept value
        results.extend([new_val] * (k // 2))
    return np.fromiter(results, arr.dtype)

if __name__ == '__main__':
    for idx, row in enumerate(x):
        x[idx, :] = replace_row(row)
    print(x)
Output:
[[ 1 2 8 4 -1 5 -1 3]
[ 0 -1 2 -1 -1 1 -1 4]]
This isn't vectorized, but since every row is handled independently, it can be combined with multithreading.
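Since the question asks for a vectorized version, here is a sketch of how the run lengths could be obtained without a Python loop. It is not part of the groupby answer above; the function name `keep_middle_of_runs` is invented for this example, and it assumes a 2D integer array whose runs are taken row by row.

```python
import numpy as np

def keep_middle_of_runs(x, fill_value=-1):
    """Keep the middle (middle-left for even lengths) element of each
    run of equal values along axis 1; replace the rest by fill_value."""
    x = np.asarray(x)
    # True where a new run starts; column 0 always starts a run,
    # so runs never cross a row boundary in the flattened array
    starts = np.ones(x.shape, dtype=bool)
    starts[:, 1:] = x[:, 1:] != x[:, :-1]

    start_idx = np.flatnonzero(starts)               # flat (C-order) indices
    run_len = np.diff(np.append(start_idx, x.size))  # length of each run
    middle = start_idx + (run_len - 1) // 2          # flat index to keep

    out = np.full(x.shape, fill_value, dtype=x.dtype)
    out.reshape(-1)[middle] = x.reshape(-1)[middle]
    return out

x = np.array([
    [1, 2, 8, 4, 5, 5, 5, 3],
    [0, 2, 2, 2, 2, 1, 1, 4]
])
print(keep_middle_of_runs(x))
```

Unlike the original `replace_consecutive_duplicates`, this returns a new array and leaves `x` unchanged.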
I have two numpy arrays, and wherever an element of B is 1, I want the corresponding element of A to be set to 0. Both arrays always have the same shape:
A = [1, 2, 3, 4, 5]
B = [0, 0, 0, 1, 0]
I tried to do numpy slicing but I still can't get it to work.
B[A==1]=0
How can I achieve this in numpy without a conventional loop?
First, you need them to be numpy arrays and not lists. Then, you simply had B and A swapped.
import numpy as np
A = np.array([1, 2, 3, 4, 5])
B = np.array([0, 0, 0, 1, 0])
A[B==1]=0 ## array([1, 2, 3, 0, 5])
If you use lists instead, here is what you get
A = [1, 2, 3, 4, 5]
B = [0, 0, 0, 1, 0]
A[B==1]=0 ## [0, 2, 3, 4, 5]
That's because, for a list, B == 1 evaluates to the single value False (which is treated as index 0) instead of a boolean array. So you are essentially writing A[0] = 0.
Isn't this what you want to do?
A[B==1] = 0
A
array([1, 2, 3, 0, 5])
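If modifying A in place is not wanted, `np.where` gives the same result as a new array. A small sketch using the arrays from the question (the name `result` is just for illustration):

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])
B = np.array([0, 0, 0, 1, 0])

# pick 0 where B == 1, otherwise keep the value from A
result = np.where(B == 1, 0, A)

print(result)  # A itself is left unchanged
```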
I have a couple of lists:
a = [1,2,3]
b = [1,2,3,4,5,6]
which are of variable length.
I want to return a vector of length five, such that if the input list length is < 5 then it will be padded with zeros on the right, and if it is > 5, then it will be truncated at the 5th element.
For example, input a would return np.array([1,2,3,0,0]), and input b would return np.array([1,2,3,4,5]).
I feel like I ought to be able to use np.pad, but I can't seem to follow the documentation.
This might be slow or fast, I am not sure, however it works for your purpose.
In [22]: pad = lambda a,i : a[0:i] if len(a) > i else a + [0] * (i-len(a))
In [23]: pad([1,2,3], 5)
Out[23]: [1, 2, 3, 0, 0]
In [24]: pad([1,2,3,4,5,6,7], 5)
Out[24]: [1, 2, 3, 4, 5]
np.pad is overkill, better for adding a border all around a 2d image than adding some zeros to a list.
I like zip_longest, especially if the inputs are lists and don't need to be arrays. It's probably the closest you'll find to code that operates on all the lists at once in compiled code.
a, b = zip(*list(itertools.zip_longest(a, b, fillvalue=0)))  # izip_longest in Python 2
is a version that does not use np.array at all (saving some array overhead).
But by itself it does not truncate. It still needs something like [x[:5] for x in (a,b)].
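A possible way to combine the padding and the truncation in Python 3 is sketched below. The helper name `pad_all` is hypothetical, and the extra dummy sequence of length n is a trick added here (not part of the answer above) so that padding reaches n even when every input is shorter.

```python
from itertools import zip_longest

def pad_all(lists, n=5, fillvalue=0):
    # the dummy sequence forces zip_longest to pad to at least n items
    dummy = [fillvalue] * n
    padded = zip(*zip_longest(*lists, dummy, fillvalue=fillvalue))
    # drop the dummy row, then truncate every row to n items
    return [list(row[:n]) for row in list(padded)[:-1]]

a = [1, 2, 3]
b = [1, 2, 3, 4, 5, 6]
print(pad_all([a, b]))
```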
Here's my variation on all_ms function, working with a simple list or 1d array:
def foo_1d(x, n=5):
    x = np.asarray(x)
    assert x.ndim==1
    s = np.min([x.shape[0], n])
    ret = np.zeros((n,), dtype=x.dtype)
    ret[:s] = x[:s]
    return ret
In [772]: [foo_1d(x) for x in [[1,2,3], [1,2,3,4,5], np.arange(10)[::-1]]]
Out[772]: [array([1, 2, 3, 0, 0]), array([1, 2, 3, 4, 5]), array([9, 8, 7, 6, 5])]
One way or another, the numpy solutions do the same thing: construct a blank array of the desired shape, and then fill it with the relevant values from the original.
One other detail: when truncating, the solution could, in theory, return a view instead of a copy. But that requires handling that case separately from the pad case.
If the desired output is a list of equal-length arrays, it may be worthwhile collecting them in a 2d array.
In [792]: def foo1(x, out):
   .....:     x = np.asarray(x)
   .....:     s = np.min((x.shape[0], out.shape[0]))
   .....:     out[:s] = x[:s]
   .....:
In [794]: lists = [[1,2,3], [1,2,3,4,5], np.arange(10)[::-1], []]
In [795]: ret=np.zeros((len(lists),5),int)
In [796]: for i,xx in enumerate(lists):
   .....:     foo1(xx, ret[i,:])
   .....:
In [797]: ret
Out[797]:
array([[1, 2, 3, 0, 0],
[1, 2, 3, 4, 5],
[9, 8, 7, 6, 5],
[0, 0, 0, 0, 0]])
Pure python version, where a is a python list (not a numpy array): a[:n] + [0,]*(n-len(a)).
For example:
In [42]: n = 5
In [43]: a = [1, 2, 3]
In [44]: a[:n] + [0,]*(n - len(a))
Out[44]: [1, 2, 3, 0, 0]
In [45]: a = [1, 2, 3, 4]
In [46]: a[:n] + [0,]*(n - len(a))
Out[46]: [1, 2, 3, 4, 0]
In [47]: a = [1, 2, 3, 4, 5]
In [48]: a[:n] + [0,]*(n - len(a))
Out[48]: [1, 2, 3, 4, 5]
In [49]: a = [1, 2, 3, 4, 5, 6]
In [50]: a[:n] + [0,]*(n - len(a))
Out[50]: [1, 2, 3, 4, 5]
Function using numpy:
In [121]: def tosize(a, n):
   .....:     a = np.asarray(a)
   .....:     x = np.zeros(n, dtype=a.dtype)
   .....:     m = min(n, len(a))
   .....:     x[:m] = a[:m]
   .....:     return x
   .....:
In [122]: tosize([1, 2, 3], 5)
Out[122]: array([1, 2, 3, 0, 0])
In [123]: tosize([1, 2, 3, 4], 5)
Out[123]: array([1, 2, 3, 4, 0])
In [124]: tosize([1, 2, 3, 4, 5], 5)
Out[124]: array([1, 2, 3, 4, 5])
In [125]: tosize([1, 2, 3, 4, 5, 6], 5)
Out[125]: array([1, 2, 3, 4, 5])
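Since the question mentions np.pad, here is roughly how it could be used for this. It is a sketch with a made-up function name, `pad_or_truncate`: truncate first with a slice, then pad only on the right with however many zeros are still missing.

```python
import numpy as np

def pad_or_truncate(values, n=5):
    arr = np.asarray(values)[:n]  # at most n items survive the slice
    # pad_width (0, k): nothing on the left, k zeros on the right
    return np.pad(arr, (0, n - arr.size), mode="constant", constant_values=0)

print(pad_or_truncate([1, 2, 3]))
print(pad_or_truncate([1, 2, 3, 4, 5, 6]))
```

Note that an empty input list becomes a float array under `np.asarray`, so the padded result would be floating-point zeros in that case.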
I am trying to optimise some code by removing for loops and using numpy arrays only as I am working with large data sets.
I would like to take a 1D numpy array, for example:
a = [1, 2, 3, 4, 5]
and produce a 2D numpy array whereby the value in each column shifts along a place, for example in the case above for a I wish to have a function which returns:
[[1 2 3 4 5]
[0 1 2 3 4]
[0 0 1 2 3]
[0 0 0 1 2]
[0 0 0 0 1]]
I have found examples which use the strides function to do something similar to produce, for example:
[[1 2 3]
[2 3 4]
[3 4 5]]
However I am trying to shift each of my columns in the other direction. Alternatively, one can view the problem as putting the first element of a on the first diagonal, the second element on the second diagonal and so on. However, I would like to stress again how I would like to avoid using a for, while or if loop entirely. Any help would be greatly appreciated.
Such a matrix is an example of a Toeplitz matrix. You could use scipy.linalg.toeplitz to create it:
In [32]: from scipy.linalg import toeplitz
In [33]: a = range(1,6)
In [34]: toeplitz(a, np.zeros_like(a)).T
Out[34]:
array([[1, 2, 3, 4, 5],
[0, 1, 2, 3, 4],
[0, 0, 1, 2, 3],
[0, 0, 0, 1, 2],
[0, 0, 0, 0, 1]])
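If pulling in SciPy is not desirable, the same matrix can be built with plain NumPy index arithmetic. This is a sketch (the function name `shifted_rows` is invented here): entry (i, j) holds a[j - i] when j >= i and 0 otherwise.

```python
import numpy as np

def shifted_rows(a):
    a = np.asarray(a)
    n = a.size
    # offset[i, j] = j - i, i.e. which element of `a` belongs at (i, j)
    offset = np.arange(n)[None, :] - np.arange(n)[:, None]
    # clip keeps the lookup in bounds; np.where zeroes the lower triangle
    return np.where(offset >= 0, a[np.clip(offset, 0, n - 1)], 0)

print(shifted_rows([1, 2, 3, 4, 5]))
```

Like toeplitz, this allocates the full n-by-n result, so it does not save memory.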
Inspired by @EelcoHoogendoorn's answer, here's a variation that doesn't use as much memory as scipy.linalg.toeplitz:
In [47]: from numpy.lib.stride_tricks import as_strided
In [48]: a
Out[48]: array([1, 2, 3, 4, 5])
In [49]: t = as_strided(np.r_[a[::-1], np.zeros_like(a)], shape=(a.size,a.size), strides=(a.itemsize, a.itemsize))[:,::-1]
In [50]: t
Out[50]:
array([[1, 2, 3, 4, 5],
[0, 1, 2, 3, 4],
[0, 0, 1, 2, 3],
[0, 0, 0, 1, 2],
[0, 0, 0, 0, 1]])
The result should be treated as a "read only" array. Otherwise, you'll be in for some surprises when you change an element. For example:
In [51]: t[0,2] = 99
In [52]: t
Out[52]:
array([[ 1, 2, 99, 4, 5],
[ 0, 1, 2, 99, 4],
[ 0, 0, 1, 2, 99],
[ 0, 0, 0, 1, 2],
[ 0, 0, 0, 0, 1]])
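One way to guard against such accidental writes is to ask as_strided for a read-only view through its `writeable` argument (available in reasonably recent NumPy versions). A sketch, with `buf` as an illustrative name for the padded buffer:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([1, 2, 3, 4, 5])
buf = np.r_[a[::-1], np.zeros_like(a)]

# same construction as above, but the view refuses writes
t = as_strided(buf, shape=(a.size, a.size),
               strides=(buf.itemsize, buf.itemsize),
               writeable=False)[:, ::-1]

try:
    t[0, 2] = 99
except ValueError as err:
    print("write rejected:", err)
```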
Here is the indexing-tricks based solution. Not nearly as elegant as the toeplitz solution already posted, but should memory consumption or performance be a concern, it is to be preferred. As demonstrated, this also makes it easy to subsequently manipulate the entries of the matrix in a consistent manner.
import numpy as np
a = np.arange(5)+1
def toeplitz_view(a):
    b = np.concatenate((np.zeros_like(a), a))
    i = a.itemsize
    # as_strided lives in numpy.lib.stride_tricks
    v = np.lib.stride_tricks.as_strided(b,
        shape=(len(b),)*2,
        strides=(-i, i))
    # return a view on the 'original' data as well, for manipulation
    return v[:len(a), len(a):], b[len(a):]

v, a = toeplitz_view(a)
print(v)
a[0] = 10
v[2,1] = -1
print(v)