Given an array like this, for example:
[1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
What's the fastest way in Python to get the non-zero elements organized in a list where each element contains the indexes of blocks of continuous non-zero values?
Here the result would be a list containing many arrays:
([0, 1, 2, 3], [9, 10, 11], [14, 15], [20, 21])
>>> L = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
>>> import itertools
>>> import operator
>>> [[i for i,value in it] for key,it in itertools.groupby(enumerate(L), key=operator.itemgetter(1)) if key != 0]
[[0, 1, 2, 3], [9, 10, 11], [14, 15], [20, 21]]
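Note that the one-liner above groups on the value itself, so two adjacent but different non-zero values (say 2 followed by 3) would end up in separate blocks. A minimal sketch that groups on "is non-zero" instead, assuming that is the behaviour wanted (the name nonzero_runs is made up for illustration):

```python
from itertools import groupby


def nonzero_runs(seq):
    """Return one list of indexes per run of consecutive non-zero values."""
    runs = []
    # Group (index, value) pairs by whether the value is non-zero,
    # so that runs such as 2, 3, 7 stay together.
    for is_nonzero, group in groupby(enumerate(seq), key=lambda pair: pair[1] != 0):
        if is_nonzero:
            runs.append([index for index, _ in group])
    return runs


L = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
blocks = nonzero_runs(L)
```

This stays in pure Python, so it is probably only worth using for plain lists of moderate size.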
A trivial change to my answer at Finding the consecutive zeros in a numpy array gives the function find_runs:
import numpy as np

def find_runs(value, a):
    # Create an array that is 1 where a is `value`, and pad each end with an extra 0.
    isvalue = np.concatenate(([0], np.equal(a, value).view(np.int8), [0]))
    absdiff = np.abs(np.diff(isvalue))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
For example,
In [43]: x
Out[43]: array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])
In [44]: find_runs(1, x)
Out[44]:
array([[ 0, 4],
[ 9, 12],
[14, 16],
[20, 22]])
In [45]: [list(range(*run)) for run in find_runs(1, x)]
Out[45]: [[0, 1, 2, 3], [9, 10, 11], [14, 15], [20, 21]]
If the value 1 in your example was not representative, and you really want runs of any non-zero values (as suggested by the text of the question), you can change np.equal(a, value) to (a != 0) and change the arguments and comments appropriately. E.g.
def find_nonzero_runs(a):
    # Create an array that is 1 where a is nonzero, and pad each end with an extra 0.
    isnonzero = np.concatenate(([0], (np.asarray(a) != 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(isnonzero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
For example,
In [63]: y
Out[63]:
array([-1, 2, 99, 99, 0, 0, 0, 0, 0, 12, 13, 14, 0, 0, 1, 1, 0,
0, 0, 0, 42, 42])
In [64]: find_nonzero_runs(y)
Out[64]:
array([[ 0, 4],
[ 9, 12],
[14, 16],
[20, 22]])
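The function returns half-open [start, stop) pairs rather than the index blocks the question asks for. A small sketch of the conversion, restating find_nonzero_runs so the snippet runs on its own (the name index_blocks is only illustrative):

```python
import numpy as np


def find_nonzero_runs(a):
    # 1 where a is non-zero, padded with a 0 at each end.
    isnonzero = np.concatenate(([0], (np.asarray(a) != 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(isnonzero))
    # Runs start and end where absdiff is 1.
    return np.where(absdiff == 1)[0].reshape(-1, 2)


y = np.array([-1, 2, 99, 99, 0, 0, 0, 0, 0, 12, 13, 14, 0, 0, 1, 1, 0,
              0, 0, 0, 42, 42])

# Expand each (start, stop) pair into the indexes it covers.
index_blocks = [np.arange(start, stop) for start, stop in find_nonzero_runs(y)]
```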
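Have a look at scipy.ndimage.label (the older scipy.ndimage.measurements namespace is deprecated in recent SciPy releases):
import numpy as np
from scipy.ndimage import label
x = np.asarray([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])
labelled, numfeats = label(x)
# np.nonzero returns a tuple with one array per dimension; take the first (and only) one
indices = [np.nonzero(labelled == k)[0] for k in range(1, numfeats + 1)]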
indices contains exactly what you asked for. Note that, depending on your ultimate goal, labelled might also give you useful (extra) information.
You can use np.split once you know the lengths of the non-zero runs and the indices of the non-zero elements in A. Assuming A is the input array, the implementation would look something like this:
# Pad a non-zero mask of A with a zero on either side, then difference it
A_ext = np.diff(np.hstack(([0], (A != 0).astype(int), [0])))
# Lengths of the non-zero runs (run end minus run start)
interval_lens = np.where(A_ext==-1)[0] - np.where(A_ext==1)[0]
# Indices of the non-zero places in A
idx = np.arange(A.size)[A!=0]
# Finally split the indices based on the run lengths
out = np.split(idx,interval_lens.cumsum())[:-1]
Sample input, output -
In [53]: A
Out[53]: array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])
In [54]: out
Out[54]: [array([0, 1, 2, 3]), array([ 9, 10, 11]), array([14, 15]), array([20, 21])]
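A shorter variant of the same np.split idea, offered as a sketch rather than as the approach above: take the non-zero indexes and split wherever two neighbouring indexes differ by more than one. The function name split_nonzero_indices is made up for illustration.

```python
import numpy as np


def split_nonzero_indices(a):
    """Split the indexes of non-zero entries into blocks of consecutive indexes."""
    idx = np.flatnonzero(np.asarray(a))
    if idx.size == 0:
        # np.split would return a single empty array here; return no blocks instead.
        return []
    # A new block starts right after every gap larger than 1.
    breaks = np.flatnonzero(np.diff(idx) > 1) + 1
    return np.split(idx, breaks)


A = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])
out = split_nonzero_indices(A)
```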
I have a list of lists, each indicating a numeric interval:
intervals = [[1, 4],
[7, 9],
[13, 18]
]
I need to create a list of 20 elements, where each element is 0 if its index is NOT contained in any of the intervals, and 1 otherwise. So, the desired output is:
output = [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
I can think of using something simple, along the lines of:
output = np.zeros(20)
for index, _ in enumerate(output):
    for interval in intervals:
        if interval[0] <= index <= interval[1]:
            output[index] = 1
but is there a more efficient way?
Essentially the same as @mozway's answer, but without creating an intermediate data structure and arguably more readable (N is the length of the output, 20 in the question):
output = np.zeros(N, dtype=int)
for start, end in intervals:
    output[np.arange(start, end+1)] = 1
You could use advanced indexing:
a = np.zeros(20, dtype=int)
idx = np.hstack([np.r_[a:b+1] for a,b in intervals])
# array([ 1, 2, 3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18])
a[idx] = 1
output:
array([0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0])
There might be a more efficient way to do this, but this might get you started:
intervals = np.array([[1, 4], [7, 9], [13, 18]])
low, high = intervals[:,0], intervals[:,1]
r = np.arange(20)[:,None]
((low <= r) & (high >= r)).any(1).astype(int)
output:
array([0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0])
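The broadcasting version builds an N-by-(number of intervals) boolean array, which can get large. One alternative, sketched here as a different technique (a difference array followed by a cumulative sum) and not taken from the answers above, needs only O(N) memory. It assumes inclusive intervals that lie within 0..n-1; the name intervals_to_mask is made up.

```python
import numpy as np


def intervals_to_mask(intervals, n):
    """Return a 0/1 array of length n marking indexes covered by any inclusive interval."""
    intervals = np.asarray(intervals)
    delta = np.zeros(n + 1, dtype=int)
    # +1 where an interval opens, -1 just past where it closes.
    # np.add.at accumulates correctly even when indexes repeat.
    np.add.at(delta, intervals[:, 0], 1)
    np.add.at(delta, intervals[:, 1] + 1, -1)
    # The running sum counts how many intervals cover each index.
    return (np.cumsum(delta[:n]) > 0).astype(int)


mask = intervals_to_mask([[1, 4], [7, 9], [13, 18]], 20)
```

Overlapping intervals are handled as well, since the running sum simply stays above zero across the overlap.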
I have a nested list that contains 1002 time steps, and in each time step I have observations of 11 features. I have read the docs related to padding, but I could not find out how to add zero elements at the end of each list. I found that the longest list is, for example, the 24th item in my main list, and now I want to pad all the other elements to that length (the 24th element already has the right shape). As an example:
a = [[1,2,3,4,5,6,76,7],[2,2,3,4,2,5,5,5,7,8,9,33,677,8,8,9,9],[2,3,46,7,8,9,],[3,3,3,5],[2,2],[1,1],[2,2]]
a[1] = padding(a[1],len(a[2]) with zeros at the end of the list)
I have done below:
import numpy as np
def pad_or_truncate(some_list, target_len):
    return some_list[:target_len] + [0]*(target_len - len(some_list))
for i in range(len(Length)):
    pad_or_truncate(Length[i],len(Length[24]))
    print(len(Length[i]))
or
for i in range(len(Length)):
    df_train_array = np.pad(Length[i],len(Length[24]),mode='constant')
and I got this error: Unable to coerce to Series, length must be 11: given 375
Solution 1
# length of the longest list
max_len = max([len(x) for x in a])
# append max_len zeros to every list (more than enough)
temp = [x+ [0]*max_len for x in a]
# truncate every list to the desired length
[x[0:max_len] for x in temp]
Solution 2 using pandas
import pandas as pd
df = pd.DataFrame(a)
df.fillna(0).astype(int).values.tolist()
Output
[[1, 2, 3, 4, 5, 6, 76, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[2, 2, 3, 4, 2, 5, 5, 5, 7, 8, 9, 33, 677, 8, 8, 9, 9],
[2, 3, 46, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[3, 3, 3, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
...]
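For comparison, a plain-Python sketch that appends only the zeros that are actually missing and leaves the input untouched. The name pad_to_longest and the fill parameter are illustrative, not part of either solution above.

```python
def pad_to_longest(rows, fill=0):
    """Return new lists, each right-padded with `fill` to the length of the longest row."""
    target_len = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (target_len - len(row)) for row in rows]


a = [[1, 2, 3, 4, 5, 6, 76, 7],
     [2, 2, 3, 4, 2, 5, 5, 5, 7, 8, 9, 33, 677, 8, 8, 9, 9],
     [2, 3, 46, 7, 8, 9],
     [3, 3, 3, 5],
     [2, 2],
     [1, 1],
     [2, 2]]
padded = pad_to_longest(a)
```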
The following code snippet pads the individual lists in place with the appropriate number of 0s (determined by the length of the longest list):
def main():
    data = [
        [1,2,3,4,5,6,76,7],
        [2,2,3,4,2,5,5,5,7,8,9,33,677,8,8,9,9],
        [2,3,46,7,8,9,],
        [3,3,3,5],
        [2,2],
        [1,1],
        [2,2]
    ]
    # find the length of the longest list
    max_length = max(map(len, data))
    # append zeros to each list until it reaches max_length
    for element in data:
        for _ in range(len(element), max_length):
            element.append(0)
if __name__ == '__main__':
    main()
You can use this one-liner, which uses np.pad (note that this version pads at the start of each list):
list(map(lambda x: np.pad(x, (max(map(len, a)) - len(x), 0)).tolist(), a))
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 76, 7],
[2, 2, 3, 4, 2, 5, 5, 5, 7, 8, 9, 33, 677, 8, 8, 9, 9],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 46, 7, 8, 9],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 5],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2]]
Use this if you want to pad at the end instead:
list(map(lambda x: np.pad(x, (0, max(map(len, a)) - len(x))).tolist(), a))
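If the end goal is a rectangular NumPy array rather than a list of lists, one option is to preallocate the array and copy each row in. This is only a sketch under that assumption; to_padded_array is a hypothetical helper and assumes integer data and at least one row.

```python
import numpy as np


def to_padded_array(rows, fill=0):
    """Copy ragged rows into a 2-D integer array, padding on the right with `fill`."""
    max_len = max(len(row) for row in rows)
    out = np.full((len(rows), max_len), fill, dtype=int)
    for i, row in enumerate(rows):
        # Only the first len(row) entries are overwritten; the rest keep `fill`.
        out[i, :len(row)] = row
    return out


a = [[1, 2, 3, 4, 5, 6, 76, 7],
     [2, 2, 3, 4, 2, 5, 5, 5, 7, 8, 9, 33, 677, 8, 8, 9, 9],
     [2, 3, 46, 7, 8, 9],
     [3, 3, 3, 5],
     [2, 2],
     [1, 1],
     [2, 2]]
arr = to_padded_array(a)
```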
In Python, I have a matrix and I want to find the two largest elements in every row and every column and change their values to 1 (separately, that is, I want two matrices: one with the rows modified and the other with the columns modified).
The main goal is to get a corresponding matrix with zeros everywhere except for the ones I've put at the 2 largest elements of each row and column (using np.where(mat == 1, 1, 0)).
I'm trying to use np.argpartition to do so, but without success.
Please help.
See image below.
Here's an approach with np.argpartition -
idx_row = np.argpartition(-a,2,axis=1)[:,:2]
out_row = np.zeros(a.shape,dtype=int)
out_row[np.arange(idx_row.shape[0])[:,None],idx_row] = 1
idx_col = np.argpartition(-a,2,axis=0)[:2]
out_col = np.zeros(a.shape,dtype=int)
out_col[idx_col,np.arange(idx_col.shape[1])] = 1
Sample input, output -
In [40]: a
Out[40]:
array([[ 3, 7, 1, -5, 14, 2, 8],
[ 5, 8, 1, 4, -3, 3, 10],
[11, 3, 5, 1, 9, 2, 5],
[ 6, 4, 12, 6, 1, 15, 4],
[ 8, 2, 0, 1, -2, 3, 5]])
In [41]: out_row
Out[41]:
array([[0, 0, 0, 0, 1, 0, 1],
[0, 1, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 1]])
In [42]: out_col
Out[42]:
array([[0, 1, 0, 0, 1, 0, 1],
[0, 1, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 0, 0],
[0, 0, 1, 1, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0]])
Alternatively, if you prefer compact code, we can skip the initialization and use broadcasting to get the outputs from idx_row and idx_col directly, like so -
out_row = (idx_row[...,None] == np.arange(a.shape[1])).any(1).astype(int)
out_col = (idx_col[...,None] == np.arange(a.shape[0])).any(0).astype(int).T
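The same idea can be wrapped in one function for any k and either axis. The sketch below uses np.put_along_axis to write the ones; top_k_mask is a made-up name, and it assumes 1 <= k <= a.shape[axis]. When values tie for the k-th place, which of them gets marked is not specified.

```python
import numpy as np


def top_k_mask(a, k, axis):
    """Return a 0/1 array marking the k largest entries along `axis`."""
    a = np.asarray(a)
    # Partitioning -a moves the indexes of the k largest values of a
    # to the first k positions along `axis` (in no particular order).
    part = np.argpartition(-a, k - 1, axis=axis)
    top_idx = np.take(part, np.arange(k), axis=axis)
    mask = np.zeros(a.shape, dtype=int)
    np.put_along_axis(mask, top_idx, 1, axis=axis)
    return mask


a = np.array([[ 3,  7,  1, -5, 14,  2,  8],
              [ 5,  8,  1,  4, -3,  3, 10],
              [11,  3,  5,  1,  9,  2,  5],
              [ 6,  4, 12,  6,  1, 15,  4],
              [ 8,  2,  0,  1, -2,  3,  5]])
out_row = top_k_mask(a, 2, axis=1)
out_col = top_k_mask(a, 2, axis=0)
```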
I have a large matrix, I'd like to check that it has a column of all zeros somewhere in it. How to do that in numpy?
Here's one way:
In [19]: a
Out[19]:
array([[9, 4, 0, 0, 7, 2, 0, 4, 0, 1, 2],
[0, 2, 0, 0, 0, 7, 6, 0, 6, 2, 0],
[6, 8, 0, 4, 0, 6, 2, 0, 8, 0, 3],
[5, 4, 0, 0, 0, 0, 0, 0, 0, 3, 8]])
In [20]: (~a.any(axis=0)).any()
Out[20]: True
If you later decide that you need the column index:
In [26]: numpy.where(~a.any(axis=0))[0]
Out[26]: array([2])
Create an equals-zero mask (mat == 0), call all on it along axis 0 to test each column, then call any to check whether at least one column passed:
(mat == 0).all(axis=0).any()
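Put together as a small self-contained sketch, using the sample matrix from the other answer (the variable names are only illustrative):

```python
import numpy as np

mat = np.array([[9, 4, 0, 0, 7, 2, 0, 4, 0, 1, 2],
                [0, 2, 0, 0, 0, 7, 6, 0, 6, 2, 0],
                [6, 8, 0, 4, 0, 6, 2, 0, 8, 0, 3],
                [5, 4, 0, 0, 0, 0, 0, 0, 0, 3, 8]])

# One boolean per column: True where every entry in that column is zero.
zero_columns = (mat == 0).all(axis=0)

has_zero_column = bool(zero_columns.any())
zero_column_indexes = np.flatnonzero(zero_columns)
```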
I have a list of lists and I want to be able to refer to the 1st, 2nd, 3rd, etc. column in a list of lists. Here is my code for the list:
matrix = [
[0, 0, 0, 5, 0, 0, 0, 0, 6],
[8, 0, 0, 0, 4, 7, 5, 0, 3],
[0, 5, 0, 0, 0, 3, 0, 0, 0],
[0, 7, 0, 8, 0, 0, 0, 0, 9],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[9, 0, 0, 0, 0, 4, 0, 2, 0],
[0, 0, 0, 9, 0, 0, 0, 1, 0],
[7, 0, 8, 3, 2, 0, 0, 0, 5],
[3, 0, 0, 0, 0, 8, 0, 0, 0],
]
I want to be able to say something like:
matrix = [
[0, 0, 0, 5, 0, 0, 0, 0, 6],
[8, 0, 0, 0, 4, 7, 5, 0, 3],
[0, 5, 0, 0, 0, 3, 0, 0, 0],
[0, 7, 0, 8, 0, 0, 0, 0, 9],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[9, 0, 0, 0, 0, 4, 0, 2, 0],
[0, 0, 0, 9, 0, 0, 0, 1, 0],
[7, 0, 8, 3, 2, 0, 0, 0, 5],
[3, 0, 0, 0, 0, 8, 0, 0, 0],
]
if (The fourth column in this matrix does not have any 1's in it):
(then do something)
I want to know what the python syntax would be for the stuff in parenthesis.
The standard way to do what you asked is to use a list comprehension:
if (The fourth column in this matrix does not have any 1's in it):
translates to:
>>> if not any([row[3] == 1 for row in matrix]):
However, depending on how often you need to perform this operation, how big your matrix is, etc., you might wish to look into numpy, as it makes addressing columns easier (and remarkably faster). An example:
>>> import numpy as np
>>> matrix = np.random.randint(0, 10, (5, 5))
>>> matrix
array([[3, 0, 9, 9, 3],
[5, 7, 7, 7, 6],
[5, 4, 6, 2, 2],
[1, 3, 5, 0, 5],
[3, 9, 7, 8, 6]])
>>> matrix[..., 3] #fourth column
array([9, 7, 2, 0, 8])
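Applied to the matrix from the question, the numpy version of the check might look like the sketch below (the variable names are made up for illustration):

```python
import numpy as np

matrix = np.array([
    [0, 0, 0, 5, 0, 0, 0, 0, 6],
    [8, 0, 0, 0, 4, 7, 5, 0, 3],
    [0, 5, 0, 0, 0, 3, 0, 0, 0],
    [0, 7, 0, 8, 0, 0, 0, 0, 9],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [9, 0, 0, 0, 0, 4, 0, 2, 0],
    [0, 0, 0, 9, 0, 0, 0, 1, 0],
    [7, 0, 8, 3, 2, 0, 0, 0, 5],
    [3, 0, 0, 0, 0, 8, 0, 0, 0],
])

# matrix[:, 3] selects the fourth column (index 3) across all rows.
fourth_column = matrix[:, 3]
no_ones_in_fourth = not (fourth_column == 1).any()

# For contrast, the fifth column does contain a 1.
has_one_in_fifth = bool((matrix[:, 4] == 1).any())
```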
Try this:
if all(row[3] != 1 for row in matrix):
    # do something
The row[3] part looks at the fourth element of a row, and the for row in matrix part iterates over all the rows in the matrix; together they yield the fourth element of every row, that is, the whole fourth column. If every element in the fourth column is different from one, the condition is satisfied and you can do what you need inside the if.
A more traditional approach would be:
found_one = False
for i in range(len(matrix)):
    if matrix[i][3] == 1:
        found_one = True
        break
if not found_one:
    # do something
Here I'm iterating over all the rows (i index) of the fourth column (3 index), and checking whether an element is equal to one: if matrix[i][3] == 1:. Notice that the for loop goes from index 0 up to the "height" of the matrix minus one; that's what the range(len(matrix)) part says (in Python 2, use xrange instead). Since the question asks about a column with no 1's, the final test is if not found_one.
if 1 not in [row[3] for row in matrix]:
    # do something
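If several columns need checking, it may be convenient to transpose the list of lists once with zip and then refer to columns by index. A minimal pure-Python sketch; columns_of and the small grid are made up for illustration:

```python
def columns_of(matrix):
    """Return the columns of a rectangular list of lists as lists."""
    # zip(*matrix) pairs up the i-th elements of every row, i.e. transposes.
    return [list(column) for column in zip(*matrix)]


grid = [
    [0, 0, 5],
    [8, 1, 0],
    [0, 7, 0],
]
cols = columns_of(grid)

first_column_has_no_ones = 1 not in cols[0]
second_column_has_a_one = 1 in cols[1]
```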