I have a 2D numpy array like the one below. I want to find the longest streak of consecutive 1's in every row.
a = np.array([[1, 1, 1, 1, 1],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0],
              [0, 0, 0, 0, 0],
              [1, 1, 1, 0, 1],
              [1, 0, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0]])
Desired Output: [5, 1, 2, 0, 3, 1, 2, 2]
I have found the solution to above for a 1D array:
a = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0])
d = np.diff(np.concatenate(([0], a, [0])))
np.max(np.flatnonzero(d == -1) - np.flatnonzero(d == 1))
> 4
On similar lines, I wrote the following but it doesn't work.
d = np.diff(np.column_stack(([0] * a.shape[0], a, [0] * a.shape[0])))
np.max(np.flatnonzero(d == -1) - np.flatnonzero(d == 1))
The 2D equivalent of your current code would use pad, diff, where and maximum.reduceat:
# pad with a column of 0s on left/right
# and get the diff on axis=1
d = np.diff(np.pad(a, ((0,0), (1,1)), constant_values=0), axis=1)
# get row/col indices of -1
row, col = np.where(d==-1)
# get groups of rows
val, idx = np.unique(row, return_index=True)
# subtract col indices of -1/1 to get lengths
# use np.maximum.reduceat to get max length per group of rows
out = np.zeros(a.shape[0], dtype=int)
out[val] = np.maximum.reduceat(col-np.where(d==1)[1], idx)
Output: array([5, 1, 2, 0, 3, 1, 2, 2])
Intermediates:
np.pad(a, ((0,0), (1,1)), constant_values=0)
array([[0, 1, 1, 1, 1, 1, 0],
[0, 1, 0, 1, 0, 1, 0],
[0, 1, 1, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 0, 1, 0],
[0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 0],
[0, 1, 0, 1, 1, 0, 0]])
np.diff(np.pad(a, ((0,0), (1,1)), constant_values=0), axis=1)
array([[ 1, 0, 0, 0, 0, -1],
[ 1, -1, 1, -1, 1, -1],
[ 1, 0, -1, 1, -1, 0],
[ 0, 0, 0, 0, 0, 0],
[ 1, 0, 0, -1, 1, -1],
[ 1, -1, 0, 0, 0, 0],
[ 0, 1, 0, -1, 0, 0],
[ 1, -1, 1, 0, -1, 0]])
np.where(d==-1)
(array([0, 1, 1, 1, 2, 2, 4, 4, 5, 6, 7, 7]),
array([5, 1, 3, 5, 2, 4, 3, 5, 1, 3, 1, 4]))
col-np.where(d==1)[1]
array([5, 1, 1, 1, 2, 1, 3, 1, 1, 2, 1, 2])
np.unique(row, return_index=True)
(array([0, 1, 2, 4, 5, 6, 7]),
array([ 0, 1, 4, 6, 8, 9, 10]))
out = np.zeros(a.shape[0], dtype=int)
array([0, 0, 0, 0, 0, 0, 0, 0])
out[val] = np.maximum.reduceat(col-np.where(d==1)[1], idx)
array([5, 1, 2, 0, 3, 1, 2, 2])
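The steps above can be wrapped into a single function. This is only a sketch: the name max_run_per_row is made up for illustration, and the cast to int is an added assumption so that boolean input does not change how np.diff behaves.

```python
import numpy as np

def max_run_per_row(a):
    """Longest run of 1s in each row of a 2D 0/1 array (hypothetical helper)."""
    # cast to int: np.diff on a boolean array would not produce +1/-1 markers
    a = np.asarray(a, dtype=int)
    # pad a zero column on each side so every run has a start (+1) and an end (-1)
    d = np.diff(np.pad(a, ((0, 0), (1, 1)), constant_values=0), axis=1)
    row, end = np.where(d == -1)   # row and column index where each run ends
    start = np.where(d == 1)[1]    # column index where each run starts
    out = np.zeros(a.shape[0], dtype=int)
    if row.size:                   # skip reduceat when there is no run at all
        rows_with_runs, first = np.unique(row, return_index=True)
        out[rows_with_runs] = np.maximum.reduceat(end - start, first)
    return out

a = np.array([[1, 1, 1, 1, 1],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0],
              [0, 0, 0, 0, 0],
              [1, 1, 1, 0, 1],
              [1, 0, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0]])
result = max_run_per_row(a)
```

Rows without any 1 never show up in row, which is why out is pre-filled with zeros and only the rows that have at least one run are overwritten.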
Let's say I have this numpy array:
array([[4, 5, 6, 8, 5, 6],
[5, 1, 1, 9, 0, 5],
[7, 0, 5, 8, 0, 5],
[9, 2, 3, 8, 2, 3],
[1, 2, 2, 9, 2, 8]])
Going row by row, I would like to get, for each column, a running count of how many times the value has already appeared in that column. So for this array, the result would be:
array([[0, 0, 0, 0, 0, 0], # (*0)
[0, 0, 0, 0, 0, 0], # (*1)
[0, 0, 0, 1, 1, 1], # (*2)
[0, 0, 0, 2, 0, 0], # (*3)
[0, 1, 0, 1, 1, 0]] # (*4)
(*0): first time each value appears
(*1): all values are different from the previous one (in the column)
(*2): For the last 3 columns, a 1 appears because each value has already appeared once before in its column.
(*3): For the 4th column, a 2 appears because it's the 3rd time that an 8 appears.
(*4): In the 4th column, a 1 appears because it's the 2nd time that a 9 appears in that column. The same goes for the second and second-to-last columns.
Any idea how to perform this?
Thanks!
Maybe there is a faster way using numpy ufuncs, but here is a solution using standard Python:
from collections import defaultdict
import numpy as np
a = np.array([[4, 5, 6, 8, 5, 6],
[5, 1, 1, 9, 0, 5],
[7, 0, 5, 8, 0, 5],
[9, 2, 3, 8, 2, 3],
[1, 2, 2, 9, 2, 8]])
# define function
def get_count(array):
    count = []
    # iterate over the columns of the input (rows of the transpose)
    for row in array.T:
        occurences = defaultdict(int)
        rowcount = []
        for n in row:
            occurences[n] += 1
            rowcount.append(occurences[n] - 1)
        count.append(rowcount)
    return np.array(count).T
Output:
>>> get_count(a)
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 2, 0, 0],
[0, 1, 0, 1, 1, 0]])
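For reference, the same counts can be obtained without a Python loop by comparing every row with every earlier row through broadcasting. This is a sketch rather than a tuned solution: the name get_count_vectorized is made up here, and the intermediate array has shape (rows, rows, columns), so it only suits inputs with a moderate number of rows.

```python
import numpy as np

def get_count_vectorized(a):
    """Per column, count how often each value appeared in earlier rows (sketch)."""
    a = np.asarray(a)
    n = a.shape[0]
    # same[i, j, c] is True when rows i and j hold the same value in column c
    same = a[:, None, :] == a[None, :, :]
    # earlier[i, j] is True only for j < i, i.e. rows above row i
    earlier = np.tril(np.ones((n, n), dtype=bool), k=-1)
    # for each (i, c), count the earlier rows that match
    return (same & earlier[:, :, None]).sum(axis=1)

a = np.array([[4, 5, 6, 8, 5, 6],
              [5, 1, 1, 9, 0, 5],
              [7, 0, 5, 8, 0, 5],
              [9, 2, 3, 8, 2, 3],
              [1, 2, 2, 9, 2, 8]])
counts = get_count_vectorized(a)
```

The loop version does work proportional to the number of elements, while this one grows quadratically with the number of rows, so which is faster depends on the shape of the data.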
I made a similar post here. Now I am trying to generalize what was done there for an entire matrix of numbers.
Specifically I want to do this:
dates = []
dates.append(NDD_month[0])
for i in range(1,len(cpi)):
    dates.append((dates[i-1] + 12 - number_of_payments[:i]) % 12)
print(dates)
where the number_of_payments is a matrix of type <class 'list'>.
Here is an example:
print(number_of_payments[:1])
is
[array([[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1]])]
After performing what I want then
print(dates[:1])
Should be
[array([[8, 8, 7, 7, 6, 5, 4, 4, 11, 10, 10, 8]])]
or something like that.
EDIT:
Here is an example of what my data looks like:
print(number_of_payments[:3])
This gives me this:
[
array(
[
[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1]
]),
array(
[
[0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
[1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0]
]),
array(
[
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 2, 0, 2, 1, 1, 0, 2, 1, 0, 0]
])
]
print(NDD_month[:3])
Gives me
[8, 7, 11]
For the answer I want, I'd like to do something like what I did in my earlier post, where I had
dates = []
dates.append(NDD_month[0])
for i in range(1, len(first_payments)):
    dates.append((dates[i-1] + 12 - first_payments[i-1]) % 12)
print(dates)
This gave me the correct output of
[8 8 7 7 6 5 4 4 11 10 10 8]
But now that number_of_payments is a matrix, I need to apply the same logic to this larger data structure. Let me know if that is clear.
Edit 2:
Okay, this is hard to explain, so I am going to go through a step-by-step example. I have this data, or matrix (number_of_payments), whatever it is in Python:
[[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
[1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0]]
I have another list or vector called NDD_month, the first three elements are
[8, 7, 11]
Now for the sake of simplicity, let's say I just have the first row of number_of_payments, i.e.
[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1]
Further, for simplicity, let's say I have just the first element of NDD_month, so
8
Then to get the answer I seek, I would do the following, which comes from a nice answer Aurora Wang provided:
first_payments = number_of_payments[:1]
first_payments = first_payments[0][0]
dates = []
dates.append(NDD_month[0])
for i in range(1, len(first_payments)):
    dates.append((dates[i-1] + 12 - first_payments[i-1]) % 12)
print(dates)
This gives me [8, 8, 7, 7, 6, 5, 4, 4, 11, 10, 10, 8].
Now I need to do the same thing but for each row in the matrix and each element in the NDD_month vector. I hope that makes it much more clear.
I was thinking this may work but again I am new to python and this does not work:
dates = []
for i in range(1,len(NDD_month)):
    dates.append(NDD_month[i-1])
    for j in range(1, len(NDD_month)):
        dates.append((dates[j-1] + 12 - number_of_payments[i-1][j-1]) % 12)
print(dates)
If I understood you right, you want to do something like this:
number_of_payments = [
[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
[1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0]
]
NDD_month = [8, 7, 11]
dates = []
for i in range(len(number_of_payments)):
    # each row starts from its own NDD_month value
    dates.append([NDD_month[i]])
    for j in range(1, len(number_of_payments[i])):
        dates[i].append((dates[i][j-1] + 12 - number_of_payments[i][j-1]) % 12)
print(dates)
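If every row has the same length, as in the three-row example, the nested loop can also be written with a cumulative sum, since each entry is just the starting month minus the payments made before that position, modulo 12. A sketch under that assumption (the name paid_before is introduced here for illustration):

```python
import numpy as np

number_of_payments = np.array([
    [0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
    [0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
    [1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0],
])
NDD_month = np.array([8, 7, 11])

# payments made strictly before each position (0 for the first column)
paid_before = np.cumsum(number_of_payments, axis=1) - number_of_payments
# Python's % always returns a non-negative result for a positive modulus,
# so the "+ 12" from the loop version is not needed
dates = (NDD_month[:, None] - paid_before) % 12
print(dates)
```

This does not apply directly to the original list of differently shaped arrays from the question; there the loop, or one such computation per array, would still be needed.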
I have a large matrix and I'd like to check whether it has a column of all zeros somewhere in it. How can I do that in numpy?
Here's one way:
In [19]: a
Out[19]:
array([[9, 4, 0, 0, 7, 2, 0, 4, 0, 1, 2],
[0, 2, 0, 0, 0, 7, 6, 0, 6, 2, 0],
[6, 8, 0, 4, 0, 6, 2, 0, 8, 0, 3],
[5, 4, 0, 0, 0, 0, 0, 0, 0, 3, 8]])
In [20]: (~a.any(axis=0)).any()
Out[20]: True
If you later decide that you need the column index:
In [26]: numpy.where(~a.any(axis=0))[0]
Out[26]: array([2])
Create an equals-0 mask (mat == 0), run all on it along axis 0 (down each column), and then check whether any column passed.
(mat == 0).all(axis=0).any()
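As a small self-contained illustration of the mask approach, here it is applied to the array from the first answer. The variable names are chosen for this sketch only.

```python
import numpy as np

a = np.array([[9, 4, 0, 0, 7, 2, 0, 4, 0, 1, 2],
              [0, 2, 0, 0, 0, 7, 6, 0, 6, 2, 0],
              [6, 8, 0, 4, 0, 6, 2, 0, 8, 0, 3],
              [5, 4, 0, 0, 0, 0, 0, 0, 0, 3, 8]])

# one boolean per column: True where every entry in that column is 0
zero_cols = (a == 0).all(axis=0)
has_zero_col = bool(zero_cols.any())
# indices of the all-zero columns, if they are needed
zero_col_idx = np.flatnonzero(zero_cols)
```

For integer data this gives the same answer as ~a.any(axis=0); the explicit comparison with 0 just states the intent more directly.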
I have a list of lists and I want to be able to refer to the 1st, 2nd, 3rd, etc. column in a list of lists. Here is my code for the list:
matrix = [
[0, 0, 0, 5, 0, 0, 0, 0, 6],
[8, 0, 0, 0, 4, 7, 5, 0, 3],
[0, 5, 0, 0, 0, 3, 0, 0, 0],
[0, 7, 0, 8, 0, 0, 0, 0, 9],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[9, 0, 0, 0, 0, 4, 0, 2, 0],
[0, 0, 0, 9, 0, 0, 0, 1, 0],
[7, 0, 8, 3, 2, 0, 0, 0, 5],
[3, 0, 0, 0, 0, 8, 0, 0, 0],
]
I want to be able to say something like:
if (The fourth column in this matrix does not have any 1's in it):
(then do something)
I want to know what the Python syntax would be for the stuff in parentheses.
The standard way to do what you asked is with a comprehension, so
if (The fourth column in this matrix does not have any 1's in it):
translates to:
>>> if not any(row[3] == 1 for row in matrix):
However, depending on how often you need to perform this operation, how big your matrix is, etc., you might wish to look into numpy, as it makes addressing columns easier (and typically much faster on large arrays). An example:
>>> import numpy as np
>>> matrix = np.random.randint(0, 10, (5, 5))
>>> matrix
array([[3, 0, 9, 9, 3],
[5, 7, 7, 7, 6],
[5, 4, 6, 2, 2],
[1, 3, 5, 0, 5],
[3, 9, 7, 8, 6]])
>>> matrix[..., 3] #fourth column
array([9, 7, 2, 0, 8])
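With the column available as an array, the "no 1's in the fourth column" test becomes a single comparison. A sketch that reuses the random values shown above as a fixed array (the variable names are illustrative only):

```python
import numpy as np

matrix = np.array([[3, 0, 9, 9, 3],
                   [5, 7, 7, 7, 6],
                   [5, 4, 6, 2, 2],
                   [1, 3, 5, 0, 5],
                   [3, 9, 7, 8, 6]])

fourth_column = matrix[:, 3]   # same as matrix[..., 3] for a 2D array
# True when no element of the fourth column equals 1
no_ones_in_fourth = not (fourth_column == 1).any()
# the same check for every column at once: indices of columns without a 1
cols_without_ones = np.flatnonzero(~(matrix == 1).any(axis=0))
```

Here only the first column contains a 1, so the check succeeds for the fourth column.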
Try this:
if all(row[3] != 1 for row in matrix):
    pass  # do something
The row[3] part looks at the fourth element of a row, and the for row in matrix part walks over all the rows in the matrix. Together they yield the fourth element of every row in turn, that is, the whole fourth column. If every element in the fourth column is different from one, the condition is satisfied and you can do what you need inside the if.
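To make the "whole fourth column" idea concrete, the column can also be built explicitly in plain Python by transposing the rows with zip. This is a sketch using the matrix from the question; the names columns and fourth_column are introduced only for this example.

```python
matrix = [
    [0, 0, 0, 5, 0, 0, 0, 0, 6],
    [8, 0, 0, 0, 4, 7, 5, 0, 3],
    [0, 5, 0, 0, 0, 3, 0, 0, 0],
    [0, 7, 0, 8, 0, 0, 0, 0, 9],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [9, 0, 0, 0, 0, 4, 0, 2, 0],
    [0, 0, 0, 9, 0, 0, 0, 1, 0],
    [7, 0, 8, 3, 2, 0, 0, 0, 5],
    [3, 0, 0, 0, 0, 8, 0, 0, 0],
]

# zip(*matrix) groups the i-th element of every row, i.e. it yields the columns
columns = list(zip(*matrix))
fourth_column = columns[3]

if 1 not in fourth_column:
    print("no 1's in the fourth column")
```

Transposing once is mainly worthwhile when several columns need to be inspected; for a single check, the generator form avoids building the extra tuples.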
A more traditional approach would be:
found_one = False
for i in range(len(matrix)):
    if matrix[i][3] == 1:
        found_one = True
        break
if not found_one:
    pass  # do something
Here I'm iterating over all the rows (i index) of the fourth column (3 index) and checking whether an element is equal to one: if matrix[i][3] == 1:. Notice that the for loop goes from index 0 up to the "height" of the matrix minus one; that's what the range(len(matrix)) part says (use xrange on Python 2). Since the goal is a column with no 1's, the final test is if not found_one.
if 1 not in [row[3] for row in matrix]: