Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 5 years ago.
Improve this question
i want to count number of consecutive zeros in my Dataframe shown below, help please
DEC JAN FEB MARCH APRIL MAY consecutive zeros
0 X X X 1 0 1 0
1 X X X 1 0 1 0
2 0 0 1 0 0 1 2
3 1 0 0 0 1 1 3
4 0 0 0 0 0 1 5
5 X 1 1 0 0 0 3
6 1 0 0 1 0 0 2
7 0 0 0 0 1 0 4
For each row, you want cumsum(1-row) with reset at every point when row == 1. Then you take the row max.
For example
ts = pd.Series([0,0,0,0,1,1,0,0,1,1,1,0])
ts2 = 1-ts
tsgroup = ts.cumsum()
consec_0 = ts2.groupby(tsgroup).transform(pd.Series.cumsum)
consec_0.max()
will give you 4 as needed.
Write that in a function and apply to your dataframe
Here's my two cents...
Think of all the other non-zero elements as 1, then you will have a binary code. All you need to do now is find the 'largest interval' where there's no bit flip starting with 0.
We can write a function and 'apply' with lambda
def len_consec_zeros(a):
a = np.array(list(a)) # convert elements to `str`
rr = np.argwhere(a == '0').ravel() # find out positions of `0`
if not rr.size: # if there are no zeros, return 0
return 0
full = np.arange(rr[0], rr[-1]+1) # get the range of spread of 0s
# get the indices where `0` was flipped to something else
diff = np.setdiff1d(full, rr)
if not diff.size: # if there are no bit flips, return the
return len(full) # size of the full range
# break the array into pieces wherever there's a bit flip
# and the result is the size of the largest chunk
pos, difs = full[0], []
for el in diff:
difs.append(el - pos)
pos = el + 1
difs.append(full[-1]+1 - pos)
# return size of the largest chunk
res = max(difs) if max(difs) != 1 else 0
return res
Now that you have this function, call it on every row...
# join all columns to get a string column
# assuming you have your data in `df`
df['concated'] = df.astype(str).apply(lambda x: ''.join(x), axis=1)
df['consecutive_zeros'] = df.concated.apply(lambda x: len_consec_zeros(x))
Here's one approach -
# Inspired by https://stackoverflow.com/a/44385183/
def pos_neg_counts(mask):
idx = np.flatnonzero(mask[1:] != mask[:-1])
if len(idx)==0: # To handle all 0s or all 1s cases
if mask[0]:
return np.array([mask.size]), np.array([0])
else:
return np.array([0]), np.array([mask.size])
else:
count = np.r_[ [idx[0]+1], idx[1:] - idx[:-1], [mask.size-1-idx[-1]] ]
if mask[0]:
return count[::2], count[1::2] # True, False counts
else:
return count[1::2], count[::2] # True, False counts
def get_consecutive_zeros(df):
arr = df.values
mask = (arr==0) | (arr=='0')
zero_count = np.array([pos_neg_counts(i)[0].max() for i in mask])
zero_count[zero_count<2] = 0
return zero_count
Sample run -
In [272]: df
Out[272]:
DEC JAN FEB MARCH APRIL MAY
0 X X X 1 0 1
1 X X X 1 0 1
2 0 0 1 0 0 1
3 1 0 0 0 1 1
4 0 0 0 0 0 1
5 X 1 1 0 0 0
6 1 0 0 1 0 0
7 0 0 0 0 1 0
In [273]: df['consecutive_zeros'] = get_consecutive_zeros(df)
In [274]: df
Out[274]:
DEC JAN FEB MARCH APRIL MAY consecutive_zeros
0 X X X 1 0 1 0
1 X X X 1 0 1 0
2 0 0 1 0 0 1 2
3 1 0 0 0 1 1 3
4 0 0 0 0 0 1 5
5 X 1 1 0 0 0 3
6 1 0 0 1 0 0 2
7 0 0 0 0 1 0 4
Related
I have the following dataframe:
1 2 3 4 5 6 7 8 9
0 0 0 1 0 0 0 0 0 1
1 0 0 0 0 1 1 0 1 0
2 1 1 0 1 1 0 0 1 1
...
I want to get for each row the longest sequence of value 0 in the row.
so, the expected results for this dataframe will be an array that looks like this:
[5,4,2,...]
as on the first row, maximum sequenc eof value 0 is 5, ect.
I have seen this post and tried for the beginning to get this for the first row (though I would like to do this at once for the whole dataframe) but I got errors:
s=df_day.iloc[0]
(~s).cumsum()[s].value_counts().max()
TypeError: ufunc 'invert' not supported for the input types, and the
inputs could not be safely coerced to any supported types according to
the casting rule ''safe''
when I inserted manually the values like this:
s=pd.Series([0,0,1,0,0,0,0,0,1])
(~s).cumsum()[s].value_counts().max()
>>>7
I got 7 which is number of total 0 in the row but not the max sequence.
However, I don't understand why it raises the error at first, and , more important, I would like to run it on the end on the while dataframe and per row.
My end goal: get the maximum uninterrupted occurance of value 0 in a row.
Vectorized solution for counts consecutive 0 per rows, so for maximal use max of DataFrame c:
#more explain https://stackoverflow.com/a/52718619/2901002
m = df.eq(0)
b = m.cumsum(axis=1)
c = b.sub(b.mask(m).ffill(axis=1).fillna(0)).astype(int)
print (c)
1 2 3 4 5 6 7 8 9
0 1 2 0 1 2 3 4 5 0
1 1 2 3 4 0 0 1 0 1
2 0 0 1 0 0 1 2 0 0
df['max_consecutive_0'] = c.max(axis=1)
print (df)
1 2 3 4 5 6 7 8 9 max_consecutive_0
0 0 0 1 0 0 0 0 0 1 5
1 0 0 0 0 1 1 0 1 0 4
2 1 1 0 1 1 0 0 1 1 2
Use:
df = df.T.apply(lambda x: (x != x.shift()).astype(int).cumsum().where(x.eq(0)).dropna().value_counts().max())
OUTPUT
0 5
1 4
2 2
The following code should do the job.
the function longest_streak will count the number of consecutive zeros and return the max, and you can use apply on your df.
from itertools import groupby
def longest_streak(l):
lst = []
for n,c in groupby(l):
num,count = n,sum(1 for i in c)
if num==0:
lst.append((num,count))
maxx = max([y for x,y in lst])
return(maxx)
df.apply(lambda x: longest_streak(x),axis=1)
How to implement :
t=np.where(<exists at least 1 zero in the same column of t>,t,np.zeros_like(t))
in the "pythonic" way?
this code should set all column to zero in t if t has at least 1 zero in that column
Example :
1 1 1 1 1 1
0 1 1 1 1 1
1 1 0 1 0 1
should turn to
0 1 0 1 0 1
0 1 0 1 0 1
0 1 0 1 0 1
any is what you need
~(arr == 0).any(0, keepdims=True) * arr
0 1 0 1 0 1
0 1 0 1 0 1
0 1 0 1 0 1
this code should set all column to zero in t if t has at least 1 zero
in that column
The simplest way to do this particular task:
t * t.min(0)
A more general way to do it (in case you have an array with different values and the condition is: if a column has at least one occurrence of some_value, then set that column to some_value).
cond = (arr == some_value).any(0)
arr[:, cond] = some_value
I'm trying to create some extra features on a data set. I want to get a spatial context from the features I already have one hot encoded. So for example, I have this:
F1 F2 F3 F4
1 0 1 1 0
2 1 0 1 1
3 1 0 0 0
4 0 0 0 1
I want to create some new columns against the values here:
F1 F2 F3 F4 S1 S2 S3 S4
1 0 1 1 0 0 2 1 0
2 1 0 0 1 1 0 0 3
3 1 0 0 0 1 0 0 0
4 0 0 0 1 0 0 0 4
I'm hoping there is an easy way to do this, to calculate changes from the last value of the column and output that to a corresponding column. Any help is appreciated, thanks.
You could do:
def func(x):
# create result array
result = np.zeros(x.shape, dtype=np.int)
# get indices of array distinct of zero
w = np.argwhere(x).ravel()
# compute the difference between consecutive indices and add the first index + 1
array = np.hstack(([w[0] + 1], np.ediff1d(w)))
# set the values on result
np.put(result, w, array)
return result
columns = ['S{}'.format(i) for i in range(1, 5)]
s = pd.DataFrame(df.ne(0).apply(func, axis=1).values.tolist(),
columns=columns)
result = pd.concat([df, s], axis=1)
print(result)
Output
F1 F2 F3 F4 S1 S2 S3 S4
0 0 1 1 0 0 2 1 0
1 1 0 0 1 1 0 0 3
2 1 0 0 0 1 0 0 0
3 0 0 0 1 0 0 0 4
Note that you need to import numpy (import numpy as np) in order for func to work. The idea is to find the indices distinct of zero compute the difference between to consecutive values, set the first value as the index + 1, and do this for each row.
Whats the best way to do the following in python/pandas please?
I want to count the occurences where trend data 2 steps out of line with trend data 1 and reset the counter each time trend data 1 changes.
I'm struggling with the right way to do it on the dataframe creating a new column df['D'] in this example.
df['A'] = trend data 1
df['B'] = boolean indicator if trend data 1 changes
df['C'] = trend data 2
df['D'] = desired result
df['A'] df['B'] df['C'] df['D']
1 0 1 0
1 0 1 0
-1 1 -1 0
-1 0 -1 0
-1 0 1 1
-1 0 -1 1
-1 0 -1 1
-1 0 1 2
-1 0 1 2
-1 0 -1 2
1 1 1 0
1 0 1 0
1 0 -1 1
1 0 1 1
1 0 -1 2
1 0 1 2
1 0 1 2
in excel i would simply use:
=IF(B2=1,0,IF(AND((C2<>C1),(C2<>A2)),D1+1,D1))
however, i've always struggled in not being able to reference prior cells in pandas.
I can't use np.where(). I'm sure its just apply a function in the correct way but I can't seem to make it work referencing other columns and resetting the variable. I've looked at other answers but can't seem to find anything to work in this situation.
something like
note: create df['E'] = df['C'].shift(1)
def corrections(x):
if df['B'] == 1:
x = 0
elif ((df['C'] != df['E']) AND ( df['C'] != df['A'])):
x = x + 1
else:
x
apologies, as I feel i'm missing something rather simple with this question but just keep going round in circles!
def make_D (df):
counter = 0
array = []
for index in df.index:
if df.loc[index, 'A']!=df.loc[index, 'C']:
counter = counter + 1
if index>0:
if df.loc[index, 'B'] != df.loc[index-1, 'B']:
counter = 0
array.append(counter)
df['D'] = array
return (df)
new_df = make_D(df)
hope it helps!
#Set a list to store values for column D
d = []
#calculate D using the given conditions
df.apply(lambda x: d.append(0) if ((x.name==0)|(x.B==1)) else d.append(d[-1]+1) if (x.C!=df.iloc[x.name-1].C) & (x.C!=x.A) else d.append(d[-1]), axis=1)
#set columns D using values from the list d.
df['D'] = d
Out[594]:
A B C D
0 1 0 1 0
1 1 0 1 0
2 -1 1 -1 0
3 -1 0 -1 0
4 -1 0 1 1
5 -1 0 -1 1
6 -1 0 -1 1
7 -1 0 1 2
8 -1 0 1 2
9 -1 0 -1 2
10 1 1 1 0
11 1 0 1 0
12 1 0 -1 1
13 1 0 1 1
14 1 0 -1 2
15 1 0 1 2
16 1 0 1 2
This question already has an answer here:
Pandas Dataframe, change values on "diagonal" (where index-value is equal to column-name)
(1 answer)
Closed 6 years ago.
I have a dataframe with binary values. I want to change values to zero where row and column name is same. What is the easiest way to do this?
tmp_dict= {'x':[0,0,0,0,0],'y':[1,0,1,0,0],'z':[1,1,1,1,1],'p':[0,1,1,0,1],'q':[0,1,1,1,1]}
mydf= pd.DataFrame(tmp_dict, columns=['x','y','z','p','q'], index=['x','y','z','p','q'])
mydf
x y z p q
x 0 1 1 0 0
y 0 0 1 1 1
z 0 1 1 1 1
p 0 0 1 0 1
q 0 0 1 1 1
Desired dataframe:
x y z p q
x 0 1 1 0 0
y 0 0 1 1 1
z 0 1 0 1 1
p 0 0 1 0 1
q 0 0 1 1 0
You can loop through the columns and set the corresponding row(s) to zero using loc.
for col in mydf:
mydf.loc[col, col] = 0