Counting number of zeros per row in a Pandas DataFrame? - python

Given a DataFrame, I would like to compute the number of zeros in each row. How can I compute it with Pandas?
This is what I've done so far; it returns a boolean mask of the zeros rather than a count:
def is_blank(x):
    return x == 0

indexer = train_df.applymap(is_blank)

Use a boolean comparison, which will produce a boolean df; we can then cast this to int (True becomes 1, False becomes 0) and call sum with axis=1 to sum row-wise:
In [56]:
df = pd.DataFrame({'a':[1,0,0,1,3], 'b':[0,0,1,0,1], 'c':[0,0,0,0,0]})
df
Out[56]:
a b c
0 1 0 0
1 0 0 0
2 0 1 0
3 1 0 0
4 3 1 0
In [64]:
(df == 0).astype(int).sum(axis=1)
Out[64]:
0 2
1 3
2 2
3 2
4 1
dtype: int64
Breaking the above down:
In [65]:
(df == 0)
Out[65]:
a b c
0 False True True
1 True True True
2 True False True
3 False True True
4 False False True
In [66]:
(df == 0).astype(int)
Out[66]:
a b c
0 0 1 1
1 1 1 1
2 1 0 1
3 0 1 1
4 0 0 1
EDIT
As pointed out by david, the astype to int is unnecessary, as the boolean values will be upcast to int when calling sum, so this simplifies to:
(df == 0).sum(axis=1)
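Putting the simplified version together as a self-contained snippet (same example frame as above; the variable name is just for illustration):

```python
import pandas as pd

# The example frame from the answer
df = pd.DataFrame({'a': [1, 0, 0, 1, 3], 'b': [0, 0, 1, 0, 1], 'c': [0, 0, 0, 0, 0]})
# Boolean comparison, then row-wise sum: True counts as 1
zeros_per_row = (df == 0).sum(axis=1)
print(zeros_per_row.tolist())  # [2, 3, 2, 2, 1]
```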

You can count the zeros per row using the following python pandas function.
It may help someone who needs to count a particular value in each row
df.isin([0]).sum(axis=1)
Here df is the DataFrame, and 0 is the value we want to count.
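The isin pattern also generalizes to counting several values at once; a sketch using the example frame from the accepted answer, counting how many cells per row are either 0 or 3:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0, 1, 3], 'b': [0, 0, 1, 0, 1], 'c': [0, 0, 0, 0, 0]})
# isin accepts a list, so one pass counts membership in a whole set of values
hits = df.isin([0, 3]).sum(axis=1)
print(hits.tolist())  # [2, 3, 2, 2, 2]
```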

Here is another solution using apply() and value_counts().
df = pd.DataFrame({'a':[1,0,0,1,3], 'b':[0,0,1,0,1], 'c':[0,0,0,0,0]})
df.apply(lambda s: s.value_counts().get(key=0, default=0), axis=1)
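A self-contained version of this approach; it gives the same counts as the vectorized comparison, though applying value_counts row by row is typically slower on large frames:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0, 1, 3], 'b': [0, 0, 1, 0, 1], 'c': [0, 0, 0, 0, 0]})
# value_counts per row; .get(0, 0) falls back to 0 when a row has no zeros
per_row = df.apply(lambda s: s.value_counts().get(0, 0), axis=1)
print(per_row.tolist())  # [2, 3, 2, 2, 1]
```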

Given the following DataFrame df:
df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 0, 1, 0, 0, 0],
                   'B': [0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
                   'C': [1, 1, 1, 0, 0, 1, 0, 0, 0, 0],
                   'D': [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
                   'E': [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]})
[Out]:
A B C D E
0 1 0 1 0 0
1 1 0 1 0 0
2 1 0 1 0 1
3 1 0 0 0 0
4 1 1 0 0 1
5 0 0 1 0 0
6 1 0 0 1 0
7 0 0 0 0 1
8 0 0 0 1 0
9 0 1 0 0 1
Apart from the various answers mentioned before, if the requirement is to use only Pandas, another option is pandas.DataFrame.eq:
df['Zero_Count'] = df.eq(0).sum(axis=1)
[Out]:
A B C D E Zero_Count
0 1 0 1 0 0 3
1 1 0 1 0 0 3
2 1 0 1 0 1 2
3 1 0 0 0 0 4
4 1 1 0 0 1 2
5 0 0 1 0 0 4
6 1 0 0 1 0 3
7 0 0 0 0 1 4
8 0 0 0 1 0 4
9 0 1 0 0 1 3
However, one can also do it with numpy using numpy.sum
import numpy as np
df['Zero_Count'] = np.sum(df == 0, axis=1)
[Out]:
A B C D E Zero_Count
0 1 0 1 0 0 3
1 1 0 1 0 0 3
2 1 0 1 0 1 2
3 1 0 0 0 0 4
4 1 1 0 0 1 2
5 0 0 1 0 0 4
6 1 0 0 1 0 3
7 0 0 0 0 1 4
8 0 0 0 1 0 4
9 0 1 0 0 1 3
Or even using numpy.count_nonzero as follows
df['Zero_Count'] = np.count_nonzero(df == 0, axis=1)
[Out]:
A B C D E Zero_Count
0 1 0 1 0 0 3
1 1 0 1 0 0 3
2 1 0 1 0 1 2
3 1 0 0 0 0 4
4 1 1 0 0 1 2
5 0 0 1 0 0 4
6 1 0 0 1 0 3
7 0 0 0 0 1 4
8 0 0 0 1 0 4
9 0 1 0 0 1 3
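All three variants produce the same per-row counts; a quick check on the example frame (np.count_nonzero returns a plain NumPy array rather than a Series):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 0, 1, 0, 0, 0],
                   'B': [0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
                   'C': [1, 1, 1, 0, 0, 1, 0, 0, 0, 0],
                   'D': [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
                   'E': [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]})
eq_counts = df.eq(0).sum(axis=1)              # pandas-only
np_sum_counts = np.sum(df == 0, axis=1)       # dispatches to DataFrame.sum
nz_counts = np.count_nonzero(df == 0, axis=1) # plain ndarray result
print(eq_counts.tolist())  # [3, 3, 2, 4, 2, 4, 3, 4, 4, 3]
```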

Related

How to change previous and current row value to 1 based on current value at each index

Wherever a number x appears in a column, I want to replace it with 1 and also set the previous x-1 rows of that column to 1.
df = {1: [1,0,0,0,0], 2: [0,2,0,0,0], 3: [0,0,3,0,0], 4: [0,0,0,0,0], 5: [0,0,0,0,5]}
randomdf = pd.DataFrame(df)
randomdf
PS. There are many such 2's and 3's in those respective columns; I have shared a sample df for clarity.
Input:-
Here x can be 1, 2, 3, 4 or 5.
So if x = 5, I want to replace 5 with 1 and also insert 1 in the previous x-1 rows, i.e. the previous 4 rows, as shown in the required output.
1 2 3 4 5
0 1 0 0 0 0
1 0 2 0 0 0
2 0 0 3 0 0
3 0 0 0 0 0
4 0 0 0 0 5
required output:-
1 2 3 4 5
0 1 1 1 0 1
1 0 1 1 0 1
2 0 0 1 0 1
3 0 0 0 0 1
4 0 0 0 0 1
First change all non-zero values to 1, then use replace with a backfill:
df = {1: [1, 0, 0, 0, 0], 2: [0, 2, 0, 0, 0], 3: [0, 0, 3, 0, 0],
      4: [0, 0, 0, 0, 0], 5: [0, 0, 0, 0, 5]}
df = pd.DataFrame(df)
df[df != 0] = 1
df = df.replace(to_replace=0, method='bfill')
1 2 3 4 5
0 1 1 1 0 1
1 0 1 1 0 1
2 0 0 1 0 1
3 0 0 0 0 1
4 0 0 0 0 1
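Note that in newer pandas versions the method= argument of replace is deprecated; an equivalent backfill can be written with mask and bfill. A sketch, assuming the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({1: [1, 0, 0, 0, 0], 2: [0, 2, 0, 0, 0], 3: [0, 0, 3, 0, 0],
                   4: [0, 0, 0, 0, 0], 5: [0, 0, 0, 0, 5]})
df[df != 0] = 1
# Turn zeros into NaN, backfill each 1 over the rows above it,
# then restore the remaining NaNs (below the last 1) to 0
out = df.mask(df.eq(0)).bfill().fillna(0).astype(int)
print(out[5].tolist())  # [1, 1, 1, 1, 1]
```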

Pandas resetting cumsum() based on a condition of another column

I have a column called 'on' with a series of 0 and 1:
d1 = {'on': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]}
df = pd.DataFrame(d1)
I want to create a new column called 'value' that counts cumulatively while the 'on' column is 1 and restarts from zero whenever the 'on' column shows 0.
I tried a combination of cumsum() and np.where, but I don't get what I want:
df['value_try'] = df['on'].cumsum()
df['value_try'] = np.where(df['on'] == 0, 0, df['value_try'])
Attempt:
on value_try
0 0 0
1 0 0
2 0 0
3 1 1
4 1 2
5 1 3
6 0 0
7 0 0
8 1 4
9 1 5
10 0 0
What my desired output would be:
on value
0 0 0
1 0 0
2 0 0
3 1 1
4 1 2
5 1 3
6 0 0
7 0 0
8 1 1
9 1 2
10 0 0
You can define groups of consecutive 0s or 1s by checking whether the value of on equals that of the previous row via .shift(), then numbering the groups with Series.cumsum(). Then use GroupBy.cumsum() to get the running count within each group.
g = df['on'].ne(df['on'].shift()).cumsum()
df['value'] = df.groupby(g)['on'].cumsum()
Result:
print(df)
on value
0 0 0
1 0 0
2 0 0
3 1 1
4 1 2
5 1 3
6 0 0
7 0 0
8 1 1
9 1 2
10 0 0
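The grouping trick above as a self-contained sketch:

```python
import pandas as pd

df = pd.DataFrame({'on': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]})
# Group id increments whenever 'on' changes value relative to the previous row
g = df['on'].ne(df['on'].shift()).cumsum()
# Cumulative sum of 'on' within each run: zeros stay 0, ones count up
df['value'] = df.groupby(g)['on'].cumsum()
print(df['value'].tolist())  # [0, 0, 0, 1, 2, 3, 0, 0, 1, 2, 0]
```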
Let us try cumcount + cumsum
df['out'] = df.groupby(df['on'].eq(0).cumsum()).cumcount()
Out[18]:
0 0
1 0
2 0
3 1
4 2
5 3
6 0
7 0
8 1
9 2
10 0
dtype: int64
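The cumcount variant can be checked the same way; a self-contained sketch:

```python
import pandas as pd

df = pd.DataFrame({'on': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]})
# Each 0 starts a new group; cumcount then numbers the 1s that follow it
out = df.groupby(df['on'].eq(0).cumsum()).cumcount()
print(out.tolist())  # [0, 0, 0, 1, 2, 3, 0, 0, 1, 2, 0]
```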

Add a column according to other columns in a matrix. [python]

I got a matrix of the form:
1.0 2.0 3.0 4.0
1 0 0 0 1
2 0 0 1 0
3 1 0 0 0
4 0 1 0 0
5 1 0 0 0
6 0 0 0 0
7 1 0 0 0
I want to add another column to the matrix whose value is 1 only if every other value in the row is 0, and 0 otherwise. So visually I want this:
1.0 2.0 3.0 4.0 5.0
1 0 0 0 1 0
2 0 0 1 0 0
3 1 0 0 0 0
4 0 1 0 0 0
5 1 0 0 0 0
6 0 0 0 0 1
7 1 0 0 0 0
Let's try something different: we can take the sum across axis 1, convert it with np.sign, then subtract that result from 1, which converts 0 to 1 and 1 to 0.
df['5.0'] = 1 - np.sign(df.sum(axis=1))
Or with df.any(axis=1)
df['5.0'] = 1 - df.any(axis=1)
print(df)
1.0 2.0 3.0 4.0 5.0
1 0 0 0 1 0
2 0 0 1 0 0
3 1 0 0 0 0
4 0 1 0 0 0
5 1 0 0 0 0
6 0 0 0 0 1
7 1 0 0 0 0
If each row can have at most one 1, you can simply do:
df['5.0'] = 1 - df.sum(axis=1)
This also does the job:
df['05'] = (df.sum(axis=1) == 0).astype(int)
Use df.apply to create a new series, then assign it like this:
df[5.0] = df.apply(lambda row: 1 if all(i == 0 for i in row) else 0, axis=1)
You can convert the matrix to a DataFrame:
matrixA = {}
matrixA['1'] = [0, 0, 0, 1]
matrixA['2'] = [0, 1, 0, 0]
matrixA['3'] = [0, 0, 1, 1]
matrixA['4'] = [0, 1, 1, 1]
df = pd.DataFrame(matrixA)
after that, apply a lambda function
df['5'] = df.apply(lambda x: get_sum_of_1(list(x)), axis=1).reset_index(drop=True).copy()
and the row function will be:
def get_sum_of_1(row):
    # 1 when the row contains no 1s, 0 otherwise
    return 1 if row.count(1) == 0 else 0
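Of the answers above, the sum comparison is perhaps the simplest; a self-contained sketch on the question's matrix (the string column names are an assumption here):

```python
import pandas as pd

df = pd.DataFrame({'1.0': [0, 0, 1, 0, 1, 0, 1],
                   '2.0': [0, 0, 0, 1, 0, 0, 0],
                   '3.0': [0, 1, 0, 0, 0, 0, 0],
                   '4.0': [1, 0, 0, 0, 0, 0, 0]})
# Flag rows that are entirely zero
df['5.0'] = (df.sum(axis=1) == 0).astype(int)
print(df['5.0'].tolist())  # [0, 0, 0, 0, 0, 1, 0]
```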

How can I count the unique values in a Pandas Dataframe?

I have a pandas DataFrame that looks like Y =
0 1 2 3
0 1 1 0 0
1 0 0 0 0
2 1 1 1 0
3 1 1 0 0
4 1 1 0 0
5 1 1 0 0
6 1 0 0 0
7 1 1 1 0
8 1 0 0 0
... .. .. .. ..
14989 1 1 1 1
14990 1 1 1 0
14991 1 1 1 1
14992 1 1 1 0
[14993 rows x 4 columns]
There are a total of 5 unique values:
1 1 0 0
0 0 0 0
1 1 1 0
1 0 0 0
1 1 1 1
For each unique value, I want to count how many times it's in the Y DataFrame
Let us use np.unique
c, v = np.unique(df.values, axis=0, return_counts=True)
c
array([[0, 0, 0, 0],
[1, 0, 0, 0],
[1, 1, 0, 0],
[1, 1, 1, 0]], dtype=int64)
v
array([1, 2, 4, 2], dtype=int64)
We can use .groupby for this to get the unique combinations.
While applying groupby, we calculate the size of the aggregation.
# Groupby on all columns which aggregates the data
df_group = df.groupby(list(df.columns)).size().reset_index()
# Because we used reset_index we need to rename our count column
df_group.rename({0:'count'}, inplace=True, axis=1)
Output
0 1 2 3 count
0 0 0 0 0 1
1 1 0 0 0 2
2 1 1 0 0 4
3 1 1 1 0 4
4 1 1 1 1 2
Note
I copied the example dataframe you provided.
Which looks like this:
print(df)
0 1 2 3
0 1 1 0 0
1 0 0 0 0
2 1 1 1 0
3 1 1 0 0
4 1 1 0 0
5 1 1 0 0
6 1 0 0 0
7 1 1 1 0
8 1 0 0 0
14989 1 1 1 1
14990 1 1 1 0
14991 1 1 1 1
14992 1 1 1 0
I made a sample for you.
import itertools
import random

iter_list = list(itertools.product([0, 1], [0, 1], [0, 1], [0, 1]))
sum_list = []
for i in range(1000):
    sum_list.append(random.choice(iter_list))
target_df = pd.DataFrame(sum_list)
target_df.reset_index().groupby(list(target_df.columns)).count().rename(columns={'index': 'count'}).reset_index()
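The np.unique and groupby approaches agree on the counts; a quick check on the first nine rows of the question's frame (both sort the unique rows lexicographically, groupby by default):

```python
import numpy as np
import pandas as pd

# First nine rows of the question's Y
df = pd.DataFrame([[1, 1, 0, 0], [0, 0, 0, 0], [1, 1, 1, 0],
                   [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],
                   [1, 0, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]])
rows, counts = np.unique(df.values, axis=0, return_counts=True)
sizes = df.groupby(list(df.columns)).size()
print(counts.tolist())  # [1, 2, 4, 2]
```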

How to get DataFrame indices for multiple rows configurations using Pandas in Python?

Consider the DataFrame P1 and P2:
P1 =
A B
0 0 0
1 0 1
2 1 0
3 1 1
P2 =
A B C
0 0 0 0
1 0 0 1
2 0 1 0
3 0 1 1
4 1 0 0
5 1 0 1
6 1 1 0
7 1 1 1
I would like to know if there is a concise and efficient way of getting the indices in P1 for the rows (tuples/configurations/assignments) of columns ['A','B'] in P2.
That is, given P2[['A','B']]:
P2[['A','B']] =
A B
0 0 0
1 0 0
2 0 1
3 0 1
4 1 0
5 1 0
6 1 1
7 1 1
I would like to get [0, 0, 1, 1, 2, 2, 3, 3], since the first and second rows in P2[['A','B']] correspond to the first row in P1, and so on.
You could use merge and extract the overlapping keys
In [3]: tmp = p2[['A', 'B']].merge(p1.reset_index())
In [4]: tmp
Out[4]:
A B index
0 0 0 0
1 0 0 0
2 0 1 1
3 0 1 1
4 1 0 2
5 1 0 2
6 1 1 3
7 1 1 3
Get the values.
In [5]: tmp['index'].values
Out[5]: array([0, 0, 1, 1, 2, 2, 3, 3], dtype=int64)
However, there could be a native NumPy method to do this as well.
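If P2's row order must be preserved regardless of the key values, a left merge makes that explicit; a sketch of the same idea:

```python
import pandas as pd

p1 = pd.DataFrame({'A': [0, 0, 1, 1], 'B': [0, 1, 0, 1]})
p2 = pd.DataFrame({'A': [0, 0, 0, 0, 1, 1, 1, 1],
                   'B': [0, 0, 1, 1, 0, 0, 1, 1],
                   'C': [0, 1, 0, 1, 0, 1, 0, 1]})
# reset_index exposes p1's index as a column; how='left' keeps p2's row order
idx = p2[['A', 'B']].merge(p1.reset_index(), how='left')['index']
print(idx.tolist())  # [0, 0, 1, 1, 2, 2, 3, 3]
```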
