Cross addition in pandas - python

How can I apply cross addition (OR) to my pandas dataframe, as shown below?
Input:
A B C D
0 0 1 0 1
Output:
A B C D
0 0 1 0 1
1 1 1 1 1
2 0 1 0 1
3 1 1 1 1
So far I can achieve this using:
cols = df.columns
n = len(cols)
df1 = pd.concat([df]*n, ignore_index=True).eq(1)
df2 = pd.concat([df.T]*n, axis=1, ignore_index=True).eq(1)
df2.columns = cols
df2 = df2.reset_index(drop=True)
print((df1 | df2).astype(int))
I think there is a much simpler way to handle this case.

You can use the numpy | operation with broadcasting:
data = df.values
df = pd.DataFrame((data.T | data), columns=df.columns)
Or using np.logical_or:
df = pd.DataFrame(np.logical_or(data, data.T).astype(int), columns=df.columns)
print(df)
A B C D
0 0 1 0 1
1 1 1 1 1
2 0 1 0 1
3 1 1 1 1

Numpy solution:
First extract the first row to a 1d array with iloc, then broadcast with a[:, None], which reshapes it to Mx1:
a = df.iloc[0].values
df = pd.DataFrame(a | a[:, None], columns=df.columns)
print (df)
A B C D
0 0 1 0 1
1 1 1 1 1
2 0 1 0 1
3 1 1 1 1
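For completeness, `np.logical_or.outer` builds the same NxN OR table directly from the 1d row, with no manual reshaping. A minimal sketch, assuming the single-row input from the question:

```python
import numpy as np
import pandas as pd

# Single-row input from the question
df = pd.DataFrame([[0, 1, 0, 1]], columns=list('ABCD'))

# outer() applies logical_or to every pair (a[i], a[j]),
# producing the full NxN cross table in one call
a = df.iloc[0].to_numpy()
out = pd.DataFrame(np.logical_or.outer(a, a).astype(int), columns=df.columns)
print(out)
```

This is equivalent to the `a | a[:, None]` broadcast, just spelled as a ufunc outer product.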

Related

How do you count the common 1's in a pandas dataframe?

I have this data for example:
A B C Class_label
0 1 1 B_C
1 1 1 A_B_C
0 0 1 C
How do you obtain the classified label column, and count and display the common 1's as well, using a pandas dataframe?
Use DataFrame.assign to add new columns: DataFrame.dot with the column names for the labels and sum to count the 1's, with only the numeric columns selected by DataFrame.select_dtypes:
df1 = df.select_dtypes(np.number)
df = df.assign(classifiedlabel = df1.dot(df1.columns + '_').str[:-1],
               countones = df1.sum(axis=1))
print (df)
A B C D classifiedlabel countones
0 0 1 0 1 B_D 2
1 1 1 0 1 A_B_D 3
2 0 0 1 0 C 1
3 0 1 1 0 B_C 2
If the classifiedlabel column already exists, the simplest approach is to sum the numeric columns only:
df["countones"] = df.sum(axis=1, numeric_only=True)
print (df)
A B C D classifiedlabel countones
0 0 1 0 1 B_D 2
1 1 1 0 1 A_B_D 3
2 0 0 1 0 C 1
3 0 1 1 0 B_C 2
If values are 1/0 then you can use:
df.assign(count=df._get_numeric_data().sum(axis=1))
Output:
A B C D classifiedlabel count
0 0 1 0 1 B_D 2
1 1 1 0 1 A_B_D 3
2 0 0 1 0 C 1
3 0 1 1 0 B_C 2
Try:
df["number_of_ones"] = (df == 1).astype(int).sum(axis=1)
print(df)
A B C D classifiedlabel number_of_ones
0 0 1 0 1 B_D 2
1 1 1 0 1 A_B_D 3
2 0 0 1 0 C 1
3 0 1 1 0 B_C 2
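To also count how often each classified label occurs across rows (one reading of "count the common ones"), `value_counts` can be chained on the derived column. A minimal sketch, assuming the A/B/C example data from the question with B_C appearing twice:

```python
import numpy as np
import pandas as pd

# Example data in the question's shape; B_C occurs in two rows
df = pd.DataFrame({'A': [0, 1, 0, 0],
                   'B': [1, 1, 0, 1],
                   'C': [1, 1, 1, 1]})

# Build the label column from the numeric columns, then tally the labels
num = df.select_dtypes(np.number)
df['classifiedlabel'] = num.dot(num.columns + '_').str[:-1]
label_counts = df['classifiedlabel'].value_counts()
print(label_counts)
```

`value_counts` returns the labels sorted by frequency, so the most common combination comes first.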

How to build a three-input OR gate using Pandas

I have a dataframe df with 3 inputs (A, B, C) as listed below:
A B C
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
I want to build a logical OR gate and get output like that shown below:
A B C Output
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 1
How can this be done in pandas?
You just need to evaluate df.A | df.B | df.C.
df['OR_Gate'] = df.A | df.B | df.C
Note: If the values in columns A, B, C are strings of 0's and 1's, then do one of the following:
# Method-1:
# Convert the strings into int and then evaluate OR_Gate:
# This changes the value-types in the columns A, B, C
df = df.astype('int')
df['OR_Gate'] = df.A | df.B | df.C
# Method-2:
# This will not change the original data type in columns A, B, C
# But will correctly evaluate 'OR_Gate'.
df['OR_Gate'] = df.A.astype(int) | df.B.astype(int) | df.C.astype(int)
# Method-3:
# If you want your final output in boolean form.
# Note: convert via int first; for strings, bool('0') is True.
df['OR_Gate'] = (df.A.astype(int) | df.B.astype(int) | df.C.astype(int)).astype(bool)
Detailed Solution
import pandas as pd
# Dummy data
A = [0]*4 + [1]*4
B = [0]*2 + [1]*2 + [0]*2 + [1]*2
C = [0, 1]*4
# Make Dataframe
df = pd.DataFrame({'A': A, 'B': B, 'C': C})
# Update 'OR_Gate' Output
df['OR_Gate'] = df.A | df.B | df.C
df
Output:
A B C OR_Gate
0 0 0 0 0
1 0 0 1 1
2 0 1 0 1
3 0 1 1 1
4 1 0 0 1
5 1 0 1 1
6 1 1 0 1
7 1 1 1 1
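The chained `df.A | df.B | df.C` hard-codes the column names; for an OR gate over an arbitrary set of input columns, a row-wise reduction avoids that. A minimal sketch, assuming the full 3-input truth table:

```python
import itertools
import numpy as np
import pandas as pd

# Full truth table for three binary inputs
df = pd.DataFrame(list(itertools.product([0, 1], repeat=3)),
                  columns=['A', 'B', 'C'])

# bitwise_or.reduce ORs across all columns in one pass,
# with no per-column chaining
df['OR_Gate'] = np.bitwise_or.reduce(df[['A', 'B', 'C']].to_numpy(), axis=1)
print(df)
```

Adding a fourth input column only requires extending the column list, not rewriting the expression.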
You can flag the rows in which any value is nonzero:
df = pd.DataFrame([[0,0,0],[0,0,1], [0,1,0], [0,0,0]], columns=['A', 'B', 'C'])
df['Output'] = df.any(axis=1).astype(int)
df
Out[1]:
A B C Output
0 0 0 0 0
1 0 0 1 1
2 0 1 0 1
3 0 0 0 0

How can I make a dataframe containing only the columns where the count is higher than a specific value?

I have a df that looks like this:
df
a b c d
0 1 0 0 1
1 1 1 0 1
2 0 1 1 1
3 1 0 0 1
I am trying to get a df with only the columns whose count is higher than 2, but can't find the solution for this. It should look like this:
a d
0 1 1
1 1 1
2 0 1
3 1 1
If there are only 1 and 0 values, use DataFrame.loc with boolean indexing; the first : matches all rows:
df = df.loc[:, df.sum() > 2]
print (df)
a d
0 1 1
1 1 1
2 0 1
3 1 1
Detail:
print (df.sum())
a 3
b 2
c 1
d 4
dtype: int64
print (df.sum() > 2)
a True
b False
c False
d True
dtype: bool
If other values are possible and you need to count only the 1's:
df = df.loc[:, df.eq(1).sum() > 2]
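The difference between the two selections matters once non-binary values appear: `df.sum()` adds up magnitudes, while `df.eq(1).sum()` counts occurrences of 1. A minimal sketch with a hypothetical frame containing 2's (an assumption, not the question's data):

```python
import pandas as pd

# Column b sums to 4 but contains no 1's at all
df = pd.DataFrame({'a': [1, 1, 2, 1],
                   'b': [2, 2, 0, 0],
                   'c': [0, 0, 1, 0],
                   'd': [1, 1, 1, 1]})

# Keep columns with more than two 1's; b is dropped despite its sum
kept = df.loc[:, df.eq(1).sum() > 2]
print(kept.columns.tolist())
```

With plain `df.sum() > 2`, column b would have been kept as well.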

Create a categorical column based on different binary columns in python

I have a dataset that looks like this:
df = pd.DataFrame(data= [[0,0,1],[1,0,0],[0,1,0]], columns = ['A','B','C'])
A B C
0 0 0 1
1 1 0 0
2 0 1 0
I want to create a new column where each row holds the name of the column that contains a 1:
A B C value
0 0 0 1 C
1 1 0 0 A
2 0 1 0 B
Use dot:
df['value'] = df.values.dot(df.columns)
Output:
A B C value
0 0 0 1 C
1 1 0 0 A
2 0 1 0 B
Using pd.DataFrame.idxmax:
df['value'] = df.idxmax(1)
print(df)
A B C value
0 0 0 1 C
1 1 0 0 A
2 0 1 0 B
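The two answers behave differently when a row contains more than one 1: `dot` concatenates every matching column name, while `idxmax` returns only the first. A minimal sketch with an extra multi-hot row (an assumption not in the original data):

```python
import pandas as pd

# Last row has 1's in both A and C
df = pd.DataFrame([[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 1]],
                  columns=['A', 'B', 'C'])

dot_value = df.values.dot(df.columns)  # concatenates all matching names
first_value = df.idxmax(axis=1)        # first matching name only
print(list(dot_value), first_value.tolist())
```

So for strictly one-hot data the two are interchangeable; otherwise pick the semantics you want.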

How do I create dummy variables for a subset of a categorical variable?

Example
>>> import pandas as pd
>>> s = pd.Series(list('abca'))
>>> s
0 a
1 b
2 c
3 a
dtype: object
>>> pd.get_dummies(s)
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
Now I would like to map a and b to a dummy variable, but nothing else. How can I do that?
What I tried
>>> pd.get_dummies(s, columns=['a', 'b'])
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
A simpler method is to just mask the resultant df with the cols of interest:
In[16]:
pd.get_dummies(s)[list('ab')]
Out[16]:
a b
0 1 0
1 0 1
2 0 0
3 1 0
So this will sub-select the resultant dummies df with the cols of interest
If you don't want to compute dummy columns for the values you are not interested in in the first place, you could filter the rows of interest beforehand, but this requires reindexing with a fill_value (thanks to @jezrael for the suggestion):
In[20]:
pd.get_dummies(s[s.isin(list('ab'))]).reindex(s.index, fill_value=0)
Out[20]:
a b
0 1 0
1 0 1
2 0 0
3 1 0
Setting everything else to nan is one option:
s[~((s == 'a') | (s == 'b'))] = float('nan')
which yields:
>>> pd.get_dummies(s)
a b
0 1 0
1 0 1
2 0 0
3 1 0
Another way
In [3907]: pd.DataFrame({c:s.eq(c).astype(int) for c in ['a', 'b']})
Out[3907]:
a b
0 1 0
1 0 1
2 0 0
3 1 0
Or, for a single column, (s == c).astype(int).
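Another option is to declare the categories of interest up front with a pandas Categorical, so get_dummies never creates a column for anything else; values outside the listed categories simply produce an all-zero row:

```python
import pandas as pd

s = pd.Series(list('abca'))

# Restrict the categories to 'a' and 'b'; 'c' falls outside them,
# becomes NaN in the categorical, and dummies to a row of zeros
cat = pd.Categorical(s, categories=['a', 'b'])
dummies = pd.get_dummies(cat).astype(int)
print(dummies)
```

This keeps the filtering in one place instead of post-selecting columns from the full dummy frame.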
