I have a dataframe df with 3 inputs (A, B, C), as listed below:
A B C
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
I want to build a logical OR gate and get sample output like that shown below:
A B C Output
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 1
How can this be done in pandas?
You just need to evaluate df.A | df.B | df.C.
df['OR_Gate'] = df.A | df.B | df.C
Note: If the values in columns A, B, C are strings of 0's and 1's, then do one of the following:
# Method-1:
# Convert the strings into int and then evaluate OR_Gate:
# This changes the value-types in the columns A, B, C
df = df.astype('int')
df['OR_Gate'] = df.A | df.B | df.C
# Method-2:
# This will not change the original data type in columns A, B, C
# But will correctly evaluate 'OR_Gate'.
df['OR_Gate'] = df.A.astype(int) | df.B.astype(int) | df.C.astype(int)
# Method-3:
# If you want your final output to be in boolean form.
df['OR_Gate'] = df.A.astype(bool) | df.B.astype(bool) | df.C.astype(bool)
Detailed Solution
import pandas as pd
# Dummy data
A = [0]*4 + [1]*4
B = [0]*2 + [1]*2 + [0]*2 + [1]*2
C = [0, 1]*4
# Make Dataframe
df = pd.DataFrame({'A': A, 'B': B, 'C': C})
# Update 'OR_Gate' Output
df['OR_Gate'] = df.A | df.B | df.C
df
Output:
A B C OR_Gate
0 0 0 0 0
1 0 0 1 1
2 0 1 0 1
3 0 1 1 1
4 1 0 0 1
5 1 0 1 1
6 1 1 0 1
7 1 1 1 1
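If the gate should cover every input column without naming each one, `any(axis=1)` gives the same result. A minimal sketch using the same dummy data as the detailed solution above:

```python
import pandas as pd

# Same dummy data as in the detailed solution
df = pd.DataFrame({'A': [0]*4 + [1]*4,
                   'B': [0, 0, 1, 1]*2,
                   'C': [0, 1]*4})

# any(axis=1) is True when at least one value in the row is truthy,
# i.e. a logical OR across all columns of the row
df['OR_Gate'] = df.any(axis=1).astype(int)
print(df)
```

This scales to any number of input columns without editing the expression.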
You can check whether any value in the row meets the condition.
df = pd.DataFrame([[0,0,0], [0,0,1], [0,1,0], [0,0,0]], columns=['A', 'B', 'C'])
df['Output'] = df.ne(0).any(axis=1).astype(int)
df
Out[1]:
A B C Output
0 0 0 0 0
1 0 0 1 1
2 0 1 0 1
3 0 0 0 0
Related
Let's say we have the following df:
id   A  B  C  D
123  1  1  0  0
456  0  1  1  0
786  1  0  0  0
The id column represents a unique client.
Columns A, B, C, and D represent a product. These columns' values are binary.
1 means the client has that product.
0 means the client doesn't have that product.
I want to create a matrix table of sorts that counts the number of combinations of products that exist for all users.
This would be the desired output, given the df provided above:
   A  B  C  D
A  2  1  0  0
B  0  2  1  0
C  0  1  1  0
D  0  0  1  0
import pandas as pd
df = pd.read_fwf('table.dat', infer_nrows=1001)
cols = ['A', 'B', 'C', 'D']
df2 = df[cols]
df2.T.dot(df2)
Result:
A B C D
A 2 1 0 0
B 1 2 1 0
C 0 1 1 0
D 0 0 0 0
I think you want a dot product:
df2 = df.set_index('id')
out = df2.T.dot(df2)
Output:
A B C D
A 2 1 0 0
B 1 2 1 0
C 0 1 1 0
D 0 0 0 0
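The same dot-product idea as a self-contained sketch; the frame below re-creates the question's table by hand:

```python
import pandas as pd

# Reconstruction of the question's data
df = pd.DataFrame({'id': [123, 456, 786],
                   'A': [1, 0, 1],
                   'B': [1, 1, 0],
                   'C': [0, 1, 0],
                   'D': [0, 0, 0]})

# T.dot gives a products-x-products co-occurrence matrix:
# entry (X, Y) counts the clients holding both product X and product Y
df2 = df.set_index('id')
out = df2.T.dot(df2)
print(out)
```

The diagonal counts how many clients hold each product on its own.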
How can I apply cross addition (OR) to my pandas dataframe, as below?
Input:
A B C D
0 0 1 0 1
Output:
A B C D
0 0 1 0 1
1 1 1 1 1
2 0 1 0 1
3 1 1 1 1
So far I can achieve it using this:
cols = df.columns
n = len(cols)
df1 = pd.concat([df]*n, ignore_index=True).eq(1)
df2 = pd.concat([df.T]*n, axis=1, ignore_index=True).eq(1)
df2.columns = cols
df2 = df2.reset_index(drop=True)
print((df1 | df2).astype(int))
I think there must be a much simpler way to handle this case.
You can use NumPy's | operator with broadcasting:
data = df.values
df = pd.DataFrame(data.T | data, columns=df.columns)
Or use np.logical_or:
import numpy as np
df = pd.DataFrame(np.logical_or(data, data.T).astype(int), columns=df.columns)
print(df)
A B C D
0 0 1 0 1
1 1 1 1 1
2 0 1 0 1
3 1 1 1 1
Numpy solution:
First extract the first row to a 1d array with iloc, then broadcast with a[:, None] to change its shape to Mx1:
a = df.iloc[0].values
df = pd.DataFrame(a | a[:, None], columns=df.columns)
print (df)
A B C D
0 0 1 0 1
1 1 1 1 1
2 0 1 0 1
3 1 1 1 1
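Putting the broadcasting approach together as one runnable sketch:

```python
import pandas as pd

# Single-row input from the question
df = pd.DataFrame([[0, 1, 0, 1]], columns=list('ABCD'))

# a has shape (4,); a[:, None] has shape (4, 1).
# OR-ing the two broadcasts to a (4, 4) grid: out[i, j] = a[i] | a[j]
a = df.iloc[0].to_numpy()
out = pd.DataFrame(a | a[:, None], columns=df.columns)
print(out)
```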
I have a dataset that looks like this:
df = pd.DataFrame(data= [[0,0,1],[1,0,0],[0,1,0]], columns = ['A','B','C'])
A B C
0 0 0 1
1 1 0 0
2 0 1 0
I want to create a new column containing, for each row, the name of the column that holds a 1:
A B C value
0 0 0 1 C
1 1 0 0 A
2 0 1 0 B
Use dot:
df['value'] = df.values.dot(df.columns)
Output:
A B C value
0 0 0 1 C
1 1 0 0 A
2 0 1 0 B
Using pd.DataFrame.idxmax:
df['value'] = df.idxmax(1)
print(df)
A B C value
0 0 0 1 C
1 1 0 0 A
2 0 1 0 B
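A caveat worth noting: the two answers differ when a row holds more than one 1. A hypothetical example:

```python
import pandas as pd

# Hypothetical row with two 1s, to show how the approaches diverge
df = pd.DataFrame([[1, 0, 1]], columns=['A', 'B', 'C'])

# dot concatenates every matching column name (1*'A' + 0*'B' + 1*'C')...
print(df.values.dot(df.columns))   # ['AC']
# ...while idxmax keeps only the first match
print(df.idxmax(axis=1).tolist())  # ['A']
```

So if the data is guaranteed one-hot, both work; otherwise pick the behavior you want.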
Example
>>> import pandas as pd
>>> s = pd.Series(list('abca'))
>>> s
0 a
1 b
2 c
3 a
dtype: object
>>> pd.get_dummies(s)
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
Now I would like to map a and b to a dummy variable, but nothing else. How can I do that?
What I tried
>>> pd.get_dummies(s, columns=['a', 'b'])
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
A simpler method is to just mask the resultant df with the cols of interest:
In[16]:
pd.get_dummies(s)[list('ab')]
Out[16]:
a b
0 1 0
1 0 1
2 0 0
3 1 0
So this will sub-select the resultant dummies df with the cols of interest
If you don't want to compute the dummy columns for values you're not interested in in the first place, you can filter s down to the rows of interest, but this requires reindexing with a fill_value (thanks to @jezrael for the suggestion):
In[20]:
pd.get_dummies(s[s.isin(list('ab'))]).reindex(s.index, fill_value=0)
Out[20]:
a b
0 1 0
1 0 1
2 0 0
3 1 0
Setting everything else to NaN is one option:
s[~((s == 'a') | (s == 'b'))] = float('nan')
which yields:
>>> pd.get_dummies(s)
a b
0 1 0
1 0 1
2 0 0
3 1 0
Another way
In [3907]: pd.DataFrame({c:s.eq(c).astype(int) for c in ['a', 'b']})
Out[3907]:
a b
0 1 0
1 0 1
2 0 0
3 1 0
Or, (s==c).astype(int)
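A related variant, sketched below: `reindex` on the columns axis both sub-selects the dummies and tolerates a requested label that never appears in the data (fill_value supplies the missing column):

```python
import pandas as pd

s = pd.Series(list('abca'))

# reindex(columns=...) keeps only the labels of interest; fill_value
# guards against a label with no occurrences in s
dummies = (pd.get_dummies(s)
             .reindex(columns=['a', 'b'], fill_value=0)
             .astype(int))
print(dummies)
```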
I have a column in a DataFrame (which is a column in a csv) which contains comma-separated values. I'd like to split this column into multiple columns.
The problem is an old one, and has been discussed here also, but there is one peculiarity: an entry may contain 0 to n comma-separated values. An example:
df.head():
i: vals | sth_else
---------------------
1: a,b,c | ba
2: a,d | be
3: | bi
4: e,a,c | bo
5: e | bu
I'd like the following output (or similar, e.g. True/False):
i : a | b | c | d | e | sth_else
-----------------------------------
1: 1 | 1 | 1 | 0 | 0 | ba
2: 1 | 0 | 0 | 1 | 0 | be
3: 0 | 0 | 0 | 0 | 0 | bi
4: 1 | 0 | 1 | 0 | 1 | bo
5: 0 | 0 | 0 | 0 | 1 | bu
I'm currently experimenting with the Series.str.split and Series.to_dict functions, but without any satisfactory results (it always raises ValueError: arrays must all be same length). :)
Also, I always try to find elegant solutions which are easily understandable when looked at after a couple of months ;). In any case, propositions are highly appreciated!
Here is the dummy.csv for testing.
vals;sth_else
a,b,c;ba
a,d;be
;bi
e,a,c;bo
e;bu
import pandas as pd
from StringIO import StringIO  # py2.7 used here
# on py3.x use: from io import StringIO
# data
# ==================================================================
csv_buffer = 'vals;sth_else\na,b,c;ba\na,d;be\n;bi\ne,a,c;bo\ne;bu'
df = pd.read_csv(StringIO(csv_buffer), sep=';')
Out[58]:
vals sth_else
0 a,b,c ba
1 a,d be
2 NaN bi
3 e,a,c bo
4 e bu
# processing
# ==================================================================
def func(group):
    return pd.Series(group.vals.str.split(',').values[0], name='vals')
ser = df.groupby(level=0).apply(func)
Out[60]:
0 0 a
1 b
2 c
1 0 a
1 d
2 0 NaN
3 0 e
1 a
2 c
4 0 e
Name: vals, dtype: object
# use get_dummies, and then aggregate for each column of a b c d e to be its max (max is always 1 in this case)
pd.get_dummies(ser)
Out[85]:
a b c d e
0 0 1 0 0 0 0
1 0 1 0 0 0
2 0 0 1 0 0
1 0 1 0 0 0 0
1 0 0 0 1 0
2 0 0 0 0 0 0
3 0 0 0 0 0 1
1 1 0 0 0 0
2 0 0 1 0 0
4 0 0 0 0 0 1
# do this groupby on outer index level [0,1,2,3,4] and reduce any inner group from multiple rows to one row
df_dummies = pd.get_dummies(ser).groupby(level=0).apply(lambda group: group.max())
Out[64]:
a b c d e
0 1 1 1 0 0
1 1 0 0 1 0
2 0 0 0 0 0
3 1 0 1 0 1
4 0 0 0 0 1
df_dummies['sth_else'] = df.sth_else
Out[67]:
a b c d e sth_else
0 1 1 1 0 0 ba
1 1 0 0 1 0 be
2 0 0 0 0 0 bi
3 1 0 1 0 1 bo
4 0 0 0 0 1 bu
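For completeness, `Series.str.get_dummies` does the split-and-encode in one step; a sketch on the same buffer:

```python
import pandas as pd
from io import StringIO

# Same data as above
csv_buffer = 'vals;sth_else\na,b,c;ba\na,d;be\n;bi\ne,a,c;bo\ne;bu'
df = pd.read_csv(StringIO(csv_buffer), sep=';')

# str.get_dummies splits each entry on the separator and one-hot
# encodes it in one call; the NaN row comes out as all zeros
out = df.vals.str.get_dummies(sep=',').join(df.sth_else)
print(out)
```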
This is very similar to another question today. As I said in that question, there may be a simple elegant pandas way to do this, but I also find it convenient to simply create a new data frame and populate it by iterating over the original one in the following fashion:
# import and create your data
import pandas as pd
DF = pd.DataFrame({'vals': ['a,b,c', 'a,d', '', 'e,a,c', 'e'],
                   'other': ['ba', 'be', 'bi', 'bo', 'bu']},
                  dtype=str)
Now create a new data frame with the other column from the DF as the index, and columns drawn from the unique characters found in the vals column of the DF:
New_DF = pd.DataFrame({col: 0 for col in
                       set(letter for letter in ''.join(DF.vals.values)
                           if letter.isalpha())},
                      index=DF.other)
In [51]: New_DF
Out[51]:
a b c d e
other
ba 0 0 0 0 0
be 0 0 0 0 0
bi 0 0 0 0 0
bo 0 0 0 0 0
bu 0 0 0 0 0
Now simply iterate over the index of New_DF, slice the original DF at each value, and check each column to see whether it appears in the relevant string:
for ind in New_DF.index:
    relevant_string = str(DF[DF.other == ind].vals.values)
    for col in list(New_DF.columns):
        if col in relevant_string:
            New_DF.loc[ind, col] += 1
Output looks like this:
In [54]: New_DF
Out[54]:
a b c d e
other
ba 1 1 1 0 0
be 1 0 0 1 0
bi 0 0 0 0 0
bo 1 0 1 0 1
bu 0 0 0 0 1