Python equivalent to dplyr's ifelse - python

I'm converting code from R to Python and am looking for help with mutating a new column based on other columns, using dfply syntax/piping.
In this example, I want to subtract 2 from col1 if col2 is 'c', and otherwise add 4:
import pandas as pd
import numpy as np
from dfply import *
col1 = [1,2,3,4,5]
col2 = ['a', 'b', 'c', 'd', 'e']
df = pd.DataFrame(data = {'col1': col1, 'col2': col2})
In R I would do:
df_new <- df %>%
mutate(newCol = ifelse(col2 == 'c', col1 - 2, col1 + 4))
but Python doesn't seem to like this:
new_df = (df >>
mutate(newCol = np.where(X.col2 == 'c', X.col1 - 2, X.col1 + 4)))
I get an error of "invalid __array_struct__"
Note that this works fine:
new_df = (df >>
mutate(newCol = X.col1 - 2))

I would use apply with a lambda function. Here X is a single row of the DataFrame, and axis=1 means the lambda is applied to each row (that is, across the columns).
df['newCol'] = df.apply(lambda X: X.col1 - 2 if X.col2 == 'c' else X.col1 + 4, axis=1)
df
col1 col2 newCol
0 1 a 5
1 2 b 6
2 3 c 1
3 4 d 8
4 5 e 9
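For larger frames, the same result can probably be had without a row-wise apply. The following is a minimal sketch in plain pandas (not dfply), assuming np.where is given ordinary Series rather than dfply's X placeholders:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': ['a', 'b', 'c', 'd', 'e']})

# np.where evaluates the condition element-wise over the whole column
new_df = df.assign(newCol=np.where(df['col2'] == 'c',
                                   df['col1'] - 2,
                                   df['col1'] + 4))
print(new_df)
```

Whether this can be wired into a dfply pipe is not covered here; the sketch only shows the vectorized condition itself.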

The Python equivalent here would be an inline if-else expression (a conditional expression, or ternary operator). Note that it operates on single values, not on whole columns:
ifelse(col2 == 'c', col1 - 2, col1 + 4)
Would then become
col1 - 2 if col2 == 'c' else col1 + 4
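Since a conditional expression works on one value at a time, applying it to whole columns needs some kind of loop. A rough sketch, assuming the columns from the question and using a list comprehension over the zipped column values:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': ['a', 'b', 'c', 'd', 'e']})

# The conditional expression is evaluated once per (col1, col2) pair
df['newCol'] = [c1 - 2 if c2 == 'c' else c1 + 4
                for c1, c2 in zip(df['col1'], df['col2'])]
print(df)
```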

Related

How can I update values of a column in a dataframe based on its existence in another dataframe

import pandas as pd
import numpy as np
from numpy.random import randint
from IPython.display import display, HTML
dict_1 = {'col1':range(0,21),'col3':0}
dict_2 = {'col2':range(0,41,4)}
df = pd.DataFrame(dict_1)
df_2 = pd.DataFrame(dict_2)
So the goal is to compare all the values in df['col1'] against df_2['col2'], and for each row whose value appears in df_2['col2'], update col3 to 1 (or some value other than what it is now).
In this case I am looking for something of this sort:
df = pd.DataFrame({'col1':[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],'col3':[1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1]})
One option is isin + np.where
df['col3'] = np.where(df['col1'].isin(df_2['col2']), 1, 0)
df.head():
col1 col3
0 0 1
1 1 0
2 2 0
3 3 0
4 4 1
Edit to include col4: use a left merge, then use np.where to set col3 based on whether col4 (the values from df_2) was matched. Because fillna('') has already replaced the missing values with empty strings, the check compares against '' rather than calling isna():
Given
df = pd.DataFrame({'col1': range(0, 21), 'col3': 0})
df_2 = pd.DataFrame({'col2': [0, 4, 8, 12, 16],
                     'col4': ['a', 'b', 'c', 'd', 'e']})
df = df.merge(
    df_2, left_on='col1', right_on='col2', how='left'
).drop(columns='col2').fillna('')
df['col3'] = np.where(df['col4'].eq(''), 0, 1)
df.head():
col1 col3 col4
0 0 1 a
1 1 0
2 2 0
3 3 0
4 4 1 b
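Another way to tell matched rows from unmatched ones is the indicator option of merge, which avoids relying on missing values in col4. This is only a sketch under the same sample data; the intermediate name merged is an assumption for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': range(0, 21), 'col3': 0})
df_2 = pd.DataFrame({'col2': [0, 4, 8, 12, 16],
                     'col4': ['a', 'b', 'c', 'd', 'e']})

# indicator=True adds a '_merge' column: 'both' for matched rows,
# 'left_only' for rows of df that have no partner in df_2
merged = df.merge(df_2, left_on='col1', right_on='col2',
                  how='left', indicator=True)
merged['col3'] = np.where(merged['_merge'] == 'both', 1, 0)
merged = merged.drop(columns=['col2', '_merge'])
print(merged.head())
```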
You can do it directly with pandas, using the df and df_2 frames from the question:
df['col3'] = df['col1'].isin(df_2['col2']).astype(int)

Replace a cell's value in pandas to a tuple

This might be a simple question but I tried searching for the answer and I can't seem to find it.
I have a pandas dataframe and I want to change the value of some cells to a tuple.
So if I have this:
Col0 Col1 Col2
3 a 6
7 b 8
I would like to change all the "a" values to a tuple:
Col0 Col1 Col2
3 (4,5) 6
7 b 8
I tried doing this:
df.loc[df["Col1"] == "a"] = (4,5,)
but it obviously didn't work. I don't know how I can do this.
How can I do this?
Let's try boolean indexing with loc to update the cells in Col1 that contain 'a':
m = df['Col1'].eq('a')
df.loc[m, 'Col1'] = pd.Series([(4, 5)]*m.sum(), index=m[m].index)
Alternatively you can try .reindex with fill_value parameter set to tuple (4, 5):
m = df['Col1'].eq('a')
df['Col1'] = df.loc[~m, 'Col1'].reindex(m.index, fill_value=(4, 5))
Col0 Col1 Col2
0 3 (4, 5) 6
1 7 b 8
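A third possibility, sketched here on the sample frame from the question, is to rebuild the column value by value with Series.map, so that each tuple is produced as a single Python object rather than being broadcast across cells:

```python
import pandas as pd

df = pd.DataFrame({'Col0': [3, 7], 'Col1': ['a', 'b'], 'Col2': [6, 8]})

# Map every 'a' to the tuple (4, 5) and leave all other values untouched
df['Col1'] = df['Col1'].map(lambda v: (4, 5) if v == 'a' else v)
print(df)
```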
Maybe something like this? (DataFrame.set_value was removed in pandas 1.0; df.at is the current way to set a single cell.)
import pandas as pd
df = pd.DataFrame(data = {'Col0': [3,7], 'Col1': ['a', 'b'], 'Col2': [6, 8]})
df.at[0, 'Col1'] = (4, 5)
Or, if you don't know where 'a' is (or in case there is more than one 'a' in the column), you can do that with a loop:
import pandas as pd
df = pd.DataFrame(data = {'Col0': [3,7], 'Col1': ['a', 'b'], 'Col2': [6, 8]})
# find the index labels of the rows where 'Col1' is 'a'
where_a = df[df['Col1'] == 'a'].index
# replace 'a' with the tuple (4, 5), one cell at a time
for x in where_a:
    df.at[x, 'Col1'] = (4, 5)
Col0 Col1 Col2
0 3 (4, 5) 6
1 7 b 8

How do I add from one column only when string matches in another?

I'm trying to add up the numbers in one column only where another column matches a given string.
For instance, add the ints in col2 where col1 is 'A'.
import pandas as pd
d = {'col1': ['A', 'B', 'A', 'B'], 'col2': [1, 2, 1, 2]}
df = pd.DataFrame(data=d)
df
Use DataFrame.loc + GroupBy.transform
df.loc[df.col1.eq('A'), 'col2'] = df.groupby('col1')['col2'].transform('sum')
print(df)
Or, keeping the group sum only where col1 is 'A' and the original col2 everywhere else:
new_df = df.assign(col2=df.groupby('col1')['col2'].transform('sum').where(df.col1.eq('A'),
                                                                          df.col2))
Output
col1 col2
0 A 2
1 B 2
2 A 2
3 B 2
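If the goal is a single total rather than an updated column (one possible reading of the question), a boolean mask followed by sum may be enough. A minimal sketch on the same data; total_a and totals are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'A', 'B'], 'col2': [1, 2, 1, 2]})

# Total of col2 over the rows where col1 is 'A'
total_a = df.loc[df['col1'].eq('A'), 'col2'].sum()

# Totals for every distinct value of col1 at once
totals = df.groupby('col1')['col2'].sum()
print(total_a)
print(totals)
```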

Python - how to split list for creating new column? pandas

I have a data frame like this
col1 col2
[A, B] 1
[A, C] 2
I would like to split col1 into two columns, with the output in this form:
col1_A col1_B col2
A B 1
A C 2
I have tried this df['col1'].str.rsplit(',',n=2, expand=True)
but it showed TypeError: list indices must be integers or slices, not str
join + pop
df = df.join(pd.DataFrame(df.pop('col1').values.tolist(),
columns=['col1_A', 'col1_B']))
print(df)
col2 col1_A col1_B
0 1 A B
1 2 A C
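One caveat with this approach: a DataFrame built from a plain list gets a fresh 0..n-1 index, so join only lines up correctly when df also has the default index. A sketch that passes the original index along, assuming a hypothetical non-default index of [10, 20]:

```python
import pandas as pd

df = pd.DataFrame({'col1': [['A', 'B'], ['A', 'C']], 'col2': [1, 2]},
                  index=[10, 20])  # hypothetical non-default index

# Reuse df.index so that join aligns the new columns with the right rows
parts = pd.DataFrame(df.pop('col1').tolist(),
                     columns=['col1_A', 'col1_B'],
                     index=df.index)
df = df.join(parts)
print(df)
```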
It's good practice to avoid pd.Series.apply where possible, since it often amounts to a Python-level loop with additional overhead.
That said, you can also use apply:
import pandas as pd
df = pd.DataFrame({
"col1": [['A', 'B'], ['A', 'C']],
"col2": [1, 2],
})
df['col1_A'] = df['col1'].apply(lambda x: x[0])
df['col1_B'] = df['col1'].apply(lambda x: x[1])
del df['col1']
df = df[df.columns[[1,2,0]]]
print(df)
col1_A col1_B col2
0 A B 1
1 A C 2
You can do this:
>> df_expanded = df['col1'].apply(pd.Series).rename(
columns = lambda x : 'col1_' + str(x))
>> df_expanded
col1_0 col1_1
0 A B
1 A C
Adding these columns to the original dataframe:
>> pd.concat([df_expanded, df], axis=1).drop('col1', axis=1)
col1_0 col1_1 col2
0 A B 1
1 A C 2
If columns need to be named as the first element in the rows:
df_expanded.columns = ['col1_' + value
for value in df_expanded.iloc[0,:].values.tolist()]
col1_A col1_B
0 A B
1 A C
Zip values and column name and use insert to get right position.
for ind,(k,v) in enumerate(zip(zip(*df.pop('col1').tolist()),['col1_A', 'col1_B'])):
    df.insert(ind, v, k)
Full example
import pandas as pd
df = pd.DataFrame({
"col1": [['A', 'B'], ['A', 'C']],
"col2": [1, 2],
})
for ind,(k,v) in enumerate(zip(zip(*df.pop('col1').tolist()),['col1_A', 'col1_B'])):
    df.insert(ind, v, k)
print(df)
Returns:
col1_A col1_B col2
0 A B 1
1 A C 2

How to modify cells in a pandas DataFrame?

I need to change individual elements in a DataFrame. I tried doing something like this, but it doesn't work:
for index, row in df.iterrows():
    if df.at[row, index] == 'something':
        df.at[row, index] = df.at[row, index] + 'add a string'
    else:
        df.at[row, index] = df.at[row, index] + 'add a value'
How can I do that?
If you need to modify all columns in the DataFrame, use numpy.where with the DataFrame constructor, because where returns a NumPy array:
df = pd.DataFrame(np.where(df == 'something', df + 'add a string', df + 'add a value'),
index=df.index,
columns=df.columns)
If you need to modify only one column, col:
df['col'] = np.where(df['col'] == 'something',
df['col'] + 'add a string',
df['col'] + 'add a value')
Sample:
df = pd.DataFrame({'col': ['a', 'b', 'a'], 'col1': ['a', 'b', 'b']})
print (df)
col col1
0 a a
1 b b
2 a b
df = pd.DataFrame(np.where(df == 'a', df + 'add a string', df + 'add a value'),
index=df.index,
columns=df.columns)
print (df)
col col1
0 aadd a string aadd a string
1 badd a value badd a value
2 aadd a string badd a value
df['col'] = np.where(df['col'] == 'a',
df['col'] + 'add a string',
df['col'] + 'add a value')
print (df)
col col1
0 aadd a string a
1 badd a value b
2 aadd a string b
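If an explicit loop is still preferred, note that DataFrame.at expects a row label followed by a column label, whereas iterrows yields (index, row) pairs. A minimal sketch, assuming every cell holds a string and reusing the sample frame above:

```python
import pandas as pd

df = pd.DataFrame({'col': ['a', 'b', 'a'], 'col1': ['a', 'b', 'b']})

# Visit each cell by its row label and column label
for row_label in df.index:
    for col_label in df.columns:
        value = df.at[row_label, col_label]
        suffix = 'add a string' if value == 'a' else 'add a value'
        df.at[row_label, col_label] = value + suffix
print(df)
```

This is likely to be much slower than np.where on large frames, so it is mainly useful for seeing how the labels fit together.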
You can use .loc (the older .ix indexer was removed in pandas 1.0) and apply a function like this:
import pandas as pd
D = pd.DataFrame({'A': ['a', 'b', 3,7,'b','a'], 'B': ['a', 'b', 3,7,'b','a']})
D.loc[D.index%2 == 0,'A'] = D.loc[D.index%2 == 0,'A'].apply(lambda s: s+'x' if isinstance(s,str) else s+1)
D.loc[D.index[2:5],'B'] = D.loc[D.index[2:5],'B'].apply(lambda s: s+'y' if isinstance(s,str) else s-1)
First example appends x to each string or alternatively adds 1 to each non-string on column A for every even index.
The second example appends y to each string or alternatively subtracts 1 from each non-string on column B for the indices 2,3,4.
Original Frame:
A B
0 a a
1 b b
2 3 3
3 7 7
4 b b
5 a a
Modified Frame:
A B
0 ax a
1 b b
2 4 2
3 7 6
4 bx by
5 a a
