Given the DataFrame
df = pd.DataFrame({'word1': ['elvis', 'lease', 'admirer'], 'word2': ['lives', 'sale', 'married']})
how can I add a third column that contains True or False depending on whether the two words in the same row are anagrams of each other?
I have written this function, but it raises an error when I apply it to the df.
def anagram(word1, word2):
    word1_lst = [l for l in word1]
    word2_lst = [i for i in word2]
    return sorted(word1_lst) == sorted(word2_lst)

df['Anagram'] = df.apply(anagram(df['word1'], df['word2']), axis=1)
TypeError: 'bool' object is not callable
df = pd.DataFrame({'word1': ['elvis', 'lease', 'admirer'], 'word2': ['lives', 'sale', 'married']})
df['Anagram'] = df.word1.apply(sorted) == df.word2.apply(sorted)
The issue here is that you are calling df.apply() with the arguments
anagram(df['word1'], df['word2']), which evaluates to a bool, not a function,
along with
axis=1
To fix, alter your function like this:
def anagram(row):
    word1_lst = [l for l in row['word1']]
    word2_lst = [i for i in row['word2']]
    return sorted(word1_lst) == sorted(word2_lst)
then call the method with the function name, not the result
df['Anagram'] = df.apply(anagram, axis=1)
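For reference, here is a self-contained run of the corrected version against the sample frame from the question (a sketch; note that sorted() accepts any iterable, so the strings can also be passed directly without the intermediate lists):

```python
import pandas as pd

df = pd.DataFrame({'word1': ['elvis', 'lease', 'admirer'],
                   'word2': ['lives', 'sale', 'married']})

def anagram(row):
    # sorted('elvis') -> ['e', 'i', 'l', 's', 'v'], so comparing the
    # sorted character lists tests for an anagram
    return sorted(row['word1']) == sorted(row['word2'])

df['Anagram'] = df.apply(anagram, axis=1)
# 'elvis'/'lives' and 'admirer'/'married' are anagrams; 'lease'/'sale' is not
```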
I have two dataframes like the following, but with more rows:
data = {'First': [['First', 'value'], ['second', 'value'], ['third', 'value', 'is'], ['fourth', 'value', 'is']],
        'Second': [['adj', 'noun'], ['adj', 'noun'], ['adj', 'noun', 'verb'], ['adj', 'noun', 'verb']]}
df = pd.DataFrame(data, columns=['First', 'Second'])

data2 = {'example': ['First value is important', 'second value is imprtant too', 'it us goof to know']}
df2 = pd.DataFrame(data2, columns=['example'])
I wrote a function that checks whether the first word in the example column can be found in the First column of the first dataframe and, if so, returns the string, like the following:
def reader():
    for l in [l for l in df2.example]:
        if df["first"].str.contains(pat=l.split(' ', 1)[0]).any() is True:
            return l
However, I realized that it would not work because each entry in the First column of df is a list of strings, so I made the following modification:
def reader():
    for l in [l for l in df2.example]:
        df['first_unlist'] = [','.join(map(str, l)) for l in df.First]
        if df["first_unlist"].str.contains(pat=l.split(' ', 1)[0]).any() is True:
            return l
However, I still get None when I run the function, and I cannot figure out what is wrong here.
Update:
I would like the function to return the first two strings in the example column: 'First value is important', 'second value is imprtant too'
Your function doesn't return False when the first word in the example column cannot be found. Here is the revision:
def reader():
    for l in [l for l in df2.example]:
        df['first_unlist'] = [','.join(map(str, l)) for l in df.First]
        if df["first_unlist"].str.contains(pat=l.split(' ', 1)[0]).any() is True:
            return l
    return list(df2.example[:2])

reader()
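For what it's worth, here is a sketch of a version that collects every matching example instead of returning on the first hit (it reuses the question's df/df2 column names; the set of known words is built once, outside the loop):

```python
import pandas as pd

df = pd.DataFrame({'First': [['First', 'value'], ['second', 'value'],
                             ['third', 'value', 'is'], ['fourth', 'value', 'is']]})
df2 = pd.DataFrame({'example': ['First value is important',
                                'second value is imprtant too',
                                'it us goof to know']})

def reader():
    # flatten the lists in df.First into one set of words
    first_words = {w for lst in df['First'] for w in lst}
    # keep every example whose first word appears in that set
    return [ex for ex in df2['example'] if ex.split(' ', 1)[0] in first_words]
```

On the sample data this returns exactly the first two example strings, since only 'First' and 'second' appear among the words in df.First.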
I have a dataframe with strings and a dictionary whose values are lists of strings.
I need to check whether each string in the dataframe contains any element of any of the value lists in the dictionary, and if it does, label it with the appropriate key from the dictionary. All I need to do is categorize all the strings in the dataframe with keys from the dictionary.
For example.
df = pd.DataFrame({'a':['x1','x2','x3','x4']})
d = {'one':['1','aa'],'two':['2','bb']}
I would like to get something like this:
df = pd.DataFrame({
    'a': ['x1', 'x2', 'x3', 'x4'],
    'Category': ['one', 'two', 'x3', 'x4']})
I tried this, but it has not worked:
df['Category'] = np.nan
for k, v in d.items():
    for l in v:
        df['Category'] = [k if l in str(x).lower() else x for x in df['a']]
Any ideas appreciated!
First, create a function that does this for you:
def func(val):
    for x in range(0, len(d.values())):
        if val in list(d.values())[x]:
            return list(d.keys())[x]
Now make use of the split() and apply() methods:
df['Category'] = df['a'].str.split('', expand=True)[2].apply(func)
Finally, use the fillna() method:
df['Category'] = df['Category'].fillna(df['a'])
Now if you print df you will get your expected output:
    a Category
0  x1      one
1  x2      two
2  x3       x3
3  x4       x4
Edit:
You can also do this by:
def func(val):
    for x in range(0, len(d.values())):
        if any(l in val for l in list(d.values())[x]):
            return list(d.keys())[x]
then:
df['Category'] = df['a'].apply(func)
Finally:
df['Category'] = df['Category'].fillna(df['a'])
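The same idea can also be written without the fillna() step, by having the function fall back to the original value when no key matches (a sketch, assuming the df and d from the question):

```python
import pandas as pd

df = pd.DataFrame({'a': ['x1', 'x2', 'x3', 'x4']})
d = {'one': ['1', 'aa'], 'two': ['2', 'bb']}

def categorize(val):
    # return the first key whose substring list matches, else the value itself
    for key, subs in d.items():
        if any(s in val for s in subs):
            return key
    return val

df['Category'] = df['a'].apply(categorize)
```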
I've come up with the following heuristic, which looks really dirty.
It outputs what you desire, albeit with some warnings, since I've used chained indexing to assign values to the dataframe.
import pandas as pd
import numpy as np
def main():
    df = pd.DataFrame({'a': ['x1', 'x2', 'x3', 'x4']})
    d = {'one': ['1', 'aa'], 'two': ['2', 'bb']}
    found = False
    i = 0
    df['Category'] = np.nan
    for x in df['a']:
        for k, v in d.items():
            for item in v:
                if item in x:
                    df['Category'][i] = k
                    found = True
                    break
                else:
                    df['Category'][i] = x
            if found:
                found = False
                break
        i += 1
    print(df)

main()
I'm trying to apply a function to my 'age' and 'area' columns in order to get the results shown in the 'wanted' column.
Unfortunately this function gives me errors. I know that there are other methods in Pandas, like iloc, but I would like to understand this particular situation.
raw_data = {'age': [-1, np.nan, 10, 300, 20],
            'area': ['N', 'S', 'W', np.nan, np.nan],
            'wanted': ['A', np.nan, 'A', np.nan, np.nan]}
df = pd.DataFrame(raw_data, columns=['age', 'area', 'wanted'])
df
def my_funct(df):
    if df["age"].isnull():
        return np.nan
    elif df["area"].notnull():
        return 'A'
    else:
        return np.nan

df["target"] = df.apply(lambda df: my_funct(df), axis=1)
In your example, the problem is that when you pass a row to your function, referencing df["age"] gives you a float, which doesn't have a method called isnull(). To check whether a float is null, you can use the pd.isna function; the same goes for pd.notna().
def my_funct(df):
    if pd.isna(df["age"]):
        return np.nan
    elif pd.notna(df["area"]):
        return 'A'
    else:
        return np.nan

df["target"] = df.apply(lambda x: my_funct(x), axis=1)
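As an aside, this particular rule can also be expressed without apply at all, using notna() masks and Series.where (a sketch on the question's sample data):

```python
import numpy as np
import pandas as pd

raw_data = {'age': [-1, np.nan, 10, 300, 20],
            'area': ['N', 'S', 'W', np.nan, np.nan]}
df = pd.DataFrame(raw_data)

# 'A' only where both age and area are present; NaN everywhere else
mask = df['age'].notna() & df['area'].notna()
df['target'] = pd.Series('A', index=df.index).where(mask)
```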
I am looking for a pandas apply function that can return the tuple (index_name, column_name, value), where value is the entry at that row and column.
Something like the following function:
def pair(val):
    return zip(index.name, column.name, val)
If you want a list with tuples for every value of your dataframe df, you can try something like this:
my_list = []
for index, row in df.iterrows():
    for col in df.columns:
        my_list.append((index, col, row[col]))
If you want a list with all tuples matching a single value, this function should work:
def findTuples(value, df):
    my_list = []
    for col in df.columns:
        # if at least one such value in column
        if df[df[col] == value].shape[0] != 0:
            my_list = my_list + list(
                df[df[col] == value].apply(
                    lambda x: (x.name, col, x[col]),
                    axis=1
                )
            )
    return my_list
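For completeness, stack() produces the same (index, column, value) triples without an explicit double loop (a sketch on a small hypothetical frame):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

# stack() moves the columns into an inner index level, yielding a Series
# keyed by (row_index, column_name) pairs
triples = [(idx, col, val) for (idx, col), val in df.stack().items()]
# one triple per cell, ordered row by row
```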
I have a nested dictionary called datastore containing keys m, n, o and finally 'target_a', 'target_b', or 'target_c' (these contain the values). Additionally, I have a pandas dataframe df, which contains a number of columns. Three of these columns, 'r', 's', and 't', contain values that can be used as keys into the dictionary.
With the code below, I have attempted to do this using a lambda function; however, it calls the function three times, which seems pretty inefficient! Is there a better way of doing this? Any help would be much appreciated.
def find_targets(m, n, o):
    if m == 0:
        return [1.5, 1.5, 1.5]
    else:
        a = datastore[m][n][o]['target_a']
        b = datastore[m][n][o]['target_b']
        c = datastore[m][n][o]['target_c']
        return [a, b, c]

df['a'] = df.apply(lambda x: find_targets(x['r'], x['s'], x['t'])[0], axis=1)
df['b'] = df.apply(lambda x: find_targets(x['r'], x['s'], x['t'])[1], axis=1)
df['c'] = df.apply(lambda x: find_targets(x['r'], x['s'], x['t'])[2], axis=1)
You can have your apply return a pd.Series, and then do the assignment in one pass using df.merge.
Here's an example that modifies your function to return a pd.Series, but other solutions work as well, e.g. keeping your lookup function as you defined it and converting its result to a Series inside the lambda expression.
def find_targets(m, n, o):
    if m == 0:
        return pd.Series({'a': 1.5, 'b': 1.5, 'c': 1.5})
    else:
        a = datastore[m][n][o]['target_a']
        b = datastore[m][n][o]['target_b']
        c = datastore[m][n][o]['target_c']
        return pd.Series({'a': a, 'b': b, 'c': c})

df.merge(df.apply(lambda x: find_targets(x['r'], x['s'], x['t']), axis=1), left_index=True, right_index=True)
If you make your find_targets return a dictionary and convert it to a pandas.Series in your lambda, apply will create the columns for you and return a dataframe with the columns you want.
def find_targets(m, n, o):
    if m == 0:
        return {'a': 1.5, 'b': 1.5, 'c': 1.5}
    else:
        targets = {}
        targets['a'] = datastore[m][n][o]['target_a']
        targets['b'] = datastore[m][n][o]['target_b']
        targets['c'] = datastore[m][n][o]['target_c']
        return targets

abc_df = df.apply(lambda x: pd.Series(find_targets(x['r'], x['s'], x['t'])), axis=1)
df = pd.concat((df, abc_df), axis=1)
If you can't change the find_targets function, you could still zip its result with the keys you need:
abc_dict = dict(zip('abc', old_find_targets(...)))
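Another option worth knowing: apply with result_type='expand' spreads a returned list across columns, so a list-returning find_targets can be kept as-is. This is a sketch with a hypothetical stand-in for the datastore lookup (the real function would index into datastore as in the question):

```python
import pandas as pd

df = pd.DataFrame({'r': [0, 1], 's': [0, 0], 't': [0, 0]})

# hypothetical stand-in for the real datastore lookup
def find_targets(m, n, o):
    if m == 0:
        return [1.5, 1.5, 1.5]
    return [m, n, o]  # placeholder for the three dictionary lookups

# result_type='expand' turns each returned list into a row of three columns
df[['a', 'b', 'c']] = df.apply(
    lambda x: find_targets(x['r'], x['s'], x['t']),
    axis=1, result_type='expand')
```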