Pandas / Python: Groupby.apply() with function dictionary

I'm trying to implement something like this:

def RR(x):
    x['A'] = x['A'] + 1
    return x

def Locked(x):
    x['A'] = x['A'] + 2
    return x

func_mapper = {"RR": RR, "Locked": Locked}
df = pd.DataFrame({'A': [1, 1], 'LookupVal': ['RR', 'Locked'], 'ID': [1, 2]})
df = df.groupby("ID").apply(lambda x: func_mapper[x.LookupVal.first()](x))

The expected output for column A is 2 and 3, where x.LookupVal is a column of strings (it has the same value within each groupby("ID") group) that I want to pass as the key to the dictionary lookup.
Any suggestions on how to implement this?
Thanks!

first is not what you think it is: Series.first() is meant for time-series data and requires an offset parameter. You are probably confusing it with GroupBy.first().
You can use iloc[0] to get the first value in each group:

df.groupby("ID").apply(lambda x: func_mapper[x.LookupVal.iloc[0]](x))
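Putting the pieces together, a minimal runnable version of this fix might look as follows (a sketch using the question's own data; group_keys=False keeps the original index, and the functions copy the group rather than mutating it in place):

```python
import pandas as pd

def RR(g):
    g = g.copy()              # work on a copy so the original group is not mutated
    g['A'] = g['A'] + 1
    return g

def Locked(g):
    g = g.copy()
    g['A'] = g['A'] + 2
    return g

func_mapper = {"RR": RR, "Locked": Locked}

df = pd.DataFrame({'A': [1, 1], 'LookupVal': ['RR', 'Locked'], 'ID': [1, 2]})

# dispatch on the first LookupVal in each group
out = df.groupby("ID", group_keys=False).apply(
    lambda g: func_mapper[g.LookupVal.iloc[0]](g)
)
print(out['A'].tolist())  # [2, 3]
```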

Related

How to check if any of the elements in a dictionary value is in a string?

I have a dataframe with strings and a dictionary whose values are lists of strings.
I need to check whether each string of the dataframe contains any element of any value in the dictionary. If it does, I need to label it with the appropriate key from the dictionary. In short, I need to categorize all the strings in the dataframe with keys from the dictionary.
For example.
df = pd.DataFrame({'a':['x1','x2','x3','x4']})
d = {'one':['1','aa'],'two':['2','bb']}
I would like to get something like this:
df = pd.DataFrame({
    'a': ['x1', 'x2', 'x3', 'x4'],
    'Category': ['one', 'two', 'x3', 'x4']})
I tried this, but it did not work:

df['Category'] = np.nan
for k, v in d.items():
    for l in v:
        df['Category'] = [k if l in str(x).lower() else x for x in df['a']]
Any ideas appreciated!
First, create a function that does this for you:

def func(val):
    for x in range(0, len(d.values())):
        if val in list(d.values())[x]:
            return list(d.keys())[x]

Now make use of the split() and apply() methods:

df['Category'] = df['a'].str.split('', expand=True)[2].apply(func)

Finally, use the fillna() method:

df['Category'] = df['Category'].fillna(df['a'])

Now if you print df you will get the expected output:

    a Category
0  x1      one
1  x2      two
2  x3       x3
3  x4       x4
Edit:
You can also do this by:

def func(val):
    for x in range(0, len(d.values())):
        if any(l in val for l in list(d.values())[x]):
            return list(d.keys())[x]

then:

df['Category'] = df['a'].apply(func)

Finally:

df['Category'] = df['Category'].fillna(df['a'])
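For comparison, the same lookup can be done in a single pass with next() and a default value, which skips the separate fillna() step (a sketch, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({'a': ['x1', 'x2', 'x3', 'x4']})
d = {'one': ['1', 'aa'], 'two': ['2', 'bb']}

def categorize(val):
    # first key whose substring list matches; fall back to the value itself
    return next((k for k, subs in d.items() if any(s in val for s in subs)), val)

df['Category'] = df['a'].apply(categorize)
print(df['Category'].tolist())  # ['one', 'two', 'x3', 'x4']
```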
I've come up with the following heuristic, which looks really dirty.
It outputs what you desire, albeit with some warnings, since I've used indices to assign values to the dataframe.

import pandas as pd
import numpy as np

def main():
    df = pd.DataFrame({'a': ['x1', 'x2', 'x3', 'x4']})
    d = {'one': ['1', 'aa'], 'two': ['2', 'bb']}
    found = False
    i = 0
    df['Category'] = np.nan
    for x in df['a']:
        for k, v in d.items():
            for item in v:
                if item in x:
                    df['Category'][i] = k
                    found = True
                    break
                else:
                    df['Category'][i] = x
            if found:
                found = False
                break
        i += 1
    print(df)

main()

Handling missing values in a function applied to a pandas dataframe column

I'm trying to apply a function to my 'age' and 'area' columns in order to get the results shown in the 'wanted' column.
Unfortunately this function gives me errors. I know that there are other methods in Pandas, like iloc, but I would like to understand this particular situation.
raw_data = {'age': [-1, np.nan, 10, 300, 20],
            'area': ['N', 'S', 'W', np.nan, np.nan],
            'wanted': ['A', np.nan, 'A', np.nan, np.nan]}
df = pd.DataFrame(raw_data, columns=['age', 'area', 'wanted'])
df

def my_funct(df):
    if df["age"].isnull():
        return np.nan
    elif df["area"].notnull():
        return 'A'
    else:
        return np.nan

df["target"] = df.apply(lambda df: my_funct(df), axis=1)
In your example, the problem is that when you pass a row to your function, referencing df["age"] gives you a float, which doesn't have a method called isnull(). To check whether a scalar is null, you can use the pd.isna function; similarly, use pd.notna in place of notnull().
def my_funct(df):
    if pd.isna(df["age"]):
        return np.nan
    elif pd.notna(df["area"]):
        return 'A'
    else:
        return np.nan

df["target"] = df.apply(lambda x: my_funct(x), axis=1)
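Since the row-wise logic here reduces to a single boolean condition, the same column can also be built without apply at all; a minimal vectorized sketch using Series.where (an alternative approach, not the answer's):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [-1, np.nan, 10, 300, 20],
                   'area': ['N', 'S', 'W', np.nan, np.nan]})

# 'A' where both age and area are non-null, NaN otherwise
mask = df['age'].notna() & df['area'].notna()
df['target'] = pd.Series('A', index=df.index).where(mask)
print(df['target'].tolist())  # ['A', nan, 'A', nan, nan]
```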

Creating a Pandas column from a nested dictionary

I have a nested dictionary called datastore containing keys m, n, o and finally 'target_a', 'target_b', or 'target_c' (these contain the values). Additionally, I have a pandas dataframe df, which contains a number of columns. Three of these columns, 'r', 's', and 't', contain values that can be used as keys to find the values in the dictionary.
With the code below, I have attempted to do this using a lambda function; however, it requires calling the function three times, which seems pretty inefficient! Is there a better way of doing this? Any help would be much appreciated.
def find_targets(m, n, o):
    if m == 0:
        return [1.5, 1.5, 1.5]
    else:
        a = datastore[m][n][o]['target_a']
        b = datastore[m][n][o]['target_b']
        c = datastore[m][n][o]['target_c']
        return [a, b, c]

df['a'] = df.apply(lambda x: find_targets(x['r'], x['s'], x['t'])[0], axis=1)
df['b'] = df.apply(lambda x: find_targets(x['r'], x['s'], x['t'])[1], axis=1)
df['c'] = df.apply(lambda x: find_targets(x['r'], x['s'], x['t'])[2], axis=1)
You can have your apply return a pd.Series, and then do the assignment in one pass using df.merge.
Here's an example that modifies your function to return a pd.Series, but other solutions exist as well: you could keep your finding function as defined and convert its result to a Series in the lambda expression.
def find_targets(m, n, o):
    if m == 0:
        return pd.Series({'a': 1.5, 'b': 1.5, 'c': 1.5})
    else:
        a = datastore[m][n][o]['target_a']
        b = datastore[m][n][o]['target_b']
        c = datastore[m][n][o]['target_c']
        return pd.Series({'a': a, 'b': b, 'c': c})

df.merge(df.apply(lambda x: find_targets(x['r'], x['s'], x['t']), axis=1),
         left_index=True, right_index=True)
If you make your find targets return a dictionary and in your lambda convert it to a pandas.Series, apply will create the rows for you and return a dataframe with the columns you want.
def find_targets(m, n, o):
    if m == 0:
        return {'a': 1.5, 'b': 1.5, 'c': 1.5}
    else:
        targets = {}
        targets['a'] = datastore[m][n][o]['target_a']
        targets['b'] = datastore[m][n][o]['target_b']
        targets['c'] = datastore[m][n][o]['target_c']
        return targets

abc_df = df.apply(lambda x: pd.Series(find_targets(x['r'], x['s'], x['t'])), axis=1)
df = pd.concat((df, abc_df), axis=1)
If you can't change the find_targets function you could still zip it with the keys you need:
abc_dict = dict(zip('abc', old_find_targets(...)))
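Run end to end on a toy datastore (the nested dict below is hypothetical, just to make the sketch self-contained), the Series-returning approach produces all three columns in one apply call:

```python
import pandas as pd

# hypothetical nested datastore, only for illustration
datastore = {1: {2: {3: {'target_a': 10, 'target_b': 20, 'target_c': 30}}}}

def find_targets(m, n, o):
    if m == 0:
        return {'a': 1.5, 'b': 1.5, 'c': 1.5}
    node = datastore[m][n][o]
    return {'a': node['target_a'], 'b': node['target_b'], 'c': node['target_c']}

df = pd.DataFrame({'r': [0, 1], 's': [0, 2], 't': [0, 3]})
abc_df = df.apply(lambda x: pd.Series(find_targets(x['r'], x['s'], x['t'])), axis=1)
df = pd.concat((df, abc_df), axis=1)
print(df[['a', 'b', 'c']].values.tolist())  # [[1.5, 1.5, 1.5], [10.0, 20.0, 30.0]]
```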

reduceByKey in spark for adding tuples

Consider an RDD with the dataset below, where 10000241 is the key and the rest are values:

('10000241', ([0, 0, 1], [None, None, 'RX']))
('10000241', ([0, 2, 0], [None, 'RX', 'RX']))
('10000241', ([3, 0, 0], ['RX', None, None]))

pv1 = rdd.reduceByKey(lambda x, y: (
    addtup(x[0], y[0]),
    addtup(x[1], y[1]),
))

def addtup(t1, t2):
    j = ()
    for k, v in enumerate(t1):
        j = j + (t1[k] + t2[k],)
    return j

The final output I want is ('10000241', ((3, 2, 1), ('RX', 'RX', 'RX'))), but I get an error saying I can't add NoneType to NoneType (or NoneType to str). How can I overcome this issue?
If I understood you correctly, you want to sum the numbers in the first tuple and apply a logical or in the second?
I think you should rewrite your function as follows:
def addtup(t1, t2):
    left = list(map(lambda x: sum(x), zip(t1[0], t2[0])))
    right = list(map(lambda x: x[0] or x[1], zip(t1[1], t2[1])))
    return (left, right)

Then you can use it like this:

rdd.reduceByKey(addtup)

Here is a demonstration:

import functools

data = (([0, 0, 1], [None, None, 'RX']),
        ([0, 2, 0], [None, 'RX', 'RX']),
        ([3, 0, 0], ['RX', None, None]))

functools.reduce(addtup, data)
#=> ([3, 2, 1], ['RX', 'RX', 'RX'])

How can I convert a string to a variable?

In detail:
I need this formula to work:

string = str(z) + ":1.0 " + str(z+1) + ":0.0"

where z is a variable with a value.
Will I be able to put this formula into a dictionary value with a specific key? Like:

dicto = {'A': 'str(z)+":1.0 "+str(z+1)+":0.0"'}

so that when I see the key 'A' I can use that formula from the dictionary.
As I read your question, you want something like this:

dicto = {'A': lambda x: "{0!s}:1.0 {1!s}:0.0".format(x, x + 1)}
dicto['A'](2)  # '2:1.0 3:0.0'

Use a lambda function:

d = {'A': lambda z: str(z) + ":1.0 " + str(z+1) + ":0.0"}
d['A'](5)
# returns: '5:1.0 6:0.0'
