Get a Dictionary by applying function to pandas Series - python

I would like to apply a function to a dataframe and receive a single dictionary as a result. pandas.apply gives me a Series of dicts, and so currently I have to combine keys from each. I'll use an example to illustrate.
I have a pandas dataframe like so.
In [20]: df
Out[20]:
0 1
0 2.025745 a
1 -1.840914 b
2 -0.428811 c
3 0.718237 d
4 0.079593 e
I have some function that returns a dictionary. For this example I'm using a toy lambda function lambda x: {x: ord(x)} that returns a dictionary.
In [22]: what_i_get = df[1].apply(lambda x: {x: ord(x)})
In [23]: what_i_get
Out[23]:
0 {'a': 97}
1 {'b': 98}
2 {'c': 99}
3 {'d': 100}
4 {'e': 101}
Name: 1
apply() gives me a series of dictionaries, but what I want is a single dictionary.
I could create it with something like this:
In [41]: what_i_want = {}
In [42]: for elem in what_i_get:
   ....:     for k, v in elem.items():
   ....:         what_i_want[k] = v
   ....:
In [43]: what_i_want
Out[43]: {'a': 97, 'b': 98, 'c': 99, 'd': 100, 'e': 101}
But it seems I should be able to get what I want more directly.

Instead of returning a dict from your function, just return the mapped value, then create one dict outside the mapping operation:
>>> d
Stuff
0 a
1 b
2 c
3 d
>>> dict(zip(d.Stuff, d.Stuff.map(ord)))
{'a': 97, 'b': 98, 'c': 99, 'd': 100}
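Here is the same idea applied to the frame from the question, where the letters sit in the integer-labelled column 1. This sketch rebuilds only that column because the float column plays no part. A dict comprehension gives the same result without building the intermediate mapped Series.

```python
import pandas as pd

# Rebuild just the letter column from the question; its label is the integer 1
df = pd.DataFrame({1: ['a', 'b', 'c', 'd', 'e']})

# Pair each letter with its mapped value, then build one dict
via_zip = dict(zip(df[1], df[1].map(ord)))

# Equivalent dict comprehension, no intermediate Series needed
via_comprehension = {x: ord(x) for x in df[1]}

print(via_zip)
```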

Cutting out the items() middle-man:
what_i_want = {}
for elem in what_i_get:
    what_i_want.update(elem)
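If you already have the Series of dicts, the update() loop can also be collapsed into a single dict comprehension. This is a minimal sketch with a hand-built Series standing in for what_i_get. On duplicate keys the later dict wins, which matches the update() loop.

```python
import pandas as pd

# Stand-in for the result of df[1].apply(lambda x: {x: ord(x)})
what_i_get = pd.Series([{'a': 97}, {'b': 98}, {'c': 99}])

# Flatten every dict's items into one dict; later keys overwrite earlier ones
what_i_want = {k: v for d in what_i_get for k, v in d.items()}
print(what_i_want)
```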

Related

Create a dictionary from a list

I'm trying to create a dictionary in Python from this output:
["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]
I tried with this code:
list_columns = list(df2.columns)
list_dictionary = []
for row in list_columns:
    resultado = "'"+str(row)+"'" + "=" + "df2[" + "'" + row + "'" + "]"
    list_dictionary.append(resultado)
clean_list_dictionary = ','.join(list_dictionary).replace('"','')
dictionary = dict(clean_list_dictionary)
print(dictionary)
But I get an error:
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Do you have any idea how I can make this work?
Thank you in advance!
The output dictionary should look like this:
{
'a' : df2['a'],
'b' : df2['b'],
'c' : df2['c'],
'd' : df2['d']
}
Method 1: Transforming your list of strings for a later eval
As you mentioned in your comment:
I would like to create a dictionary for with this format: ''' {'a' : df2['a'], 'b' : df2['b'], 'c' : df2['c'], 'd' : df2['d']} ''' I will use it as global variables in an eval() function.
You can use the following to convert your list of strings into a single dictionary string:
import pandas as pd

# dummy dataframe
df2 = pd.DataFrame([[1, 2, 3, 4]], columns=['a', 'b', 'c', 'd'])
# your list of strings
l = ["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]

# Solution: turn each "'a'=df2['a']" into "'a': df2['a']" and wrap them in braces
def dict_string(l):
    s0 = [i.split('=') for i in l]
    s1 = '{' + ', '.join([': '.join([k, v]) for k, v in s0]) + '}'
    return s1

output = dict_string(l)
print(output)
eval(output)
#String before eval
{'a': df2['a'], 'b': df2['b'], 'c': df2['c'], 'd': df2['d']} #<----
#String after eval
{'a': 0 1
Name: a, dtype: int64,
'b': 0 2
Name: b, dtype: int64,
'c': 0 3
Name: c, dtype: int64,
'd': 0 4
Name: d, dtype: int64}
Method 2: Using eval as part of your iteration of the list of strings
Here is a way to do this using list comprehensions and eval, as part of the iteration on the list of strings itself. This will give you the final output that you would get if you were to use eval on the dictionary string you are expecting.
import pandas as pd

# dummy dataframe
df2 = pd.DataFrame([[1, 2, 3, 4]], columns=['a', 'b', 'c', 'd'])
# your list of strings
l = ["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]

# Solution: eval both sides of each "key=expression" string
def eval_dict(l):
    s0 = [[eval(j) for j in i.split('=')] for i in l]
    s1 = {k: v for k, v in s0}
    return s1

output = eval_dict(l)
print(output)
{'a': 0 1
Name: a, dtype: int64,
'b': 0 2
Name: b, dtype: int64,
'c': 0 3
Name: c, dtype: int64,
'd': 0 4
Name: d, dtype: int64}
The output is a dict with 4 keys (a, b, c, d) whose values are the corresponding columns a, b, c, d of df2.
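Both methods above run eval on text. If you would rather not, here is a sketch that parses each entry with a regular expression and looks the column up directly. It assumes every entry has exactly the shape 'key'=df2['column']. The pattern and the helper name parse_dict are illustrative only, not part of the question.

```python
import re

import pandas as pd

df2 = pd.DataFrame([[1, 2, 3, 4]], columns=['a', 'b', 'c', 'd'])
l = ["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]

# Assumed entry shape: 'key'=df2['column']; anything else raises instead of running
PATTERN = re.compile(r"^'([^']*)'=df2\['([^']*)'\]$")

def parse_dict(strings, frame):
    result = {}
    for s in strings:
        m = PATTERN.match(s)
        if m is None:
            raise ValueError(f"unexpected entry: {s!r}")
        key, column = m.groups()
        result[key] = frame[column]  # plain column lookup, no eval()
    return result

output = parse_dict(l, df2)
print(output)
```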
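You can loop over the list, split each string on the '=' character and convert the pairs to a dict. Note that both the keys and the values remain strings (for example "'a'" and "df2['a']"), so the values are not the DataFrame columns themselves.
Code:
ls = ["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]
# each split gives a [key, value] pair, which dict() accepts directly
dic = dict(l.split('=') for l in ls)
dic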
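I think this is exactly what you want.
import pandas as pd

df2 = pd.DataFrame([[1, 2, 3, 4]], columns=['a', 'b', 'c', 'd'])
data = ["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]
dic = {}
for d in data:
    k, expr = d.split("=")
    # "'a'" -> "a" so the key matches the desired output
    k = k.strip("'")
    # "df2['a']" -> "a", then look up the actual column
    dic[k] = df2[expr.split("'")[1]]
print(dic)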
It's not clear exactly what you want to achieve.
If you have a pd.DataFrame() and want to convert it to a dictionary where the column names are the keys and the columns (as Series) are the values, you should use df.to_dict('series').
import pandas as pd
# Generate the dataframe
data = {'a': [1, 2, 1, 0], 'b': [2, 3, 4, 5], 'c': [10, 11, 12, 13], 'd': [21, 22, 23, 24]}
df = pd.DataFrame.from_dict(data)
# Convert to dictionary
result = df.to_dict('series')
print(result)
If you have a list of strings that you need to convert to the desired output, then you should do it differently. Your strings only contain the text df2['a'], whereas df2 in your dict is a variable. So you only need to extract the column names and use them to index the DataFrame variable (df below), not the string 'df2'.
import pandas as pd
# Generate the dataframe
data = {'a': [1, 2, 1, 0], 'b': [2, 3, 4, 5], 'c': [10, 11, 12, 13], 'd': [21, 22, 23, 24]}
df = pd.DataFrame.from_dict(data)
# create string list
lst = ["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]
# Convert to dictionary
result = {}
for item in lst:
    # "'a'=df2['a']" -> "'a'" -> "a" (works for column names of any length)
    key = item.split('=')[0].strip("'")
    result[key] = df[key]
print(result)
The results are the same, but in the second case the list of strings is unnecessary, because the first example achieves the same result without it.
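The question mentions using this dictionary as global variables for eval(). Here is a minimal sketch of that use, assuming a small made-up df2. The dict comes from to_dict('series') and is passed as the globals mapping. A copy is passed because eval() inserts a __builtins__ entry into the mapping it receives. pandas' own DataFrame.eval resolves column names without any dictionary, which may remove the need for one.

```python
import pandas as pd

df2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]})

# Column name -> Series, the same shape as the dictionary in the question
namespace = df2.to_dict('series')

# Pass a copy as globals so eval() doesn't add __builtins__ to our dict
total = eval("a + b", dict(namespace))

# pandas' expression evaluator looks column names up itself
total2 = df2.eval("a + b")
print(total.tolist(), total2.tolist())
```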

DataFrame from list of string dicts

So I have a list where each entry looks something like this:
"{'A': 1, 'B': 2, 'C': 3}"
I am trying to get a dataframe that looks like this
A B C
0 1 2 3
1 4 5 6
2 7 8 9
But I'm having trouble converting the format into something that can be read into a DataFrame. I know that pandas should automatically convert dicts into dataframes, but since my list elements are surrounded by quotes, it's getting confused and giving me
0
0 {'A': 1, 'B': 2, 'C': 3}
...
I've tried using json, concatenating a list of dataframes, and so on, but to no avail.
eval is not safe. Check this comparison.
Instead use ast.literal_eval:
Assuming this to be your list:
In [572]: l = ["{'A': 1, 'B': 2, 'C': 3}", "{'A': 4, 'B': 5, 'C': 6}"]
In [584]: import ast
In [587]: df = pd.DataFrame([ast.literal_eval(i) for i in l])
In [588]: df
Out[588]:
A B C
0 1 2 3
1 4 5 6
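The reason ast.literal_eval is the safer choice: it only accepts Python literals and raises ValueError for anything else, such as a function call. This short check makes that concrete. The second string is just an example of code that eval() would execute.

```python
import ast

# Plain literals are parsed normally
row = ast.literal_eval("{'A': 1, 'B': 2, 'C': 3}")

# An expression that would run code under eval() is rejected instead
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(row, rejected)
```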
I agree with SomeDude that eval will work like this:
pd.DataFrame([eval(s) for s in l])
BUT if any user-entered data goes into these strings, you should never use eval. Instead, you can convert the single quotes to double quotes and parse the string with the json package, which is much safer:
json.loads(u'{"A": 1, "B": 2, "C": 3}')
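Applied to the whole list, the quote swap plus json.loads might look like the sketch below. The naive replace assumes that no key or value contains an apostrophe or double quote of its own. If one does, the swap produces invalid JSON and json.loads raises an error.

```python
import json

import pandas as pd

l = ["{'A': 1, 'B': 2, 'C': 3}", "{'A': 4, 'B': 5, 'C': 6}"]

# Single -> double quotes turns each string into valid JSON
# (assumption: the data itself contains no quote characters)
rows = [json.loads(s.replace("'", '"')) for s in l]

df = pd.DataFrame(rows)
print(df)
```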
Use eval before reading the strings into a DataFrame:
pd.DataFrame([eval(s) for s in l])
Or, better, use ast.literal_eval as @Mayank Porwal's answer says.
Or use json.loads, but only after making sure the strings are valid JSON.
Take your list of strings, turn it into a list of dictionaries, then construct the data frame using pd.DataFrame.from_records.
>>> l = ["{'A': 1, 'B': 2, 'C': 3}"]
>>> pd.DataFrame.from_records(eval(s) for s in l)
A B C
0 1 2 3
Check that your input data doesn't include arbitrary Python code, however: eval evaluates whatever expression it is given, so using it on web-facing input or anything similar would be a severe security flaw.
You can try:
import pandas as pd
lst = ["{'A': 1, 'B': 2, 'C': 3}", "{'A': 1, 'B': 2, 'C': 3}"]
df = pd.DataFrame(map(eval, lst))
# or
df = pd.DataFrame(lst)[0].apply(eval).apply(pd.Series)
# or
df = pd.DataFrame(lst)[0].apply(lambda x: pd.Series(eval(x)))
print(df)
A B C
0 1 2 3
1 1 2 3

DataFrame from list of string dicts with array() values

So I have a list where each entry looks something like this:
"{'A': array([1]), 'B': array([2]), 'C': array([3])}"
I am trying to get a dataframe that looks like this
A B C
0 1 2 3
1 4 5 6
2 7 8 9
But I'm having trouble converting the format into something that can be read into a DataFrame. I know that pandas should automatically convert dicts into dataframes, but since my list elements are surrounded by quotes, it's getting confused and giving me
0
0 {'A': array([1]), 'B': array([2]), 'C': array([3])}
...
I originally asked this question with an oversimplified example dict, {'A': 1, 'B': 2, 'C': 3}, for which methods such as ast.literal_eval and eval should work. With arrays as values, however, I run into NameError: name 'array' is not defined.
Assuming those really are arrays of length 1, this hackery should do the job:
data = [
"{'A': array([1]), 'B': array([2]), 'C': array([3])}",
"{'A': array([4]), 'B': array([5]), 'C': array([6])}",
"{'A': array([7]), 'B': array([8]), 'C': array([9])}"
]
import ast
import pandas as pd
data = [ast.literal_eval(d.replace('array([','').replace('])','')) for d in data]
a = pd.DataFrame(data)
print(a)
Output:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
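A slightly more general variant of the same string surgery is to strip only the array( ... ) wrapper with a regular expression and then use ast.literal_eval. This keeps arrays of any length as lists and unwraps single-element ones to scalars. It is a sketch that assumes simple 1-D reprs like array([1, 2]), with no dtype=... argument and no nested brackets. The pattern and the helper name parse_row are illustrative.

```python
import ast
import re

import pandas as pd

data = [
    "{'A': array([1]), 'B': array([2]), 'C': array([3])}",
    "{'A': array([4]), 'B': array([5]), 'C': array([6])}",
]

# Matches e.g. array([1, 2]) and captures just "[1, 2]"
# (assumption: no dtype argument and no nested lists inside the array)
ARRAY_RE = re.compile(r"array\((\[[^\[\]]*\])\)")

def parse_row(s):
    # "array([1])" -> "[1]" so that literal_eval can parse it
    row = ast.literal_eval(ARRAY_RE.sub(r"\1", s))
    # Unwrap one-element lists to scalars; leave longer lists as they are
    return {k: v[0] if isinstance(v, list) and len(v) == 1 else v
            for k, v in row.items()}

df = pd.DataFrame([parse_row(s) for s in data])
print(df)
```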

What to pass to aggfunc in pandas pivot table for summing up counters

I have a data table with two columns "A" and "B", and the elements in column "B" are counters. For example,
from collections import Counter
import pandas as pd

c = Counter(a=4, b=2)
df = pd.DataFrame({"A": ["group1", "group1", "group1", "group2", "group2"],
                   "B": [c, c, c, c, c]})
I would like to create a pivot table where I group by the values in column "A" and aggregate column "B" by adding up the counters. What should I pass to aggfunc?
This is what I have tried, but sadly it does not work:
pt = pd.pivot_table(df, index = ['A'], values = ['B'], aggfunc = ['+'])
Any suggestions?
My expected output is
table
group1 Counter(a=12, b=6) # i.e., c+c+c
group2 Counter(a=8, b=4) # i.e., c+c
It's just sum.
>>> df.groupby('A')['B'].sum()
A
group1 {'a': 12, 'b': 6}
group2 {'a': 8, 'b': 4}
Name: B, dtype: object
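If you prefer to spell the reduction out rather than rely on pandas summing an object column, you can pass agg a function that adds each group's Counters together. This is a sketch, not the answer's own code. The explicit Counter() start value matters because Python's built-in sum() otherwise starts at 0, and 0 + Counter raises TypeError.

```python
from collections import Counter

import pandas as pd

c = Counter(a=4, b=2)
df = pd.DataFrame({"A": ["group1", "group1", "group1", "group2", "group2"],
                   "B": [c.copy() for _ in range(5)]})

# Add up each group's Counters, starting from an empty Counter
totals = df.groupby("A")["B"].agg(lambda s: sum(s, Counter()))
print(totals)
```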
Two notes:
Putting dictionaries into dataframe columns is usually not good practice. I would use two columns to hold the values for 'a' and 'b' respectively.
"B": [c, c, c, c, c] initializes each element of column 'B' with the same counter object.
Demo:
>>> df.loc[0, 'B']['a'] = 100
>>> df
Out[9]:
A B
0 group1 {'a': 100, 'b': 2}
1 group1 {'a': 100, 'b': 2}
2 group1 {'a': 100, 'b': 2}
3 group2 {'a': 100, 'b': 2}
4 group2 {'a': 100, 'b': 2}
You might want "B": [c.copy() for _ in range(5)] - if you want to keep your original design at all, that is.
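Following the first note, here is a sketch of the two-column layout. It expands each Counter into its own 'a' and 'b' columns and lets an ordinary numeric groupby sum do the work. Keys missing from some Counters would show up as NaN, which fillna(0) turns into zero counts.

```python
from collections import Counter

import pandas as pd

c = Counter(a=4, b=2)
df = pd.DataFrame({"A": ["group1", "group1", "group1", "group2", "group2"],
                   "B": [c.copy() for _ in range(5)]})

# One column per Counter key; absent keys become NaN, then 0
counts = pd.DataFrame(df["B"].tolist(), index=df.index).fillna(0)
wide = pd.concat([df[["A"]], counts], axis=1)

# Plain numeric aggregation on the expanded columns
result = wide.groupby("A").sum()
print(result)
```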

Prevent Pandas from unpacking a tuple when creating a dataframe from dict

When creating a DataFrame in Pandas from a dictionary, a tuple is automatically expanded, e.g.
import pandas
d = {'a': 1, 'b': 2, 'c': (3,4)}
df = pandas.DataFrame.from_dict(d)
print(df)
returns
a b c
0 1 2 3
1 1 2 4
Apart from converting the tuple to string first, is there any way to prevent this from happening? I would want the result to be
a b c
0 1 2 (3, 4)
Try adding [], so that the value for key c is a list containing the tuple:
import pandas
d = {'a': 1, 'b': 2, 'c': [(3,4)]}
df = pandas.DataFrame.from_dict(d)
print(df)
a b c
0 1 2 (3, 4)
Pass param orient='index' and transpose the result so it doesn't broadcast the scalar values:
In [13]:
d = {'a': 1, 'b': 2, 'c': (3,4)}
df = pd.DataFrame.from_dict(d, orient='index').T
df
Out[13]:
a c b
0 1 (3, 4) 2
To handle the situation where the first dict entry is a tuple, you'd need to wrap every dict value in a list so it's iterable:
In [20]:
d = {'a': (5,6), 'b': 2, 'c': 1}
d1 = dict(zip(d.keys(), [[x] for x in d.values()]))
pd.DataFrame.from_dict(d1, orient='index').T
Out[23]:
a b c
0 (5, 6) 2 1
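Another option is to wrap the whole dict in a list instead of each value. A list of dicts is read row by row, so the tuple ends up as a single cell regardless of which key comes first. This is a minimal sketch using the dict from the question.

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': (3, 4)}

# A list containing one dict -> exactly one row; the tuple stays intact
df = pd.DataFrame([d])
print(df)
```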
