DataFrame from list of string dicts - python

So I have a list where each entry looks something like this:
"{'A': 1, 'B': 2, 'C': 3}"
I am trying to get a dataframe that looks like this
A B C
0 1 2 3
1 4 5 6
2 7 8 9
But I'm having trouble converting the format into something that can be read into a DataFrame. I know that pandas should automatically convert dicts into dataframes, but since my list elements are surrounded by quotes, it's getting confused and giving me
0
0 {'A': 1, 'B': 2, 'C': 3}
...
I've tried using json, concatenating a list of DataFrames, and so on, but to no avail.

eval is not safe: it executes any Python expression contained in the string.
Instead use ast.literal_eval:
Assuming this to be your list:
In [572]: l = ["{'A': 1, 'B': 2, 'C': 3}", "{'A': 4, 'B': 5, 'C': 6}"]
In [584]: import ast
   ...: import pandas as pd
In [587]: df = pd.DataFrame([ast.literal_eval(i) for i in l])
In [588]: df
Out[588]:
A B C
0 1 2 3
1 4 5 6
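To see why literal_eval is the safer choice, here is a small sketch. It accepts only Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, None) and raises ValueError for anything else, such as a function call. The `untrusted` string is a made-up example of hostile input.

```python
import ast

import pandas as pd

l = ["{'A': 1, 'B': 2, 'C': 3}", "{'A': 4, 'B': 5, 'C': 6}"]

# literal_eval parses plain literals such as these dict strings
df = pd.DataFrame([ast.literal_eval(s) for s in l])
print(df)

# ...but refuses anything that is not a literal, e.g. a function call.
# eval() would run this expression; literal_eval raises ValueError instead.
untrusted = "__import__('os').getcwd()"  # hypothetical hostile input
try:
    ast.literal_eval(untrusted)
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```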

I agree with SomeDude that eval will work, like this:
pd.DataFrame([eval(s) for s in l])
BUT, if any user-entered data is going into these strings, you should never use eval. Instead, you can convert the single quotes to double quotes and parse the result with the json package, which is much safer (a plain quote swap breaks, however, if a value itself contains an apostrophe):
import json
json.loads('{"A": 1, "B": 2, "C": 3}')
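Applied to the whole list, the JSON route looks roughly like this. It is a minimal sketch that assumes the strings use single quotes only as delimiters and contain no True/False/None, since JSON spells those true/false/null.

```python
import json

import pandas as pd

l = ["{'A': 1, 'B': 2, 'C': 3}", "{'A': 4, 'B': 5, 'C': 6}"]

# JSON requires double quotes, so swap them first. This naive swap assumes
# no key or value contains an apostrophe (e.g. "O'Brien"), which would break it.
records = [json.loads(s.replace("'", '"')) for s in l]
df = pd.DataFrame(records)
print(df)
```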

Use eval before reading the strings into a DataFrame:
pd.DataFrame([eval(s) for s in l])
Or, better, use ast.literal_eval as @Mayank Porwal's answer says.
Or use json.loads, but only after making sure the strings are valid JSON.

Take your list of strings, turn it into a list of dictionaries, then construct the data frame using pd.DataFrame.from_records.
>>> l = ["{'A': 1, 'B': 2, 'C': 3}"]
>>> pd.DataFrame.from_records(eval(s) for s in l)
A B C
0 1 2 3
Check that your input data doesn't include Python code, however, because eval executes whatever expression it is given. Using it on web-facing input, or anything similar, would be a severe security flaw.

You can try
import pandas as pd
lst = ["{'A': 1, 'B': 2, 'C': 3}", "{'A': 1, 'B': 2, 'C': 3}"]
df = pd.DataFrame(map(eval, lst))
# or
df = pd.DataFrame(lst)[0].apply(eval).apply(pd.Series)
# or
df = pd.DataFrame(lst)[0].apply(lambda x: pd.Series(eval(x)))
print(df)
A B C
0 1 2 3
1 1 2 3

Related

DataFrame from list of string dicts with array() values

So I have a list where each entry looks something like this:
"{'A': array([1]), 'B': array([2]), 'C': array([3])}"
I am trying to get a dataframe that looks like this
A B C
0 1 2 3
1 4 5 6
2 7 8 9
But I'm having trouble converting the format into something that can be read into a DataFrame. I know that pandas should automatically convert dicts into dataframes, but since my list elements are surrounded by quotes, it's getting confused and giving me
0
0 {'A': array([1]), 'B': array([2]), 'C': array([3])}
...
I originally asked a question with an oversimplified example dict, {'A': 1, 'B': 2, 'C': 3}, for which methods such as ast.literal_eval and eval typically work. With arrays as values, however, I run into NameError: name 'array' is not defined.
Assuming those really are arrays of length 1, this hackery should do the job:
data = [
"{'A': array([1]), 'B': array([2]), 'C': array([3])}",
"{'A': array([4]), 'B': array([5]), 'C': array([6])}",
"{'A': array([7]), 'B': array([8]), 'C': array([9])}"
]
import ast
import pandas as pd
data = [ast.literal_eval(d.replace('array([','').replace('])','')) for d in data]
a = pd.DataFrame(data)
print(a)
Output:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
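If the arrays can be longer than one element, the string replacement above no longer works. One alternative is to parse each string with ast.parse and handle the array(...) calls yourself, so nothing is ever executed. This is a sketch under the assumption that every non-literal value has the form array(<literal>). parse_array_dict is a hypothetical helper name, not a pandas or NumPy API, and the last row of the sample data was changed to include a longer array.

```python
import ast

import numpy as np
import pandas as pd


def parse_array_dict(s):
    """Hypothetical helper: parse "{'A': array([1]), ...}" without eval."""
    node = ast.parse(s, mode="eval").body
    if not isinstance(node, ast.Dict):
        raise ValueError("expected a dict literal")
    out = {}
    for key_node, val_node in zip(node.keys, node.values):
        key = ast.literal_eval(key_node)
        is_array_call = (
            isinstance(val_node, ast.Call)
            and isinstance(val_node.func, ast.Name)
            and val_node.func.id == "array"
            and len(val_node.args) == 1
        )
        if is_array_call:
            # Only the argument, e.g. [1] or [9, 10], is evaluated, as a literal;
            # keyword arguments such as dtype=... are ignored.
            arg = ast.literal_eval(val_node.args[0])
            # Unwrap length-1 arrays to scalars, keep longer ones as arrays
            if isinstance(arg, list) and len(arg) == 1:
                out[key] = arg[0]
            else:
                out[key] = np.array(arg)
        else:
            out[key] = ast.literal_eval(val_node)
    return out


data = [
    "{'A': array([1]), 'B': array([2]), 'C': array([3])}",
    "{'A': array([4]), 'B': array([5]), 'C': array([6])}",
    "{'A': array([7]), 'B': array([8]), 'C': array([9, 10])}",
]
df = pd.DataFrame([parse_array_dict(s) for s in data])
print(df)
```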

Extracting item from a list stored as string in pandas dataframe

I have the following pandas dataframe
data = [{'a': 1, 'b': '[2,3,4,5,6' }, {'a': 10, 'b': '[54,3,40,5'}]
test = pd.DataFrame(data)
display(test)
a b
0 1 [2,3,4,5,6
1 10 [54,3,40,5
I want to list the numbers in column b, but because each list has the [ only at the beginning, I can't turn it into a list. I'm trying to remove the "[" so I can extract the numbers, but I keep getting errors. What am I doing wrong?
This is how the numbers are stored
test.iloc[1,1]
'[54,3,40,5'
And this is what I've tried to remove the "[".
test.iloc[0,1].replace("[",'', regex=True).to_list()
test.iloc[0,1].str.replace("[\]\[]", "")
What I want to achieve is to have b as a proper list so I can apply other functions.
a b
0 1 [2,3,4,5,6]
1 10 [54,3,40,5]
To make your 'b' column a list, you can first delete the opening square bracket at the beginning and then use the split method on each element of your 'b' column (regex=False makes "[" a literal character rather than the start of a regex character class):
test['b'] = test['b'].str.replace('[', '', regex=False).map(lambda x: x.split(','))
test
# a b
# 0 1 [2, 3, 4, 5, 6]
# 1 10 [54, 3, 40, 5]
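Note that split(',') leaves each element as a string ('2', '3', ...). If you need actual integers, a minimal sketch, assuming every piece is a plain integer, is:

```python
import pandas as pd

data = [{'a': 1, 'b': '[2,3,4,5,6'}, {'a': 10, 'b': '[54,3,40,5'}]
test = pd.DataFrame(data)

# Drop the stray leading "[", split on commas, then convert each piece to int
test['b'] = (
    test['b']
    .str.lstrip('[')
    .str.split(',')
    .apply(lambda parts: [int(p) for p in parts])
)
print(test)
```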
Try this:
def func(col):
    return eval(col + ']')
test['b'] = test['b'].apply(func)
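Because eval executes whatever it is given, the same idea can be written with ast.literal_eval, which only accepts literals. This is a sketch assuming each string is a list that is missing only its closing bracket; close_and_parse is a hypothetical helper name.

```python
import ast

import pandas as pd

data = [{'a': 1, 'b': '[2,3,4,5,6'}, {'a': 10, 'b': '[54,3,40,5'}]
test = pd.DataFrame(data)


def close_and_parse(s):
    # Append the missing "]" and parse it as a literal list (no code is executed)
    return ast.literal_eval(s + ']')


test['b'] = test['b'].apply(close_and_parse)
print(test)
```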
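import pandas as pd
data = [{'a': 1, 'b': '[2,3,4,5,6' }, {'a': 10, 'b': '[54,3,40,5'}]
test = pd.DataFrame(data)
print(test['b'][0][1:])
# Strip the leading "[" from every row in one vectorized step; assigning
# through test['b'][i] is chained assignment and does not reliably write back
test['b'] = test['b'].str[1:]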

Put array into DataFrame as single element

Guys,
I have a dict like this:
import numpy as np
dic = {}
dic['A'] = 1
dic['B'] = np.array([1,2,3])
dic['C'] = np.array([1,2,3,4])
dic['D'] = np.array([6,7])
Then I tried to put them into a DataFrame (I may also insert more rows later, and the array length for each element can vary). For some reasons I want to keep each array as a single object in its column, so that when printed it looks like:
A B C D
1 [1,2,3] [1,2,3,4] [6,7]
......
[2,3] [7,8] [5,6,7,2] 4
When I try to do this with:
pd.DataFrame.from_dict(dic)
I always get the error : ValueError: arrays must all be same length
Is there any way to keep each entire array as a single element, given that sometimes I also have single values?
I am not sure why you need the input as a dictionary, but if you pass the elements as a list of NumPy arrays, pandas fills the missing values with NaN:
pd.DataFrame([np.array([1,2,3]),np.array([1,2,3,4]),np.array([6,7])],columns=['A','B','C','D'])
Output:-
A B C D
0 1 2 3.0 NaN
1 1 2 3.0 4.0
2 6 7 NaN NaN
IIUC this should work
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [1, np.array([2, 3])],
                   "B": [np.array([1, 2, 3]), np.array([7, 8])],
                   "C": [np.array([1, 2, 3, 4]), np.array([5, 6, 7, 2])],
                   "D": [np.array([6, 7]), 4]})
So df.to_dict() returns
{'A': {0: 1, 1: array([2, 3])},
'B': {0: array([1, 2, 3]), 1: array([7, 8])},
'C': {0: array([1, 2, 3, 4]), 1: array([5, 6, 7, 2])},
'D': {0: array([6, 7]), 1: 4}}
UPDATE
If you want to save to a file, you should consider using lists instead of NumPy arrays and writing with a ';' separator (to_csv(..., sep=';')), or convert the arrays to strings if you want to maintain this shape.
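Another way to keep each array as a single cell is to pass a list of dicts instead of a single dict. This is a sketch under the assumption that every dict describes one row: pandas then treats each dict as a row and stores arrays (or scalars) as individual objects, so arrays of different lengths are fine. New rows are appended here with pd.concat.

```python
import numpy as np
import pandas as pd

dic = {'A': 1,
       'B': np.array([1, 2, 3]),
       'C': np.array([1, 2, 3, 4]),
       'D': np.array([6, 7])}

# A list of dicts means "one dict per row", so nothing is broadcast or expanded
df = pd.DataFrame([dic])

# Append a later row whose values have different shapes
new_row = {'A': np.array([2, 3]),
           'B': np.array([7, 8]),
           'C': np.array([5, 6, 7, 2]),
           'D': 4}
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```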

Prevent Pandas from unpacking a tuple when creating a dataframe from dict

When creating a DataFrame in Pandas from a dictionary, a tuple is automatically expanded, i.e.
import pandas
d = {'a': 1, 'b': 2, 'c': (3,4)}
df = pandas.DataFrame.from_dict(d)
print(df)
returns
a b c
0 1 2 3
1 1 2 4
Apart from converting the tuple to string first, is there any way to prevent this from happening? I would want the result to be
a b c
0 1 2 (3, 4)
Try adding [], so that the value for key c is a list containing the tuple:
import pandas
d = {'a': 1, 'b': 2, 'c': [(3,4)]}
df = pandas.DataFrame.from_dict(d)
print(df)
a b c
0 1 2 (3, 4)
Pass param orient='index' and transpose the result so it doesn't broadcast the scalar values:
In [13]:
d = {'a': 1, 'b': 2, 'c': (3,4)}
df = pd.DataFrame.from_dict(d, orient='index').T
df
Out[13]:
a c b
0 1 (3, 4) 2
To handle the situation where the first dict entry is a tuple, you'd need to wrap every dict value in a list so that each one is iterable:
In [20]:
d = {'a': (5,6), 'b': 2, 'c': 1}
d1 = dict(zip(d.keys(), [[x] for x in d.values()]))
pd.DataFrame.from_dict(d1, orient='index').T
Out[23]:
a b c
0 (5, 6) 2 1
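A shorter alternative, not part of the original answers: wrap the whole dict in a list. A list of dicts is read as one dict per row, so in this sketch the tuple is stored as a single value regardless of its position in the dict.

```python
import pandas as pd

d = {'a': (5, 6), 'b': 2, 'c': 1}

# One dict -> one row; the tuple is kept intact as a cell value
df = pd.DataFrame([d])
print(df)
```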

Get a Dictionary by applying function to pandas Series

I would like to apply a function to a dataframe and receive a single dictionary as a result. pandas.apply gives me a Series of dicts, and so currently I have to combine keys from each. I'll use an example to illustrate.
I have a pandas dataframe like so.
In [20]: df
Out[20]:
0 1
0 2.025745 a
1 -1.840914 b
2 -0.428811 c
3 0.718237 d
4 0.079593 e
I have some function that returns a dictionary. For this example I'm using a toy lambda function lambda x: {x: ord(x)} that returns a dictionary.
In [22]: what_i_get = df[1].apply(lambda x: {x: ord(x)})
In [23]: what_i_get
Out[23]:
0 {'a': 97}
1 {'b': 98}
2 {'c': 99}
3 {'d': 100}
4 {'e': 101}
Name: 1
apply() gives me a series of dictionaries, but what I want is a single dictionary.
I could create it with something like this:
In [41]: what_i_want = {}
In [42]: for elem in what_i_get:
....:     for k, v in elem.items():
....:         what_i_want[k] = v
....:
In [43]: what_i_want
Out[43]: {'a': 97, 'b': 98, 'c': 99, 'd': 100, 'e': 101}
But it seems I should be able to get what I want more directly.
Instead of returning a dict from your function, just return the mapped value, then create one dict outside the mapping operation:
>>> d
Stuff
0 a
1 b
2 c
3 d
>>> dict(zip(d.Stuff, d.Stuff.map(ord)))
{'a': 97, 'b': 98, 'c': 99, 'd': 100}
Cutting out the items() middle-man:
what_i_want = {}
for elem in what_i_get:
    what_i_want.update(elem)
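The same merge can also be written as a single dict comprehension over the Series of dicts. This sketch rebuilds only the string column of the question's toy data, since the float column plays no part here.

```python
import pandas as pd

df = pd.DataFrame({1: ['a', 'b', 'c', 'd', 'e']})
what_i_get = df[1].apply(lambda x: {x: ord(x)})

# Flatten every small dict into one; later keys win on collisions, like update()
what_i_want = {k: v for elem in what_i_get for k, v in elem.items()}
print(what_i_want)
```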
