DataFrame from list of string dicts with array() values

So I have a list where each entry looks something like this:
"{'A': array([1]), 'B': array([2]), 'C': array([3])}"
I am trying to get a DataFrame that looks like this:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
But I'm having trouble converting these entries into something pandas can read into a DataFrame. pandas turns a list of dicts into a DataFrame automatically, but my list elements are strings (they are surrounded by quotes), so it treats each one as a single value and gives me:
0
0 {'A': array([1]), 'B': array([2]), 'C': array([3])}
...
I originally asked this question with an oversimplified example dict, {'A': 1, 'B': 2, 'C': 3}, for which methods such as ast.literal_eval and eval work. With arrays as values, however, eval fails with NameError: name 'array' is not defined, and ast.literal_eval rejects the array(...) calls as well.

Assuming those really are arrays of length 1, this hackery should do the job:
import ast
import pandas as pd
data = [
    "{'A': array([1]), 'B': array([2]), 'C': array([3])}",
    "{'A': array([4]), 'B': array([5]), 'C': array([6])}",
    "{'A': array([7]), 'B': array([8]), 'C': array([9])}"
]
# Strip the "array([" and "])" wrappers so each string becomes a plain dict literal
data = [ast.literal_eval(d.replace('array([','').replace('])','')) for d in data]
a = pd.DataFrame(data)
print(a)
Output:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
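The replace() hack above removes the array wrapper entirely, so it only works when every array has exactly one element. A slightly more general sketch, assuming the strings were produced by printing dicts of short 1-D NumPy arrays (no dtype=..., no nested arrays, no "..." truncation in the repr), rewrites each array([...]) into a plain list literal with a regular expression before calling ast.literal_eval. The names parse_row and ARRAY_RE are hypothetical, chosen for illustration.

```python
import ast
import re

import pandas as pd

data = [
    "{'A': array([1]), 'B': array([2]), 'C': array([3])}",
    "{'A': array([4]), 'B': array([5]), 'C': array([6])}",
    "{'A': array([7]), 'B': array([8]), 'C': array([9])}",
]

# Hypothetical pattern: matches array([...]) for flat 1-D arrays and
# captures the bracketed list part, e.g. "[1, 2]".
ARRAY_RE = re.compile(r"array\((\[[^\[\]]*\])\)")


def parse_row(s):
    """Hypothetical helper: parse one printed dict of 1-D arrays."""
    # Turn "array([1, 2])" into the list literal "[1, 2]" so literal_eval accepts it.
    as_literal = ARRAY_RE.sub(r"\1", s)
    row = ast.literal_eval(as_literal)
    # Unwrap single-element lists into plain scalars; keep longer lists as-is.
    return {k: v[0] if isinstance(v, list) and len(v) == 1 else v
            for k, v in row.items()}


df = pd.DataFrame([parse_row(s) for s in data])
print(df)
```

With the sample data this yields the same A/B/C frame as above; an array with more than one element would end up as a Python list inside its cell.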

Related

Is there a way to store a dictionary on each row of a dataframe column using a vectorized operation?

I am attempting to nest a dictionary inside a DataFrame.
Here's an example of what I have:
x y z
1 2 3
4 5 6
7 8 9
Here's an example of what I want:
x y z
1 2 {'z':3}
4 5 {'z':6}
7 8 {'z':9}
For this application, the whole point of using pandas is its scalable, efficient vectorized operations. Is it possible to transform that column into a column of dictionaries? I tried string concatenation, but then pandas stores the value as a string rather than a dict, so it later comes back wrapped in quotation marks.

Example
import pandas as pd
data = {'x': {0: 1, 1: 4, 2: 7}, 'y': {0: 2, 1: 5, 2: 8}, 'z': {0: 3, 1: 6, 2: 9}}
df = pd.DataFrame(data)
Code
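# Transposing makes each row label a column, so to_dict() yields {0: {'z': 3}, 1: {'z': 6}, 2: {'z': 9}}
df['z'] = pd.Series(df[['z']].T.to_dict())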
df
x y z
0 1 2 {'z': 3}
1 4 5 {'z': 6}
2 7 8 {'z': 9}
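A variant that skips the transpose is DataFrame.to_dict with orient='records', which returns one {'z': value} dict per row in row order. This is a sketch, not part of the original answer; note that either way each cell holds an ordinary Python dict in an object-dtype column, so the dicts are built row by row rather than by a NumPy-style vectorized operation.

```python
import pandas as pd

data = {'x': {0: 1, 1: 4, 2: 7}, 'y': {0: 2, 1: 5, 2: 8}, 'z': {0: 3, 1: 6, 2: 9}}
df = pd.DataFrame(data)

# to_dict('records') gives [{'z': 3}, {'z': 6}, {'z': 9}], which can be
# assigned straight back as an object-dtype column.
df['z'] = df[['z']].to_dict('records')
print(df)
```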

DataFrame from list of string dicts

So I have a list where each entry looks something like this:
"{'A': 1, 'B': 2, 'C': 3}"
I am trying to get a DataFrame that looks like this:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
But I'm having trouble converting these entries into something pandas can read into a DataFrame. pandas turns a list of dicts into a DataFrame automatically, but my list elements are strings (they are surrounded by quotes), so it treats each one as a single value and gives me:
0
0 {'A': 1, 'B': 2, 'C': 3}
...
I've tried using json, concatenating a list of DataFrames, and so on, but to no avail.

eval is not safe: it runs whatever Python code the strings contain.
Instead use ast.literal_eval, which only accepts Python literals:
Assuming this to be your list:
In [572]: l = ["{'A': 1, 'B': 2, 'C': 3}", "{'A': 4, 'B': 5, 'C': 6}"]
In [584]: import ast
In [587]: df = pd.DataFrame([ast.literal_eval(i) for i in l])
In [588]: df
Out[588]:
A B C
0 1 2 3
1 4 5 6
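To see the difference, here is a small sketch: ast.literal_eval parses dict, list, number and string literals, but raises ValueError for anything else, such as a function call. The malicious string is a harmless stand-in chosen for illustration; eval would actually execute it.

```python
import ast

safe = "{'A': 1, 'B': 2, 'C': 3}"
malicious = "__import__('os').getcwd()"  # a function call, not a literal

print(ast.literal_eval(safe))  # {'A': 1, 'B': 2, 'C': 3}

try:
    ast.literal_eval(malicious)
except ValueError as exc:
    # literal_eval refuses anything that is not a plain literal
    print('rejected:', exc)
```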

I agree with SomeDude that eval will work like this:
pd.DataFrame([eval(s) for s in l])
BUT, if any user-entered data goes into these strings, you should never use eval. Instead, you can convert the single quotes to double quotes and parse the result with json.loads from the json package, which is much safer:
json.loads('{"A": 1, "B": 2, "C": 3}')
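Applied to the whole list, the json approach might look like the following sketch. The plain replace("'", '"') is an assumption that only holds for simple data: a string value containing an apostrophe, or Python-only literals such as True or None, would make json.loads fail.

```python
import json

import pandas as pd

l = ["{'A': 1, 'B': 2, 'C': 3}", "{'A': 4, 'B': 5, 'C': 6}"]

# Naive quote swap: valid only when no key or value itself contains a quote character.
rows = [json.loads(s.replace("'", '"')) for s in l]
df = pd.DataFrame(rows)
print(df)
```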

Use eval before reading the strings into a DataFrame:
pd.DataFrame([eval(s) for s in l])
Or, better, use ast.literal_eval as @Mayank Porwal's answer says.
Or use json.loads, but only after making sure it's valid JSON.

Take your list of strings, turn it into a list of dictionaries, then construct the data frame using pd.DataFrame.from_records.
>>> l = ["{'A': 1, 'B': 2, 'C': 3}"]
>>> pd.DataFrame.from_records(eval(s) for s in l)
A B C
0 1 2 3
Check that your input data doesn't include Python code, however, because eval will execute any Python expression it is given. Using it on web-facing input or anything similar would be a severe security flaw.

You can try:
import pandas as pd
lst = ["{'A': 1, 'B': 2, 'C': 3}", "{'A': 1, 'B': 2, 'C': 3}"]
df = pd.DataFrame(map(eval, lst))
# or
df = pd.DataFrame(lst)[0].apply(eval).apply(pd.Series)
# or
df = pd.DataFrame(lst)[0].apply(lambda x: pd.Series(eval(x)))
print(df)
A B C
0 1 2 3
1 1 2 3

Put array into DataFrame as single element

I have a dict like this:
import numpy as np
dic = {}
dic['A'] = 1
dic['B'] = np.array([1,2,3])
dic['C'] = np.array([1,2,3,4])
dic['D'] = np.array([6,7])
Then I tried to put them into a DataFrame (I may also insert more rows later, and the array length for each element may vary). For various reasons I want to keep each array as a single object in its cell, so that when printed it looks like:
A B C D
1 [1,2,3] [1,2,3,4] [6,7]
......
[2,3] [7,8] [5,6,7,2] 4
When I try to do this with:
pd.DataFrame.from_dict(dic)
I always get the error: ValueError: arrays must all be same length
Is there any way to keep each entire array as a single element, given that sometimes I also have single values?

I am not sure why you require the input as a dictionary, but if you pass the elements as a list of NumPy arrays, pandas spreads each array across a row and fills the missing positions with NaN:
pd.DataFrame([np.array([1,2,3]),np.array([1,2,3,4]),np.array([6,7])],columns=['A','B','C','D'])
Output:
A B C D
0 1 2 3.0 NaN
1 1 2 3.0 4.0
2 6 7 NaN NaN

IIUC this should work
import pandas as pd
import numpy as np
df = pd.DataFrame({"A":[1, np.array([2,3])],
"B":[np.array([1,2,3]), np.array([7,8])],
"C":[np.array([1,2,3,4]), np.array([5,6,7,2])],
"D":[np.array([6,7]), 4]})
So df.to_dict() returns
{'A': {0: 1, 1: array([2, 3])},
'B': {0: array([1, 2, 3]), 1: array([7, 8])},
'C': {0: array([1, 2, 3, 4]), 1: array([5, 6, 7, 2])},
'D': {0: array([6, 7]), 1: 4}}
UPDATE
If you want to save to a file, you should consider using lists instead of NumPy arrays, and a delimiter such as ';' (e.g. sep=';' with DataFrame.to_csv).

Convert the arrays to strings if you want to maintain this shape.
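One more option, offered as a sketch rather than as part of the answers above: wrap the dict in a list. pandas then reads the list row-wise, each dict becoming one row, so the arrays are stored as single cells instead of being broadcast and the length mismatch no longer matters. Growing the frame with pd.concat is an assumption about how rows would be added later; the second row reuses the values from the question's desired output.

```python
import numpy as np
import pandas as pd

dic = {'A': 1,
       'B': np.array([1, 2, 3]),
       'C': np.array([1, 2, 3, 4]),
       'D': np.array([6, 7])}

# A list of dicts is read row-wise: each dict becomes one row, and each value
# (scalar or array) is stored in a single cell.
df = pd.DataFrame([dic])

# Later rows can hold arrays of other lengths, or scalars, in any column.
more = {'A': np.array([2, 3]), 'B': np.array([7, 8]),
        'C': np.array([5, 6, 7, 2]), 'D': 4}
df = pd.concat([df, pd.DataFrame([more])], ignore_index=True)
print(df)
```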

Prevent Pandas from unpacking a tuple when creating a dataframe from dict

When creating a DataFrame in pandas from a dictionary, a tuple is automatically expanded, e.g.
import pandas
d = {'a': 1, 'b': 2, 'c': (3,4)}
df = pandas.DataFrame.from_dict(d)
print(df)
returns
a b c
0 1 2 3
1 1 2 4
Apart from converting the tuple to string first, is there any way to prevent this from happening? I would want the result to be
a b c
0 1 2 (3, 4)

Try adding [], so that the value for key c is a list containing the tuple:
import pandas
d = {'a': 1, 'b': 2, 'c': [(3,4)]}
df = pandas.DataFrame.from_dict(d)
print(df)
a b c
0 1 2 (3, 4)

Pass param orient='index' and transpose the result so it doesn't broadcast the scalar values:
In [13]:
d = {'a': 1, 'b': 2, 'c': (3,4)}
df = pd.DataFrame.from_dict(d, orient='index').T
df
Out[13]:
a c b
0 1 (3, 4) 2
To handle the situation where the first dict entry is a tuple, you'd need to wrap every dict value in a list so that each one is iterable:
In [20]:
d = {'a': (5,6), 'b': 2, 'c': 1}
d1 = dict(zip(d.keys(), [[x] for x in d.values()]))
pd.DataFrame.from_dict(d1, orient='index').T
Out[23]:
a b c
0 (5, 6) 2 1
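A further sketch, not taken from the answers above: passing the dict inside a list tells pandas that the dict is one row, which keeps the tuple in a single cell without transposing and without wrapping every value by hand. The column order follows the dict's insertion order.

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': (3, 4)}
# Wrapping the dict in a list makes it a single row, so the tuple stays whole.
df = pd.DataFrame([d])
print(df)

# The same works when the tuple is the first value.
d2 = {'a': (5, 6), 'b': 2, 'c': 1}
df2 = pd.DataFrame([d2])
print(df2)
```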

Get a Dictionary by applying function to pandas Series

I would like to apply a function to a DataFrame and receive a single dictionary as a result. Series.apply gives me a Series of dicts, so currently I have to combine the keys from each one. I'll use an example to illustrate.
I have a pandas dataframe like so.
In [20]: df
Out[20]:
0 1
0 2.025745 a
1 -1.840914 b
2 -0.428811 c
3 0.718237 d
4 0.079593 e
I have some function that returns a dictionary. For this example I'm using a toy lambda function lambda x: {x: ord(x)} that returns a dictionary.
In [22]: what_i_get = df[1].apply(lambda x: {x: ord(x)})
In [23]: what_i_get
Out[23]:
0 {'a': 97}
1 {'b': 98}
2 {'c': 99}
3 {'d': 100}
4 {'e': 101}
Name: 1
apply() gives me a Series of dictionaries, but what I want is a single dictionary.
I could create it with something like this:
In [41]: what_i_want = {}
In [42]: for elem in what_i_get:
   ....:     for k, v in elem.items():
   ....:         what_i_want[k] = v
....:
In [43]: what_i_want
Out[43]: {'a': 97, 'b': 98, 'c': 99, 'd': 100, 'e': 101}
But it seems I should be able to get what I want more directly.

Instead of returning a dict from your function, just return the mapped value, then create one dict outside the mapping operation:
>>> d
Stuff
0 a
1 b
2 c
3 d
>>> dict(zip(d.Stuff, d.Stuff.map(ord)))
{'a': 97, 'b': 98, 'c': 99, 'd': 100}

Cutting out the items() middle-man:
what_i_want = {}
for elem in what_i_get:
    what_i_want.update(elem)
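The same merge also fits in a single dict comprehension. This sketch rebuilds the question's DataFrame from the printed values and flattens the Series of dicts returned by apply(); like the update() loop, a key that appears in several dicts keeps its last value.

```python
import pandas as pd

df = pd.DataFrame({0: [2.025745, -1.840914, -0.428811, 0.718237, 0.079593],
                   1: ['a', 'b', 'c', 'd', 'e']})

# Series of one-entry dicts, as produced by apply() in the question
what_i_get = df[1].apply(lambda x: {x: ord(x)})

# Merge them into one dict; later duplicates would overwrite earlier keys
what_i_want = {k: v for elem in what_i_get for k, v in elem.items()}
print(what_i_want)  # {'a': 97, 'b': 98, 'c': 99, 'd': 100, 'e': 101}
```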
