List of dictionaries column in a dataframe - Python

A column in my dataframe is a list of dictionaries. How can I filter the rows that have a specific value for the id key in the tag column? For instance, the rows that contain {"id" : 18}.

Since your tag column is list-valued, you can use explode if you are on pandas 0.25+:
import pandas as pd

# toy data
df = pd.DataFrame({'type': ['df', 'fg', 'ff'],
                   'tag': [[{'id': 12}, {'id': 13}],
                           [{'id': 12}],
                           [{'id': 10}]]})
# explode so each row contains exactly one dict: {'id': val}
s = df['tag'].explode()
# the indexes of the rows of interest
idx = s.index[pd.DataFrame(s.to_list())['id'].values == 12]
# output
df.loc[idx]
Output:
type tag
0 df [{'id': 12}, {'id': 13}]
1 fg [{'id': 12}]
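If you are on an older pandas without explode, a minimal alternative sketch is to build a boolean mask directly over the lists (assuming every tag cell is a list of dicts):

```python
import pandas as pd

# same toy data as above
df = pd.DataFrame({'type': ['df', 'fg', 'ff'],
                   'tag': [[{'id': 12}, {'id': 13}],
                           [{'id': 12}],
                           [{'id': 10}]]})

# True where any dict in the list carries id == 12
mask = df['tag'].apply(lambda tags: any(d.get('id') == 12 for d in tags))
df[mask]
```

Using .get rather than d['id'] keeps the mask robust if some dicts lack the id key.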

Sample DataFrame
df = pd.DataFrame({'type': ['dg', 'fg', 'ff'],
                   'tag': [[{'id': 12}, {'id': 13}], [{'id': 12}], [{'id': 29}]]})
print(df)
type tag
0 dg [{'id': 12}, {'id': 13}]
1 fg [{'id': 12}]
2 ff [{'id': 29}]
Then you can use Series.apply to check each cell:
df_filtered = df[df['tag'].apply(lambda x: pd.Series([d['id'] for d in x]).eq(12).any())]
print(df_filtered)
type tag
0 dg [{'id': 12}, {'id': 13}]
1 fg [{'id': 12}]

Related

Extract value from JSON data of a pandas column with condition apply

I need to extract a value from a JSON string stored in a pandas column and assign it to a column, applying it conditionally to rows with null values only.
df = pd.DataFrame({'col1': ['06010', np.nan, '06020', np.nan],
                   'json_col': [{'Id': '060',
                                 'Date': '20210908',
                                 'value': {'Id': '060',
                                           'Code': '06037'}},
                                {'Id': '061',
                                 'Date': '20210908',
                                 'value': {'Id': '060',
                                           'Code': '06038'}},
                                {'Id': '062',
                                 'Date': '20210908',
                                 'value': {'Id': '060',
                                           'Code': '06039'}},
                                {'Id': '063',
                                 'Date': '20210908',
                                 'value': {'Id': '060',
                                           'Code': '06040'}}],
                   })
# Check for the null condition and extract Code from the JSON (my attempt, which does not work)
df['col1'] = df[df['col1'].isnull()].apply(lambda x: [x['json_col'][i]['value']['Code'] for i in x])
Expected result:
Col1
06010
06038
06020
06040
To extract a field from a dictionary column, you can use the .str accessor. For instance, to extract json_col -> value -> Code you can use df.json_col.str['value'].str['Code']. Then use fillna to replace the NaN values in col1:
df.col1.fillna(df.json_col.str['value'].str['Code'])
0 06010
1 06038
2 06020
3 06040
Name: col1, dtype: object
Try with this:
>>> df['col1'].fillna(df['json_col'].map(lambda x: x['value']['Code']))
0 06010
1 06038
2 06020
3 06040
Name: col1, dtype: object
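If some rows might be missing the nested keys, a defensive sketch of the same idea uses dict.get, so absent keys yield None instead of raising (column names as in the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['06010', np.nan, '06020', np.nan],
    'json_col': [
        {'Id': '060', 'value': {'Id': '060', 'Code': '06037'}},
        {'Id': '061', 'value': {'Id': '060', 'Code': '06038'}},
        {'Id': '062', 'value': {'Id': '060', 'Code': '06039'}},
        {'Id': '063', 'value': {'Id': '060', 'Code': '06040'}},
    ],
})

# chained .get calls tolerate a missing 'value' or 'Code' key
codes = df['json_col'].map(lambda d: (d or {}).get('value', {}).get('Code'))
df['col1'] = df['col1'].fillna(codes)
```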

Group two columns into a dict: 1st column values as keys, 2nd column values as values

I have dataframe:
df = pd.DataFrame([[1, 'length', 1],
                   [1, 'diameter', 40],
                   [2, 'length', 5],
                   [2, 'diameter', 100]],
                  columns=['no.', 'property', 'value'])
Or:
no.  property    value
1    'length'    1
1    'diameter'  40
2    'length'    5
2    'diameter'  100
And I'm trying to convert it to a dataframe like this (the first column must be the index):
no.  property
1    {'length': 1, 'diameter': 40}
2    {'length': 5, 'diameter': 100}
Group by the no. column and create the records inside a dict comprehension:
{k: {'property': dict(g.values)} for k,g in df.set_index('no.').groupby(level=0)}
{1: {'property': {'length': 1, 'diameter': 40}},
2: {'property': {'length': 5, 'diameter': 100}}}
If you want the output in dataframe format:
df.set_index('no.').groupby(level=0)\
.apply(lambda g: dict(g.values)).reset_index(name='property')
no. property
0 1 {'length': 1, 'diameter': 40}
1 2 {'length': 5, 'diameter': 100}
You can set_index to property and use groupby.agg on no., collecting value into a dict, to get the inner dicts:
new_df = (
    df.set_index('property')
      .groupby('no.')
      .agg(property=('value', lambda s: s.to_dict()))
)
new_df:
property
no.
1 {'length': 1, 'diameter': 40}
2 {'length': 5, 'diameter': 100}
Then to_dict on the DataFrame to get the final output
d = new_df.to_dict()
d:
{1: {'property': {'length': 1, 'diameter': 40}},
2: {'property': {'length': 5, 'diameter': 100}}}
You can pivot the dataframe, drop the extra column level, create a dictionary for each row, and finally call to_dict():
reshaped = df.pivot(index=['no.'], columns=['property'], values=['value'])
reshaped.columns = reshaped.columns.droplevel()
reshaped.apply(lambda x: {'property': dict(x)}, axis=1).to_dict()
OUTPUT:
{1:
{'property':
{'diameter': 40,
'length': 1}},
2: {'property':
{'diameter': 100,
'length': 5}}
}
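As a further sketch, the same nested output can be produced without pivot at all, by zipping the two columns inside each group:

```python
import pandas as pd

df = pd.DataFrame([[1, 'length', 1],
                   [1, 'diameter', 40],
                   [2, 'length', 5],
                   [2, 'diameter', 100]],
                  columns=['no.', 'property', 'value'])

# one inner dict per group: property names zipped with their values
result = {no: {'property': dict(zip(g['property'], g['value']))}
          for no, g in df.groupby('no.')}
```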

Extract dictionary value from a list with one element in a pandas column

I have a pandas dataframe with a column which is a list containing a single dictionary.
For example:
col1
[{'type': 'yellow', 'id': 2, ...}]
[{'type': 'brown', 'id': 13, ...}]
...
I need to extract the value associated with the 'type' key. There are different ways to do it, but since my dataframe is huge (several million rows) I need an efficient way, and I am not sure which method is best.
Let us try this:
import numpy as np
import pandas as pd

data = {
    'col': [[{'type': 'yellow', 'id': 2}], [{'type': 'brown', 'id': 13}], np.nan]
}
df = pd.DataFrame(data)
print(df)
col
0 [{'type': 'yellow', 'id': 2}]
1 [{'type': 'brown', 'id': 13}]
2 NaN
Use explode and str accessor:
df['result'] = df.col.explode().str['type']
output:
col result
0 [{'type': 'yellow', 'id': 2}] yellow
1 [{'type': 'brown', 'id': 13}] brown
2 NaN NaN
Accessing an element by position in a list or by key in a dict is an O(1) operation, and a pandas frame is no different. The real cost is looping through the rows in Python; for an object column there is probably no way around it.
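Since the per-row loop is unavoidable, a plain list comprehension over the column is often one of the faster options for object dtypes; a sketch that also tolerates NaN cells via an isinstance guard:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col': [[{'type': 'yellow', 'id': 2}], [{'type': 'brown', 'id': 13}], np.nan]
})

# NaN cells are floats, not lists, so the guard skips them
df['result'] = [x[0]['type'] if isinstance(x, list) else None
                for x in df['col']]
```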

Find cells with a specific value and replace their value

Using pandas I have created a CSV file containing two columns and saved my data into these columns, something like this:
first second
{'value': 2} {'name': 'f'}
{'value': 2} {'name': 'h'}
{"value": {"data": {"n": 2, "m":"f"}}} {'name': 'h'}
...
Is there any way to look for all the rows whose first column contains "data" and, if any, keep only its value in that cell? I mean, is it possible to change my third row from:
{"value": {"data": {"n": 2, "m":"f"}}} {'name': 'h'}
to something like this:
{"data": {"n": 2, "m":"f"}} {'name': 'h'}
and delete or replace the values of all other cells that do not contain data with something like -?
So my csv file will look like this:
first second
- {'name': 'f'}
- {'name': 'h'}
{"data": {"n": 2, "m":"f"}} {'name': 'h'}
...
Here is my code:
import json
import pandas as pd

result = []
for line in open('file.json', 'r'):
    result.append(json.loads(line))

df = pd.DataFrame(result)
print(df)
df.to_csv('document.csv')

f = pd.read_csv("document.csv")
keep_col = ['first', 'second']
new_f = f[keep_col]
new_f.to_csv("newFile.csv", index=False)
Here is my short example:
df = pd.DataFrame({
    'first': [{'value': 2}, {'value': 2}, {'value': {'data': {'n': 2, 'm': 'f'}}}],
    'second': [{'name': 'f'}, {'name': 'h'}, {'name': 'h'}]
})
a = pd.DataFrame(df["first"].tolist())
a[~a["value"].str.contains("data", regex=False).fillna(False)] = "-"
df["first"] = a.value
The first step is to lift the 'value' field of each dict into its own DataFrame column.
If an entry in that new column contains "data" (as a key), the mask is True; numeric entries yield NaN, which is replaced with False. The whole mask is then negated and the non-matching rows are overwritten with "-".
The last step is to write the column back into the original DataFrame.
Something like this might work:
import numpy as np
import pandas as pd

first = [{'value': 2}, {'value': 2}, {"value": {"data": {"n": 2, "m": "f"}}}, {"data": {"n": 2, "m": "f"}}]
second = [{'name': 'f'}, {'name': 'h'}, {'name': 'h'}, {'name': 'h'}]
df = pd.DataFrame({'first': first, 'second': second})

f = lambda x: x.get('value', x) if isinstance(x, dict) else np.nan
df['first'] = df['first'].apply(f)
# use .loc to avoid chained-assignment warnings
df.loc[~df['first'].str.contains("data", regex=False).fillna(False), 'first'] = "-"
print(df)
first second
0 - {'name': 'f'}
1 - {'name': 'h'}
2 {'data': {'n': 2, 'm': 'f'}} {'name': 'h'}
3 {'data': {'n': 2, 'm': 'f'}} {'name': 'h'}

Pandas groupby function with dict.update()

I'm trying to use the pandas groupby function and apply dict.update() to each element. An example in a data frame (just for illustration):
A B
0 icon1 {'ap1': {'item' : 1}, 'ap2': {'item' : 2}}
1 icon1 {'ap3': {'item' : 3}}
What I'm trying to do is set something like
df = df.groupby('A')['B'].apply(', '.join).reset_index()
But instead of using Python's ', '.join, I need to group by the 'A' column and merge each dict in the 'B' column into one. I've tried using the map function, but I was not able to achieve anything useful.
The outcome should be:
A B
0 icon1 {'ap1': {'item' : 1}, 'ap2': {'item' : 2}, 'ap3': {'item' : 3}}
Is that even possible without changing the item type from dict?
Using a dict comprehension:
df.groupby('A').B.agg(lambda s: {k: v for a in s for k, v in a.items()}).reset_index()
A B
0 icon1 {'ap1': {'item' : 1}, 'ap2': {'item' : 2}, 'ap3': {'item' : 3}}
toolz.dicttoolz.merge
from toolz.dicttoolz import merge
df.groupby('A')['B'].agg(merge).reset_index()
A B
0 icon1 {'ap1': {'item': 1}, 'ap2': {'item': 2}, 'ap3'...
1 icon2 {'ap1': {'item': 1}, 'ap2': {'item': 2}, 'ap3'...
Setup
df = pd.DataFrame(dict(
A=['icon1', 'icon1', 'icon2', 'icon2'],
B=[{'ap1': {'item': 1}, 'ap2': {'item': 2}}, {'ap3': {'item': 3}}] * 2
))
You can use a helper function:
def func(x):
    dct = {}
    for i in x:
        dct.update(i)
    return dct

df.groupby('A')['B'].agg(func).reset_index()
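To illustrate, the helper applied to a small frame of the same shape (a self-contained sketch):

```python
import pandas as pd

def func(x):
    dct = {}
    for i in x:
        dct.update(i)   # later dicts win on duplicate keys
    return dct

df = pd.DataFrame({
    'A': ['icon1', 'icon1'],
    'B': [{'ap1': {'item': 1}, 'ap2': {'item': 2}}, {'ap3': {'item': 3}}],
})

out = df.groupby('A')['B'].agg(func).reset_index()
```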
