How to save data from dataframe to json file? - python

How can I convert a DataFrame to a JSON file in Python using pandas?
I don't know how to include the name and carmodel columns; I only managed to get the price out of the DataFrame.
I have a DataFrame like:
name carmodel
ACURA CL 6.806155e+08
OTHER 2.280000e+08
EL 1.300000e+08
MDX 7.828750e+08
RDX 3.850000e+08
...
VOLVO XC90 3.748778e+09
ZOTYE OTHER 1.887500e+08
HUNTER 1.390000e+08
T600 4.200000e+08
Z8 4.754000e+08
so I want to convert it like:
{
'ACURA':
{
'CL':6.806155e+08,
'OTHER':2.280000e+08,
'MDX':7.828750e+08,
},
'VOLVO':
{
'XC90':3.748778e+09,
},
'ZOTYE':
{
'OTHER':1.887500e+08,
'HUNTER':1.390000e+08,
'T600':4.200000e+08,
}
}
Here is my code:
df = pd.read_csv(r'D:\vucar\scraper\result.csv',dtype='unicode')
df['price'] = pd.to_numeric(df['price'],downcast='float')
cars = df[['name','carmodel','price']].sort_values('name').groupby(['name','carmodel']).mean()['price']
print(cars)

You could reset the index on cars, then group by name again, mapping carmodel and price into a dictionary per group, then save as JSON:
cars = df[['name','carmodel','price']].sort_values('name').groupby(['name','carmodel']).mean().reset_index(level='carmodel')
cars.groupby('name').apply(lambda x: x.set_index('carmodel')['price'].to_dict()).to_json('filename.json', orient='index')
output:
{"ACURA":{"CL":680615500.0,"EL":130000000.0,"MDX":782875000.0,"OTHER":228000000.0,"RDX":385000000.0},"VOLVO":{"XC90":3748778000.0},"ZOTYE":{"HUNTER":139000000.0,"OTHER":188750000.0,"T600":420000000.0,"Z8":475400000.0}}
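For what it's worth, the same nesting can also be built directly from the grouped Series and dumped with the json module. A minimal sketch, with made-up sample rows standing in for the parsed result.csv:

```python
import json

import pandas as pd

# made-up sample rows standing in for the parsed result.csv
df = pd.DataFrame({
    "name": ["ACURA", "ACURA", "VOLVO"],
    "carmodel": ["CL", "MDX", "XC90"],
    "price": [6.806155e8, 7.82875e8, 3.748778e9],
})

# mean price per (name, carmodel), as in the question
means = df.groupby(["name", "carmodel"])["price"].mean()

# nest the MultiIndex Series into {name: {carmodel: price}}
nested = {
    name: grp.droplevel("name").to_dict()
    for name, grp in means.groupby(level="name")
}

with open("cars.json", "w") as f:
    json.dump(nested, f)
```

This avoids the intermediate reset_index at the cost of one dict comprehension.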

Related

Normalizing json using pandas with inconsistent nested lists/dictionaries

I've been using pandas' json_normalize for a bit but ran into a problem with a specific json file, similar to the one seen here: https://github.com/pandas-dev/pandas/issues/37783#issuecomment-1148052109
I'm trying to find a way to retrieve the data within the Ats -> Ats dict and return any null values (like the one seen in the ID:101 entry) as NaN values in the dataframe. Ignoring errors within the json_normalize call doesn't prevent the TypeError that stems from trying to iterate through a null value.
Any advice or methods to receive a valid dataframe out of data with this structure is greatly appreciated!
import json
import pandas as pd
data = """[
{
"ID": "100",
"Ats": {
"Ats": [
{
"Name": "At1",
"Desc": "Lazy At"
}
]
}
},
{
"ID": "101",
"Ats": null
}
]"""
data = json.loads(data)
df = pd.json_normalize(data, ["Ats", "Ats"], "ID", errors='ignore')
df.head()
TypeError: 'NoneType' object is not iterable
I tried to iterate through the Ats dictionary, which would work normally for the data with ID 100 but not with ID 101. I expected ignoring errors within the function to return a NaN value in a dataframe but instead received a TypeError for trying to iterate through a null value.
The desired output would look like this: Dataframe
This approach can be more efficient when it comes to dealing with large datasets.
import numpy as np

data = json.loads(data)
desired_data = list(
    map(lambda x: pd.json_normalize(x, ["Ats", "Ats"], "ID").to_dict(orient="records")[0]
        if x["Ats"] is not None
        else {"ID": x["ID"], "Name": np.nan, "Desc": np.nan}, data))
df = pd.DataFrame(desired_data)
Output:
Name Desc ID
0 At1 Lazy At 100
1 NaN NaN 101
You might want to consider this simple try and except approach when working with small datasets: whenever an error is raised, a placeholder row with NaN values is added instead.
Example:
data = json.loads(data)
rows = []
for item in data:
    try:
        rows.append(pd.json_normalize(item, ["Ats", "Ats"], "ID"))
    except TypeError:
        rows.append(pd.DataFrame([{"ID": item["ID"], "Name": np.nan, "Desc": np.nan}]))
df = pd.concat(rows, ignore_index=True)
print(df)
Output:
Name Desc ID
0 At1 Lazy At 100
1 NaN NaN 101
Maybe you can create a DataFrame from the data normally (without pd.json_normalize) and then transform it to the requested form afterwards:
import json
import pandas as pd
data = """\
[
{
"ID": "100",
"Ats": {
"Ats": [
{
"Name": "At1",
"Desc": "Lazy At"
}
]
}
},
{
"ID": "101",
"Ats": null
}
]"""
data = json.loads(data)
df = pd.DataFrame(data)
df["Ats"] = df["Ats"].str["Ats"]
df = df.explode("Ats")
df = pd.concat([df, df.pop("Ats").apply(pd.Series, dtype=object)], axis=1)
print(df)
Prints:
ID Name Desc
0 100 At1 Lazy At
1 101 NaN NaN

Pandas DataFrame - remove / replace dict values based on key

Say I have a DataFrame defined as:
df = {
"customer_name":"john",
"phone":{
"mobile":000,
"office":111
},
"mail":{
"office":"john#office.com",
"personal":"john#home.com",
"fax":"12345"
}
}
I want to somehow alter the value in column "mail" to remove the key "fax". Eg, the output DataFrame would be something like:
output_df = {
"customer_name":"john",
"phone":{
"mobile":000,
"office":111
},
"mail":{
"office":"john#office.com",
"personal":"john#home.com"
}
}
where the "fax" key-value pair has been deleted. I tried to use pandas.map with a dict in the lambda, but it does not work. One bad workaround I had was to normalize the dict, but this created unnecessary output columns, and I could not merge them back. E.g.:
df = pd.json_normalize(df)
Is there a better way for this?
You can use pop to remove an element with the given key from a dict.
import pandas as pd
df['mail'].pop('fax')
df = pd.json_normalize(df)
df
Output:
customer_name phone.mobile phone.office mail.office mail.personal
0 john 0 111 john#office.com john#home.com
Is there a reason you just don't access it directly and delete it?
Like this:
del df['mail']['fax']
print(df)
{'customer_name': 'john',
'phone': {'mobile': 0, 'office': 111},
'mail': {'office': 'john#office.com', 'personal': 'john#home.com'}}
This is the simplest technique to achieve your aim.
import pandas as pd
import numpy as np
df = {
"customer_name":"john",
"phone":{
"mobile":000,
"office":111
},
"mail":{
"office":"john#office.com",
"personal":"john#home.com",
"fax":"12345"
}
}
del df['mail']['fax']
df = pd.json_normalize(df)
df
Output:
customer_name phone.mobile phone.office mail.office mail.personal
0 john 0 111 john#office.com john#home.com
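If mail were an actual DataFrame column holding one dict per row (rather than the single record shown above), the same key removal could be done row-wise. A sketch under that assumption:

```python
import pandas as pd

# one-row frame whose "mail" column holds a dict per row
df = pd.DataFrame({
    "customer_name": ["john"],
    "mail": [{"office": "john#office.com", "personal": "john#home.com", "fax": "12345"}],
})

# rebuild each dict without the "fax" key (avoids mutating the original dicts)
df["mail"] = df["mail"].apply(lambda d: {k: v for k, v in d.items() if k != "fax"})
```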

Extract objects from nested json with Pandas

I have a nested json (like the one reported below) of translated labels, and I want to extract the leaves in separate json files, based on the languages key (it, en, etc).
I don't know at "compile time" the depth and the schema of the json, because there are a lot of files similar to the big nested one, but I know that I always have the following structure: key path/to/en/label and value content.
I tried using Pandas with the json_normalize function to flatten my json, and it works great, but afterwards I had trouble rebuilding the json schema. E.g., with the following json I get a 1x12 DataFrame, but I want a resulting DataFrame with shape 4x3, where the 4 rows are the different labels (index) and the 3 columns are the different languages.
import json
import os
import pathlib

import pandas as pd

def fix_df(df: pd.DataFrame):
    assert df.shape[0] == 1
    columns = df.columns
    columns_last_piece = [s.split("/")[-1] for s in columns]
    fixed_columns = [s.split(".")[1] for s in columns_last_piece]
    index = [".".join(elem.split(".")[2:]) for elem in columns_last_piece]
    return pd.DataFrame(df.values, index=index, columns=fixed_columns)

def main():
    path = pathlib.Path(os.getenv("FIXTURE_FLATTEN_PATH"))
    assert path.exists()
    json_dict = json.load(open(path, encoding="utf-8"))
    flattened_json = pd.json_normalize(json_dict)
    flattened_json_fixed = fix_df(flattened_json)
    # do something with flattened_json_fixed
Example of my_labels.json:
{
"dynamicData": {
"bff_scoring": {
"subCollection": {
"dynamicData/bff_scoring/translations": {
"it": {
"area_title.people": "PERSONE",
"area_title.planet": "PIANETA",
"area_title.prosperity": "PROSPERITÀ",
"area_title.principle-gov": "PRINCIPI DI GOVERNANCE"
},
"en": {
"area_title.people": "PEOPLE",
"area_title.planet": "PLANET",
"area_title.prosperity": "PROSPERITY",
"area_title.principle-gov": "PRINCIPLE OF GOVERNANCE"
},
"fr":{
"area_title.people": "PERSONNES",
"area_title.planet": "PLANÈTE",
"area_title.prosperity": "PROSPERITÉ",
"area_title.principle-gov": "PRINCIPES DE GOUVERNANCE"
}
}
}
}
}
}
Example of my_labels_it.json:
{
"area_title.people": "PERSONE",
"area_title.planet": "PIANETA",
"area_title.prosperity": "PROSPERITÀ",
"area_title.principle-gov": "PRINCIPI DI GOVERNANCE"
}
I finally managed to solve this problem.
First, I need to use the melt function.
>>> df = flattened_json.melt()
>>> df
variable value
0 dynamicData.bff_scoring.subCollection.dynamicD... PERSONE
1 dynamicData.bff_scoring.subCollection.dynamicD... PIANETA
2 dynamicData.bff_scoring.subCollection.dynamicD... PROSPERITÀ
3 dynamicData.bff_scoring.subCollection.dynamicD... PRINCIPI DI GOVERNANCE
...
From here, I can extract the fields I'm interested in with a regular expression. I tried using .str.extractall and explode, but I was greeted with an exception, so I resorted to using .str.extract twice.
>>> df2 = df.assign(language=df.variable.str.extract(r".*\.([a-z]{2})\.[\w\.-]+$"), label=df.variable.str.extract(r"(?<=\.[a-z]{2}\.)([\w\.-]+)$")).drop(columns="variable")
>>> df2
value language label
0 PERSONE it area_title.people
1 PIANETA it area_title.planet
2 PROSPERITÀ it area_title.prosperity
3 PRINCIPI DI GOVERNANCE it area_title.principle-gov
...
And then, with a pivot, I can have the dataframe with the desired schema.
>>> df3 = df2.pivot(index="label", columns="language", values="value")
>>> df3
language en ... it
label ...
area_title.people PEOPLE ... PERSONE
area_title.planet PLANET ... PIANETA
area_title.principle-gov PRINCIPLE OF GOVERNANCE ... PRINCIPI DI GOVERNANCE
area_title.prosperity PROSPERITY ... PROSPERITÀ
From this dataframe it is very simple to obtain the expected json.
>>> df3["it"].to_json(force_ascii=False)
'{"area_title.people":"PERSONE","area_title.planet":"PIANETA","area_title.principle-gov":"PRINCIPI DI GOVERNANCE","area_title.prosperity":"PROSPERITÀ"}'
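Building on the pivoted frame, each per-language file such as my_labels_it.json can then presumably be written in a loop over the columns. A sketch with a cut-down stand-in for df3:

```python
import pandas as pd

# cut-down version of the pivoted label x language frame built above
df3 = pd.DataFrame(
    {"en": ["PEOPLE", "PLANET"], "it": ["PERSONE", "PIANETA"]},
    index=["area_title.people", "area_title.planet"],
)

# one json file per language column, e.g. my_labels_it.json
for lang in df3.columns:
    df3[lang].to_json(f"my_labels_{lang}.json", force_ascii=False)
```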

Create json files from dataframe

I have a dataframe with 10 rows like this:
id  name       team
1   Employee1  Team1
2   Employee2  Team2
...
How can I generate 10 json files from the dataframe with python?
Here is the format of each json file:
{
"Company": "Company",
"id": "1",
"name": "Employee1",
"team": "Team1"
}
The field "Company": "Company" is the same in all json files.
The name of each json file is the name of the corresponding employee (e.g. Employee1.json).
I do not really like iterrows but as you need a file per row, I cannot imagine how to vectorize the operation:
for _, row in df.iterrows():
row['Company'] = 'Company'
row.to_json(row['name'] + '.json')
You could use apply in the following way:
df.apply(lambda x: x.to_json(), axis=1)
And inside to_json pass the employee name as the path; it is available to you in x.
Another approach is to iterate over the rows like:
for i in df.index:
df.loc[i].to_json("Employee{}.json".format(i))
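A variant of the iterrows idea that also injects the constant "Company" field and names each file after the employee, sketched with a two-row stand-in for the dataframe:

```python
import json

import pandas as pd

# two-row stand-in for the dataframe in the question
df = pd.DataFrame({
    "id": ["1", "2"],
    "name": ["Employee1", "Employee2"],
    "team": ["Team1", "Team2"],
})

for _, row in df.iterrows():
    # constant field first, then the row's own columns
    record = {"Company": "Company", **row.to_dict()}
    with open(f"{row['name']}.json", "w", encoding="utf-8") as f:
        json.dump(record, f, indent=4)
```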

Replace DataFrame column with nested dictionary value

I'm trying to replace the 'starters' column of this DataFrame
starters
roster_id
Bob 3086
Bob 1234
Cam 6130
... ...
with the player names from a large nested dict like this. The values in my 'starters' column are the keys.
{
"3086": {
"team": "NE",
"player_id":"3086",
"full_name": "tombrady",
},
"1234": {
"team": "SEA",
"player_id":"1234",
"full_name": "RussellWilson",
},
"6130": {
"team": "BUF",
"player_id":"6130",
"full_name": "DevinSingletary",
},
...
}
I tried using DataFrame.replace(dict) and DataFrame.map(dict), but that gives me back all the player info instead of just the name.
Is there a way to do this with a nested dict? Thanks.
Let df be the dataframe and d the dictionary; then you can use apply from pandas on axis=1 to change the column:
df.apply(lambda x: d[str(x.starters)]['full_name'], axis=1)
I am not sure if I understand your question correctly. Have you tried using dict['full_name'] instead of simply dict?
Try pd.concat with series.map:
>>> pd.concat([
df,
pd.DataFrame.from_records(
df.astype(str)
.starters
.map(dct)
.values
).set_index(df.index)
], axis=1)
starters team player_id full_name
roster_id
Bob 3086 NE 3086 tombrady
Bob 1234 SEA 1234 RussellWilson
Cam 6130 BUF 6130 DevinSingletary
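If only the names are wanted in place of the ids (without the extra columns), mapping the single column may be enough; players below is my stand-in name for the lookup dict:

```python
import pandas as pd

# players is my stand-in name for the big nested lookup dict
players = {
    "3086": {"team": "NE", "player_id": "3086", "full_name": "tombrady"},
    "1234": {"team": "SEA", "player_id": "1234", "full_name": "RussellWilson"},
}

df = pd.DataFrame({"starters": [3086, 1234]}, index=["Bob", "Bob"])
df.index.name = "roster_id"

# the dict keys are strings, so cast before mapping
df["starters"] = df["starters"].astype(str).map(lambda k: players[k]["full_name"])
```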