I created a table like the one below using a pandas pivot table.
print(pd_pivot_table)
category_id  name
3            name3     0.329204
24           name24    0.323727
31           name31    0.319526
19           name19    0.008992
23           name23    0.005897
I want to create JSON like the following from this pivot table, but I do not know how:
[
    {
        "category_id": 3,
        "name": "name3",
        "score": 0.329204
    },
    {
        "category_id": 24,
        "name": "name24",
        "score": 0.323727
    },
    {
        "category_id": 31,
        "name": "name31",
        "score": 0.319526
    },
    {
        "category_id": 19,
        "name": "name19",
        "score": 0.008992
    },
    {
        "category_id": 23,
        "name": "name23",
        "score": 0.005897
    }
]
In fact, I do not even know how to access the category_id and name values in the first place. The code below does not give the results I want:
for data in pd_pivot_table:
    print(data)                 # 0.329204
    print(data["category_id"])  # *** IndexError: invalid index to scalar variable.
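Iterating over a Series yields only the scalar values, which is why indexing into one of them raises IndexError. As a small illustration (not the full solution), Series.items() gives you the MultiIndex keys together with each value:

for (category_id, name), score in pd_pivot_table.items():
    print(category_id, name, score)  # e.g. 3 name3 0.329204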
You can use Series.reset_index first to turn the Series into a DataFrame, and then DataFrame.to_json:
print (df)
category_id  name
3            name3     0.329204
24           name24    0.323727
31           name31    0.319526
19           name19    0.008992
23           name23    0.005897
Name: score, dtype: float64
print (type(df))
<class 'pandas.core.series.Series'>
json = df.reset_index().to_json(orient='records')
print (json)
[{"category_id":3,"name":"name3","score":0.329204},
{"category_id":24,"name":"name24","score":0.323727},
{"category_id":31,"name":"name31","score":0.319526},
{"category_id":19,"name":"name19","score":0.008992},
{"category_id":23,"name":"name23","score":0.005897}]
If you need to write the output to a file:
df.reset_index().to_json('file.json', orient='records')
Details:
print (df.reset_index())
   category_id    name     score
0            3   name3  0.329204
1           24  name24  0.323727
2           31  name31  0.319526
3           19  name19  0.008992
4           23  name23  0.005897
print (type(df.reset_index()))
<class 'pandas.core.frame.DataFrame'>
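If you want the pretty-printed layout from the question rather than the compact string, one option (a small sketch building on the same reset_index idea) is to go through to_dict and the standard json module:

import json

records = df.reset_index().to_dict(orient="records")
print(json.dumps(records, indent=2))  # one key per line, as in the question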
I am trying to read nested JSON into a Dask DataFrame, preferably with code that'll do the heavy lifting.
Here's the JSON file I am reading:
{
    "data": [{
        "name": "george",
        "age": 16,
        "exams": [{
            "subject": "geometry",
            "score": 56
        },
        {
            "subject": "poetry",
            "score": 88
        }]
    }, {
        "name": "nora",
        "age": 7,
        "exams": [{
            "subject": "geometry",
            "score": 87
        },
        {
            "subject": "poetry",
            "score": 94
        }]
    }]
}
Here is the resulting DataFrame I would like.
name    age  exam_subject  exam_score
george   16  geometry              56
george   16  poetry                88
nora      7  geometry              87
nora      7  poetry                94
Here's how I'd accomplish this with pandas:
df = pd.read_json("students3.json", orient="split")
exploded = df.explode("exams")
pd.concat([exploded[["name", "age"]].reset_index(drop=True), pd.json_normalize(exploded["exams"])], axis=1)
Dask doesn't have json_normalize, so what's the best way to accomplish this task?
If the file contains JSON Lines, the most scalable approach is to use dask.bag and then map the pandas snippet across each bag partition.
If the file is one large JSON document, the opening/closing brackets will cause problems, so an additional function is needed to strip them before mapping the text into JSON.
Rough pseudo-code:
import json
import dask.bag as db

bag = db.read_text("students3.json")

# if the file contains json-lines
option1 = bag.map(json.loads).map(pandas_fn)

# if the file is a single json document
option2 = bag.map(convert_to_jsonlines).map(json.loads).map(pandas_fn)
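For illustration, a concrete version of the json-lines path might look like the sketch below, assuming each line holds one student record (the file name and the record_to_rows helper are hypothetical, not from the original answer):

import json
import dask.bag as db

def record_to_rows(record):
    # Flatten one student record into one row dict per exam.
    return [
        {"name": record["name"], "age": record["age"],
         "exam_subject": exam["subject"], "exam_score": exam["score"]}
        for exam in record["exams"]
    ]

bag = db.read_text("students3.jsonl").map(json.loads)
ddf = bag.map(record_to_rows).flatten().to_dataframe()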
Use pd.json_normalize
import json
import pandas as pd
with open('students3.json', 'r', encoding='utf-8') as f:
    data = json.loads(f.read())
df = pd.json_normalize(data['data'], record_path='exams', meta=['name', 'age'])
    subject  score    name  age
0  geometry     56  george   16
1    poetry     88  george   16
2  geometry     87    nora    7
3    poetry     94    nora    7
Pydantic offers excellent JSON validation and ingestion. Several Pydantic models (one for each 'top level' JSON entry) can be converted to Python dictionaries in a loop to create a list of dictionaries (type List[Dict]), which can then be converted to a DataFrame.
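For concreteness, a minimal sketch of that approach (the Exam and Student model names are illustrative, not from the original answer):

import json
from typing import List

import pandas as pd
from pydantic import BaseModel

class Exam(BaseModel):
    subject: str
    score: int

class Student(BaseModel):
    name: str
    age: int
    exams: List[Exam]

with open("students3.json", "r", encoding="utf-8") as f:
    raw = json.load(f)

# Validate each top-level entry, then flatten into one row per exam.
students = [Student(**entry) for entry in raw["data"]]
rows = [
    {"name": s.name, "age": s.age, "exam_subject": e.subject, "exam_score": e.score}
    for s in students
    for e in s.exams
]
df = pd.DataFrame(rows)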
I was inspired by the other answers to come up with this solution.
import dask.dataframe as dd
import pandas as pd

ddf = dd.read_json("students3.json", orient="split")

def pandas_fn(df):
    # Explode the exams list into one row per exam, then flatten the
    # exam dicts into their own columns.
    exploded = df.explode("exams")
    return pd.concat(
        [
            exploded[["name", "age"]].reset_index(drop=True),
            pd.json_normalize(exploded["exams"]),
        ],
        axis=1,
    )

res = ddf.map_partitions(
    lambda df: pandas_fn(df),
    meta=(
        ("name", "object"),
        ("age", "int64"),
        ("subject", "object"),
        ("score", "int64"),
    ),
)
print(res.compute()) gives this output:
     name  age   subject  score
0  george   16  geometry     56
1  george   16    poetry     88
2    nora    7  geometry     87
3    nora    7    poetry     94
I have a dataframe with these values:
df
name  rank  subject  marks  age
tom    123  math        25   10
mark   124  math        50   10
How do I insert the dataframe into MongoDB using pymongo so that the first two columns are inserted as regular fields and the other three are nested in an array, like this?
{
    "_id": "507f1f77bcf86cd799439011",
    "name": "tom",
    "rank": "123",
    "scores": [{
        "subject": "math",
        "marks": 25,
        "age": 10
    }]
}
{
    "_id": "507f1f77bcf86cd799439012",
    "name": "mark",
    "rank": "124",
    "scores": [{
        "subject": "math",
        "marks": 50,
        "age": 10
    }]
}
I tried this:
convert_dict = df.to_dict("records")
mydb.school_data.insert_many(convert_dict)
I used this solution:
convert_dict = df.to_dict(orient="records")
mydb.school_data.insert_many(convert_dict)
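Note that to_dict(orient="records") on its own produces flat documents, one field per column. To get the nested scores array shown in the question, the records need reshaping first. A minimal sketch, assuming a running local MongoDB (the connection string and database name are hypothetical):

import pandas as pd
from pymongo import MongoClient

# The dataframe from the question.
df = pd.DataFrame({
    "name": ["tom", "mark"],
    "rank": [123, 124],
    "subject": ["math", "math"],
    "marks": [25, 50],
    "age": [10, 10],
})

# Nest the last three columns into a one-element "scores" array per row;
# MongoDB generates the _id values automatically.
docs = [
    {
        "name": row["name"],
        "rank": row["rank"],
        "scores": [{"subject": row["subject"], "marks": row["marks"], "age": row["age"]}],
    }
    for row in df.to_dict(orient="records")
]

mydb = MongoClient("mongodb://localhost:27017")["school"]  # hypothetical connection
mydb.school_data.insert_many(docs)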
I am working on a large dataset where I want to replace the values of one column based on the values of another column. I have been trying different combinations but am not satisfied. Is there a simple way, like a one-liner?
Sample code showing the problem:
import pandas as pd

people = pd.DataFrame(
    {
        "name": ["Ram", "Sham", "Ghanu", "Dhanu", "Jeetu"],
        "age": [25, 30, 25, 31, 31],
        "loc": ['Vashi', 'Nerul', 'Airoli', 'Panvel', 'CBD'],
    },
)
print(people)

areacode = pd.DataFrame(
    {
        "loc": ['Vashi', 'Nerul', 'CBD', 'Panvel'],
        "pin": [400703, 400706, 421504, 410206],
    },
)
print()
print(areacode)

people = pd.merge(people, areacode, how='left', on='loc').drop(columns='loc').fillna('')
people.rename(columns={'pin': 'loc'}, inplace=True)
print(people)
Output of the people dataframe before the change:
    name  age     loc
0    Ram   25   Vashi
1   Sham   30   Nerul
2  Ghanu   25  Airoli
3  Dhanu   31  Panvel
4  Jeetu   31     CBD
Output of the areacode dataframe:
      loc     pin
0   Vashi  400703
1   Nerul  400706
2     CBD  421504
3  Panvel  410206
Output of the people dataframe after the change:
    name  age       loc
0    Ram   25  400703.0
1   Sham   30  400706.0
2  Ghanu   25
3  Dhanu   31  410206.0
4  Jeetu   31  421504.0
I don't like this approach because (1) it's long and (2) I am getting floats in the loc column when I need ints. Please help me.
people = pd.DataFrame(
    {
        "name": ["Ram", "Sham", "Ghanu", "Dhanu", "Jeetu"],
        "age": [25, 30, 25, 31, 31],
        "loc": ['Vashi', 'Nerul', 'Airoli', 'Panvel', 'CBD'],
    },
)
print(people)

areacode = pd.DataFrame(
    {
        "loc": ['Vashi', 'Nerul', 'CBD', 'Panvel'],
        "pin": [400703, 400706, 421504, 410206],
    },
)
print()
print(areacode)

d = dict(zip(areacode["loc"], areacode["pin"]))
people["loc"] = people["loc"].apply(lambda x: int(d[x]) if x in d else "")
print(people)
I see no issue with your approach; just cast loc to integer.
An alternative would be map, but I suspect it would be slower, and you still have to cast loc to integer anyway:
people = people.assign(loc=people['loc'].map(dict(zip(areacode['loc'], areacode['pin']))).fillna('0').astype(int))
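A small sketch of that cast applied to the merge version from the question (an assumption on my part: unmatched locations are filled with 0 so the column can stay integer):

people = pd.merge(people, areacode, how='left', on='loc').drop(columns='loc')
people = people.rename(columns={'pin': 'loc'})
people['loc'] = people['loc'].fillna(0).astype(int)  # 0 marks a missing pin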
Hi! I have a dataframe that looks like a recursive table, and I would like to transform it into nested JSON (matryoshka style). I'm using Python.
My example dataframe:
id  name        relations
1   config      0
2   buttons     1
3   accept      2
4   delete      2
5   descripton  1
6   title       1
7   juan        0
and the JSON that I want is:
[
    {
        "id": "1",
        "name": "config",
        "relations": [
            {
                "id": "2",
                "name": "buttons",
                "relations": [
                    {
                        "id": "3",
                        "name": "accept"
                    },
                    {
                        "id": "4",
                        "name": "delete"
                    }
                ]
            },
            {
                "id": "5",
                "name": "descripton",
                "relations": []
            },
            {
                "id": "6",
                "name": "title",
                "relations": []
            }
        ]
    },
    {
        "id": "7",
        "name": "juan",
        "relations": []
    }
]
As you can see, the relations column holds the id of each row's parent (it joins to id).
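One way to build that nesting is a small recursive helper. A sketch, assuming relations holds the parent id and 0 marks a root (note it emits a relations key for every node, while the desired output omits it on the leaves):

import json
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "name": ["config", "buttons", "accept", "delete", "descripton", "title", "juan"],
    "relations": [0, 1, 2, 2, 1, 1, 0],
})

def build_tree(parent_id):
    # Collect every row whose relations value points at parent_id,
    # recursing to attach each row's own children.
    return [
        {"id": str(row["id"]), "name": row["name"], "relations": build_tree(row["id"])}
        for _, row in df[df["relations"] == parent_id].iterrows()
    ]

print(json.dumps(build_tree(0), indent=2))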
I would like to get the length of this JSON element in Robot Framework.
JSON example:
[
    [
        {
            "a": "2020-01",
            "value": "1"
        },
        {
            "a": "2020-02",
            "value": "2"
        },
        {
            "a": "2020-03",
            "value": "10"
        },
        {
            "a": "2020-04",
            "value": "9"
        },
        {
            "a": "2020-05",
            "value": "0"
        },
        {
            "a": "2020-06",
            "value": "7"
        }
    ]
]
The expected result is
a 2020-01
value 1
a 2020-02
value 2
a 2020-03
value 10
a 2020-04
value 9
a 2020-05
value 0
a 2020-06
value 7
length = 6
I tried
${data_length}=    Get Length    ${json_data}
but it is not working. I think it is because there are two levels of [ ]. Please guide me, thanks.
You need to convert the JSON to a Python data structure, and then you can use the Get Length keyword on the first element of the outer-most list.
Here's one way to do that. It assumes that the JSON data is not null and that the raw JSON data is in a variable named ${json_data}:
${data}=    Evaluate    json.loads($json_data)
${length}=    Get Length    ${data[0]}
Should Be Equal As Numbers    ${length}    6