Hi, how are things? I have a dataframe that looks like a recursive table, and my idea is to transform it into JSON (nested like matryoshka dolls). I'm using Python.
My example:
Dataframe:
id  name        relations
1   config      0
2   buttons     1
3   accept      2
4   delete      2
5   descripton  1
6   title       1
7   juan        0
and the JSON that I want is:
[{
    "id": "1",
    "name": "config",
    "relations": [{
        "id": "2",
        "name": "buttons",
        "relations": [{
            "id": "3",
            "name": "accept"
        },
        {
            "id": "4",
            "name": "delete"
        }]
    },
    {
        "id": "5",
        "name": "descripton",
        "relations": []
    },
    {
        "id": "6",
        "name": "title",
        "relations": []
    }]
},
{
    "id": "7",
    "name": "juan",
    "relations": []
}]
As you can see, the "relations" column holds the id of each row's parent (0 means top level).
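A minimal recursive sketch of one way to do this (assuming, per the example above, that relations holds the parent id and 0 marks a root):

import json

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "name": ["config", "buttons", "accept", "delete", "descripton", "title", "juan"],
    "relations": [0, 1, 2, 2, 1, 1, 0],
})

def build_tree(frame, parent_id=0):
    # collect every row whose "relations" points at parent_id,
    # then recurse to fetch each row's own children
    children = frame[frame["relations"] == parent_id]
    return [
        {
            "id": str(row.id),
            "name": row.name,
            "relations": build_tree(frame, row.id),
        }
        for row in children.itertuples(index=False)
    ]

print(json.dumps(build_tree(df), indent=2))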
Related
I am trying to read nested JSON into a Dask DataFrame, preferably with code that'll do the heavy lifting.
Here's the JSON file I am reading:
{
"data": [{
"name": "george",
"age": 16,
"exams": [{
"subject": "geometry",
"score": 56
},
{
"subject": "poetry",
"score": 88
}
]
}, {
"name": "nora",
"age": 7,
"exams": [{
"subject": "geometry",
"score": 87
},
{
"subject": "poetry",
"score": 94
}
]
}]
}
Here is the resulting DataFrame I would like.
name    age  exam_subject  exam_score
george  16   geometry      56
george  16   poetry        88
nora    7    geometry      87
nora    7    poetry        94
Here's how I'd accomplish this with pandas:
df = pd.read_json("students3.json", orient="split")
exploded = df.explode("exams")
pd.concat([exploded[["name", "age"]].reset_index(drop=True), pd.json_normalize(exploded["exams"])], axis=1)
Dask doesn't have json_normalize, so what's the best way to accomplish this task?
If the file contains JSON Lines, then the most scalable approach is to use dask.bag and then map the pandas snippet across each bag partition.
If the file is one large JSON document, then the opening/closing brackets will cause problems, so an additional function will be needed to remove them before mapping the text into JSON.
Rough pseudo-code:
import json

import dask.bag as db

bag = db.read_text("students3.json")

# if the file contains json-lines
option1 = bag.map(json.loads).map(pandas_fn)

# if the file is a single json document
option2 = bag.map(convert_to_jsonlines).map(json.loads).map(pandas_fn)
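For the json-lines case, pandas_fn could be as small as this sketch (assuming each line holds one student object with an exams list, as in the file above; convert_to_jsonlines is left out, since its logic depends on exactly how the large file is laid out):

import pandas as pd

def pandas_fn(record):
    # one parsed json object in, one flat DataFrame out:
    # a row per exam, with name/age repeated from the parent record
    return pd.json_normalize(record, record_path="exams", meta=["name", "age"])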
Use pd.json_normalize
import json
import pandas as pd
with open('students3.json', 'r', encoding='utf-8') as f:
data = json.loads(f.read())
df = pd.json_normalize(data['data'], record_path='exams', meta=['name', 'age'])
subject score name age
0 geometry 56 george 16
1 poetry 88 george 16
2 geometry 87 nora 7
3 poetry 94 nora 7
Pydantic offers excellent JSON validation and ingestion. Several Pydantic models (one for each 'top-level' JSON entry) can be converted to Python dictionaries in a loop to create a list of dictionaries (type List[Dict]), which can then be passed to the DataFrame constructor.
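A minimal sketch of that idea (the model names are illustrative, not part of the original answer):

import json
from typing import List

import pandas as pd
from pydantic import BaseModel

class Exam(BaseModel):
    subject: str
    score: int

class Student(BaseModel):
    name: str
    age: int
    exams: List[Exam]

with open("students3.json", encoding="utf-8") as f:
    raw = json.load(f)

# validate each top-level entry, then flatten to one row per exam
students = [Student(**entry) for entry in raw["data"]]
rows = [
    {"name": s.name, "age": s.age, "exam_subject": e.subject, "exam_score": e.score}
    for s in students
    for e in s.exams
]
print(pd.DataFrame(rows))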
I was inspired by the other answers to come up with this solution.
import dask.dataframe as dd
import pandas as pd

ddf = dd.read_json("students3.json", orient="split")

def pandas_fn(df):
    # explode the list column, then flatten the dicts it contains
    exploded = df.explode("exams")
    return pd.concat(
        [
            exploded[["name", "age"]].reset_index(drop=True),
            pd.json_normalize(exploded["exams"]),
        ],
        axis=1,
    )

res = ddf.map_partitions(
    pandas_fn,
    meta=(
        ("name", "object"),
        ("age", "int64"),
        ("subject", "object"),
        ("score", "int64"),
    ),
)
print(res.compute()) gives this output:
name age subject score
0 george 16 geometry 56
1 george 16 poetry 88
2 nora 7 geometry 87
3 nora 7 poetry 94
I have a dataframe with these values:
df
name rank subject marks age
tom 123 math 25 10
mark 124 math 50 10
How do I insert the dataframe into MongoDB using pymongo so that the first two columns become regular fields and the other three become an array of subdocuments?
{
"_id": "507f1f77bcf86cd799439011",
"name":"tom",
"rank":"123"
"scores": [{
"subject": "math",
"marks": 25,
"age": 10
}]
}
{
"_id": "507f1f77bcf86cd799439012",
"name":"mark",
"rank":"124"
"scores": [{
"subject": "math",
"marks": 50,
"age": 10
}]
}
I tried this:
convert_dict = df.to_dict("records")
mydb.school_data.insert_many(convert_dict)
I use this solution:
convert_dict = df.to_dict(orient="records")
mydb.school_data.insert_many(convert_dict)
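Note that to_dict(orient="records") on its own produces flat documents. To get the nested scores array from the question, the three score columns have to be folded first; a sketch (the connection string and database/collection names are placeholders):

import pandas as pd
from pymongo import MongoClient

df = pd.DataFrame({
    "name": ["tom", "mark"],
    "rank": [123, 124],
    "subject": ["math", "math"],
    "marks": [25, 50],
    "age": [10, 10],
})

# fold the last three columns into a one-element "scores" array per row
records = [
    {
        "name": row["name"],
        "rank": row["rank"],
        "scores": [{"subject": row["subject"], "marks": row["marks"], "age": row["age"]}],
    }
    for row in df.to_dict(orient="records")
]

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
client["school"]["school_data"].insert_many(records)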
My MongoDB document structure is as follows and some of the factors are NaN.
{
  "_id": ObjectId("5feddb959297bb2625db1450"),
  "factors": [
    { "factorId": "C24", "Index": 0, "weight": 1 },
    { "factorId": "C25", "Index": 1, "weight": 1 },
    { "factorId": "C26", "Index": 2, "weight": 1 }
  ],
  "name": "Growth Led Momentum"
}
I want to convert it to a pandas DataFrame as follows, using pymongo and pandas.
| name                | factorId | Index | weight |
|---------------------|----------|-------|--------|
| Growth Led Momentum | C24      | 0     | 0      |
| Growth Led Momentum | C25      | 1     | 0      |
| Growth Led Momentum | C26      | 2     | 0      |
Thank you
Update
I broke out the ol' Python to give this a crack - the following code works flawlessly!
from pymongo import MongoClient
import pandas as pd
uri = "mongodb://<your_mongo_uri>:27017"
database_name = "<your_database_name"
collection_name = "<your_collection_name>"
mongo_client = MongoClient(uri)
database = mongo_client[database_name]
collection = database[collection_name]
# I used this code to insert a doc into a test collection
# before querying (just in case you wanted to know lol)
"""
data = {
"_id": 1,
"name": "Growth Lead Momentum",
"factors": [
{
"factorId": "C24",
"index": 0,
"weight": 1
},
{
"factorId": "D74",
"index": 7,
"weight": 9
}
]
}
insert_result = collection.insert_one(data)
print(insert_result)
"""
# This is the query that
# answers your question
results = collection.aggregate([
{
"$unwind": "$factors"
},
{
"$project": {
"_id": 1, # Change to 0 if you wish to ignore "_id" field.
"name": 1,
"factorId": "$factors.factorId",
"index": "$factors.index",
"weight": "$factors.weight"
}
}
])
# This is how we turn the results into a DataFrame.
# We can simply pass `list(results)` into `DataFrame(..)`,
# due to how our query works.
results_as_dataframe = pd.DataFrame(list(results))
print(results_as_dataframe)
Which outputs:
_id name factorId index weight
0 1 Growth Lead Momentum C24 0 1
1 1 Growth Lead Momentum D74 7 9
Original Answer
You could use the aggregation pipeline to unwind factors and then project the fields you want.
Something like this should do the trick.
Database Structure
[
{
"_id": 1,
"name": "Growth Lead Momentum",
"factors": [
{
factorId: "C24",
index: 0,
weight: 1
},
{
factorId: "D74",
index: 7,
weight: 9
}
]
}
]
Query
db.collection.aggregate([
{
$unwind: "$factors"
},
{
$project: {
_id: 1,
name: 1,
factorId: "$factors.factorId",
index: "$factors.index",
weight: "$factors.weight"
}
}
])
Results
(.csv friendly)
[
{
"_id": 1,
"factorId": "C24",
"index": 0,
"name": "Growth Lead Momentum",
"weight": 1
},
{
"_id": 1,
"factorId": "D74",
"index": 7,
"name": "Growth Lead Momentum",
"weight": 9
}
]
Wonderful answer by Matt! In case you want to use pandas, use this after you have retrieved the documents from the db:
df = pd.json_normalize(data)
df = (
    df["factors"]
    .explode()          # one row per factor dict
    .apply(pd.Series)   # expand each dict into factorId/Index/weight columns
    .join(df)           # bring back the document-level columns
    .drop(columns=["factors"])
)
Output:
factorId Index weight name
0 C24 0 1 Growth Led Momentum
0 C25 1 1 Growth Led Momentum
0 C26 2 1 Growth Led Momentum
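An equivalent, arguably simpler route (a sketch, assuming data is the single retrieved document as above):

import pandas as pd

# record_path expands each element of "factors" into its own row,
# while meta repeats the document-level "name" alongside it
df = pd.json_normalize(data, record_path="factors", meta=["name"])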
I would like to get the length of this JSON element in Robot Framework.
Json Example
[
[
{
"a": "2020-01",
"value": "1"
},
{
"a": "2020-02",
"value": "2"
},
{
"a": "2020-03",
"value": "10"
},
{
"a": "2020-04",
"value": "9"
},
{
"a": "2020-05",
"value": "0"
},
{
"a": "2020-06",
"value": "7"
}
]
]
The expected result is
a 2020-01
value 1
a 2020-02
value 2
a 2020-03
value 10
a 2020-04
value 9
a 2020-05
value 0
a 2020-06
value 7
length = 6
I tried
${data_length}=    Get Length    ${json_data}
but it is not working. I think there are two levels of [ ]. Please guide me, thanks.
You need to convert the JSON to a python data structure, and then you can use the Get Length keyword on the first element of the outer-most list.
Here's one way to do that. It assumes that the JSON data is not null, and that the raw JSON data is in a variable named ${json_data}
${data}=    Evaluate    json.loads($json_data)
${length}=    Get Length    ${data[0]}
Should Be Equal As Numbers    ${length}    6
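In plain Python terms, the keyword sequence above amounts to this (a small sketch to show why the [0] index is needed):

import json

json_data = '[[{"a": "2020-01", "value": "1"}, {"a": "2020-02", "value": "2"}]]'

# json.loads returns a list whose single element is the inner list of objects,
# so the length we want is len(data[0]), not len(data)
data = json.loads(json_data)
print(len(data))     # 1 (the outer list)
print(len(data[0]))  # 2 here; 6 for the full example above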
I created a table like the one below using a pandas pivot table.
print(pd_pivot_table)
category_id name
3 name3 0.329204
24 name24 0.323727
31 name31 0.319526
19 name19 0.008992
23 name23 0.005897
I want to create JSON based on this pivot_table, but I do not know how.
[
{
"category_id": 3,
"name": "name3",
"score": 0.329204
},
{
"category_id": 24,
"name": "name24",
"score": 0.323727
},
{
"category_id": 31,
"name": "name31",
"score": 0.319526
},
{
"category_id": 19,
"name": "name19",
"score": 0.008992
},
{
"category_id": 23,
"name": "name23",
"score": 0.005897
}
]
Also, I do not know how to get the category_id and name values in the first place. Even if you write the code below, you cannot get the results you want:
for data in pd_pivot_table:
print(data) # 0.329204
print(data["category_id"]) # *** IndexError: invalid index to scalar variable.
You can first use Series.reset_index to get a DataFrame, and then call DataFrame.to_json:
print (df)
category_id name
3 name3 0.329204
24 name24 0.323727
31 name31 0.319526
19 name19 0.008992
23 name23 0.005897
Name: score, dtype: float64
print (type(df))
<class 'pandas.core.series.Series'>
json = df.reset_index().to_json(orient='records')
print (json)
[{"category_id":3,"name":"name3","score":0.329204},
{"category_id":24,"name":"name24","score":0.323727},
{"category_id":31,"name":"name31","score":0.319526},
{"category_id":19,"name":"name19","score":0.008992},
{"category_id":23,"name":"name23","score":0.005897}]
If need output to file:
df.reset_index().to_json('file.json', orient='records')
Details:
print (df.reset_index())
category_id name score
0 3 name3 0.329204
1 24 name24 0.323727
2 31 name31 0.319526
3 19 name19 0.008992
4 23 name23 0.005897
print (type(df.reset_index()))
<class 'pandas.core.frame.DataFrame'>