Get values from nested lists and dictionaries of a JSON in Python

I'm trying to get the value of 'description' and the first 'x','y' pair related to that description from a JSON file, so I used pandas.io.json.json_normalize and followed this example at the end of the page, but I am getting this error:
KeyError: ("Try running with errors='ignore' as key %s is not always present", KeyError('description',))
How can I get the 'description' values "Play" and "Game" and the first 'x','y' pair related to each description, (0, 2) and (1, 2) respectively, from the following JSON file, and save the result as a DataFrame?
I edited the code and I want to get this as the result:
0 1 2 3
0 Play Game
1
2
3
4
but Game is not at the x, y position where it should be.
import pandas as pd
from pandas.io.json import json_normalize

data = [
    {
        "responses": [
            {
                "text": [
                    {
                        "description": "Play",
                        "bounding": {
                            "vertices": [
                                {"x": 0, "y": 2},
                                {"x": 513, "y": -5},
                                {"x": 513, "y": 73},
                                {"x": 438, "y": 73}
                            ]
                        }
                    },
                    {
                        "description": "Game",
                        "bounding": {
                            "vertices": [
                                {"x": 1, "y": 2},
                                {"x": 307, "y": 29},
                                {"x": 307, "y": 55},
                                {"x": 201, "y": 55}
                            ]
                        }
                    }
                ]
            }
        ]
    }
]

# w is the number of columns, h the number of rows
w, h = 4, 5
Matrix = [[' ' for j in range(w)] for i in range(h)]
for row in data:
    for response in row["responses"]:
        for entry in response["text"]:
            Description = entry["description"]
            x = entry["bounding"]["vertices"][0]["x"]
            y = entry["bounding"]["vertices"][0]["y"]
            Matrix[x][y] = Description
df = pd.DataFrame(Matrix)
print(df)

You need to pass data[0]['responses'][0]['text'] to json_normalize, with ['bounding', 'vertices'] as the record path and 'description' as the metadata, like this:
df = json_normalize(data[0]['responses'][0]['text'], [['bounding', 'vertices']], 'description')
which will result in:
     x   y description
0    0   2        Play
1  513  -5        Play
2  513  73        Play
3  438  73        Play
4    1   2        Game
5  307  29        Game
6  307  55        Game
7  201  55        Game
I hope this is what you are expecting.
EDIT:
df.groupby('description').get_group('Play').iloc[0]
will give you the first row of the 'Play' group:
x                 0
y                 2
description    Play
Name: 0, dtype: object
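For the matrix-building attempt in the question, a plain-Python pass can also collect just the first vertex per description before handing it to pandas. This is a sketch over a trimmed copy of the JSON above (only two vertices kept per entry):

```python
import pandas as pd

# trimmed copy of the JSON from the question
data = [{"responses": [{"text": [
    {"description": "Play",
     "bounding": {"vertices": [{"x": 0, "y": 2}, {"x": 513, "y": -5}]}},
    {"description": "Game",
     "bounding": {"vertices": [{"x": 1, "y": 2}, {"x": 307, "y": 29}]}},
]}]}]

rows = []
for item in data:
    for response in item["responses"]:
        for entry in response["text"]:
            # keep only the first vertex of each bounding polygon
            v0 = entry["bounding"]["vertices"][0]
            rows.append({"description": entry["description"],
                         "x": v0["x"], "y": v0["y"]})

first_df = pd.DataFrame(rows)
print(first_df)
```

This gives one row per description with its first vertex, which is usually easier to work with than a positional matrix.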


Dictionaries to pandas DataFrame

I'm trying to extract data from dictionaries; here's an example with one dictionary. Here's what I have so far (probably not the greatest solution).
import pandas as pd

def common():
    ab = {
        "names": ["Brad", "Chad"],
        "org_name": "Leon",
        "missing": 0.3,
        "con": {
            "base": "abx",
            "conditions": {"func": "**", "ref": 0},
            "results": 4,
        },
        "change": [{"func": "++", "ref": 50, "res": 31},
                   {"func": "--", "ref": 22, "res": 11}]
    }
    data = []
    if "missing" in ab.keys():
        data.append(
            {
                "names": ab["names"],
                "org_name": ab["org_name"],
                "func": "missing",
                "ref": "",
                "res": ab["missing"],
            }
        )
    if "con" in ab.keys():
        data.append(
            {
                "names": ab["names"],
                "org_name": ab["con"]["base"],
                "func": ab["con"]["conditions"]["func"],
                "ref": ab["con"]["conditions"]["ref"],
                "res": ab["con"]["results"],
            }
        )
    df = pd.DataFrame(data)
    print(df)
    return df
Output:
          names org_name     func ref  res
0  [Brad, Chad]     Leon  missing      0.3
1  [Brad, Chad]      abx       **   0  4.0
What I would like the output to look like:
          names org_name     func ref  res
0  [Brad, Chad]     Leon  missing      0.3
1  [Brad, Chad]      abx       **   0    4
2  [Brad, Chad]     Leon       ++  50   31
3  [Brad, Chad]     Leon       --  22   11
The dictionaries can be of different lengths; ultimately a list of several dictionaries will be passed. I'm not sure how to repeat the names and org_name values for each ref and res value. I don't want to keep adding rows one by one; a dynamic solution is always preferred.
Try:
import pandas as pd

ab = {
    "names": ["Brad", "Chad"],
    "org_name": "Leon",
    "missing": 0.3,
    "con": {
        "base": "abx",
        "conditions": {"func": "**", "ref": 0},
        "results": 4,
    },
    "change": [{"func": "++", "ref": 50, "res": 31},
               {"func": "--", "ref": 22, "res": 11}]
}

out = []
if 'change' in ab:
    for ch in ab['change']:
        out.append({'names': ab['names'], 'org_name': ab['org_name'], **ch})
if 'con' in ab:
    out.append({'names': ab['names'], 'org_name': ab['con']['base'],
                **ab['con']['conditions'], 'res': ab['con']['results']})
if 'missing' in ab:
    out.append({'names': ab['names'], 'org_name': ab['org_name'],
                'func': 'missing', 'res': ab['missing']})
print(pd.DataFrame(out).fillna(''))
Prints:
          names org_name     func   ref   res
0  [Brad, Chad]     Leon       ++  50.0  31.0
1  [Brad, Chad]     Leon       --  22.0  11.0
2  [Brad, Chad]      abx       **   0.0   4.0
3  [Brad, Chad]     Leon  missing         0.3
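Since a list of several such dictionaries will ultimately be passed, the same logic can be wrapped in a function. This is a sketch; `to_frame` is a hypothetical name, and the keys are assumed to match the example above:

```python
import pandas as pd

def to_frame(dicts):
    """Flatten a list of dicts shaped like `ab` into one DataFrame."""
    out = []
    for ab in dicts:
        # one row per entry in 'change', reusing names/org_name
        for ch in ab.get("change", []):
            out.append({"names": ab["names"], "org_name": ab["org_name"], **ch})
        if "con" in ab:
            out.append({"names": ab["names"], "org_name": ab["con"]["base"],
                        **ab["con"]["conditions"], "res": ab["con"]["results"]})
        if "missing" in ab:
            out.append({"names": ab["names"], "org_name": ab["org_name"],
                        "func": "missing", "res": ab["missing"]})
    return pd.DataFrame(out).fillna("")

ab = {"names": ["Brad", "Chad"], "org_name": "Leon", "missing": 0.3,
      "con": {"base": "abx", "conditions": {"func": "**", "ref": 0}, "results": 4},
      "change": [{"func": "++", "ref": 50, "res": 31},
                 {"func": "--", "ref": 22, "res": 11}]}
df = to_frame([ab])
print(df)
```

Passing a list of many dictionaries then yields one combined frame without any per-row appending at the call site.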

Python - Grab specific value from known key inside large json

I need to get just 2 entries from inside a very large JSON object. I don't know their array positions, but I do know a key:value pair of each entry I want to find, and I want another value from that same entry.
In this example there are only 4 entries, but in the original there are over 1000, and I need only 2 entries, for which I know "name" and "symbol". I need to get the value of quotes->ASK->time.
import requests

x = requests.get('http://example.org/data.json')
parsed = x.json()
gettime = str(parsed[0]["quotes"]["ASK"]["time"])
print(gettime)
I know that I can get it that way and then loop through it a thousand times, but that seems like overkill for just 2 values. Is there a way to do something like parsed["symbol":"kalo"]["quotes"]["ASK"]["time"], which would give me the kalo time without looping through all thousand entries?
[
    {
        "id": "nem-cri",
        "name": "nemlaaoo",
        "symbol": "nem",
        "rank": 27,
        "owner": "marcel",
        "quotes": {"ASK": {"price": 19429, "time": 319250866, "duration": 21}}
    },
    {
        "id": "kalo-lo-leek",
        "name": "kalowaaa",
        "symbol": "kalo",
        "rank": 122,
        "owner": "daniel",
        "quotes": {"ASK": {"price": 12928, "time": 937282932, "duration": 9}}
    },
    {
        "id": "reewmaarwl",
        "name": "reeqooow",
        "symbol": "reeq",
        "rank": 4,
        "owner": "eric",
        "quotes": {"ASK": {"price": 9989, "time": 124288222, "duration": 19}}
    },
    {
        "id": "sharkooaksj",
        "name": "sharkmaaa",
        "symbol": "shark",
        "rank": 22,
        "owner": "eric",
        "quotes": {"ASK": {"price": 11122, "time": 482773882, "duration": 22}}
    }
]
If you are OK with using pandas, I would just create a DataFrame.
import pandas as pd
df = pd.json_normalize(parsed)
print(df)
id name symbol rank owner quotes.ASK.price \
0 nem-cri nemlaaoo nem 27 marcel 19429
1 kalo-lo-leek kalowaaa kalo 122 daniel 12928
2 reewmaarwl reeqooow reeq 4 eric 9989
3 sharkooaksj sharkmaaa shark 22 eric 11122
quotes.ASK.time quotes.ASK.duration
0 319250866 21
1 937282932 9
2 124288222 19
3 482773882 22
If you want the kalo time, then
print(df[df['symbol'] == 'kalo']['quotes.ASK.time']) # -> 937282932
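Without pandas, a generator expression with `next` also works, and it stops scanning at the first match rather than walking all thousand entries. A minimal sketch, assuming `parsed` is the list above (trimmed here to the fields that matter):

```python
# trimmed copy of the parsed JSON from the question
parsed = [
    {"id": "nem-cri", "symbol": "nem", "quotes": {"ASK": {"time": 319250866}}},
    {"id": "kalo-lo-leek", "symbol": "kalo", "quotes": {"ASK": {"time": 937282932}}},
]

# next() returns the first matching entry, or the default (None) if absent
entry = next((d for d in parsed if d["symbol"] == "kalo"), None)
kalo_time = entry["quotes"]["ASK"]["time"] if entry else None
print(kalo_time)
```

If you need to look up many symbols repeatedly, building a dict keyed by symbol once (`{d["symbol"]: d for d in parsed}`) turns every later lookup into O(1).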

How to normalize uneven JSON structures in pandas?

I am using the Google Maps Distance Matrix API to get several distances from multiple origins. The API response comes in a JSON structured like:
{
    "destination_addresses": [
        "Destination 1",
        "Destination 2",
        "Destination 3"
    ],
    "origin_addresses": [
        "Origin 1",
        "Origin 2"
    ],
    "rows": [
        {
            "elements": [
                {
                    "distance": {"text": "8.7 km", "value": 8687},
                    "duration": {"text": "19 mins", "value": 1129},
                    "status": "OK"
                },
                {
                    "distance": {"text": "223 km", "value": 222709},
                    "duration": {"text": "2 hours 42 mins", "value": 9704},
                    "status": "OK"
                },
                {
                    "distance": {"text": "299 km", "value": 299156},
                    "duration": {"text": "4 hours 17 mins", "value": 15400},
                    "status": "OK"
                }
            ]
        },
        {
            "elements": [
                {
                    "distance": {"text": "216 km", "value": 215788},
                    "duration": {"text": "2 hours 44 mins", "value": 9851},
                    "status": "OK"
                },
                {
                    "distance": {"text": "20.3 km", "value": 20285},
                    "duration": {"text": "21 mins", "value": 1283},
                    "status": "OK"
                },
                {
                    "distance": {"text": "210 km", "value": 210299},
                    "duration": {"text": "2 hours 45 mins", "value": 9879},
                    "status": "OK"
                }
            ]
        }
    ],
    "status": "OK"
}
Note that the rows array has the same number of elements as origin_addresses (2), while each elements array has the same number of elements as destination_addresses (3).
Is one able to use the pandas API to normalize everything inside rows while fetching the corresponding data from origin_addresses and destination_addresses?
The output should be:
status distance.text distance.value duration.text duration.value origin_addresses destination_addresses
0 OK 8.7 km 8687 19 mins 1129 Origin 1 Destination 1
1 OK 223 km 222709 2 hours 42 mins 9704 Origin 1 Destination 2
2 OK 299 km 299156 4 hours 17 mins 15400 Origin 1 Destination 3
3 OK 216 km 215788 2 hours 44 mins 9851 Origin 2 Destination 1
4 OK 20.3 km 20285 21 mins 1283 Origin 2 Destination 2
5 OK 210 km 210299 2 hours 45 mins 9879 Origin 2 Destination 3
If pandas does not provide a relatively simple way to do it, how would one accomplish this operation?
If data contains the dictionary from the question, you can try:
df = pd.DataFrame(data["rows"])
df["origin_addresses"] = data["origin_addresses"]
df = df.explode("elements")
df = pd.concat([df.pop("elements").apply(pd.Series), df], axis=1)
df = pd.concat(
    [df.pop("distance").apply(pd.Series).add_prefix("distance."), df], axis=1
)
df = pd.concat(
    [df.pop("duration").apply(pd.Series).add_prefix("duration."), df], axis=1
)
df["destination_addresses"] = data["destination_addresses"] * len(
    data["origin_addresses"]
)
print(df)
Prints:
duration.text duration.value distance.text distance.value status origin_addresses destination_addresses
0 19 mins 1129 8.7 km 8687 OK Origin 1 Destination 1
0 2 hours 42 mins 9704 223 km 222709 OK Origin 1 Destination 2
0 4 hours 17 mins 15400 299 km 299156 OK Origin 1 Destination 3
1 2 hours 44 mins 9851 216 km 215788 OK Origin 2 Destination 1
1 21 mins 1283 20.3 km 20285 OK Origin 2 Destination 2
1 2 hours 45 mins 9879 210 km 210299 OK Origin 2 Destination 3
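An alternative sketch uses `pd.json_normalize` with `record_path` plus `itertools.product`, relying on the fact that the flattened rows come out in origin-major order. It assumes `data` holds the API response (trimmed here to `distance` and `status` to keep the example short):

```python
import itertools
import pandas as pd

# trimmed copy of the API response from the question
data = {
    "origin_addresses": ["Origin 1", "Origin 2"],
    "destination_addresses": ["Destination 1", "Destination 2", "Destination 3"],
    "rows": [
        {"elements": [{"distance": {"text": "8.7 km", "value": 8687}, "status": "OK"},
                      {"distance": {"text": "223 km", "value": 222709}, "status": "OK"},
                      {"distance": {"text": "299 km", "value": 299156}, "status": "OK"}]},
        {"elements": [{"distance": {"text": "216 km", "value": 215788}, "status": "OK"},
                      {"distance": {"text": "20.3 km", "value": 20285}, "status": "OK"},
                      {"distance": {"text": "210 km", "value": 210299}, "status": "OK"}]},
    ],
}

# record_path flattens every element into one row; nested dicts become dotted columns
df = pd.json_normalize(data["rows"], record_path="elements")
# product() pairs every origin with every destination, in the same origin-major order
df[["origin_addresses", "destination_addresses"]] = list(
    itertools.product(data["origin_addresses"], data["destination_addresses"])
)
print(df)
```

This avoids the repeated `pop`/`apply(pd.Series)`/`concat` round trips, at the cost of depending on the row ordering guarantee stated in the lead-in.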

MongoDB collection to pandas Dataframe

My MongoDB document structure is as follows, and some of the factors are NaN.
_id: ObjectId("5feddb959297bb2625db1450")
factors: Array
    0: Object
        factorId: "C24"
        Index: 0
        weight: 1
    1: Object
        factorId: "C25"
        Index: 1
        weight: 1
    2: Object
        factorId: "C26"
        Index: 2
        weight: 1
name: "Growth Led Momentum"
I want to convert it to a pandas DataFrame as follows, using pymongo and pandas.
| name                | factorId | Index | weight |
|---------------------|----------|-------|--------|
| Growth Led Momentum | C24      | 0     | 1      |
| Growth Led Momentum | C25      | 1     | 1      |
| Growth Led Momentum | C26      | 2     | 1      |
Thank you
Update
I broke out the ol' Python to give this a crack; the following code works flawlessly!
from pymongo import MongoClient
import pandas as pd

uri = "mongodb://<your_mongo_uri>:27017"
database_name = "<your_database_name>"
collection_name = "<your_collection_name>"

mongo_client = MongoClient(uri)
database = mongo_client[database_name]
collection = database[collection_name]

# I used this code to insert a doc into a test collection
# before querying (just in case you wanted to know lol)
"""
data = {
    "_id": 1,
    "name": "Growth Lead Momentum",
    "factors": [
        {"factorId": "C24", "index": 0, "weight": 1},
        {"factorId": "D74", "index": 7, "weight": 9}
    ]
}
insert_result = collection.insert_one(data)
print(insert_result)
"""

# This is the query that answers your question
results = collection.aggregate([
    {
        "$unwind": "$factors"
    },
    {
        "$project": {
            "_id": 1,  # Change to 0 if you wish to ignore the "_id" field.
            "name": 1,
            "factorId": "$factors.factorId",
            "index": "$factors.index",
            "weight": "$factors.weight"
        }
    }
])

# This is how we turn the results into a DataFrame.
# We can simply pass `list(results)` into `DataFrame(..)`,
# due to how our query works.
results_as_dataframe = pd.DataFrame(list(results))
print(results_as_dataframe)
Which outputs:
_id name factorId index weight
0 1 Growth Lead Momentum C24 0 1
1 1 Growth Lead Momentum D74 7 9
Original Answer
You could use the aggregation pipeline to unwind factors and then project the fields you want.
Something like this should do the trick.
Live demo here.
Database Structure
[
    {
        "_id": 1,
        "name": "Growth Lead Momentum",
        "factors": [
            {
                factorId: "C24",
                index: 0,
                weight: 1
            },
            {
                factorId: "D74",
                index: 7,
                weight: 9
            }
        ]
    }
]
Query
db.collection.aggregate([
    {
        $unwind: "$factors"
    },
    {
        $project: {
            _id: 1,
            name: 1,
            factorId: "$factors.factorId",
            index: "$factors.index",
            weight: "$factors.weight"
        }
    }
])
Results
(.csv friendly)
[
    {
        "_id": 1,
        "factorId": "C24",
        "index": 0,
        "name": "Growth Lead Momentum",
        "weight": 1
    },
    {
        "_id": 1,
        "factorId": "D74",
        "index": 7,
        "name": "Growth Lead Momentum",
        "weight": 9
    }
]
Wonderful answer by Matt! In case you want to use pandas, use this after you have retrieved the documents from the db:
df = pd.json_normalize(data)
df = df['factors'].explode().apply(pd.Series).join(df).drop(columns=['factors'])
Output:
factorId Index weight name
0 C24 0 1 Growth Led Momentum
0 C25 1 1 Growth Led Momentum
0 C26 2 1 Growth Led Momentum
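If the documents are already in a plain list (e.g. `data = list(collection.find())`), `pd.json_normalize` can do the unwinding directly via `record_path` and `meta`, which is arguably simpler than the explode chain. A sketch, assuming documents shaped like the one in the question:

```python
import pandas as pd

# assume this is what came back from the collection
data = [{
    "name": "Growth Led Momentum",
    "factors": [
        {"factorId": "C24", "Index": 0, "weight": 1},
        {"factorId": "C25", "Index": 1, "weight": 1},
        {"factorId": "C26", "Index": 2, "weight": 1},
    ],
}]

# one row per factor, with the parent document's name carried along
df = pd.json_normalize(data, record_path="factors", meta="name")
print(df)
```

Unlike the aggregation-pipeline approach, this does the flattening client-side, which is fine for small result sets but pushes more data over the wire for large collections.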

How to get the length of this JSON element in Robot Framework

I would like to get the length of this JSON element in Robot Framework.
JSON example:
[
    [
        {"a": "2020-01", "value": "1"},
        {"a": "2020-02", "value": "2"},
        {"a": "2020-03", "value": "10"},
        {"a": "2020-04", "value": "9"},
        {"a": "2020-05", "value": "0"},
        {"a": "2020-06", "value": "7"}
    ]
]
The expected result is
a 2020-01
value 1
a 2020-02
value 2
a 2020-03
value 10
a 2020-04
value 9
a 2020-05
value 0
a 2020-06
value 7
length = 6
I tried
${data_length}=    Get Length    ${json_data}
but it is not working.
I think there are two levels of [ ]. Please guide me, thanks.
You need to convert the JSON to a Python data structure, and then you can use the Get Length keyword on the first element of the outermost list.
Here's one way to do that. It assumes that the JSON data is not null and that the raw JSON data is in a variable named ${json_data}:
${data}=    Evaluate    json.loads($json_data)
${length}=    Get Length    ${data[0]}
Should Be Equal As Numbers    ${length}    6
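The two Robot Framework keywords map directly onto plain Python, which can help when debugging the test data outside the test runner:

```python
import json

json_data = """
[[{"a": "2020-01", "value": "1"}, {"a": "2020-02", "value": "2"},
  {"a": "2020-03", "value": "10"}, {"a": "2020-04", "value": "9"},
  {"a": "2020-05", "value": "0"}, {"a": "2020-06", "value": "7"}]]
"""

data = json.loads(json_data)   # what Evaluate json.loads($json_data) does
length = len(data[0])          # what Get Length does on the inner list
print(length)
```

The outer list has one element (the inner list), so indexing with `[0]` first is what makes the length come out as 6 rather than 1.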
