Get values from nested lists and dictionaries of a JSON in Python

I'm trying to get the value of 'description' and the first 'x','y' pair related to that description from a JSON file, so I used pandas.io.json.json_normalize and followed this example at the end of the page, but I am getting this error:
KeyError: ("Try running with errors='ignore' as key %s is not always present", KeyError('description',))
How can I get the 'description' values "Play" and "Game" and the first 'x','y' pair related to each description, (0, 2) and (1, 2) respectively, from the following JSON file, and save the result as a DataFrame?
I edited the code and I want to get this as the result:
0 1 2 3
0 Play Game
1
2
3
4
but Game is not at the x, y position where it should be.
import pandas as pd
from pandas.io.json import json_normalize

data = [
    {
        "responses": [
            {
                "text": [
                    {
                        "description": "Play",
                        "bounding": {
                            "vertices": [
                                {"x": 0, "y": 2},
                                {"x": 513, "y": -5},
                                {"x": 513, "y": 73},
                                {"x": 438, "y": 73}
                            ]
                        }
                    },
                    {
                        "description": "Game",
                        "bounding": {
                            "vertices": [
                                {"x": 1, "y": 2},
                                {"x": 307, "y": 29},
                                {"x": 307, "y": 55},
                                {"x": 201, "y": 55}
                            ]
                        }
                    }
                ]
            }
        ]
    }
]

# w is the number of columns, h the number of rows
w, h = 4, 5
Matrix = [[' ' for j in range(w)] for i in range(h)]
for row in data:
    for response in row["responses"]:
        for entry in response["text"]:
            Description = entry["description"]
            x = entry["bounding"]["vertices"][0]["x"]
            y = entry["bounding"]["vertices"][0]["y"]
            Matrix[x][y] = Description
df = pd.DataFrame(Matrix)
print(df)

You need to pass data[0]['responses'][0]['text'] to json_normalize, with ['bounding', 'vertices'] as the record path and 'description' as the metadata, like this:
df = json_normalize(data[0]['responses'][0]['text'], [['bounding', 'vertices']], 'description')
which will result in:
     x   y description
0    0   2        Play
1  513  -5        Play
2  513  73        Play
3  438  73        Play
4    1   2        Game
5  307  29        Game
6  307  55        Game
7  201  55        Game
I hope this is what you are expecting.
EDIT:
df.groupby('description').get_group('Play').iloc[0]
will give you the first row of the 'Play' group:
x                 0
y                 2
description    Play
Name: 0, dtype: object
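For the matrix-building attempt in the question, a plain-Python pass can also collect just the first vertex per description before handing it to pandas. This is a sketch over a trimmed copy of the JSON above (only two vertices kept per entry):

```python
import pandas as pd

# trimmed copy of the JSON from the question
data = [{"responses": [{"text": [
    {"description": "Play",
     "bounding": {"vertices": [{"x": 0, "y": 2}, {"x": 513, "y": -5}]}},
    {"description": "Game",
     "bounding": {"vertices": [{"x": 1, "y": 2}, {"x": 307, "y": 29}]}},
]}]}]

rows = []
for item in data:
    for response in item["responses"]:
        for entry in response["text"]:
            # keep only the first vertex of each bounding polygon
            v0 = entry["bounding"]["vertices"][0]
            rows.append({"description": entry["description"],
                         "x": v0["x"], "y": v0["y"]})

first_df = pd.DataFrame(rows)
print(first_df)
```

This gives one row per description with its first vertex, which is usually easier to work with than a positional matrix.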


Dictionaries to pandas DataFrame

I'm trying to extract data from dictionaries; here's an example with one dictionary. Here's what I have so far (probably not the greatest solution).
import pandas as pd

def common():
    ab = {
        "names": ["Brad", "Chad"],
        "org_name": "Leon",
        "missing": 0.3,
        "con": {
            "base": "abx",
            "conditions": {"func": "**", "ref": 0},
            "results": 4,
        },
        "change": [{"func": "++", "ref": 50, "res": 31},
                   {"func": "--", "ref": 22, "res": 11}]
    }
    data = []
    if "missing" in ab.keys():
        data.append(
            {
                "names": ab["names"],
                "org_name": ab["org_name"],
                "func": "missing",
                "ref": "",
                "res": ab["missing"],
            }
        )
    if "con" in ab.keys():
        data.append(
            {
                "names": ab["names"],
                "org_name": ab["con"]["base"],
                "func": ab["con"]["conditions"]["func"],
                "ref": ab["con"]["conditions"]["ref"],
                "res": ab["con"]["results"],
            }
        )
    df = pd.DataFrame(data)
    print(df)
    return df
Output:
          names org_name     func ref  res
0  [Brad, Chad]     Leon  missing      0.3
1  [Brad, Chad]      abx       **   0  4.0
What I would like the output to look like:
          names org_name     func ref  res
0  [Brad, Chad]     Leon  missing      0.3
1  [Brad, Chad]      abx       **   0    4
2  [Brad, Chad]     Leon       ++  50   31
3  [Brad, Chad]     Leon       --  22   11
The dictionaries can be of different lengths; ultimately a list of several dictionaries will be passed. I'm not sure how to repeat the names and org_name values for each ref and res value. I don't want to keep adding rows one by one; a dynamic solution is always preferred.
Try:
import pandas as pd

ab = {
    "names": ["Brad", "Chad"],
    "org_name": "Leon",
    "missing": 0.3,
    "con": {
        "base": "abx",
        "conditions": {"func": "**", "ref": 0},
        "results": 4,
    },
    "change": [{"func": "++", "ref": 50, "res": 31},
               {"func": "--", "ref": 22, "res": 11}]
}

out = []
if 'change' in ab:
    for ch in ab['change']:
        out.append({'names': ab['names'], 'org_name': ab['org_name'], **ch})
if 'con' in ab:
    out.append({'names': ab['names'], 'org_name': ab['con']['base'],
                **ab['con']['conditions'], 'res': ab['con']['results']})
if 'missing' in ab:
    out.append({'names': ab['names'], 'org_name': ab['org_name'],
                'func': 'missing', 'res': ab['missing']})
print(pd.DataFrame(out).fillna(''))
Prints:
          names org_name     func   ref   res
0  [Brad, Chad]     Leon       ++  50.0  31.0
1  [Brad, Chad]     Leon       --  22.0  11.0
2  [Brad, Chad]      abx       **   0.0   4.0
3  [Brad, Chad]     Leon  missing         0.3
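Since a list of several such dictionaries will ultimately be passed, the same logic can be wrapped in a function. This is a sketch; `to_frame` is a hypothetical name, and the keys are assumed to match the example above:

```python
import pandas as pd

def to_frame(dicts):
    """Flatten a list of dicts shaped like `ab` into one DataFrame."""
    out = []
    for ab in dicts:
        # one row per entry in 'change', reusing names/org_name
        for ch in ab.get("change", []):
            out.append({"names": ab["names"], "org_name": ab["org_name"], **ch})
        if "con" in ab:
            out.append({"names": ab["names"], "org_name": ab["con"]["base"],
                        **ab["con"]["conditions"], "res": ab["con"]["results"]})
        if "missing" in ab:
            out.append({"names": ab["names"], "org_name": ab["org_name"],
                        "func": "missing", "res": ab["missing"]})
    return pd.DataFrame(out).fillna("")

ab = {"names": ["Brad", "Chad"], "org_name": "Leon", "missing": 0.3,
      "con": {"base": "abx", "conditions": {"func": "**", "ref": 0}, "results": 4},
      "change": [{"func": "++", "ref": 50, "res": 31},
                 {"func": "--", "ref": 22, "res": 11}]}
df = to_frame([ab])
print(df)
```

Passing a list of many dictionaries then yields one combined frame without any per-row appending at the call site.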

Python - Grab specific value from known key inside large json

I need to get just 2 entries from inside a very large JSON object. I don't know their array positions, but I do know a key:value pair of each entry I want to find, and I want another value from that same entry.
In this example there are only 4 entries, but in the original there are over 1000, and I need only 2 entries, for which I know "name" and "symbol". I need to get the value of quotes->ASK->time.
import requests

x = requests.get('http://example.org/data.json')
parsed = x.json()
gettime = str(parsed[0]["quotes"]["ASK"]["time"])
print(gettime)
I know that I can get it that way and then loop through it a thousand times, but that seems like overkill for just 2 values. Is there a way to do something like parsed["symbol":"kalo"]["quotes"]["ASK"]["time"], which would give me the kalo time without looping through all thousand entries?
[
    {
        "id": "nem-cri",
        "name": "nemlaaoo",
        "symbol": "nem",
        "rank": 27,
        "owner": "marcel",
        "quotes": {"ASK": {"price": 19429, "time": 319250866, "duration": 21}}
    },
    {
        "id": "kalo-lo-leek",
        "name": "kalowaaa",
        "symbol": "kalo",
        "rank": 122,
        "owner": "daniel",
        "quotes": {"ASK": {"price": 12928, "time": 937282932, "duration": 9}}
    },
    {
        "id": "reewmaarwl",
        "name": "reeqooow",
        "symbol": "reeq",
        "rank": 4,
        "owner": "eric",
        "quotes": {"ASK": {"price": 9989, "time": 124288222, "duration": 19}}
    },
    {
        "id": "sharkooaksj",
        "name": "sharkmaaa",
        "symbol": "shark",
        "rank": 22,
        "owner": "eric",
        "quotes": {"ASK": {"price": 11122, "time": 482773882, "duration": 22}}
    }
]
If you are OK with using pandas, I would just create a DataFrame.
import pandas as pd
df = pd.json_normalize(parsed)
print(df)
id name symbol rank owner quotes.ASK.price \
0 nem-cri nemlaaoo nem 27 marcel 19429
1 kalo-lo-leek kalowaaa kalo 122 daniel 12928
2 reewmaarwl reeqooow reeq 4 eric 9989
3 sharkooaksj sharkmaaa shark 22 eric 11122
quotes.ASK.time quotes.ASK.duration
0 319250866 21
1 937282932 9
2 124288222 19
3 482773882 22
If you want the kalo time, then
print(df[df['symbol'] == 'kalo']['quotes.ASK.time']) # -> 937282932
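Without pandas, a generator expression with `next` also works, and it stops scanning at the first match rather than walking all thousand entries. A minimal sketch, assuming `parsed` is the list above (trimmed here to the fields that matter):

```python
# trimmed copy of the parsed JSON from the question
parsed = [
    {"id": "nem-cri", "symbol": "nem", "quotes": {"ASK": {"time": 319250866}}},
    {"id": "kalo-lo-leek", "symbol": "kalo", "quotes": {"ASK": {"time": 937282932}}},
]

# next() returns the first matching entry, or the default (None) if absent
entry = next((d for d in parsed if d["symbol"] == "kalo"), None)
kalo_time = entry["quotes"]["ASK"]["time"] if entry else None
print(kalo_time)
```

If you need to look up many symbols repeatedly, building a dict keyed by symbol once (`{d["symbol"]: d for d in parsed}`) turns every later lookup into O(1).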

How to normalize uneven JSON structures in pandas?

I am using the Google Maps Distance Matrix API to get several distances from multiple origins. The API response comes in a JSON structured like:
{
    "destination_addresses": [
        "Destination 1",
        "Destination 2",
        "Destination 3"
    ],
    "origin_addresses": [
        "Origin 1",
        "Origin 2"
    ],
    "rows": [
        {
            "elements": [
                {
                    "distance": {"text": "8.7 km", "value": 8687},
                    "duration": {"text": "19 mins", "value": 1129},
                    "status": "OK"
                },
                {
                    "distance": {"text": "223 km", "value": 222709},
                    "duration": {"text": "2 hours 42 mins", "value": 9704},
                    "status": "OK"
                },
                {
                    "distance": {"text": "299 km", "value": 299156},
                    "duration": {"text": "4 hours 17 mins", "value": 15400},
                    "status": "OK"
                }
            ]
        },
        {
            "elements": [
                {
                    "distance": {"text": "216 km", "value": 215788},
                    "duration": {"text": "2 hours 44 mins", "value": 9851},
                    "status": "OK"
                },
                {
                    "distance": {"text": "20.3 km", "value": 20285},
                    "duration": {"text": "21 mins", "value": 1283},
                    "status": "OK"
                },
                {
                    "distance": {"text": "210 km", "value": 210299},
                    "duration": {"text": "2 hours 45 mins", "value": 9879},
                    "status": "OK"
                }
            ]
        }
    ],
    "status": "OK"
}
Note that the rows array has the same number of elements as origin_addresses (2), while each elements array has the same number of elements as destination_addresses (3).
Is one able to use the pandas API to normalize everything inside rows while fetching the corresponding data from origin_addresses and destination_addresses?
The output should be:
status distance.text distance.value duration.text duration.value origin_addresses destination_addresses
0 OK 8.7 km 8687 19 mins 1129 Origin 1 Destination 1
1 OK 223 km 222709 2 hours 42 mins 9704 Origin 1 Destination 2
2 OK 299 km 299156 4 hours 17 mins 15400 Origin 1 Destination 3
3 OK 216 km 215788 2 hours 44 mins 9851 Origin 2 Destination 1
4 OK 20.3 km 20285 21 mins 1283 Origin 2 Destination 2
5 OK 210 km 210299 2 hours 45 mins 9879 Origin 2 Destination 3
If pandas does not provide a relatively simple way to do it, how would one accomplish this operation?
If data contains the dictionary from the question, you can try:
df = pd.DataFrame(data["rows"])
df["origin_addresses"] = data["origin_addresses"]
df = df.explode("elements")
df = pd.concat([df.pop("elements").apply(pd.Series), df], axis=1)
df = pd.concat(
    [df.pop("distance").apply(pd.Series).add_prefix("distance."), df], axis=1
)
df = pd.concat(
    [df.pop("duration").apply(pd.Series).add_prefix("duration."), df], axis=1
)
df["destination_addresses"] = data["destination_addresses"] * len(
    data["origin_addresses"]
)
print(df)
Prints:
duration.text duration.value distance.text distance.value status origin_addresses destination_addresses
0 19 mins 1129 8.7 km 8687 OK Origin 1 Destination 1
0 2 hours 42 mins 9704 223 km 222709 OK Origin 1 Destination 2
0 4 hours 17 mins 15400 299 km 299156 OK Origin 1 Destination 3
1 2 hours 44 mins 9851 216 km 215788 OK Origin 2 Destination 1
1 21 mins 1283 20.3 km 20285 OK Origin 2 Destination 2
1 2 hours 45 mins 9879 210 km 210299 OK Origin 2 Destination 3
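An alternative sketch uses `pd.json_normalize` with `record_path` plus `itertools.product`, relying on the fact that the flattened rows come out in origin-major order. It assumes `data` holds the API response (trimmed here to `distance` and `status` to keep the example short):

```python
import itertools
import pandas as pd

# trimmed copy of the API response from the question
data = {
    "origin_addresses": ["Origin 1", "Origin 2"],
    "destination_addresses": ["Destination 1", "Destination 2", "Destination 3"],
    "rows": [
        {"elements": [{"distance": {"text": "8.7 km", "value": 8687}, "status": "OK"},
                      {"distance": {"text": "223 km", "value": 222709}, "status": "OK"},
                      {"distance": {"text": "299 km", "value": 299156}, "status": "OK"}]},
        {"elements": [{"distance": {"text": "216 km", "value": 215788}, "status": "OK"},
                      {"distance": {"text": "20.3 km", "value": 20285}, "status": "OK"},
                      {"distance": {"text": "210 km", "value": 210299}, "status": "OK"}]},
    ],
}

# record_path flattens every element into one row; nested dicts become dotted columns
df = pd.json_normalize(data["rows"], record_path="elements")
# product() pairs every origin with every destination, in the same origin-major order
df[["origin_addresses", "destination_addresses"]] = list(
    itertools.product(data["origin_addresses"], data["destination_addresses"])
)
print(df)
```

This avoids the repeated `pop`/`apply(pd.Series)`/`concat` round trips, at the cost of depending on the row ordering guarantee stated in the lead-in.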

MongoDB collection to pandas Dataframe

My MongoDB document structure is as follows, and some of the factors are NaN.
_id: ObjectId("5feddb959297bb2625db1450")
factors: Array
    0: Object
        factorId: "C24"
        Index: 0
        weight: 1
    1: Object
        factorId: "C25"
        Index: 1
        weight: 1
    2: Object
        factorId: "C26"
        Index: 2
        weight: 1
name: "Growth Led Momentum"
I want to convert it to a pandas DataFrame as follows, using pymongo and pandas.
| name                | factorId | Index | weight |
|---------------------|----------|-------|--------|
| Growth Led Momentum | C24      | 0     | 1      |
| Growth Led Momentum | C25      | 1     | 1      |
| Growth Led Momentum | C26      | 2     | 1      |
Thank you
Update
I broke out the ol' Python to give this a crack; the following code works flawlessly!
from pymongo import MongoClient
import pandas as pd

uri = "mongodb://<your_mongo_uri>:27017"
database_name = "<your_database_name>"
collection_name = "<your_collection_name>"

mongo_client = MongoClient(uri)
database = mongo_client[database_name]
collection = database[collection_name]

# I used this code to insert a doc into a test collection
# before querying (just in case you wanted to know lol)
"""
data = {
    "_id": 1,
    "name": "Growth Lead Momentum",
    "factors": [
        {"factorId": "C24", "index": 0, "weight": 1},
        {"factorId": "D74", "index": 7, "weight": 9}
    ]
}
insert_result = collection.insert_one(data)
print(insert_result)
"""

# This is the query that answers your question
results = collection.aggregate([
    {
        "$unwind": "$factors"
    },
    {
        "$project": {
            "_id": 1,  # Change to 0 if you wish to ignore the "_id" field.
            "name": 1,
            "factorId": "$factors.factorId",
            "index": "$factors.index",
            "weight": "$factors.weight"
        }
    }
])

# This is how we turn the results into a DataFrame.
# We can simply pass `list(results)` into `DataFrame(..)`,
# due to how our query works.
results_as_dataframe = pd.DataFrame(list(results))
print(results_as_dataframe)
Which outputs:
_id name factorId index weight
0 1 Growth Lead Momentum C24 0 1
1 1 Growth Lead Momentum D74 7 9
Original Answer
You could use the aggregation pipeline to unwind factors and then project the fields you want.
Something like this should do the trick.
Live demo here.
Database Structure
[
    {
        "_id": 1,
        "name": "Growth Lead Momentum",
        "factors": [
            {
                factorId: "C24",
                index: 0,
                weight: 1
            },
            {
                factorId: "D74",
                index: 7,
                weight: 9
            }
        ]
    }
]
Query
db.collection.aggregate([
    {
        $unwind: "$factors"
    },
    {
        $project: {
            _id: 1,
            name: 1,
            factorId: "$factors.factorId",
            index: "$factors.index",
            weight: "$factors.weight"
        }
    }
])
Results
(.csv friendly)
[
    {
        "_id": 1,
        "factorId": "C24",
        "index": 0,
        "name": "Growth Lead Momentum",
        "weight": 1
    },
    {
        "_id": 1,
        "factorId": "D74",
        "index": 7,
        "name": "Growth Lead Momentum",
        "weight": 9
    }
]
Wonderful answer by Matt! In case you want to use pandas, use this after you have retrieved the documents from the db:
df = pd.json_normalize(data)
df = df['factors'].explode().apply(pd.Series).join(df).drop(columns=['factors'])
Output:
factorId Index weight name
0 C24 0 1 Growth Led Momentum
0 C25 1 1 Growth Led Momentum
0 C26 2 1 Growth Led Momentum
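If the documents are already in a plain list (e.g. `data = list(collection.find())`), `pd.json_normalize` can do the unwinding directly via `record_path` and `meta`, which is arguably simpler than the explode chain. A sketch, assuming documents shaped like the one in the question:

```python
import pandas as pd

# assume this is what came back from the collection
data = [{
    "name": "Growth Led Momentum",
    "factors": [
        {"factorId": "C24", "Index": 0, "weight": 1},
        {"factorId": "C25", "Index": 1, "weight": 1},
        {"factorId": "C26", "Index": 2, "weight": 1},
    ],
}]

# one row per factor, with the parent document's name carried along
df = pd.json_normalize(data, record_path="factors", meta="name")
print(df)
```

Unlike the aggregation-pipeline approach, this does the flattening client-side, which is fine for small result sets but pushes more data over the wire for large collections.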

How to get the length of this JSON element in Robot Framework

I would like to get the length of this JSON element in Robot Framework.
JSON example:
[
    [
        {"a": "2020-01", "value": "1"},
        {"a": "2020-02", "value": "2"},
        {"a": "2020-03", "value": "10"},
        {"a": "2020-04", "value": "9"},
        {"a": "2020-05", "value": "0"},
        {"a": "2020-06", "value": "7"}
    ]
]
The expected result is
a 2020-01
value 1
a 2020-02
value 2
a 2020-03
value 10
a 2020-04
value 9
a 2020-05
value 0
a 2020-06
value 7
length = 6
I tried
${data_length}=    Get Length    ${json_data}
but it is not working.
I think there are two levels of [ ]. Please guide me, thanks.
You need to convert the JSON to a Python data structure, and then you can use the Get Length keyword on the first element of the outermost list.
Here's one way to do that. It assumes that the JSON data is not null and that the raw JSON data is in a variable named ${json_data}:
${data}=    Evaluate    json.loads($json_data)
${length}=    Get Length    ${data[0]}
Should Be Equal As Numbers    ${length}    6
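The two Robot Framework keywords map directly onto plain Python, which can help when debugging the test data outside the test runner:

```python
import json

json_data = """
[[{"a": "2020-01", "value": "1"}, {"a": "2020-02", "value": "2"},
  {"a": "2020-03", "value": "10"}, {"a": "2020-04", "value": "9"},
  {"a": "2020-05", "value": "0"}, {"a": "2020-06", "value": "7"}]]
"""

data = json.loads(json_data)   # what Evaluate json.loads($json_data) does
length = len(data[0])          # what Get Length does on the inner list
print(length)
```

The outer list has one element (the inner list), so indexing with `[0]` first is what makes the length come out as 6 rather than 1.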
