Access a variable within a dictionary with unknown nested location - python

I have a JSON file and I want to query it using python. However, I do not know the nested location of a variable before hand. E.g. to query a JSON object below loaded into python and called 'data', I could do the following:
data['experiments']['initial_ns']['icdat']
However, this assumes that I know that the icdat variable is located below initial_ns which is located under experiments. Unfortunately I do not have this information and also the JSON structure could change in the future. Is there a simpler variable to access variables within a JSON string without explicitly specifying the entire structure?
thanks!!!
{
"experiments": [
{
"management": {
"events": [
{
"date": "19122",
"timp": "TI3",
"eve": "tage"
}
]
},
"initial_ns": {
"icpcr": "MZ",
"icdat": "1922"
},
"observed": {
"mdat": "19403",
"time_series": [
{
"date": "198423",
"etac": "0"
}
],
"adat": "190218"
},
"local_name": "lhi",
"exname": "SE",
"exp_dur": "1"
}
]
}

Have a look at the jsonpath module. http://goessner.net/articles/JsonPath/. I think the search string $..icdat will match your needs.

"...without explicitly specifying the entire structure?"
Yes, there are many ways. Unfortunately you have not specified which answer you are looking for.
To be "unique in terms of the schema" (my terminology) is as follows: If you have for example multiple Foo dictionaries with the key Foo.bar, then that is still unique. What is not unique is if you have Foo objects with Foo.bar, and Baz objects with Baz.bar: searching for {... baz:...} will return different kinds of objects.
If the key is unique in terms of the schema, you can search the entire tree. You can make this go faster by caching all key-value pairs in a dictionary for later use (therefore the operation is O(1) "instant" amortized cost, since you needed to go through the entire data structure anyway to parse it!). This even works if you would like to return sets of objects: use a cache = collections.defaultdict(set) and when you preprocess items to cache, do cache[key].add(value).
If the key is not unique in terms of the schema, you will want to make a reasonable guess about the path and provide some partial information, per Hans Then's answer utilization JsonPath: https://stackoverflow.com/a/12291240/711085 (alternatively, change the schema)

No. You need to know the format, or you'll have to manually loop over everything in it.

You can write a function to recursively search nested containers for a given key, similar to findElementByID() in an XML DOM parser.
def find_key(json, key):
if isinstance(json, dict):
if key in json:
yield json[key]
if isinstance(json, (dict, list)):
for value in (json.itervalues() if isinstance(json, dict) else json):
if isinstance(value, (dict, list)):
for item in find_key(value, key):
yield item
>>> next(items_by_key(data, "icdat"))
'1922'
Since the same key may be found in multiple places in the document, this is actually written as a generator. You can iterate over the results to get all the values or, if you just want the first one (or know it's the only one), use next() around it as I've shown above. You could also convert it to a list() if desired.

Related

Accessing list within a dictionary via Python from a JSON file

I am new to Python and could use some help, please. I'm trying to use Python to read from my JSON file and get the values of the list within in a dictionary.
Once I read the JSON into my program, I have:
request body = {
"providerName": "ProviderNameXYZ",
"rateRequestPayments": [
{
"amount": 128.0,
"creditorProfileId": "7539903942457444658",
"debtorProfileId": "2072612712266192555",
"paymentMethodId": "2646780961603748694016",
"currency": "EUR",
"userReference": "INVALID user ref automation single",
"reconciliationId": "343546578753349"
},
{
"amount": 129.0,
"creditorProfileId": "7539903942457444658",
"debtorProfileId": "2072612712266192555",
"paymentMethodId": "2646780961603748694016",
"currency": "EUR",
"userReference": "INVALID user ref automation single",
"reconciliationId": "343546578753340"
}
]
}
I now want to be able to grab the value of any particular key.
I've tried accessing it via several routes:
rateRequestPayments[0].amount
rateRequestPayments()[0].amount
for k, v in request_body.rateRequestPayments(): print(k, v)
for each in request_body.rateRequestPayments: print(each('amount'))
values = eval(request_body['rateRequestPayments']) print(values[0])
All of these end up with errors.
Per the 2nd comment below: request_body['rateRequestPayments'][0]['amount']
This works!
I also want to be able to delete the whole key-value pair ("amount": 128.0,) from the request_body. request_body['rateRequestPayments'][0]['amount'] does not work for this. Not sure how to reference this.
I know I am missing something simple but I'm just unsure what it is. Any help would be greatly appreciated.
Python is great for data science, and has many features that make it fit for the job. One of those features is finding data in a data set. body['rateRequestPayments'][<!WHICH DATA SET YOU WANT TO ACCESS>][<!VALUE YOU WANT TO FIND>]
First you'll need to put an underscore between requests and body so its requests_body = {...}.
To access a key-value pair:
requests_body["rateRequestPayments"] #to access the list of two dicts
requests_body["rateRequestPayments"][0] #to access the individual dicts by index
requests_body["rateRequestPayments"][0]["amount"] #to access the value of a particular key (in this case "amount" is the key)
To delete a key value pair, just use the del statement:
del requests_body["rateRequestPayments"][0]["amount"]
This will delete the key-value pair in the variable requests_body, so trying to access this key again will raise a KeyError error.
To access and delete a key-value pair, use .pop():
value = requests_body["rateRequestPayments"][0].pop("amount")
#value is now 128.0 and the key-value pair is deleted

Parse an embedded object (JSON) into an ordered dictionary in Python

I am looking to parse some JSON into a dictionary but need to preserve order for one particular part of the dictionary.
I know that I can parse the entire JSON file into an ordered dictionary (ex. Can I get JSON to load into an OrderedDict?) but this is not quite what I'm looking for.
{
"foo": "bar",
"columns":
{
"col_1": [],
"col_2": []
}
}
In this example, I would want to parse the entire file in as a dictionary with the "columns" portion being an OrderedDict. Is it possible to get that granular with the JSON parsing tools while guaranteeing that order is preserved throughout? Thank you!
From the comments meanwhile, I gathered that a complete, nested OrderedDict is fine as well, but this could be a solution too, if you don't mind using some knowledge about the names of the columns:
import json
from collections import OrderedDict
def hook(partialjson):
if "col_1" in partialjson:
return OrderedDict(partialjson)
return dict(partialjson)
result = json.loads("YOUR JSON STRING", object_hook=hook)
Hope this helps!

json.dumps with indent prints different output [duplicate]

I've noticed the order of elements in a JSON object not being the original order.
What about the elements of JSON lists? Is their order maintained?
Yes, the order of elements in JSON arrays is preserved. From RFC 7159 -The JavaScript Object Notation (JSON) Data Interchange Format
(emphasis mine):
An object is an unordered collection of zero or more name/value
pairs, where a name is a string and a value is a string, number,
boolean, null, object, or array.
An array is an ordered sequence of zero or more values.
The terms "object" and "array" come from the conventions of
JavaScript.
Some implementations do also preserve the order of JSON objects as well, but this is not guaranteed.
The order of elements in an array ([]) is maintained. The order of elements (name:value pairs) in an "object" ({}) is not, and it's usual for them to be "jumbled", if not by the JSON formatter/parser itself then by the language-specific objects (Dictionary, NSDictionary, Hashtable, etc) that are used as an internal representation.
Practically speaking, if the keys were of type NaN, the browser will not change the order.
The following script will output "One", "Two", "Three":
var foo={"3":"Three", "1":"One", "2":"Two"};
for(bar in foo) {
alert(foo[bar]);
}
Whereas the following script will output "Three", "One", "Two":
var foo={"#3":"Three", "#1":"One", "#2":"Two"};
for(bar in foo) {
alert(foo[bar]);
}
Some JavaScript engines keep keys in insertion order. V8, for instance, keeps all keys in insertion order except for keys that can be parsed as unsigned 32-bit integers.
This means that if you run either of the following:
var animals = {};
animals['dog'] = true;
animals['bear'] = true;
animals['monkey'] = true;
for (var animal in animals) {
if (animals.hasOwnProperty(animal)) {
$('<li>').text(animal).appendTo('#animals');
}
}
var animals = JSON.parse('{ "dog": true, "bear": true, "monkey": true }');
for (var animal in animals) {
$('<li>').text(animal).appendTo('#animals');
}
You'll consistently get dog, bear, and monkey in that order, on Chrome, which uses V8. Node.js also uses V8. This will hold true even if you have thousands of items. YMMV with other JavaScript engines.
Demo here and here.
"Is the order of elements in a JSON list maintained?" is not a good question. You need to ask "Is the order of elements in a JSON list maintained when doing [...] ?"
As Felix King pointed out, JSON is a textual data format. It doesn't mutate without a reason. Do not confuse a JSON string with a (JavaScript) object.
You're probably talking about operations like JSON.stringify(JSON.parse(...)). Now the answer is: It depends on the implementation. 99%* of JSON parsers do not maintain the order of objects, and do maintain the order of arrays, but you might as well use JSON to store something like
{
"son": "David",
"daughter": "Julia",
"son": "Tom",
"daughter": "Clara"
}
and use a parser that maintains order of objects.
*probably even more :)

Multiple FOR loops in iterating over dictionary in Python

This is a simplistic example of a dictionary created by a json.load that I have t deal with:
{
"name": "USGS REST Services Query",
"queryInfo": {
"timePeriod": "PT4H",
"format": "json",
"data": {
"sites": [{
"id": "03198000",
"params": "[00060, 00065]"
},
{
"id": "03195000",
"params": "[00060, 00065]"
}]
}
}
}
Sometimes there may be 15-100 sites with unknown sets of parameters at each site. My goal is to either create two lists (one storing "site" IDs and the other storing "params") or a much simplified dictionary from this original dictionary. Is there a way to do this using nested for loops with kay,value pairs using the iteritem() method?
What I have tried to far is this:
queryDict = {}
for key,value in WS_Req_dict.iteritems():
if key == "queryInfo":
if value == "data":
for key, value in WS_Req_dict[key][value].iteritems():
if key == "sites":
siteVal = key
if value == "params":
paramList = [value]
queryDict["sites"] = siteVal
queryDict["sites"]["params"] = paramList
I run into trouble getting the second FOR loop to work. I haven't looked into pulling out lists yet.
I think this maybe an overall stupid way of doing it, but I can't see around it yet.
I think you can make your code much simpler by just indexing, when feasible, rather than looping over iteritems.
for site in WS_Req_dict['queryInfo']['data']['sites']:
queryDict[site['id']] = site['params']
If some of the keys might be missing, dict's get method is your friend:
for site in WS_Req_dict.get('queryInfo',{}).get('data',{}).get('sites',[]):
would let you quietly ignore missing keys. But, this is much less readable, so, if I needed it, I'd encapsulate it into a function -- and often you may not need this level of precaution! (Another good alternative is a try/except KeyError encapsulation to ignore missing keys, if they are indeed possible in your specific use case).

Mongodb: How to change an element of a nested arrary?

From what I have read it is impossible to update an element in an nested array using the positional operator $ in mongo. The $ only works one level deep. I see it is a requested feature in mongo 2.7.
Updating the whole document one level up is not an option because of write conflicts. I need to just be able to change the 'username' for a particular reward program for instance.
One of the ideas would to be pull, modify, and push the entire 'reward_programs' element but then I would loose the order. Order is important.
Consider this document:
{
"_id:"0,
"firstname":"Tom",
"profiles" : [
{
"profile_name": "tom",
"reward_programs:[
{
'program_name':'American',
'username':'tomdoe',
},
{
'program_name':'Delta',
'username':'tomdoe',
}
]
}
]
}
How would you go about specifically changing the 'username' of 'program_name'=Delta?
After doing more reading it looks like this is unsupported in mongodb at the moment. Positional updates are only supported for one level deep. The feature might be added for mongodb 2.7.
The are a couple of work arounds.
1) Flatten out your database structure. In this case, make 'reward_programs' it's own collection and do your operation on that.
2) Instead of arrays of dicts, use dicts of dicts. That way you can just have an absolute path down to the object you need to modify. This can have drawbacks to query flexibility.
3) Seems hacky to me but you can also walk the list on the nested array find it's position index in the array and do something like this:
users.update({'_id': request._id, 'profiles.profile_name': profile_name}, {'$set': {'profiles.$.reward_programs.{}.username'.format(index): new_username}})
4) Read in the whole document, modify, write back. However, this has possible write conflicts
Setting up your database structure initially is extremely important. It really depends on how you are going to use it.
A simple way to do this:
doc = collection.find_one({'_id': 0})
doc['profiles'][0]["reward_programs"][1]['username'] = 'new user name'
#replace the whole collection
collection.save(doc)

Categories