Python match nested structure - python

Im trying to define some logic, that verifies everything in one nested dictionary belongs to another nested nested dictionary.
Ie:
official_data = {
'Name': 'John Smith',
'ID': 123123232,
'Family': [
{'Name': 'Sarah Smith','ID': 12312323},
{'Name': 'Joe Smith','ID': 12312324}
{'Name': 'Tim Smith','ID': 12312325}
{'Name': 'Sally Smith','ID': 12312326}
],
'Info': {
'InfoList': [
{'text': ['Personal Info Message']},
{'text': ['Secondary Message']}
]
}
}
sample_data = {
'Family': [
{"Name": 'Joe Smith'}
],
'Info': {
'InfoList': [
{'text': ['Secondary Message']}
]
}
}
matches(official_data, sample_data) # True, because everything in sample data exists in official_data, despite official_data having MORE values.
different_sample = {
'Info': {
'InfoList': [{}]
}
}
matches(official_data, different_sample) # True, because the structure of Dict -> Dict -> List -> Dict exists
bad_data = {'ID': 54242343}
matches(official_data, bad_data) # False, because the ID of bad_data is not the ID of official_data
other_bad_data = {
'Info': {
'InfoList': {}
}
}
matches(official_data, other_bad_data) # False, because InfoList is a list in official data
I have a feeling such logic SHOULD be easy to implement, or has already been implemented and is in wide use, but I am struggling to find what i want, and implementing it on my own becomes complicated, with recursive solutions and casting lists into sets in order to make sure order is ignored.
Im wondering if im missing something obvious, or if this logic is actually really niche and would have to be designed from scratch.

Related

Filter python dictionary with dictionary-comprehension

I have a dictionary that is really a geojson:
points = {
'crs': {'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}, 'type': 'name'},
'features': [
{'geometry': {
'coordinates':[[[-3.693162104185235, 40.40734504903418],
[-3.69320229317164, 40.40719570724241],
[-3.693227952841606, 40.40698546120488],
[-3.693677594635894, 40.40712700492216]]],
'type': 'Polygon'},
'properties': {
'name': 'place1',
'temp': 28},
'type': 'Feature'
},
{'geometry': {
'coordinates': [[[-3.703886381691941, 40.405197271972035],
[-3.702972834622821, 40.40506272989243],
[-3.702552994966045, 40.40506798079752],
[-3.700985024825222, 40.405500820623814]]],
'type': 'Polygon'},
'properties': {
'name': 'place2',
'temp': 27},
'type': 'Feature'
},
{'geometry': {
'coordinates': [[[-3.703886381691941, 40.405197271972035],
[-3.702972834622821, 40.40506272989243],
[-3.702552994966045, 40.40506798079752],
[-3.700985024825222, 40.405500820623814]]],
'type': 'Polygon'},
'properties': {
'name': 'place',
'temp': 25},
'type': 'Feature'
}
],
'type': u'FeatureCollection'
}
I would like to filter it to stay only with places that have a specific temperature, for example, more than 25 degrees Celsius.
I have managed to do it this way:
dict(crs = points["crs"],
features = [i for i in points["features"] if i["properties"]["temp"] > 25],
type = points["type"])
But I wondered if there was any way to do it more directly, with dictionary comprehension.
Thank you very much.
I'm very late. A dict compreheneison won't help you since you have only three keys. But if you meet the following conditions: 1. you don't need a copy of features (e.g. your dict is read only); 2. you don't need index access to features, you my use a generator comprehension instead of a list comprehension:
dict(crs = points["crs"],
features = (i for i in points["features"] if i["properties"]["temp"] > 25),
type = points["type"])
The generator is created in constant time, while the list comprehension is created in O(n). Furthermore, if you create a lot of those dicts, you have only one copy of the features in memory.

Trouble accessing a value from a dictionary

I have the following dictionary
Dict = {'Manu':{u'ID0020879.07': [{'ID': u'ID0020879.07', 'log': u'log-123-56', 'owner': [Manu], 'item': u'WRAITH', 'main_id': 5013L, 'status': u'noticed', 'serial': u'89980'}]}}
How can I access the serial from this dictionary?
I tried Dict['Manu']['serial'], But its not working as expected..
Guys any idea?
Your dictionary is very nested one.Try like this.
In [1]: Dict['Manu']['ID0020879.07'][0]['serial']
Out[1]: u'89980'
Here is the restructured dictionary.
{
'Manu': {
u'ID0020879.07': [{
'ID': u'ID0020879.07',
'log': u'log-123-56',
'owner': [Manu],
'item': u'WRAITH',
'main_id': 5013L,
'status': u'noticed',
'serial': u'89980'
}]
}
}
Now, you can see where the serial key is located more clearly (not under Manu)...
It is instead
Dict['Manu']['ID0020879.07'][0]['serial']
I suggest you fix that data source to not make ID0020879.07 a key of the data (because it is duplicated in the ID key of that object in the list).
Perhaps fix like so where the Manu key maps to a list of "accounts", each with an ID and other fields
{
'Manu': [{
'ID': u'ID0020879.07',
'log': u'log-123-56',
'owner': [Manu],
'item': u 'WRAITH',
'main_id': 5013L,
'status': u'noticed',
'serial': u'89980'
}]
}
And then you could do
Dict['Manu'][0]['serial']
Or loop the list to get all the serial keys
for item in Dict['Manu']:
print(item['serial'])

Updating complex JSON object in Python

I am grabbing sort of a complex MongoDB document with Python (v3.5) and I should update some values in it which are scattered all around the object and have no particular pattern in the structure and save it back to a different MongoDB collection. The object looks like this:
# after json.loads(mongo_db_document) my dict looks like this
notification = {
'_id': '570f934f45213b0d14b1256f',
'key': 'receipt',
'label': 'Delivery Receipt',
'version': '0.0.1',
'active': True,
'children': [
{
'key': 'started',
'label': 'Started',
'children': [
'date',
'time',
'offset'
]
},
{
'key': 'stop',
'label': 'Ended',
'children': [
'date',
'time',
'offset'
]
},
{
'label': '1. Particulars',
'template': 'formGroup',
'children': [
{
'children': [
{
'key': 'name',
'label': '2.1 Name',
'value': '********** THIS SHOULD BE UPDATED **********',
'readonly': 'true'
},
{
'key': 'ims_id',
'label': '2.2 IMS Number',
'value': '********** THIS SHOULD BE UPDATED **********',
'readonly': 'true'
}
]
},
{
'children': [
{
'key': 'type',
'readonly': '********** THIS SHOULD BE UPDATED **********',
'label': '2.3 Type',
'options': [
{
'label': 'Passenger',
'value': 'A37'
},
{
'label': 'Cargo',
'value': 'A35'
},
{
'label': 'Other',
'value': '********** THIS SHOULD BE UPDATED **********'
}
]
}
]
}
]
},
{
'template': 'formGroup',
'key': 'waste',
'label': '3. Waste',
'children': [
{
'label': 'Waste',
'children': [
{
'label': 'Plastics',
'key': 'A',
'inputType': 'number',
'inputAttributes': {
'min': 0
},
'value': '********** THIS SHOULD BE UPDATED **********'
},
{
'label': 'B. Oil',
'key': 'B',
'inputType': 'number',
'inputAttributes': {
'min': 0
},
'value': '********** THIS SHOULD BE UPDATED **********'
},
{
'label': 'C. Operational',
'key': 'C',
'inputType': 'number',
'inputAttributes': {
'min': 0
},
'value': '********** THIS SHOULD BE UPDATED **********'
}
]
}
]
},
{
'template': 'formRow',
'children': [
'empty',
'signature'
]
}
],
'filter': {
'timestamp_of_record': [
'date',
'time',
'offset'
]
}
}
My initial idea was to put placeholders (like $var_name) in places where I need to update values, and load the string with Python's string.Template, but that approach unfortunately breaks lots of stuff to other users of the same MongoDB document for some reason.
Is there a solution to simply modify this kind of object without "hardcoding" path to find the values I need to update?
There's this small script that I had written a couple years ago - I used it to find entries in some very long and unnerving JSONs. Admittedly it's not beautiful, but it might help in your case, perhaps?
You can find the script on Bitbucket, here (and here is the code).
Unfortunately it's not documented; at the time I wasn't really believing other people would use it, I guess.
Anyways, if you'd like to try it, save the script in your working directory and then use something like this:
from RecursiveSearch import Retriever
def alter_data(json_data, key, original, newval):
'''
Alter *all* values of said keys
'''
retr = Retriever(json_data)
for item_no, item in enumerate(retr.__track__(key)): # i.e. all 'value'
# Pick parent objects with a last element False in the __track__() result,
# indicating that `key` is either a dict key or a set element
if not item[-1]:
parent = retr.get_parent(key, item_no)
try:
if parent[key] == original:
parent[key] = newval
except TypeError:
# It's a set, this is not the key you're looking for
pass
if __name__ == '__main__':
alter_data(notification, key='value',
original = '********** THIS SHOULD BE UPDATED **********',
newval = '*UPDATED*')
Unfortunately as I said the script isn't well documented, so if you want to try it and need more info, I'll be glad to provide it.
Not sure if I understood correctly, but this will dynamically find all keys "value" and "readonly" and print out the paths to address the fields.
def findem(data, trail):
if isinstance(data, dict):
for k in data.keys():
if k in ('value', 'readonly'):
print("{}['{}']".format(trail, k))
else:
findem(data[k], "{}['{}']".format(trail, k))
elif isinstance(data, list):
for k in data:
findem(k, '{}[{}]'.format(trail, data.index(k)))
if __name__ == '__main__':
findem(notification, 'notification')
notification['children'][2]['children'][0]['children'][0]['readonly']
notification['children'][2]['children'][0]['children'][0]['value']
notification['children'][2]['children'][0]['children'][1]['readonly']
notification['children'][2]['children'][0]['children'][1]['value']
notification['children'][2]['children'][1]['children'][0]['readonly']
notification['children'][2]['children'][1]['children'][0]['options'][0]['value']
notification['children'][2]['children'][1]['children'][0]['options'][1]['value']
notification['children'][2]['children'][1]['children'][0]['options'][2]['value']
notification['children'][3]['children'][0]['children'][0]['value']
notification['children'][3]['children'][0]['children'][1]['value']
notification['children'][3]['children'][0]['children'][2]['value']
Add another list to the JSON object. Each item in that list would be a list of keys that lead to the values to be changed. An example for one such list is: ['children', 2, 'children', 'children', 0, 'value'].
Then, to access the value you could use a loop:
def change(json, path, newVal):
cur = json
for key in path[:-1]:
cur = cur[key]
cur[path[-1]] = newVal
path = notification['paths'][0]
#path, for example, could be ['children', 2, 'children', 'children', 0, 'value']
newVal = 'what ever you want'
change(notification, path, newVal)

KeyError in MongoKit when adding nested field to structure

I have the following structure:
structure = {
'firstname': basestring,
'lastname': basestring,
'genres': [basestring],
'address': [
{'number': basestring, 'street': basestring, 'town': basestring}
],
'phone': [
{'type': basestring, 'number': basestring}
],
}
And I have a small helper method to iterate over cursors to return a python dict like so:
def to_django_context(cursor):
records = []
for r in cursor:
records.append(r.to_json_type())
return records
this works fine until I want to add another nested field to the structure like this:
structure = {
'firstname': basestring,
'lastname': basestring,
'genres': [basestring],
'address': [
{'number': basestring, 'street': basestring, 'town': basestring}
],
'phone': [
{'type': basestring, 'number': basestring}
],
'title': [{'TEST_FIELD': basestring}],
}
at which point my cursor iterator fails with a KeyError. If I delete all the documents in the collection it works normally. So does this mean that everytime I change my document structure object I have to essentially drop the collection?
Cheers,
F

Accessing Data Nested in Dictionary in Array in Dictionary

I'm feeling stumped and looking for help. I'm trying to access data that lives inside of a dictionary that's inside of an array that is inside of a dictionary. See below:
{
'files': [
{
'type': 'diskDescriptor',
'name': '[VM] VM1/VM1.vmdk',
'key': 4,
'size': 0
},
{
'type': 'diskExtent',
'name': '[VM] VM1/VM1-flat.vmdk',
'key': 5,
'size': 32457621504
}
],
'capacity': 32505856,
'label': 'Hard disk 1',
'descriptor': '[VM] VM1/VM1.vmdk',
'committed': 31696896,
'device': {
'summary': '32,505,856 KB',
'_obj': <pysphere.vi_property.VIProperty object at 0x17442910>,
'unitNumber': 0,
'key': 2000,
'label': 'Hard disk 1',
'type': 'VirtualDisk',
'capacityInKB': 32505856
}
}
If I want to access, let's say the descriptor key value how would I go about this with Python? For some reason all of the combinations I've tried do not work.
Any help and guidance would be appreciated and if more information is needed I can provide. Thanks.
Lets call your main dictionary bob, because I like bob:
bob['files'] #get you the list with second dictionary
bob['files'][0] #get you the first item in the list, which is the nested 2nd dictionary
bob['files'][0]['type'] == 'diskDescriptor'

Categories