Generate JSON object from json-path in python

I have a list of JSON paths and a value for each path, for example:
bla.[0].ble with a value: 3
and I would like to generate a JSON object where the output looks like this:
{
    "bla": [
        {
            "ble": 3
        }
    ]
}
To find an expression in the JSON I used the jsonpath-ng library, but now I want to go in the other direction and build JSON from JSON paths.
Can you give me some advice on how to make this JSON generator so that it can be used for every JSON path?
I tried to just loop through the keys and create a list where needed, but maybe there is a more generic solution for this? (An open source library would also be perfect, if there is one.)

As a workaround, my solution was to build a new dictionary using the expressions (or their hashes) as keys and the values as the values:
generated_json[hash("bla.[0].ble")] = 3
So even though the JSON object doesn't match the expected output format, I can use it to look up my expressions, as they describe unique paths.
Please feel free to suggest any better solution as this is just a workaround.
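
A more direct approach can be sketched as follows. This is a minimal sketch, not a library API: parse_path, set_path and build_json are hypothetical helper names, and it only assumes the simple dotted syntax from the question (keys separated by dots, list indices written as [n]), not full JSONPath.

```python
import re


def parse_path(path):
    """Split 'bla.[0].ble' into ['bla', 0, 'ble'] (hypothetical helper)."""
    tokens = []
    for part in path.split("."):
        match = re.fullmatch(r"\[(\d+)\]", part)
        tokens.append(int(match.group(1)) if match else part)
    return tokens


def set_path(root, path, value):
    """Create the dicts/lists needed along `path` and store `value` at its end."""
    tokens = parse_path(path)
    node = root
    for i, token in enumerate(tokens):
        is_last = i == len(tokens) - 1
        if is_last:
            child = value
        else:
            # The next token decides whether a list or a dict is needed here.
            child = [] if isinstance(tokens[i + 1], int) else {}
        if isinstance(token, int):
            # Pad the list with None so that the index exists.
            while len(node) <= token:
                node.append(None)
            if is_last or node[token] is None:
                node[token] = child
        elif is_last or token not in node:
            node[token] = child
        node = node[token]
    return root


def build_json(pairs):
    """Build one nested object from (path, value) pairs."""
    root = {}
    for path, value in pairs:
        set_path(root, path, value)
    return root


result = build_json([("bla.[0].ble", 3), ("bla.[1].ble", 4), ("name", "x")])
```

The result can then be passed to json.dumps(). Paths that disagree about a container type (the same position used once as a list and once as a dict) are not handled in this sketch.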

Related

looping through json python is very slow

Can someone help me understand what I'm doing wrong in the following code:
def matchTrigTohost(gtriggerids,gettriggers):
    mylist = []
    for eachid in gettriggers:
        gtriggerids['params']['triggerids'] = str(eachid)
        hgetjsonObject = updateitem(gtriggerids,processor)
        hgetjsonObject = json.dumps(hgetjsonObject)
        hgetjsonObject = json.loads(hgetjsonObject)
        hgetjsonObject = eval(hgetjsonObject)
        hostid = hgetjsonObject["result"][0]["hostid"]
        hname = hgetjsonObject["result"][0]["name"]
        endval = hostid + "--" + hname
        mylist.append(endval)
    return(hgetjsonObject)
The variable gettriggers contains a lot of ids (~3500):
[ "26821", "26822", "26810", ..... ]
I'm looping through the ids in the variable and assigning them to a json object.
gtriggerids = {
    "jsonrpc": "2.0",
    "method": "host.get",
    "params": {
        "output": ["hostid", "name"],
        "triggerids": "26821"
    },
    "auth": mytoken,
    "id": 2
}
When I run the code against the above json variable, it is very slow. It is taking several minutes to check each ID. I'm sure I'm doing many things wrong here or at least not in the pythonic way. Can anyone help me speed this up? I'm very new to python.
NOTE:
The dump() , load(), eval() were used to convert the str produced to json.

You asked for help knowing what you're doing wrong. Happy to oblige :-)
At the lowest level—why your function is running slowly—you're running many unnecessary operations. Specifically, you're moving data between formats (python dictionaries and JSON strings) and back again which accomplishes nothing but wasting CPU cycles.
You mentioned this is the only way you could get the data in the format you needed. That brings me to the second thing you're doing wrong.
You're throwing code at the wall instead of understanding what's happening.
I'm quite sure (and several of your commenters appear to agree) that your code is not the only way to arrange your data into a usable structure. What you should do instead is:
Understand as much as you can about the data you're being given. I suspect the output of updateitem() should be your first target of learning.
Understand the right/typical way to interact with that data. Your data doesn't have to be a dictionary before you can use it. Maybe it's not the best approach.
Understand what regularities and irregularities the data may have. Part of your problem may not be with types or dictionaries, but with an unpredictable/dirty data source.
Armed with all this new knowledge, manipulate your data as simply as you can.
I can pretty much guarantee the result will run faster.
More detail! Some things you wrote suggest misconceptions:
I'm looping through the ids in the variable and assigning them to a json object.
No, you can't assign to a JSON object. In python, JSON data is always a string. You probably mean that you're assigning to a python dictionary, which (sometimes!) can be converted to a JSON object, represented as a string. Make sure you have all those concepts clear before you move forward.
The dump() , load(), eval() were used to convert the str produced to json.
Again, you don't call dumps() on a string. You use that to convert a python object to a string. Run this code in a REPL, go step by step, and inspect or play with each output to understand what it is.
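
To make the distinction concrete, here is a small sketch. It assumes, purely for illustration, that updateitem() already hands back a Python dictionary shaped like the response below; the question does not show what it actually returns, and the hostid and name values are made up.

```python
import json

# Hypothetical response, shaped like the data the question indexes into.
response = {"result": [{"hostid": "10084", "name": "web01"}]}

# dumps() turns a Python object into a JSON string ...
as_text = json.dumps(response)

# ... and loads() turns that string back into an equal Python object,
# so doing both in a row only burns CPU time.
round_tripped = json.loads(as_text)

# The dictionary can be used directly, with no dumps/loads/eval at all.
host = response["result"][0]
endval = host["hostid"] + "--" + host["name"]
```

Checking type(as_text) and type(round_tripped) in a REPL shows which step produces a string and which produces a dictionary.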

Check if JSON var has nullable key (Twitter Streaming API)

I'm downloading tweets from the Twitter Streaming API using Tweepy. I can check whether the downloaded data has keys such as 'extended_tweet', but I'm struggling with a specific key nested inside another key.
def on_data(self, data):
    savingTweet = {}
    if not "retweeted_status" in data:
        dataJson = json.loads(data)
        if 'extended_tweet' in dataJson:
            savingTweet['text'] = dataJson['extended_tweet']['full_text']
        else:
            savingTweet['text'] = dataJson['text']
        if 'coordinates' in dataJson:
            if 'coordinates' in dataJson['coordinates']:
                savingTweet['coordinates'] = dataJson['coordinates']['coordinates']
            else:
                savingTweet['coordinates'] = 'null'
I'm checking 'extended_tweet' properly, but when I try to do the same with ['coordinates']['coordinates'] I get the following error:
TypeError: argument of type 'NoneType' is not iterable
Twitter documentation says that key 'coordinates' has the following structure:
"coordinates":
{
    "coordinates":
    [
        -75.14310264,
        40.05701649
    ],
    "type":"Point"
}
I managed to solve it by wrapping the problematic check in a try/except, but I don't think this is the most suitable approach to the problem. Any other ideas?

So the Twitter API docs are probably lying a bit about what they return (shock horror!) and it looks like you're getting a None in place of the expected data structure. You've already decided against using try/except, so I won't go over that, but here are a few other suggestions.
Using dict get() default
There are a couple of options that occur to me. The first is to make use of the default argument of the dict get() method. You can provide a fallback for when the expected key does not exist, which allows you to chain multiple calls together.
For example you can achieve most of what you are trying to do with the following:
# "or {}" also covers a key that is present but whose value is None
return {
    'text': (data.get('extended_tweet') or {}).get('full_text', data['text']),
    'coordinates': (data.get('coordinates') or {}).get('coordinates', 'null')
}
It's not super pretty, but it does work. It's likely to be a little slower than what you are doing too.
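
One thing to watch with chained get() calls is that the default only applies when a key is missing, not when it is present with the value None, which is the situation behind the TypeError in the question. A minimal sketch of a helper that covers both cases follows; nested_get is a hypothetical name, not part of Tweepy or the standard library, and the sample tweets are made up.

```python
def nested_get(data, *keys, default=None):
    """Follow `keys` through nested dicts; return `default` when any step
    is missing or is not a dict (for example None)."""
    node = data
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return default if node is None else node


# "coordinates" is present but None, as the error message suggests.
tweet_without_geo = {"text": "hello", "coordinates": None}
tweet_with_geo = {
    "text": "hello",
    "coordinates": {"coordinates": [-75.14310264, 40.05701649], "type": "Point"},
}

missing = nested_get(tweet_without_geo, "coordinates", "coordinates", default="null")
found = nested_get(tweet_with_geo, "coordinates", "coordinates", default="null")
```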
Using JSONPath
Another option, which is likely overkill for this situation, is to use a JSONPath library, which will allow you to search within data structures for items matching a query. Something like:
from jsonpath_rw import parse
matches = parse('extended_tweet.full_text').find(data)
if matches:
    print(matches[0].value)
This is going to be a lot slower than what you are doing, and for just a few fields it is overkill, but if you do a lot of this kind of work it could be a handy tool in the box. JSONPath can also express much more complicated or very deeply nested paths, where the get method might not work or would be unwieldy.
Parse the JSON first!
The last thing I would mention is to make sure you parse your JSON before you do your test for "retweeted_status". If the text appears anywhere (say inside the text of a tweet) this test will trigger.
JSON parsing with a competent library is usually extremely fast too, so unless you are having real speed problems it's not necessarily worth worrying about.
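
The difference between testing the raw string and testing the parsed object can be seen in a short sketch. The tweet payload here is made up for illustration.

```python
import json

# A made-up payload whose *text* happens to mention the key name.
raw = json.dumps({"text": "what does retweeted_status mean?"})

# Substring test on the raw JSON string: matches anywhere, even inside values.
matches_in_string = "retweeted_status" in raw

# Key test on the parsed object: only matches an actual top-level key.
tweet = json.loads(raw)
matches_as_key = "retweeted_status" in tweet
```

Here the string test reports a match although the tweet has no retweeted_status key, so a tweet like this would be skipped by mistake.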

convert string representation of a dict inside json dict value to dict

Hello all, and sorry if the title is worded poorly. I'm having a bit of trouble wrapping my head around how to solve this issue. I would have liked to simply pass a dict as the value for this key in my JSON object, but sadly I have to pass it as a string. So, I have a JSON dict object that looks like this:
data = {"test": "Fuzz", "options": "'{'size':'Regular','connection':'unconnected'}'"}
Obviously, I would prefer that the second value were a dictionary rather than a string representation of one. Is the best route here to just strip the first and last single quotes from data['options'], or is there a better alternative?
Sorry for any confusion. This is how the json object looks after I perform
json.dump(data, <filename>)
The value for options can be thought of as another variable say x and it's equivalent to '{'size':'Regular','connection':'unconnected'}'
I could do x[1:-1] but I'm not sure if that is the most pythonic way to do things here.

import ast
bad_string_dict = "'{'size':'Regular','connection':'unconnected'}'"
good_string_dict = bad_string_dict.strip("'")
good_dict = ast.literal_eval(good_string_dict)
print(good_dict)
You will have to strip the outer quotation marks; there is no other way around it.

Given OP's comments I suggest the following:
Set the environment variable to a known data format (example: json/yaml/...), not a specific language (python)
Use the json module (or the format you've chosen) to load the data
The data should look like this:
raw_data = {"test": "Fuzz", "options": "{\"size\": \"Regular\", \"connection\": \"unconnected\"}"}
And the code should look like this:
import json

raw_options = raw_data['options']
# Parse the JSON string stored under 'options' into a real dict
options = json.loads(raw_options)
data = {**raw_data, 'options': options}
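
On the producing side, the nested value can be serialized with json.dumps() instead of being written out by hand, which yields a string in exactly the shape shown above. This is a sketch under the assumption that you control the code that creates the string; the variable names are illustrative.

```python
import json

options = {"size": "Regular", "connection": "unconnected"}

# Store the nested dict as a JSON string, not as a Python repr with extra quotes.
raw_data = {"test": "Fuzz", "options": json.dumps(options)}

# The consumer can then restore it with json.loads(), with no stripping needed.
restored = {**raw_data, "options": json.loads(raw_data["options"])}
```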

python json dump, how to make a specific key come first?

I want to dump this json to a file:
json.dumps(data)
This is the data:
{
    "list": {
        "one": {"id": "12", "desc": "its 12", "name": "pop"},
        "two": {"id": "13", "desc": "its 13", "name": "kindle"}
    }
}
I want id to be the first property after I dump it to file, but it is not. How can I fix this?

My guess is that it's because you're using a dictionary (hash map), which traditionally does not guarantee any key order.
What you could do is:
from collections import OrderedDict
data = OrderedDict()
data['list'] = OrderedDict()
data['list']['one'] = OrderedDict()
data['list']['one']['id'] = '12'
data['list']['one']['desc'] = ...
data['list']['two'] = ...
This keeps the keys in insertion order.
It's "impossible" to predict the key order of a traditional dict/hash map, because its nature (and speed) makes the iteration order depend on usage, the items in the dictionary and a lot of other factors. (Since Python 3.7, plain dicts are guaranteed to keep insertion order.)
So you need to either sort the items yourself before sending them to json, or use the slower dictionary variant called OrderedDict (see above).
Many thanks go out to @MarcoNawijn for checking the json source and reporting that it does not honor the order of the dictionary, which would mean building the JSON string yourself.
If the parser on the other end of your JSON string honors the order (which I doubt), you could pass the OrderedDict to a function that builds a plain text representation of it, formatted per the JSON standard. This will however take more time than I have at the moment, since I'm not 100% certain of the RFC for JSON strings.

You shouldn't worry about the order in which JSON is saved. The order will be changed when dumping. It is also worth looking at these two questions: "JSON order mixed up" and "Is the order of elements in a JSON list maintained?"
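
For readers on Python 3.7 or later, where plain dicts keep insertion order, the behaviour is easy to check directly: json.dumps() writes keys in the order the dict yields them unless sort_keys=True is passed. The sketch below uses the data from the question; it says nothing about whatever parser reads the file afterwards, which may still ignore key order.

```python
import json

data = {
    "list": {
        "one": {"id": "12", "desc": "its 12", "name": "pop"},
        "two": {"id": "13", "desc": "its 13", "name": "kindle"},
    }
}

# Keys come out in insertion order, so "id" is written before "desc".
in_order = json.dumps(data)

# sort_keys=True switches to alphabetical order, putting "desc" first.
alphabetical = json.dumps(data, sort_keys=True)
```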

Mongodb: How to change an element of a nested array?

From what I have read, it is impossible to update an element in a nested array using the positional operator $ in MongoDB. The $ only works one level deep. I see it is a requested feature for MongoDB 2.7.
Updating the whole document one level up is not an option because of write conflicts. I need to be able to change just the 'username' for a particular reward program, for instance.
One idea would be to pull, modify, and push the entire 'reward_programs' element, but then I would lose the order. Order is important.
Consider this document:
{
    "_id": 0,
    "firstname": "Tom",
    "profiles": [
        {
            "profile_name": "tom",
            "reward_programs": [
                {
                    "program_name": "American",
                    "username": "tomdoe"
                },
                {
                    "program_name": "Delta",
                    "username": "tomdoe"
                }
            ]
        }
    ]
}
How would you go about specifically changing the 'username' of 'program_name'=Delta?

After doing more reading, it looks like this is unsupported in MongoDB at the moment. Positional updates are only supported one level deep. The feature might be added in MongoDB 2.7.
There are a couple of workarounds.
1) Flatten out your database structure. In this case, make 'reward_programs' its own collection and do your operation on that.
2) Instead of arrays of dicts, use dicts of dicts. That way you can have an absolute path down to the object you need to modify. This can have drawbacks for query flexibility.
3) It seems hacky to me, but you can also walk the nested array, find the element's position index in the array, and do something like this:
users.update({'_id': request._id, 'profiles.profile_name': profile_name}, {'$set': {'profiles.$.reward_programs.{}.username'.format(index): new_username}})
4) Read in the whole document, modify it, and write it back. However, this can cause write conflicts.
Setting up your database structure initially is extremely important. It really depends on how you are going to use it.
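
Workaround 3 can be sketched in plain Python as follows. The helper name build_username_update is hypothetical, and the sketch only builds the filter and update documents from a copy of the document that has already been read; it does not talk to the database.

```python
def build_username_update(doc, profile_name, program_name, new_username):
    """Return (filter, update) documents for workaround 3, or None if the
    profile or reward program is not found (hypothetical helper)."""
    for profile in doc["profiles"]:
        if profile["profile_name"] != profile_name:
            continue
        # Walk the nested array to find the index of the program to change.
        for index, program in enumerate(profile["reward_programs"]):
            if program["program_name"] == program_name:
                path = "profiles.$.reward_programs.{}.username".format(index)
                query = {"_id": doc["_id"], "profiles.profile_name": profile_name}
                return query, {"$set": {path: new_username}}
    return None


doc = {
    "_id": 0,
    "firstname": "Tom",
    "profiles": [
        {
            "profile_name": "tom",
            "reward_programs": [
                {"program_name": "American", "username": "tomdoe"},
                {"program_name": "Delta", "username": "tomdoe"},
            ],
        }
    ],
}

query, update = build_username_update(doc, "tom", "Delta", "tom_delta")
```

The two documents can then be passed to the collection's update call. Because the index is computed from a copy read earlier, it can be stale if another writer reorders the array in between, so this is best treated as a stopgap.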

A simple way to do this:
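doc = collection.find_one({'_id': 0})
doc['profiles'][0]['reward_programs'][1]['username'] = 'new user name'
# Replace the whole document (not the collection). collection.save() was
# removed in PyMongo 4; replace_one() is the current equivalent.
collection.replace_one({'_id': doc['_id']}, doc)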
