Filtering out JSON with jmespath - Python

I have this simple Python script that is supposed to be part of something bigger, but I just cannot figure out how to make it work with jmespath:
#!/usr/bin/env python
import jmespath

if __name__ == '__main__':
    # input JSON
    text = {
        'topology': [
            {
                'node': [
                    {
                        'topology-stats:session-state': {
                            'synchronized': True,
                            'local-pref': {
                                'session-id': 0,
                                'deadtimer': 120,
                                'ip-address': '10.30.170.187',
                                'keepalive': 30
                            },
                            'messages': {
                                'stateful-stats:sent-upd-msg-count': 1,
                                'last-sent-msg-timestamp': 1513334157,
                                'stateful-stats:last-received-rpt-msg-timestamp': 1513334157,
                                'unknown-msg-received': 0,
                                'stateful-stats:received-rpt-msg-count': 3,
                                'reply-time': {
                                    'max-time': 77,
                                    'average-time': 77,
                                    'min-time': 77
                                },
                                'stateful-stats:sent-init-msg-count': 0,
                                'sent-msg-count': 1,
                                'received-msg-count': 3
                            },
                            'session-duration': '0:00:00:12'
                        },
                        'node-id': '10.30.170.117'
                    }
                ],
                'topology-id': 'asdf-topology'
            }
        ]
    }
    exp = jmespath.compile('''topology[*].node[?'topology-stats:session-state' != 'session-duration'][]''')
    result = exp.search(text)
    print(result)
What I basically want is to remove the entries whose keys have unpredictable values (in a perfect world I would replace the value with something generic), such as last-sent-msg-timestamp, session-duration, and stateful-stats:last-received-rpt-msg-timestamp. Ideally I would keep everything else, although I can live with losing the topology and node tags.
The catch is that I can use only one jmespath.search call with a single jmespath expression, and I cannot use anything else from Python - this script is just an example.
Is this possible with jmespath? It is currently my best option due to the limitations of the project.

Removing fields with jmespath is currently not possible. There is a pending feature request:
Ability to set and delete based on a jmespath #121
https://github.com/jmespath/jmespath.py/issues/121
I am using jq to do that:
jq 'del(.foo)'
Input {"foo": 42, "bar": 9001, "baz": 42}
Output {"bar": 9001, "baz": 42}

Related

JSONPath issues with Python and jsonpath_ng (Parse error near token ?)

I'm trying to work with the jsonpath_ng Python library. It works for most of the JSONPath filters I usually use.
However, I'm struggling with a simple filter clause. It can be summarized in two lines:
from jsonpath_ng.ext import parse
jsonpath_expression = parse(f"$.jobs.*.jobSummary.[?(@.storagePolicy.storagePolicyName=='{SPname}')].sizeOfApplication")
My JSON payload is this one:
{
  "processinginstructioninfo": {
    "attributes": [
      {
        "name": "WebServer",
        "value": "IDM-COMMSERVE"
      }
    ]
  },
  "totalRecordsWithoutPaging": 161,
  "jobs": [
    {
      "jobSummary": {
        "sizeOfApplication": 65552265428,
        "vsaParentJobID": 28329591,
        "commcellId": 2,
        "backupSetName": "defaultBackupSet",
        "opType": 59,
        "totalFailedFolders": 0,
        "totalFailedFiles": 0,
        "alertColorLevel": 0,
        "jobAttributes": 288232025419153408,
        "jobAttributesEx": 67108864,
        "isVisible": true,
        "localizedStatus": "Completed",
        "isAged": false,
        "totalNumOfFiles": 0,
        "jobId": 28329592,
        "jobSubmitErrorCode": 0,
        "sizeOfMediaOnDisk": 34199,
        "currentPhase": 0,
        "status": "Completed",
        "lastUpdateTime": 1661877467,
        "percentSavings": 99.99995,
        "localizedOperationName": "Snap Backup",
        "statusColor": "black",
        "pendingReason": "",
        "errorType": 0,
        "backupLevel": 2,
        "jobElapsedTime": 59,
        "jobStartTime": 1661877408,
        "currentPhaseName": "",
        "jobType": "Snap Backup",
        "isPreemptable": 0,
        "backupLevelName": "Incremental",
        "attemptStartTime": 0,
        "pendingReasonErrorCode": "",
        "appTypeName": "Virtual Server",
        "percentComplete": 100,
        "averageThroughput": 27472.637,
        "localizedBackupLevelName": "Incremental",
        "currentThroughput": 0,
        "subclientName": "default",
        "destClientName": "desktop-1058kvf",
        "jobEndTime": 1661877467,
        "dataSource": {
          "dataSourceId": 0
        },
        "subclient": {
          "clientName": "desktop-1058kvf",
          "instanceName": "VMInstance",
          "backupsetId": 161,
          "commCellName": "idm-commserve",
          "instanceId": 2,
          "subclientId": 235,
          "clientId": 71,
          "appName": "Virtual Server",
          "backupsetName": "defaultBackupSet",
          "applicationId": 106,
          "subclientName": "default"
        },
        "storagePolicy": {
          "storagePolicyName": "IDM-Metallic-Replica_ReplicationPlan",
          "storagePolicyId": 78
        },
        "destinationClient": {
          "clientId": 71,
          "clientName": "desktop-1058kvf",
          "displayName": "idm-laptop1"
        },
        "userName": {
          "userName": "admin",
          "userId": 1
        },
        "clientGroups": [
          {
            "clientGroupId": 4,
            "clientGroupName": "Laptop Clients"
          },
          {
            "clientGroupId": 46,
            "clientGroupName": "Clients For Commserv LiveSync"
          },
          {
            "clientGroupId": 47,
            "clientGroupName": "idm-vcsa"
          },
          {
            "clientGroupId": 55,
            "clientGroupName": "Laptop plan test clients"
          }
        ]
      }
    }
  ]
}
I need to get just the "sizeOfApplication" parameter for every object with a particular "storagePolicyName". That's it. Say, in this case, that the "storagePolicyName" I'm looking for values for is "IDM-Metallic-Replica_ReplicationPlan".
I usually go to my favourite JSONPath site to test the JSONPaths I use, and this one works:
"$.jobs.*.jobSummary.[?(@.storagePolicy.storagePolicyName=='IDM-Metallic-Replica_ReplicationPlan')].sizeOfApplication"
But on the Python side, I keep getting "jsonpath_ng.exceptions.JsonPathParserError: Parse error at 1:21 near token ? (?)" errors.
What am I doing wrong?
Thank you!
Mattia
I think the problem here is that jsonpath_ng is stricter about the JSONPath proposal than the other parsers you have tried.
The first problem is that there shouldn't be a . immediately before a filter condition [?(...)]. So the first step is to remove the . after jobSummary in jobSummary.[?(@.storagePolicy....
I made that change to your JSONPath expression, and used jsonpath_ng to run it on your sample data. The parser error had gone, but it returned no matches. So it's still not right.
From reading the JSONPath proposal, it's not clear if you can use a filter operator such as [?(...)] on an object, or only on an array. When used on an array it would return all elements of the array that match the filter. If a JSONPath parser does support a filter on an object, then it seems it returns the object if the filter matches and an empty list of matches otherwise.
I would guess that jsonpath_ng only permits filters on arrays. So let's modify your JSONPath expression to only use filters on arrays. Your JSON has an array in $.jobs, and within each element of this array you want to look within the jobSummary object for a storagePolicy with storagePolicyName=={SPname}. So the following JSONPath expression should return the matching job:
$.jobs[?(@.jobSummary.storagePolicy.storagePolicyName=='{SPname}')]
What we then want to do is to get the value of the sizeOfApplication property within the jobSummary object within each matching job. Note that the matches returned by the above JSONPath expression are elements of the jobs array, not jobSummary objects. We can't just select sizeOfApplication because we're one level further up than we were before. We need to go back into the jobSummary object to get the sizeOfApplication:
$.jobs[?(@.jobSummary.storagePolicy.storagePolicyName=='{SPname}')].jobSummary.sizeOfApplication
I used jsonpath_ng to run this JSONPath expression on your sample data and it gave me the output [65552265428], which seems to be the expected output.
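For completeness, a minimal sketch of how this can be run end to end with jsonpath_ng, assuming payload holds the JSON document shown above (e.g. loaded with json.load):
from jsonpath_ng.ext import parse

SPname = "IDM-Metallic-Replica_ReplicationPlan"
# Filter on the jobs array, then step back into jobSummary for the size.
expr = parse(
    f"$.jobs[?(@.jobSummary.storagePolicy.storagePolicyName=='{SPname}')]"
    ".jobSummary.sizeOfApplication"
)
sizes = [match.value for match in expr.find(payload)]
print(sizes)  # expected: [65552265428]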

Django: Transform dict of queryset

I use the following code to get my data out of the dict.
test = self.events_max_total_gross()
events = organizer.events.all()
for event in events:
    test.get(event.pk, {}).values()
    [...]
I use this queryset to get the data. My question is: does the transformation at the end make sense, or is there a better way to access the dict (without transforming it first)? As I have several of these, my approach doesn't seem to follow the DRY principle.
def events_max_total_gross(self):
    events_max_total_gross = (
        Event.objects.filter(
            organizer__in=self.organizers,
            status=EventStatus.LIVE
        )
        .annotate(total_gross=Sum(F('tickets__quantity') * F('tickets__price_gross')))
        .values('pk', 'total_gross')
    )
    """
    Convert this
    [
        {'pk': 2, 'total_gross': 12345},
        {'pk': 3, 'total_gross': 54321},
        ...
    ]
    to this:
    {
        2: {'total_gross': 12345},
        3: {'total_gross': 54321},
        ...
    }
    """
    events_max_total_gross_transformed = {}
    for item in events_max_total_gross:
        events_max_total_gross_transformed.setdefault(
            item['pk'], {}
        ).update({'total_gross': item['total_gross']})
    return events_max_total_gross_transformed
Use:
transformed = {
    v['pk']: {'total_gross': v['total_gross']} for v in events_max_total_gross
}
This is called a Python dict comprehension. Google that term if you want tutorials or examples.
If I understood it correctly, you need total_gross for every organizer's event; you can query like this instead of looping over events and events_max_total_gross.
First get all the events of that particular organizer:
event_ids = Event.objects.filter(organizer=organizer).values_list('id',flat=True)
Then run this:
(
    Event.objects.filter(
        id__in=event_ids,
        organizer__in=self.organizers,
        status=EventStatus.LIVE
    )
    .annotate(total_gross=Sum(F('tickets__quantity') * F('tickets__price_gross')))
    .values('pk', 'total_gross')
)
This way you avoid transforming the dict and looping over it again.
In case you need to do it like this because of some other requirement, you can use a Python dict comprehension:
events_max_total_gross_transformed = {
    event['pk']: {'total_gross': event['total_gross']} for event in events_max_total_gross
}
But as you have several of these, you might also want to have a look at proxy models; there you can write manager functions to help you with your queries, as in the sketch below.
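A minimal sketch of that manager idea, reusing the Event/EventStatus models from the question (the queryset class and method name here are hypothetical):
from django.db import models
from django.db.models import F, Sum

class EventQuerySet(models.QuerySet):
    # Hypothetical helper bundling the annotation and the dict transformation.
    def live_totals_by_pk(self, organizers):
        rows = (
            self.filter(organizer__in=organizers, status=EventStatus.LIVE)
            .annotate(total_gross=Sum(F('tickets__quantity') * F('tickets__price_gross')))
            .values('pk', 'total_gross')
        )
        return {row['pk']: {'total_gross': row['total_gross']} for row in rows}

class Event(models.Model):
    # ... existing fields ...
    objects = EventQuerySet.as_manager()
Calling Event.objects.live_totals_by_pk(self.organizers) would then replace each hand-rolled transformation loop.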

Python export some list and put all in for loop

I am writing a program that exports monitoring data.
I have Python code which sends API requests and gets a response as JSON, in the form of a list of dictionaries.
Responses look like this:
[
  {
    "diskwrite": 667719532544,
    "name": "test-hostname",
    "maxmem": 536870912,
    "diskread": 876677015576,
    "mem": 496111616,
    "id": "qemu/102",
    "node": "node1",
    "template": 0,
    "cpu": 0.00947819269772254,
    "vmid": 102,
    "type": "qemu",
    "maxcpu": 2,
    "netout": 15081562546,
    "maxdisk": 10737418240,
    "status": "running",
    "netin": 15852619497,
    "disk": 0,
    "uptime": 3273086
  },
  {
    "maxcpu": 8,
    "type": "qemu",
    "vmid": 106,
    "cpu": 0.500598230113925,
    "template": 0,
    "node": "node1",
    "id": "qemu/106",
    "mem": 10341007360,
    "maxmem": 10737418240,
    "diskread": 8586078094720,
    "name": "some.other-hostname",
    "diskwrite": 6052378838016,
    "uptime": 1900411,
    "disk": 0,
    "netin": 4899018841106,
    "status": "stopping",
    "maxdisk": 107374182400,
    "netout": 4788420573355
  },
  ...
I'd like to loop through all the hostnames and their items as-is ("diskwrite", "mem", "cpu", etc.), but I'd like to add these items to a dictionary only if they have a status of running ("status": "running").
ram_metric.set({'type': "total"}, ram[0])
cpu_metric.set({'type': 'average', 'interval': '5 minutes'}, cpu[0])
I also need a loop that creates the following gauges for every "name" item, with host=name:
ram_metric = Gauge("memory_usage_bytes", "Memory usage in bytes.",
                   {'host': host})
cpu_metric = Gauge("cpu_usage_percent", "CPU usage percent.",
                   {'host': host})
Please, I don't know how to do that.
Maybe I am not understanding your question correctly, since I am having a little trouble working out exactly what you want.
If you want to get a list of all hostnames of running VMs, you can use a list comprehension. Something like:
running_hosts = [running_host['name'] for running_host in my_list_of_dicts if running_host['status'] == "running"]
I can try to help.
The response you are getting is a list of dictionaries, and you are only interested in items whose status is running, so first:
def get_all_vm():
    all_vm = get_request("/api2/json/cluster/resources?type=vm")
    vm = all_vm['data']
    running = [item for item in vm if item['status'] == "running"]
    return running

metric = get_all_vm()
Now you can loop through all the items and (1) add new values with item["newkey"] = newvalue or (2) loop through existing keys with dict iteration:
for item in metric:
    item["justanexample"] = "test"
    for key, value in item.items():
        if key == "mem":
            ram_metric.set({'name': item['name'], 'type': 'usage'}, value)
        elif key == "cpu":
            cpu_metric.set({'name': item['name'], 'type': 'load'}, value)
        ...
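To also create the per-host gauges the question asks for, a sketch along these lines could work. It reuses the Gauge constructor and set call style from the question's own snippets (the actual metrics library is whatever those snippets come from, so treat the API as an assumption):
for vm in get_all_vm():
    host = vm['name']
    # One pair of gauges per running host, following the question's style.
    ram_metric = Gauge("memory_usage_bytes", "Memory usage in bytes.", {'host': host})
    cpu_metric = Gauge("cpu_usage_percent", "CPU usage percent.", {'host': host})
    ram_metric.set({'type': 'usage'}, vm['mem'])
    cpu_metric.set({'type': 'load'}, vm['cpu'])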

merging json files by keys in python (or some other easy scripting language)

I would like to merge multiple JSON files into a single object or file. An example JSON object could be:
{
  "food": {
    "ingredents": {
      "one": "this",
      "two": "that"
    }
  }
}
and
"food": {
"amount": [
"tablespoons": "3",
]
}
}
I would like to be able to merge these so that we get
"food": {
"ingredents": [
"one": "this",
"two": "that",
],
"amount": [
"tablespoons": "3",
]
}
}
so that they are all combined via the parent keys instead of just as a list where "food" would repeat itself. Also, I would like the output file to replace anything that is repeated; for example, if the ingredient "one": "this" existed somewhere else, it would only appear once.
Any help would be greatly appreciated. Specifically, something that iterates through a list of JSON files and applies the method would be ideal.
I have tried using glob and iterating through the JSON files, such as:
ar = []
for f in glob.glob("*.json"):
    with open(f, "rb") as filename:
        ar.append(json.load(filename))

with open("outfile.json", "w") as outfile:
    json.dump(ar, outfile)
yet this just gives me a list of JSON objects without connecting them by key.
I could probably write a solution that collects the key data and uses conditionals to decide where to place an object under a certain key, but that would require a lot more work, especially since I am dealing with a large number of files. If there is a simpler solution, that would be amazing.
Not sure which libraries you've tried that didn't work for your needs, but I would recommend lodash. It's an incredibly fast, tiny, robust JavaScript library for these kinds of operations. For this specific case you could easily accomplish it with lodash merge: https://lodash.com/docs#merge
example:
var users = {
  'data': [{ 'user': 'barney' }, { 'user': 'fred' }]
};
var ages = {
  'data': [{ 'age': 36 }, { 'age': 40 }]
};

_.merge(users, ages);
// → { 'data': [{ 'user': 'barney', 'age': 36 }, { 'user': 'fred', 'age': 40 }] }
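Since the question is about Python, here is a minimal sketch of the same idea in plain Python, assuming every file contains a single JSON object; dicts are merged recursively and later files overwrite repeated leaf keys:
import glob
import json

def deep_merge(dst, src):
    # Recursively merge src into dst; non-dict values are overwritten.
    for key, value in src.items():
        if isinstance(dst.get(key), dict) and isinstance(value, dict):
            deep_merge(dst[key], value)
        else:
            dst[key] = value
    return dst

merged = {}
for path in glob.glob("*.json"):
    with open(path) as fh:
        deep_merge(merged, json.load(fh))

with open("outfile.json", "w") as outfile:
    json.dump(merged, outfile, indent=2)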

build dynamical JSON/LIST obj in Python

I'm a real Python newbie and I'm having problems creating a JSON/list object.
What I want to end up with is the following JSON to send to an API:
{
  "request": {
    "slice": [
      {
        "origin": "AMS",
        "destination": "SYD",
        "date": "2015-06-23"
      }
    ],
    "passengers": {
      "adultCount": 1,
      "infantInLapCount": 0,
      "infantInSeatCount": 0,
      "childCount": 0,
      "seniorCount": 0
    },
    "solutions": 20,
    "refundable": false
  }
}
I figured I would build the structure as Python dicts and lists and then convert it to JSON with the dumps() function. This works. The thing is, I need to change the date field with an iterator to add a day, but I'm stuck on changing this field.
Any advice?
thx!
As your question is a bit vague, I can only guess that you're trying to modify the JSON version of your data directly, while you should modify the Python object before converting it into JSON... something like this:
d = {
    "request": {
        "slice": [
            {
                "origin": "AMS",
                "destination": "SYD",
                "date": "2015-06-23"
            }
        ],
        "passengers": {
            "adultCount": 1,
            "infantInLapCount": 0,
            "infantInSeatCount": 0,
            "childCount": 0,
            "seniorCount": 0
        },
        "solutions": 20,
        "refundable": False  # note how this is Python False, not js false!
    }
}

# then you can do:
d["request"]["slice"][0]["date"] = "2015-05-23"

# and finally convert to json:
j = json.dumps(d)
If it happens that you get the JSON as a string, you should first convert it into a Python object so you can work on it:
# if j is your json string, convert it into a python object
d = json.loads(j)
# then do your modifications as above:
d["request"]["slice"][0]["date"] = "2015-05-23"
# and finally convert back to json:
j = json.dumps(d)
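And for the day-by-day iteration the question mentions, a minimal sketch using datetime, reusing the d dict from above (the seven-day range is just an example):
import json
from datetime import date, timedelta

start = date(2015, 6, 23)
for offset in range(7):  # seven consecutive days, as an example
    # bump the slice date one day at a time, then dump a fresh request body
    d["request"]["slice"][0]["date"] = (start + timedelta(days=offset)).isoformat()
    payload = json.dumps(d)
    # send `payload` to the API here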
