Correctly parsing data with jq - python

I have the following data:
[
{
"M": [
{
"id": 1,
"nk": "MATH$$SPRING$$INST1$$2",
"section": {
"nk": "MATH$$SPRING$$INST1",
"course": 1,
"id": 1
},
"location": {
"id": 1,
"nk": "mcu$$101",
"campus": {
"id": 1,
"nk": "mcu",
"name": "Main Campus"
},
"address": "1 st",
"building": "1",
"room": "101"
},
"day_of_week": 2,
"start_time": "09:00:00",
"end_time": "10:00:00"
},
{
"id": 3,
"nk": "ENG$$SPRING$$INST2$$2",
"section": {
"nk": "ENG$$SPRING$$INST2",
"course": 2,
"id": 4
},
"location": {
"id": 2,
"nk": "mcu$$201",
"campus": {
"id": 1,
"nk": "mcu",
"name": "Main Campus"
},
"address": "1 st",
"building": "1",
"room": "201"
},
"day_of_week": 2,
"start_time": "09:00:00",
"end_time": "10:00:00"
},
{
"id": 4,
"nk": "ENG$$SPRING$$INST2$$22",
"section": {
"nk": "ENG$$SPRING$$INST2",
"course": 2,
"id": 4
},
"location": {
"id": 2,
"nk": "mcu$$201",
"campus": {
"id": 1,
"nk": "mcu",
"name": "Main Campus"
},
"address": "1 st",
"building": "1",
"room": "201"
},
"day_of_week": 2,
"start_time": "10:00:00",
"end_time": "11:00:00"
}
]
},
{
"W": [
{
"id": 2,
"nk": "MATH$$SPRING$$INST1$$4",
"section": {
"nk": "MATH$$SPRING$$INST2",
"course": 1,
"id": 2
},
"location": {
"id": 2,
"nk": "mcu$$201",
"campus": {
"id": 1,
"nk": "mcu",
"name": "Main Campus"
},
"address": "1 st",
"building": "1",
"room": "201"
},
"day_of_week": 4,
"start_time": "08:00:00",
"end_time": "10:00:00"
}
]
}
]
I'm trying to extract "W"'s list.
When I do jq('[.[].W][]').transform(data) I get None, but when I do jq('[.[].M][]').transform(data) I get the desired result. Why am I experiencing this?

I'm trying to extract "W"'s list.
OK, so let's first deal with jq, and then with the python interface.
jq
.[] yields all the items in the top-level array, and therefore
.[] | .W will yield two items:
null (because the first item does not have .W), and
the desired list
To extract just "W"'s list, you could use any of the following filters,
depending on your precise requirements:
.[] | select(has("W")) | .W
.[] | .W | select(.)
.[] | .W // empty
.[1].W
from jq import jq
As the documentation at https://pypi.org/project/pyjq/ says:
If multiple_output is False (the default), then the first output is used
For example:
print(jq('1,2').transform(data))
yields just 1.
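To see why the W filter gives None while the M filter works, here is a plain-Python analogue of the filters above. It does not call the jq library at all; the shortened data and the helper name outputs are assumptions made for illustration only.

```python
# Shortened stand-in for the data in the question (only the ids are kept).
data = [
    {"M": [{"id": 1}, {"id": 3}, {"id": 4}]},
    {"W": [{"id": 2}]},
]

def outputs(key):
    """Analogue of `.[] | .KEY`: one output per top-level item,
    with None standing in for jq's null when the key is missing."""
    return [item.get(key) for item in data]

w_outputs = outputs("W")  # None first, then the "W" list
m_outputs = outputs("M")  # the "M" list first, then None

# Keeping only the first output mimics multiple_output=False.
first_w = w_outputs[0]
first_m = m_outputs[0]

# Analogue of `.[] | select(has("W")) | .W`: skip items without "W".
w_only = [item["W"] for item in data if "W" in item]

print(first_w)    # None
print(first_m)    # [{'id': 1}, {'id': 3}, {'id': 4}]
print(w_only[0])  # [{'id': 2}]
```

The M list happens to come first in the stream, which is why the same filter appears to work for M but not for W.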
In summary
Depending on the precise requirements, you can use any of the filters given above, for example:
jq('.[] | .W // empty').transform(data)
Moral
If there's a moral to this tale, it might be that, when in doubt, you should consider using jq (the command-line executable) or jqplay to make sure your jq filter is doing what you want.

Related

How to group child objects in python list

I have a parent-child structure like the following, which is generated from database entries by iterating over the rows and appending children:
"path_data": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": 1,
"key": "0",
"name": "Home"
},
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": 2,
"key": "0",
"name": "About Us"
}
],
"count": 1,
"id": 1,
"key": "0",
"name": "Home"
},
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PRODUCT_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [],
"count": 1,
"id": 1,
"key": "0",
"name": "Home"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": 2,
"key": "0",
"name": "About Us"
}
],
"count": 1,
"id": 1,
"key": "0",
"name": "Home"
},
{
"action": "PAGE_VIEW",
"children": [
[]
],
"count": 1,
"id": 1,
"key": "0",
"name": "Home"
}
],
For the expected output, I need to convert this by grouping children at the same level and increasing their count, like the following:
"path_data": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
},{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [
{
"action": "PAGE_VIEW",
"children": [],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 1,
"id": "",
"key": "0",
"name": "Untagged"
}, {
"action": "PAGE_VIEW",
"children": [],
"count": 1,
"id": 1,
"key": "0",
"name": "Home"
}
],
"count": 2,
"id": "",
"key": "0",
"name": "Untagged"
}
],
"count": 2,
"id": 2,
"key": "0",
"name": "About Us"
}
],
"count": 4,
"id": 1,
"key": "0",
"name": "Home"
},
],
Is there any inbuilt functions or library to do this?
This code is written in python 3.8
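I am not aware of a built-in function that does exactly this, but a short recursive helper may be enough. The sketch below is only one possible reading of the requirement: it assumes siblings count as "the same" when their action, id and name match, sums their counts, and skips stray non-dict entries such as the empty list in the last path. The function name merge_siblings and the shortened sample data are hypothetical.

```python
def merge_siblings(nodes):
    """Merge sibling nodes sharing (action, id, name); sum counts, then recurse."""
    merged = {}
    for node in nodes:
        if not isinstance(node, dict):  # skip stray entries such as []
            continue
        key = (node["action"], node["id"], node["name"])
        if key not in merged:
            # Copy the node, but start with an empty count and child list.
            merged[key] = {**node, "count": 0, "children": []}
        merged[key]["count"] += node["count"]
        merged[key]["children"].extend(node["children"])
    for entry in merged.values():
        entry["children"] = merge_siblings(entry["children"])
    return list(merged.values())

# Shortened sample: three paths that all start at "Home".
paths = [
    {"action": "PAGE_VIEW", "id": 1, "key": "0", "name": "Home", "count": 1,
     "children": [{"action": "PAGE_VIEW", "id": "", "key": "0",
                   "name": "Untagged", "count": 1, "children": []}]},
    {"action": "PAGE_VIEW", "id": 1, "key": "0", "name": "Home", "count": 1,
     "children": [{"action": "PAGE_VIEW", "id": 2, "key": "0",
                   "name": "About Us", "count": 1, "children": []}]},
    {"action": "PAGE_VIEW", "id": 1, "key": "0", "name": "Home", "count": 1,
     "children": [[]]},
]

grouped = merge_siblings(paths)
print(grouped[0]["name"], grouped[0]["count"])   # Home 3
print([c["name"] for c in grouped[0]["children"]])  # ['Untagged', 'About Us']
```

Dictionaries keep insertion order in Python 3.7+, so merged nodes appear in the order they were first seen.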

How to eliminate duplicate items while adding them to their own structure

I have a list of dictionary items, with each dictionary containing a list of presentation items. The sample dictionaries below are a small prototype of my real data set.
I need to remove duplicate presentations based on day (one presentation per day) and store them in a new dictionary with the same structure within the existing list.
So starting with:
[
{
"time": "04:00-20:59",
"category": 1,
"presentations": [
{
"presentation": "ABC",
"day": 7,
},
{
"presentation": "DEF",
"day": 7,
},
{
"presentation": "GHI",
"day": 8,
},
{
"presentation": "JKL",
"day": 8,
},
{
"presentation": "MNO",
"day": 9,
},
{
"presentation": "PQR",
"day": 9,
},
{
"presentation": "STU",
"day": 9,
}
]
} #only one dictionary item in the list for simplicity
]
The end result should be three dictionaries containing lists of presentations where there is one presentation for a given day:
[
{
"time": "04:00-20:59",
"category": 1,
"presentations": [
{
"presentation": "ABC",
"day": 7
},
{
"presentation": "DEF",
"day": 8
},
{
"presentation": "GHI",
"day": 9
}
]
},
{
"time": "04:00-20:59",
"category": 1,
"presentations": [
{
"presentation": "JKL",
"day": 7
},
{
"presentation": "MNO",
"day": 8
},
{
"presentation": "PQR",
"day": 9
}
]
},
{
"time": "04:00-20:59",
"category": 1,
"presentations": [
{
"presentation": "STU",
"day": 9
}
]
}
]
I don't know how to go about removing these duplicates (based on day) while adding them to their own dictionary.
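One possible approach, sketched under assumptions: group the presentations by day, then build one new dictionary per "round", where round n takes the n-th presentation of every day that still has one. Note that this keeps each presentation's original day, so the grouping differs slightly from the sample output above. The function name split_by_day is hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest

def split_by_day(entry):
    """Split one entry into several, each holding at most one presentation per day."""
    by_day = defaultdict(list)
    for presentation in entry["presentations"]:
        by_day[presentation["day"]].append(presentation)

    result = []
    # zip_longest pads shorter days with None, which is filtered out below.
    for round_items in zip_longest(*by_day.values()):
        new_entry = {k: v for k, v in entry.items() if k != "presentations"}
        new_entry["presentations"] = [p for p in round_items if p is not None]
        result.append(new_entry)
    return result

entry = {
    "time": "04:00-20:59",
    "category": 1,
    "presentations": [
        {"presentation": "ABC", "day": 7},
        {"presentation": "DEF", "day": 7},
        {"presentation": "GHI", "day": 8},
        {"presentation": "JKL", "day": 8},
        {"presentation": "MNO", "day": 9},
        {"presentation": "PQR", "day": 9},
        {"presentation": "STU", "day": 9},
    ],
}

for part in split_by_day(entry):
    print([p["presentation"] for p in part["presentations"]])
```

For a list of such entries, the helper can be applied to each one and the results concatenated.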

Get different values from repeating item JSON

I have this json derived dict:
{
"stats": [
{
"name": "Jengas",
"time": 166,
"uid": "177098244407558145",
"id": 1
},
{
"name": "- k",
"time": 20,
"uid": "199295228664872961",
"id": 2
},
{
"name": "MAD MARX",
"time": "0",
"uid": "336539711785009153",
"id": 3
},
{
"name": "loli",
"time": 20,
"uid": "366299640976375818",
"id": 4
},
{
"name": "Woona",
"time": 20,
"uid": "246996981178695686",
"id": 5
}
]
}
I want to get the "time" from everybody in the list and use it with sort.
So the result I get has this:
TOP 10:
Jengas: 166
Loli: 20
My first try is to list different values from repeating item.
Right now the code is:
import json

with open('db.json') as json_data:
    topvjson = json.load(json_data)
print(topvjson)
d = topvjson['stats'][0]['time']
print(d)
Extract the stats list, apply sort to it with the appropriate key:
from json import loads
data = loads("""{
"stats": [{
"name": "Jengas",
"time": 166,
"uid": "177098244407558145",
"id": 1
}, {
"name": "- k",
"time": 20,
"uid": "199295228664872961",
"id": 2
}, {
"name": "MAD MARX",
"time": "0",
"uid": "336539711785009153",
"id": 3
}, {
"name": "loli",
"time": 20,
"uid": "366299640976375818",
"id": 4
}, {
"name": "Woona",
"time": 20,
"uid": "246996981178695686",
"id": 5
}]
}""")
stats = data['stats']
stats.sort(key = lambda entry: int(entry['time']), reverse=True)
print("TOP 10:")
for entry in stats[:10]:
    print("%s: %d" % (entry['name'], int(entry['time'])))
This prints:
TOP 10:
Jengas: 166
- k: 20
loli: 20
Woona: 20
MAD MARX: 0
Note that your time values are not consistently one type: the dataset contains both integers (such as 20) and the string "0". That's why you need the conversion int(...).
You can sort the list of dict values like:
Code:
top_three = [(x[1], -x[0]) for x in sorted(
(-int(user['time']), user['name']) for user in stats['stats'])][:3]
This works by building a (negated time, name) tuple for each user. The tuples can then be sorted, and the name (x[1]) and the time (-x[0]) can be extracted after the sort.
Test Code:
stats = {
"stats": [{
"name": "Jengas",
"time": 166,
"uid": "177098244407558145",
"id": 1
}, {
"name": "- k",
"time": 20,
"uid": "199295228664872961",
"id": 2
}, {
"name": "MAD MARX",
"time": "0",
"uid": "336539711785009153",
"id": 3
}, {
"name": "loli",
"time": 20,
"uid": "366299640976375818",
"id": 4
}, {
"name": "Woona",
"time": 20,
"uid": "246996981178695686",
"id": 5
}]
}
top_three = [(x[1], -x[0]) for x in sorted(
    (-int(user['time']), user['name']) for user in stats['stats'])][:3]
print(top_three)
Results:
[('Jengas', 166), ('- k', 20), ('Woona', 20)]
Here's a way to do it using the built-in sorted() function:
data = {
"stats": [
{
"name": "Jengas",
"time": 166,
"uid": "177098244407558145",
"id": 1
},
{
etc ...
}
]
}
print('TOP 3')
sorted_by_time = sorted(data['stats'], key=lambda d: int(d['time']), reverse=True)
for i, d in enumerate(sorted_by_time, 1):
    if i > 3: break
    print('{name}: {time}'.format(**d))
Output:
TOP 3
Jengas: 166
- k: 20
loli: 20

Combine 2 JSON files into 1 file in Node or Python (i.e. longitude and latitude)

I want to combine the longitude and latitude values, which are stored in two separate JSON files.
The result should be stored in a third file.
How can I do that in Python or JavaScript/Node?
Many thanks for your support.
LATITUDE
{
"tags": [{
"name": "LATITUDE_deg",
"results": [{
"groups": [{
"name": "type",
"type": "number"
}],
"values": [
[1123306773000, 46.9976859318, 3],
[1123306774000, 46.9976859319, 3]
],
"attributes": {
"customer": ["Acme"],
"host": ["server1"]
}
}],
"stats": {
"rawCount": 2
}
}]
}
LONGITUDE
{
"tags": [{
"name": "LONGITUDE_deg",
"results": [{
"groups": [{
"name": "type",
"type": "number"
}],
"values": [
[1123306773000, 36.9976859318, 3],
[1123306774000, 36.9976859317, 3]
],
"attributes": {
"customer": ["Acme"],
"host": ["server1"]
}
}],
"stats": {
"rawCount": 2
}
}]
}
Expected result: LATITUDE_AND_LONGITUDE
{
"tags": [{
"name": "LATITUDE_AND_LONGITUDE_deg",
"results": [{
"groups": [{
"name": "type",
"type": "number"
}],
"values": [
[1123306773000, 46.9976859318, 36.9976859318, 3],
[1123306774000, 46.9976859319, 36.9976859317, 3]
],
"attributes": {
"customer": ["Acme"],
"host": ["server1"]
}
}],
"stats": {
"rawCount": 2
}
}]
}
I have written a solution with a colleague; the source code is on GitHub: https://gist.github.com/Abdelkrim/715eb222cc318219196c8be293c233bf
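For readers who only want the idea, here is a minimal Python sketch of one way to do the merge. It is not the code from the gist. It assumes each file has a single tag with a single result, that every row has exactly three elements, and that rows are matched on their first element (the timestamp). The function name combine and the local name flag for the third element are hypothetical.

```python
import copy

def combine(lat_doc, lon_doc, name="LATITUDE_AND_LONGITUDE_deg"):
    """Return a copy of lat_doc whose rows also carry the matching longitude."""
    combined = copy.deepcopy(lat_doc)  # leave the inputs untouched
    tag = combined["tags"][0]
    tag["name"] = name

    # Index longitude values by timestamp for direct lookup.
    lon_rows = lon_doc["tags"][0]["results"][0]["values"]
    lon_by_ts = {row[0]: row[1] for row in lon_rows}

    result = tag["results"][0]
    result["values"] = [
        [ts, lat, lon_by_ts[ts], flag]
        for ts, lat, flag in result["values"]
        if ts in lon_by_ts  # drop rows without a matching longitude
    ]
    tag["stats"]["rawCount"] = len(result["values"])
    return combined

def make_doc(name, values):
    """Build a small document in the same shape as the files above."""
    return {"tags": [{"name": name,
                      "results": [{"values": values}],
                      "stats": {"rawCount": len(values)}}]}

latitude = make_doc("LATITUDE_deg", [[1123306773000, 46.9976859318, 3],
                                     [1123306774000, 46.9976859319, 3]])
longitude = make_doc("LONGITUDE_deg", [[1123306773000, 36.9976859318, 3],
                                       [1123306774000, 36.9976859317, 3]])

merged = combine(latitude, longitude)
print(merged["tags"][0]["results"][0]["values"])
```

With real files, the two inputs would be read with json.load and the result written to the third file with json.dump.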

converting list of dictionary to dictionary tree based on parent id

I want to restructure a list of dictionaries so that every element that has a parent id becomes a child of its parent element.
Let's say we have a python list, which contains multiple dictionaries.
[{
"id": 1,
"title": "node1",
"parent": null
},
{
"id": 2,
"title": "node2",
"parent": 1
},
{
"id": 3,
"title": "node3",
"parent": 1
},
{
"id": 4,
"title": "node4",
"parent": 2
},
{
"id": 5,
"title": "node5",
"parent": 2
}]
I want to convert this list to a tree based on the parent key, like this:
[{
'id':1,
'title':'node1',
'childs':[
{
'id':2,
'title':'node2',
'childs':[
{
'id':4,
'title':'node4',
'childs': []
},
{
'id':5,
'title':'node5',
'childs': []
}
]
},
{
'id':3,
'title':'node3',
'childs':[]
}
]
}]
data = [{
"id": 1,
"title": "node1",
"parent": "null"
},
{ "id": 2,
"title": "node2",
"parent": "null"
},
{
"id": 2,
"title": "node2",
"parent": 1
},
{
"id": 3,
"title": "node3",
"parent": 1
},
{
"id": 4,
"title": "node4",
"parent": 2
},
{
"id": 5,
"title": "node5",
"parent": 2
}]
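parent_data = []
# First pass: collect the root nodes (those whose parent is "null").
for keys in data:
    if keys['parent'] == "null":
        keys['childs'] = []
        parent_data.append(keys)
# Second pass: attach direct children to the roots (one level only).
for keys in data:
    for key in parent_data:
        if key['id'] == keys['parent']:
            key['childs'].append(keys)
print(parent_data)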
k = [{
"id": 1,
"title": "node1",
"parent": "null"
},
{
"id": 2,
"title": "node2",
"parent": 1
},
{
"id": 3,
"title": "node3",
"parent": 1
},
{
"id": 4,
"title": "node4",
"parent": 2
},
{
"id": 5,
"title": "node5",
"parent": 2
}]
result, t = [], {}  # t maps each id to its node; parents must come before children
for i in k:
    i['childs'] = []
    if i['parent'] == 'null':
        del i['parent']
        result.append(i)
        t[i['id']] = i  # index the root by its own id rather than a hard-coded 1
    else:
        t[i['parent']]['childs'].append(i)
        t[i['id']] = t[i['parent']]['childs'][-1]
        del t[i['parent']]['childs'][-1]['parent']
print(result)
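Both snippets above depend on details of the input (the first handles a single level, the second expects parents to appear before their children). A more general alternative, sketched here under assumptions, is to index every node by id first and link children in a second pass. The function name build_tree is hypothetical, and a parent of either None or the string "null" is treated as marking a root.

```python
def build_tree(rows):
    """Build a list of root nodes from flat rows that carry a 'parent' id."""
    # First pass: create every node up front so the order of rows does not matter.
    nodes = {
        row["id"]: {"id": row["id"], "title": row["title"], "childs": []}
        for row in rows
    }
    roots = []
    # Second pass: attach each node to its parent, or record it as a root.
    for row in rows:
        node = nodes[row["id"]]
        parent = row["parent"]
        if parent in (None, "null"):
            roots.append(node)
        else:
            nodes[parent]["childs"].append(node)
    return roots

rows = [
    {"id": 4, "title": "node4", "parent": 2},  # child listed before its parent
    {"id": 1, "title": "node1", "parent": None},
    {"id": 2, "title": "node2", "parent": 1},
    {"id": 3, "title": "node3", "parent": 1},
    {"id": 5, "title": "node5", "parent": 2},
]

tree = build_tree(rows)
print(tree)
```

Because the input rows are not modified, the same list can be reused afterwards.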
