Sorting multidimensional JSON objects in Python

I have an object like this and want to sort it by type ("line" first, "point" second) at every level of nesting (simplified JSON):
[
    {
        "type": "point"
    },
    {
        "type": "line",
        "children": [
            { "type": "point" },
            { "type": "point" },
            { "type": "line" }
        ]
    },
    {
        "type": "point"
    }
]
This structure could be nested deeper and have many more points/lines within each other.
The sorted output would be something like this:
[
    {
        "type": "line",
        "children": [
            { "type": "line" },
            { "type": "point" },
            { "type": "point" }
        ]
    },
    {
        "type": "point"
    },
    {
        "type": "point"
    }
]
Thanks

You'd need to process this recursively:
from operator import itemgetter

def sortLinesPoints(data):
    if isinstance(data, dict):
        if 'children' in data:
            sortLinesPoints(data['children'])
    else:
        for elem in data:
            sortLinesPoints(elem)
        data.sort(key=itemgetter('type'))
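As a quick sanity check, here is the same function run on the sample data from the question. Alphabetical sorting on "type" happens to put "line" before "point", which is why no custom sort key is needed:

```python
from operator import itemgetter

def sortLinesPoints(data):
    # Recurse into nested children first, then sort each list in place by "type"
    if isinstance(data, dict):
        if 'children' in data:
            sortLinesPoints(data['children'])
    else:
        for elem in data:
            sortLinesPoints(elem)
        data.sort(key=itemgetter('type'))

data = [
    {"type": "point"},
    {"type": "line", "children": [
        {"type": "point"}, {"type": "point"}, {"type": "line"}
    ]},
    {"type": "point"},
]
sortLinesPoints(data)
# "line" sorts before "point" at every level of nesting
```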

Related

Remove previous values and only return the recent values

I'm trying to loop over this list:
data = [
    {"date": "2022-09-02 08:00:00.000", "endtime": "2022-09-02 12:00:00.000",
     "timebilled": "4.00", "projectno": "2479"},
    {"date": "2022-09-02 13:00:00.000", "endtime": "2022-09-02 16:00:00.000",
     "timebilled": "3.00", "projectno": "2696"},
    {"date": "2022-09-02 16:00:00.000", "endtime": "2022-09-02 17:00:00.000",
     "timebilled": "1.00", "projectno": "2479"},
    {"date": "2022-09-01 08:00:00.000", "endtime": "2022-09-01 10:00:00.000",
     "timebilled": "2.00", "projectno": "2696"},
    {"date": "2022-09-01 10:00:00.000", "endtime": "2022-09-01 12:00:00.000",
     "timebilled": "2.00", "projectno": "2479"},
    {"date": "2022-09-01 13:00:00.000", "endtime": "2022-09-01 17:00:00.000",
     "timebilled": "4.00", "projectno": "2479"},
    {"date": "2022-08-31 08:00:00.000", "endtime": "2022-08-31 16:00:00.000",
     "timebilled": "8.00", "projectno": "2489"},
    ...
]
and condense the info to make it look like this:
projectHours = {
{ "2696":[
{
"2022-09-02 13:00:00.000":"3.00"
},
{
"2022-09-01 10:00:00.000":"2.00"
}
]
}, {
"2479":[
{
"2022-09-02 08:00:00.000":"4.00"
},
{
"2022-09-02 16:00:00.000":"1.00"
},
{
"2022-09-01 10:00:00.00":"2.00"
},
{
"2022-09-01 13:00:00.000":"4.00"
}
]
}, {
"2489":[
{
"2022-08-31 08:00:00.000":"8.00"
}
]
}
}
so far this is what I have:
info = {}
batch = {}
projectNums = ['2479', '2696', '2489']
for element in data:
    for projNum in projectNums:
        if element["projectno"] == projNum:
            startDate = element["date"]
            info = {
                startDate: element["timebilled"]
            }
            batch.append(info)
            project = {
                projNum: batch
            }
            print(project)
this is my result:
{
"2696":[
{
"2022-09-02 13:00:00.000":"3.00"
}
]
}, {
"2696":[
{
"2022-09-02 13:00:00.000":"3.00"
}
{
"2022-09-01 08:00:00.000":"2.00"
}
]
}, {
"2489":[
{
"2022-08-31 08:00:00.000":"8.00"
}
]
}, {
"2479":[
{
"2022-09-02 08:00:00.000":"4.00"
}
]
}, {
"2479":[
{
"2022-09-02 08:00:00.000":"4.00"
},
{
"2022-09-02 16:00:00.000":"1.00"
}
]
}, {
"2479":[
{
"2022-09-02 08:00:00.000":"4.00"
},
{
"2022-09-02 16:00:00.000":"1.00"
},
{
"2022-09-01 08:00:00.000":"2.00"
}
]
}, {
"2479":[
{
"2022-09-02 08:00:00.000":"4.00"
},
{
"2022-09-02 16:00:00.000":"1.00"
},
{
"2022-09-01 08:00:00.000":"2.00"
},
{
"2022-09-01 13:00:00.000":"4.00"
}
]
}
I'd like to remove the previous key-value pairs and replace them with the most recent ones. How would I be able to do that? I've tried changing the type of collection, but that did not work for me. I'm pretty new to Python and any help would be appreciated.
Thank you!
I think you're not really using dicts properly; you're using them as structs. The most useful thing about a dict is using its key to access values. You're hiding the keys by putting the dicts in lists, making them difficult to use effectively.
Instead of the projectHours structure you have, you might want something more like:
projectHours = {
    "2696": {
        "2022-09-02 13:00:00.000": "3.00",
        "2022-09-01 10:00:00.000": "2.00"
    },
    "2479": {
        "2022-09-02 08:00:00.000": "4.00",
        "2022-09-02 16:00:00.000": "1.00",
        "2022-09-01 10:00:00.00": "2.00",
        "2022-09-01 13:00:00.000": "4.00"
    },
    "2489": {
        "2022-08-31 08:00:00.000": "8.00"
    }
}
Then your code would look like this:
info = {}
batch = {}
# Make this a set, not a list, because it makes the 'in' operation much faster.
projectNums = {'2479', '2696', '2489'}
for element in data:
    projectno = element["projectno"]
    # I think this is what you wanted:
    if projectno in projectNums:
        if projectno in batch:
            # Update info for existing project.
            info = batch[projectno]
        else:
            # New project: create an empty info dict and register it in batch.
            info = {}
            batch[projectno] = info
        startDate = element["date"]
        info[startDate] = element["timebilled"]
# You should be able to print it out directly at the end.
print(batch)
I haven't run it, there may be typos, and I've made a few assumptions about what you want, but I hope this helps at least.
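The same idea can be written more compactly with `collections.defaultdict`. This is a sketch using a shortened version of the question's `data`; later rows for the same project and date simply overwrite earlier ones:

```python
from collections import defaultdict

data = [
    {"date": "2022-09-02 08:00:00.000", "timebilled": "4.00", "projectno": "2479"},
    {"date": "2022-09-02 13:00:00.000", "timebilled": "3.00", "projectno": "2696"},
    {"date": "2022-09-02 16:00:00.000", "timebilled": "1.00", "projectno": "2479"},
    {"date": "2022-08-31 08:00:00.000", "timebilled": "8.00", "projectno": "2489"},
]

projectNums = {'2479', '2696', '2489'}
batch = defaultdict(dict)  # project number -> {start date: hours billed}
for element in data:
    if element["projectno"] in projectNums:
        # Assigning to the same date key replaces the previous value,
        # so only the most recent entry is kept.
        batch[element["projectno"]][element["date"]] = element["timebilled"]

print(dict(batch))
```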

Sort an array by occurrences in MongoDB

Is it possible to sort an array by occurrences?
For Example, given
{
"_id": {
"$oid": "60d20d342c7951852a21s53a"
},
"site": "www.xyz.ie",
"A": ["mary", "jamie", "john", "mary", "mary", "john"],
}
return
{
"_id": {
"$oid": "60d20d342c7951852a21s53a"
},
"site": "www.xyz.ie",
"A": ["mary", "jamie", "john", "mary", "mary", "john"],
"sorted_A" : ["mary","john","jamie"]
}
I am able to get it most of the way there but I cannot figure out how to join them all back together in an array.
I have been using an aggregation pipeline
Starting with $match to find the site I want
Then $unwind on with path: "$A"
Next $sortByCount on "$A"
???? I can't figure out how to group it all back together.
Here is the pipeline:
[
    {
        '$match': {
            'site': 'www.xyz.ie'
        }
    },
    {
        '$unwind': {
            'path': '$A'
        }
    },
    {
        '$sortByCount': '$A'
    },
    {
        ????
    }
]
$group by _id and A, get the first site, and count the total elements
$sort by count in descending order
$group by only _id, get the first site, and construct the array of A
[
    { $match: { site: "www.xyz.ie" } },
    { $unwind: "$A" },
    {
        $group: {
            _id: { _id: "$_id", A: "$A" },
            site: { $first: "$site" },
            count: { $sum: 1 }
        }
    },
    { $sort: { count: -1 } },
    {
        $group: {
            _id: "$_id._id",
            site: { $first: "$site" },
            A: { $push: "$_id.A" }
        }
    }
]
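If you just want to verify the expected ordering, the same count-then-sort logic can be reproduced client-side in plain Python (a sketch, not MongoDB; `Counter.most_common` sorts by descending count and keeps first-seen order for ties):

```python
from collections import Counter

A = ["mary", "jamie", "john", "mary", "mary", "john"]

# Count occurrences, then list names by descending count
sorted_A = [name for name, _ in Counter(A).most_common()]
print(sorted_A)  # ['mary', 'john', 'jamie']
```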

Filter MongoDB query to find documents only if a field in a list of objects is not empty

I have a MongoDB document structure like following:
Structure
{
    "stores": [
        {
            "items": [
                { "feedback": [], "item_category": "101", "item_id": "10" },
                { "feedback": [], "item_category": "101", "item_id": "11" }
            ]
        },
        {
            "items": [
                { "feedback": [], "item_category": "101", "item_id": "10" },
                { "feedback": ["A feedback"], "item_category": "101", "item_id": "11" },
                { "feedback": [], "item_category": "101", "item_id": "12" },
                { "feedback": [], "item_category": "102", "item_id": "13" },
                { "feedback": [], "item_category": "102", "item_id": "14" }
            ],
            "store_id": 500
        }
    ]
}
This is a single document in the collection. Some fields are deleted to produce a minimal representation of the data.
What I want is to get items only if the feedback field in the items array is not empty. The expected result is:
Expected result
{
    "stores": [
        {
            "items": [
                { "feedback": ["A feedback"], "item_category": "101", "item_id": "11" }
            ],
            "store_id": 500
        }
    ]
}
This is what I tried, based on the examples in this link, which I think covers pretty much the same situation, but it didn't work. What's wrong with my query? Isn't it the same situation as the zipcode search example in the link? It returns everything, like the first JSON code under Structure:
What I tried
query = {
    'date': {'$gte': since, '$lte': until},
    'stores.items': {"$elemMatch": {"feedback": {"$ne": []}}}
}
Thanks.
Please try this:
db.yourCollectionName.aggregate([
    { $match: {
        'date': { '$gte': since, '$lte': until },
        'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } }
    }},
    { $unwind: '$stores' },
    { $match: { 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
    { $unwind: '$stores.items' },
    { $match: { 'stores.items.feedback': { "$ne": [] } } },
    { $group: {
        _id: { _id: '$_id', store_id: '$stores.store_id' },
        items: { $push: '$stores.items' }
    }},
    { $project: { _id: '$_id._id', store_id: '$_id.store_id', items: 1 } },
    { $group: { _id: '$_id', stores: { $push: '$$ROOT' } } },
    { $project: { 'stores._id': 0 } }
])
We need all these stages because you're operating on an array of arrays. This query is written assuming you're dealing with a large data set. Since you're filtering on dates, if your document count is much smaller after the first $match, you can drop the second $match stage (the one between the two $unwind stages).
Refs: $match, $unwind, $project, $group
This aggregate query gets the needed result (using the provided sample document and run from the mongo shell):
db.stores.aggregate( [
    { $unwind: "$stores" },
    { $unwind: "$stores.items" },
    { $addFields: { feedbackExists: { $gt: [ { $size: "$stores.items.feedback" }, 0 ] } } },
    { $match: { feedbackExists: true } },
    { $project: { _id: 0, feedbackExists: 0 } }
] )
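For comparison, here is the same filtering done client-side in Python over a trimmed copy of the sample document (a sketch only; the aggregations above do this server-side, which is preferable for large collections; the walrus operator requires Python 3.8+):

```python
doc = {
    "stores": [
        {"items": [
            {"feedback": [], "item_category": "101", "item_id": "10"},
            {"feedback": [], "item_category": "101", "item_id": "11"},
        ]},
        {"items": [
            {"feedback": [], "item_category": "101", "item_id": "10"},
            {"feedback": ["A feedback"], "item_category": "101", "item_id": "11"},
            {"feedback": [], "item_category": "101", "item_id": "12"},
        ], "store_id": 500},
    ]
}

filtered = {
    "stores": [
        {**store, "items": kept}
        for store in doc["stores"]
        # Keep only items whose feedback list is non-empty,
        # and drop stores that end up with no items at all.
        if (kept := [it for it in store["items"] if it["feedback"]])
    ]
}
print(filtered)
```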

How to make pymongo aggregation with count all elements and grouping by one request

I have a collection with fields like this:
{
    "_id": "5cf54857bbc85fd0ff5640ba",
    "book_id": "5cf172220fb516f706d00591",
    "tags": {
        "person": [
            {"start_match": 209, "length_match": 6, "word": "kimmel"}
        ],
        "organization": [
            {"start_match": 107, "length_match": 12, "word": "philadelphia"},
            {"start_match": 209, "length_match": 13, "word": "kimmel center"}
        ],
        "location": [
            {"start_match": 107, "length_match": 12, "word": "philadelphia"}
        ]
    },
    "deleted": false
}
I want to collect the distinct words in each category and count them.
So the output should look like this:
{
    "response": [
        {
            "tag": "location",
            "tag_list": [
                { "count": 31, "phrase": "philadelphia" },
                { "count": 15, "phrase": "usa" }
            ]
        },
        { "tag": "organization", "tag_list": [ ... ] },
        { "tag": "person", "tag_list": [ ... ] }
    ]
}
A pipeline like this works:
def pipeline_func(tag):
    return [
        {'$replaceRoot': {'newRoot': '$tags'}},
        {'$unwind': '${}'.format(tag)},
        {'$group': {'_id': '${}.word'.format(tag), 'count': {'$sum': 1}}},
        {'$project': {'phrase': '$_id', 'count': 1, '_id': 0}},
        {'$sort': {'count': -1}}
    ]
But that makes a separate request for each tag. I want to know how to do it in one request.
Thank you for your attention.
As noted, there is a slight mismatch in the question data to the current claimed pipeline process since $unwind can only be used on arrays and the tags as presented in the question is not an array.
For the data presented in the question you basically want a pipeline like this:
db.collection.aggregate([
    { "$addFields": {
        "tags": { "$objectToArray": "$tags" }
    }},
    { "$unwind": "$tags" },
    { "$unwind": "$tags.v" },
    { "$group": {
        "_id": {
            "tag": "$tags.k",
            "phrase": "$tags.v.word"
        },
        "count": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.tag",
        "tag_list": {
            "$push": {
                "count": "$count",
                "phrase": "$_id.phrase"
            }
        }
    }}
])
Again, as per the note: since tags is in fact an object, what you actually need in order to collect data based on its sub-keys, as the question asks, is to turn it into an array of items.
The usage of $replaceRoot in your current pipeline suggests that $objectToArray is a fair fit here. It is available from the later patch releases of MongoDB 3.4, the bare minimum version you should be running in production right now.
That $objectToArray actually does pretty much what the name says and produces an array ( or "list" to be more pythonic ) of entries broken into key and value pairs. These are essentially a "list" of objects ( or "dict" entries ) which have the keys k and v respectively. The output of the first pipeline stage would look like this on the supplied document:
{
    "book_id": "5cf172220fb516f706d00591",
    "tags": [
        { "k": "person", "v": [
            { "start_match": 209, "length_match": 6, "word": "kimmel" }
        ]},
        { "k": "organization", "v": [
            { "start_match": 107, "length_match": 12, "word": "philadelphia" },
            { "start_match": 209, "length_match": 13, "word": "kimmel center" }
        ]},
        { "k": "location", "v": [
            { "start_match": 107, "length_match": 12, "word": "philadelphia" }
        ]}
    ],
    "deleted": false
}
So you should be able to see how you can now easily access those k values and use them in grouping, and of course v is a standard array as well. So it's just the two $unwind stages as shown, then two $group stages: the first $group collects over the combination of keys, and the second collects on the main grouping key while pushing the other accumulations into a "list" within that entry.
Of course the output by the above listing is not exactly how you asked for in the question, but the data is basically there. You can optionally add an $addFields or $project stage to essentially rename the _id key as the final aggregation stage:
{ "$addFields": {
"_id": "$$REMOVE",
"tag": "$_id"
}}
Or simply do something pythonic with a little list comprehension on the cursor output:
cursor = db.collection.aggregate([
    { "$addFields": {
        "tags": { "$objectToArray": "$tags" }
    }},
    { "$unwind": "$tags" },
    { "$unwind": "$tags.v" },
    { "$group": {
        "_id": {
            "tag": "$tags.k",
            "phrase": "$tags.v.word"
        },
        "count": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.tag",
        "tag_list": {
            "$push": {
                "count": "$count",
                "phrase": "$_id.phrase"
            }
        }
    }}
])

output = [{'tag': doc['_id'], 'tag_list': doc['tag_list']} for doc in cursor]
print({'response': output})
And final output as a "list" you can use for response:
{
"tag_list": [
{
"count": 1,
"phrase": "philadelphia"
}
],
"tag": "location"
},
{
"tag_list": [
{
"count": 1,
"phrase": "kimmel"
}
],
"tag": "person"
},
{
"tag_list": [
{
"count": 1,
"phrase": "kimmel center"
}, {
"count": 1,
"phrase": "philadelphia"
}
],
"tag": "organization"
}
Note that the list comprehension approach gives you a bit more control over the order of keys in the output, since MongoDB itself simply appends new key names in a projection, keeping existing keys ordered first. If that sort of thing is important to you, that is. It really shouldn't be, though, since Object/Dict structures should not be considered to have any set order of keys; that's what arrays (or lists) are for.
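For a small collection, the same response could also be computed client-side. Here is a sketch in plain Python that mirrors the pipeline's stages, using a one-document list shaped like the question's sample:

```python
from collections import Counter, defaultdict

docs = [{
    "tags": {
        "person": [{"word": "kimmel"}],
        "organization": [{"word": "philadelphia"}, {"word": "kimmel center"}],
        "location": [{"word": "philadelphia"}],
    }
}]

counts = defaultdict(Counter)
for doc in docs:
    for tag, entries in doc["tags"].items():  # like $objectToArray + first $unwind
        for entry in entries:                 # like the second $unwind
            counts[tag][entry["word"]] += 1   # like the first $group

# Like the second $group: one entry per tag, with a list of phrase counts
response = [
    {"tag": tag,
     "tag_list": [{"count": c, "phrase": w} for w, c in ctr.most_common()]}
    for tag, ctr in counts.items()
]
print({"response": response})
```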

How to merge json objects containing arrays using python?

I have two json files which contain all kinds of levels of properties. I want to write a python script that will replace existing properties and add missing ones, but keep all the other ones in place.
In my attempts so far, the entire "configurations" array of the original file gets overwritten, including all properties. All the examples I could find show merging objects without arrays. Any help would be appreciated.
Original:
{
"configurations": [
{
"this-needs-to-stay": {
"properties": {
"some_property": "EXISTING"
}
}
},
{
"this-needs-to-be-updated": {
"properties": {
"this.would.stay": "EXISTING",
"this.wont.be.overwritten": "EXISTING"
}
}
}
],
"other-values-1": [
{
"components": [
{
"name": "EXISTING"
}
],
"name": "somename"
}
],
"other-values-2": {
"randomProperties": {
"type": "random"
},
"and_so_on": "you_get_the_point"
}
}
Additional data that should be added to original:
{
"configurations" : [
{
"this-would-be-added": {
"properties": {
"some-property": "ADDED"
}
}
},
{
"this-needs-to-be-updated": {
"properties": {
"this.would.stay": "CHANGED",
"this.would.be.added": "ADDED"
}
}
}
]
}
Result is a merging of the two on the property level:
{
"configurations": [
{
"this-would-be-added": {
"properties": {
"some-property": "ADDED"
}
}
},
{
"this-needs-to-stay": {
"properties": {
"some_property": "EXISTING"
}
}
},
{
"this-needs-to-be-updated": {
"properties": {
"this.would.stay": "CHANGED",
"this.would.be.added": "ADDED"
"this.wont.be.overwritten": "EXISTING"
}
}
}
],
"other-values-1": [
{
"components": [
{
"name": "EXISTING"
}
],
"name": "somename"
}
],
"other-values-2": {
"randomProperties": {
"type": "random"
},
"and_so_on": "you_get_the_point"
}
}
Using funcy.merge:
import json
from funcy import merge

# a and b are the parsed original and additional JSON documents
x, y = map(lambda d: {hash(frozenset(c.keys())): c for c in d},
           (a['configurations'], b['configurations']))
merged = list(merge(x, y).values())
print(json.dumps(merged, indent=4))
Result:
[
{
"this-needs-to-stay": {
"properties": {
"some_property": "EXISTING"
}
}
},
{
"this-needs-to-be-updated": {
"properties": {
"this.would.stay": "CHANGED",
"this.would.be.added": "ADDED"
}
}
},
{
"this-would-be-added": {
"properties": {
"some-property": "ADDED"
}
}
}
]
In the items of configurations in your sample data, it looks like you are using each item's only key as a unique key within the array. Therefore, we can convert the list into a dict keyed by that unique key.
That is, turning
[{"ID_1": "VALUE_1"}, {"ID_2": "VALUE_2"}]
into {"ID_1": "VALUE_1", "ID_2": "VALUE_2"}
Then we just want to merge those two dicts. Here I use {**a, **b} to merge them. For this part, you can take a look at How to merge two dictionaries in a single expression?
So
{"ID_1": "value_1", "ID_2": "value_2"}
and
{"ID_2": "new_value_2", "ID_3": "new_value_3"}
would be merged as
{"ID_1": "value_1", "ID_2": "new_value_2", "ID_3": "new_value_3"}
Once they are merged, convert the result dict back into a list, and that's the final result.
[{"ID_1": "value_1"}, {"ID_2": "new_value_2"}, {"ID_3": "new_value_3"}]
Codes:
def list_to_dict(l):
    return {list(item.keys())[0]: list(item.values())[0] for item in l}

def list_item_merge(a, b):
    return [{k: v} for k, v in {**list_to_dict(a), **list_to_dict(b)}.items()]

list_item_merge(original['configurations'], additional['configurations'])
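A quick check of `list_item_merge` with simplified inputs (the helpers are repeated so the snippet runs standalone; `original` and `additional` stand in for the question's two files):

```python
def list_to_dict(l):
    # Use each single-key dict's only key as the merge key
    return {list(item.keys())[0]: list(item.values())[0] for item in l}

def list_item_merge(a, b):
    # Later entries (from b) overwrite matching keys from a
    return [{k: v} for k, v in {**list_to_dict(a), **list_to_dict(b)}.items()]

original = [{"ID_1": "value_1"}, {"ID_2": "value_2"}]
additional = [{"ID_2": "new_value_2"}, {"ID_3": "new_value_3"}]
result = list_item_merge(original, additional)
print(result)
# [{'ID_1': 'value_1'}, {'ID_2': 'new_value_2'}, {'ID_3': 'new_value_3'}]
```

Note that this merge is shallow: when a key appears in both lists, the value from b replaces a's value entirely, so in the question's example the nested "this.wont.be.overwritten" property would be lost rather than kept; a deeper merge of "properties" would need extra handling.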
I would suggest reviewing your conf structure. A list of dicts, each with a single key, doesn't make sense to me. Why not just use a dict?:
{
"configurations": {
"this-needs-to-stay": {
"properties": {
"some_property": "EXISTING"
}
},
"this-needs-to-be-updated": {
"properties": {
"this.would.stay": "EXISTING",
"this.wont.be.overwritten": "EXISTING"
}
}
},
# ...
}
Then you can simply use:
from funcy import merge

conf = base
conf['configurations'] = merge(base['configurations'],
                               new['configurations'])
