I have two JSON files that contain properties nested at various levels. I want to write a Python script that will replace existing properties and add missing ones, but keep all the other ones in place.
In my attempts until now the entire "configurations" array of the original file is overwritten, including all properties. All examples I could find show merge for objects without arrays. Any help would be appreciated.
Original:
{
"configurations": [
{
"this-needs-to-stay": {
"properties": {
"some_property": "EXISTING"
}
}
},
{
"this-needs-to-be-updated": {
"properties": {
"this.would.stay": "EXISTING",
"this.wont.be.overwritten": "EXISTING"
}
}
}
],
"other-values-1": [
{
"components": [
{
"name": "EXISTING"
}
],
"name": "somename"
}
],
"other-values-2": {
"randomProperties": {
"type": "random"
},
"and_so_on": "you_get_the_point"
}
}
Additional data that should be added to original:
{
"configurations" : [
{
"this-would-be-added": {
"properties": {
"some-property": "ADDED"
}
}
},
{
"this-needs-to-be-updated": {
"properties": {
"this.would.stay": "CHANGED",
"this.would.be.added": "ADDED"
}
}
}
]
}
Result is a merging of the two on the property level:
{
"configurations": [
{
"this-would-be-added": {
"properties": {
"some-property": "ADDED"
}
}
},
{
"this-needs-to-stay": {
"properties": {
"some_property": "EXISTING"
}
}
},
{
"this-needs-to-be-updated": {
"properties": {
"this.would.stay": "CHANGED",
"this.would.be.added": "ADDED",
"this.wont.be.overwritten": "EXISTING"
}
}
}
],
"other-values-1": [
{
"components": [
{
"name": "EXISTING"
}
],
"name": "somename"
}
],
"other-values-2": {
"randomProperties": {
"type": "random"
},
"and_so_on": "you_get_the_point"
}
}
Using funcy.merge:
import json
from funcy import merge
# a and b are the parsed original and additional documents
x, y = map(lambda d: {hash(frozenset(c.keys())):c for c in d}, (a['configurations'], b['configurations']))
merged = list(merge(x, y).values())
print(json.dumps(merged, indent=4))
Result:
[
{
"this-needs-to-stay": {
"properties": {
"some_property": "EXISTING"
}
}
},
{
"this-needs-to-be-updated": {
"properties": {
"this.would.stay": "CHANGED",
"this.would.be.added": "ADDED"
}
}
},
{
"this-would-be-added": {
"properties": {
"some-property": "ADDED"
}
}
}
]
In your sample data, each item of configurations has a single key, and it looks like you are using that key as a unique identifier within the array. Therefore, we can convert the list into a dict keyed on it.
That is turning
[{"ID_1": "VALUE_1"}, {"ID_2": "VALUE_2"}]
into {"ID_1": "VALUE_1", "ID_2": "VALUE_2"}
Then, we just want to merge those two dicts. Here I use {**a, **b} to merge them. For this part, you can take a look at "How to merge two dictionaries in a single expression?"
So
{"ID_1": "value_1", "ID_2": "value_2"}
and
{"ID_2": "new_value_2", "ID_3": "new_value_3"}
would be merged as
{"ID_1": "value_1", "ID_2": "new_value_2", "ID_3": "new_value_3"}
Once they are merged, convert the resulting dict back into a list, and that's the final result.
[{"ID_1": "value_1"}, {"ID_2": "new_value_2"}, {"ID_3": "new_value_3"}]
Code:
def list_to_dict(l):
    return {list(item.keys())[0]: list(item.values())[0] for item in l}

def list_item_merge(a, b):
    return [{k: v} for k, v in {**list_to_dict(a), **list_to_dict(b)}.items()]

list_item_merge(original['configurations'], additional['configurations'])
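Note that {**a, **b} replaces the whole value of a shared key, so with the sample data "this.wont.be.overwritten" would be dropped from "this-needs-to-be-updated". If the properties themselves should be merged, a recursive merge is one option. The following is a minimal sketch, not a definitive implementation: the helper names deep_merge and merge_configurations are hypothetical, and it assumes every item in the list is a dict whose values are themselves dicts.

```python
import copy

def deep_merge(base, extra):
    """Return a new dict; values from `extra` win, nested dicts merge key by key."""
    result = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result

def merge_configurations(a, b):
    """Merge two lists of single-key dicts, matching items on that key."""
    merged = {}
    for item in a + b:
        for name, body in item.items():  # assumption: body is a dict
            merged[name] = deep_merge(merged.get(name, {}), body)
    return [{name: body} for name, body in merged.items()]

original = {"configurations": [
    {"this-needs-to-stay": {"properties": {"some_property": "EXISTING"}}},
    {"this-needs-to-be-updated": {"properties": {
        "this.would.stay": "EXISTING",
        "this.wont.be.overwritten": "EXISTING"}}},
]}
additional = {"configurations": [
    {"this-would-be-added": {"properties": {"some-property": "ADDED"}}},
    {"this-needs-to-be-updated": {"properties": {
        "this.would.stay": "CHANGED",
        "this.would.be.added": "ADDED"}}},
]}

result = merge_configurations(original["configurations"],
                              additional["configurations"])
```

Items keep the order in which their key is first seen, so entries from the original list come first and new entries are appended at the end.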
I would suggest reviewing your conf structure. A list of dicts with single key doesn't make sense to me. Why not just use a dict?:
{
"configurations": {
"this-needs-to-stay": {
"properties": {
"some_property": "EXISTING"
}
},
"this-needs-to-be-updated": {
"properties": {
"this.would.stay": "EXISTING",
"this.wont.be.overwritten": "EXISTING"
}
}
},
# ...
}
Then you can simply use:
from funcy import merge
conf = base
conf['configurations'] = merge(base['configurations'],
new['configurations'])
I need to take a highly nested JSON file (i.e. an Elasticsearch mapping for an index) and produce a list of items.
Example Elasticsearch Mapping:
{
"mappings": {
"properties": {
"class": {
"properties": {
"name": {
"properties": {
"firstname": {
"type": "text"
},
"lastname": {
"type": "text"
}
}
},
"age": {
"type": "text "
}
}
}
}
}
}
Example Desired Result:
["mappings.properties.class.properties.name.properties.firstname",
"mappings.properties.class.properties.name.properties.lastname",
"mappings.properties.class.properties.age"]
pandas.json_normalize() doesn't quite do what I want. Neither does glom().
You should be able to make a fairly short recursive generator to do this. I'm assuming you want all the keys until you see a dict with type in it:
d = {
"mappings": {
"properties": {
"class": {
"properties": {
"name": {
"properties": {
"firstname": {
"type": "text"
},
"lastname": {
"type": "text"
}
}
},
"age": {
"type": "text "
}
}
}
}
}
}
def all_keys(d, path=None):
    if path is None:
        path = []
    if not isinstance(d, dict) or 'type' in d:
        yield '.'.join(path)
        return
    for k, v in d.items():
        yield from all_keys(v, path + [k])

list(all_keys(d))
Which gives:
['mappings.properties.class.properties.name.properties.firstname',
'mappings.properties.class.properties.name.properties.lastname',
'mappings.properties.class.properties.age']
Is it possible to sort an array by occurrences?
For example, given
{
"_id": {
"$oid": "60d20d342c7951852a21s53a"
},
"site": "www.xyz.ie",
"A": ["mary", "jamie", "john", "mary", "mary", "john"]
}
return
{
"_id": {
"$oid": "60d20d342c7951852a21s53a"
},
"site": "www.xyz.ie",
"A": ["mary", "jamie", "john", "mary", "mary", "john"],
"sorted_A" : ["mary","john","jamie"]
}
I am able to get most of the way there, but I cannot figure out how to join the values back together into an array.
I have been using an aggregation pipeline:
Starting with $match to find the site I want
Then $unwind with path: "$A"
Next $sortByCount on "$A"
???? I can't figure out how to group it all back together.
Here is the pipeline:
[
{
'$match': {
'site': 'www.xyz.ie'
}
}, {
'$unwind': {
'path': '$A'
}
}, {
'$sortByCount': '$A'
}, {
????
}
]
$group by _id and A, get the first site, and count the total elements
$sort by count in descending order
$group by _id only, get the first site, and construct the array A
[
{ $match: { site: "www.xyz.ie" } },
{ $unwind: "$A" },
{
$group: {
_id: { _id: "$_id", A: "$A" },
site: { $first: "$site" },
count: { $sum: 1 }
}
},
{ $sort: { count: -1 } },
{
$group: {
_id: "$_id._id",
site: { $first: "$site" },
A: { $push: "$_id.A" }
}
}
]
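For comparison, the same ordering can be computed client-side once the document has been fetched. This is only a sketch in Python that illustrates the count-then-sort logic of the pipeline; the helper name sort_by_occurrence is hypothetical, and it relies on collections.Counter from the standard library.

```python
from collections import Counter

def sort_by_occurrence(values):
    """Return the distinct values, most frequent first."""
    # most_common() sorts by count in descending order; values with equal
    # counts keep the order in which they were first encountered.
    return [value for value, _count in Counter(values).most_common()]

doc = {
    "site": "www.xyz.ie",
    "A": ["mary", "jamie", "john", "mary", "mary", "john"],
}
doc["sorted_A"] = sort_by_occurrence(doc["A"])
print(doc["sorted_A"])  # ['mary', 'john', 'jamie']
```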
I have a MongoDB document structure like the following:
Structure
{
"stores": [
{
"items": [
{
"feedback": [],
"item_category": "101",
"item_id": "10"
},
{
"feedback": [],
"item_category": "101",
"item_id": "11"
}
]
},
{
"items": [
{
"feedback": [],
"item_category": "101",
"item_id": "10"
},
{
"feedback": ["A feedback"],
"item_category": "101",
"item_id": "11"
},
{
"feedback": [],
"item_category": "101",
"item_id": "12"
},
{
"feedback": [],
"item_category": "102",
"item_id": "13"
},
{
"feedback": [],
"item_category": "102",
"item_id": "14"
}
],
"store_id": 500
}
]
}
This is a single document in a collection. Some fields were removed to produce a minimal representation of the data.
What I want is to get only the items whose feedback field is not empty. The expected result is:
Expected result
{
"stores": [
{
"items": [
{
"feedback": ["A feedback"],
"item_category": "101",
"item_id": "11"
}
],
"store_id": 500
}
]
}
This is what I tried, based on the linked examples, which I think cover pretty much the same situation, but it didn't work. What's wrong with my query? Isn't it the same situation as the zipcode search example in the link? It returns everything, as in the first JSON block (Structure):
What I tried
query = {
'date': {'$gte': since, '$lte': until},
'stores.items': {"$elemMatch": {"feedback": {"$ne": []}}}
}
Thanks.
Please try this:
db.yourCollectionName.aggregate([
{ $match: { 'date': { '$gte': since, '$lte': until }, 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
{ $unwind: '$stores' },
{ $match: { 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
{ $unwind: '$stores.items' },
{ $match: { 'stores.items.feedback': { "$ne": [] } } },
{ $group: { _id: { _id: '$_id', store_id: '$stores.store_id' }, items: { $push: '$stores.items' } } },
{ $project: { _id: '$_id._id', store_id: '$_id.store_id', items: 1 } },
{ $group: { _id: '$_id', stores: { $push: '$$ROOT' } } },
{ $project: { 'stores._id': 0 } }
])
All of these stages are needed because you are operating on an array of arrays. The query is written on the assumption that you are dealing with a large data set. Since you are filtering on dates, if the number of documents left after the first $match is small, you can drop the second $match stage, the one between the two $unwind stages.
Refs: $match, $unwind, $project, $group
This aggregate query gets the needed result (using the provided sample document and run from the mongo shell):
db.stores.aggregate( [
{ $unwind: "$stores" },
{ $unwind: "$stores.items" },
{ $addFields: { feedbackExists: { $gt: [ { $size: "$stores.items.feedback" }, 0 ] } } },
{ $match: { feedbackExists: true } },
{ $project: { _id: 0, feedbackExists: 0 } }
] )
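The intended filtering can also be written in plain Python and applied to a document after it has been fetched, which may help when checking what an aggregation should return. This is a sketch only: the function name keep_items_with_feedback is hypothetical, and it assumes the document has the stores / items / feedback layout shown in the question.

```python
def keep_items_with_feedback(document):
    """Return a copy of `document` keeping only items with non-empty feedback.

    Stores left without any items are dropped entirely.
    """
    stores = []
    for store in document.get("stores", []):
        items = [item for item in store.get("items", []) if item.get("feedback")]
        if items:
            # Copy the store's other fields (e.g. store_id) and swap in the
            # filtered item list.
            stores.append({**store, "items": items})
    return {**document, "stores": stores}

sample = {
    "stores": [
        {"items": [
            {"feedback": [], "item_category": "101", "item_id": "10"},
            {"feedback": [], "item_category": "101", "item_id": "11"},
        ]},
        {"items": [
            {"feedback": [], "item_category": "101", "item_id": "10"},
            {"feedback": ["A feedback"], "item_category": "101", "item_id": "11"},
            {"feedback": [], "item_category": "102", "item_id": "13"},
        ], "store_id": 500},
    ]
}

filtered = keep_items_with_feedback(sample)
```

With the sample above, filtered contains a single store (store_id 500) holding the single item whose feedback list is not empty.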
I have an Elasticsearch index with documents like the one below:
"_index":"test",
"_type":"abc",
"_source":{
"file_name":"xyz.ex",
"metadata":{
"format":".ex",
"profile":[
{"date_value" : "2018-05-30T00:00:00",
"key_id" : "1",
"type" : "date",
"value" : [ "30-05-2018" ]
},
{
"key_id" : "2",
"type" : "freetext",
"value" : [ "New york" ]
}
]
}
}
Now I need to search for documents by matching a key_id to its value (key_id identifies a field whose value is stored in "value").
For example, for key_id = '1', if its value is "30-05-2018", the query should match the document above.
I tried mapping this as a nested object, but I am not able to write a query that matches two or more key_id entries, each against its respective value.
This is how I would do it. You need to AND together, via bool/filter (or bool/must), two nested queries, one for each condition pair, since you want to match two different nested elements from the same parent document.
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "metadata.profile",
"query": {
"bool": {
"filter": [
{
"term": {
"metadata.profile.f1": "a"
}
},
{
"term": {
"metadata.profile.f2": true
}
}
]
}
}
}
},
{
"nested": {
"path": "metadata.profile",
"query": {
"bool": {
"filter": [
{
"term": {
"metadata.profile.f1": "b"
}
},
{
"term": {
"metadata.profile.f2": false
}
}
]
}
}
}
}
]
}
}
}
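The query above uses placeholder fields f1 and f2. As a sketch of how the same shape could be generated for the fields in the question, the following Python builds one nested clause per (key_id, value) pair. The function name profile_query is hypothetical, and the sketch assumes that metadata.profile is mapped as nested and that key_id and value are indexed so that exact term matching works (for example as keyword fields). It only builds the request body and does not call any Elasticsearch client.

```python
import json

def profile_query(pairs, path="metadata.profile"):
    """Build a bool/filter query with one nested clause per (key_id, value) pair."""
    clauses = []
    for key_id, value in pairs:
        clauses.append({
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "filter": [
                            # Both terms must match inside the same nested element.
                            {"term": {f"{path}.key_id": key_id}},
                            {"term": {f"{path}.value": value}},
                        ]
                    }
                },
            }
        })
    # All nested clauses must match, each possibly on a different element.
    return {"query": {"bool": {"filter": clauses}}}

body = profile_query([("1", "30-05-2018"), ("2", "New york")])
print(json.dumps(body, indent=2))
```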
I have an object like this and want to sort it by type (line first, point second) at every level (simplified JSON):
[{
"type":"point"
},
{
"type":"line",
"children": [
{
"type":"point"
},
{
"type":"point"
},
{
"type":"line"
}
]
},
{
"type":"point"
}]
The nesting could be deeper, with many more points/lines inside each other.
The sorted output would be something like this:
[{
"type":"line",
"children": [
{
"type":"line"
},
{
"type":"point"
},
{
"type":"point"
}
]
},
{
"type":"point"
},
{
"type":"point"
}]
Thanks
You'd need to process this recursively:
from operator import itemgetter

def sortLinesPoints(data):
    if isinstance(data, dict):
        if 'children' in data:
            sortLinesPoints(data['children'])
    else:
        for elem in data:
            sortLinesPoints(elem)
        data.sort(key=itemgetter('type'))
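The function above sorts in place and relies on "line" sorting before "point" alphabetically. If the input should stay untouched, or more types with a custom order are added later, a non-mutating variant with an explicit rank is one alternative. This is a sketch; the names sorted_lines_points and TYPE_ORDER are hypothetical.

```python
# Explicit rank per type; unknown types sort after the known ones.
TYPE_ORDER = {"line": 0, "point": 1}

def sorted_lines_points(nodes):
    """Return a new list with lines before points at every nesting level."""
    result = []
    for node in nodes:
        node = dict(node)  # shallow copy so the input is not modified
        if "children" in node:
            node["children"] = sorted_lines_points(node["children"])
        result.append(node)
    # sorted() is stable, so nodes of the same type keep their relative order.
    return sorted(result, key=lambda n: TYPE_ORDER.get(n["type"], len(TYPE_ORDER)))

data = [
    {"type": "point"},
    {"type": "line", "children": [
        {"type": "point"},
        {"type": "point"},
        {"type": "line"},
    ]},
    {"type": "point"},
]

ordered = sorted_lines_points(data)
```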