Could you please tell me how to normalize this JSON file using Python?
This question is related to my previous one.
The current JSON contains:
{
"total_stats": [
{
"domain": "domain.com",
"uptime": "100"
},
{
"domain": "domain.com",
"threats": "345.01111783804436"
}
]
}
Desired output:
{
"total_stats": [
{
"domain": "domain.com",
"uptime": "100",
"threats": "345.01111783804436"
}
]
}
If you want to merge the dictionaries according to the "domain" key, you can use the following (note: if the dictionaries have keys in common, the value from the last dictionary will be used):
dct = {
"total_stats": [
{"domain": "domain.com", "uptime": "100"},
{"domain": "domain.com", "threats": "345.01111783804436"},
]
}
out = {}
for d in dct["total_stats"]:
    out.setdefault(d["domain"], {}).update(d)
dct["total_stats"] = list(out.values())
print(dct)
Prints:
{
"total_stats": [
{
"domain": "domain.com",
"uptime": "100",
"threats": "345.01111783804436",
}
]
}
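Since the question is about a JSON file, here is a minimal sketch of the same merge wrapped in a function and applied to parsed JSON. The helper name merge_by_key and the extra "other.org" record are my own assumptions for illustration; in practice the text would come from json.load() on your file.

```python
import json
from collections import defaultdict


def merge_by_key(records, key="domain"):
    """Merge dictionaries that share the same value for `key`.

    On conflicting keys, the value from the later record wins.
    """
    merged = defaultdict(dict)
    for record in records:
        merged[record[key]].update(record)
    return list(merged.values())


# Sample input inlined for the sketch; "other.org" is a hypothetical extra record.
raw = '''
{"total_stats": [
  {"domain": "domain.com", "uptime": "100"},
  {"domain": "domain.com", "threats": "345.01111783804436"},
  {"domain": "other.org", "uptime": "99"}
]}
'''
data = json.loads(raw)
data["total_stats"] = merge_by_key(data["total_stats"])
print(json.dumps(data, indent=2))
```

Records for different domains stay separate, and the order of first appearance is preserved.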
I have a collection named 'attendance' that has an array:
[
{
"faculty": "20XX-XXXXX-XX-1",
"sections": [
{
"section": "XXXX 3-1",
"date": "04-11-2022",
"attendance": [
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
}
]
},
{
"section": "XXXX 3-2",
"date": "04-11-2022",
"attendance": [
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
}
]
}
]
}
]
I have been trying to query the values of the specific element in my array using $and and $elemMatch in:
db.attendance.find({$and:[{faculty:"20XX-XXXXX-XX-1"},{sections:{$elemMatch:{section:"XXXX 3-1",date:"04-11-2022"}}}]});
But it still prints the other section as well, rather than just the one. I want the output to be:
{
"faculty": "20XX-XXXXX-XX-1",
"sections": [
{
"section": "XXXX 3-1",
"date": "04-11-2022",
"attendance": [
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
}
]
}
]
}
And I tried using the dot notation like:
db.attendance.find({"sections.section":"XXXX 3-1", "sections.date":"04-11-2022"});
Still no luck. I'm not sure if what I'm doing is right or not. Thanks in advance!
Option 1: find with $elemMatch. You also need to add the $elemMatch to the projection part of the find query, as follows:
db.collection.find({
"faculty": "20XX-XXXXX-XX-1",
sections: {
$elemMatch: {
section: "XXXX 3-1",
date: "04-11-2022"
}
}
},
{
sections: {
$elemMatch: {
section: "XXXX 3-1",
date: "04-11-2022"
}
}
})
Explained:
The find query has the following syntax: db.collection.find({query}, {projection})
Adding the projection part allows you to filter the returned output.
playground option 1
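To make the effect of that projection concrete, here is a plain-Python sketch that imitates it on an in-memory document. It is only an illustration, not how MongoDB works internally, and project_first_match is a hypothetical helper. Two things it mirrors: an $elemMatch projection returns only the first matching array element, and fields not named in the projection (apart from _id) are left out, so if you also want faculty in the result you would likely need to add faculty: 1 to the projection.

```python
def project_first_match(doc, array_field, criteria, keep=()):
    """Imitate a find() projection that uses $elemMatch on one array field."""
    # Only the FIRST element that satisfies every criterion is returned.
    match = next(
        (el for el in doc.get(array_field, [])
         if all(el.get(k) == v for k, v in criteria.items())),
        None,
    )
    # Fields are included only when explicitly listed in `keep`.
    result = {name: doc[name] for name in keep if name in doc}
    if match is not None:
        result[array_field] = [match]
    return result


# Shortened version of the sample document (attendance lists left empty).
doc = {
    "faculty": "20XX-XXXXX-XX-1",
    "sections": [
        {"section": "XXXX 3-1", "date": "04-11-2022", "attendance": []},
        {"section": "XXXX 3-2", "date": "04-11-2022", "attendance": []},
    ],
}
criteria = {"section": "XXXX 3-1", "date": "04-11-2022"}
projected = project_first_match(doc, "sections", criteria, keep=("faculty",))
print(projected)
```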
Option 2: Via aggregation/$filter:
db.collection.aggregate([
{
"$addFields": {
"sections": {
"$filter": {
"input": "$sections",
"as": "s",
"cond": {
$and: [
{
$eq: [
"$$s.section",
"XXXX 3-1"
]
},
{
$eq: [
"$$s.date",
"04-11-2022"
]
}
]
}
}
}
}
}
])
Explained:
This replaces the original sections array with a new one whose elements are filtered based on the provided criteria.
playground option 2
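For comparison, a rough plain-Python equivalent of the $addFields/$filter stage is sketched below. Unlike an $elemMatch projection, $filter keeps every element that satisfies the condition, and $addFields leaves the other top-level fields in place. The helper filter_array and the third section entry are assumptions added for illustration.

```python
def filter_array(doc, array_field, criteria):
    """Imitate $addFields + $filter: keep every element matching all criteria."""
    kept = [
        el for el in doc.get(array_field, [])
        if all(el.get(k) == v for k, v in criteria.items())
    ]
    # Other top-level fields are preserved; only the array is replaced.
    return {**doc, array_field: kept}


doc = {
    "faculty": "20XX-XXXXX-XX-1",
    "sections": [
        {"section": "XXXX 3-1", "date": "04-11-2022"},
        {"section": "XXXX 3-2", "date": "04-11-2022"},
        {"section": "XXXX 3-1", "date": "05-11-2022"},  # hypothetical extra entry
    ],
}
one = filter_array(doc, "sections", {"section": "XXXX 3-1", "date": "04-11-2022"})
both = filter_array(doc, "sections", {"section": "XXXX 3-1"})
print(one["sections"])
print(len(both["sections"]))
```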
(Python beginner alert) I am trying to create a custom JSON from an existing JSON. The scenario is: I have a source which can send many sets of fields, but I want to cherry-pick some of them and create a subset while maintaining the original JSON structure.
Original Sample
{
"Response": {
"rCode": "11111",
"rDesc": "SUCCESS",
"pData": {
"code": "123-abc-456-xyz",
"sData": [
{
"receiptTime": "2014-03-02T00:00:00.000",
"sessionDate": "2014-02-28",
"dID": {
"d": {
"serialNo": "3432423423",
"dType": "11111",
"dTypeDesc": "123123sd"
},
"mode": "xyz"
},
"usage": {
"duration": "661",
"mOn": [
"2014-02-28_20:25:00",
"2014-02-28_22:58:00"
],
"mOff": [
"2014-02-28_21:36:00",
"2014-03-01_03:39:00"
]
},
"set": {
"abx": "1",
"ayx": "1",
"pal": "1"
},
"rEvents": {
"john": "doe",
"lorem": "ipsum"
}
},
{
"receiptTime": "2014-04-02T00:00:00.000",
"sessionDate": "2014-04-28",
"dID": {
"d": {
"serialNo": "123123",
"dType": "11111",
"dTypeDesc": "123123sd"
},
"mode": "xyz"
},
"usage": {
"duration": "123",
"mOn": [
"2014-04-28_20:25:00",
"2014-04-28_22:58:00"
],
"mOff": [
"2014-04-28_21:36:00",
"2014-04-01_03:39:00"
]
},
"set": {
"abx": "4",
"ayx": "3",
"pal": "1"
},
"rEvents": {
"john": "doe",
"lorem": "ipsum"
}
}
]
}
}
}
Here each element of the sData array has a number of fields, of which I want to keep only 24 and get rid of the rest. I know I could use element.pop(), but I cannot go and delete a new incoming field every time the source publishes one. Below is the expected output:
Expected Output
{
"Response": {
"rCode": "11111",
"rDesc": "SUCCESS",
"pData": {
"code": "123-abc-456-xyz",
"sData": [
{
"receiptTime": "2014-03-02T00:00:00.000",
"sessionDate": "2014-02-28",
"usage": {
"duration": "661",
"mOn": [
"2014-02-28_20:25:00",
"2014-02-28_22:58:00"
],
"mOff": [
"2014-02-28_21:36:00",
"2014-03-01_03:39:00"
]
},
"set": {
"abx": "1",
"ayx": "1",
"pal": "1"
}
},
{
"receiptTime": "2014-04-02T00:00:00.000",
"sessionDate": "2014-04-28",
"usage": {
"duration": "123",
"mOn": [
"2014-04-28_20:25:00",
"2014-04-28_22:58:00"
],
"mOff": [
"2014-04-28_21:36:00",
"2014-04-01_03:39:00"
]
},
"set": {
"abx": "4",
"ayx": "3",
"pal": "1"
}
}
]
}
}
}
I took "How can I create a new JSON object form another using Python?" as a reference, but it's not working as expected. Looking forward to input/solutions from all of you gurus. Thanks in advance.
Kind of like this:
import json

# Load the full JSON document
with open("fullset.json") as f:
    data = json.load(f)

def subset(d):
    # Copy only the fields we want to keep
    newd = {}
    for name in ('receiptTime', 'sessionDate', 'usage', 'set'):
        newd[name] = d[name]
    return newd
data['Response']['pData']['sData'] = [subset(d) for d in data['Response']['pData']['sData']]
with open('newdata.json', 'w') as f:
    json.dump(data, f)
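If some records might be missing one of the wanted fields, a slightly more forgiving variant of the same whitelist idea is sketched below. The KEEP tuple and the shortened sample record are assumptions for illustration; new fields published by the source are ignored automatically because only whitelisted keys are copied.

```python
import json

# Whitelist of fields to keep in each sData element (extend as needed).
KEEP = ("receiptTime", "sessionDate", "usage", "set")


def subset(record, keep=KEEP):
    """Copy only the whitelisted keys that are actually present."""
    return {name: record[name] for name in keep if name in record}


# Shortened sample; note that this record has no "set" field.
response = json.loads('''
{"Response": {"rCode": "11111", "pData": {"code": "123-abc-456-xyz", "sData": [
  {"receiptTime": "2014-03-02T00:00:00.000", "sessionDate": "2014-02-28",
   "dID": {"mode": "xyz"}, "usage": {"duration": "661"},
   "rEvents": {"john": "doe"}}
]}}}
''')
p_data = response["Response"]["pData"]
p_data["sData"] = [subset(r) for r in p_data["sData"]]
print(json.dumps(response, indent=2))
```

The rest of the structure (rCode, code, and so on) is untouched because only the sData list is replaced.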
I have a MongoDB document structure like following:
Structure
{
"stores": [
{
"items": [
{
"feedback": [],
"item_category": "101",
"item_id": "10"
},
{
"feedback": [],
"item_category": "101",
"item_id": "11"
}
]
},
{
"items": [
{
"feedback": [],
"item_category": "101",
"item_id": "10"
},
{
"feedback": ["A feedback"],
"item_category": "101",
"item_id": "11"
},
{
"feedback": [],
"item_category": "101",
"item_id": "12"
},
{
"feedback": [],
"item_category": "102",
"item_id": "13"
},
{
"feedback": [],
"item_category": "102",
"item_id": "14"
}
],
"store_id": 500
}
]
}
This is a single document in a collection. Some fields were removed to produce a minimal representation of the data.
What I want is to get items only if the feedback field in the items array is not empty. The expected result is:
Expected result
{
"stores": [
{
"items": [
{
"feedback": ["A feedback"],
"item_category": "101",
"item_id": "11"
}
],
"store_id": 500
}
]
}
This is what I tried, based on the examples in this link, which I think describe pretty much the same situation, but it didn't work. What's wrong with my query? Isn't it the same situation as the zipcode search example in the link? It returns everything, as in the first JSON code (Structure):
What I tried
query = {
'date': {'$gte': since, '$lte': until},
'stores.items': {"$elemMatch": {"feedback": {"$ne": []}}}
}
Thanks.
Please try this:
db.yourCollectionName.aggregate([
{ $match: { 'date': { '$gte': since, '$lte': until }, 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
{ $unwind: '$stores' },
{ $match: { 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
{ $unwind: '$stores.items' },
{ $match: { 'stores.items.feedback': { "$ne": [] } } },
{ $group: { _id: { _id: '$_id', store_id: '$stores.store_id' }, items: { $push: '$stores.items' } } },
{ $project: { _id: '$_id._id', store_id: '$_id.store_id', items: 1 } },
{ $group: { _id: '$_id', stores: { $push: '$$ROOT' } } },
{ $project: { 'stores._id': 0 } }
])
All of these stages are needed because you have to operate on an array of arrays. The query is written on the assumption that you are dealing with a large set of data. Since you are filtering on dates, if the number of documents left after the first $match is small, you can skip the following $match stage, the one between the two $unwind stages.
Refs: $match, $unwind, $project, $group
This aggregate query gets the needed result (using the provided sample document and run from the mongo shell):
db.stores.aggregate( [
{ $unwind: "$stores" },
{ $unwind: "$stores.items" },
{ $addFields: { feedbackExists: { $gt: [ { $size: "$stores.items.feedback" }, 0 ] } } },
{ $match: { feedbackExists: true } },
{ $project: { _id: 0, feedbackExists: 0 } }
] )
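Note that after the two $unwind stages each result is a separate document per item rather than the nested shape shown in the expected result. If the nested shape is needed and filtering on the client side is acceptable, the same rule can be sketched in plain Python. The helper keep_items_with_feedback is hypothetical, and the sample is a shortened copy of the document from the question.

```python
def keep_items_with_feedback(doc):
    """Return a copy of doc keeping only items whose feedback list is
    non-empty, and only stores that still contain at least one such item."""
    stores = []
    for store in doc.get("stores", []):
        items = [it for it in store.get("items", []) if it.get("feedback")]
        if items:
            stores.append({**store, "items": items})
    return {**doc, "stores": stores}


doc = {
    "stores": [
        {"items": [{"feedback": [], "item_category": "101", "item_id": "10"}]},
        {
            "items": [
                {"feedback": [], "item_category": "101", "item_id": "10"},
                {"feedback": ["A feedback"], "item_category": "101", "item_id": "11"},
            ],
            "store_id": 500,
        },
    ]
}
result = keep_items_with_feedback(doc)
print(result)
```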
I am trying to customize json data using object_hook in Python 3, but do not know how to get started. Any pointers are much appreciated. I am trying to introduce a new key and move existing data into the new key in Python Object.
I am trying to convert below json text:
{
"output": [
{
"Id": "101",
"purpose": "xyz text",
"array": [
{
"data": "abcd"
},
{
"data": "ef gh ij"
}
]
},
{
"Id": "102",
"purpose": "11xyz text",
"array": [
{
"data": "abcd"
},
{
"data": "java"
},
{
"data": "ef gh ij"
}
]
}
]
}
to
{
"output": [
{
"Id": "101",
"mydata": {
"purpose": "xyz text",
"array": [
{
"data": "abcd"
},
{
"data": "ef gh ij"
}
]
}
},
{
"Id": "102",
"mydata": {
"purpose": "11xyz text",
"array": [
{
"data": "abcd"
},
{
"data": "java"
},
{
"data": "ef gh ij"
}
]
}
}
]
}
My Python JSON object hook is defined as:
import json

class JSONObject:
    def __init__(self, dict):
        vars(self).update(dict)

    def toJSON(self):
        return json.dumps(self, default=lambda o: o.__dict__,
                          sort_keys=True, indent=4)
You can specify a custom object_pairs_hook (input_json is the string with your input JSON).
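import json

def mydata_hook(obj):
    # obj is a list of (key, value) pairs for one decoded JSON object
    obj_d = dict(obj)
    if 'Id' in obj_d:
        # Keep 'Id' at the top level and move every other key under 'mydata'
        return {'Id': obj_d['Id'], 'mydata': {k: v for k, v in obj_d.items() if k != 'Id'}}
    else:
        return obj_d

print(json.dumps(json.loads(input_json, object_pairs_hook=mydata_hook), indent=2))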
And the output:
{
"output": [
{
"mydata": {
"array": [
{
"data": "abcd"
},
{
"data": "ef gh ij"
}
],
"purpose": "xyz text"
},
"Id": "101"
},
{
"mydata": {
"array": [
{
"data": "abcd"
},
{
"data": "java"
},
{
"data": "ef gh ij"
}
],
"purpose": "11xyz text"
},
"Id": "102"
}
]
}
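Since the question asked about object_hook specifically, here is a minimal sketch of the same idea with object_hook, which receives each decoded JSON object as a dict rather than as a list of pairs. The function name wrap_in_mydata and the shortened input are assumptions for illustration.

```python
import json


def wrap_in_mydata(obj):
    """object_hook: called with each decoded JSON object as a dict."""
    if "Id" not in obj:
        return obj  # nested objects without an Id pass through unchanged
    rest = {k: v for k, v in obj.items() if k != "Id"}
    return {"Id": obj["Id"], "mydata": rest}


# Shortened version of the input from the question.
input_json = (
    '{"output": [{"Id": "101", "purpose": "xyz text", '
    '"array": [{"data": "abcd"}]}]}'
)
converted = json.loads(input_json, object_hook=wrap_in_mydata)
print(json.dumps(converted, indent=2))
```

The hook runs for inner objects first, so the elements of "array" are decoded before the object that carries the Id is wrapped.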
I have this json object, and I am curious how to iterate through servicecatalog:name and alert for any name that does not equal "service-foo" or "service-bar".
Here is my json object:
{
"access": {
"serviceCatalog": [
{
"endpoints": [
{
"internalURL": "https://snet-storage101.example.com//v1.0",
"publicURL": "https://storage101.example.com//v1.0",
"region": "LON",
"tenantId": "1
},
{
"internalURL": "https://snet-storage101.example.com//v1.0",
"publicURL": "https://storage101.example.com//v1.0",
"region": "USA",
"tenantId": "1
}
],
"name": "service-foo",
"type": "object-store"
},
{
"endpoints": [
{
"publicURL": "https://x.example.com:9384/v1.0/x",
"tenantId": "6y5t4re32"
}
],
"name": "service-bar",
"type": "rax:test"
},
{
"endpoints": [
{
"publicURL": "https://y.example.com:9384/v1.0/x",
"tenantId": "765432"
}
],
"name": "service-thesystem",
"type": "rax:test"
}
]
}
If x is the above-mentioned dictionary, you could do:
for item in x["access"]["serviceCatalog"]:
    if item["name"] not in ["service-foo", "service-bar"]:
        print(item["name"])
P.S. You could use json.loads() to decode the JSON data, if that is what you are asking about. Also, your JSON contains errors.
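Putting the two points together, a small sketch that decodes the text, reports malformed JSON clearly, and returns the unexpected names might look like this. The function name unexpected_services, the ALLOWED set, and the shortened, corrected catalog are assumptions for illustration.

```python
import json

ALLOWED = {"service-foo", "service-bar"}


def unexpected_services(text, allowed=ALLOWED):
    """Return catalog names not in `allowed`; raise ValueError on bad JSON."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc
    return [
        entry["name"]
        for entry in data["access"]["serviceCatalog"]
        if entry["name"] not in allowed
    ]


# Shortened, syntactically valid version of the catalog from the question.
catalog = '''
{"access": {"serviceCatalog": [
  {"name": "service-foo", "type": "object-store"},
  {"name": "service-bar", "type": "rax:test"},
  {"name": "service-thesystem", "type": "rax:test"}
]}}
'''
for name in unexpected_services(catalog):
    print("ALERT: unexpected service", name)
```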