I'm trying to merge 2 json files in Python. Here are the files:
test1.json
{
"version": "1.0",
"data": {
"admin1": {
"id": "1",
"location": "NY"
},
"admin2": {
"id": "2",
"name": "Bob",
"location": "LA",
"admin_key": {
"adminvalue1": "admin1",
"adminvalue2": "admin2"
}
},
"admin3": {
"name": "john"
}
}
}
test2.json
{
"data": {
"user1": {
"name": "jane",
"phone": "555-666-7777",
"enail": "jane#jane.com"
},
"user2": {
"location": "LA",
"id": "5"
},
"user3": {
"description": "user",
"location": "NY",
"name": "zoe",
"phone": "111-222-3333",
"user_key": {
"uservalue1": "user1",
"uservalue2": "user2"
}
}
}
}
I have this code to merge the two files
import json
with open("test1.json", "r") as data1_file:
data1 = json.load(data1_file)
with open("test2.json", "r") as data2_file:
data2 = json.load(data2_file)
data1.update(data2)
with open("out.json", "w") as out_file:
json.dump(data1, out_file, indent=4)
The output I'm getting is this. It only has test2.json contents under "data".
{
"version": "1.0",
"data": {
"user1": {
"name": "jane",
"phone": "555-666-7777",
"enail": "jane#jane.com"
},
"user2": {
"location": "LA",
"id": "5"
},
"user3": {
"description": "user",
"location": "NY",
"name": "zoe",
"phone": "111-222-3333",
"user_key": {
"uservalue1": "user1",
"uservalue2": "user2"
}
}
}
}
I want the output to have contents of both files under "data" like below
{
"version": "1.0",
"data": {
"admin1": {
"id": "1",
"location": "NY"
},
"admin2": {
"id": "2",
"name": "Bob",
"location": "LA",
"admin_key": {
"adminvalue1": "admin1",
"adminvalue2": "admin2"
}
},
"admin3": {
"name": "john"
},
"user1": {
"name": "jane",
"phone": "555-666-7777",
"enail": "jane#jane.com"
},
"user2": {
"location": "LA",
"id": "5"
},
"user3": {
"description": "user",
"location": "NY",
"name": "zoe",
"phone": "111-222-3333",
"user_key": {
"uservalue1": "user1",
"uservalue2": "user2"
}
}
}
}
How can I achieve this? Thanks!
You need to merge the "sub-dictionary" data1['data'], not data1 itself. In the current code, you are updating data1 with data2, so that data2['data'] overwrites data1['data'].
So replace data1.update(data2) with:
data1['data'].update(data2['data'])
I think this is what you are looking for:
https://stackoverflow.com/a/7205107/8786297
def merge(a, b, path=None):
"merges b into a"
if path is None: path = []
for key in b:
if key in a:
if isinstance(a[key], dict) and isinstance(b[key], dict):
merge(a[key], b[key], path + [str(key)])
elif a[key] == b[key]:
pass # same leaf value
else:
raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
else:
a[key] = b[key]
return a
I have a dictionary returned by a response in AWS Lambdas which looks like below:
data = {
"id": 1,
"name": "Hey",
"category": null,
"salary": 1222,
"Roles": [
{
"attributes1": {
"key": "sum_assured",
"value": 192000
},
"attributes2": {
"key": "cash_value",
"value": 48576
},
"attributes3": {
"key": "expiry_date",
"value": null
}
}
]
}
Now, I want to format a few numeric columns like salary to 1222.00 and convert other non-string columns to strings for "ID" I should get "1" as a string and not an Int. Here, null is the actual NaN returned by the response.
Below is the code which I tried:
new_dict = {}
for i, j in data.items():
print(i)
if (i == "salary"):
new_dict[i] = "%0.2f" % j
if isinstance(j, str):
new_dict[i] = j
elif j is None:
new_dict[i] = ""
else:
new_dict[i] = str(j)
I know that I could maintain a list for such amount related columns and then I can do some kind of comparisons and replace it. When I test the value for salary inside the loop it gives me decimal places but in the response, it doesn't show up.
Output:
{
"id": "1",
"name": "Hey",
"category": "",
"salary": "1222",
"Roles": "[{'attributes4': {'key': 'sum_assured', 'value': 192000}, 'attributes5': {'key': 'cash_value', 'value': 48576}, 'attributes6': {'key': 'expiry_date', 'value': 'null'}}]"
}
Expected:
{
"id": "1",
"name": "Hey",
"category": "",
"salary": "1222",
"Roles": "[{"attributes4": {"key": "sum_assured", "value": "192000.00"}, "attributes5": {"key": "cash_value", "value": "48576.00"}, "attributes6": {"key": "expiry_date", "value": ""}}]"
}
Also, how could I format the records of the Roles array with some kind of indent in it? I wasn't able to do that.
Update 1:
I tried the below code for accessing the nested attributes as per my JSON but it gives me a key error:
for i, j in formatted_json.items():
print(type(j))
if isinstance(j, list):
for k in j:
for key, value in k.items():
for l, m in value.items():
if (m == "salary" or m == 'value'):
new_dict[key][l] = "%0.2f" % m
elif isinstance(m, str):
new_dict[key][l] = m
elif m is None:
new_dict[key][l] = ""
else:
new_dict[key][l] = str(m)
elif (i == "salary" or i == 'value'):
new_dict[i] = "%0.2f" % j
elif isinstance(j, str):
new_dict[i] = j
elif j is None:
new_dict[i] = ""
else:
new_dict[i] = str(j)
Traceback (most recent call last):
File "/home/.../Downloads/pyspark/src/sample/json_to_string.py", line 392, in <module>
new_dict[key][l] = m
KeyError: 'attributes4'
Update 2:
The actual dataset is as below:
json_data = '''{
"policy_no": "",
"paid_to_date": "2016-09-28",
"proposal_no": null,
"agent_code": "1234",
"policy_status": "Death",
"policy_term": "15",
"policy_premium_payment_term": "7",
"policy_effective_date": "2013-09-28",
"last_premium_paid": 13790.85,
"last_premium_paid_date": "2016-06-17",
"pivc_flag_status": null,
"product_code": "",
"product_name": "Monthly Income Plan",
"application_date": "2013-09-16",
"underwriting_decesion_date": "2013-09-26",
"issuance_date": "2013-09-30",
"ols_policy": "N",
"next_premium_due_date": "2016-09-28",
"policy_year": null,
"premium_holiday_from_date": null,
"premium_holiday_to_date": null,
"product_type": "Non-Linked",
"proposer_client_code": "",
"life_insured_client_code": "",
"ppt_completion_date": "2020-09-28",
"ppt_left": 4,
"auto_termination_date": null,
"fund_value": null,
"loan_amt": 0,
"welcome_kit_current_status": null,
"dispatch_date": "2013-10-25",
"status_date": "2013-10-25",
"updated_on": "2013-10-26 19:51:51.983",
"surrender_value": null,
"loan_eligiability": null,
"courier_name": null,
"awb_no": "",
"status_id": 5,
"claim_registration_date": "2021-12-15",
"claim_approval_reject_date": "2021-12-15",
"claim_requirement_document": "HEART ATTACK",
"claim_requirement_status": "P",
"requirement_raised_on": "2021-12-15",
"requirement_closed_on": "2021-12-15",
"claim_status": "P",
"status_for_death_claims": "P",
"date_of_transaction": "2021-12-15",
"ecs_si_registration_date": null,
"ecs_si_status": "inactive",
"ecs_si_failure_reasons": null,
"ecs_si_status_as_per_rls": null,
"ecs_day": "0",
"payment_mode": "S",
"payment_mode_desc": "Semi Anually",
"payment_method": "S",
"payment_method_desc": "Cash/Cheque",
"payment_date": "2021-12-15",
"payment_amount": 13790.85,
"premium_base_amt": null,
"premium_tax_amt": null,
"due_premium": null,
"total_premium": 13790.85,
"advance_premium": null,
"assignee": [
{
"assignee_name": "",
"attributes1": {
"key": "assignee_client_id",
"value": ""
},
"attributes2": {
"key": "assignee_type",
"value": ""
}
}
],
"agents": [
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"agent_name": "",
"attributes2": {
"key": "agent_code",
"value": "1234"
},
"attributes3": {
"key": "email",
"value": ""
},
"attributes4": {
"key": "channel",
"value": "AGENCY"
},
"attributes5": {
"key": "subchannel",
"value": null
},
"attributes6": {
"key": "branch_name",
"value": ""
},
"attributes7": {
"key": "contact_no",
"value": ""
},
"attributes8": {
"key": "status",
"value": "T"
}
}
],
"bank_accounts": {
"attributes1": {
"key": "accountnumber",
"value": ""
},
"attributes2": {
"key": "accountholdername",
"value": ""
},
"attributes3": {
"key": "ifsccode",
"value": ""
},
"attributes4": {
"key": "micrcode",
"value": ""
},
"bankname": "",
"attributes5": {
"key": "branchname",
"value": ""
},
"attributes6": {
"key": "bank_address",
"value": ""
}
},
"Funds": [
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"attributes2": {
"key": "allocation_mode",
"value": "percentage"
},
"attributes3": {
"key": "allocation_value",
"value": null
},
"attributes4": {
"key": "fund_code",
"value": null
},
"fund_name": null,
"investment_strategy_name": null
}
],
"Investment_Strategies": [
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"attributes2": {
"key": "allocation_mode",
"value": "percentage"
},
"attributes3": {
"key": "allocation_value",
"value": null
},
"attributes4": {
"key": "fund_code",
"value": null
},
"fund_name": null,
"investment_strategy_name": null
}
],
"Roles": [
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"attributes2": {
"key": "beneficiary_percentage",
"value": "100.00"
},
"attributes3": {
"key": "relationship",
"value": "SPOUSE"
},
"role_name": "Nominee",
"attributes4": {
"key": "age",
"value": "NaN"
},
"attributes5": {
"key": "party_name",
"value": ""
},
"attributes6": {
"key": "gender",
"value": "F"
},
"attributes7": {
"key": "date_of_birth",
"value": "1953-03-20"
},
"attributes8": {
"key": "email",
"value": null
},
"attributes9": {
"key": "registred_mobile",
"value": null
}
},
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"attributes2": {
"key": "beneficiary_percentage",
"value": null
},
"attributes3": {
"key": "relationship",
"value": null
},
"role_name": "Insured",
"attributes4": {
"key": "age",
"value": 63
},
"attributes5": {
"key": "party_name",
"value": ""
},
"attributes6": {
"key": "gender",
"value": "M"
},
"attributes7": {
"key": "date_of_birth",
"value": "1950-06-20"
},
"attributes8": {
"key": "email",
"value": null
},
"attributes9": {
"key": "registred_mobile",
"value": null
}
},
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"attributes2": {
"key": "beneficiary_percentage",
"value": null
},
"attributes3": {
"key": "relationship",
"value": null
},
"role_name": "Owner",
"attributes4": {
"key": "age",
"value": 63
},
"attributes5": {
"key": "party_name",
"value": ""
},
"attributes6": {
"key": "gender",
"value": "M"
},
"attributes7": {
"key": "date_of_birth",
"value": "1950-06-20"
},
"attributes8": {
"key": "email",
"value": ""
},
"attributes9": {
"key": "registred_mobile",
"value": ""
}
}
],
"Benefits": [
{
"attributes1": {
"key": "policy_no",
"value": ""
},
"benefit_name": "Miii",
"attributes2": {
"key": "benefit_term",
"value": "7"
},
"attributes3": {
"key": "modal_premium",
"value": null
},
"attributes4": {
"key": "sum_assured",
"value": 192000
},
"attributes5": {
"key": "cash_value",
"value": 48576
},
"attributes6": {
"key": "expiry_date",
"value": null
},
"attributes7": {
"key": "status",
"value": "Death"
},
"attributes8": {
"key": "benefit_code",
"value": ""
},
"attributes9": {
"key": "benefit_type",
"value": "base"
}
}
]
}
'''
When you have a strong model in your data, and you want to process/validate the data you receive, its better to be done in a object oriented fashion. If you can consider your data value as an object with all logic written inside for validating or correcting, it becomes simple.
See an example approach below:
(Assumed your null as None)
import json
# Your data object..
data = '''{
"id": 1,
"name": "Hey",
"category": null,
"salary": 1222,
"Roles": [
{
"attributes1": {
"key": "sum_assured",
"value": 192000
},
"attributes2": {
"key": "cash_value",
"value": 48576
},
"attributes3": {
"key": "expiry_date",
"value": null
}
}
]
}'''
# Create simple object models to represent the data.
class Role:
def __init__(self, key, value):
self.key = key
self.value = value or "" # fallback to empty string
class Policy:
def __init__(self, id, name, category, salary, Roles):
# write all your pre/post processing here
self.id = id
self.name = name
self.category = category or ""
self.salary = format(salary, '0.2f') if salary else ""
self.Roles = [Role(**role) for role in Roles[0].values()] if Roles else []
# a json serialization helper function
def serial_default(obj):
if isinstance(obj, (Policy, Role)):
return obj.__dict__
else:
return obj
# You just create the object with your data. Now all data is in the object.
data = json.loads(data)
p = Policy(**data)
# Serialize the object to json in the proper format
print(json.dumps(p, default=serial_default))
The print generates a valid version of your object as JSON like
{"id": 1, "name": "Hey", "category": "", "salary": "1222.00", "Roles": [{"key": "sum_assured", "value": 192000}, {"key": "cash_value", "value": 48576}, {"key": "expiry_date", "value": ""}]}
This is just an example to the approach. You can modify or improvise a similar approach and make things simpler.
Firstly i have to agree with Kris answer, this kind of conversion looks now a bit ugly and with every added special case Object-Oriented-style makes more sense.
A few points:
null, None, ""
It seems that null won't be seen as None, so your not getting into the if clause.
Check JSON to Dict for that
How can JSON data with null value be converted to a dictionary
Salary
According to your question you would expect
"salary": "1222.00"
correct?
If so you are currently overwriting your previously formatted number with
if isinstance(j, str):
new_dict[i] = j
Just change it to elif so you wont overwrite your conversion.
Roles
The objects within the array of Roles are currently not iterated over. Hence the number are not being formatted. For that you would need another loop on the Roles array.
Following a recursion attempt to cover multiple nested collections.
import json
from collections import abc
dataRaw = '''
{
"id": 1,
"name": "Hey",
"category": null,
"salary": 1222,
"Roles": [
{
"attributes1": {
"key": "sum_assured",
"value": 192000
},
"attributes2": {
"key": "cash_value",
"value": 48576
},
"attributes3": {
"key": "expiry_date",
"value": null
}
}
]
}
'''
data = json.loads(dataRaw)
def processNested(nested):
# Check for list
if (isinstance(nested, list)):
# for each element in list processNested
for element in nested:
element = processNested(element)
else:
for key, value in nested.items():
# If nested Collection
if isinstance(value, abc.Mapping):
# Allow special "treatment" for {"key": "...", "value": ... } like objects
if "key" in value and "value" in value:
if ((value["key"] == "cash_value" or value["key"] == "sum_assured") and value["value"] != "" and
value["value"] != None):
nested[key]["value"] = str("%0.2f" % int(value["value"]))
else:
nested[key] = processNested(value)
else:
if (key == "value" and value != None and value != ""):
nested[key] = str(nested["value"])
return nested
new_dict = data
for i, j in new_dict.items():
if (i == "salary"):
new_dict[i] = "%0.2f" % j
elif isinstance(j, str):
new_dict[i] = j
elif j is None:
new_dict[i] = ""
elif i == "Roles":
new_dict[i] = processNested(j)
else:
new_dict[i] = str(j)
print(new_dict)
I have a dictionary which is contain multiple keys and values and the values also contain the key, value pair. I am not getting how to create dynamic json using this dictionary in python. Here's the dictionary:
image_dict = {"IMAGE_1":{"img0":"IMAGE_2","img1":"IMAGE_3","img2":"IMAGE_4"},"IMAGE_2":{"img0":"IMAGE_1", "img1" : "IMAGE_3"},"IMAGE_3":{"img0":"IMAGE_1", "img1":"IMAGE_2"},"IMAGE_4":{"img0":"IMAGE_1"}}
My expected result like this :
{
"data": [
{
"image": {
"imageId": {
"id": "IMAGE_1"
},
"link": {
"target": {
"id": "IMAGE_2"
},
"target": {
"id": "IMAGE_3"
},
"target": {
"id": "IMAGE_4"
}
}
},
"updateData": "link"
},
{
"image": {
"imageId": {
"id": "IMAGE_2"
},
"link": {
"target": {
"id": "IMAGE_1"
},
"target": {
"id": "IMAGE_3"
}
}
},
"updateData": "link"
},
{
"image": {
"imageId": {
"id": "IMAGE_3"
},
"link": {
"target": {
"id": "IMAGE_1"
},
"target": {
"id": "IMAGE_2"
}
}
},
"updateData": "link"
} ,
{
"image": {
"imageId": {
"id": "IMAGE_4"
},
"link": {
"target": {
"id": "IMAGE_1"
}
}
},
"updateData": "link"
}
]
}
I tried to solve it but I didn't get expected result.
result = {"data":[]}
for k,v in sorted(image_dict.items()):
for a in sorted(v.values()):
result["data"].append({"image":{"imageId":{"id": k},
"link":{"target":{"id": a}}},"updateData": "link"})
print(json.dumps(result, indent=4))
In Python dictionaries you can't have 2 values with the same key. So you can't have multiple targets all called "target". So you can index them. Also I don't know what this question has to do with dynamic objects but here's the code I got working:
import re
dict_res = {}
ind = 0
for image in image_dict:
lin_ind = 0
sub_dict = {'image' + str(ind): {'imageId': {image}, 'link': {}}}
for sub in image_dict[image].values():
sub_dict['image' + str(ind)]['link'].update({'target' + str(lin_ind): {'id': sub}})
lin_ind += 1
dict_res.update(sub_dict)
ind += 1
dict_res = re.sub('target\d', 'target', re.sub('image\d', 'image', str(dict_res)))
print dict_res
I've been trying to figure out how I can iterate over a json like object, so I could get a user id by its name.
json
{
"ApiSearchResult": [
{
"totalNumberResults": 55,
"type": "User",
"searchResults": [
{
"firstName": "shashank",
"name": "0o_shashank._o0",
"uid": 81097836
},
{
"firstName": "Shahnawaz",
"name": "0shahnawaz.0",
"uid": 83697589
},
{
"firstName": "Ashu",
"name": "ashu.-3",
"uid": 83646061
},
{
"bgImage": "photoalbum_491396460_user82597906-1-jpeg.jpg",
"firstName": "Garfield",
"name": "beast.boy",
"uid": 82597906
},
{
"firstName": "Bharath",
"name": "bharath_mohan69",
"uid": 80197615
},
{
"bgImage": "photoalbum_481041410_user79819261-1-jpg.jpg",
"firstName": "Wille-ICE",
"name": "blowhole",
"uid": 79819261
}
]
}
]
}
Python
def getidbyname(name):
event = response['ApiSearchResult'][0]['searchResults'][0]
for key, value in event.iteritems():
if value == name: continue
elif key == "uid":
return value
But, this won't work, I've never really worked with this many nested elements.
def getidbyname(name):
for i in data['ApiSearchResult'][0]['searchResults']:
if i['name'] == name:
return i['uid']
This might work if your response is already a python dictionary:
def getidbyname(name):
for event in data["ApiSearchResult"][0]["searchResults"]:
if event["name"] == name:
return event["uid"]
If your input is a text value, you need to use json.loads(response) to get a python dictionary out of it.
I have the following json
{
"response": {
"message": null,
"exception": null,
"context": [
{
"headers": null,
"name": "aname",
"children": [
{
"type": "cluster-connectivity",
"name": "cluster-connectivity"
},
{
"type": "consistency-groups",
"name": "consistency-groups"
},
{
"type": "devices",
"name": "devices"
},
{
"type": "exports",
"name": "exports"
},
{
"type": "storage-elements",
"name": "storage-elements"
},
{
"type": "system-volumes",
"name": "system-volumes"
},
{
"type": "uninterruptible-power-supplies",
"name": "uninterruptible-power-supplies"
},
{
"type": "virtual-volumes",
"name": "virtual-volumes"
}
],
"parent": "/clusters",
"attributes": [
{
"value": "true",
"name": "allow-auto-join"
},
{
"value": "0",
"name": "auto-expel-count"
},
{
"value": "0",
"name": "auto-expel-period"
},
{
"value": "0",
"name": "auto-join-delay"
},
{
"value": "1",
"name": "cluster-id"
},
{
"value": "true",
"name": "connected"
},
{
"value": "synchronous",
"name": "default-cache-mode"
},
{
"value": "true",
"name": "default-caw-template"
},
{
"value": "blah",
"name": "default-director"
},
{
"value": [
"blah",
"blah"
],
"name": "director-names"
},
{
"value": [
],
"name": "health-indications"
},
{
"value": "ok",
"name": "health-state"
},
{
"value": "1",
"name": "island-id"
},
{
"value": "blah",
"name": "name"
},
{
"value": "ok",
"name": "operational-status"
},
{
"value": [
],
"name": "transition-indications"
},
{
"value": [
],
"name": "transition-progress"
}
],
"type": "cluster"
}
],
"custom-data": null
}
}
which im trying to parse using the json module in python. I am only intrested in getting the following information out of it.
Name Value
operational-status Value
health-state Value
Here is what i have tried.
in the below script data is the json returned from a webpage
json = json.loads(data)
healthstate= json['response']['context']['operational-status']
operationalstatus = json['response']['context']['health-status']
Unfortunately i think i must be missing something as the above results in an error that indexes must be integers not string.
if I try
healthstate= json['response'][0]
it errors saying index 0 is out of range.
Any help would be gratefully received.
json['response']['context'] is a list, so that object requires you to use integer indices.
Each item in that list is itself a dictionary again. In this case there is only one such item.
To get all "name": "health-state" dictionaries out of that structure you'd need to do a little more processing:
[attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
would give you a list of of matching values for health-state in the first context.
Demo:
>>> [attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
[u'ok']
You have to follow the data structure. It's best to interactively manipulate the data and check what every item is. If it's a list you'll have to index it positionally or iterate through it and check the values. If it's a dict you'll have to index it by it's keys. For example here is a function that get's the context and then iterates through it's attributes checking for a particular name.
def get_attribute(data, attribute):
for attrib in data['response']['context'][0]['attributes']:
if attrib['name'] == attribute:
return attrib['value']
return 'Not Found'
>>> data = json.loads(s)
>>> get_attribute(data, 'operational-status')
u'ok'
>>> get_attribute(data, 'health-state')
u'ok'
json['reponse']['context'] is a list, not a dict. The structure is not exactly what you think it is.
For example, the only "operational status" I see in there can be read with the following:
json['response']['context'][0]['attributes'][0]['operational-status']