Python - Replace value in JSON file from second file if keys match

I have two JSON files that look like this:
{"type": "FeatureCollection", "features": [{ "type": "Feature", "properties": { "id": "Carlow", "density": "0" } , "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ -6.58901, 52.906464 ], [ -6.570265, 52.905682 ], [ -6.556207, 52.906464 ],
Second JSON file:
{"features": [{"count": 2, "name": "Sligo"}, {"count": 3, "name": "Fermanagh"}, {"count": 1, "name": "Laois"},
I am trying to check whether "id" in the first file matches "name" in the second file and, if so, change the value of "density" to the value of "count" from the second file. I am looking at using recursion, based on a similar question I found here (Replace value in JSON file for key which can be nested by n levels), but it only checks whether one key matches before changing the value. I need two keys to match before changing values.
I use Counter to count the number of times a string appears and save the result to county_names.json, which is my second JSON file. ire_countiesTmp.json is my first file, whose values I am trying to replace with those from the second file.
I'm not sure how to do this with Python as I have only just started learning it. Any help would be great, or if you know a better way. Thanks
This is the code I have so far, but I am not sure how to handle two keys and two values:
import json, pprint
from collections import Counter
with open('../county_names.json') as data_file:
    county_list = json.load(data_file)
for i in county_list:
    c = Counter(i for i in county_list)
for county,count in c.iteritems():
    with open('ire_countiesTmp.json') as f:
        def fixup(adict, k1, v1, k2, v2):
            for key in adict.keys():
                if adict[key] == v1:
                    adict[key] = v
                elif type(adict[key]) is dict:
                    fixup(adict[key], k, v)
        #pprint.pprint( data )
        fixup(data, 'id', county, 'density', count)
        pprint.pprint( data )

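Generally speaking, recursion is often not the best tool in Python. CPython performs no tail-call optimisation, every function call carries noticeable overhead, and the recursion depth is capped (1000 frames by default), so deep recursion is slow and can fail outright. See: Why is recursion in python so slow?. Here the structure of both files is fixed, so plain loops are enough.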
A possible brute-force-solution that assumes you have converted your JSON-data into a dict could look like this:
def fixup_dict_a_with_b(a, b):
    for feature_a in a["features"]:
        for feature_b in b["features"]:
            if feature_a["properties"]["id"] == feature_b["name"]:
                feature_a["properties"]["density"] = feature_b["count"]
                break
This can of course be "abstractified" to your liking. ;)
Other, more elegant solutions exist, but this one is straightforward and easy to follow when you have just started using Python. (Eventually, you might want to look into pandas, for example.)
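One such alternative, as a minimal sketch: build a name-to-count lookup from the second file once, then update each feature with a single dictionary access instead of an inner loop. The function name apply_counts and the small inline sample data are made up for illustration; the key names ("features", "properties", "id", "density", "name", "count") come from the question.

```python
import json


def apply_counts(geo, counts):
    """Copy each county's count into the matching feature's density."""
    # Build a name -> count lookup once, so each feature needs one dict access.
    count_by_name = {entry["name"]: entry["count"] for entry in counts["features"]}
    for feature in geo["features"]:
        props = feature["properties"]
        if props["id"] in count_by_name:
            props["density"] = count_by_name[props["id"]]
    return geo


# Tiny stand-ins for the two files in the question (geometry omitted).
geo = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"id": "Carlow", "density": "0"}},
    {"type": "Feature", "properties": {"id": "Sligo", "density": "0"}},
]}
counts = {"features": [{"count": 2, "name": "Sligo"}, {"count": 1, "name": "Laois"}]}

updated = apply_counts(geo, counts)
print(json.dumps(updated, indent=2))
```

With real files you would presumably json.load both inputs first and json.dump the result back out. Features without a matching name keep their original density.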

Related

Is there an efficient way to write data to a JSON file using a dictionary in Python?

I'm writing a program in Python to use an API that needs to get input from a JSON payload in a really specific way which is shown below. The poid element will contain a different number with each run of the program, the inventories element contains a list of dictionaries that I am trying to send to the API.
[
    {
        "poid":"22130",
        "inventories":
        [
            {
                "item": "SAMPLE-ITEM-1",
                "mfgr": "SAMPLE-MANUFACTURER-1",
                "quantity": "1",
                "condition": "REF"
            },
            {
                "item": "SAMPLE-ITEM-2",
                "mfgr": "SAMPLE-MANUFACTURER-2",
                "quantity": "3",
                "condition": "REF"
            }
        ]
    }
]
The data I need to put into the file is stored in a dictionary and a list as shown below. For simplicity of this post, I'm showing what the dictionary and list would look like after another method creates them. I'm not sure if this is the most efficient way of storing this data when I'm having to write it to JSON.
pn_and_mfgr_dict = {'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1', 'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'}
quantities = ["1","3"]
poid = 22130 #this will be different each run
If it makes sense from what I've written above, I need to generate a JSON file that looks like the first codeblock given the information from the second codeblock. The item at index 0 in the quantities list corresponds to the first key/value pair in the dictionary and so on. The "condition" value in the first codeblock will always have "REF" as its value for my use, but I need to also include that in the final payload that gets sent to the API.
Since the part number and manufacturer dictionary will be a different length with each run, I also need this method to work regardless of how many values are in the dictionary. This dictionary and the quantities list will always be the same length though.
I think the best way I can solve this is making a for loop that iterates through the dictionary and puts respective data where it needs to be, then reading the file when the for loop is done and sending it as the payload, but please correct me if there's a better way to do this, like storing everything in variables.
I also have no experience with JSON, so I have attempted to use JSON libraries to accomplish this with no idea what I'm doing wrong. I can edit this with my attempts tonight but I wanted to post this as soon as possible.

Here is one possible solution:
import json
pn_and_mfgr_dict = {
    'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1',
    'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'
}
quantities = ['1', '3']
poid = 22130
payload = {
    'poid': poid,
    'inventories': [{
        'item': item,
        'mfgr': mfgr,
        'quantity': quantity,
        'condition': 'REF'
    } for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)]
}
print(json.dumps(payload, indent=2))
The code above will result in:
{
  "poid": 22130,
  "inventories": [
    {
      "item": "SAMPLE-ITEM-1",
      "mfgr": "SAMPLE-MANUFACTURER-1",
      "quantity": "1",
      "condition": "REF"
    },
    {
      "item": "SAMPLE-ITEM-2",
      "mfgr": "SAMPLE-MANUFACTURER-2",
      "quantity": "3",
      "condition": "REF"
    }
  ]
}
Naturally, you can adjust that for multiple poids with something like this:
poids = [22130, 22131, 22132]
for poid in poids:
    # implement here the logic to get items and quantities for
    # each poid
    payload = {
        'poid': poid,
        'inventories': [{
            'item': item,
            'mfgr': mfgr,
            'quantity': quantity,
            'condition': 'REF'
        } for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)]
    }
    print(json.dumps(payload, indent=2))
You will need to change it to use the corresponding items and quantities for each poid; I leave that as a starting point for you to implement.

Your second block is your input, so you could start immediately by writing a function that takes those inputs and returns a JSON string.
import json
from typing import Dict, List
def jsonify_data(pn_and_mfgr_dict: Dict, quantities: List, poid: int):
    constructed_data = [] # TODO
    return json.dumps(constructed_data)
Then you could start using the inputs to construct the output data you want. And you already know how to do it:
"I think the best way I can solve this is making a for loop that iterates through the dictionary and puts respective data where it needs to be"
Yes, that's the way to do it.
Here's my version of the solution:
import json
from typing import Dict, List
def jsonify_data(pn_and_mfgr_dict: Dict, quantities: List, poid: int):
    inventories = [
        {
            'item': item,
            'mfgr': mfgr,
            'quantity': quantity,
            'condition': 'REF',
        } for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)
    ]
    constructed_data = [
        {
            'poid': f'{poid}',
            'inventories': inventories,
        }
    ]
    return json.dumps(constructed_data)
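One thing worth guarding against in the zip-based approach: zip stops silently at the shorter input, so a length mismatch would drop items without any error. The sketch below adds an explicit check and round-trips the result through json. The function name build_payload is hypothetical, and pairing by position assumes Python 3.7+, where dictionaries keep insertion order.

```python
import json


def build_payload(pn_and_mfgr_dict, quantities, poid):
    """Build the API payload as plain Python data (hypothetical helper)."""
    # zip() would silently truncate to the shorter input, so fail loudly instead.
    if len(pn_and_mfgr_dict) != len(quantities):
        raise ValueError("dictionary and quantities must have the same length")
    inventories = [
        {"item": item, "mfgr": mfgr, "quantity": quantity, "condition": "REF"}
        for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)
    ]
    return [{"poid": str(poid), "inventories": inventories}]


payload = build_payload(
    {"SAMPLE-ITEM-1": "SAMPLE-MANUFACTURER-1",
     "SAMPLE-ITEM-2": "SAMPLE-MANUFACTURER-2"},
    ["1", "3"],
    22130,
)
text = json.dumps(payload, indent=2)
print(text)
```

Keeping the payload as Python data until the last moment means there is no need to write a file and read it back. The string can be sent directly, or written out with json.dump if a file is really required.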

import json
data = {'inventories': [{'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1'}, {'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'}]}
quantities = ["1", "3"]
poid = 22130
# Add poid to data
data['poid'] = poid
# Add quantities to data
for item in data['inventories']:
    item['quantity'] = quantities.pop(0)
# Serializing json
json_object = json.dumps(data, indent=4)
print(json_object)

Get list value by comparing values

I have a list like this:
data.append(
    {
        "type": type,
        "description": description,
        "amount": 1,
    }
)
Every time there is a new object I want to check if there already is an entry in the list with the same description. If there is, I need to add 1 to the amount.
How can I do this most efficiently? Is the only way to go through all the entries?

I suggest making data a dict and using the description as a key.
If you are concerned about the efficiency of using the string as a key, read this: efficiency of long (str) keys in python dictionary.
Example:
data = {}
while loop(): # your code here
    existing = data.get(description)
    if existing is None:
        data[description] = {
            "type": type,
            "description": description,
            "amount": 1,
        }
    else:
        existing["amount"] += 1
In either case you should first benchmark the two solutions (the other one being the iterative approach) before reaching any conclusions about efficiency.
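A self-contained sketch of the dict-keyed approach, with made-up sample entries. The helper name add_entry and the sample values are assumptions for illustration. If a list is still needed at the end, list(data.values()) recovers one.

```python
def add_entry(data, type_, description):
    """Insert a new entry, or bump the amount of an existing one."""
    existing = data.get(description)
    if existing is None:
        data[description] = {"type": type_, "description": description, "amount": 1}
    else:
        existing["amount"] += 1


data = {}
# Hypothetical incoming objects; "apple" appears twice.
for type_, description in [("fruit", "apple"), ("fruit", "pear"), ("fruit", "apple")]:
    add_entry(data, type_, description)

entries = list(data.values())  # back to a list of dicts, if that is what is needed
print(entries)
```

Each lookup is a single hash-table access on average, so no pass over the existing entries is needed.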

For each loop with JSON object python

Alright, so I'm struggling a little bit with trying to parse my JSON object.
My aim is to grab a certain JSON key and return its value.
JSON File
{
    "files": {
        "resources": [
            {
                "name": "filename",
                "hash": "0x001"
            },
            {
                "name": "filename2",
                "hash": "0x002"
            }
        ]
    }
}
I've developed a function which allows me to parse the JSON code above
Function
def parsePatcher():
    url = '{0}/{1}'.format(downloadServer, patcherName)
    patch = urllib2.urlopen(url)
    data = json.loads(patch.read())
    patch.close()
    return data
Okay so now I would like to do a foreach statement which prints out each name and hash inside the "resources": [] object.
Foreach statement
for name, hash in patcher["files"]["resources"]:
    print name
    print hash
But it only prints out "name" and "hash" not "filename" and "0x001"
Am I doing something incorrect here?

By using name, hash as the for loop target, you are unpacking the dictionary:
>>> d = {"name": "filename", "hash": "0x001"}
>>> name, hash = d
>>> name
'name'
>>> hash
'hash'
This happens because iteration over a dictionary only produces the keys:
>>> list(d)
['name', 'hash']
and unpacking uses iteration to produce the values to be assigned to the target names.
That this worked at all is partly down to chance: on Python 3.3 to 3.5, with hash randomisation enabled by default, the order of those two keys could equally be reversed. (Since Python 3.7, dictionaries are guaranteed to preserve insertion order.)
Just use one name to assign the dictionary to, and use subscription on that dictionary:
for resource in patcher["files"]["resources"]:
    print resource['name']
    print resource['hash']
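For readers on Python 3, the same idea might look like the sketch below. It parses the JSON from an inline string rather than downloading it, so the raw variable stands in for the response body, and file_hash is used instead of hash to avoid shadowing the built-in.

```python
import json

# Inline copy of the JSON from the question; in the real program this text
# would come from the HTTP response instead.
raw = """
{"files": {"resources": [
    {"name": "filename", "hash": "0x001"},
    {"name": "filename2", "hash": "0x002"}
]}}
"""
patcher = json.loads(raw)

# Each element of "resources" is a dict, so index it by key.
pairs = [(resource["name"], resource["hash"])
         for resource in patcher["files"]["resources"]]

for name, file_hash in pairs:
    print(name, file_hash)
```

Unpacking works here because each element of pairs is a two-item tuple of values, not a dictionary.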

So what you intend to do is:
for dic in x["files"]["resources"]:
    print dic['name'],dic['hash']
You need to iterate over the dictionaries in the resources array.

The problem seems to be that you have a list of dictionaries. First get each element of the list, and then ask that element (which is a dictionary) for the values of the keys name and hash.
EDIT: this is tested and works
mydict = {"files": { "resources": [{ "name": "filename", "hash": "0x001"},{ "name": "filename2", "hash": "0x002"}]} }
for element in mydict["files"]["resources"]:
    for d in element:
        print d, element[d]

If you have multiple files with multiple resources inside them, this generalized solution works:
for keys in patcher:
    for indices in patcher[keys].keys():
        print(patcher[keys][indices])
Output checked on my side:
>>> for keys in patcher:
...     for indices in patcher[keys].keys():
...         print(patcher[keys][indices])
...
[{'hash': '0x001', 'name': 'filename'}, {'hash': '0x002', 'name': 'filename2'}]

Python - Searching JSON

I have JSON output as follows:
{
    "service": [{
        "name": ["Production"],
        "id": ["256212"]
    }, {
        "name": ["Non-Production"],
        "id": ["256213"]
    }]
}
I wish to find all IDs where the pair contains "Non-Production" as a name.
I was thinking along the lines of running a loop to check, something like this:
data = json.load(urllib2.urlopen(URL))
for key, value in data.iteritems():
    if "Non-Production" in key[value]: print key[value]
However, I can't seem to get the name and ID from the "service" tree, it returns:
if "Non-Production" in key[value]: print key[value]
TypeError: string indices must be integers
Assumptions:
The JSON is in a fixed format; this can't be changed
I do not have root access, and am unable to install any additional packages
Essentially the goal is to obtain a list of IDs of non-production "services" in the most efficient way.

Here you go:
data = {
    "service": [
        {"name": ["Production"],
         "id": ["256212"]
         },
        {"name": ["Non-Production"],
         "id": ["256213"]}
    ]
}
for item in data["service"]:
    if "Non-Production" in item["name"]:
        print(item["id"])
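Since the stated goal is a list of IDs, not printed output, the same loop can collect them. This is a small sketch using only the standard library. The function name ids_for_name is made up, and .get with defaults is used on the assumption that some entries might lack a key.

```python
def ids_for_name(data, wanted):
    """Return every id from the services whose name list contains `wanted`."""
    ids = []
    for item in data.get("service", []):
        if wanted in item.get("name", []):
            # "id" is itself a list in this format, so flatten it into the result.
            ids.extend(item.get("id", []))
    return ids


data = {
    "service": [
        {"name": ["Production"], "id": ["256212"]},
        {"name": ["Non-Production"], "id": ["256213"]},
    ]
}

non_prod_ids = ids_for_name(data, "Non-Production")
print(non_prod_ids)  # ['256213']
```

The membership test runs against the name list, so "Production" does not accidentally match "Non-Production" as a substring.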

Whenever I see JSON I think about functional programming! Anyone else?!
I think it is a better idea to use functions like concat or flat, filter and reduce, etc.
E.g., a one-liner:
[s.get('id', [0])[0] for s in filter(lambda srv: "Non-Production" in srv.get('name', []), data.get('service', []))]
EDIT:
I updated the code so that even if data = {}, the result will be [], an empty ID list.

Split dictionary by keys in Python

I would like to make this code clearer, especially the variables. I am a newbie in Python.
GOAL:
I would like to split the pairs of the data dictionary by their keys. The output is a list of lists of Ward objects. I think my solution is too complicated; is there a better one?
class Ward:
    def __init__(self, code, data):
        self.code = code
        self.data = data
def prepare_data_for_templates(cs, h, f):
    pairs = {'201': ['<tr><td>Dunajská Streda</td><td>201</td></tr>\n', '<tr><td>Dunajský Klátov</td><td>201</td></tr>\n'], '205': ['<tr><td>Košolná</td><td>205</td></tr>\n',]}
    print "Pairs: " + str(sorted(pairs.keys())) + "\n"
    #output data - ba, tt...
    OUT = []
    BA = []
    TT = []
    for k, v in sorted(pairs.iteritems()):
        #print k + "\n", v
        if int(k) < 199:
            BA.append( Ward(k, v) )
        elif int(k) < 299:
            TT.append( Ward(k, v) )
    OUT.append(BA)
    OUT.append(TT)
    for j in OUT:
        for i in j:
            print i.code
    return OUT
EDIT: Thanks for the answer, I updated my code using JSON.
tab01.json:
{
    "data": [
        {
            "id": "101", "c01": "mun1"
        },
        {
            "id": "101", "c01": "mun2"
        },
        {
            "id": "205", "c01": "mun3"
        },
        {
            "id": "205", "c01": "mun4"
        },
        {
            "id": "205", "c01": "mun5"
        }
    ]
}
code.py:
import os, json
def prepare_data_for_templates(file):
    pairs = {}
    codes = []
    with open(file, "r") as input:
        json_obj = json.load(input)
    for d in json_obj["data"]:
        codes.append((str(d["id"]), d))
    for c in codes:
        pairs.setdefault(str(c[0]), []).append(c[1])
    for k, v in pairs.iteritems():
        with open( str(k) + '.json', 'w') as outfile:
            json.dump(v, outfile)
prepare_data_for_templates("tab01.json")

"Clean up this (working) code" is generally not a good SO question because it's very vague.
I've downvoted, but, in this particular case, you have a few things that can be done right off the bat.
Use New Style Classes, or Tuples
Your Ward class appears to be unnecessary.
Unless there is other functionality there that you are not showing, you should just create tuples.
Instead of Ward(k, v) just use the tuple (k, v).
If you do need the class, at least write it as a new-style class, class Ward(object):
In Python 2, the syntax that you have used, class Ward:, creates an old-style class, which is supported only for historical reasons. (In Python 3, every class is new-style.)
Keep Data External from Code
Right now, you have a giant, messy, hard to work with variable,
pairs = {'201': ['<tr><td>Dunajská Streda</td><td>201</td></tr>\n', '<tr><td>Dunajský Klátov</td><td>201</td></tr>\n'], '205': ['<tr><td>Košolná</td><td>205</td></tr>\n', '<tr><td>Leopoldov</td><td>205</td></tr>\n', '<tr><td>Trnava</td><td>205</td></tr>\n'], '705': ['<tr><td>Pušovce</td><td>705</td></tr>\n', '<tr><td>Radatice</td><td>705</td></tr>\n', '<tr><td>Rokycany</td><td>705</td></tr>\n'], '304': ['<tr><td>Rudnianska Lehota</td><td>304</td></tr>\n', '<tr><td>Sebedražie</td><td>304</td></tr>\n', '<tr><td>Seč</td><td>304</td></tr>\n', '<tr><td>Šútovce</td><td>304</td></tr>\n'], '305': ['<tr><td>Selec</td><td>305</td></tr>\n'], '103': ['<tr><td>Modra</td><td>103</td></tr>\n', '<tr><td>Pezinok</td><td>103</td></tr>\n'], '101': ['<tr><td>Bratislava - Nové Mesto</td><td>101</td></tr>\n', '<tr><td>Bratislava - Podunajské Biskupice</td><td>101</td></tr>\n'], '806': ['<tr><td>Plechotice</td><td>806</td></tr>\n', '<tr><td>Trebišov</td><td>806</td></tr>\n']}
This is pretty much impossible to sustain if you want to add data, or the data changes.
This looks like partially parsed HTML of some kind, so that might be a better form in which you store your data, and let your python code parse the HTML every time it runs.
If you want to keep processed data, and not the original HTML source, I'd recommend putting this into a JSON file; something like this:
{
    "201": {
        "name": "Dunajsky",
        "municipalities": [
            "Streda",
            "Klatov"
        ]
    },
    "205": {
        "name": "Kosoln",
        "municipalities": [
            "Leopoldov",
            "Trnava"
        ]
    }
}
Your data is pretty dirty, so this is just my best guess at the structure that you are trying to represent.
This will make your life much easier moving forward.
You can then parse this data using the Python json library (see the json.load call in the example below).
Don't Make a List of Lists
As far as I can tell, you are trying to sort data.
There is no need for a list of lists for this purpose -- it's unnecessarily complicated, and, as a result, confusing.
Consider something more like this:
import json
def load_sorted_wards(path='wards.json'):  # function name is illustrative
    with open(path, 'r') as f:
        json_obj = json.load(f)
    # assume the structure above is used for the JSON
    # don't do any validation (because that would require more work with something
    # like a JSON schema, and I'm too lazy for that)
    # convert the object to a list of tuples, and convert codes from strings to ints
    code_list = []
    for (code, data) in json_obj.items():
        code_list.append((int(code), data))
    # sorting tuples does a dictionary-order sorting, so this will sort on keys,
    # then on the data components of the tuples (which presumably don't have
    # meaningful ordering)
    return sorted(code_list)
A slightly cleaner version of the conversion into code_list would use a comprehension:
code_list = [(int(code), data) for (code, data) in json_obj.items()]
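If the original goal of splitting the wards into groups (the BA and TT lists in the question) still matters, one possible sketch is to bucket the sorted tuples by the leading digit of the code. That the hundreds digit identifies the region is an assumption inferred from the question's thresholds of 199 and 299. The function name group_by_region and the sample data are made up.

```python
from collections import defaultdict


def group_by_region(code_list):
    """Group (code, data) tuples by code // 100, keeping codes sorted in each group."""
    groups = defaultdict(list)
    # Sort on the integer code only, so the data part never needs to be comparable.
    for code, data in sorted(code_list, key=lambda pair: pair[0]):
        groups[code // 100].append((code, data))
    return dict(groups)


# Hypothetical sample in the shape produced by the comprehension above.
code_list = [
    (205, {"name": "Kosoln"}),
    (101, {"name": "Bratislava"}),
    (201, {"name": "Dunajsky"}),
]

groups = group_by_region(code_list)
for region, wards in sorted(groups.items()):
    print(region, [code for code, _ in wards])
```

This yields one dictionary keyed by region number instead of a positional list of lists, so adding a new region needs no new variable.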
