Split dictionary by keys in Python


I would like to clean up this code, especially the variables. I am a newbie in Python.
GOAL:
I would like to split the data dictionary's pairs by the keys of this dictionary. The output is a list of lists of Ward objects. I think my solution is too complicated; is there a better solution?
# -*- coding: utf-8 -*-
class Ward:
    def __init__(self, code, data):
        self.code = code
        self.data = data

def prepare_data_for_templates(cs, h, f):
    pairs = {'201': ['<tr><td>Dunajská Streda</td><td>201</td></tr>\n', '<tr><td>Dunajský Klátov</td><td>201</td></tr>\n'], '205': ['<tr><td>Košolná</td><td>205</td></tr>\n',]}
    print("Pairs: " + str(sorted(pairs.keys())) + "\n")
    # output data - ba, tt...
    OUT = []
    BA = []
    TT = []
    for k, v in sorted(pairs.items()):
        # print(k + "\n", v)
        if int(k) < 199:
            BA.append(Ward(k, v))
        elif int(k) < 299:
            TT.append(Ward(k, v))
    OUT.append(BA)
    OUT.append(TT)
    for j in OUT:
        for i in j:
            print(i.code)
    return OUT
EDIT: Thanks for the answer, I updated my code using JSON.
tab01.json:
{
"data": [
{
"id": "101", "c01": "mun1"
},
{
"id": "101", "c01": "mun2"
},
{
"id": "205", "c01": "mun3"
},
{
"id": "205", "c01": "mun4"
},
{
"id": "205", "c01": "mun5"
}
]
}
code.py:
import json

def prepare_data_for_templates(path):
    pairs = {}
    codes = []
    with open(path, "r") as infile:
        json_obj = json.load(infile)
    for d in json_obj["data"]:
        codes.append((str(d["id"]), d))
    for code, record in codes:
        # group records by id: create the list on first sight, then append
        pairs.setdefault(code, []).append(record)
    for k, v in pairs.items():
        # write each group to its own file, e.g. 101.json and 205.json
        with open(k + '.json', 'w') as outfile:
            json.dump(v, outfile)

prepare_data_for_templates("tab01.json")
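For comparison, here is a minimal sketch of the same grouping step using collections.defaultdict instead of setdefault. This is a swapped-in standard-library technique, not the asker's code. The records are inlined from tab01.json so the sketch runs on its own. Writing each group to its own file with json.dump would work the same way as in the code above.

```python
import json
from collections import defaultdict

# The records from tab01.json above, inlined so the sketch is self-contained.
records = json.loads("""
{"data": [{"id": "101", "c01": "mun1"}, {"id": "101", "c01": "mun2"},
          {"id": "205", "c01": "mun3"}, {"id": "205", "c01": "mun4"},
          {"id": "205", "c01": "mun5"}]}
""")["data"]

# defaultdict(list) creates an empty list the first time a key is seen,
# which does the job of pairs.setdefault(key, []) in a single pass.
groups = defaultdict(list)
for record in records:
    groups[str(record["id"])].append(record)

for code, group in sorted(groups.items()):
    print(code, [r["c01"] for r in group])
```

This prints one line per id, with the municipalities in their original file order.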

"Clean up this (working) code" is generally not a good SO question because it's very vague.
I've downvoted, but, in this particular case, you have a few things that can be done right off the bat.
Use New Style Classes, or Tuples
Your Ward class appears to be unnecessary.
Unless there is other functionality there that you are not showing, you should just create tuples.
Instead of Ward(k, v) just use the tuple (k, v).
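If you do need the class, at least write it as a new-style class: class Ward(object):
In Python 2, the syntax that you have used, class Ward:, creates an old-style class, which is deprecated and supported only for historical reasons. (In Python 3 every class is new-style, so class Ward: is fine there.)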
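As a rough illustration of the tuple suggestion, here is the question's loop with plain (code, data) tuples in place of Ward objects. The names split_by_code and sample are hypothetical. The 199/299 thresholds are copied unchanged from the question.

```python
def split_by_code(pairs):
    """Split {code: rows} into [low, high] lists of (code, rows) tuples.

    The 199/299 thresholds come from the question; codes of 299 and above
    are skipped, exactly as in the original loop.
    """
    ba, tt = [], []
    for code, rows in sorted(pairs.items()):
        if int(code) < 199:
            ba.append((code, rows))
        elif int(code) < 299:
            tt.append((code, rows))
    return [ba, tt]


# Hypothetical sample input.
sample = {"201": ["row a"], "101": ["row b"], "705": ["row c"]}
ba, tt = split_by_code(sample)
for code, rows in ba + tt:
    print(code, rows)
```

Tuples unpack directly in a for loop, so the consumer can write `for code, rows in ...` instead of reading `i.code` and `i.data`.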
Keep Data External from Code
Right now, you have a giant, messy, hard to work with variable,
pairs = {'201': ['<tr><td>Dunajská Streda</td><td>201</td></tr>\n', '<tr><td>Dunajský Klátov</td><td>201</td></tr>\n'], '205': ['<tr><td>Košolná</td><td>205</td></tr>\n', '<tr><td>Leopoldov</td><td>205</td></tr>\n', '<tr><td>Trnava</td><td>205</td></tr>\n'], '705': ['<tr><td>Pušovce</td><td>705</td></tr>\n', '<tr><td>Radatice</td><td>705</td></tr>\n', '<tr><td>Rokycany</td><td>705</td></tr>\n'], '304': ['<tr><td>Rudnianska Lehota</td><td>304</td></tr>\n', '<tr><td>Sebedražie</td><td>304</td></tr>\n', '<tr><td>Seč</td><td>304</td></tr>\n', '<tr><td>Šútovce</td><td>304</td></tr>\n'], '305': ['<tr><td>Selec</td><td>305</td></tr>\n'], '103': ['<tr><td>Modra</td><td>103</td></tr>\n', '<tr><td>Pezinok</td><td>103</td></tr>\n'], '101': ['<tr><td>Bratislava - Nové Mesto</td><td>101</td></tr>\n', '<tr><td>Bratislava - Podunajské Biskupice</td><td>101</td></tr>\n'], '806': ['<tr><td>Plechotice</td><td>806</td></tr>\n', '<tr><td>Trebišov</td><td>806</td></tr>\n']}
This is pretty much impossible to sustain if you want to add data, or the data changes.
This looks like partially parsed HTML of some kind, so the original HTML might be a better form in which to store your data, letting your Python code parse it every time it runs.
If you want to keep processed data, and not the original HTML source, I'd recommend putting this into a JSON file; something like this:
{
  "201": {
    "name": "Dunajsky",
    "municipalities": [
      "Streda",
      "Klatov"
    ]
  },
  "205": {
    "name": "Kosoln",
    "municipalities": [
      "Leopoldov",
      "Trnava"
    ]
  }
}
Your data is pretty dirty, so this is just my best guess at the structure that you are trying to represent.
This will make your life much easier moving forward.
You can then parse this data using Python's json module.
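A minimal sketch of that parsing step, assuming the structure shown above. The JSON is inlined as a string so the example runs on its own. With a real file you would call json.load on an open file object instead of json.loads on a string.

```python
import json

# The structure suggested above, inlined so the sketch is self-contained.
WARDS_JSON = """
{
  "201": {"name": "Dunajsky", "municipalities": ["Streda", "Klatov"]},
  "205": {"name": "Kosoln", "municipalities": ["Leopoldov", "Trnava"]}
}
"""

wards = json.loads(WARDS_JSON)  # JSON objects become dicts, arrays become lists
for code, ward in sorted(wards.items()):
    print(code, ward["name"], ", ".join(ward["municipalities"]))
```

This prints "201 Dunajsky Streda, Klatov" followed by "205 Kosoln Leopoldov, Trnava".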
Don't Make a List of Lists
As far as I can tell, you are trying to sort data.
There is no need for a list of lists for this purpose -- it's unnecessarily complicated, and, as a result, confusing.
Consider something more like this:
import json

def load_sorted_wards(path='wards.json'):  # illustrative function name
    with open(path, 'r') as f:
        json_obj = json.load(f)
    # assume the structure above is used for the JSON
    # don't do any validation (because that would require more work with something
    # like a JSON schema, and I'm too lazy for that)
    # convert the object to a list of tuples, and convert codes from strings to ints
    code_list = []
    for (code, data) in json_obj.items():
        code_list.append((int(code), data))
    # sort on the integer codes only; the data parts are dicts, which have no
    # meaningful ordering (and cannot be compared at all in Python 3)
    return sorted(code_list, key=lambda pair: pair[0])
A slightly cleaner version of the conversion into code_list would use a comprehension:
code_list = [(int(code), data) for (code, data) in json_obj.items()]
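The int() conversion above matters because codes kept as strings sort character by character. Here is a quick illustration with made-up codes.

```python
codes = ["705", "1000", "201", "99"]

# As strings, "1000" sorts before "201" because "1" < "2".
as_strings = sorted(codes)
# Converting to int (here via the key function) gives numeric order instead.
as_numbers = sorted(codes, key=int)

print(as_strings)   # ['1000', '201', '705', '99']
print(as_numbers)   # ['99', '201', '705', '1000']
```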

Related

Is there an efficient way to write data to a JSON file using a dictionary in Python?

I'm writing a program in Python to use an API that needs to get input from a JSON payload in a really specific way which is shown below. The poid element will contain a different number with each run of the program, the inventories element contains a list of dictionaries that I am trying to send to the API.
[
  {
    "poid": "22130",
    "inventories": [
      {
        "item": "SAMPLE-ITEM-1",
        "mfgr": "SAMPLE-MANUFACTURER-1",
        "quantity": "1",
        "condition": "REF"
      },
      {
        "item": "SAMPLE-ITEM-2",
        "mfgr": "SAMPLE-MANUFACTURER-2",
        "quantity": "3",
        "condition": "REF"
      }
    ]
  }
]
The data I need to put into the file is stored in a dictionary and a list as shown below. For simplicity of this post, I'm showing what the dictionary and list would look like after another method creates them. I'm not sure if this is the most efficient way of storing this data when I'm having to write it to JSON.
pn_and_mfgr_dict = {'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1', 'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'}
quantities = ["1","3"]
poid = 22130 #this will be different each run
If what I've written above makes sense: I need to generate a JSON file that looks like the first code block, given the information in the second code block. The item at index 0 in the quantities list corresponds to the first key/value pair in the dictionary, and so on. The "condition" value in the first code block will always be "REF" for my use, but I still need to include it in the final payload that gets sent to the API. The part number and manufacturer dictionary will have a different length on each run, so this method must work regardless of how many entries the dictionary has. The dictionary and the quantities list will always be the same length, though. I think the best way I can solve this is making a for loop that iterates through the dictionary and puts respective data where it needs to be, then reading the file once the loop is done and sending it as the payload. Please correct me if there's a better way to do this, such as storing everything in variables. I also have no experience with JSON. I have tried to use JSON libraries to accomplish this, but I have no idea what I'm doing wrong. I can edit this post with my attempts tonight, but I wanted to post it as soon as possible.

Here is one possible solution:
import json

pn_and_mfgr_dict = {
    'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1',
    'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'
}
quantities = ['1', '3']
poid = 22130

payload = {
    'poid': poid,
    'inventories': [{
        'item': item,
        'mfgr': mfgr,
        'quantity': quantity,
        'condition': 'REF'
    } for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)]
}

print(json.dumps(payload, indent=2))
The code above will result in:
{
  "poid": 22130,
  "inventories": [
    {
      "item": "SAMPLE-ITEM-1",
      "mfgr": "SAMPLE-MANUFACTURER-1",
      "quantity": "1",
      "condition": "REF"
    },
    {
      "item": "SAMPLE-ITEM-2",
      "mfgr": "SAMPLE-MANUFACTURER-2",
      "quantity": "3",
      "condition": "REF"
    }
  ]
}
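The question planned to write the payload to a file and read it back before sending it. Here is a minimal sketch of that round trip with json.dump and json.load. The helper names write_payload and read_payload, and the sample data, are hypothetical. If the only goal is to send the payload, the file may be unnecessary: many HTTP clients accept the dict directly, for example the json= argument of requests.post in the requests library.

```python
import json
import os
import tempfile


def write_payload(payload, path):
    """Serialize payload to path as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)


def read_payload(path):
    """Load the JSON file at path back into Python lists and dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


sample = {
    "poid": 22130,
    "inventories": [
        {"item": "SAMPLE-ITEM-1", "mfgr": "SAMPLE-MANUFACTURER-1",
         "quantity": "1", "condition": "REF"},
    ],
}

# Round-trip through a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payload.json")
    write_payload(sample, path)
    loaded = read_payload(path)

print(loaded == sample)  # True
```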
Naturally, you can adjust that for multiple poids with something like this:
poids = [22130, 22131, 22132]

for poid in poids:
    # implement here the logic to get items and quantities for
    # each poid
    payload = {
        'poid': poid,
        'inventories': [{
            'item': item,
            'mfgr': mfgr,
            'quantity': quantity,
            'condition': 'REF'
        } for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)]
    }
    print(json.dumps(payload, indent=2))
You will need to change it to use the corresponding items and quantities for each poid; I leave that as a starting point for you to implement.
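One possible way to supply that per-poid data is sketched below, assuming a hypothetical orders mapping from each poid to its (part-number → manufacturer dict, quantities list) pair. How you build that mapping depends on your data source. The poid is converted to a string and the results are collected in a list, to match the format shown in the question.

```python
import json

# Hypothetical input: for each poid, the part-number -> manufacturer dict and
# the matching quantities list.
orders = {
    22130: ({"SAMPLE-ITEM-1": "SAMPLE-MANUFACTURER-1",
             "SAMPLE-ITEM-2": "SAMPLE-MANUFACTURER-2"}, ["1", "3"]),
    22131: ({"SAMPLE-ITEM-3": "SAMPLE-MANUFACTURER-3"}, ["2"]),
}

payloads = []
for poid, (pn_and_mfgr_dict, quantities) in orders.items():
    payloads.append({
        "poid": str(poid),  # the question's format quotes the poid
        "inventories": [
            {"item": item, "mfgr": mfgr, "quantity": quantity, "condition": "REF"}
            # pairing by position relies on dicts keeping insertion order (3.7+)
            for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)
        ],
    })

print(json.dumps(payloads, indent=2))
```

Note that zip() silently stops at the shorter input, so a missing quantity would drop an item rather than raise an error.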

Your second block is your input, so you could start by writing a function that takes those inputs and returns a JSON string.
import json
from typing import Dict, List

def jsonify_data(pn_and_mfgr_dict: Dict, quantities: List, poid: int):
    constructed_data = []  # TODO
    return json.dumps(constructed_data)
Then you could start working on using the inputs to construct the output data you desired. And you already know how to do it.
I think the best way I can solve this is making a for loop that iterates through the dictionary and puts respective data where it needs to be
Yes, that's the way to do it.
Here's my version of solution:
import json
from typing import Dict, List

def jsonify_data(pn_and_mfgr_dict: Dict, quantities: List, poid: int):
    inventories = [
        {
            'item': item,
            'mfgr': mfgr,
            'quantity': quantity,
            'condition': 'REF',
        } for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)
    ]
    constructed_data = [
        {
            'poid': f'{poid}',
            'inventories': inventories,
        }
    ]
    return json.dumps(constructed_data)

import json

data = {'inventories': [{'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1'}, {'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'}]}
quantities = ["1", "3"]
poid = 22130
# Add poid to data
data['poid'] = poid
# Add quantities to data, pairing each inventory entry with the quantity at the
# same position (zip leaves the quantities list intact, unlike pop(0))
for item, quantity in zip(data['inventories'], quantities):
    item['quantity'] = quantity
# Serializing json
json_object = json.dumps(data, indent=4)
print(json_object)

get value by key from json data with python [duplicate]

I have some JSON data like:
{
"status": "200",
"msg": "",
"data": {
"time": "1515580011",
"video_info": [
{
"announcement": "{\"announcement_id\":\"6\",\"name\":\"INS\\u8d26\\u53f7\",\"icon\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-08-18_19:44:54\\\/ins.png\",\"icon_new\":\"http:\\\/\\\/liveme.cms.ksmobile.net\\\/live\\\/announcement\\\/2017-10-20_22:24:38\\\/4.png\",\"videoid\":\"15154610218328614178\",\"content\":\"FOLLOW ME PLEASE\",\"x_coordinate\":\"0.22\",\"y_coordinate\":\"0.23\"}",
"announcement_shop": "",
etc.
How do I grab the content "FOLLOW ME PLEASE"? I tried using
replay_data = raw_replay_data['data']['video_info'][0]
announcement = replay_data['announcement']
But now announcement is a string representing more JSON data, so I can't continue indexing: announcement['content'] results in TypeError: string indices must be integers.
How can I get the desired string in the "right" way, i.e. respecting the actual structure of the data?

In a single line -
>>> json.loads(data['data']['video_info'][0]['announcement'])['content']
'FOLLOW ME PLEASE'
To help you understand how to access data (so you don't have to ask again), you'll need to stare at your data.
First, let's lay out your data nicely. You can either use json.dumps(data, indent=4), or you can use an online tool like JSONLint.com.
{
'data': {
'time': '1515580011',
'video_info': [{
'announcement': ( # ***
"""{
"announcement_id": "6",
"name": "INS\\u8d26\\u53f7",
"icon": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-08-18_19:44:54\\\\/ins.png",
"icon_new": "http:\\\\/\\\\/liveme.cms.ksmobile.net\\\\/live\\\\/announcement\\\\/2017-10-20_22:24:38\\\\/4.png",
"videoid": "15154610218328614178",
"content": "FOLLOW ME PLEASE",
"x_coordinate": "0.22",
"y_coordinate": "0.23"
}"""),
'announcement_shop': ''
}]
},
'msg': '',
'status': '200'
}
*** Note that the data in the announcement key is actually more json data, which I've laid out on separate lines.
First, find out where your data resides. You're looking for the data in the content key, which is accessed by the announcement key, which is part of a dictionary inside a list of dicts, which can be accessed by the video_info key, which is in turn accessed by data.
So, in summary, "descend" the ladder that is "data" using the following "rungs" -
data, a dictionary
video_info, a list of dicts
announcement, a key in the first dict of that list, whose value is a string of JSON data
content, a key inside that embedded JSON data.
First,
i = data['data']
Next,
j = i['video_info']
Next,
k = j[0] # since this is a list
If you only want the first element, this suffices. Otherwise, you'd need to iterate:
for k in j:
...
Next,
l = k['announcement']
Now, l is a string containing JSON data. Load it -
import json
m = json.loads(l)
Lastly,
content = m['content']
print(content)
FOLLOW ME PLEASE
This should hopefully serve as a guide should you have future queries of this nature.
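Putting the rungs together, here is a minimal sketch that walks every entry in video_info rather than only the first. raw_replay_data is a trimmed, hypothetical stand-in for the response in the question. The second entry, with an empty announcement, is an assumption added to show why a guard helps: json.loads("") raises a JSONDecodeError.

```python
import json

# Trimmed stand-in for the response in the question; the nested
# "announcement" value is itself a JSON document stored as a string.
raw_replay_data = {
    "status": "200",
    "msg": "",
    "data": {
        "time": "1515580011",
        "video_info": [
            {"announcement": json.dumps({"announcement_id": "6",
                                         "content": "FOLLOW ME PLEASE"}),
             "announcement_shop": ""},
            {"announcement": "", "announcement_shop": ""},  # hypothetical entry
        ],
    },
}

contents = []
for video in raw_replay_data["data"]["video_info"]:
    announcement = video.get("announcement")
    if not announcement:  # skip entries whose announcement is empty or missing
        continue
    contents.append(json.loads(announcement)["content"])

print(contents)  # ['FOLLOW ME PLEASE']
```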

You have nested JSON data; the string associated with the 'announcement' key is itself another, separate, embedded JSON document.
You'll have to decode that string first:
import json
replay_data = raw_replay_data['data']['video_info'][0]
announcement = json.loads(replay_data['announcement'])
print(announcement['content'])
then handle the resulting dictionary from there.

The content of "announcement" is another JSON string. Decode it and then access its contents as you were doing with the outer objects.

Line-length based custom python JSON encoding for serializables

My problem is similar to Can I implement custom indentation for pretty-printing in Python’s JSON module? and How to change json encoding behaviour for serializable python object? but instead I'd like to collapse lines together if the entire JSON encoded structure can fit on that single line, with configurable line length, in Python 2.X and 3.X. The output is intended for easy-to-read documentation of the JSON structures, rather than debugging. Clarifying: the result MUST be valid JSON, and allow for the regular JSON encoding features of OrderedDicts/sort_keys, default handlers, and so forth.
The solution from custom indentation does not apply, as the individual structures would need to know their serialized lengths in advance, thus adding a NoIndent class doesn't help as every structure might or might not be indented. The solution from changing the behavior of json serializable does not apply as there aren't any (weird) custom overrides on the data structures, they're just regular lists and dicts.
For example, instead of:
{
  "#context": "http://linked.art/ns/context/1/full.jsonld",
  "id": "http://lod.example.org/museum/ManMadeObject/0",
  "type": "ManMadeObject",
  "classified_as": [
    "aat:300033618",
    "aat:300133025"
  ]
}
I would like to produce:
{
  "#context": "http://linked.art/ns/context/1/full.jsonld",
  "id": "http://lod.example.org/museum/ManMadeObject/0",
  "type": "ManMadeObject",
  "classified_as": ["aat:300033618", "aat:300133025"]
}
This should apply at any level of nesting within the structure, and across any number of levels of nesting, until the line length is reached. Thus if there were a list with a single object inside it, containing a single key/value pair, it would become:
{
  "#context": "http://linked.art/ns/context/1/full.jsonld",
  "id": "http://lod.example.org/museum/ManMadeObject/0",
  "type": "ManMadeObject",
  "classified_as": [{"id": "aat:300033618"}]
}
It seems like a recursive descent parser on the indented output would work, along the lines of #robm's approach to custom indentation, but the complexity seems to quickly approach that of writing a JSON parser and serializer.
Otherwise it seems like a very custom JSONEncoder is needed.
Your thoughts appreciated!

Very inefficient, but seems to work so far:
import re

def _collapse_json(text, collapse):
    lines = text.splitlines()
    out = [lines[0]]
    while lines:
        l = lines.pop(0)
        indent = len(re.split(r'\S', l, 1)[0])
        if indent and l.rstrip()[-1] in ['[', '{']:
            # gather this block's lines up to its closing bracket, which sits
            # at the same indentation as the line that opened it
            curr = indent
            temp = []
            stemp = []
            while lines and curr <= indent:
                if temp and curr == indent:
                    break
                temp.append(l[curr:])
                stemp.append(l.strip())
                l = lines.pop(0)
                indent = len(re.split(r'\S', l, 1)[0])
            temp.append(l[curr:])
            stemp.append(l.lstrip())
            short = " " * curr + ''.join(stemp)
            if len(short) < collapse:
                out.append(short)
            else:
                # too long for one line: recurse into the block's children
                ntext = '\n'.join(temp)
                nout = _collapse_json(ntext, collapse)
                for no in nout:
                    out.append(" " * curr + no)
        elif indent:
            out.append(l)
    out.append(l)
    return out

def collapse_json(text, collapse):
    return '\n'.join(_collapse_json(text, collapse))
Happy to accept something else that produces the same output without crawling up and down constantly!
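One alternative that avoids re-parsing the indented text is to build the output recursively. For each list or dict, first try the compact json.dumps form, and expand it onto multiple lines only when it does not fit. This is a swapped-in technique, and dumps_collapsed is a hypothetical helper. It is a sketch under simplifying assumptions: it does not forward sort_keys or default handlers to json.dumps, and it does not count a trailing comma toward the width.

```python
import json


def dumps_collapsed(obj, width=80, indent=2, level=0, prefix_len=0):
    """Serialize obj as JSON, keeping any list/dict on one line if it fits.

    level is the current nesting depth; prefix_len is the length of any text
    (such as '"key": ') already written on the current line.
    """
    pad = " " * (indent * level)
    one_line = json.dumps(obj, separators=(", ", ": "))
    fits = len(pad) + prefix_len + len(one_line) <= width
    if fits or not isinstance(obj, (list, dict)) or not obj:
        return one_line
    inner = " " * (indent * (level + 1))
    if isinstance(obj, dict):
        parts = []
        for key, value in obj.items():
            key_text = json.dumps(str(key)) + ": "
            parts.append(inner + key_text + dumps_collapsed(
                value, width, indent, level + 1, len(key_text)))
        return "{\n" + ",\n".join(parts) + "\n" + pad + "}"
    parts = [inner + dumps_collapsed(v, width, indent, level + 1) for v in obj]
    return "[\n" + ",\n".join(parts) + "\n" + pad + "]"


doc = {
    "#context": "http://linked.art/ns/context/1/full.jsonld",
    "id": "http://lod.example.org/museum/ManMadeObject/0",
    "type": "ManMadeObject",
    "classified_as": ["aat:300033618", "aat:300133025"],
}
print(dumps_collapsed(doc, width=80))
```

With width=80 this reproduces the "I would like to produce" layout from the question. With a narrow width such as 20, the classified_as list is expanded one item per line again.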

Python - Replace value in JSON file from second file if keys match

I have two JSON files that look like this
{"type": "FeatureCollection", "features": [{ "type": "Feature", "properties": { "id": "Carlow", "density": "0" } , "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ -6.58901, 52.906464 ], [ -6.570265, 52.905682 ], [ -6.556207, 52.906464 ],
Second JSON file
{"features": [{"count": 2, "name": "Sligo"}, {"count": 3, "name": "Fermanagh"},{"count": 1, "name": "Laois"},
I am trying to check whether "id" in the first file matches "name" in the second file, and if so, change the value of "density" to the value of "count" from the second file. I am looking at using recursion, based on a similar question I found here, Replace value in JSON file for key which can be nested by n levels, but that solution only checks whether one key matches and then changes its value. I need two keys to match before changing values. This is the code I have used so far, but I'm not sure how to add two keys and two values. I use Counter to count the number of times a string appears and save the result to county_names.json, which is my second JSON file. ire_countiesTmp.json is my first file, whose values I am trying to replace with those from the second file. I'm not sure how to do this in Python, as I only started learning it. Any help would be great, or if you know a better way. Thanks
import json, pprint
from collections import Counter
with open('../county_names.json') as data_file:
county_list = json.load(data_file)
for i in county_list:
c = Counter(i for i in county_list)
for county,count in c.iteritems():
with open('ire_countiesTmp.json') as f:
def fixup(adict, k1, v1, k2, v2):
for key in adict.keys():
if adict[key] == v1:
adict[key] = v
elif type(adict[key]) is dict:
fixup(adict[key], k, v)
#pprint.pprint( data )
fixup(data, 'id', county, 'density', count)
pprint.pprint( data )

Generally speaking, deep recursion is not a good idea in Python. CPython does no tail-call optimisation, so every recursive call adds a stack frame. Function calls are comparatively slow, and the default recursion limit is 1000 frames: Why is recursion in python so slow? .
A possible brute-force solution, assuming you have already loaded both JSON files into dicts (e.g. with json.load), could look like this:
def fixup_dict_a_with_b(a, b):
    for feature_a in a["features"]:
        for feature_b in b["features"]:
            if feature_a["properties"]["id"] == feature_b["name"]:
                feature_a["properties"]["density"] = feature_b["count"]
                break
This can of course be "abstractified" to your liking. ;)
Other, more elegant solutions exist, but this one is straightforward and easy to get when you just started to use Python. (Eventually, you might want to look into pandas, for example.)
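As one small step in that "more elegant" direction, the inner loop can be replaced by a dictionary built once from the second file. Each feature then needs a single lookup instead of a scan over every feature in b. This is a sketch. fixup_with_lookup and the sample data are hypothetical, and the sample is trimmed to the fields involved, without geometry. The counts are copied as ints; wrap them in str() if density must stay a string like the original "0".

```python
def fixup_with_lookup(a, b):
    """Copy each matching count from b into the density property in a (in place)."""
    # Build a name -> count mapping once; if a name repeats, the last count wins.
    counts = {feature["name"]: feature["count"] for feature in b["features"]}
    for feature in a["features"]:
        props = feature["properties"]
        if props["id"] in counts:
            props["density"] = counts[props["id"]]


counties = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"id": "Carlow", "density": "0"}},
    {"type": "Feature", "properties": {"id": "Sligo", "density": "0"}},
]}
county_counts = {"features": [{"count": 2, "name": "Sligo"},
                              {"count": 3, "name": "Fermanagh"}]}

fixup_with_lookup(counties, county_counts)
print(counties["features"][1]["properties"])  # {'id': 'Sligo', 'density': 2}
```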
