Line-length based custom python JSON encoding for serializables - python

My problem is similar to "Can I implement custom indentation for pretty-printing in Python's JSON module?" and "How to change json encoding behaviour for serializable python object?", but I'd like to collapse lines together if the entire JSON-encoded structure fits on a single line, with a configurable line length, in Python 2.X and 3.X. The output is intended for easy-to-read documentation of the JSON structures rather than debugging. To clarify: the result MUST be valid JSON and must allow the regular JSON encoding features such as OrderedDicts/sort_keys, default handlers, and so forth.
The solution from the custom indentation question does not apply, because the individual structures would need to know their serialized lengths in advance; adding a NoIndent class doesn't help, as every structure might or might not be indented. The solution from changing the behavior of JSON serializables does not apply either, as there aren't any (weird) custom overrides on the data structures; they're just regular lists and dicts.
For example, instead of:
{
  "@context": "http://linked.art/ns/context/1/full.jsonld",
  "id": "http://lod.example.org/museum/ManMadeObject/0",
  "type": "ManMadeObject",
  "classified_as": [
    "aat:300033618",
    "aat:300133025"
  ]
}
I would like to produce:
{
  "@context": "http://linked.art/ns/context/1/full.jsonld",
  "id": "http://lod.example.org/museum/ManMadeObject/0",
  "type": "ManMadeObject",
  "classified_as": ["aat:300033618", "aat:300133025"]
}
This should apply at any level of nesting within the structure, and across any number of levels of nesting, as long as the result stays within the line length. Thus, if there were a list containing a single object with a single key/value pair, it would become:
{
  "@context": "http://linked.art/ns/context/1/full.jsonld",
  "id": "http://lod.example.org/museum/ManMadeObject/0",
  "type": "ManMadeObject",
  "classified_as": [{"id": "aat:300033618"}]
}
It seems like a recursive descent parser on the indented output would work, along the lines of @robm's approach to custom indentation, but the complexity seems to quickly approach that of writing a JSON parser and serializer.
Otherwise it seems like a very custom JSONEncoder is needed.
Your thoughts appreciated!

Very inefficient, but seems to work so far:
import re

def _collapse_json(text, collapse):
    js_indent = 2
    lines = text.splitlines()
    out = [lines[0]]
    while lines:
        l = lines.pop(0)
        # width of the leading whitespace on this line
        indent = len(re.split(r'\S', l, 1)[0])
        if indent and l.rstrip()[-1] in ['[', '{']:
            curr = indent
            temp = []
            stemp = []
            # gather the whole nested block that opens on this line
            while lines and curr <= indent:
                if temp and curr == indent:
                    break
                temp.append(l[curr:])
                stemp.append(l.strip())
                l = lines.pop(0)
                indent = len(re.split(r'\S', l, 1)[0])
            temp.append(l[curr:])
            stemp.append(l.lstrip())
            short = " " * curr + ''.join(stemp)
            if len(short) < collapse:
                out.append(short)
            else:
                # too long for one line: recurse into the block
                ntext = '\n'.join(temp)
                nout = _collapse_json(ntext, collapse)
                for no in nout:
                    out.append(" " * curr + no)
            l = lines.pop(0)
        elif indent:
            out.append(l)
    out.append(l)
    return out

def collapse_json(text, collapse):
    return '\n'.join(_collapse_json(text, collapse))
Happy to accept something else that produces the same output without crawling up and down constantly!
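One possible alternative that avoids re-scanning the indented text is to work on the data itself: serialize each container compactly first, and expand it over several lines only when the compact form does not fit. The following is a minimal sketch of that idea, not a drop-in replacement. The name dumps_collapsed and its parameters are hypothetical, it assumes dict keys are strings, it does not pass through sort_keys or default handlers, and it ignores the trailing comma when measuring line length.

```python
import json


def dumps_collapsed(obj, width=80, indent=2, level=0, used=0):
    """Return JSON text for obj, keeping a container on one line if it fits.

    used is the number of characters already present on the current line
    (indentation plus any '"key": ' prefix). All names are illustrative.
    """
    compact = json.dumps(obj)
    is_container = isinstance(obj, (dict, list)) and len(obj) > 0
    # Scalars and empty containers are always emitted compactly.
    if not is_container or used + len(compact) <= width:
        return compact

    pad = " " * (indent * (level + 1))
    parts = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            prefix = pad + json.dumps(key) + ": "
            parts.append(
                prefix + dumps_collapsed(value, width, indent, level + 1, len(prefix))
            )
        opener, closer = "{", "}"
    else:
        for value in obj:
            parts.append(
                pad + dumps_collapsed(value, width, indent, level + 1, len(pad))
            )
        opener, closer = "[", "]"
    return opener + "\n" + ",\n".join(parts) + "\n" + " " * (indent * level) + closer
```

Because every decision is made on the compact length of a sub-structure, each value is measured once per nesting level instead of being re-parsed from text.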

Related

Python3 - Parse list of strings inside nested json

Python noob here. I saw many similar questions, but none of them matches my exact use case. I have a simple nested JSON, and I'm trying to access the element name present inside metadata. Below is my sample JSON.
{
  "items": [{
      "metadata": {
        "name": "myname1"
      }
    },
    {
      "metadata": {
        "name": "myname1"
      }
    }
  ]
}
Below is the code that I have tried so far, without success.
import json
f = open('./myfile.json')
x = f.read()
data = json.loads(x)
for i in data['items']:
    for j in i['metadata']:
        print (j['name'])
It errors out with the following:
  File "pythonjson.py", line 8, in <module>
    print (j['name'])
TypeError: string indices must be integers
When I ran print (type(j)) I got <class 'str'>. So I can see that it is a list of strings and not a dictionary. How can I parse through a list of strings? Any official documentation or guide that explains the concept would be very helpful.
Your json is bad, and the python exception is clear and unambiguous. You have the basic string "name" and you are trying to ... do a lookup on that?
Let's cut out all the json and look at the real issue. You do not know how to iterate over a dict. You're actually iterating over the keys themselves. If you want to see their values too, you're going to need dict.items()
https://docs.python.org/3/tutorial/datastructures.html#looping-techniques
metadata = {"name": "myname1"}
for key, value in metadata.items():
    if key == "name":
        print ('the name is', value)
But why bother if you already know the key you want to look up?
This is literally why we have dict.
print ('the name is', metadata["name"])
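To make the cause of the error concrete, here is a small sketch (the variable names keys_seen and error_message are only illustrative) showing that a plain for loop over a dict yields its keys, and that indexing such a key with another string raises the TypeError from the question.

```python
metadata = {"name": "myname1"}

# Iterating over a dict directly yields its keys, not its values.
keys_seen = [key for key in metadata]

# Each key is a plain string, so key["name"] tries to index a string
# with another string, which raises TypeError.
error_message = None
try:
    for key in metadata:
        key["name"]
except TypeError as exc:
    error_message = str(exc)

# Looking the key up on the dict itself is what was intended.
name = metadata["name"]
```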
You likely need:
import json
with open('./myfile.json') as f:
    data = json.load(f)
for item in data['items']:
    print(item["metadata"]["name"])
Your original JSON is not valid (colons missing).
To access the contents of "name", note that i["metadata"].keys() returns all keys in "metadata".
Working code to access all values of the dictionary in "metadata":
for i in data['items']:
    for j in i["metadata"].keys():
        print (i["metadata"][j])
Update: working code to access the contents of "name" only:
for i in data['items']:
    print (i["metadata"]["name"])

Split dictionary by keys in Python

I would like to clarify this code, especially the variables. I am a newbie in Python.
GOAL:
I would like to split the pairs of the data dictionary by their keys. The output is a list of lists of Ward objects. I think my solution is too complicated; is there a better one?
class Ward:
    def __init__(self, code, data):
        self.code = code
        self.data = data

def prepare_data_for_templates(cs, h, f):
    pairs = {'201': ['<tr><td>Dunajská Streda</td><td>201</td></tr>\n', '<tr><td>Dunajský Klátov</td><td>201</td></tr>\n'], '205': ['<tr><td>Košolná</td><td>205</td></tr>\n',]}
    print "Pairs: " + str(sorted(pairs.keys())) + "\n"
    #output data - ba, tt...
    OUT = []
    BA = []
    TT = []
    for k, v in sorted(pairs.iteritems()):
        #print k + "\n", v
        if int(k) < 199:
            BA.append(Ward(k, v))
        elif int(k) < 299:
            TT.append(Ward(k, v))
    OUT.append(BA)
    OUT.append(TT)
    for j in OUT:
        for i in j:
            print i.code
    return OUT
EDIT: Thanks for the answer, I updated my code using JSON.
tab01.json:
{
  "data": [
    {
      "id": "101", "c01": "mun1"
    },
    {
      "id": "101", "c01": "mun2"
    },
    {
      "id": "205", "c01": "mun3"
    },
    {
      "id": "205", "c01": "mun4"
    },
    {
      "id": "205", "c01": "mun5"
    }
  ]
}
code.py:
import os, json

def prepare_data_for_templates(file):
    pairs = {}
    codes = []
    with open(file, "r") as input:
        json_obj = json.load(input)
        for d in json_obj["data"]:
            codes.append((str(d["id"]), d))
    for c in codes:
        pairs.setdefault(str(c[0]), []).append(c[1])
    for k, v in pairs.iteritems():
        with open( str(k) + '.json', 'w') as outfile:
            json.dump(v, outfile)

prepare_data_for_templates("tab01.json")
"Clean up this (working) code" is generally not a good SO question because it's very vague.
I've downvoted, but, in this particular case, you have a few things that can be done right off the bat.
Use New Style Classes, or Tuples
Your Ward class appears to be unnecessary.
Unless there is other functionality there that you are not showing, you should just create tuples.
Instead of Ward(k, v) just use the tuple (k, v).
If you do need the class, at least write it as a new-style class: class Ward(object):
The syntax that you have used, class Ward:, creates an old-style class in Python 2, which is supported only for historical reasons.
Keep Data External from Code
Right now, you have a giant, messy, hard to work with variable,
pairs = {'201': ['<tr><td>Dunajská Streda</td><td>201</td></tr>\n', '<tr><td>Dunajský Klátov</td><td>201</td></tr>\n'], '205': ['<tr><td>Košolná</td><td>205</td></tr>\n', '<tr><td>Leopoldov</td><td>205</td></tr>\n', '<tr><td>Trnava</td><td>205</td></tr>\n'], '705': ['<tr><td>Pušovce</td><td>705</td></tr>\n', '<tr><td>Radatice</td><td>705</td></tr>\n', '<tr><td>Rokycany</td><td>705</td></tr>\n'], '304': ['<tr><td>Rudnianska Lehota</td><td>304</td></tr>\n', '<tr><td>Sebedražie</td><td>304</td></tr>\n', '<tr><td>Seč</td><td>304</td></tr>\n', '<tr><td>Šútovce</td><td>304</td></tr>\n'], '305': ['<tr><td>Selec</td><td>305</td></tr>\n'], '103': ['<tr><td>Modra</td><td>103</td></tr>\n', '<tr><td>Pezinok</td><td>103</td></tr>\n'], '101': ['<tr><td>Bratislava - Nové Mesto</td><td>101</td></tr>\n', '<tr><td>Bratislava - Podunajské Biskupice</td><td>101</td></tr>\n'], '806': ['<tr><td>Plechotice</td><td>806</td></tr>\n', '<tr><td>Trebišov</td><td>806</td></tr>\n']}
This is pretty much impossible to sustain if you want to add data, or the data changes.
This looks like partially parsed HTML of some kind, so that might be a better form in which you store your data, and let your python code parse the HTML every time it runs.
If you want to keep processed data, and not the original HTML source, I'd recommend putting this into a JSON file; something like this:
{
    "201": {
        "name": "Dunajsky",
        "municipalities": [
            "Streda",
            "Klatov"
        ]
    },
    "205": {
        "name": "Kosoln",
        "municipalities": [
            "Leopoldov",
            "Trnava"
        ]
    }
}
Your data is pretty dirty, so this is just my best guess at the structure that you are trying to represent.
This will make your life much easier moving forward.
You can then parse this data using the python json library:
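A minimal sketch of that parsing step, assuming the structure suggested above. To keep it self-contained the JSON is held in a string (wards_json is a hypothetical name); with a real file you would call json.load on the open file instead.

```python
import json

# Same shape as the JSON suggested above, inlined for illustration.
wards_json = """
{
    "201": {"name": "Dunajsky", "municipalities": ["Streda", "Klatov"]},
    "205": {"name": "Kosoln", "municipalities": ["Leopoldov", "Trnava"]}
}
"""

wards = json.loads(wards_json)

# JSON object keys are always strings, so convert the codes explicitly.
names_by_code = {int(code): ward["name"] for code, ward in wards.items()}
```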
Don't Make a List of Lists
As far as I can tell, you are trying to sort data.
There is no need for a list of lists for this purpose -- it's unnecessarily complicated, and, as a result, confusing.
Consider something more like this:
import json

def load_sorted_wards(path='wards.json'):  # function name is illustrative
    with open(path, 'r') as f:
        json_obj = json.load(f)
    # assume the structure above is used for the JSON
    # don't do any validation (because that would require more work with something
    # like a JSON schema, and I'm too lazy for that)
    # convert the object to a list of tuples, and convert codes from strings to ints
    code_list = []
    for (code, data) in json_obj.items():
        code_list.append((int(code), data))
    # sorting tuples does a dictionary-order sorting, so this will sort on keys,
    # then on the data components of the tuples (which presumably don't have
    # meaningful ordering)
    return sorted(code_list)
A slightly cleaner version of the conversion into code_list would use a comprehension:
code_list = [(int(code), data) for (code, data) in json_obj.items()]

Python script to convert complicated flattened data to JSON

Sorry about the vague title, I need some help with Python magic and couldn't think of anything more descriptive.
I have a fixed JSON data structure that I need to convert a CSV file to. The structure is fixed, but deeply nested with lists and such. It's similar to this but more complicated:
{
    "foo" : bar,
    "baz" : qux,
    "nub" : [
        {
            "bub": "gob",
            "nab": [
                {
                    "nip": "jus",
                    "the": "tip",
                },
                ...
            ],
        },
        ...
    ],
    "cok": "hed"
}
Hopefully you get the idea. Lists on dicts on lists on lists and so forth. My csv for that might look like this:
foo, baz, nub.bub, nub.nab.nip, nub.nab.the, cok
bar, qux, "gob" ,,,, "hed"
,,,,, "nab", "jus","tip",,
,,,,, "nab", "other", "values",,
Sorry if this is hard to read, but the basic idea is if there's a listed item it will be in the next row, and values are repeated to denote what sub-lists belong to what.
I'm not looking for anyone to come up with a solution to this mess, just maybe some pointers on techniques or things to look into.
Right now I have a rough plan:
I start by turning the header into a list of tuples containing the keys. For each group of rows (item) I'll create a copy of my template dict. I have a function that will set a dict value from a tuple of keys, unless it finds a list. In this case I'm going to call a funky recursive function and pass it my iterator, and continue filling up the dict in that function, and making recursive calls as I find new lists.
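The first two steps of this plan (turning the header into tuples of keys, and setting a dict value from such a tuple) might look like the sketch below. header_to_paths and set_path are hypothetical names, and the list handling, which is the hard part of the problem, is deliberately left out.

```python
def header_to_paths(header_line):
    """Turn 'foo, nub.bub' into [('foo',), ('nub', 'bub')]."""
    return [tuple(col.strip().split(".")) for col in header_line.split(",")]


def set_path(target, path, value):
    """Set target[path[0]][path[1]]... = value, creating dicts as needed."""
    node = target
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


paths = header_to_paths("foo, baz, nub.bub, nub.nab.nip, nub.nab.the, cok")
record = {}
set_path(record, paths[0], "bar")
set_path(record, paths[2], "gob")
```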
I could also do a lot of hardcoding, but what's the fun in that?
So that's my story. Again, just looking for some pointers on what the best way to do this might be. I wrote this quickly so it might be kinda confusing, please let me know if any more info would help. Thanks!
Your JSON is malformed. Additionally, your json must not contain arrays in order to achieve what you want.
def _tocsv(obj, base=''):
    flat_dict = {}
    for k in obj:
        value = obj[k]
        if isinstance(value, dict):
            flat_dict.update(_tocsv(value, base + k + '.'))
        elif isinstance(value, (int, long, str, unicode, float, bool)):
            flat_dict[base + k] = value
        else:
            raise ValueError("Can't serialize value of type "+ type(value).__name__)
    return flat_dict

def tocsv(json_content):
    #assume you imported json
    value = json.loads(json_content)
    if isinstance(value, dict):
        return _tocsv(value)
    else:
        raise ValueError("JSON root object must be a hash")
will let you flatten something like:
{
    "foo": "nestor",
    "bar": "kirchner",
    "baz": {
        "clorch": 1,
        "narf": 2,
        "peep": {
            "ooo": "you suck"
        }
    }
}
into something like:
{"foo": "nestor", "bar": "kirchner", "baz.clorch": 1, "baz.narf": 2, "baz.peep.ooo": "you suck"}
The keys don't preserve any specific order. You can replace flat_dict = {} with the construction of an OrderedDict if you want to preserve order.
Assuming you have an array of such flat dicts:
def tocsv_many(json_str):
    #assume you imported json
    value = json.loads(json_str)
    result = []
    if isinstance(value, list):
        for element in value:
            if isinstance(element, dict):
                result.append(_tocsv(element))
            else:
                raise ValueError("root children must be dicts")
    else:
        raise ValueError("The JSON root must be a list")
    return result

flat_dicts = tocsv_many(yourJsonInput)
you could:
create a csvlines = [] list which will hold the CSV lines for your file.
create a keysSet = set() which will hold the possible keys.
for each dict, add its .keys() to the set. A plain set guarantees no key order, so sort the keys once into a list and reuse that list for every row. This gives the first CSV line (the header):
for flat_dict in flat_dicts:
    keysSet.update(flat_dict.keys())
keys = sorted(keysSet)
csvlines.append(",".join(keys))
for each dict you have (iterate again), generate a line like this:
for flat_dict in flat_dicts:
    csvline = ",".join([json.dumps(flat_dict.get(key, '')) for key in keys])
    csvlines.append(csvline)
voilà! you have your lines in csvlines
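As a variation on the manual joining above, the standard library's csv.DictWriter can fill in missing keys and take care of quoting. This is a sketch of a swapped-in technique, not the answer's own method; flat_dicts_to_csv is a hypothetical name, and the code assumes Python 3.

```python
import csv
import io


def flat_dicts_to_csv(flat_dicts):
    """Write a list of flat dicts as CSV text with a sorted header row."""
    # Union of all keys, sorted so the column order is stable.
    fieldnames = sorted({key for d in flat_dicts for key in d})
    buffer = io.StringIO()
    # restval is the value written for keys a given row does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(flat_dicts)
    return buffer.getvalue()
```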

How do I use a for loop when reading from a dictionary that might contain a list of dicts, but might not?

I apologize in advance that the title is so confusing. It makes a lot more sense in code, so here goes:
I am parsing data from a REST API that returns JSON, and I have a bit of an issue with this particular structure:
{ 'Order' : [
    { 'orderID': '1',
      'OrderLines': {
        'OrderLine': [
          { 'lineID':'00001', 'quantity':'1', 'cost':'10', 'description':'foo' },
          { 'lineID':'00002', 'quantity':'2', 'cost':'23.42', 'description':'bar' }
        ]}
    },
    { 'orderID': '2',
      'OrderLines': {
        'OrderLine':
          { 'lineID':'00003', 'quantity':'6', 'cost':'12.99', 'description':'lonely' }
      }
    }
]}
If you'll notice, the second order only has one OrderLine, so instead of returning a list containing dictionaries, it returns the dictionary. Here is what I am trying to do:
orders_json = json.loads(from_server)
for order in orders_json['Order']:
    print 'Order ID: {}'.format(order['orderID'])
    for line in order['OrderLines']['OrderLine']:
        print '-> Line ID: {}, Quantity: {}'.format(line['lineID'], line['quantity'])
It works just fine for the first order, but the second order throws TypeError: string indices must be integers since line is now a string containing the dictionary, instead of a dictionary from the list. I've been banging my head against this for hours now, and I feel like I am missing something obvious.
Here are some of the things I have tried:
Using len(line) to see if it gave me something unique for the one line orders. It does not. It returns the number of key:value pairs in the dictionary, which in my real program is 10, which an order containing 10 lines would also return.
Using a try/except. Well, that stops the TypeError from halting the whole thing, but I can't figure out how to address the dictionary once I've done that. Line is a string for single line orders instead of a dictionary.
Whoever designed that API did not do a terribly good job. Anyway, you could check whether OrderLine is a list and, if it's not, wrap it in a one-element list before doing any processing:
if not isinstance(order_line, list):
    order_line = [order_line]
That would work, though my personal preference would be to get the API fixed.
I'd check if the type is correct and then convert it to a list if necessary to have a uniform access:
lines = order['OrderLines']['OrderLine']
lines = [lines] if not isinstance(lines, list) else lines
for line in lines:
    ...
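The wrap-in-a-list idea can be factored into a small helper and applied wherever the API may return either one dict or a list of dicts. This is a sketch in Python 3; as_list and collect_line_ids are hypothetical names, and the sample data is adapted from the question.

```python
def as_list(value):
    """Return lists unchanged; wrap any other value in a one-element list."""
    return value if isinstance(value, list) else [value]


def collect_line_ids(orders_json):
    """Return (orderID, lineID) pairs whether OrderLine is a dict or a list."""
    pairs = []
    for order in as_list(orders_json["Order"]):
        for line in as_list(order["OrderLines"]["OrderLine"]):
            pairs.append((order["orderID"], line["lineID"]))
    return pairs


orders = {
    "Order": [
        {"orderID": "1", "OrderLines": {"OrderLine": [
            {"lineID": "00001", "quantity": "1"},
            {"lineID": "00002", "quantity": "2"},
        ]}},
        {"orderID": "2", "OrderLines": {"OrderLine":
            {"lineID": "00003", "quantity": "6"}}},
    ]
}
line_ids = collect_line_ids(orders)
```

Applying as_list to orders_json["Order"] as well covers the case where the same API returns a single order as a bare dict.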
You can check the type of the object you try to access:
# ...
print 'Order ID: {0}'.format(order['orderID'])
lines = order['OrderLines']['OrderLine']
if isinstance(lines, list):
    for line in lines:
        print line['lineID']
elif isinstance(lines, dict):
    print lines['lineID']
else:
    raise ValueError('Illegal JSON object')
Edit: Wrapping the dict in a list as proposed by @NPE is the nicer and smarter solution.

Find string of method in a file

Right now I have this to find a method
def getMethod(text, a, filetype):
    start = a
    fin = a
    if filetype == "cs":
        for x in range(a, 0, -1):
            if text[x] == "{":
                start = x
                break
        for x in range(a, len(text)):
            if text[x] == "}":
                fin = x
                break
    return text[start:fin + 1]
How can I get the method that the index a is in?
I can't just search for { and }, because the code can contain things like new { }, which breaks this approach.
If I have a file with a few methods and I want to find which method a given index falls in, I want the body of that method. For example, if I had the file:
private string x(){
    return "x";
}
private string b(){
    return "b";
}
private string z(){
    return "z";
}
private string a(){
    var n = new {l = "l"};
    return "a";
}
and I had the index of "a", which let's say is 100,
then I want to find the body of that method, i.e. everything within its { and }.
So this...
{
    var n = new {l = "l"};
    return "a";
}
But using what I have now it would return:
{l = "l"};
    return "a";
}
If my interpretation is correct, it seems you are attempting to parse C# source code to find the C# method that includes a given position a in a .cs file, the content of which is contained in text.
Unfortunately, if you want to do a complete and accurate job, I think you would need a full C# parser.
If that sounds like too much work, I'd think about using a version of ctags that is compatible with C# to generate a tag file and then search in the tag file for the method that applies to a given source file line instead of the original source file.
As Simon stated, if your problem is to parse source code, the best bet is to get a proper parser for that language.
If you're just looking to match up the braces however, there is a well-known algorithm for that: Python parsing bracketed blocks
Just be aware that since source code is a complex beast, don't expect this to work 100%.
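For the brace-matching route, a minimal sketch of depth counting is shown below. It is an approximation under stated assumptions: methods sit at the top level of the text (as in the question's example, with no enclosing class or namespace braces), and braces inside string literals or comments are not handled. The function name enclosing_top_level_block is hypothetical.

```python
def enclosing_top_level_block(text, index):
    """Return the outermost {...} block that contains index, or None.

    Tracks nesting depth so that inner braces such as `new { ... }`
    do not end the block early.
    """
    depth = 0
    start = None
    for pos, char in enumerate(text):
        if char == "{":
            if depth == 0:
                start = pos  # a new top-level block opens here
            depth += 1
        elif char == "}" and depth > 0:
            depth -= 1
            if depth == 0 and start <= index <= pos:
                return text[start:pos + 1]
    return None
```

For a real .cs file the methods are nested inside class and namespace braces, so the same counting would have to look for blocks at a deeper nesting level rather than at depth zero.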
