Change in tab indentation upon opening the file for writing [duplicate] - python

So I'm using Python 2.7 and the json module to encode the following data structure:
'layer1': {
    'layer2': {
        'layer3_1': [ long_list_of_stuff ],
        'layer3_2': 'string'
    }
}
My problem is that I'm printing everything out using pretty printing, as follows:
json.dumps(data_structure, indent=2)
That's great, except that I want everything indented except the content of "layer3_1". It's a massive list of coordinate dictionaries, and because pretty printing puts every value on its own line, the output file runs to thousands of lines. For example:
{
  "layer1": {
    "layer2": {
      "layer3_1": [
        {
          "x": 1,
          "y": 7
        },
        {
          "x": 0,
          "y": 4
        },
        {
          "x": 5,
          "y": 3
        },
        {
          "x": 6,
          "y": 9
        }
      ],
      "layer3_2": "string"
    }
  }
}
What I really want is something similar to the following:
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x":1,"y":7},{"x":0,"y":4},{"x":5,"y":3},{"x":6,"y":9}],
      "layer3_2": "string"
    }
  }
}
I've heard it's possible to extend the json module. Is it possible to turn off indenting only inside the "layer3_1" object? If so, could somebody please tell me how?

(Note: The code in this answer only works with json.dumps(), which returns a JSON-formatted string, not with json.dump(), which writes directly to file-like objects. A modified version that works with both is in my answer to the question Write two-dimensional list to JSON file.)
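The reason for that limitation: json.dumps() calls the encoder's encode() method, whereas json.dump() writes the chunks produced by iterencode(), so post-processing done only in an overridden encode() is skipped by json.dump(). A small sketch with a hypothetical ShoutingEncoder shows the difference, plus the simple workaround of writing the dumps() result to the file yourself:

```python
import io
import json


class ShoutingEncoder(json.JSONEncoder):
    """Hypothetical encoder that post-processes its result only in encode()."""

    def encode(self, o):
        return super(ShoutingEncoder, self).encode(o).upper()


data = {"key": "value"}

# json.dumps() goes through encode(), so the post-processing runs.
via_dumps = json.dumps(data, cls=ShoutingEncoder)

# json.dump() iterates over iterencode() chunks, so encode() is never called.
buf = io.StringIO()
json.dump(data, buf, cls=ShoutingEncoder)
via_dump = buf.getvalue()

# Workaround: build the string with dumps() and write it yourself.
out = io.StringIO()
out.write(json.dumps(data, cls=ShoutingEncoder))

print(via_dumps)  # {"KEY": "VALUE"}
print(via_dump)   # {"key": "value"}
```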
Updated
Below is a version of my original answer that has been revised several times. The original, which I posted only to show how to get the first idea in J.F. Sebastian's answer to work, returned (like his) a non-indented string representation of the object. The latest version returns the wrapped Python object JSON-formatted in isolation.
The keys of each coordinate dict appear in sorted order, as requested in one of the OP's comments, but only if sort_keys=True is passed to the initial json.dumps() call that drives the process. The code also no longer converts the object to a string along the way; in other words, the actual type of the "wrapped" object is now preserved.
I think misunderstanding the original intent of my post led a number of folks to downvote it, so, primarily for that reason, I have "fixed" and improved my answer several times. The current version is a hybrid of my original answer, some of the ideas @Erik Allik used in his answer, and useful feedback from other users in the comments below this answer.
The following code appears to work unchanged in both Python 2.7.16 and 3.7.4.
from _ctypes import PyObj_FromPtr
import json
import re
class NoIndent(object):
    """ Value wrapper. """
    def __init__(self, value):
        self.value = value
class MyEncoder(json.JSONEncoder):
    FORMAT_SPEC = '##{}##'
    regex = re.compile(FORMAT_SPEC.format(r'(\d+)'))
    def __init__(self, **kwargs):
        # Save copy of any keyword argument values needed for use here.
        self.__sort_keys = kwargs.get('sort_keys', None)
        super(MyEncoder, self).__init__(**kwargs)
    def default(self, obj):
        return (self.FORMAT_SPEC.format(id(obj)) if isinstance(obj, NoIndent)
                else super(MyEncoder, self).default(obj))
    def encode(self, obj):
        format_spec = self.FORMAT_SPEC  # Local var to expedite access.
        json_repr = super(MyEncoder, self).encode(obj)  # Default JSON.
        # Replace any marked-up object ids in the JSON repr with the
        # value returned from the json.dumps() of the corresponding
        # wrapped Python object.
        for match in self.regex.finditer(json_repr):
            # see https://stackoverflow.com/a/15012814/355230
            id = int(match.group(1))
            no_indent = PyObj_FromPtr(id)
            json_obj_repr = json.dumps(no_indent.value, sort_keys=self.__sort_keys)
            # Replace the matched id string with json formatted representation
            # of the corresponding Python object.
            json_repr = json_repr.replace(
                            '"{}"'.format(format_spec.format(id)), json_obj_repr)
        return json_repr
if __name__ == '__main__':
    from string import ascii_lowercase as letters
    data_structure = {
        'layer1': {
            'layer2': {
                'layer3_1': NoIndent([{"x":1,"y":7}, {"x":0,"y":4}, {"x":5,"y":3},
                                      {"x":6,"y":9},
                                      {k: v for v, k in enumerate(letters)}]),
                'layer3_2': 'string',
                'layer3_3': NoIndent([{"x":2,"y":8,"z":3}, {"x":1,"y":5,"z":4},
                                      {"x":6,"y":9,"z":8}]),
                'layer3_4': NoIndent(list(range(20))),
            }
        }
    }
    print(json.dumps(data_structure, cls=MyEncoder, sort_keys=True, indent=2))
Output:
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}, {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5, "g": 6, "h": 7, "i": 8, "j": 9, "k": 10, "l": 11, "m": 12, "n": 13, "o": 14, "p": 15, "q": 16, "r": 17, "s": 18, "t": 19, "u": 20, "v": 21, "w": 22, "x": 23, "y": 24, "z": 25}],
      "layer3_2": "string",
      "layer3_3": [{"x": 2, "y": 8, "z": 3}, {"x": 1, "y": 5, "z": 4}, {"x": 6, "y": 9, "z": 8}],
      "layer3_4": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    }
  }
}
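If relying on the private _ctypes.PyObj_FromPtr to turn an id back into an object feels fragile, the same placeholder idea can be sketched by keeping the wrapped objects in a dictionary keyed by id() while encoding. This is a variation on the answer above, not its author's code; NoIndent and PlaceholderEncoder are illustrative names, and only sort_keys is forwarded to the nested json.dumps() call, as in the original.

```python
import json
import re


class NoIndent(object):
    """Marks a value that should be emitted on a single line."""

    def __init__(self, value):
        self.value = value


class PlaceholderEncoder(json.JSONEncoder):
    FORMAT_SPEC = '##{}##'
    regex = re.compile(r'"##(\d+)##"')  # a placeholder as it appears in the JSON

    def __init__(self, **kwargs):
        self._sort_keys = kwargs.get('sort_keys', False)
        self._wrapped = {}  # id(NoIndent instance) -> NoIndent instance
        super(PlaceholderEncoder, self).__init__(**kwargs)

    def default(self, obj):
        if isinstance(obj, NoIndent):
            # Remember the object so it can be found again by its id,
            # and emit a placeholder string in its place.
            self._wrapped[id(obj)] = obj
            return self.FORMAT_SPEC.format(id(obj))
        return super(PlaceholderEncoder, self).default(obj)

    def encode(self, obj):
        json_repr = super(PlaceholderEncoder, self).encode(obj)

        def replace(match):
            wrapped = self._wrapped[int(match.group(1))]
            return json.dumps(wrapped.value, sort_keys=self._sort_keys)

        # A function replacement is inserted literally, so backslashes in
        # the compact JSON are not treated as regex escapes.
        return self.regex.sub(replace, json_repr)


data = {'layer1': {'layer2': {
    'layer3_1': NoIndent([{"y": 7, "x": 1}, {"y": 4, "x": 0}]),
    'layer3_2': 'string'}}}
result = json.dumps(data, cls=PlaceholderEncoder, sort_keys=True, indent=2)
print(result)
```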

A bodge, but once you have the string from dumps(), you can perform a regular expression substitution on it, provided you're sure of the format of its contents. Something along these lines:
import json
import re
s = json.dumps(data_structure, indent=2)
s = re.sub(r'\s*{\s*"(.)": (\d+),\s*"(.)": (\d+)\s*}(,?)\s*', r'{"\1":\2,"\3":\4}\5', s)
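To check the substitution end to end, here is a small sketch that applies it to the question's sample data (Python 3, so dict keys keep insertion order). The pattern only matches dicts with exactly two single-character keys and integer values, which is the "sure of the format" assumption above.

```python
import json
import re

data_structure = {
    'layer1': {
        'layer2': {
            'layer3_1': [{"x": 1, "y": 7}, {"x": 0, "y": 4},
                         {"x": 5, "y": 3}, {"x": 6, "y": 9}],
            'layer3_2': 'string',
        }
    }
}

s = json.dumps(data_structure, indent=2)
# Collapse each {"a": 1, "b": 2}-shaped dict, together with the whitespace
# around it, into one compact chunk; the optional comma is kept via \5.
s = re.sub(r'\s*{\s*"(.)": (\d+),\s*"(.)": (\d+)\s*}(,?)\s*',
           r'{"\1":\2,"\3":\4}\5', s)
print(s)
```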

The following solution seems to work correctly on Python 2.7.x. It uses a workaround taken from Custom JSON encoder in Python 2.7 to insert plain JavaScript code: a UUID-based replacement scheme keeps custom-encoded objects from ending up as JSON strings in the output.
import json
import uuid
class NoIndent(object):
    def __init__(self, value):
        self.value = value
class NoIndentEncoder(json.JSONEncoder):
    def __init__(self, *args, **kwargs):
        super(NoIndentEncoder, self).__init__(*args, **kwargs)
        # Forward every option except indent to the nested json.dumps() calls.
        self.kwargs = dict(kwargs)
        del self.kwargs['indent']
        self._replacement_map = {}
    def default(self, o):
        if isinstance(o, NoIndent):
            key = uuid.uuid4().hex
            self._replacement_map[key] = json.dumps(o.value, **self.kwargs)
            return "##%s##" % (key,)
        else:
            return super(NoIndentEncoder, self).default(o)
    def encode(self, o):
        result = super(NoIndentEncoder, self).encode(o)
        for k, v in self._replacement_map.iteritems():
            result = result.replace('"##%s##"' % (k,), v)
        return result
Then this
obj = {
    "layer1": {
        "layer2": {
            "layer3_2": "string",
            "layer3_1": NoIndent([{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}])
        }
    }
}
print json.dumps(obj, indent=2, cls=NoIndentEncoder)
produces the following output:
{
  "layer1": {
    "layer2": {
      "layer3_2": "string",
      "layer3_1": [{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}]
    }
  }
}
It also correctly passes all options except indent (e.g. sort_keys=True) down to the nested json.dumps call:
obj = {
    "layer1": {
        "layer2": {
            "layer3_1": NoIndent([{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}]),
            "layer3_2": "string",
        }
    }
}
print json.dumps(obj, indent=2, sort_keys=True, cls=NoIndentEncoder)
correctly outputs:
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}],
      "layer3_2": "string"
    }
  }
}
It can also be combined with e.g. collections.OrderedDict:
from collections import OrderedDict
obj = {
    "layer1": {
        "layer2": {
            "layer3_2": "string",
            "layer3_3": NoIndent(OrderedDict([("b", 1), ("a", 2)]))
        }
    }
}
print json.dumps(obj, indent=2, cls=NoIndentEncoder)
outputs:
{
  "layer1": {
    "layer2": {
      "layer3_3": {"b": 1, "a": 2},
      "layer3_2": "string"
    }
  }
}
UPDATE: In Python 3, dicts have no iteritems() method. You can replace encode with this version, which uses items():
def encode(self, o):
    result = super(NoIndentEncoder, self).encode(o)
    for k, v in self._replacement_map.items():
        result = result.replace('"##%s##"' % (k,), v)
    return result
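For Python 3, a minimal self-contained version of the whole encoder could look like the sketch below. It is an adaptation rather than the answer's exact code: it uses items() as in the update, and pop('indent', None) instead of del, so the encoder can also be constructed directly without an indent keyword.

```python
import json
import uuid


class NoIndent(object):
    def __init__(self, value):
        self.value = value


class NoIndentEncoder(json.JSONEncoder):
    def __init__(self, *args, **kwargs):
        super(NoIndentEncoder, self).__init__(*args, **kwargs)
        # Options forwarded to the nested json.dumps() calls, minus indent.
        self.kwargs = dict(kwargs)
        self.kwargs.pop('indent', None)
        self._replacement_map = {}

    def default(self, o):
        if isinstance(o, NoIndent):
            # Encode the wrapped value compactly now; leave a unique
            # placeholder string in the indented output for encode().
            key = uuid.uuid4().hex
            self._replacement_map[key] = json.dumps(o.value, **self.kwargs)
            return "##%s##" % (key,)
        return super(NoIndentEncoder, self).default(o)

    def encode(self, o):
        result = super(NoIndentEncoder, self).encode(o)
        for k, v in self._replacement_map.items():
            result = result.replace('"##%s##"' % (k,), v)
        return result


obj = {"layer1": {"layer2": {
    "layer3_1": NoIndent([{"y": 7, "x": 1}, {"y": 4, "x": 0}]),
    "layer3_2": "string"}}}
result = json.dumps(obj, indent=2, sort_keys=True, cls=NoIndentEncoder)
print(result)
```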

This yields the OP's expected result:
import json
class MyJSONEncoder(json.JSONEncoder):
    def iterencode(self, o, _one_shot=False):
        list_lvl = 0
        for s in super(MyJSONEncoder, self).iterencode(o, _one_shot=_one_shot):
            if s.startswith('['):
                list_lvl += 1
                s = s.replace('\n', '').rstrip()
            elif 0 < list_lvl:
                s = s.replace('\n', '').rstrip()
                if s and s[-1] == ',':
                    s = s[:-1] + self.item_separator
                elif s and s[-1] == ':':
                    s = s[:-1] + self.key_separator
            if s.endswith(']'):
                list_lvl -= 1
            yield s
o = {
    "layer1":{
        "layer2":{
            "layer3_1":[{"y":7,"x":1},{"y":4,"x":0},{"y":3,"x":5},{"y":9,"x":6}],
            "layer3_2":"string",
            "layer3_3":["aaa\nbbb","ccc\nddd",{"aaa\nbbb":"ccc\nddd"}],
            "layer3_4":"aaa\nbbb",
        }
    }
}
jsonstr = json.dumps(o, indent=2, separators=(',', ':'), sort_keys=True,
                     cls=MyJSONEncoder)
print(jsonstr)
o2 = json.loads(jsonstr)
print('identical objects: {}'.format((o == o2)))

You could try:
mark lists that shouldn't be indented by replacing them with NoIndentList:
class NoIndentList(list):
    pass
override the json.JSONEncoder.default method to produce a non-indented string representation for NoIndentList.
You could just cast it back to a list and call json.dumps() without indent to get a single line.
It seems the above approach doesn't work for the json module:
import json
import sys
class NoIndent(object):
    def __init__(self, value):
        self.value = value
def default(o, encoder=json.JSONEncoder()):
    if isinstance(o, NoIndent):
        return json.dumps(o.value)
    return encoder.default(o)
L = [dict(x=x, y=y) for x in range(1) for y in range(2)]
obj = [NoIndent(L), L]
json.dump(obj, sys.stdout, default=default, indent=4)
It produces the wrong output (the list is serialized as a string):
[
    "[{\"y\": 0, \"x\": 0}, {\"y\": 1, \"x\": 0}]",
    [
        {
            "y": 0,
            "x": 0
        },
        {
            "y": 1,
            "x": 0
        }
    ]
]
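The underlying reason is that whatever default() returns is itself encoded: a returned str becomes a quoted JSON string, while returning the underlying list just gets it indented like everything else. A small sketch contrasting the two (default_as_string and default_as_value are illustrative names), which is why the encoder-based answers here return a placeholder string and substitute the compact JSON afterwards:

```python
import json


class NoIndent(object):
    def __init__(self, value):
        self.value = value


def default_as_string(o):
    # A returned str is treated as a JSON string value and gets quoted.
    if isinstance(o, NoIndent):
        return json.dumps(o.value)
    raise TypeError(repr(o))


def default_as_value(o):
    # A returned list is serialized normally, so it is indented too.
    if isinstance(o, NoIndent):
        return o.value
    raise TypeError(repr(o))


as_string = json.dumps([NoIndent([1, 2])], default=default_as_string, indent=2)
as_value = json.dumps([NoIndent([1, 2])], default=default_as_value, indent=2)
print(as_string)
print(as_value)
```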
If you can use yaml then the method works:
import sys
import yaml
class NoIndentList(list):
    pass
def noindent_list_presenter(dumper, data):
    return dumper.represent_sequence(u'tag:yaml.org,2002:seq', data,
                                     flow_style=True)
yaml.add_representer(NoIndentList, noindent_list_presenter)
obj = [
    [dict(x=x, y=y) for x in range(2) for y in range(1)],
    [dict(x=x, y=y) for x in range(1) for y in range(2)],
]
obj[0] = NoIndentList(obj[0])
yaml.dump(obj, stream=sys.stdout, indent=4)
It produces:
- [{x: 0, y: 0}, {x: 1, y: 0}]
-   - {x: 0, y: 0}
    - {x: 0, y: 1}
i.e., the first list is serialized using [] and all items are on one line, the second list uses one line per item.

Here's a post-processing solution for when too many different types of objects contribute to the JSON for the JSONEncoder approach to be practical, and the data varies too much for a regex. This function collapses whitespace beyond a specified indent level without needing to know the specifics of the data itself.
def collapse_json(text, indent=12):
    """Compacts a string of json data by collapsing whitespace after the
    specified indent level
    NOTE: will not produce correct results when indent level is not a multiple
    of the json indent level
    """
    initial = " " * indent
    out = []  # final json output
    sublevel = []  # accumulation list for sublevel entries
    pending = None  # holder for consecutive entries at exact indent level
    for line in text.splitlines():
        if line.startswith(initial):
            if line[indent] == " ":
                # found a line indented further than the indent level, so add
                # it to the sublevel list
                if pending:
                    # the first item in the sublevel will be the pending item
                    # that was the previous line in the json
                    sublevel.append(pending)
                    pending = None
                item = line.strip()
                sublevel.append(item)
                if item.endswith(","):
                    sublevel.append(" ")
            elif sublevel:
                # found a line at the exact indent level *and* we have sublevel
                # items. This means the sublevel items have come to an end
                sublevel.append(line.strip())
                out.append("".join(sublevel))
                sublevel = []
            else:
                # found a line at the exact indent level but no items indented
                # further, so possibly start a new sub-level
                if pending:
                    # if there is already a pending item, it means that
                    # consecutive entries in the json had the exact same
                    # indentation and that last pending item was not the start
                    # of a new sublevel.
                    out.append(pending)
                pending = line.rstrip()
        else:
            if pending:
                # it's possible that an item will be pending but not added to
                # the output yet, so make sure it's not forgotten.
                out.append(pending)
                pending = None
            if sublevel:
                out.append("".join(sublevel))
            out.append(line)
    return "\n".join(out)
For example, using this structure as input to json.dumps with an indent level of 4:
text = json.dumps({"zero": ["first", {"second": 2, "third": 3, "fourth": 4, "items": [[1,2,3,4], [5,6,7,8], 9, 10, [11, [12, [13, [14, 15]]]]]}]}, indent=4)
here's the output of the function at various indent levels:
>>> print collapse_json(text, indent=0)
{"zero": ["first", {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}]}
>>> print collapse_json(text, indent=4)
{
    "zero": ["first", {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}]
}
>>> print collapse_json(text, indent=8)
{
    "zero": [
        "first",
        {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}
    ]
}
>>> print collapse_json(text, indent=12)
{
    "zero": [
        "first",
        {
            "items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]],
            "second": 2,
            "fourth": 4,
            "third": 3
        }
    ]
}
>>> print collapse_json(text, indent=16)
{
    "zero": [
        "first",
        {
            "items": [
                [1, 2, 3, 4],
                [5, 6, 7, 8],
                9,
                10,
                [11, [12, [13, [14, 15]]]]
            ],
            "second": 2,
            "fourth": 4,
            "third": 3
        }
    ]
}

Best-performing code (10 MB of text takes about 1 s):
import json
def dumps_json(data, indent=2, depth=2):
    assert depth > 0
    space = ' '*indent
    s = json.dumps(data, indent=indent)
    lines = s.splitlines()
    N = len(lines)
    # determine which lines to be shortened
    is_over_depth_line = lambda i: i in range(N) and lines[i].startswith(space*(depth+1))
    is_open_bracket_line = lambda i: not is_over_depth_line(i) and is_over_depth_line(i+1)
    is_close_bracket_line = lambda i: not is_over_depth_line(i) and is_over_depth_line(i-1)
    #
    def shorten_line(line_index):
        if not is_open_bracket_line(line_index):
            return lines[line_index]
        # shorten over-depth lines
        start = line_index
        end = start
        while not is_close_bracket_line(end):
            end += 1
        has_trailing_comma = lines[end][-1] == ','
        _lines = [lines[start][-1], *lines[start+1:end], lines[end].replace(',','')]
        d = json.dumps(json.loads(' '.join(_lines)))
        return lines[line_index][:-1] + d + (',' if has_trailing_comma else '')
    #
    s = '\n'.join([
        shorten_line(i)
        for i in range(N) if not is_over_depth_line(i) and not is_close_bracket_line(i)
    ])
    #
    return s
UPDATE:
Here's my explanation:
First, we use json.dumps to get an indented JSON string.
Example:
>>> print(json.dumps({'0':{'1a':{'2a':None,'2b':None},'1b':{'2':None}}}, indent=2))
[0] {
[1]   "0": {
[2]     "1a": {
[3]       "2a": null,
[4]       "2b": null
[5]     },
[6]     "1b": {
[7]       "2": null
[8]     }
[9]   }
[10] }
With indent=2 and depth=2, over-depth lines start with 6 spaces (indent * (depth + 1)).
There are 4 types of line:
1. Normal line
2. Open bracket line (2, 6)
3. Exceed depth line (3, 4, 7)
4. Close bracket line (5, 8)
We merge each sequence of lines of types 2, 3 and 4 into one single line.
Example:
[2]     "1a": {
[3]       "2a": null,
[4]       "2b": null
[5]     },
will be merged into:
[2]     "1a": {"2a": null, "2b": null},
NOTE: A close bracket line may have a trailing comma.
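To make the four line types concrete, here is a small sketch that classifies each line of the example above using the same tests as is_over_depth_line, is_open_bracket_line and is_close_bracket_line in dumps_json. The line_type helper is hypothetical, and it checks "open" before "close" for a line that would satisfy both.

```python
import json

data = {'0': {'1a': {'2a': None, '2b': None}, '1b': {'2': None}}}
indent, depth = 2, 2
lines = json.dumps(data, indent=indent).splitlines()
over_prefix = ' ' * indent * (depth + 1)  # 6 spaces for indent=2, depth=2


def is_over_depth(i):
    return 0 <= i < len(lines) and lines[i].startswith(over_prefix)


def line_type(i):
    if is_over_depth(i):
        return 'exceed'  # type 3: nested deeper than `depth`
    if is_over_depth(i + 1):
        return 'open'    # type 2: next line is over-depth
    if is_over_depth(i - 1):
        return 'close'   # type 4: previous line was over-depth
    return 'normal'      # type 1


for i in range(len(lines)):
    print('[%d] %-7s %s' % (i, line_type(i), lines[i]))
```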

Answer for me and Python 3 users
import re
def jsonIndentLimit(jsonString, indent, limit):
    regexPattern = re.compile(f'\n({indent}){{{limit}}}(({indent})+|(?=(}}|])))')
    return regexPattern.sub('', jsonString)
if __name__ == '__main__':
    jsonString = '''{
  "layer1": {
    "layer2": {
      "layer3_1": [
        {
          "x": 1,
          "y": 7
        },
        {
          "x": 0,
          "y": 4
        },
        {
          "x": 5,
          "y": 3
        },
        {
          "x": 6,
          "y": 9
        }
      ],
      "layer3_2": "string"
    }
  }
}'''
    print(jsonIndentLimit(jsonString, '  ', 3))
    '''print
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1,"y": 7},{"x": 0,"y": 4},{"x": 5,"y": 3},{"x": 6,"y": 9}],
      "layer3_2": "string"
    }
  }
}'''
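The same function can be applied directly to the output of json.dumps(), so there is no need to paste the JSON in as a literal. A minimal sketch, assuming indent=2 output (hence the two-space indent string) and limit=3 so that everything nested inside "layer3_1" is joined:

```python
import json
import re


def jsonIndentLimit(jsonString, indent, limit):
    # Delete a newline plus its indentation when the next line is nested
    # deeper than `limit` levels, or is a closing } or ] at exactly `limit`.
    regexPattern = re.compile(f'\n({indent}){{{limit}}}(({indent})+|(?=(}}|])))')
    return regexPattern.sub('', jsonString)


data = {"layer1": {"layer2": {
    "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}],
    "layer3_2": "string"}}}
compact = jsonIndentLimit(json.dumps(data, indent=2), '  ', 3)
print(compact)
```

The indent string is inserted into the pattern unescaped, which is fine for spaces or tabs.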

This solution is not as elegant or generic as the others, and you won't learn much from it, but it's quick and simple.
def custom_print(data_structure, indent):
    for key, value in data_structure.items():
        print "\n%s%s:" % (' '*indent,str(key)),
        if isinstance(value, dict):
            custom_print(value, indent+1)
        else:
            print "%s" % (str(value)),
Usage and output:
>>> custom_print(data_structure,1)
 layer1:
  layer2:
   layer3_2: string
   layer3_1: [{'y': 7, 'x': 1}, {'y': 4, 'x': 0}, {'y': 3, 'x': 5}, {'y': 9, 'x': 6}]
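custom_print depends on Python 2's print statement (the trailing comma suppresses the newline). A Python 3 sketch of the same idea builds a list of lines instead, which also makes it easy to write the result to a file; custom_lines is a hypothetical name, and like the original the output is Python repr text, not JSON.

```python
def custom_lines(data_structure, indent):
    """Return 'key: value' lines, one extra space of indent per nesting level."""
    lines = []
    for key, value in data_structure.items():
        prefix = '%s%s:' % (' ' * indent, key)
        if isinstance(value, dict):
            lines.append(prefix)
            lines.extend(custom_lines(value, indent + 1))
        else:
            lines.append('%s %s' % (prefix, value))
    return lines


data_structure = {'layer1': {'layer2': {
    'layer3_1': [{'x': 1, 'y': 7}, {'x': 0, 'y': 4}],
    'layer3_2': 'string'}}}
print('\n'.join(custom_lines(data_structure, 1)))
```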

As a side note, this website has a built-in JavaScript beautifier that avoids line feeds in the JSON output when lines are shorter than 70 characters:
http://www.csvjson.com/json_beautifier
(It was implemented using a modified version of JSON-js.)
Select "Inline short arrays".
Great for quickly viewing data that you have in the copy buffer.

Indeed, this is one of the things YAML does better than JSON.
I can't get NoIndentEncoder to work, but I can use a regex on the JSON string:
import re
def collapse_json(text, list_length=5):
    for length in range(list_length):
        re_pattern = r'\[' + (r'\s*(.+)\s*,' * length)[:-1] + r'\]'
        re_repl = r'[' + ''.join(r'\{}, '.format(i+1) for i in range(length))[:-2] + r']'
        text = re.sub(re_pattern, re_repl, text)
    return text
The question is, how do I apply this to a nested list?
Before:
[
  0,
  "any",
  [
    2,
    3
  ]
]
After:
[0, "any", [2, 3]]
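For this nested example, where the whole array should end up on one line, a regex is not needed: parsing the text and dumping it again without indent produces the compact form, nested lists included. A minimal sketch (it collapses the entire string, so it only fits the case where nothing should stay indented):

```python
import json

before = '''[
  0,
  "any",
  [
    2,
    3
  ]
]'''

# Round-trip through Python objects; without indent, json.dumps() puts
# everything on one line using the default ', ' and ': ' separators.
after = json.dumps(json.loads(before))
print(after)  # [0, "any", [2, 3]]
```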

Related

Get all parents keys in nested dictionary for all items

I want to get all parent keys for all items in a nested Python dictionary with unlimited levels. By analogy, if you think of a nested dictionary as a directory containing sub-directories, the behaviour I want is similar to what glob.glob(dir, recursive=True) does.
For example, suppose we have the following dictionary:
sample_dict = {
    "key_1": {
        "sub_key_1": 1,
        "sub_key_2": 2,
    },
    "key_2": {
        "sub_key_1": 3,
        "sub_key_2": {
            "sub_sub_key_1": 4,
        },
    },
}
I want to get the full "path" of every value in the dictionary:
["key_1", "sub_key_1", 1]
["key_1", "sub_key_2", 2]
["key_2", "sub_key_1", 3]
["key_2", "sub_key_2", "sub_sub_key_1", 4]
Just wondering if there is a clean way to do that?
Using generators can often simplify the code for these types of tasks and make it much more readable, while avoiding passing explicit state arguments to the function. You get a generator instead of a list, but this is a good thing because you can evaluate lazily if you want to. For example:
def getpaths(d):
    if not isinstance(d, dict):
        yield [d]
    else:
        yield from ([k] + w for k, v in d.items() for w in getpaths(v))
result = list(getpaths(sample_dict))
Result will be:
[['key_1', 'sub_key_1', 1],
['key_1', 'sub_key_2', 2],
['key_2', 'sub_key_1', 3],
['key_2', 'sub_key_2', 'sub_sub_key_1', 4]]
You can solve it recursively:
sample_dict = {
    "key_1": {
        "sub_key_1": 1,
        "sub_key_2": 2,
    },
    "key_2": {
        "sub_key_1": 3,
        "sub_key_2": {
            "sub_sub_key_1": 4,
        },
    }
}
def full_paths(sample_dict, paths=None, parent_keys=None):
    # None defaults: a mutable default list would be shared between calls,
    # so a second full_paths() call would return the first call's paths too.
    if paths is None:
        paths = []
    if parent_keys is None:
        parent_keys = []
    for key in sample_dict.keys():
        if isinstance(sample_dict[key], dict):
            full_paths(sample_dict[key], paths=paths, parent_keys=(parent_keys + [key]))
        else:
            paths.append(parent_keys + [key] + [sample_dict[key]])
    return paths
print(full_paths(sample_dict))
You can use this solution.
sample_dict = {
    "key_1": {
        "sub_key_1": 1,
        "sub_key_2": 2,
    },
    "key_2": {
        "sub_key_1": 3,
        "sub_key_2": {
            "sub_sub_key_1": 4,
        },
    },
}
def key_find(sample_dict, li=[]):
    for key, val in sample_dict.items():
        if isinstance(val, dict):
            key_find(val, li=li + [key])
        else:
            print(li + [key] + [val])
key_find(sample_dict)
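If the nesting could be deep enough to hit Python's recursion limit, the same traversal can be sketched with an explicit stack instead of recursion. iter_paths is an illustrative name; children are pushed in reverse so the paths come out in the same order as getpaths above.

```python
def iter_paths(d):
    """Yield [key, ..., leaf_value] for every leaf of a nested dict."""
    stack = [([], d)]  # pairs of (keys so far, node)
    while stack:
        prefix, node = stack.pop()
        if isinstance(node, dict):
            # Push children in reverse so they are popped in insertion order.
            for key, value in reversed(list(node.items())):
                stack.append((prefix + [key], value))
        else:
            yield prefix + [node]


sample_dict = {
    "key_1": {"sub_key_1": 1, "sub_key_2": 2},
    "key_2": {"sub_key_1": 3, "sub_key_2": {"sub_sub_key_1": 4}},
}
for path in iter_paths(sample_dict):
    print(path)
```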

Use json.dumps() to append to json multi dimensional array with custom formatting? [duplicate]

So I'm using Python 2.7, using the json module to encode the following data structure:
'layer1': {
'layer2': {
'layer3_1': [ long_list_of_stuff ],
'layer3_2': 'string'
}
}
My problem is that I'm printing everything out using pretty printing, as follows:
json.dumps(data_structure, indent=2)
Which is great, except I want to indent it all, except for the content in "layer3_1" — It's a massive dictionary listing coordinates, and as such, having a single value set on each one makes pretty printing create a file with thousands of lines, with an example as follows:
{
"layer1": {
"layer2": {
"layer3_1": [
{
"x": 1,
"y": 7
},
{
"x": 0,
"y": 4
},
{
"x": 5,
"y": 3
},
{
"x": 6,
"y": 9
}
],
"layer3_2": "string"
}
}
}
What I really want is something similar to the following:
{
"layer1": {
"layer2": {
"layer3_1": [{"x":1,"y":7},{"x":0,"y":4},{"x":5,"y":3},{"x":6,"y":9}],
"layer3_2": "string"
}
}
}
I hear it's possible to extend the json module: Is it possible to set it to only turn off indenting when inside the "layer3_1" object? If so, would somebody please tell me how?
(Note:
The code in this answer only works with json.dumps() which returns a JSON formatted string, but not with json.dump() which writes directly to file-like objects. There's a modified version of it that works with both in my answer to the question Write two-dimensional list to JSON file.)
Updated
Below is a version of my original answer that has been revised several times. Unlike the original, which I posted only to show how to get the first idea in J.F.Sebastian's answer to work, and which like his, returned a non-indented string representation of the object. The latest updated version returns the Python object JSON formatted in isolation.
The keys of each coordinate dict will appear in sorted order, as per one of the OP's comments, but only if a sort_keys=True keyword argument is specified in the initial json.dumps() call driving the process, and it no longer changes the object's type to a string along the way. In other words, the actual type of the "wrapped" object is now maintained.
I think not understanding the original intent of my post resulted in number of folks downvoting it—so, primarily for that reason, I have "fixed" and improved my answer several times. The current version is a hybrid of my original answer coupled with some of the ideas #Erik Allik used in his answer, plus useful feedback from other users shown in the comments below this answer.
The following code appears to work unchanged in both Python 2.7.16 and 3.7.4.
from _ctypes import PyObj_FromPtr
import json
import re
class NoIndent(object):
""" Value wrapper. """
def __init__(self, value):
self.value = value
class MyEncoder(json.JSONEncoder):
FORMAT_SPEC = '##{}##'
regex = re.compile(FORMAT_SPEC.format(r'(\d+)'))
def __init__(self, **kwargs):
# Save copy of any keyword argument values needed for use here.
self.__sort_keys = kwargs.get('sort_keys', None)
super(MyEncoder, self).__init__(**kwargs)
def default(self, obj):
return (self.FORMAT_SPEC.format(id(obj)) if isinstance(obj, NoIndent)
else super(MyEncoder, self).default(obj))
def encode(self, obj):
format_spec = self.FORMAT_SPEC # Local var to expedite access.
json_repr = super(MyEncoder, self).encode(obj) # Default JSON.
# Replace any marked-up object ids in the JSON repr with the
# value returned from the json.dumps() of the corresponding
# wrapped Python object.
for match in self.regex.finditer(json_repr):
# see https://stackoverflow.com/a/15012814/355230
id = int(match.group(1))
no_indent = PyObj_FromPtr(id)
json_obj_repr = json.dumps(no_indent.value, sort_keys=self.__sort_keys)
# Replace the matched id string with json formatted representation
# of the corresponding Python object.
json_repr = json_repr.replace(
'"{}"'.format(format_spec.format(id)), json_obj_repr)
return json_repr
if __name__ == '__main__':
from string import ascii_lowercase as letters
data_structure = {
'layer1': {
'layer2': {
'layer3_1': NoIndent([{"x":1,"y":7}, {"x":0,"y":4}, {"x":5,"y":3},
{"x":6,"y":9},
{k: v for v, k in enumerate(letters)}]),
'layer3_2': 'string',
'layer3_3': NoIndent([{"x":2,"y":8,"z":3}, {"x":1,"y":5,"z":4},
{"x":6,"y":9,"z":8}]),
'layer3_4': NoIndent(list(range(20))),
}
}
}
print(json.dumps(data_structure, cls=MyEncoder, sort_keys=True, indent=2))
Output:
{
"layer1": {
"layer2": {
"layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}, {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5, "g": 6, "h": 7, "i": 8, "j": 9, "k": 10, "l": 11, "m": 12, "n": 13, "o": 14, "p": 15, "q": 16, "r": 17, "s": 18, "t": 19, "u": 20, "v": 21, "w": 22, "x": 23, "y": 24, "z": 25}],
"layer3_2": "string",
"layer3_3": [{"x": 2, "y": 8, "z": 3}, {"x": 1, "y": 5, "z": 4}, {"x": 6, "y": 9, "z": 8}],
"layer3_4": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
}
}
}
A bodge, but once you have the string from dumps(), you can perform a regular expression substitution on it, if you're sure of the format of its contents. Something along the lines of:
s = json.dumps(data_structure, indent=2)
s = re.sub('\s*{\s*"(.)": (\d+),\s*"(.)": (\d+)\s*}(,?)\s*', r'{"\1":\2,"\3":\4}\5', s)
The following solution seems to work correctly on Python 2.7.x. It uses a workaround taken from Custom JSON encoder in Python 2.7 to insert plain JavaScript code to avoid custom-encoded objects ending up as JSON strings in the output by using a UUID-based replacement scheme.
class NoIndent(object):
def __init__(self, value):
self.value = value
class NoIndentEncoder(json.JSONEncoder):
def __init__(self, *args, **kwargs):
super(NoIndentEncoder, self).__init__(*args, **kwargs)
self.kwargs = dict(kwargs)
del self.kwargs['indent']
self._replacement_map = {}
def default(self, o):
if isinstance(o, NoIndent):
key = uuid.uuid4().hex
self._replacement_map[key] = json.dumps(o.value, **self.kwargs)
return "##%s##" % (key,)
else:
return super(NoIndentEncoder, self).default(o)
def encode(self, o):
result = super(NoIndentEncoder, self).encode(o)
for k, v in self._replacement_map.iteritems():
result = result.replace('"##%s##"' % (k,), v)
return result
Then this
obj = {
"layer1": {
"layer2": {
"layer3_2": "string",
"layer3_1": NoIndent([{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}])
}
}
}
print json.dumps(obj, indent=2, cls=NoIndentEncoder)
produces the follwing output:
{
"layer1": {
"layer2": {
"layer3_2": "string",
"layer3_1": [{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}]
}
}
}
It also correctly passes all options (except indent) e.g. sort_keys=True down to the nested json.dumps call.
obj = {
"layer1": {
"layer2": {
"layer3_1": NoIndent([{"y": 7, "x": 1, }, {"y": 4, "x": 0}, {"y": 3, "x": 5, }, {"y": 9, "x": 6}]),
"layer3_2": "string",
}
}
}
print json.dumps(obj, indent=2, sort_keys=True, cls=NoIndentEncoder)
correctly outputs:
{
"layer1": {
"layer2": {
"layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}],
"layer3_2": "string"
}
}
}
It can also be combined with e.g. collections.OrderedDict:
obj = {
"layer1": {
"layer2": {
"layer3_2": "string",
"layer3_3": NoIndent(OrderedDict([("b", 1), ("a", 2)]))
}
}
}
print json.dumps(obj, indent=2, cls=NoIndentEncoder)
outputs:
{
"layer1": {
"layer2": {
"layer3_3": {"b": 1, "a": 2},
"layer3_2": "string"
}
}
}
UPDATE: In Python 3, there is no iteritems. You can replace encode with this:
def encode(self, o):
result = super(NoIndentEncoder, self).encode(o)
for k, v in iter(self._replacement_map.items()):
result = result.replace('"##%s##"' % (k,), v)
return result
This yields the OP's expected result:
import json
class MyJSONEncoder(json.JSONEncoder):
def iterencode(self, o, _one_shot=False):
list_lvl = 0
for s in super(MyJSONEncoder, self).iterencode(o, _one_shot=_one_shot):
if s.startswith('['):
list_lvl += 1
s = s.replace('\n', '').rstrip()
elif 0 < list_lvl:
s = s.replace('\n', '').rstrip()
if s and s[-1] == ',':
s = s[:-1] + self.item_separator
elif s and s[-1] == ':':
s = s[:-1] + self.key_separator
if s.endswith(']'):
list_lvl -= 1
yield s
o = {
"layer1":{
"layer2":{
"layer3_1":[{"y":7,"x":1},{"y":4,"x":0},{"y":3,"x":5},{"y":9,"x":6}],
"layer3_2":"string",
"layer3_3":["aaa\nbbb","ccc\nddd",{"aaa\nbbb":"ccc\nddd"}],
"layer3_4":"aaa\nbbb",
}
}
}
jsonstr = json.dumps(o, indent=2, separators=(',', ':'), sort_keys=True,
cls=MyJSONEncoder)
print(jsonstr)
o2 = json.loads(jsonstr)
print('identical objects: {}'.format((o == o2)))
You could try:
mark lists that shouldn't be indented by replacing them with NoIndentList:
class NoIndentList(list):
pass
override json.Encoder.default method to produce a non-indented string representation for NoIndentList.
You could just cast it back to list and call json.dumps() without indent to get a single line
It seems the above approach doesn't work for the json module:
import json
import sys
class NoIndent(object):
def __init__(self, value):
self.value = value
def default(o, encoder=json.JSONEncoder()):
if isinstance(o, NoIndent):
return json.dumps(o.value)
return encoder.default(o)
L = [dict(x=x, y=y) for x in range(1) for y in range(2)]
obj = [NoIndent(L), L]
json.dump(obj, sys.stdout, default=default, indent=4)
It produces invalid output (the list is serialized as a string):
[
"[{\"y\": 0, \"x\": 0}, {\"y\": 1, \"x\": 0}]",
[
{
"y": 0,
"x": 0
},
{
"y": 1,
"x": 0
}
]
]
If you can use yaml then the method works:
import sys
import yaml
class NoIndentList(list):
pass
def noindent_list_presenter(dumper, data):
return dumper.represent_sequence(u'tag:yaml.org,2002:seq', data,
flow_style=True)
yaml.add_representer(NoIndentList, noindent_list_presenter)
obj = [
[dict(x=x, y=y) for x in range(2) for y in range(1)],
[dict(x=x, y=y) for x in range(1) for y in range(2)],
]
obj[0] = NoIndentList(obj[0])
yaml.dump(obj, stream=sys.stdout, indent=4)
It produces:
- [{x: 0, y: 0}, {x: 1, y: 0}]
- - {x: 0, y: 0}
- {x: 0, y: 1}
i.e., the first list is serialized using [] and all items are on one line, the second list uses one line per item.
Here's a post-processing solution if you have too many different types of objects contributing to the JSON to attempt the JSONEncoder method and too many varying types to use a regex. This function collapses whitespace after a specified level, without needing to know the specifics of the data itself.
def collapse_json(text, indent=12):
"""Compacts a string of json data by collapsing whitespace after the
specified indent level
NOTE: will not produce correct results when indent level is not a multiple
of the json indent level
"""
initial = " " * indent
out = [] # final json output
sublevel = [] # accumulation list for sublevel entries
pending = None # holder for consecutive entries at exact indent level
for line in text.splitlines():
if line.startswith(initial):
if line[indent] == " ":
# found a line indented further than the indent level, so add
# it to the sublevel list
if pending:
# the first item in the sublevel will be the pending item
# that was the previous line in the json
sublevel.append(pending)
pending = None
item = line.strip()
sublevel.append(item)
if item.endswith(","):
sublevel.append(" ")
elif sublevel:
# found a line at the exact indent level *and* we have sublevel
# items. This means the sublevel items have come to an end
sublevel.append(line.strip())
out.append("".join(sublevel))
sublevel = []
else:
# found a line at the exact indent level but no items indented
# further, so possibly start a new sub-level
if pending:
# if there is already a pending item, it means that
# consecutive entries in the json had the exact same
# indentation and that last pending item was not the start
# of a new sublevel.
out.append(pending)
pending = line.rstrip()
else:
if pending:
# it's possible that an item will be pending but not added to
# the output yet, so make sure it's not forgotten.
out.append(pending)
pending = None
if sublevel:
out.append("".join(sublevel))
out.append(line)
return "\n".join(out)
For example, using this structure as input to json.dumps with an indent level of 4:
text = json.dumps({"zero": ["first", {"second": 2, "third": 3, "fourth": 4, "items": [[1,2,3,4], [5,6,7,8], 9, 10, [11, [12, [13, [14, 15]]]]]}]}, indent=4)
here's the output of the function at various indent levels:
>>> print collapse_json(text, indent=0)
{"zero": ["first", {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}]}
>>> print collapse_json(text, indent=4)
{
    "zero": ["first", {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}]
}
>>> print collapse_json(text, indent=8)
{
    "zero": [
        "first",
        {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}
    ]
}
>>> print collapse_json(text, indent=12)
{
    "zero": [
        "first",
        {
            "items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]],
            "second": 2,
            "fourth": 4,
            "third": 3
        }
    ]
}
>>> print collapse_json(text, indent=16)
{
    "zero": [
        "first",
        {
            "items": [
                [1, 2, 3, 4],
                [5, 6, 7, 8],
                9,
                10,
                [11, [12, [13, [14, 15]]]]
            ],
            "second": 2,
            "fourth": 4,
            "third": 3
        }
    ]
}
Best-performing version (in my tests, about 1 s for 10 MB of text):
import json

def dumps_json(data, indent=2, depth=2):
    assert depth > 0
    space = ' '*indent
    s = json.dumps(data, indent=indent)
    lines = s.splitlines()
    N = len(lines)
    # determine which lines to be shortened
    is_over_depth_line = lambda i: i in range(N) and lines[i].startswith(space*(depth+1))
    is_open_bracket_line = lambda i: not is_over_depth_line(i) and is_over_depth_line(i+1)
    is_close_bracket_line = lambda i: not is_over_depth_line(i) and is_over_depth_line(i-1)
    #
    def shorten_line(line_index):
        if not is_open_bracket_line(line_index):
            return lines[line_index]
        # shorten over-depth lines
        start = line_index
        end = start
        while not is_close_bracket_line(end):
            end += 1
        has_trailing_comma = lines[end][-1] == ','
        _lines = [lines[start][-1], *lines[start+1:end], lines[end].replace(',','')]
        d = json.dumps(json.loads(' '.join(_lines)))
        return lines[line_index][:-1] + d + (',' if has_trailing_comma else '')
    #
    s = '\n'.join([
        shorten_line(i)
        for i in range(N) if not is_over_depth_line(i) and not is_close_bracket_line(i)
    ])
    #
    return s
UPDATE:
Here's my explanation:
First we use json.dumps to get the indented JSON string.
Example:
>>> print(json.dumps({'0':{'1a':{'2a':None,'2b':None},'1b':{'2':None}}}, indent=2))
[0] {
[1]   "0": {
[2]     "1a": {
[3]       "2a": null,
[4]       "2b": null
[5]     },
[6]     "1b": {
[7]       "2": null
[8]     }
[9]   }
[10] }
If we set indent=2 and depth=2, then the lines that exceed the depth start with 6 spaces (indent * (depth + 1)).
There are 4 types of line:
1. Normal line
2. Open bracket line (2, 6)
3. Exceed-depth line (3, 4, 7)
4. Close bracket line (5, 8)
We merge each sequence of lines of types 2 + 3 + 4 into one single line.
Example:
[2]     "1a": {
[3]       "2a": null,
[4]       "2b": null
[5]     },
will be merged into:
[2]     "1a": {"2a": null, "2b": null},
NOTE: A close bracket line may have a trailing comma.
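To make the four line types concrete, here is a small sketch that only labels the lines of the example dump (it does not merge anything). The helper name classify_lines is hypothetical; it applies the same tests as the is_over_depth_line / is_open_bracket_line / is_close_bracket_line lambdas in dumps_json, and it assumes Python 3.7+ so the dict keeps its insertion order.

```python
import json

def classify_lines(lines, indent=2, depth=2):
    """Label each line of an indented JSON dump as 'normal', 'open',
    'over' (exceeds the depth) or 'close', using the same rules as dumps_json."""
    prefix = " " * (indent * (depth + 1))  # over-depth lines start with this

    def over(i):
        return 0 <= i < len(lines) and lines[i].startswith(prefix)

    labels = []
    for i in range(len(lines)):
        if over(i):
            labels.append("over")
        elif over(i + 1):
            labels.append("open")   # the next line goes deeper than the limit
        elif over(i - 1):
            labels.append("close")  # the previous line was deeper than the limit
        else:
            labels.append("normal")
    return labels

text = json.dumps({'0': {'1a': {'2a': None, '2b': None}, '1b': {'2': None}}}, indent=2)
lines = text.splitlines()
for number, (label, line) in enumerate(zip(classify_lines(lines), lines)):
    print("[%d] %-6s %s" % (number, label, line))
```

The labels line up with the explanation: open lines 2 and 6, over-depth lines 3, 4 and 7, close lines 5 and 8.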
An answer that works for me and for other Python 3 users:
import re

def jsonIndentLimit(jsonString, indent, limit):
    regexPattern = re.compile(f'\n({indent}){{{limit}}}(({indent})+|(?=(}}|])))')
    return regexPattern.sub('', jsonString)

if __name__ == '__main__':
    jsonString = '''{
  "layer1": {
    "layer2": {
      "layer3_1": [
        {
          "x": 1,
          "y": 7
        },
        {
          "x": 0,
          "y": 4
        },
        {
          "x": 5,
          "y": 3
        },
        {
          "x": 6,
          "y": 9
        }
      ],
      "layer3_2": "string"
    }
  }
}'''
    print(jsonIndentLimit(jsonString, '  ', 3))
    '''print
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1,"y": 7},{"x": 0,"y": 4},{"x": 5,"y": 3},{"x": 6,"y": 9}],
      "layer3_2": "string"
    }
  }
}'''
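If you want to start from a Python object rather than a hand-written string, the same regex idea can be applied directly to the output of json.dumps(). This is a sketch under a couple of assumptions: the indent is a number of spaces (not a tab), and it relies on json.dumps escaping newlines inside strings as \n, so every real newline in its output is a break between items. The function name dumps_indent_limit is made up for illustration.

```python
import json
import re

def dumps_indent_limit(obj, indent=2, limit=3, **kwargs):
    """json.dumps(obj, indent=indent), then join every line nested deeper than
    `limit` levels (and the bracket that closes them) onto the previous line."""
    step = " " * indent
    text = json.dumps(obj, indent=indent, **kwargs)
    # newline + `limit` indent steps, followed either by at least one more
    # step (a deeper line) or by a closing bracket at exactly that level
    pattern = re.compile(r"\n(?:%s){%d}(?:(?:%s)+|(?=[}\]]))" % (step, limit, step))
    return pattern.sub("", text)

data = {"layer1": {"layer2": {
    "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}],
    "layer3_2": "string"}}}
print(dumps_indent_limit(data))
```

With the question's data this puts the whole layer3_1 list on one line, the same as the expected output above; extra keyword arguments such as sort_keys=True are passed straight through to json.dumps.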
This solution is not as elegant or generic as the others, and you won't learn much from it, but it's quick and simple.
def custom_print(data_structure, indent):
    for key, value in data_structure.items():
        print "\n%s%s:" % (' '*indent,str(key)),
        if isinstance(value, dict):
            custom_print(value, indent+1)
        else:
            print "%s" % (str(value)),
Usage and output:
>>> custom_print(data_structure,1)
 layer1:
  layer2:
   layer3_2: string
   layer3_1: [{'y': 7, 'x': 1}, {'y': 4, 'x': 0}, {'y': 3, 'x': 5}, {'y': 9, 'x': 6}]
As a side note, this website has a built-in JavaScript beautifier that avoids line breaks in the JSON output for lines shorter than 70 characters:
http://www.csvjson.com/json_beautifier
(It was implemented using a modified version of JSON-js.)
Select "Inline short arrays".
It's great for quickly viewing data that you have in the copy buffer.
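The same "inline short arrays" idea can be sketched in Python. This is not the site's code, just a recursive sketch under my own assumptions: dict keys are strings, and the width test only looks at the length of a container's compact JSON, ignoring the indentation and key in front of it. The function name dumps_inline_short and its max_width parameter are hypothetical.

```python
import json

def dumps_inline_short(obj, indent=2, max_width=70, _level=0):
    """Pretty-print like json.dumps(obj, indent=indent), but keep any list or
    dict whose compact JSON is at most max_width characters on one line."""
    compact = json.dumps(obj)
    if not isinstance(obj, (list, dict)) or len(compact) <= max_width:
        return compact
    pad = " " * (indent * (_level + 1))   # indentation for the children
    close_pad = " " * (indent * _level)   # indentation for the closing bracket
    if isinstance(obj, list):
        parts = [dumps_inline_short(v, indent, max_width, _level + 1) for v in obj]
        return "[\n" + ",\n".join(pad + p for p in parts) + "\n" + close_pad + "]"
    parts = [json.dumps(k) + ": " + dumps_inline_short(v, indent, max_width, _level + 1)
             for k, v in obj.items()]
    return "{\n" + ",\n".join(pad + p for p in parts) + "\n" + close_pad + "}"

data = {"layer1": {"layer2": {
    "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}],
    "layer3_2": "string"}}}
out = dumps_inline_short(data, max_width=80)
print(out)
```

With the default max_width=70, the question's four-point list (72 characters in compact form) would be split with one point per line; raising the limit to 80 keeps it on a single line.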
Indeed, this is one of the things YAML does better than JSON.
I can't get NoIndentEncoder to work, but I can use a regex on the JSON string:
import re

def collapse_json(text, list_length=5):
    # try list lengths 0 .. list_length-1
    for length in range(list_length):
        re_pattern = r'\[' + (r'\s*(.+)\s*,' * length)[:-1] + r'\]'
        re_repl = r'[' + ''.join(r'\{}, '.format(i+1) for i in range(length))[:-2] + r']'
        text = re.sub(re_pattern, re_repl, text)
    return text
The question is, how do I perform this on a nested list?
Before:
[
  0,
  "any",
  [
    2,
    3
  ]
]
After:
[0, "any", [2, 3]]
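Regexes struggle with nesting because they can't match balanced brackets. If the text is a complete, valid JSON document (an assumption), a simpler route is to parse it and dump it again without indent, so the json module handles the nesting itself. collapse_all is a hypothetical helper name.

```python
import json

def collapse_all(text):
    """Re-serialize JSON text on a single line.
    Assumes `text` is one complete, valid JSON document."""
    return json.dumps(json.loads(text))

before = """[
  0,
  "any",
  [
    2,
    3
  ]
]"""
print(collapse_all(before))  # [0, "any", [2, 3]]
```

If only the levels below a certain depth should be collapsed, this parse-and-redump step is what dumps_json above applies to each over-depth block.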

Python - Print when a json object gets higher number

I want to create a script that checks a JSON file from time to time using a while loop. The file contains JSON that looks like this:
{
"names":[
{
"name":"hello",
"numbers":0
},
{
"name":"stack",
"numbers":1
},
{
"name":"over",
"numbers":2
},
{
"name":"flow",
"numbers":12
},
{
"name":"how",
"numbers":17
},
{
"name":"are",
"numbers":11
},
{
"name":"you",
"numbers":18
},
{
"name":"today",
"numbers":6
},
{
"name":"merry",
"numbers":4
},
{
"name":"x",
"numbers":1
},
{
"name":"mass",
"numbers":0
},
{
"name":"santa",
"numbers":4
},
{
"name":"hohoho",
"numbers":1
}
]
}
What I want to do is check, for each name, whether its "numbers" value has increased since the previous time the JSON was read.
def script():
    with open('data.json') as f:
        old_data = json.load(f)
    while True:
        with open('data.json') as f:
            new_data = json.load(f)
        if old_data < new_data:
            print("Bigger!!" + new_data['name'])
            old_data = new_data
        else:
            randomtime = random.randint(5, 15)
            print("Nothing increased")
            old_data = new_data
            time.sleep(randomtime)
I know I've done this wrong, and that's the reason I'm here. At the moment I have no idea how to write a function that checks the numbers one by one to see whether they have gotten bigger.
My question is:
How can I make it check object by object whether the numbers have gotten bigger since the previous loop? And if a number has not gotten bigger but lower, it should update the value of old_data and keep looping until the numbers are bigger than in the previous loop.
EDIT:
A recommendation that I got from @Karl:
{
'names': {
'hello': 0,
'stack': 0,
'over': 2,
'flow': 12,
'how': 17,
'are': 11,
'you': 18,
'today': 6,
'merry': 4,
'x': 1,
'mass': 0,
'santa': 4,
'hohoho': 1
}
}
Assuming your json is in this format:
{
"names": {
"hello": 0,
"stack": 1,
"over": 2,
"flow": 13,
"how": 17,
"are": 12,
"you": 18,
"today": 6,
"merry": 4,
"x": 1,
"mass": 0,
"santa": 4,
"hohoho": 1
}
}
I would do something along the following lines:
import json
import time

with open("data.json") as f:
    old_data = json.load(f)["names"]

while True:
    with open("data.json") as f:
        new_data = json.load(f)["names"]
    for name, number in new_data.items():
        if number > old_data[name]:
            print("Entry '{0}' has increased from {1} to {2}".format(name, old_data[name], number))
    old_data = new_data
    print("sleeping for 5 seconds")
    time.sleep(5)
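Pulling the comparison out of the while loop into a function makes it easy to try without a file or time.sleep(). This is a sketch assuming the flat "names" format above; increased_entries is a hypothetical name, and skipping names that weren't present in the previous read is my own choice.

```python
def increased_entries(old_data, new_data):
    """Return (name, old_number, new_number) for every name whose number went up.
    Names missing from old_data are skipped (an assumption: the loop above
    would raise KeyError for a name that appears for the first time)."""
    return [(name, old_data[name], number)
            for name, number in new_data.items()
            if name in old_data and number > old_data[name]]

old = {"hello": 0, "stack": 1, "flow": 12}
new = {"hello": 0, "stack": 1, "flow": 13, "extra": 5}
print(increased_entries(old, new))  # [('flow', 12, 13)]
```

Inside the while loop you would print each tuple it returns and then set old_data = new_data as before.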
EDIT to answer the question posted in a comment: "Just curious, let's say I want to add another value besides the numbers, e.g. "stack": 1, yes (a yes or no for each entry). What would I need to do in that case? (It's just a script that I want to develop from this.)"
In that case you should design your json input as follows:
{
"names": {
"hello": {
"number": 0,
"status": true
},
"stack": {
"number": 1,
"status": true
},
"over": {
"number": 2,
"status": false
},
...
}
}
You would need to change the lookups in the comparison script as follows:
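for name, values in new_data.items():
    if values["number"] > old_data[name]["number"]:
        print("Entry '{0}' has increased from {1} to {2}".format(name, old_data[name]["number"], values["number"]))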
(Note that for status you could also just use "yes" or "no" as values, but booleans are much more useful when you have to represent a binary choice like this.)
By the way, unless you aim to have objects other than names in this json, you can leave out that level and just make it:
{
"hello": {
"number": 0,
"status": true
},
"stack": {
"number": 1,
"status": true
},
"over": {
"number": 2,
"status": false
},
...
}
In that case, replace old_data = json.load(f)["names"] with old_data = json.load(f), and new_data = json.load(f)["names"] with new_data = json.load(f).
I took the original JSON that you presented (and edited) in your question and refactored your code into the example below. It appears to work.
import time
import random
import json

path_to_file = r"C:\path\to\.json"

def script():
    with open(path_to_file) as f:
        d = json.load(f)
    old_data = 0
    for a_list in d.values():
        for i in a_list:
            print()
            for d_keys, d_values in i.items():
                print(d_keys, d_values)
                if type(d_values) == int and d_values > old_data:
                    print("Bigger!!" + i['name'])
                    old_data = d_values
                elif type(d_values) == int and d_values < old_data:
                    print("Nothing increased")
                    old_data = d_values
    randomtime = random.randint(5, 15)
    time.sleep(randomtime)

script()
This is the output I receive:
name hello numbers 0
name stack numbers 1 Bigger!!stack
name over numbers 2 Bigger!!over
name flow numbers 12 Bigger!!flow
name how numbers 17 Bigger!!how
name are numbers 11 Nothing increased
name you numbers 18 Bigger!!you
name today numbers 6 Nothing increased
name merry numbers 4 Nothing increased
name x numbers 1 Nothing increased
name mass numbers 0 Nothing increased
name santa numbers 4 Bigger!!santa
name hohoho numbers 1 Nothing increased

Python pprint issues

I'm using the User object from the Google App Engine environment, and just tried the following:
pprint(user)
print vars(user)
The results:
pprint(user)
users.User(email='test@example.com',_user_id='18580000000000')
print vars(user)
{'_User__federated_identity': None, '_User__auth_domain': 'gmail.com',
'_User__email': 'test@example.com', '_User__user_id': '1858000000000',
'_User__federated_provider': None}
Several issues here (sorry for the multipart question):
1. Why am I not seeing all the variables in my object? It's not showing auth_domain, which has a value.
2. Is there a way to have it list properties that are None? None is a legitimate value, so why does it treat those properties as if they don't exist?
3. Is there a way to get pprint to line-break between properties?
pprint is printing the repr of the instance, while vars simply returns the instance's __dict__, whose repr is then printed. Here's an example:
>>> import pprint
>>> class Foo(object):
...     def __init__(self, a, b):
...         self.a = a
...         self.b = b
...     def __repr__(self):
...         return 'Foo(a=%s)' % self.a
...
>>> f = Foo(a=1, b=2)
>>> vars(f)
{'a': 1, 'b': 2}
>>> pprint.pprint(f)
Foo(a=1)
>>> vars(f) is f.__dict__
True
As you can see, the special method __repr__ (called by pprint(), the print statement, repr(), and others) explicitly includes only the a member, while the instance's __dict__ contains both a and b, as reflected in the dictionary returned by vars().
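The same thing explains the User case. Attributes whose names start with two underscores are name-mangled, which is why vars() shows keys like _User__auth_domain, and vars() includes attributes that are set to None; whether the printed form shows them is entirely up to the class's __repr__. The class below is a hypothetical stand-in, not the real App Engine User class.

```python
import pprint

class User(object):
    """Hypothetical stand-in: attributes with two leading underscores are
    stored in __dict__ under mangled names such as _User__email."""
    def __init__(self, email, auth_domain):
        self.__email = email
        self.__auth_domain = auth_domain
        self.__federated_identity = None  # None is stored like any other value

    def __repr__(self):
        # This repr chooses to show only the email
        return "User(email=%r)" % self.__email

u = User("test@example.com", "gmail.com")
pprint.pprint(u)         # User(email='test@example.com')
pprint.pprint(vars(u))   # every attribute, including the None one
```

To read a mangled attribute directly you would write u._User__auth_domain, although that relies on the class's private naming.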
There are a couple of ways to get different line breaks in an object print-dump of this kind.
Sample data:
d = dict(a=1, b=2, c=dict(d=3, e=[4, 5, 6], f=dict(g=7)), h=[8,9,10])
Standard print with no friendly spacing:
>>> print d
{'a': 1, 'h': [8, 9, 10], 'c': {'e': [4, 5, 6], 'd': 3, 'f': {'g': 7}}, 'b': 2}
Two possible solutions:
(1) Using pprint with width=1 gives you one leaf node per line, but possibly more than one key per line:
>>> import pprint
>>> pprint.pprint(d, width=1)
{'a': 1,
 'b': 2,
 'c': {'d': 3,
       'e': [4,
             5,
             6],
       'f': {'g': 7}},
 'h': [8,
       9,
       10]}
(2) Using json.dumps gives you at most one key per line, but some lines contain just a closing bracket:
>>> import json
>>> print json.dumps(d, indent=4)
{
    "a": 1,
    "h": [
        8,
        9,
        10
    ],
    "c": {
        "e": [
            4,
            5,
            6
        ],
        "d": 3,
        "f": {
            "g": 7
        }
    },
    "b": 2
}
In reference to question 3, "Is there a way to get pprint to line-break between properties?":
The Python docs describe it like this:
The formatted representation keeps objects on a single line if it can, and breaks them onto multiple lines if they don’t fit within the allowed width.
The "width" parameter (which can be passed to the constructor) is where you specify the allowed width. I set mine to width=1, and that seems to do the trick.
As an example:
pretty = pprint.PrettyPrinter(indent=2)
results in...
{ 'acbdf': { 'abdf': { 'c': { }}, 'cbdf': { 'bdf': { 'c': { }}, 'cbd': { }}},
  'cef': { 'abd': { }}}
whereas
pretty = pprint.PrettyPrinter(indent=2,width=1)
results in...
{ 'acbdf': { 'abdf': { 'c': { }},
             'cbdf': { 'bdf': { 'c': { }},
                       'cbd': { }}},
  'cef': { 'abd': { }}}
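Applied to question 3, width=1 can be combined with vars() so that each attribute lands on its own line. pprint.pformat takes the same width argument and returns the text instead of printing it, which makes the layout easy to check. The dict below just reuses two keys from the vars() output in the question; pprint sorts dict keys by default.

```python
import pprint

attrs = {"_User__federated_identity": None, "_User__auth_domain": "gmail.com"}
text = pprint.pformat(attrs, width=1)  # same layout rules as pprint.pprint
print(text)
# {'_User__auth_domain': 'gmail.com',
#  '_User__federated_identity': None}
```

With a real object you would call pprint.pprint(vars(user), width=1).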
Hope that helps.
