How to custom indent JSON in Python?

I want to be able to indent a JSON file so that the first key is on the same line as the opening brace; by default, json.dump() puts it on a new line.
For example, if the original file was:
[
    {
        "statusName": "CO_FILTER",
        "statusBar": {}
I want it to start like this:
[
    { "statusName": "CO_FILTER",
      "statusBar": {}

I believe that if you want custom formatting, you will most likely have to build on or modify a JSON library, just as the Google JSON formatter and many others do. It is not a value you can change on demand in Python's default library; it would have to come from a third-party JSON library, so it may be worth searching for one that dumps JSON the way you like.
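For this particular layout, though, one workaround (a minimal sketch, not a general-purpose formatter) is to dump normally with the standard library and then pull each object's first key up onto the line with its opening brace using a regular expression:
import json
import re

data = [{"statusName": "CO_FILTER", "statusBar": {}}]

# Dump with the standard indentation first...
text = json.dumps(data, indent=4)
# ...then move each object's first key onto the same line as its
# opening brace. JSON strings cannot contain raw newlines, so this
# substitution only ever touches whitespace between tokens.
text = re.sub(r'\{\n\s+"', '{ "', text)
print(text)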

Related

Splitting .txt file into elements

Since we cannot read directly from a JSON file, I'm using a .txt one.
It looks like this, with more elements separated by commas:
[
    {
        "Item_Identifier": "FDW58",
        "Outlet_Size": "Medium"
    },
    {
        "Item_Identifier": "FDW14",
        "Outlet_Size": "Small"
    },
]
I want to count the number of elements; here I would get 2.
The problem is that I can't split the text into elements separated by a comma ','.
I get every line as a separate element, even if I convert it into JSON format.
lines = (p
         | 'receive_data' >> beam.io.ReadFromText(known_args.input)
         | 'jsondumps' >> beam.Map(lambda x: json.dumps(x))
         | 'jsonloads' >> beam.Map(lambda x: json.loads(x))
         | 'print' >> beam.ParDo(PrintFn()))
I don't believe this is a safe approach. I haven't used the Python SDK (I use Java), but io.TextIO on the Java side is quite clear that it emits a PCollection where each element is one line of input from the source file. Hierarchical data formats (JSON, XML, etc.) are not amenable to being split apart this way.
If your file is as well-formatted and non-nested as the JSON you've included, you can get away with the following (see the sketch after this list):
reading the file line by line (as I believe you're doing)
filtering for only the lines containing }
counting the size of the resulting PCollection
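A minimal Python sketch of that line-counting approach (the input path 'names.txt' and the flat JSON layout are assumptions taken from the question):
import apache_beam as beam

with beam.Pipeline() as p:
    count = (p
             | 'read' >> beam.io.ReadFromText('names.txt')
             # keep only the lines that close an object; in the flat
             # layout above, each element contributes exactly one
             | 'closing_braces' >> beam.Filter(lambda line: '}' in line)
             | 'count' >> beam.combiners.Count.Globally()
             | 'print' >> beam.Map(print))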
To integrate with JSON more generally, though, we took a different approach:
start with a PCollection of strings where each value is the path to a file
use native libraries to access the file and parse it in a streaming fashion (we use Scala, which has a few streaming JSON parsing libraries available)
alternatively, use Beam's APIs to obtain a ReadableFile instance from a MatchResult and access the file through that
My understanding has been that not all file formats are good matches for distributed processors. Gzip, for example, cannot be split or chunked easily. The same goes for JSON. CSV has the issue that its values are meaningless unless you also have the header line handy.
A .json file is simply a text file whose contents are in JSON-parseable format.
Your JSON is invalid because it has a trailing comma. This should work:
import json
j = r"""
[
{
"Item_Identifier": "FDW58",
"Outlet_Size": "Medium"
},
{
"Item_Identifier": "FDW14",
"Outlet_Size": "Small"
}
]
"""
print(json.loads(j))

YAML vs Python configuration/parameter files (but perhaps also vs JSON vs XML)

I see Python used to do a fair amount of code generation for C/C++ header and source files. Usually, the input files which store parameters are in JSON or YAML format, although most of what I see is YAML. However, why not just use Python files directly? Why use YAML at all in this case?
That also got me thinking: since Python is an interpreted language, its files, when they contain only data and data structures, could be used the same way as XML, JSON, YAML, etc. Do people do this? Is there a good use case for it?
What if I want to import a configuration file into a C or C++ program? What about into a Python program? In the Python case, it seems there is no point in using YAML at all, since you can just store your configuration parameters and variables in pure Python files. In the C or C++ case, it seems you could still store your data in Python files and have a Python script import them and auto-generate header and source files as part of the build process. Again, perhaps there's no need for YAML or JSON in that case either.
Thoughts?
Here's an example of storing some nested key/value hash table pairs in a YAML file:
my_params.yml:
---
dict_key1:
    dict_key2:
        dict_key3a: my string message
        dict_key3b: another string message
And the same exact thing in a pure Python file:
my_params.py
data = {
    "dict_key1": {
        "dict_key2": {
            "dict_key3a": "my string message",
            "dict_key3b": "another string message",
        }
    }
}
And to read in both the YAML and Python data and print it out:
import_config_file.py:
import yaml  # module for reading in YAML files
import json  # module for pretty-printing Python dictionary types
# See: https://stackoverflow.com/a/34306670/4561887

# 1) import the .yml file
with open("my_params.yml", "r") as f:
    data_yml = yaml.safe_load(f)  # safe_load avoids executing arbitrary tags

# 2) import the .py file
from my_params import data as data_py
# OR: alternative way of doing the above:
# import my_params
# data_py = my_params.data

# 3) print them out
print("data_yml = ")
print(json.dumps(data_yml, indent=4))
print("\ndata_py = ")
print(json.dumps(data_py, indent=4))
Reference for using json.dumps: https://stackoverflow.com/a/34306670/4561887
SAMPLE OUTPUT of running python3 import_config_file.py:
data_yml =
{
    "dict_key1": {
        "dict_key2": {
            "dict_key3a": "my string message",
            "dict_key3b": "another string message"
        }
    }
}

data_py =
{
    "dict_key1": {
        "dict_key2": {
            "dict_key3a": "my string message",
            "dict_key3b": "another string message"
        }
    }
}
Yes people do this, and have been doing this for years.
But many make the mistake you make and render it unsafe by using import my_params. That is the equivalent of loading YAML using YAML(typ='unsafe') in ruamel.yaml (or yaml.load() in PyYAML, which is unsafe).
What you should do is use the ast module that comes with Python to parse your "data" structure, which makes such an import safe. My package pon has code to update these kinds of structures, and in each of my __init__.py files there is such a piece of data, named _package_data, that is read by some code (the function literal_eval) in the package's setup.py. The ast-based code in setup.py takes around 100 lines.
The advantages of doing this in a structured way are the same as with YAML: you can programmatically update the data structure (version numbers!), although I consider PON (Python Object Notation) less readable than YAML and slightly less easy to update by hand.
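A minimal sketch of that ast-based approach, reusing the my_params.py file from the earlier example (the assignment name data is taken from that example):
import ast

# Parse my_params.py without importing it, so no code in the file
# is ever executed.
with open("my_params.py") as f:
    tree = ast.parse(f.read())

data = None
for node in tree.body:
    # Look for the top-level assignment "data = {...}".
    if (isinstance(node, ast.Assign)
            and isinstance(node.targets[0], ast.Name)
            and node.targets[0].id == "data"):
        # literal_eval only evaluates literals (dicts, lists, strings,
        # numbers, ...), so this is safe even for untrusted files.
        data = ast.literal_eval(node.value)
        break

print(data)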

json.load changing the string that is input

Hi, I am working on a simple program that takes data from a JSON file (input through an HTML form, with Flask handling the data) and uses this data to make calls to an API.
So I have some JSON like this:
[{"id": "ßLÙ", "server": "NA"}]
and I want to send the id to an api call like this example:
http://apicallnamewhatever+id=ßLÙ
However, when I load the JSON file into my app.py with the following command:
ids = json.load(open('../names.json'))
json.load seems to alter the id from 'ßLÙ' to 'ßLÙ'.
I'm not sure why this happens during json.load, but I need to find a way to get 'ßLÙ' into the API call instead of the deformed 'ßLÙ'.
It looks as if your names.json is encoded in "utf-8", but you are opening it as "windows-1252" [*] or something like that. Try
json.load(open('names.json', encoding="utf-8"))
and you probably should also URL-encode the id instead of concatenating it directly with that server address, something along these lines:
urllib2.quote(idExtractedFromJson.encode("utf-8"))
[*] Thanks @jDo for pointing that out; I initially guessed the wrong codepage.
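Putting the two suggestions together, a minimal Python 3 sketch (urllib.parse.quote is the Python 3 counterpart of urllib2.quote; the file path and URL are the placeholders from the question):
import json
import urllib.parse

# Open the file with the encoding it was actually written in.
with open('../names.json', encoding='utf-8') as f:
    ids = json.load(f)

for entry in ids:
    # URL-encode the id instead of concatenating it in raw form.
    url = 'http://apicallnamewhatever+id=' + urllib.parse.quote(entry['id'])
    print(url)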

What is the difference between loads/dumps and reading a JSON file directly?

I have a simple JSON file 'stackoverflow.json':
{
    "firstname": "stack",
    "lastname": "overflow"
}
What is the difference between the two functions below?
import ujson

def read_json_file_1():
    with open('stackoverflow.json') as fs:
        payload = ujson.loads(fs.read())
        payload = ujson.dumps(payload)
        return payload

def read_json_file_2():
    with open('stackoverflow.json') as fs:
        return fs.read()
Then I use the 'requests' module to send a POST request with the payload from the two functions above, and it works for both.
Thanks.
The loads function takes a JSON string and converts it into a dictionary or list, depending on the exact JSON.
The dumps function takes a Python data structure and converts it back into JSON.
So your first function loads and validates the JSON, converts it into a Python structure, and then converts it back to JSON before returning, whereas your second function just reads the content of the file, with no conversion or validation.
The functions are therefore only equivalent if the JSON is valid; if there are JSON errors, the two will execute very differently. If you know the file contains error-free JSON, the two functions should return equivalent output. If the file contains errors within the JSON, however, the first function will fail with relevant tracebacks etc., while the second won't generate any errors at all. The first function is more error-friendly but less efficient (since it converts JSON -> Python -> JSON); the second is much more efficient but far less error-friendly (i.e. it won't fail if the JSON is broken).
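A small illustration of that difference, assuming a hypothetical broken.json that contains invalid JSON (say, a trailing comma):
import ujson

with open('broken.json') as fs:
    raw = fs.read()   # read_json_file_2 style: returns the text, no error

ujson.loads(raw)      # read_json_file_1 style: raises ValueError here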

Configuration file with list of key-value pairs in python

I have a python script that analyzes a set of error messages and checks for each message if it matches a certain pattern (regular expression) in order to group these messages. For example "file x does not exist" and "file y does not exist" would match "file .* does not exist" and be accounted as two occurrences of "file not found" category.
As the number of patterns and categories is growing, I'd like to put these couples "regular expression/display string" in a configuration file, basically a dictionary serialization of some sort.
I would like this file to be editable by hand, so I'm discarding any form of binary serialization, and I'd also rather not resort to XML serialization, to avoid problems with characters to escape (&, <, > and so on).
Do you have any idea of what could be a good way of accomplishing this?
Update: thanks to Daren Thomas and Federico Ramponi, but I cannot have an external python file with possibly arbitrary code.
I sometimes just write a python module (i.e. file) called config.py or something with following contents:
config = {
    'name': 'hello',
    'see?': 'world'
}
this can then be 'read' like so:
from config import config
config['name']
config['see?']
easy.
You have two decent options:
Python standard config file format, using ConfigParser
YAML using a library like PyYAML
The standard Python configuration files look like INI files, with [sections] and key : value or key = value pairs (a minimal sketch follows the list below). The advantages of this format are:
No third-party libraries necessary
Simple, familiar file format.
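A minimal sketch of the ConfigParser option for the error-pattern mapping (the file name patterns.ini and the section name are made up for illustration; configparser is the Python 3 name of the module):
import configparser

# patterns.ini is assumed to contain:
#
#   [patterns]
#   file .* does not exist = file not found
#   user .* not found = authorization error

config = configparser.ConfigParser()
config.read('patterns.ini')

for regex, message in config['patterns'].items():
    print(regex, '->', message)
One caveat: ConfigParser lowercases option names by default, which is harmless for all-lowercase patterns like these.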
YAML is different in that it is designed to be a human friendly data serialization format rather than specifically designed for configuration. It is very readable and gives you a couple different ways to represent the same data. For your problem, you could create a YAML file that looks like this:
file .* does not exist : file not found
user .* not found : authorization error
Or like this:
{ file .* does not exist: file not found,
user .* not found: authorization error }
Using PyYAML couldn't be simpler:
import yaml
errors = yaml.safe_load(open('my.yaml'))
At this point errors is a Python dictionary with the expected format. YAML is capable of representing more than dictionaries: if you prefer a list of pairs, use this format:
-
  - file .* does not exist
  - file not found
-
  - user .* not found
  - authorization error
Or
[ [file .* does not exist, file not found],
[user .* not found, authorization error]]
Which will produce a list of lists when loaded.
One advantage of YAML is that you could use it to export your existing, hard-coded data out to a file to create the initial version, rather than cut/paste plus a bunch of find/replace to get the data into the right format.
The YAML format will take a little more time to get familiar with, but using PyYAML is even simpler than using ConfigParser, with the advantage that you have more options for how your data is represented.
Either one sounds like it will fit your current needs; ConfigParser will be easier to start with, while YAML gives you more flexibility in the future if your needs expand.
Best of luck!
I've heard that ConfigObj is easier to work with than ConfigParser. It is used by a lot of big projects, IPython, Trac, Turbogears, etc...
From their introduction:
ConfigObj is a simple but powerful config file reader and writer: an ini file round tripper. Its main feature is that it is very easy to use, with a straightforward programmer's interface and a simple syntax for config files. It has lots of other features though:
Nested sections (subsections), to any level
List values
Multiple line values
String interpolation (substitution)
Integrated with a powerful validation system
including automatic type checking/conversion
repeated sections
and allowing default values
When writing out config files, ConfigObj preserves all comments and the order of members and sections
Many useful methods and options for working with configuration files (like the 'reload' method)
Full Unicode support
I think you want the ConfigParser module in the standard library. It reads and writes INI style files. The examples and documentation in the standard documentation I've linked to are very comprehensive.
If you are the only one that has access to the configuration file, you can use a simple, low-level solution. Keep the "dictionary" in a text file as a list of tuples (regexp, message) exactly as if it was a python expression:
[
    ("file .* does not exist", "file not found"),
    ("user .* not authorized", "authorization error")
]
In your code, load it, then eval it, and compile the regexps in the result:
import re

f = open("messages.py")
messages = eval(f.read())  # caution: you must be sure of what's in that file
f.close()
messages = [(re.compile(r), m) for (r, m) in messages]
and you end up with a list of tuples (compiled_regexp, message).
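If you can't be sure of the file's contents, a safer variant (in the spirit of the ast-based approach mentioned in the YAML-vs-Python question above) is to swap eval for ast.literal_eval, which only accepts Python literals and cannot execute code:
import ast
import re

with open("messages.py") as f:
    # literal_eval refuses anything but literal expressions, so
    # arbitrary code in the file cannot be executed.
    messages = ast.literal_eval(f.read())

messages = [(re.compile(r), m) for (r, m) in messages]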
I typically do as Daren suggested, just make your config file a Python script:
patterns = {
    'file .* does not exist': 'file not found',
    'user .* not found': 'authorization error',
}
Then you can use it as:
import re
import config

for pattern in config.patterns:
    if re.search(pattern, log_message):
        print(config.patterns[pattern])
This is what Django does with their settings file, by the way.
