Python search and replace with multiple JSON objects

I wasn't sure how to search for this, but I am trying to make a script that dynamically launches programs. I will have a couple of JSON files and I want to be able to do a search-and-replace sort of thing.
So I'll set up an example:
config.json
{
    "global_vars": {
        "BASEDIR": "/app",
        "CONFIG_DIR": "{BASEDIR}/config",
        "LOG_DIR": "{BASEDIR}/log",
        "CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive"
    }
}
Then process.json
{
    "name": "Dummy_Process",
    "binary": "java",
    "executable": "DummyProcess-0.1.0.jar",
    "launch_args": "-Dspring.config.location={CONFIG_DIR}/application.yml -Dlogging.config={CONFIG_DIR}/logback-spring.xml -jar {executable}",
    "startup_log": "{LOG_DIR}/startup_{name}.out"
}
Now I want to be able to load both of these JSON objects and use the values from each to resolve the placeholders, so that "CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive" becomes "CONFIG_ARCHIVE_DIR": "/app/config/archive".
Does anyone know a good way to do this recursively? I'm running into issues when I hit something like CONFIG_DIR, which requires BASEDIR to be resolved first.
I have this function that loads all the data:
# Recursive function: loops and loads all values into data
def _load_data(data, obj):
    for i in obj.keys():
        if isinstance(obj[i], str):
            data[i] = obj[i]
        if isinstance(obj[i], dict):
            data = _load_data(data, obj[i])
    return data
Then I have this function:
def _update_data(data, data_str=""):
    if not data_str:
        data_str = json.dumps(data)
    for i in data.keys():
        if isinstance(data[i], str):
            data_str = data_str.replace("{" + i + "}", data[i])
        if isinstance(data[i], dict):
            data = _update_data(data, data_str)
    return json.loads(data_str)
So this works for one level, but I don't know if this is the best way to do it. It stops working when I hit a case like CONFIG_DIR because it would need to loop over the data multiple times: first it needs to update BASEDIR, then once more to update CONFIG_DIR. Suggestions welcome.
The end goal of this script is to create a start/stop/status script to manage all of our binaries. They all use different binaries to start, and I want one processes file for multiple servers. Each process will have a servers array to tell the start/stop script what to run on a given server. Maybe there's something like this already out there, so if there is, please point me in that direction.
I will be running on Linux and prefer to use Python. I want something smart and easy for someone else to pick up and use/modify.

I made something that works with the example files you provided. Note that I didn't handle multiple keys or non-dictionaries in the data. This function accepts a list of the dictionaries obtained after JSON parsing your input files. It uses the fact that re.sub can accept a function for the replacement value and calls that function with each match. I am sure there are plenty of improvements that could be made to this, but it should get you started at least.
import re

def make_config(configs):
    replacements = {}

    def find_defs(config):
        # Find leaf nodes of the dictionary.
        defs = {}
        for k, v in config.items():
            if isinstance(v, dict):
                # Nested dictionary so recurse.
                defs.update(find_defs(v))
            else:
                defs[k] = v
        return defs

    for config in configs:
        replacements.update(find_defs(config))

    def make_replacement(m):
        # Construct the replacement string.
        name = m.group(0).strip('{}')
        if name in replacements:
            # Replace replacement strings in the replacement string.
            new = re.sub(r'\{[^}]+\}', make_replacement, replacements[name])
            # Cache result
            replacements[name] = new
            return new
        raise Exception('Replacement string for {} not found'.format(name))

    finalconfig = {}
    for name, value in replacements.items():
        finalconfig[name] = re.sub(r'\{[^}]+\}', make_replacement, value)
    return finalconfig
With this input:
[
    {
        "global_vars": {
            "BASEDIR": "/app",
            "CONFIG_DIR": "{BASEDIR}/config",
            "LOG_DIR": "{BASEDIR}/log",
            "CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive"
        }
    },
    {
        "name": "Dummy_Process",
        "binary": "java",
        "executable": "DummyProcess-0.1.0.jar",
        "launch_args": "-Dspring.config.location={CONFIG_DIR}/application.yml -Dlogging.config={CONFIG_DIR}/logback-spring.xml -jar {executable}",
        "startup_log": "{LOG_DIR}/startup_{name}.out"
    }
]
It gives this output:
{
    'BASEDIR': '/app',
    'CONFIG_ARCHIVE_DIR': '/app/config/archive',
    'CONFIG_DIR': '/app/config',
    'LOG_DIR': '/app/log',
    'binary': 'java',
    'executable': 'DummyProcess-0.1.0.jar',
    'launch_args': '-Dspring.config.location=/app/config/application.yml -Dlogging.config=/app/config/logback-spring.xml -jar DummyProcess-0.1.0.jar',
    'name': 'Dummy_Process',
    'startup_log': '/app/log/startup_Dummy_Process.out'
}

As an alternative to the answer by @FamousJameous, and if you don't mind switching to INI format, you can also use the Python built-in configparser, which already has support for expanding variables.
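A rough sketch of what the same data could look like in INI form with ExtendedInterpolation (the section layout and option names here are just assumptions, and launch_args is left out for brevity):

import configparser

ini_text = """
[global_vars]
BASEDIR = /app
CONFIG_DIR = ${BASEDIR}/config
LOG_DIR = ${BASEDIR}/log
CONFIG_ARCHIVE_DIR = ${CONFIG_DIR}/archive

[process]
name = Dummy_Process
binary = java
executable = DummyProcess-0.1.0.jar
startup_log = ${global_vars:LOG_DIR}/startup_${name}.out
"""

parser = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
parser.read_string(ini_text)

print(parser['global_vars']['CONFIG_ARCHIVE_DIR'])  # /app/config/archive
print(parser['process']['startup_log'])             # /app/log/startup_Dummy_Process.out

ExtendedInterpolation resolves ${option} within a section and ${section:option} across sections, so the chained BASEDIR -> CONFIG_DIR -> CONFIG_ARCHIVE_DIR case works without any extra code.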

I implemented a solution with a class (Config) and a few methods:
_load: simply converts from JSON to a Python object;
_extract_params: loops over the document (the output of _load) and adds the parameters to a class attribute (self.params);
_loop: loops over the object returned by _extract_params and, if a value contains any {param}, calls the _transform method;
_transform: replaces the {param} placeholders in the values with the correct values; if the value linked to the param that needs to be replaced itself contains a '{', the function is called again.
I hope I was clear enough, here is the code:
import json
import re
config = """{
    "global_vars": {
        "BASEDIR": "/app",
        "CONFIG_DIR": "{BASEDIR}/config",
        "LOG_DIR": "{BASEDIR}/log",
        "CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive"
    }
}"""

process = """{
    "name": "Dummy_Process",
    "binary": "java",
    "executable": "DummyProcess-0.1.0.jar",
    "launch_args": "-Dspring.config.location={CONFIG_DIR}/application.yml -Dlogging.config={CONFIG_DIR}/logback-spring.xml -jar {executable}",
    "startup_log": "{LOG_DIR}/startup_{name}.out"
}
"""
class Config(object):
    def __init__(self, documents):
        self.documents = documents
        self.params = {}
        self.output = {}

    # Loads JSON to dictionary
    def _load(self, document):
        obj = json.loads(document)
        return obj

    # Extracts the config parameters in a dictionary
    def _extract_params(self, document):
        for k, v in document.items():
            if isinstance(v, dict):
                # Recursion for inner dictionaries
                self._extract_params(v)
            else:
                # if not a dict, set params[k] as v
                self.params[k] = v
        return self.params

    # Loop on the configs dictionary
    def _loop(self, params):
        for key, value in params.items():
            # if there is any parameter inside the value
            if len(re.findall(r'{([^}]*)\}', value)) > 0:
                findings = re.findall(r'{([^}]*)\}', value)
                # call the transform function
                self._transform(params, key, findings)
        return self.output

    # Replace all the findings with the correct value
    def _transform(self, object, key, findings):
        # Iterate over the found params
        for finding in findings:
            # if { -> recursion to set all the needed values right
            if '{' in object[finding]:
                self._transform(object, finding, re.findall(r'{([^}]*)\}', object[finding]))
            # Do the actual replace
            object[key] = object[key].replace('{' + finding + '}', object[finding])
        self.output = object
        return self.output

    # Entry point
    def process_document(self):
        params = {}
        # _load the documents and extract the params
        for document in self.documents:
            params.update(self._extract_params(self._load(document)))
        # _loop over the params
        return self._loop(params)
        # return self.output


if __name__ == '__main__':
    config = Config([config, process])
    print(config.process_document())
I am sure there are many other and better ways to reach your goal, but I still hope this can be useful to you.
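For what it's worth, the multi-pass problem from the question can also be handled without recursion: flatten the parameters as both answers do, then keep substituting until a pass makes no more changes. A minimal sketch of that idea (not taken from either answer):

import re

def resolve(params, max_passes=10):
    """Repeatedly substitute {NAME} placeholders until nothing changes."""
    pattern = re.compile(r'\{([^}]+)\}')
    for _ in range(max_passes):
        changed = False
        for key, value in params.items():
            # Leave unknown placeholders untouched instead of raising.
            new = pattern.sub(lambda m: params.get(m.group(1), m.group(0)), value)
            if new != value:
                params[key] = new
                changed = True
        if not changed:
            break
    return params

flat = {"BASEDIR": "/app", "CONFIG_DIR": "{BASEDIR}/config",
        "CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive"}
print(resolve(flat)["CONFIG_ARCHIVE_DIR"])  # /app/config/archive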

Related

Massaging XML to JSON output for front-end parsing

Using xmltodict (v0.12.0) in Python, I have an XML document that gets parsed and converted into JSON format. For example:
XML:
<test temp="temp" temp2="temp2">This is a test</test>
Will get converted to the following JSON:
"test": {
    "@temp": "temp",
    "@temp2": "temp2",
    "#text": "This is a test"
}
I have a front end parser that reads JSON objects and converts them into XML. Unfortunately, the tags are required to be shaped in a different way.
What the front end parser expects:
{
    test: {
        "@": {
            temp: "temp",
            temp2: "temp2"
        },
        "#": "This is a test"
    }
}
I feel like this reshaping is better done in Python, but I am having a bit of trouble iterating a much larger dictionary, where we don't know how deep an XML document will go, collecting all of the keys that start with "@" and giving them their own object within the overall tag object. What are some ways I could approach shaping this data?
For anyone who is curious, this is how I ended up solving the issue. Like @furas stated, I decided that recursion was my best bet. I ended up iterating through my original XML data that I had converted to JSON with the incorrect attribute formatting, then creating a copy while finding any attribute markers:
def structure_xml(data):
    curr_dict = {}
    for key, value in data.items():
        if isinstance(value, dict):
            curr_dict[key] = structure_xml(value)
        elif isinstance(value, list):
            value_list = []
            for val in value:
                if isinstance(val, dict) or isinstance(val, list):
                    value_list.append(structure_xml(val))
            curr_dict[key] = value_list
        else:
            if '@' in key:
                new_key = key.split("@", 1)[1]
                new_obj = {new_key: value}
                if "@" in curr_dict:
                    curr_dict["@"][new_key] = value
                else:
                    curr_dict["@"] = new_obj
            elif '#text' in key:
                curr_dict['#'] = data[key]
            else:
                curr_dict[key] = data[key]
    return curr_dict
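A rough usage sketch with the sample XML from the question (the variable names here are only for illustration):

import json
import xmltodict

xml = '<test temp="temp" temp2="temp2">This is a test</test>'
parsed = xmltodict.parse(xml)  # {'test': {'@temp': 'temp', '@temp2': 'temp2', '#text': 'This is a test'}}
print(json.dumps(structure_xml(parsed)))
# {"test": {"@": {"temp": "temp", "temp2": "temp2"}, "#": "This is a test"}}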

How to Parse YAML Using PyYAML if there are '!' within the YAML

I have a YAML file from which I'd like to parse only the description variable; however, I know that the exclamation points in my CloudFormation template (YAML file) are giving PyYAML trouble.
I am receiving the following error:
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!Equals'
The file has many !Ref and !Equals tags. How can I ignore these constructors and get the specific variable I'm looking for -- in this case, the description variable?
If you have to deal with a YAML document with multiple different tags, and are only interested in a subset of them, you should still handle them all. If the elements you are interested in are nested within other tagged constructs, you at least need to handle all of the "enclosing" tags properly.
There is, however, no need to handle all of the tags individually; you can write one constructor routine that handles mappings, sequences and scalars, and register it for PyYAML's SafeLoader using:
import yaml

inp = """\
MyEIP:
  Type: !Join [ "::", [AWS, EC2, EIP] ]
  Properties:
    InstanceId: !Ref MyEC2Instance
"""

description = []

def any_constructor(loader, tag_suffix, node):
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node)
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node)
    return loader.construct_scalar(node)

yaml.add_multi_constructor('', any_constructor, Loader=yaml.SafeLoader)

data = yaml.safe_load(inp)
print(data)
which gives:
{'MyEIP': {'Type': ['::', ['AWS', 'EC2', 'EIP']], 'Properties': {'InstanceId': 'MyEC2Instance'}}}
(inp can also be a file opened for reading).
As you can see, the above will also continue to work if an unexpected !Join tag shows up in your code, as well as any other tag like !Equals. The tags are just dropped.
Since there are no variables in YAML, it is a bit of guesswork what you mean by "like to parse the description variable only". If that has an explicit tag (e.g. !Description), you can filter out the values by adding 2-3 lines to the any_constructor, by matching the tag_suffix parameter:
    if tag_suffix == u'!Description':
        description.append(loader.construct_scalar(node))
It is however more likely that there is some key in a mapping that is a scalar description,
and that you are interested in the value associated with that key.
    if isinstance(node, yaml.MappingNode):
        d = loader.construct_mapping(node)
        for k in d:
            if k == 'description':
                description.append(d[k])
        return d
If you know the exact position in the data hierarchy, you can of course also walk the data structure and extract anything you need based on keys or list positions. Especially in that case you'd be better off using my ruamel.yaml, as this can load tagged YAML in round-trip mode without extra effort (assuming the above inp):
from ruamel.yaml import YAML

with YAML() as yaml:
    data = yaml.load(inp)
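As a sketch of that "walk the data structure" approach (the 'Description' key name is only an assumption about your template; adjust it to whatever you actually use):

def find_descriptions(node, found=None):
    """Collect every value stored under a 'Description' key, at any depth."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == 'Description':
                found.append(v)
            find_descriptions(v, found)
    elif isinstance(node, list):
        for item in node:
            find_descriptions(item, found)
    return found

print(find_descriptions(data))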
You can define custom constructors using a custom yaml.SafeLoader:
import yaml

doc = '''
Conditions:
  CreateNewSecurityGroup: !Equals [!Ref ExistingSecurityGroup, NONE]
'''

class Equals(object):
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return "Equals(%s)" % self.data

class Ref(object):
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return "Ref(%s)" % self.data

def create_equals(loader, node):
    value = loader.construct_sequence(node)
    return Equals(value)

def create_ref(loader, node):
    value = loader.construct_scalar(node)
    return Ref(value)

class Loader(yaml.SafeLoader):
    pass

yaml.add_constructor(u'!Equals', create_equals, Loader)
yaml.add_constructor(u'!Ref', create_ref, Loader)

a = yaml.load(doc, Loader)
print(a)
Outputs:
{'Conditions': {'CreateNewSecurityGroup': Equals([Ref(ExistingSecurityGroup), 'NONE'])}}

How to best optimize this Python program for good performance?

I have a big JSON file (a sample is shown below) for my application, with various variables whose values are strings and integers. I would like to read this file and store the values in different class variables for further processing. These class variables will change based on the functionality. I would like to know of any ideas for further optimizing the code below. Right now I am explicitly copying the data without any list comprehensions or other technique. Any ideas to avoid copying data like config.ID = str(self.data["id"]), config.ACTIVE = int(self.data["isActive"]) in a more efficient way? (If I have 1000 variables, I need to write 1000 lines.)
read_con.py
-----------
import json

class config:
    ID = None
    ACTIVE = None
    AGE = None
    NAME = None
    GEN = None
    COM = None
    EMAIL = None

    def __init__(self):
        self.data = {}

    def read_config_data(self, cfile):
        try:
            with open(cfile, 'r') as cd:
                self.data = json.load(cd)
        except Exception:
            print("Error in Read file")
            self.data = {}
        else:
            # HOW TO AVOID COPY OF DATA AS BELOW.
            config.ID = str(self.data["id"])
            config.ACTIVE = int(self.data["isActive"])
            config.AGE = int(self.data["age"])
            config.NAME = str(self.data["name"])
            config.GEN = str(self.data["gender"])
            config.COM = str(self.data["company"])
            config.EMAIL = str(self.data["email"])

    def use_variables_modify_based_on_request(self):
        config.AGE = 45
        config.ACTIVE = 8
        config.EMAIL = "x@gmail.com"

    def printvalues(self):
        print config.ID, config.ACTIVE, config.AGE, config.NAME, config.EMAIL

if __name__ == "__main__":
    obj = config()
    obj.read_config_data("sample.json")
    obj.printvalues()
    # Modifying the values of class variables in different functions.
    obj.use_variables_modify_based_on_request()
    obj.printvalues()
sample.json file
-----------------
{
    "id": "59761c233d8d0",
    "isActive": 1,
    "age": 24,
    "name": "Kirsten Sellers",
    "gender": "female",
    "company": "EMERGENT",
    "email": "kirstensellers@emergent.com"
}
Instead of this:
...
else:
    # HOW TO AVOID COPY OF DATA AS BELOW.
    config.ID = str(self.data["id"])
    config.ACTIVE = int(self.data["isActive"])
    config.AGE = int(self.data["age"])
    config.NAME = str(self.data["name"])
    config.GEN = str(self.data["gender"])
    config.COM = str(self.data["company"])
    config.EMAIL = str(self.data["email"])
...
Do this:
...
else:
    for key, value in self.data.items():
        setattr(config, key.upper(), value)
...
(there is no need for the str and int calls since the values are already the appropriate type)
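One caveat worth noting: with setattr the attribute names come straight from the JSON keys, so for the sample file you get config.ISACTIVE, config.GENDER and config.COMPANY rather than the ACTIVE, GEN and COM names used in the original class. A minimal standalone sketch of the idea (assuming the same sample.json):

import json

class config:
    pass

with open("sample.json") as cd:
    for key, value in json.load(cd).items():
        setattr(config, key.upper(), value)

print(config.ID, config.ISACTIVE, config.AGE, config.NAME, config.EMAIL)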

python iterate json file where the json structure and key values are unknown

Consider the sample JSON below.
{
    "widget": {
        "test": "on",
        "window": {
            "title": "myWidget1",
            "name": "main_window"
        },
        "image": {
            "src": "Images/wid1.png",
            "name": "wid1"
        }
    },
    "os": {
        "name": "ios"
    }
}
Consider the case where we don't know the structure of the JSON or any of the keys. What I need to implement is a Python function which iterates through all the keys and sub-keys and prints each key. That is, knowing only the JSON file name, I should be able to iterate over all the keys and sub-keys. The JSON can be of any structure. What I have tried is given below.
JSON_PATH = "D:\workspace\python\sampleJSON.json"
os.path.expanduser(JSON_PATH)

def iterateAllKeys(e):
    for key in e.iterkeys():
        print key
        for child in key.get(key):
            iterateAllKeys(child)

with open(JSON_PATH) as data_file:
    data = json.load(data_file)
    iterateAllKeys(data)
Here, the iterateAllKeys() function is supposed to print all the keys present in the JSON file. But if only the outer loop is present, i.e.
def iterateAllKeys(e):
    for key in e.iterkeys():
        print key
It will print the keys "widget" and "os". But,
def iterateAllKeys(e):
    for key in e.iterkeys():
        print key
        for child in key.get(key):
            iterateAllKeys(child)
returns an error: AttributeError: 'unicode' object has no attribute 'get'. My understanding is that since the value of 'child' is not a dict object, we cannot apply 'key.get()'. But is there an alternate way by which I can iterate over the JSON file without specifying any of the key names? Thank you.
You can use recursion to iterate through multi level dictionaries like this:
def iter_dict(dic):
    for key in dic:
        print(key)
        if isinstance(dic[key], dict):
            iter_dict(dic[key])
The keys of the first dictionary are iterated and every key is printed, if the item is an instance of dict class, we can use recursion to also iterate through the dictionaries we encounter as items.
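A quick usage sketch with the sample document from the question, assuming it is saved as sample.json (note that this simple version does not descend into lists, which the sample does not contain):

import json

with open("sample.json") as data_file:
    data = json.load(data_file)

iter_dict(data)
# Prints: widget, test, window, title, name, image, src, name, os, name (one key per line)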
You can do this through an auxiliary package like flatten_json.
pip install flatten_json
from flatten_json import flatten

for key in flatten(your_dict).keys():
    print(key)
Output:
widget_test
widget_window_title
widget_window_name
widget_image_src
widget_image_name
os_name
If you want to show only the key without the whole path, you can do it like this:
print(key.split('_')[-1])
First of all, your last function:
def iterateAllKeys(e):
    for key in e.iterkeys():
        print key
        for child in key.get(key):
            iterateAllKeys(child)
key is just the key value of the dictionary. So if anything, you should be using e.get(key) or e[key]:
for child in e.get(key):
Now this alone would not solve your problem; one workaround is using try/except, as follows:
def iterateAllKeys(e):
    for key in e.iterkeys():
        print key
        try:
            iterateAllKeys(e[key])
        except:
            print "---SKIP---"
This is maybe not the best workaround, but it certainly works.
With your data it prints the following:
widget
test
---SKIP---
window
name
---SKIP---
title
---SKIP---
image
src
---SKIP---
name
---SKIP---
os
name
---SKIP---

ndb.Key filter for MapReduce input_reader

Playing with the new Google App Engine MapReduce library filters for input_reader, I would like to know how I can filter by ndb.Key.
I read this post and I've played with datetime, string, int and float in filter tuples, but how can I filter by ndb.Key?
When I try to filter by a ndb.Key I get this error:
BadReaderParamsError: Expected Key, got u"Key('Clients', 406)"
Or this error:
TypeError: Key('Clients', 406) is not JSON serializable
I tried to pass an ndb.Key object and a string representation of the ndb.Key.
Here are my two filter tuples:
Sample 1:
'input_reader': {
    'input_reader': 'mapreduce.input_readers.DatastoreInputReader',
    'entity_kind': 'model.Sales',
    'filters': [("client", "=", ndb.Key('Clients', 406))]
}
Sample 2:
'input_reader': {
    'input_reader': 'mapreduce.input_readers.DatastoreInputReader',
    'entity_kind': 'model.Sales',
    'filters': [("client", "=", "%s" % ndb.Key('Clients', 406))]
}
This is a bit tricky.
If you look at the code on Google Code you can see that mapreduce.model defines a JSON_DEFAULTS dict which determines the classes that get special-case handling in JSON serialization/deserialization: by default, just datetime. So, you can monkey-patch the ndb.Key class into there, and provide it with functions to do that serialization/deserialization - something like:
from mapreduce import model

def _JsonEncodeKey(o):
    """Json encode an ndb.Key object."""
    return {'key_string': o.urlsafe()}

def _JsonDecodeKey(d):
    """Json decode a ndb.Key object."""
    return ndb.Key(urlsafe=d['key_string'])

model.JSON_DEFAULTS[ndb.Key] = (_JsonEncodeKey, _JsonDecodeKey)
model._TYPE_IDS['Key'] = ndb.Key
You may also need to repeat those last two lines to patch mapreduce.lib.pipeline.util as well.
Also note if you do this, you'll need to ensure that this gets run on any instance that runs any part of a mapreduce: the easiest way to do this is to write a wrapper script that imports the above registration code, as well as mapreduce.main.APP, and override the mapreduce URL in your app.yaml to point to your wrapper.
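A hedged sketch of what such a wrapper could look like (the module names mapreduce_wrapper and key_json_patch are made up for illustration, and the URL pattern should match whatever your app.yaml currently routes to mapreduce.main.APP):

# mapreduce_wrapper.py -- hypothetical wrapper module.
# Importing key_json_patch runs the JSON_DEFAULTS registration shown above
# before the mapreduce WSGI application is exposed.
import key_json_patch  # assumed to contain the registration code above

from mapreduce import main

APP = main.APP

# app.yaml (excerpt), pointing the mapreduce handler at this wrapper instead:
#   - url: /mapreduce(/.*)?
#     script: mapreduce_wrapper.APP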
Make your own input reader based on DatastoreInputReader, which knows how to decode key-based filters:
class DatastoreKeyInputReader(input_readers.DatastoreKeyInputReader):
    """Augment the base input reader to accommodate ReferenceProperty filters"""
    def __init__(self, *args, **kwargs):
        try:
            filters = kwargs['filters']
            decoded = []
            for f in filters:
                value = f[2]
                if isinstance(value, list):
                    value = db.Key.from_path(*value)
                decoded.append((f[0], f[1], value))
            kwargs['filters'] = decoded
        except KeyError:
            pass
        super(DatastoreKeyInputReader, self).__init__(*args, **kwargs)
Run this function on your filters before passing them in as options:
def encode_filters(filters):
    if filters is not None:
        encoded = []
        for f in filters:
            value = f[2]
            if isinstance(value, db.Model):
                value = value.key()
            if isinstance(value, db.Key):
                value = value.to_path()
            entry = (f[0], f[1], value)
            encoded.append(entry)
        filters = encoded
    return filters
Are you aware of the to_old_key() and from_old_key() methods?
I had the same problem and came up with a workaround using computed properties.
You can add to your Sales model a new ndb.ComputedProperty holding the Key id. Ids are just strings, so you won't have any JSON problems.
client_id = ndb.ComputedProperty(lambda self: self.client.id())
And then add that condition to your mapreduce query filters
'input_reader': {
    'input_reader': 'mapreduce.input_readers.DatastoreInputReader',
    'entity_kind': 'model.Sales',
    'filters': [("client_id", "=", '406')]
}
The only drawback is that computed properties are not indexed and stored until you call the put() method, so you will have to traverse all the Sales entities and save them:
for sale in Sales.query().fetch():
    sale.put()
