Clean solution for missing values in python list comprehensions - python

Is there any way to check every element of a list comprehension in a clean and elegant way?
For example, if I have some db result which may or may not have a 'loc' attribute, is there any way to have the following code run without crashing?
db_objs = SQL("query")
top_scores = [{"name":obj.name, "score":obj.score, "latitude":obj.loc.lat, "longitude":obj.loc.lon} for obj in db_objs]
If there is any way to fill these fields in as None, the empty string, or anything else, that would be very nice. Python tends to be a magical thing, so if any of you have sage advice it would be much appreciated.

Clean and unified solution:
from operator import attrgetter as _attrgetter

def attrgetter(attrname, default=None):
    # Wrap operator.attrgetter so that a missing attribute yields a default.
    getter = _attrgetter(attrname)
    def wrapped(obj):
        try:
            return getter(obj)
        except AttributeError:
            return default
    return wrapped

GETTER_MAP = {
    "name": attrgetter('name'),
    "score": attrgetter('score'),
    "latitude": attrgetter('loc.lat'),
    "longitude": attrgetter('loc.lon'),
}

def getdict(obj):
    return {k: getter(obj) for k, getter in GETTER_MAP.items()}

if __name__ == "__main__":
    db_objs = SQL("query")
    top_scores = [getdict(obj) for obj in db_objs]
    print(top_scores)

Try this:
top_scores = [{"name": obj.name,
               "score": obj.score,
               "latitude": obj.loc.lat if hasattr(obj, "loc") else 0,
               "longitude": obj.loc.lon if hasattr(obj, "loc") else 0}
              for obj in db_objs]
Or, in your query set a default value.

It's not pretty, but getattr() should work:
top_scores = [
    {
        "name": obj.name,
        "score": obj.score,
        "latitude": getattr(getattr(obj, "loc", None), "lat", None),
        "longitude": getattr(getattr(obj, "loc", None), "lon", None),
    }
    for obj in db_objs
]
This will set the dict item with key "latitude" to obj.loc.lat (and so on) if it exists; if it doesn't (and even if obj.loc doesn't exist), it'll be set to None.
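If the nesting goes deeper than two levels, the repeated getattr() calls can be folded into a small helper. This is only a sketch: deep_getattr is a made-up name, and the rows are stand-ins built with types.SimpleNamespace rather than real database objects.

```python
from functools import reduce
from types import SimpleNamespace


def deep_getattr(obj, dotted_path, default=None):
    """Follow a dotted attribute path; return default if any step is missing."""
    _missing = object()  # sentinel, so a stored None is not mistaken for "absent"

    def step(current, name):
        if current is _missing:
            return _missing
        return getattr(current, name, _missing)

    result = reduce(step, dotted_path.split("."), obj)
    return default if result is _missing else result


# Stand-in rows (hypothetical data): the second one has no 'loc' attribute.
db_objs = [
    SimpleNamespace(name="ann", score=9, loc=SimpleNamespace(lat=1.5, lon=2.5)),
    SimpleNamespace(name="bob", score=7),
]

top_scores = [
    {
        "name": obj.name,
        "score": obj.score,
        "latitude": deep_getattr(obj, "loc.lat"),
        "longitude": deep_getattr(obj, "loc.lon"),
    }
    for obj in db_objs
]
```

The sentinel keeps "attribute is missing" separate from "attribute is present but None", which the plain getattr(..., None) chain cannot tell apart.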

Python - if StringSet has a String in it

I have searched quite thoroughly and have not found a suitable solution. I am new to Python and programming, so I appreciate any advice I can get.
I am trying to find a string in a StringSet. Here is what I am trying, but it does not return the value.
string_set = {'"123", "456", "789"'}
value = '123'
values_list = []
def fun():
    for i in string_set:
        if i in value:
            output=LookupTables.get('dynamo-table', i, {})
            return output
fun()
With the above, if the value is in the string set, the function should return the matching value from my DynamoDB table.
Note: there could be more than 5000 values in my table, so I want to return as early as possible.
Maybe you should remove the extra ' ' first:
string_set = {'"123", "456", "789"'} # this set has just one value '"123", "456", "789"'
string_set_fixed = {"123", "456", "789"}
I'm assuming you're just checking whether 123 is in "123", "456", "789", since you had it wrapped in single quotes.
To represent that, let's use:
strset = {"123", "456", "789"}
What if you have to use that weird variable?
This should render it usable:
strset = {'"123", "456", "789"'}
removed = next(iter(strset))
strset.update((removed).split())
strset.remove(removed)
strset = set([i.strip(",").strip('"') for i in strset])
another cleaner way:
strset = {'"123", "456", "789"'}
exec(f"strset = {next(iter(strset))}")
print("123" in strset)
now to check if value is in there:
if value in strset:
    pass  # do code here
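A note on the exec-based variant above: exec runs whatever code the string happens to contain, so it is only reasonable on input you fully trust. If the string really is just a comma-separated run of quoted values, ast.literal_eval from the standard library should do the same job while rejecting anything that is not a plain literal. A minimal sketch under that assumption:

```python
import ast

strset = {'"123", "456", "789"'}  # one string that looks like a tuple literal

# literal_eval parses the text as a Python literal (here a tuple of strings)
# and raises an error for anything that is not a plain literal.
parsed = ast.literal_eval(next(iter(strset)))
strset = set(parsed)

print("123" in strset)
```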
Try this:
string_set = {"123", "456", "789"}
value = '123'
values_list = []
def fun():
    if value in string_set:
        output = LookupTables.get('dynamo-table', value, {})
        return output
fun()
Explanation:
Your definition of string_set contains an extraneous pair of ' ';
When you are testing i in value, you are comparing i against all substrings of value, rather than against the whole string.
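The difference between the two meanings of `in` is easy to see in isolation. A small illustration (the variable names are only for the demo):

```python
value = "123"
string_set = {"123", "456", "789"}

# Between two strings, `in` is a substring test.
substring_hit = "12" in value        # "12" occurs inside "123"
substring_miss = "1234" in value     # the longer string cannot occur inside the shorter one

# Against a set, `in` is an exact-match membership test backed by hashing,
# so there is no need to loop over the elements yourself.
member_hit = value in string_set
member_miss = "12" in string_set

print(substring_hit, substring_miss, member_hit, member_miss)
```

Because set membership is a hash lookup, it also covers the "return as early as possible" concern for a few thousand values without an explicit loop.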

Python search replace with multiple Json objects

I wasn't sure how to search for this but I am trying to make a script that dynamically launches programs. I will have a couple of JSON files and I want to be able to do a search replace sort of thing.
So I'll setup an example:
config.json
{
    "global_vars": {
        "BASEDIR": "/app",
        "CONFIG_DIR": "{BASEDIR}/config",
        "LOG_DIR": "{BASEDIR}/log",
        "CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive"
    }
}
Then process.json
{
    "name": "Dummy_Process",
    "binary": "java",
    "executable": "DummyProcess-0.1.0.jar",
    "launch_args": "-Dspring.config.location={CONFIG_DIR}/application.yml -Dlogging.config={CONFIG_DIR}/logback-spring.xml -jar {executable}",
    "startup_log": "{LOG_DIR}/startup_{name}.out"
}
Now I want to load both of these JSON objects and use their values to fill in the placeholders. For example, "CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive" will become "CONFIG_ARCHIVE_DIR": "/app/config/archive".
Does anyone know a good way to do this recursively? I'm running into issues when I use something like CONFIG_DIR, which requires BASEDIR to be resolved first.
I have this function that loads all the data:
#Recursive function, loops and loads all values into data
def _load_data(data,obj):
    for i in obj.keys():
        if isinstance(obj[i],str):
            data[i]=obj[i]
        if isinstance(obj[i],dict):
            data=_load_data(data,obj[i])
    return data
Then I have this function:
def _update_data(data,data_str=""):
    if not data_str:
        data_str=json.dumps(data)
    for i in data.keys():
        if isinstance(data[i],str):
            data_str=data_str.replace("{"+i+"}",data[i])
        if isinstance(data[i],dict):
            data=_update_data(data,data_str)
    return json.loads(data_str)
So this works for one level, but I don't know if this is the best way to do it. It stops working when I hit a case like CONFIG_DIR, because it would need to loop over the data multiple times: first to resolve BASEDIR, then once more to resolve CONFIG_DIR. Suggestions welcome.
The end goal of this script is a start/stop/status script to manage all of our binaries. They all use different binaries to start, and I want one processes file for multiple servers. Each process will have a servers array to tell the start/stop script what to run on a given server. Maybe something like this already exists; if so, please point me in that direction.
I will be running on Linux and prefer to use Python. I want something smart and easy for someone else to pick up and use or modify.
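The "loop over the data multiple times" idea from the question can be written as a repeat-until-stable pass. This is a rough sketch, not a finished design: expand_all and max_passes are invented names, the input is assumed to be an already flattened dict of strings, and unknown placeholders are simply left in place.

```python
import re

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_all(values, max_passes=10):
    """Repeat the substitution pass until nothing changes (hypothetical helper)."""
    resolved = dict(values)
    for _ in range(max_passes):
        changed = False
        for key, text in resolved.items():
            # Replace each {NAME} that is known; leave unknown names untouched.
            new_text = PLACEHOLDER.sub(
                lambda m: resolved.get(m.group(1), m.group(0)), text
            )
            if new_text != text:
                resolved[key] = new_text
                changed = True
        if not changed:
            return resolved
    raise ValueError("placeholders still changing; possible circular reference")


flat = {
    "BASEDIR": "/app",
    "CONFIG_DIR": "{BASEDIR}/config",
    "CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive",
}
print(expand_all(flat)["CONFIG_ARCHIVE_DIR"])
```

The pass limit is there so that a value which keeps growing on every pass ends in an error instead of looping forever.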
I made something that works with the example files you provided. Note that I didn't handle multiple keys or non-dictionaries in the data. This function accepts a list of the dictionaries obtained after JSON parsing your input files. It uses the fact that re.sub can accept a function for the replacement value and calls that function with each match. I am sure there are plenty of improvements that could be made to this, but it should get you started at least.
import re

def make_config(configs):
    replacements = {}

    def find_defs(config):
        # Find leaf nodes of the dictionary.
        defs = {}
        for k, v in config.items():
            if isinstance(v, dict):
                # Nested dictionary so recurse.
                defs.update(find_defs(v))
            else:
                defs[k] = v
        return defs

    for config in configs:
        replacements.update(find_defs(config))

    def make_replacement(m):
        # Construct the replacement string.
        name = m.group(0).strip('{}')
        if name in replacements:
            # Replace replacement strings in the replacement string.
            new = re.sub(r'\{[^}]+\}', make_replacement, replacements[name])
            # Cache result
            replacements[name] = new
            return new
        raise Exception('Replacement string for {} not found'.format(name))

    finalconfig = {}
    for name, value in replacements.items():
        finalconfig[name] = re.sub(r'\{[^}]+\}', make_replacement, value)
    return finalconfig
With this input:
[
    {
        "global_vars": {
            "BASEDIR": "/app",
            "CONFIG_DIR": "{BASEDIR}/config",
            "LOG_DIR": "{BASEDIR}/log",
            "CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive"
        }
    },
    {
        "name": "Dummy_Process",
        "binary": "java",
        "executable": "DummyProcess-0.1.0.jar",
        "launch_args": "-Dspring.config.location={CONFIG_DIR}/application.yml -Dlogging.config={CONFIG_DIR}/logback-spring.xml -jar {executable}",
        "startup_log": "{LOG_DIR}/startup_{name}.out"
    }
]
It gives this output:
{
    'BASEDIR': '/app',
    'CONFIG_ARCHIVE_DIR': '/app/config/archive',
    'CONFIG_DIR': '/app/config',
    'LOG_DIR': '/app/log',
    'binary': 'java',
    'executable': 'DummyProcess-0.1.0.jar',
    'launch_args': '-Dspring.config.location=/app/config/application.yml -Dlogging.config=/app/config/logback-spring.xml -jar DummyProcess-0.1.0.jar',
    'name': 'Dummy_Process',
    'startup_log': '/app/log/startup_Dummy_Process.out'
}
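The mechanism this answer leans on, re.sub taking a function as the replacement, is easier to see on its own. A stripped-down illustration with a two-entry lookup table (the table and the names lookup and template are only for the demo):

```python
import re

values = {"BASEDIR": "/app", "name": "Dummy_Process"}


def lookup(match):
    # match.group(1) is the text between the braces, e.g. "BASEDIR".
    return values[match.group(1)]


template = "{BASEDIR}/log/startup_{name}.out"
# re.sub calls lookup() once per match and splices in whatever it returns.
expanded = re.sub(r"\{([^}]+)\}", lookup, template)
print(expanded)
```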
As an alternative to the answer by @FamousJameous, and if you don't mind changing to INI format, you can also use Python's built-in configparser module, which already supports variable expansion.
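For a feel of what the configparser route looks like, here is a minimal sketch. It assumes the global_vars block is rewritten as an INI section and that the ${NAME} syntax of ExtendedInterpolation is acceptable in place of {NAME}; the INI text itself is invented for the example.

```python
from configparser import ConfigParser, ExtendedInterpolation

INI_TEXT = """
[global_vars]
BASEDIR = /app
CONFIG_DIR = ${BASEDIR}/config
LOG_DIR = ${BASEDIR}/log
CONFIG_ARCHIVE_DIR = ${CONFIG_DIR}/archive
"""

parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.optionxform = str  # keep option names case-sensitive instead of lowercased
parser.read_string(INI_TEXT)

# Values are expanded on access, including references to other references.
print(parser["global_vars"]["CONFIG_ARCHIVE_DIR"])
```

The expansion happens when a value is read, so the order of the options in the file should not matter.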
I implemented a solution with a class (Config) that has a few methods:
_load: converts a JSON string to a Python object;
_extract_params: loops over the document (the output of _load) and collects its values into an instance attribute (self.params);
_loop: loops over the dictionary returned by _extract_params and, if a value contains any {param}, calls the _transform method;
_transform: replaces each {param} in a value with the correct value; if the value linked to that param itself contains a '{', the method calls itself again first.
I hope that was clear enough. Here is the code:
import json
import re

config = """{
    "global_vars": {
        "BASEDIR": "/app",
        "CONFIG_DIR": "{BASEDIR}/config",
        "LOG_DIR": "{BASEDIR}/log",
        "CONFIG_ARCHIVE_DIR": "{CONFIG_DIR}/archive"
    }
}"""

process = """{
    "name": "Dummy_Process",
    "binary": "java",
    "executable": "DummyProcess-0.1.0.jar",
    "launch_args": "-Dspring.config.location={CONFIG_DIR}/application.yml -Dlogging.config={CONFIG_DIR}/logback-spring.xml -jar {executable}",
    "startup_log": "{LOG_DIR}/startup_{name}.out"
}
"""

class Config(object):
    def __init__(self, documents):
        self.documents = documents
        self.params = {}
        self.output = {}

    # Loads JSON to dictionary
    def _load(self, document):
        obj = json.loads(document)
        return obj

    # Extracts the config parameters in a dictionary
    def _extract_params(self, document):
        for k, v in document.items():
            if isinstance(v, dict):
                # Recursion for inner dictionaries
                self._extract_params(v)
            else:
                # if not a dict set params[k] as v
                self.params[k] = v
        return self.params

    # Loop on the configs dictionary
    def _loop(self, params):
        for key, value in params.items():
            # if there is any parameter inside the value
            findings = re.findall(r'{([^}]*)\}', value)
            if findings:
                # call the transform function
                self._transform(params, key, findings)
        return self.output

    # Replace all the findings with the correct value
    def _transform(self, params, key, findings):
        # Iterate over the found params
        for finding in findings:
            # if { -> recursion to set all the needed values right
            if '{' in params[finding]:
                self._transform(params, finding, re.findall(r'{([^}]*)\}', params[finding]))
            # Do the actual replacement
            params[key] = params[key].replace('{'+finding+'}', params[finding])
        self.output = params
        return self.output

    # Entry point
    def process_document(self):
        params = {}
        # _load the documents and extract the params
        for document in self.documents:
            params.update(self._extract_params(self._load(document)))
        # _loop over the params
        return self._loop(params)

if __name__ == '__main__':
    # Use a new name so the JSON string `config` is not overwritten.
    cfg = Config([config, process])
    print(cfg.process_document())
I am sure there are better ways to reach your goal, but I hope this is useful to you.

Simplest ElasticSearch query

I have the following document:
obj = {
    "ID": 4,
    "GUID": 4,
    "Type": "Movie",
    "Type": "Margin Call",
}
Is there a simple "all-type" query that can be done, for example something like:
>>> es.search(index="avails", term="margin")
Or -
>>> es.search(index="avails", term="Movie")
Or -
>>> es.search(index="avails", term="4")
Or do I need to use the specialized ElasticSearch syntax differently for each of these searches? Basically, I'm just looking to approximate results and make sure object creation is working before digging into the query language.
Here is the simplest way I could figure out to do a search in ElasticSearch:
def search(self, index=None, **kwargs):
    """
    This will use the & syntax with all the kwargs provided.
    For example: es.search(name="margin", id=4) ==> ?name=margin&id=4.
    Use as a simplified search to gut check very basic things.
    """
    index = index or self.index
    # Python 3 location of urlencode; needs `import urllib.parse`.
    qs = urllib.parse.urlencode(kwargs)
    try:
        res = self.es.search(index=index, q=qs)
    except TransportError as e:
        print('>>>', e)
        res = None
    return res
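One caveat, as far as I know: the q parameter of the search API is read as a Lucene query string (field:value terms joined with AND/OR) rather than as URL-encoded key=value pairs, so a helper along these lines may match the intent more closely. This is a sketch only; build_query_string is a hypothetical name, the escaping is deliberately simplistic, and the actual search call is left as a comment because it needs a live cluster.

```python
def build_query_string(**fields):
    """Join field/value pairs into a Lucene-style 'field:"value" AND ...' string."""
    parts = []
    for field, value in sorted(fields.items()):
        # Quote the value so multi-word input stays one phrase.
        escaped = str(value).replace('"', '\\"')
        parts.append(f'{field}:"{escaped}"')
    return " AND ".join(parts)


query = build_query_string(Type="Movie", ID=4)
print(query)

# Hypothetical use with an existing client object `es`:
# res = es.search(index="avails", q=query)
```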

Jsonifying an array

I have the following:
def jsonify(ar):
    json.dumps(ar._data)
jsonify(getFromTable())
getFromTable returns an array of boto objects. Each of those objects has a _data element. However, ar._data does not work, because the array itself does not have the attribute _data.
How can I make a single JSON document from multiple objects? Or is it impossible?
My workaround for this is:
def jsonify(ar):
    str=""
    for i in ar:
        str+=json.dumps(i._data)
    print(str)
    return str
jsonify(getFromTable())
However, I would still prefer to print them all in one JSON blob. Does anyone know how?
Solved below with help from mGilson.
Also, just as an FYI:
I'm using boto, dynamodb2, and Python, and pulling from a lazily evaluated ResultSet returned by querying my table.
@mGilson, thank you. That was the correct way to solve that.
For anyone curious, here is the implementation that I used.
def getFromTable():
    global table
    #t = table.scan(thirdKey__eq="Anon")
    t = table.scan()
    arr = []
    for a in t:
        arr.append(a)
    return arr
def jsonify(ar):
    str = json.dumps(ar)
    print(str)
    return str
def createListFromBotoObj(obj):
    myList = []
    for o in obj:
        myList.append(o._data)
    return myList
jsonify(createListFromBotoObj(getFromTable()))
Which prints the expected result:
[{"secondKey": "Doe", "thirdKey": "Anon", "firstKey": "John"}, {"secondKey": "G", "thirdKey": "Company", "firstKey": "P"}, {"secondKey": "T", "thirdKey": "Engineer3", "firstKey": "allen"}, {"secondKey": "John", "last_name": "Doe", "firstKey": "booperface"}, {"secondKey": "The Builder", "thirdKey": "Sadness", "firstKey": "Bob"}]
In case anyone is wondering I'm using this to test how I will implement my actual database.
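The same idea fits in a single comprehension: collect each item's _data dict into a list and serialize the list once, which yields one valid JSON array instead of several JSON objects glued together. A small sketch, with a FakeItem class standing in for the boto items (only the _data attribute is modeled; the sample rows are invented):

```python
import json


class FakeItem:
    """Stand-in for a boto item; only the _data attribute is modeled."""

    def __init__(self, data):
        self._data = data


def jsonify(items):
    # Collect each item's plain dict, then serialize the whole list once.
    return json.dumps([item._data for item in items])


rows = [FakeItem({"firstKey": "John"}), FakeItem({"firstKey": "Bob"})]
blob = jsonify(rows)
print(blob)
```

Since the comprehension just iterates, it should work the same way on a lazily evaluated result set as on a list.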

Python - is there an elegant way to avoid dozens try/except blocks while getting data out of a json object?

I'm looking for ways to write functions like get_profile(js) but without all the ugly try/excepts.
Each assignment is in a try/except because occasionally the json field doesn't exist. I'd be happy with an elegant solution which defaulted everything to None even though I'm setting some defaults to [] and such, if doing so would make the overall code much nicer.
def get_profile(js):
    """ given a json object, return a dict of a subset of the data.
    what are some cleaner/terser ways to implement this?
    There will be many other get_foo(js), get_bar(js) functions which
    need to do the same general type of thing.
    """
    d = {}
    try:
        d['links'] = js['entry']['gd$feedLink']
    except:
        d['links'] = []
    try:
        d['statistics'] = js['entry']['yt$statistics']
    except:
        d['statistics'] = {}
    try:
        d['published'] = js['entry']['published']['$t']
    except:
        d['published'] = ''
    try:
        d['updated'] = js['entry']['updated']['$t']
    except:
        d['updated'] = ''
    try:
        d['age'] = js['entry']['yt$age']['$t']
    except:
        d['age'] = 0
    try:
        d['name'] = js['entry']['author'][0]['name']['$t']
    except:
        d['name'] = ''
    return d
Replace each of your try/except blocks with chained calls to the dictionary get(key[, default]) method. All calls to get before the last call in the chain should have a default value of {} (an empty dictionary) so that the later calls can be made on a valid object. Only the last call in the chain should have the default value for the key that you are trying to look up.
See the Python documentation for dictionaries: http://docs.python.org/library/stdtypes.html#mapping-types-dict
For example:
d['links'] = js.get('entry', {}).get('gd$feedLink', [])
d['published'] = js.get('entry', {}).get('published',{}).get('$t', '')
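Chained get() calls stop working once a list sits in the path, as with js['entry']['author'][0] in the question, because lists have no get(). One way around that is a small walker that indexes step by step and catches the lookup errors in one place. A sketch only; dig is a made-up helper name and the sample js is invented:

```python
def dig(data, path, default=None):
    """Walk nested dicts/lists along path; return default if any step fails."""
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # Missing dict key, list index out of range, or a non-container.
            return default
    return current


js = {
    "entry": {
        "published": {"$t": "2011-01-01"},
        "author": [{"name": {"$t": "Ann"}}],
    }
}

d = {
    "links": dig(js, ["entry", "gd$feedLink"], []),
    "published": dig(js, ["entry", "published", "$t"], ""),
    "name": dig(js, ["entry", "author", 0, "name", "$t"], ""),
}
print(d)
```

This keeps a single narrow try/except inside the helper instead of one bare except per field.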
Use get(key[, default]) method of dictionaries
Code generate this boilerplate code and save yourself even more trouble.
Try something like...
import time
from functools import reduce

def get_profile(js):
    def getget(default, *elements):
        def cas(prev, el):
            # Step one level down; fall back to the default if the key is missing.
            if hasattr(prev, "get"):
                return prev.get(el, default)
            return default
        return reduce(cas, elements, js)
    d = {}
    d['links'] = getget([], 'entry', 'gd$feedLink')
    d['statistics'] = getget({}, 'entry', 'yt$statistics')
    d['published'] = getget('', 'entry', 'published', '$t')
    d['updated'] = getget('', 'entry', 'updated', '$t')
    d['age'] = getget(0, 'entry', 'yt$age', '$t')
    d['name'] = getget('', 'entry', 'author', 0, 'name', '$t')
    return d

print(get_profile({
    'entry':{
        'gd$feedLink':list(range(4)),
        'yt$statistics':{'foo':1, 'bar':2},
        'published':{
            "$t":time.strftime("%x %X"),
        },
        'updated':{
            "$t":time.strftime("%x %X"),
        },
        'yt$age':{
            "$t":"infinity years",
        },
        'author':{0:{'name':{'$t':"I am a cow"}}},
    }
}))
It's kind of a leap of faith for me to assume that you've got a dictionary with a key of 0 instead of a list, but you get the idea.
You need to familiarise yourself with dictionary methods. Check here for how to handle what you're asking.
Two possible solutions come to mind, without knowing more about how your data is structured:
if k in js['entry']:
    something = js['entry'][k]
(though this solution wouldn't really get rid of your redundancy problem, it is more concise than a ton of try/excepts)
or
js['entry'].get(k, []) # or (k, None) depending on what you want to do
A much shorter version is just something like...
for k, v in js['entry'].items():
    d[k] = v
But again, more would have to be said about your data.
