Conditional checks for JSON key-value pairs being ignored - python

I am building a point feature class from a web call that returns JSON. The JSON is a bit sketchy in that sometimes keys do not exist in the record. I am trying to do this, once I have a valid JSON object:
#requests stuff above this
for i in jsonObj:
    try:
        if i['properties']['country']:
            country = i['properties']['country']
        else:
            country = 'UNK'
            print('Country name not found, using {}'.format(country))
    except KeyError, e:
        print('Key error: reason: {}'.format(str(e)))
        pass
    #then do all the arcpy FC creation stuff
The result is a whole bunch of key errors with "reason: 'country'", and instead of building those rows with the generic country value of 'UNK', it simply skips them when building the feature class, leaving out those points.
I have taken out the try and left it as a conditional check, but it fails at the first row that lacks a 'country' key.
In summary, I'm just trying to check if a key-value pair exists; if it doesn't, assign a generic value of 'UNK' to the country variable.
It seems like part of the problem might be that if i['properties']['country'] checks for a value, but not for the existence of the key itself? How might I check for the existence of the key more efficiently?
I have read Check if a given key already exists in a dictionary and have modified my code to both of these, and neither yield the expected outcome:
for i in jsonObj:
    try:
        # get coordinates first
        if i['geometry']['coordinates']:
            ycoord = float(i['geometry']['coordinates'][1])
            xcoord = float(i['geometry']['coordinates'][0])
        if i['properties']['city'] in i:
            city = i['properties']['city']
        else:
            city = 'UNK'
        if i['properties']['country'] in i:
            country = i['properties']['country']
        else:
            country = 'UNK'
and
for i in jsonObj:
    try:
        # get coordinates first
        if i['geometry']['coordinates']:
            ycoord = float(i['geometry']['coordinates'][1])
            xcoord = float(i['geometry']['coordinates'][0])
        if 'city' in i:
            city = i['properties']['city']
        else:
            city = 'UNK'
        if 'country' in i:
            country = i['properties']['country']
        else:
            country = 'UNK'
I do have the 'properties' key in every record/dictionary, but a 'country' key is not guaranteed. Some rows in the JSON response have it, some don't.

Your last try:
if 'country' in i:
    country = i['properties']['country']
else:
    country = 'UNK'
was close, but you're dealing with a dict of dicts, and 'country' is more likely to be a key of the sub-dict under 'properties', so the fix would be:
if 'country' in i['properties']:
    country = i['properties']['country']
else:
    country = 'UNK'
or, even better and shorter, using get with a default value (I recommend this last one over the quick fix):
country = i['properties'].get('country','UNK')
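As a hedged sketch, here is how the .get() approach could look dropped into the loop from the question, assuming 'properties' and 'geometry' are present in every record (the question only guarantees 'properties', so adjust if 'geometry' can also be missing):
for i in jsonObj:
    coords = i['geometry']['coordinates']
    xcoord, ycoord = float(coords[0]), float(coords[1])
    city = i['properties'].get('city', 'UNK')
    country = i['properties'].get('country', 'UNK')
    # then do all the arcpy FC creation stuff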

It seems like you don't fully understand how the parsed JSON is represented.
x = i['geometry']['coordinates'] is basically y = i['geometry']; x = y['coordinates'], so you need a safety check for each layer, because i['geometry'] will not only throw an exception when the 'geometry' field is not found, but the returned object must also support [] for ['coordinates'] to work (so in this case it must be another JSON object, and not a string, bool, None, etc.).
I also believe your JSON object is implemented as a Python dictionary, so you can check whether a certain field, e.g. 'geometry', exists using x.get('geometry'), which returns either its value or None. You can also use city = x.get('city', 'UNK') to set a default value (JSON object, string, bool, None, etc.) if the key was not found in the dict (Python's dict.get method takes a default argument).
So in the end you should have something like this:
geo = i.get('geometry')
if not geo: return  # no 'geometry' field, skip this record
coords = geo.get('coordinates')
if not coords: return  # no 'coordinates' field, skip this record
xcoord, ycoord = float(coords[0]), float(coords[1])
props = i.get('properties')
if not props: return
city = props.get('city', 'UNK')
country = props.get('country', 'UNK')
This is a draft and I did not test it; it also assumes that your JSON object is based on a Python dict.

Related

Django Find difference in the same model instance

I've been searching around and I don't think I've found my answer yet. I'm looking to find differences in the data between two versions of a model instance and end up with a list of the column names that changed.
Take, for instance, a model called my_model that has some columns.
object = my_model.objects.get(id=1)
# Edit some values on object here.
old_object = my_model.objects.get(id=1)  # fresh copy from the DB, still holding the old values
object.save()
# Check for differences
model_fields = [field.name for field in my_model._meta.get_fields()]
filtered_list = filter(lambda field: getattr(object, field, None) != getattr(old_object, field, None), model_fields)
The purpose of this is to notify the user after they make an update: send them an email as a reminder of which values they changed.
I was able to answer my own question by converting the two objects to dictionaries. I ended up with something like
dict1, dict2 = obj1.__dict__, obj2.__dict__
changed_fields = {
    'column_names': []
}
excluded_keys = ('_state',)
for k, v in dict1.items():
    if k in excluded_keys:
        continue
    try:
        if v != dict2[k]:
            changed_fields['column_names'].append(k)
    except KeyError:
        # Put error handling here
        continue
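As a possible refinement, here is a minimal sketch using Django's model_to_dict helper, which only includes model fields, so internal attributes such as _state never need excluding; changed_columns is a hypothetical helper name, and note that non-editable fields are left out of the comparison:
from django.forms.models import model_to_dict

def changed_columns(old_obj, new_obj):
    # Compare two instances of the same model and return the names of
    # the fields whose values differ (model_to_dict only returns model
    # fields, so _state is not present).
    old, new = model_to_dict(old_obj), model_to_dict(new_obj)
    return [name for name, value in new.items() if old.get(name) != value]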

Pythonic way to test single-element lists to avoid IndexErrors

Throughout my codebase there is a design pattern in which we have two lists of objects and try to whittle them down to one object.
Say we have two classes,
class Employee():
    def __init__(self, _id, name):
        self._id = _id
        self.name = name

class Shift():
    def __init__(self, employee_id, shift_id):
        self.employee_id = employee_id
        self.shift_id = shift_id
We have lists of objects of these classes. I want to find the employee who has a shift with their id attached to it. Suppose I have the list employees containing Employee objects, and the list shifts containing Shift objects, then I do:
for shift in shifts:
    # Find employee who is assigned to this shift
    employee = [e for e in employees if e._id == shift.employee_id]
So far so good. There's no guarantee that the employee list will contain an employee though. If there is an employee, there's only one. Currently it's being handled like this:
if employee:
    employee = employee[0]
    ... do something
But I don't think this is Pythonic. A simple solution would be to:
for e in employee:
    ...do something
I don't know if this is unpythonic, but it does handle the case smoothly. Is it wrong to use a for-loop on lists that have either zero or one elements?
The other option is to go by EAFP and do this:
try:
    employee = employee[0]
    ... do something
except IndexError:
    pass
I don't like this though because there is quite a lot of coding to do on the employee, and the error handling would get extremely complicated.
Which of my solutions (if any) is the most pythonic?
EDIT:
This question is not answered by the one in the close suggestion, because this question looks for the most pythonic way to handle the element of a list that contains either 0 or 1 elements.
Instead of first creating a list that will contain either 0 or 1 items and then unpacking the list again, I would use next to find the first employee matching the condition. If there is none, this will raise StopIteration, because the generator expression passed into next is exhausted:
# Find employee who is assigned to this shift
try:
    employee = next(e for e in employees if e._id == shift.employee_id)
except StopIteration:
    # no such employee
However, why don't you just have a dictionary mapping employees by their ID?
Then you could simply write:
try:
    employee = employees[shift.employee_id]
except KeyError:
    # no such employee
And then you should ask yourself how it could happen that a shift was assigned an employee that doesn't exist. Maybe it's because no employee was assigned to the shift and shift.employee_id is None? Then LBYL would in fact be clearer IMO:
if shift.employee_id is None:
    # no employee was assigned yet
    return  # or continue, break, ...

# now this must succeed, if not there is a bug
assert shift.employee_id in employees
employee = employees[shift.employee_id]
If you are searching for a single object in a list, and you expect it to either be in the list or not (and not multiple possible values), then don't create a list in the first place. The Pythonic thing to do would be simply:
for employee in employees:
    if employee._id == shift.employee_id:
        # handle employee
        break
else:
    # handle the case where no employee is found; the else clause is not necessary
    # if you simply want to pass
Probably the better design overall is to have a dictionary mapping employee id's to employees so you can handle it like this:
try:
    employee = employee_dict[shift.employee_id]
except KeyError:
    # handle not found case
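For completeness, a minimal sketch (my own, not from the question) of how that employee_dict could be built from the existing employees list and used inside the loop:
# Build the lookup once, keyed by employee id
employee_dict = {e._id: e for e in employees}

for shift in shifts:
    employee = employee_dict.get(shift.employee_id)
    if employee is None:
        continue  # no employee assigned to this shift
    # ... do something with employee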
Borrowing from @mkrieger1's answer, next() takes a default return value, so you can just do
employee = next((e for e in employees if e._id == shift.employee_id), None)
(note that the generator expression must be parenthesized when you pass a default as the second argument).
This will default employee to None for example.
A dictionary mapping would indeed also be great:
You could avoid the try..except by using .get():
employee = employees.get(shift.employee_id,None)
You don't say what you want employee to default to if it's absent in the shift list though.
I suspect you're already at the most pythonic point, barring an architecture restructuring, but you could always turn it into one line:
employee = employee[0] if bool(employee) else None
# or this is perhaps more pythonic
employee = employee[0] if employee else None
# I dislike using implicit booleans because of type ambiguity.
It's not much different to what you're already doing, but it would look better, and that might be enough here!
EDIT: I agree with the other answers about using a dictionary. My answer applies if you really can't do that

Simplifying a list into categories

I am a new Python developer and was wondering if someone can help me with this. I have a dataset with one column that describes a company type. I noticed that the column has, for example, both "surgical" and "surgery" listed, and it has "eyewear", "eyeglasses" and "optometry" listed. So instead of having a huge list of values in this column, I want to simplify the categories: if a value contains a word like "eye", "glasses" or "opto", just change it to "eyewear". My initial code looks like this:
def map_company(row):
    company = row['SIC_Desc']
    if company in 'Surgical':
        return 'Surgical'
    elif company in ['Eye', 'glasses', 'opthal', 'spectacles', 'optometers']:
        return 'Eyewear'
    elif company in ['Cotton', 'Bandages', 'gauze', 'tape']:
        return 'First Aid'
    elif company in ['Dental', 'Denture']:
        return 'Dental'
    elif company in ['Wheelchairs', 'Walkers', 'braces', 'crutches', 'ortho']:
        return 'Mobility equipments'
    else:
        return 'Other'

df['SIC_Desc'] = df.apply(map_company, axis=1)
This is not correct though because it is changing every item into "Other," so clearly my syntax is wrong. Can someone please help me simplify this column that I am trying to relabel?
Thank you
It is hard to answer without having the exact content of your data set, but I can see one mistake. According to your description, it seems you are looking at this the wrong way. You want one of the words to be in your company description, so it should look like this:
if any(test in company for test in ['Eye', 'glasses', 'opthal', 'spectacles', 'optometers']):
However you might have a case issue here so I would recommend:
company = row['SIC_Desc'].lower()
if any(test.lower() in company for test in ['Eye', 'glasses', 'opthal', 'spectacles', 'optometers']):
    return 'Eyewear'
You will also need to make sure company is a string and 'SIC_Desc' is a correct column name.
In the end your function will look like this:
def is_match(company, names):
    return any(name in company for name in names)

def map_company(row):
    company = row['SIC_Desc'].lower()
    if 'surgical' in company:
        return 'Surgical'
    elif is_match(company, ['eye', 'glasses', 'opthal', 'spectacles', 'optometers']):
        return 'Eyewear'
    elif is_match(company, ['cotton', 'bandages', 'gauze', 'tape']):
        return 'First Aid'
    else:
        return 'Other'
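As a usage sketch, the corrected function plugs into the same apply call as in the question; writing to a separate Category column here is just my assumption, to keep the original descriptions around:
# Map each SIC description to its simplified category
df['Category'] = df.apply(map_company, axis=1)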
Here is an option using a reversed dictionary.
Code
import pandas as pd

# Sample DataFrame
s = pd.Series(["gauze", "opthal", "tape", "surgical", "eye", "spectacles",
               "glasses", "optometers", "bandages", "cotton", "glue"])
df = pd.DataFrame({"SIC_Desc": s})
df

LOOKUP = {
    "Eyewear": ["eye", "glasses", "opthal", "spectacles", "optometers"],
    "First Aid": ["cotton", "bandages", "gauze", "tape"],
    "Surgical": ["surgical"],
    "Dental": ["dental", "denture"],
    "Mobility": ["wheelchairs", "walkers", "braces", "crutches", "ortho"],
}

REVERSE_LOOKUP = {v: k for k, lst in LOOKUP.items() for v in lst}

def map_company(row):
    company = row["SIC_Desc"].lower()
    return REVERSE_LOOKUP.get(company, "Other")

df["SIC_Desc"] = df.apply(map_company, axis=1)
df
Details
We define a LOOKUP dictionary with (key, value) pairs of expected output and associated words, respectively. Note that the values are lowercase to simplify searching. Then we build a reversed dictionary that inverts these key-value pairs and improves search performance, e.g.:
>>> REVERSE_LOOKUP
{'bandages': 'First Aid',
'cotton': 'First Aid',
'eye': 'Eyewear',
'gauze': 'First Aid',
...}
Notice these reference dictionaries are created outside the mapping function to avoid rebuilding dictionaries for every call to map_company(). Finally the mapping function quickly returns the desired output using the reversed dictionary by calling .get(), a method that returns the default argument "Other" if no entry is found.
See @Flynsee's insightful answer for an explanation of what is happening in your code. The result is cleaner compared to a bevy of conditional statements.
Benefits
Since we use a dictionary, lookups should be relatively fast, O(1) compared to the O(n) complexity of scanning keyword lists with in. Moreover, the main LOOKUP dictionary is easy to adapt: new entries do not require writing more conditional statements.
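If your real descriptions are longer phrases rather than the single words in the sample, the exact-match .get() above will fall back to "Other"; here is a hedged sketch that reuses the same LOOKUP table but applies the substring test from the first answer:
def map_company(row):
    company = row["SIC_Desc"].lower()
    for category, words in LOOKUP.items():
        # Return the first category whose keywords appear in the description
        if any(word in company for word in words):
            return category
    return "Other"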

Adding items to sets in a dictionary

I have a list of dictionaries that map different IDs to a central ID, and a document in which these different IDs are associated with terms. I have created a function that uses as keys the central IDs obtained from the different IDs in the document. The goFile is the document where the first column holds an ID and the second column holds a GO term. The mappingList is a list of dictionaries in which the IDs from the goFile are mapped to a main ID.
My expected output is a dictionary with a main ID as key and, as value, a set of the GO terms associated with it.
def parseGO(mappingList, goFile):
    # open the file
    file = open(goFile)
    # this will be the dictionary that this function returns
    # entries will have as a key an Ensembl ID
    # and the value will be a set of GO terms
    GOdict = {}
    GOset = set()
    for line in file:
        splitline = line.split(' ')
        GO_term = splitline[1]
        value_ID = splitline[0]
        for dict in mappingList:
            if value_ID in dict:
                ENSB_term = dict[value_ID]
        # my best try
        for dict in mappingList:
            for key in GOdict.keys():
                if value_ID in dict and key == dict[value_ID]:
                    GOdict[ENSB_term].add(GO_term)
        GOdict[ENSB_term] = GOset
    return GOdict
My problem is that I now have to add, under the central ID in my GOdict, the terms that are associated in the document with the different IDs. To avoid duplicates I use a set (GOset). How do I do it? All my attempts end up with all the terms mapped to all the main IDs.
Some sample:
mappingList = [{'1234': 'mainID1', '456': 'mainID2'}, {'789': 'mainID2'}]
goFile:
1234 GOTERM1
1234 GOTERM2
456 GOTERM1
456 GOTERM3
789 GOTERM1
expected output:
GOdict = {'mainID1': set(['GOTERM1', 'GOTERM2']), 'mainID2': set(['GOTERM1', 'GOTERM3'])}
First off, you shouldn't use the variable name 'dict', as it shadows the built-in dict class, and will cause you problems at some point.
The following should work for you:
from collections import defaultdict

def parse_go(mapping_list, go_file):
    go_dict = defaultdict(set)
    with open(go_file) as f:  # 'with' guarantees the file is closed properly
        for line in f:
            # Feel free to change the split behaviour to work better for you.
            (value_id, go_term) = line.split()
            for map_dict in mapping_list:
                if value_id in map_dict:
                    go_dict[map_dict[value_id]].add(go_term)
    return go_dict
The code is fairly straightforward, but here's a breakdown anyway.
We use a default dictionary instead of a normal dictionary so we can eliminate all the "if key in dict" or setdefault() boilerplate.
For each line in the file, we check whether the first item (value_id) is a key in any of the mapping dictionaries, and if so, add the line's second item (go_term) to that value_id's set in the dictionary.
EDIT: There was a request for doing this without defaultdict(). Assuming that go_dict is just a normal dictionary (go_dict = {}), your inner loop would look like:
for map_dict in mapping_list:
    if value_id in map_dict:
        esnb_entry = go_dict.setdefault(map_dict[value_id], set())
        esnb_entry.add(go_term)
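As a quick sanity check, here is a small sketch exercising parse_go against the sample data from the question; the file name go_sample.txt is made up for illustration:
mappingList = [{'1234': 'mainID1', '456': 'mainID2'}, {'789': 'mainID2'}]

# Write the sample goFile contents to a temporary file for the test
with open('go_sample.txt', 'w') as f:
    f.write('1234 GOTERM1\n1234 GOTERM2\n456 GOTERM1\n456 GOTERM3\n789 GOTERM1\n')

result = parse_go(mappingList, 'go_sample.txt')
# result should map 'mainID1' to {'GOTERM1', 'GOTERM2'}
# and 'mainID2' to {'GOTERM1', 'GOTERM3'}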

GQL does not work for GET parameters for keys

I am trying to compare the key to filter results in GQL in Python, but neither direct comparison nor typecasting to int works. Therefore, I am forced to use the workaround shown in the uncommented lines below. Any clues?
row = self.request.get("selectedrow")
#mydbobject = DbModel.gql("WHERE key=:1", row).fetch(1)
#mydbobject = DbModel.gql("WHERE key=:1", int(row)).fetch(1)  # invalid literal for int() with base 10
#print mydbobject, row
que = db.Query(DbModel)
results = que.fetch(100)
mydbobject = None
for item in results:
    if item.key().__str__() in row:
        mydbobject = item
EDIT 1: one more attempt that does not retrieve the record, even though the key exists in the Datastore along with the record:
mydbobject = DbModel.gql("WHERE key = KEY('%s')"%row).fetch(1)
Am I correct in my assumption that you basically just want to retrieve an object with a particular key? If so, the get and get_by_id methods may be of help:
mydbobject = DbModel.get_by_id(int(self.request.get("selectedrow")))
The error "invalid literal for int()" indicate that the paramater pass to int was not a string representing an integer. Try to print the value of "row" for debuging, I bet it is an empty string.
The correct way to retrieve an element from the key is simply by using the method "get" or "get_by_id".
In your case:
row = self.request.get("selectedrow")
mydbobject = DbModel.get(row)
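Putting both answers together, a minimal sketch of the handler logic; the guard against an empty parameter is my own assumption, following the debugging hint above:
row = self.request.get("selectedrow")
if not row:
    # Parameter missing or empty: report the error rather than passing
    # an empty string to the datastore API.
    self.error(400)
else:
    # row is the string-encoded key, so Model.get() can load it directly
    mydbobject = DbModel.get(row)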
