This is more a general programming question than related to the actual code.
I have this ugly code that takes an input from JIRA and converts it from milliseconds to hours written out multiple times like below:
def convertMillis(ms):
    hours = ms / 1000 / 60 / 60
    return hours
try:
    newaccsla_comp = convertMillis(issues.fields.customfield_10705.completedCycles[0].remainingTime.millis)
except:
    newaccsla_comp = np.nan
try:
    newaccsla_ongoing = convertMillis(issues.fields.customfield_10705.ongoingCycle.remainingTime.millis)
except:
    newaccsla_ongoing = np.nan
try:
    paymentssla_comp = convertMillis(issues.fields.customfield_10136.completedCycles[0].remainingTime.millis)
except:
    paymentssla_comp = np.nan
try:
    paymentssla_ongoing = convertMillis(issues.fields.customfield_10136.ongoingCycle.remainingTime.millis)
except:
    paymentssla_ongoing = np.nan
try:
    modifysla_comp = convertMillis(issues.fields.customfield_10713.completedCycles[0].remainingTime.millis)
except:
    modifysla_comp = np.nan
try:
    modifysla_ongoing = convertMillis(issues.fields.customfield_10713.ongoingCycle.remainingTime.millis)
except:
    modifysla_ongoing = np.nan
try:
    MFsla_comp = convertMillis(issues.fields.customfield_10711.completedCycles[0].remainingTime.millis)
except:
    MFsla_comp = np.nan
try:
    MFsla_ongoing = convertMillis(issues.fields.customfield_10711.ongoingCycle.remainingTime.millis)
except:
    MFsla_ongoing = np.nan
try:
    closeaccsla_comp = convertMillis(issues.fields.customfield_10140.completedCycles[0].remainingTime.millis)
except:
    closeaccsla_comp = np.nan
try:
    closeaccsla_ongoing = convertMillis(issues.fields.customfield_10140.ongoingCycle.remainingTime.millis)
except:
    closeaccsla_ongoing = np.nan
try:
    casla_comp = convertMillis(issues.fields.customfield_10213.completedCycles[0].remainingTime.millis)
except:
    casla_comp = np.nan
try:
    casla_ongoing = convertMillis(issues.fields.customfield_10213.ongoingCycle.remainingTime.millis)
except:
    casla_ongoing = np.nan
try:
    at_comp = convertMillis(issues.fields.customfield_10144.completedCycles[0].remainingTime.millis)
except:
    at_comp = np.nan
try:
    at_ongoing = convertMillis(issues.fields.customfield_10144.ongoingCycle.remainingTime.millis)
except:
    at_ongoing = np.nan
try:
    modfeesla_comp = convertMillis(issues.fields.customfield_10134.completedCycles[0].remainingTime.millis)
except:
    modfeesla_comp = np.nan
try:
    modfeesla_ongoing = convertMillis(issues.fields.customfield_10134.ongoingCycle.remainingTime.millis)
except:
    modfeesla_ongoing = np.nan
try:
    tdsla_comp = convertMillis(issues.fields.customfield_11200.completedCycles[0].remainingTime.millis)
except:
    tdsla_comp = np.nan
try:
    tdsla_ongoing = convertMillis(issues.fields.customfield_11200.ongoingCycle.remainingTime.millis)
except:
    tdsla_ongoing = np.nan
try:
    querysla_comp = convertMillis(issues.fields.customfield_10142.completedCycles[0].remainingTime.millis)
except:
    querysla_comp = np.nan
try:
    querysla_ongoing = convertMillis(issues.fields.customfield_10142.ongoingCycle.remainingTime.millis)
except:
    querysla_ongoing = np.nan
try:
    recsla_comp = convertMillis(issues.fields.customfield_15600.completedCycles[0].remainingTime.millis)
except:
    recsla_comp = np.nan
try:
    recsla_ongoing = convertMillis(issues.fields.customfield_15600.ongoingCycle.remainingTime.millis)
except:
    recsla_ongoing = np.nan
try:
    reportsla_comp = convertMillis(issues.fields.customfield_15601.completedCycles[0].remainingTime.millis)
except:
    reportsla_comp = np.nan
try:
    reportsla_ongoing = convertMillis(issues.fields.customfield_15601.ongoingCycle.remainingTime.millis)
except:
    reportsla_ongoing = np.nan
I would be comfortable doing something like taking all the custom fields, putting them in one list then doing a for over the function like this:
field_list = ['customfield_10705','customfield_10136','customfield_10713','customfield_10711','customfield_10140','customfield_10213','customfield_10144','customfield_10134','customfield_11200','customfield_10142','customfield_15600','customfield_15601']
def get_jira_hours(field):
    try:
        newaccsla_comp = convertMillis(issues.fields.field.completedCycles[0].remainingTime.millis)
    except:
        newaccsla_comp = np.nan
    try:
        newaccsla_ongoing = convertMillis(issues.fields.field.ongoingCycle.remainingTime.millis)
    except:
        newaccsla_ongoing = np.nan
for field in field_list:
    get_jira_hours(field)
However, there are three things linked to each function call that I need to iterate over: the custom field (e.g. customfield_10705) and the two names that each try/except result is saved to (e.g. newaccsla_comp and newaccsla_ongoing).
Here are the variables in order, i.e. field_list[0] is linked to name_list[0]:
field_list = ['customfield_10705','customfield_10136','customfield_10713','customfield_10711','customfield_10140','customfield_10213','customfield_10144','customfield_10134','customfield_11200','customfield_10142','customfield_15600','customfield_15601']
name_list = ['newaccsla','paymentssla','modifysla','MFsla','closeaccsla','casla','at','modfeesla','tdsla','querysla','recsla','reportsla']
Best way to iterate over these? Thanks.
First, you can turn each of those four-line blocks into a one-liner if you just edit your convertMillis function to return np.nan instead of raising—or, if you can't do that, wrap the function in another one:
def convertMillisOrNan(millis):
    try:
        return convertMillis(millis)
    except:
        return np.nan
newaccsla_comp = convertMillisOrNan(issues.fields.customfield_10705.completedCycles[0].remainingTime.millis)
newaccsla_ongoing = convertMillisOrNan(issues.fields.customfield_10705.ongoingCycle.remainingTime.millis)
# etc.
Or, maybe the exception you're trying to handle comes a bit farther up. You're always calling convertMillis on <something>.remainingTime.millis. What if, say, the field always exists, and always has an ongoingCycle, but that doesn't always have a remainingTime attribute? Then you can push that part into the try:, and also simplify things even further at the same time:
def convertCycle(cycle):
    try:
        return convertMillis(cycle.remainingTime.millis)
    except:
        return np.nan
newaccsla_comp = convertCycle(issues.fields.customfield_10705.completedCycles[0])
newaccsla_ongoing = convertCycle(issues.fields.customfield_10705.ongoingCycle)
If the exception comes even higher up—e.g., if the field doesn't always have an ongoingCycle—obviously you need to push more of the expression inside the try: block; I'm really just making a guess here at what you're trying to handle with that except:.
And, while you're at it, do you really want a bare except:? That will handle any exception, not just an AttributeError or ValueError or whatever kind of exception you were actually expecting.
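As a rough illustration of narrowing the bare except, the sketch below catches only the errors that a missing attribute, an empty completedCycles list, or a None value would raise. The SimpleNamespace objects are hypothetical stand-ins for the JIRA issue, math.nan stands in for np.nan, and convert_cycle takes a zero-argument callable (an assumption made here so that the whole attribute lookup happens inside the try block).

```python
import math
from types import SimpleNamespace


def convert_millis(ms):
    """Convert milliseconds to hours."""
    return ms / 1000 / 60 / 60


def convert_cycle(get_cycle):
    """Return the remaining hours of the cycle produced by get_cycle(), or NaN."""
    try:
        return convert_millis(get_cycle().remainingTime.millis)
    except (AttributeError, IndexError, TypeError):
        # AttributeError: the field or one of its attributes is missing
        # IndexError: completedCycles is an empty list
        # TypeError: a value is None where a list or number was expected
        return math.nan


# Hypothetical field with one completed cycle and no ongoing cycle.
field = SimpleNamespace(
    completedCycles=[SimpleNamespace(remainingTime=SimpleNamespace(millis=7_200_000))]
)

comp = convert_cycle(lambda: field.completedCycles[0])
ongoing = convert_cycle(lambda: field.ongoingCycle)
```

Any other exception (a typo in a name, a network error) still propagates, which is usually what you want while debugging.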
Meanwhile, your existing get_jira_hours refactor doesn't work because you can't just use .field when field is a variable holding a string. One way to solve that is to pass the field object itself:
def get_jira_hours(field):
    comp = convertCycle(field.completedCycles[0])
    ongoing = convertCycle(field.ongoingCycle)
    return comp, ongoing
newaccsla_comp, newaccsla_ongoing = get_jira_hours(issues.fields.customfield_10705)
paymentssla_comp, paymentssla_ongoing = get_jira_hours(issues.fields.customfield_10136)
# etc.
Another way to solve it is with getattr—which I'll show below.
But you can do even better. Do you really need these all to be independent variables, rather than, say, items in a dict?
fieldmap = {
    'newaccsla': 'customfield_10705',
    'paymentssla': 'customfield_10136',
    # etc.
}
values = {}
for fieldname, customfieldname in fieldmap.items():
    field = getattr(issues.fields, customfieldname)
    comp, ongoing = get_jira_hours(field)
    values[f'{fieldname}_comp'] = comp
    values[f'{fieldname}_ongoing'] = ongoing
Now, instead of using newaccsla_comp, you have to use values['newaccsla_comp']. But I suspect the rest of your code copies and pastes the same thing for each variable, which you can replace with code that just loops over the dict.
But if you really do need these to be independent variables (which, again, you probably don't), you can do the same thing by using globals() instead of values. Assigning through locals() is not reliable inside a function, so that only works at module level.
On the other hand, if you're going to be repeating yourself over comp/ongoing pairs of values, just store the pairs in the dict: values[fieldname] = comp, ongoing.
Also, since all of the custom field names seem to be customfield_NNNNN, you can simplify things even further by mapping 'newaccsla': 10705, etc., and then doing getattr(issues.fields, f'customfield_{customfield}').
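Putting the last two suggestions together, a minimal sketch might look like the following. It assumes the field numbers from the question, stores each comp/ongoing pair under one key, and uses SimpleNamespace objects as a hypothetical stand-in for the real JIRA issue; hours_or_nan and collect_sla_hours are illustrative names, and math.nan stands in for np.nan.

```python
import math
from types import SimpleNamespace


def hours_or_nan(get_millis):
    """Convert the milliseconds returned by get_millis() to hours, or NaN."""
    try:
        return get_millis() / 1000 / 60 / 60
    except (AttributeError, IndexError, TypeError):
        return math.nan


def get_jira_hours(field):
    comp = hours_or_nan(lambda: field.completedCycles[0].remainingTime.millis)
    ongoing = hours_or_nan(lambda: field.ongoingCycle.remainingTime.millis)
    return comp, ongoing


# Name -> custom field number; extend with the remaining pairs.
FIELD_IDS = {"newaccsla": 10705, "paymentssla": 10136}


def collect_sla_hours(issue, field_ids):
    """Return {name: (comp_hours, ongoing_hours)} for every mapped field."""
    values = {}
    for name, number in field_ids.items():
        # A missing field becomes None, which get_jira_hours turns into NaNs.
        field = getattr(issue.fields, f"customfield_{number}", None)
        values[name] = get_jira_hours(field)
    return values


def _cycle(ms):
    return SimpleNamespace(remainingTime=SimpleNamespace(millis=ms))


# Hypothetical issue that only has the first custom field populated.
issue = SimpleNamespace(
    fields=SimpleNamespace(
        customfield_10705=SimpleNamespace(
            completedCycles=[_cycle(3_600_000)], ongoingCycle=_cycle(1_800_000)
        )
    )
)
values = collect_sla_hours(issue, FIELD_IDS)
```

Adding a new SLA then means adding one entry to FIELD_IDS rather than another pair of try/except blocks.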
Related
Is there a more pythonic way to do the same as the code below?
I created this code for web scraping a website, but I think there should be a better way for adding the contents to lists other than repeating the same code for each element.
Here are the lists I will add elements to:
Proporcao_de_Sobras = []
liq_dir =[]
liq_sobras=[]
liq_reservas=[]
Encerramento=[]
n_emissao =[]
tp_ofert =[]
inv_minimo =[]
And here is the code I am using to add the elements to lists.
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[2]'):
        Proporcao_de_Sobras.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[4]'):
        liq_dir.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[6]'):
        liq_sobras.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[8]'):
        liq_reservas.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[10]'):
        Encerramento.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[12]'):
        n_emissao.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[14]'):
        tp_ofert.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[16]'):
        inv_minimo.append(x.text)
except:
    pass
This goes on 5 or 6 more times.
Here's another pythonic way using dictionaries:
def get_data(your_lists):
    data = {}
    for list_index, list_name in enumerate(your_lists):
        xpath = f'//*[@id="tablepress-6"]/tbody/tr[*]/td/span[{(list_index + 1) * 2}]'
        try:
            data[list_name] = [x.text for x in driver.find_elements_by_xpath(xpath)]
        except:
            pass
    return data
your_lists = ['Proporcao_de_Sobras', 'liq_dir', 'liq_sobras', 'liq_reservas', 'Encerramento', 'n_emissao', 'tp_ofert', 'inv_minimo']
all_data = get_data(your_lists)
Pythonic way N1, using mutability of lists:
def get_text(x_path, dest_list):
    for x in driver.find_elements_by_xpath(x_path):
        dest_list.append(x.text)
Proporcao_de_Sobras = []
get_text('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[2]', Proporcao_de_Sobras)
Pythonic way N2, using dicts:
paths = {
    '//*[@id="tablepress-6"]/tbody/tr[*]/td/span[2]': [],
    '//*[@id="tablepress-6"]/tbody/tr[*]/td/span[4]': [],
    # ....
}
for k, v in paths.items():
    for x in driver.find_elements_by_xpath(k):
        v.append(x.text)
You can use a function for it. I would advise catching specific exceptions though.
def fill_elem(fill_list, xpath):
    try:
        for x in driver.find_elements_by_xpath(xpath):
            fill_list.append(x.text)
    except SomeException:  # placeholder: name the specific exception you expect
        pass
    return fill_list
proporcao_de_sobras = []
proporcao_de_sobras = fill_elem(proporcao_de_sobras, r'//*[@id="tablepress-6"]/tbody/tr[*]/td/span[2]')
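The answers above can be combined into one loop that derives each XPath from the column's position. The following is only a sketch: collect_columns is an illustrative name, it assumes the columns really do sit at span[2], span[4], and so on in the order listed, and it takes the element lookup as a callable so that it can be exercised here with a fake instead of a live browser.

```python
from types import SimpleNamespace


def collect_columns(find_elements, names, table_id="tablepress-6"):
    """Map each name to the texts found in its span column.

    find_elements is any callable that takes an XPath string and returns
    objects with a .text attribute.
    """
    data = {}
    for position, name in enumerate(names, start=1):
        xpath = f'//*[@id="{table_id}"]/tbody/tr[*]/td/span[{position * 2}]'
        data[name] = [element.text for element in find_elements(xpath)]
    return data


def fake_find_elements(xpath):
    # Stand-in for the browser: returns one element whose text is the XPath.
    return [SimpleNamespace(text=xpath)]


names = ["Proporcao_de_Sobras", "liq_dir", "liq_sobras", "liq_reservas"]
data = collect_columns(fake_find_elements, names)
```

With a real driver, the callable would be driver.find_elements_by_xpath as in the question, or lambda xp: driver.find_elements(By.XPATH, xp) on Selenium 4, where the find_elements_by_* helpers are no longer available.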
I would like to improve the way this code is written. Right now I have six methods that are almost copy-paste, only one line is changing. How can I make a generic method and depending on the property of the data input to change the calculations? I was thinking to use functional programming to achieve that, but I am not sure how to do it properly.
The method is getting a dict object. Then this object is transformed into JSON. The mid variable is storing a JSON with midrate for currency from external API, it must be before the for loop otherwise the API will be called in every iteration and this slows down the process a lot! Then in the for loop, I iterate through the data from the input. The only difference between methods is the calculation before inserting it in the list. .append(mid_current - bankMSell)
def margin_to_exchange_rate_sell(data):
    j = data.to_JSON()
    list_p = []
    mid = midrate.get_midrate(j["fromCurrency"][0])
    for idx, val in enumerate(j['toCurrency']):
        try:
            mid_current = 1/get_key(mid, j['toCurrency'][idx])
            bankMSell = float(j['sellMargin'][idx])
            list_p.append(mid_current - bankMSell)
        except Exception as e:
            list_p.append(0)
            print(str(e))
    return list_p
Another one of the methods:
def margin_to_exchange_rate_buy(data):
    j = data.to_JSON()
    list_p = []
    mid = midrate.get_midrate(j["fromCurrency"][0])
    for idx, val in enumerate(j['toCurrency']):
        try:
            mid_current = 1/get_key(mid, j['toCurrency'][idx])
            bankMSell = float(j['sellMargin'][idx])
            list_p.append(mid_current + bankMSell)
        except Exception as e:
            list_p.append(0)
            print(str(e))
    return list_p
Indeed, there is a way to reduce code here with lambdas:
def margin_to_exchange_rate_sell(data):
    return margin_to_exchange_rate(data, lambda m, b: m - b)
def margin_to_exchange_rate_buy(data):
    return margin_to_exchange_rate(data, lambda m, b: m + b)
def margin_to_exchange_rate(data, operation):
    j = data.to_JSON()
    list_p = []
    mid = midrate.get_midrate(j["fromCurrency"][0])
    for idx, val in enumerate(j['toCurrency']):
        try:
            mid_current = 1/get_key(mid, j['toCurrency'][idx])
            bankMSell = float(j['sellMargin'][idx])
            list_p.append(operation(mid_current, bankMSell))
        except Exception as e:
            list_p.append(0)
            print(str(e))
    return list_p
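For plain addition and subtraction, the standard library's operator module already provides the two-argument functions, so the lambdas are optional. The sketch below shows the idea on simplified inputs: apply_margins is a hypothetical helper that takes the mid rates and margins as plain lists, which is an assumption made to keep the example runnable without the external midrate API.

```python
import operator


def apply_margins(mid_rates, margins, operation):
    """Combine 1/mid_rate with each margin using the given two-argument operation."""
    results = []
    for mid_rate, margin in zip(mid_rates, margins):
        try:
            results.append(operation(1 / mid_rate, float(margin)))
        except (ZeroDivisionError, TypeError, ValueError) as exc:
            # Same fallback as the original: record 0 and report the problem.
            results.append(0)
            print(exc)
    return results


sell = apply_margins([2.0, 4.0], ["0.25", "0.125"], operator.sub)
buy = apply_margins([2.0, 4.0], ["0.25", "0.125"], operator.add)
```

The remaining four methods would then each be a one-line call with a different operation, whether that is an operator function, a lambda, or a named function for anything more involved.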
Suppose I have some (simplified) BeautifulSoup code like this, pulling data into a dictionary:
tournament_info = soup.find_all('li')
stats['Date'] = tournament_info[0].text
stats['Location'] = tournament_info[1].text
stats['Prize'] = tournament_info[3].text.split(':')[1].strip()
In the case where the initial find_all returns an exception, I want all the dictionary entries to be 'None'. And in the case of any of the individual dictionary assignments returning an exception, I want 'None' too.
Is there any nice way to write this, other than something horrible like below?
try:
    tournament_info = soup.find_all('li')
except:
    m_stats['Date'] = 'None'
    m_stats['Location'] = 'None'
    m_stats['Prize'] = 'None'
try:
    m_stats['Date'] = tournament_info[0].text
except:
    m_stats['Date'] = 'None'
try:
    m_stats['Location'] = tournament_info[1].text
except:
    m_stats['Location'] = 'None'
try:
    m_stats['Prize'] = tournament_info[3].text.split(':')[1].strip()
except:
    m_stats['Prize'] = 'None'
Create your own class:
class Stats(dict):
    tournament_info = []
    def __init__(self, tournament_info, **kwargs):
        super(Stats, self).__init__(**kwargs)
        self.tournament_info = tournament_info
        self['Date'] = self.get_tournament_info_text(0)
        self['Location'] = self.get_tournament_info_text(1)
        prize = self.get_tournament_info_text(2)
        if prize is not None:
            prize = prize.split(':')[1].strip()
        self['Prize'] = prize
    def get_tournament_info_text(self, index):
        try:
            return self.tournament_info[index]['text']
        except:
            return None
tournament_info = [
    {
        'text': 'aaa'
    },
    {},
    {
        'text': 'bbb:ccc '
    }
]
m_stats = Stats(tournament_info)
print(m_stats)
Here's what I can suggest for your code:
info = soup.find_all('li')
if not info:
    m_stats = dict.fromkeys(m_stats, None)
    return
mappings = {
    'Date': 0,
    'Location': 1,
    'Prize': 3
}
for key in mappings:
    value = None
    try:
        value = info[mappings[key]].text
        if mappings[key] == 3:
            value = value.split(':')[1].strip()
    except IndexError:
        pass
    m_stats[key] = value
Alternatively, you can create a function that will handle the exceptions for you:
def get_value(idx):
    value = None
    try:
        value = info[idx].text
    except IndexError:
        pass
    return value
m_stats['Date'] = get_value(0)
m_stats['Location'] = get_value(1)
m_stats['Prize'] = get_value(3)
if m_stats['Prize']:
    # Store the cleaned-up value; the split result was previously discarded.
    m_stats['Prize'] = m_stats['Prize'].split(':')[1].strip()
The solution I went for was to create a blank template dictionary (actually a JSON) with all the keys set to 'None'.
Every time the page is scraped, m_stats is first initialised with this blank dictionary (loaded from the JSON). If an exception occurs, it is just simply passed (with some logging), and the value is left as 'None'. There is then no need to explicitly assign 'None' every single time.
Not sure if it's correct to mark this as the "answer", as it is quite specific to my needs, but that's what I did anyway.
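A minimal sketch of that template approach, under some assumptions: the template is a literal dict here rather than one loaded from JSON, the per-key extraction rules live in a hypothetical EXTRACTORS mapping, and SimpleNamespace objects stand in for the BeautifulSoup tags.

```python
from types import SimpleNamespace

# Blank template: every key starts out as the string 'None'.
TEMPLATE = {"Date": "None", "Location": "None", "Prize": "None"}

# One extraction rule per key; each takes the list of <li> elements.
EXTRACTORS = {
    "Date": lambda items: items[0].text,
    "Location": lambda items: items[1].text,
    "Prize": lambda items: items[3].text.split(":")[1].strip(),
}


def scrape_stats(items):
    """Fill a copy of the template, leaving 'None' wherever extraction fails."""
    stats = dict(TEMPLATE)
    for key, extract in EXTRACTORS.items():
        try:
            stats[key] = extract(items)
        except (IndexError, AttributeError, TypeError):
            pass  # keep the template value; real code might log here
    return stats


# Hypothetical page that only lists a date and a location.
partial = scrape_stats([SimpleNamespace(text="1 May"), SimpleNamespace(text="Paris")])
# Hypothetical page where the lookup itself produced nothing usable.
empty = scrape_stats(None)
```

Because the defaults live in one place, adding a key means adding one template entry and one extractor rather than another try/except block.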
I have a dictionary list of size ~250k in python (i.e 250k dictionaries in a list), which I try to process as shown below. The aim is to clean up the dictionary and return an iterable at the end. So, I have something like this:
def check_qs(dict_list_in):
    try:
        del_id=[]
        for i in dict_list_in:
            tmp=i["get_url"][0]
            if i["from"][0]=="var0":
                try:
                    URLValidator()(tmp)
                except:
                    del_id.append( i["id"] )
            elif i["from"][0]=="var1":
                try:
                    URLValidator()( tmp.split("\"")[1] )
                except:
                    del_id.append( i["id"] )
            elif i["from"][0]=="var2":
                try:
                    URLValidator()( tmp.split("\'")[1] )
                except:
                    del_id.append( i["id"] )
            else:
                del_id.append( i["id"] )
        gc.collect()
        result = filter(lambda x: x['id'] not in del_id,dict_list_in)
        return result
    except:
        return dict_list_in
What I am doing above is checking each dictionary in the list for some condition, and if this fails, I get the id and then use filter to delete those specific dictionaries from the list.
At the moment, this takes a long time to run - and I was wondering if there were any obvious optimizations I am missing out on. I think at the moment the above code is too naive.
I made a couple of changes. I moved the validator instance out of the loop so that you don't have to initialize it every time. If it has to be instantiated every time, just move it into the try/except block. Instead of deleting items from the original list, I append the items you want to keep to a new list, which removes the need for filter. I also moved the validation out of the if statements so that when you hit the else branch you don't run the validation at all. The logic of the if statements is otherwise the same as yours. It appears that you are using Django; if you aren't, change the except to except Exception.
from django.core.exceptions import ValidationError
from django.core.validators import URLValidator
def check_qs(dict_list_in):
    new_dict_list = []
    validate = URLValidator()
    for i in dict_list_in:
        test_url = i["get_url"][0]
        if i["from"][0] == "var0":
            pass
        elif i["from"][0] == "var1":
            test_url = test_url.split("\"")[1]
        elif i["from"][0] == "var2":
            test_url = test_url.split("\'")[1]
        else:
            continue
        try:
            validate(test_url)
        # If you aren't using django you can change this to 'Exception'
        except ValidationError:
            continue
        new_dict_list.append(i)
    return new_dict_list
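One likely cost in the original version is the test x['id'] not in del_id with del_id being a list: each test scans the list, so filtering 250k items can approach quadratic time when many ids are rejected. If the collect-ids-then-filter structure has to stay, a set makes each membership test constant time on average. This is only a sketch; filter_valid is an illustrative name and is_valid is a hypothetical stand-in for the URLValidator checks.

```python
def filter_valid(records, is_valid):
    """Return the records whose id was not rejected by is_valid."""
    rejected_ids = set()  # set membership is O(1) on average, unlike a list
    for record in records:
        if not is_valid(record):
            rejected_ids.add(record["id"])
    return [record for record in records if record["id"] not in rejected_ids]


records = [
    {"id": 1, "get_url": ["http://example.com"]},
    {"id": 2, "get_url": ["not a url"]},
]
kept = filter_valid(records, lambda r: r["get_url"][0].startswith("http"))
```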
I have this long list of try/except statements:
try:
    uri = entry_obj['media$group']['media$content'][0]['url']
except (KeyError, IndexError):
    uri = None
try:
    position = entry_obj['yt$position']['$t']
except KeyError:
    position = None
try:
    description = entry_obj['content']['$t']
except KeyError:
    description = None
try:
    seconds = entry_obj['media$group']['yt$duration']['seconds']
except KeyError:
    seconds = None
try:
    thumbnails = entry_obj['media$group']['media$thumbnail']
except KeyError:
    thumbnails = None
Is there a more concise way to write this?
If you tire of figuring out what to use for default values in get() calls, just write a helper function:
def resolve(root, *keys):
    for key in keys:
        try:
            root = root[key]
        except (KeyError, IndexError):
            return None
    return root
Then you just write, e.g.:
uri = resolve(entry_obj, 'media$group', 'media$content', 0, 'url')
To simplify the calls a little, you might beef up the helper function to take a single string for the keys and split on spaces; that way you don't have to type so many quotes, and we can also add a default value argument:
def resolve(root, keys, default=None):
    for key in keys.split():
        try:
            root = root[key]
        except (TypeError, KeyError):
            # Not a usable string key: retry it as an integer index.
            try:
                root = root[int(key)]
            except (IndexError, ValueError, KeyError):
                return default
    return root
uri = resolve(entry_obj, 'media$group media$content 0 url', '')
I thought of another good way to do this; I'm not sure how it compares to kindall's method. We first define a helper that takes a callable:
def res(getter):
    try:
        return getter()
    except (KeyError, IndexError):
        return None
Then replace the try-except statements with:
url = res(lambda: entry_obj['media$group']['media$content'][0]['url'])
position = res(lambda: entry_obj['yt$position']['$t'])
description = res(lambda: entry_obj['content']['$t'])
duration = res(lambda: entry_obj['media$group']['yt$duration']['seconds'])
thumbnails = res(lambda: entry_obj['media$group']['media$thumbnail'])
Use the get method of dictionaries instead, passing an empty dict as the fallback for intermediate lookups so that the chain keeps working when a key is missing:
position = entry_obj.get('yt$position', {}).get('$t')
get will handle the case of a key not existing for you, and give you a (changeable) fallback value instead in that case. You'll still need to handle the first IndexError manually, but all the ones that are just except KeyError: will disappear.
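As a small sketch of how that plays out for a few of the lookups, including the list index that still needs its own guard; the entry_obj below is made-up sample data, not the real feed format.

```python
# Made-up entry with a position and one media URL, but no 'content' key.
entry_obj = {
    "yt$position": {"$t": 3},
    "media$group": {"media$content": [{"url": "http://example.com/v"}]},
}

# Intermediate lookups fall back to {} so the next .get() still works.
position = entry_obj.get("yt$position", {}).get("$t")
description = entry_obj.get("content", {}).get("$t")

# The list index is guarded explicitly instead of catching IndexError.
contents = entry_obj.get("media$group", {}).get("media$content", [])
uri = contents[0].get("url") if contents else None
```

This only covers missing keys; if a key is present but maps to None, the chained call still fails, so the helper functions above remain the more general option.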