Is there a more Pythonic way to do the same thing as the code below?
I created this code for scraping a website, but I think there should be a better way to add the contents to lists than repeating the same code for each element.
Here are the lists I will add elements to:
Proporcao_de_Sobras = []
liq_dir = []
liq_sobras = []
liq_reservas = []
Encerramento = []
n_emissao = []
tp_ofert = []
inv_minimo = []
And here is the code I am using to add the elements to lists.
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[2]'):
        Proporcao_de_Sobras.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[4]'):
        liq_dir.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[6]'):
        liq_sobras.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[8]'):
        liq_reservas.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[10]'):
        Encerramento.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[12]'):
        n_emissao.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[14]'):
        tp_ofert.append(x.text)
except:
    pass
try:
    for x in driver.find_elements_by_xpath('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[16]'):
        inv_minimo.append(x.text)
except:
    pass
This goes on 5 or 6 more times.
Here's another pythonic way using dictionaries:
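def get_data(your_lists):
    data = {}
    for list_index, list_name in enumerate(your_lists):
        # Columns live in span[2], span[4], span[6], ... in list order.
        xpath = f'//*[@id="tablepress-6"]/tbody/tr[*]/td/span[{(list_index + 1) * 2}]'
        try:
            # Keep the text of each element, as the original loops do.
            data[list_name] = [x.text for x in driver.find_elements_by_xpath(xpath)]
        except Exception:
            pass
    return data
# The order must match the order of the spans in the table.
your_lists = ['Proporcao_de_Sobras', 'liq_dir', 'liq_sobras', 'liq_reservas', 'Encerramento', 'n_emissao', 'tp_ofert', 'inv_minimo']
all_data = get_data(your_lists)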
Pythonic way N1, using mutability of lists:
def get_text(x_path, dest_list):
    for x in driver.find_elements_by_xpath(x_path):
        dest_list.append(x.text)
Proporcao_de_Sobras = []
get_text('//*[@id="tablepress-6"]/tbody/tr[*]/td/span[2]', Proporcao_de_Sobras)
Pythonic way N2, using dicts:
paths = {
    '//*[@id="tablepress-6"]/tbody/tr[*]/td/span[2]': [],
    '//*[@id="tablepress-6"]/tbody/tr[*]/td/span[4]': [],
    # ....
}
for k, v in paths.items():
    for x in driver.find_elements_by_xpath(k):
        v.append(x.text)
You can use a function for it. I would advise catching specific exceptions, though.
def fill_elem(fill_list, xpath):
    try:
        for x in driver.find_elements_by_xpath(xpath):
            fill_list.append(x.text)
    except SomeException:  # placeholder: name the exception you actually expect
        pass
    # Return the list even when the lookup failed, so the caller never gets None.
    return fill_list
proporcao_de_sobras = []
proporcao_de_sobras = fill_elem(proporcao_de_sobras, r'//*[@id="tablepress-6"]/tbody/tr[*]/td/span[2]')
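The ideas in the answers above can be combined into one small sketch: a list of column names in table order, and a dict comprehension that builds one XPath per column. This is only an illustration under some assumptions. It uses the Selenium 4 style call `driver.find_elements(by, value)` rather than the older `find_elements_by_xpath`, and `FakeDriver` and `FakeElement` are hypothetical stand-ins (not Selenium classes) so that the example runs without a browser.

```python
# Hypothetical stand-ins so the sketch runs without a browser. With real
# Selenium 4 you would pass a WebDriver and use By.XPATH (whose value is the
# string "xpath") from selenium.webdriver.common.by.
class FakeElement:
    def __init__(self, text):
        self.text = text


class FakeDriver:
    def __init__(self, cells):
        self.cells = cells  # maps an XPath string to a list of cell texts

    def find_elements(self, by, value):
        return [FakeElement(t) for t in self.cells.get(value, [])]


XPATH = '//*[@id="tablepress-6"]/tbody/tr[*]/td/span[{}]'

# Column names in table order; column n lives in span 2, 4, 6, ...
COLUMNS = ['Proporcao_de_Sobras', 'liq_dir', 'liq_sobras', 'liq_reservas',
           'Encerramento', 'n_emissao', 'tp_ofert', 'inv_minimo']


def scrape_columns(driver, columns):
    """Return {column name: list of cell texts}, one XPath query per column."""
    return {
        name: [el.text for el in driver.find_elements('xpath', XPATH.format(2 * i))]
        for i, name in enumerate(columns, start=1)
    }


driver = FakeDriver({XPATH.format(2): ['1:1', '2:1'], XPATH.format(4): ['10/01']})
data = scrape_columns(driver, COLUMNS)
print(data['Proporcao_de_Sobras'])  # ['1:1', '2:1']
```

Keeping the names in a single list means adding the remaining 5 or 6 columns is a one-line change, and a column with no matches simply ends up as an empty list.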
I would like to improve the way this code is written. Right now I have six methods that are almost copy-paste; only one line changes. How can I make a generic method that changes the calculation depending on a property of the input data? I was thinking of using functional programming to achieve that, but I am not sure how to do it properly.
The method gets a dict object, which is then transformed into JSON. The mid variable stores the JSON with the mid rate for the currency from an external API; it must be fetched before the for loop, otherwise the API would be called on every iteration, which slows the process down a lot. In the for loop, I iterate through the input data. The only difference between the methods is the calculation done before appending to the list, e.g. .append(mid_current - bankMSell).
def margin_to_exchange_rate_sell(data):
    j = data.to_JSON()
    list_p = []
    mid = midrate.get_midrate(j["fromCurrency"][0])
    for idx, val in enumerate(j['toCurrency']):
        try:
            mid_current = 1/get_key(mid, j['toCurrency'][idx])
            bankMSell = float(j['sellMargin'][idx])
            list_p.append(mid_current - bankMSell)
        except Exception as e:
            list_p.append(0)
            print(str(e))
    return list_p
Another one of the methods:
def margin_to_exchange_rate_buy(data):
    j = data.to_JSON()
    list_p = []
    mid = midrate.get_midrate(j["fromCurrency"][0])
    for idx, val in enumerate(j['toCurrency']):
        try:
            mid_current = 1/get_key(mid, j['toCurrency'][idx])
            bankMSell = float(j['sellMargin'][idx])
            list_p.append(mid_current + bankMSell)
        except Exception as e:
            list_p.append(0)
            print(str(e))
    return list_p
Indeed, there is a way to reduce code here with lambdas:
def margin_to_exchange_rate_sell(data):
    return margin_to_exchange_rate(data, lambda m, b: m - b)
def margin_to_exchange_rate_buy(data):
    return margin_to_exchange_rate(data, lambda m, b: m + b)
def margin_to_exchange_rate(data, operation):
    j = data.to_JSON()
    list_p = []
    mid = midrate.get_midrate(j["fromCurrency"][0])
    for idx, val in enumerate(j['toCurrency']):
        try:
            mid_current = 1/get_key(mid, j['toCurrency'][idx])
            bankMSell = float(j['sellMargin'][idx])
            list_p.append(operation(mid_current, bankMSell))
        except Exception as e:
            list_p.append(0)
            print(str(e))
    return list_p
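The same idea can also be sketched with the standard `operator` module instead of lambdas. To keep the example runnable on its own, the decoded JSON and the mid rates are passed in as plain dicts here; that is an assumption for illustration only, and `rates` merely stands in for whatever `midrate.get_midrate(...)` returns in the question.

```python
import operator


def margin_to_exchange_rate(j, rates, operation):
    """Apply operation(mid, margin) to every target currency.

    j is the already-decoded JSON dict and rates maps a currency code to its
    mid rate (both are hypothetical stand-ins for the objects in the question).
    """
    result = []
    for currency, margin in zip(j['toCurrency'], j['sellMargin']):
        try:
            mid_current = 1 / rates[currency]
            result.append(operation(mid_current, float(margin)))
        except (KeyError, ValueError, ZeroDivisionError) as e:
            # Same fallback as the original: record 0 and report the problem.
            result.append(0)
            print(e)
    return result


j = {'toCurrency': ['EUR', 'XXX'], 'sellMargin': ['0.25', '0.5']}
rates = {'EUR': 0.5}

sell = margin_to_exchange_rate(j, rates, operator.sub)  # mid - margin
buy = margin_to_exchange_rate(j, rates, operator.add)   # mid + margin
```

Iterating with `zip` removes the index bookkeeping, and catching only the exceptions expected here keeps unrelated bugs from being silently turned into zeros.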
This is more a general programming question than related to the actual code.
I have this ugly code that takes an input from JIRA and converts it from milliseconds to hours written out multiple times like below:
def convertMillis(ms):
    hours = ms / 1000 / 60 / 60
    return hours
try:
    newaccsla_comp = convertMillis(issues.fields.customfield_10705.completedCycles[0].remainingTime.millis)
except:
    newaccsla_comp = np.nan
try:
    newaccsla_ongoing = convertMillis(issues.fields.customfield_10705.ongoingCycle.remainingTime.millis)
except:
    newaccsla_ongoing = np.nan
try:
    paymentssla_comp = convertMillis(issues.fields.customfield_10136.completedCycles[0].remainingTime.millis)
except:
    paymentssla_comp = np.nan
try:
    paymentssla_ongoing = convertMillis(issues.fields.customfield_10136.ongoingCycle.remainingTime.millis)
except:
    paymentssla_ongoing = np.nan
try:
    modifysla_comp = convertMillis(issues.fields.customfield_10713.completedCycles[0].remainingTime.millis)
except:
    modifysla_comp = np.nan
try:
    modifysla_ongoing = convertMillis(issues.fields.customfield_10713.ongoingCycle.remainingTime.millis)
except:
    modifysla_ongoing = np.nan
try:
    MFsla_comp = convertMillis(issues.fields.customfield_10711.completedCycles[0].remainingTime.millis)
except:
    MFsla_comp = np.nan
try:
    MFsla_ongoing = convertMillis(issues.fields.customfield_10711.ongoingCycle.remainingTime.millis)
except:
    MFsla_ongoing = np.nan
try:
    closeaccsla_comp = convertMillis(issues.fields.customfield_10140.completedCycles[0].remainingTime.millis)
except:
    closeaccsla_comp = np.nan
try:
    closeaccsla_ongoing = convertMillis(issues.fields.customfield_10140.ongoingCycle.remainingTime.millis)
except:
    closeaccsla_ongoing = np.nan
try:
    casla_comp = convertMillis(issues.fields.customfield_10213.completedCycles[0].remainingTime.millis)
except:
    casla_comp = np.nan
try:
    casla_ongoing = convertMillis(issues.fields.customfield_10213.ongoingCycle.remainingTime.millis)
except:
    casla_ongoing = np.nan
try:
    at_comp = convertMillis(issues.fields.customfield_10144.completedCycles[0].remainingTime.millis)
except:
    at_comp = np.nan
try:
    at_ongoing = convertMillis(issues.fields.customfield_10144.ongoingCycle.remainingTime.millis)
except:
    at_ongoing = np.nan
try:
    modfeesla_comp = convertMillis(issues.fields.customfield_10134.completedCycles[0].remainingTime.millis)
except:
    modfeesla_comp = np.nan
try:
    modfeesla_ongoing = convertMillis(issues.fields.customfield_10134.ongoingCycle.remainingTime.millis)
except:
    modfeesla_ongoing = np.nan
try:
    tdsla_comp = convertMillis(issues.fields.customfield_11200.completedCycles[0].remainingTime.millis)
except:
    tdsla_comp = np.nan
try:
    tdsla_ongoing = convertMillis(issues.fields.customfield_11200.ongoingCycle.remainingTime.millis)
except:
    tdsla_ongoing = np.nan
try:
    querysla_comp = convertMillis(issues.fields.customfield_10142.completedCycles[0].remainingTime.millis)
except:
    querysla_comp = np.nan
try:
    querysla_ongoing = convertMillis(issues.fields.customfield_10142.ongoingCycle.remainingTime.millis)
except:
    querysla_ongoing = np.nan
try:
    recsla_comp = convertMillis(issues.fields.customfield_15600.completedCycles[0].remainingTime.millis)
except:
    recsla_comp = np.nan
try:
    recsla_ongoing = convertMillis(issues.fields.customfield_15600.ongoingCycle.remainingTime.millis)
except:
    recsla_ongoing = np.nan
try:
    reportsla_comp = convertMillis(issues.fields.customfield_15601.completedCycles[0].remainingTime.millis)
except:
    reportsla_comp = np.nan
try:
    reportsla_ongoing = convertMillis(issues.fields.customfield_15601.ongoingCycle.remainingTime.millis)
except:
    reportsla_ongoing = np.nan
I would be comfortable doing something like taking all the custom fields, putting them in one list then doing a for over the function like this:
field_list = ['customfield_10705','customfield_10136','customfield_10713','customfield_10711','customfield_10140','customfield_10213','customfield_10144','customfield_10134','customfield_11200','customfield_10142','customfield_15600','customfield_15601']
def get_jira_hours(field):
    try:
        newaccsla_comp = convertMillis(issues.fields.field.completedCycles[0].remainingTime.millis)
    except:
        newaccsla_comp = np.nan
    try:
        newaccsla_ongoing = convertMillis(issues.fields.field.ongoingCycle.remainingTime.millis)
    except:
        newaccsla_ongoing = np.nan
for field in field_list:
    get_jira_hours(field)
However, there are three values linked to each function call that I need to iterate over: the custom field (e.g. customfield_10705) and the two names that each try/except result is saved to (e.g. newaccsla_comp and newaccsla_ongoing).
Here are the variables in order, i.e. field_list[0] is linked to name_list[0]:
field_list = ['customfield_10705','customfield_10136','customfield_10713','customfield_10711','customfield_10140','customfield_10213','customfield_10144','customfield_10134','customfield_11200','customfield_10142','customfield_15600','customfield_15601']
name_list = ['newaccsla','paymentssla','modifysla','MFsla','closeaccsla','casla','at','modfeesla','tdsla','querysla','recsla','reportsla']
Best way to iterate over these? Thanks.
First, you can turn each of those four-line blocks into a one-liner if you just edit your convertMillis function to return np.nan instead of raising—or, if you can't do that, wrap the function in another one:
def convertMillisOrNan(millis):
    try:
        return convertMillis(millis)
    except:
        return np.nan
newaccsla_comp = convertMillisOrNan(issues.fields.customfield_10705.completedCycles[0].remainingTime.millis)
newaccsla_ongoing = convertMillisOrNan(issues.fields.customfield_10705.ongoingCycle.remainingTime.millis)
# etc.
Or, maybe the exception you're trying to handle comes a bit farther up. You're always calling convertMillis on <something>.remainingTime.millis. What if, say, the field always exists, and always has an ongoingCycle, but that doesn't always have a remainingTime attribute? Then you can push that part into the try:, and also simplify things even further at the same time:
def convertCycle(cycle):
    try:
        return convertMillis(cycle.remainingTime.millis)
    except:
        return np.nan
newaccsla_comp = convertCycle(issues.fields.customfield_10705.completedCycles[0])
newaccsla_ongoing = convertCycle(issues.fields.customfield_10705.ongoingCycle)
If the exception comes even higher up—e.g., if the field doesn't always have an ongoingCycle—obviously you need to push more of the expression inside the try: block; I'm really just making a guess here at what you're trying to handle with that except:.
And, while you're at it, do you really want a bare except:? That will handle any exception, not just an AttributeError or ValueError or whatever kind of exception you were actually expecting.
Meanwhile, your existing get_jira_hours refactor doesn't work because you can't just use .field when field is a variable holding a string. One way to solve that is to pass in the field object itself:
def get_jira_hours(field):
    comp = convertCycle(field.completedCycles[0])
    ongoing = convertCycle(field.ongoingCycle)
    return comp, ongoing
newaccsla_comp, newaccsla_ongoing = get_jira_hours(issues.fields.customfield_10705)
paymentssla_comp, paymentssla_ongoing = get_jira_hours(issues.fields.customfield_10136)
# etc.
Another way to solve it is with getattr—which I'll show below.
But you can do even better. Do you really need these all to be independent variables, rather than, say, items in a dict?
fieldmap = {
    'newaccsla': 'customfield_10705',
    'paymentssla': 'customfield_10136',
    # etc.
}
values = {}
for fieldname, customfieldname in fieldmap.items():
    field = getattr(issues.fields, customfieldname)
    comp, ongoing = get_jira_hours(field)
    values[f'{fieldname}_comp'] = comp
    values[f'{fieldname}_ongoing'] = ongoing
Now, instead of using newaccsla_comp, you have to use values['newaccsla_comp']. But I suspect your code is going to contain a lot of places where you copy and paste the same thing for each variable, which you can replace with code that just loops over the dict.
But if you really do need these to be independent variables (which, again, you probably don't), you can do the same thing by using globals() or locals() instead of values. Note that writing through locals() only works reliably at module level, not inside a function.
On the other hand, if you're going to be repeating yourself over comp/ongoing pairs of values, just store the pairs in the dict: values[fieldname] = comp, ongoing.
Also, since all of the custom field names seem to be customfield_NNNNN, you can simplify things even further by mapping 'newaccsla': 10705, etc., and then doing getattr(issues.fields, f'customfield_{customfield}').
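Putting the pieces of this answer together, here is a minimal sketch of the getattr approach that stores comp/ongoing pairs in a dict. It rests on a few assumptions: the JIRA issue is simulated with `types.SimpleNamespace` objects (hypothetical data, not the real JIRA client), `math.nan` is used in place of `np.nan` so that nothing beyond the standard library is needed, and the helper `cycle_hours` is an illustrative name that takes a callable so the whole attribute lookup happens inside the try block.

```python
import math
from types import SimpleNamespace as NS

MS_PER_HOUR = 1000 * 60 * 60


def cycle_hours(get_cycle):
    """Call get_cycle() and convert its remaining time to hours, or NaN."""
    try:
        return get_cycle().remainingTime.millis / MS_PER_HOUR
    except (AttributeError, IndexError, TypeError):
        return math.nan


def get_jira_hours(fields, field_id):
    # getattr with a default of None covers a missing custom field.
    field = getattr(fields, f'customfield_{field_id}', None)
    comp = cycle_hours(lambda: field.completedCycles[0])
    ongoing = cycle_hours(lambda: field.ongoingCycle)
    return comp, ongoing


# Hypothetical issue data: one field that only has a completed cycle.
fields = NS(customfield_10705=NS(
    completedCycles=[NS(remainingTime=NS(millis=7_200_000))]))

fieldmap = {'newaccsla': 10705, 'paymentssla': 10136}
values = {name: get_jira_hours(fields, fid) for name, fid in fieldmap.items()}
print(values['newaccsla'])  # (2.0, nan)
```

Passing a lambda defers the attribute access, so a missing field, a missing cycle, or an empty completedCycles list all end up as NaN without a separate try/except per variable.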
Is there a smart way to write the following code in three or four lines?
a=l["artist"]
if a:
    b=a["projects"]
    if b:
        c=b["project"]
        if c:
            print c
So I thought of something like this pseudocode:
a = l["artist"] if True:
How about:
try:
    print l["artist"]["projects"]["project"]
except KeyError:
    pass
except TypeError:
    pass # None["key"] raises TypeError.
This will try to print the value, but if a KeyError or TypeError is raised, the matching except block will be run. pass means to do nothing. This is known as EAFP: it's Easier to Ask Forgiveness than Permission.
I don't necessarily think that this is better but you could do:
try:
    c = l["artist"]["projects"]["project"]
except (KeyError, TypeError) as e:
    print e
    pass
p = l.get('artist') and l['artist'].get('projects') and l['artist']['projects'].get('project')
if p:
    print p
You can also make a more general function for this purpose:
def get_attr(lst, attr):
    current = lst
    for a in attr:
        if current.get(a) is not None:
            current = current.get(a)
        else:
            break
    return current
>>> l = {'artist':{'projects':{'project':1625}}}
>>> get_attr(l,['artist','projects','project'])
1625
One-liner (as in the title) without exceptions:
if "artist" in l and l["artist"] and "projects" in l["artist"] and l["artist"]["projects"] and "project" in l["artist"]["projects"]: print l["artist"]["projects"]["project"]
Since you're dealing with nested dictionaries, you might find this generic one-liner useful because it will allow you to access values at any level just by passing it more keys arguments:
nested_dict_get = lambda item, *keys: reduce(lambda d, k: d.get(k), keys, item)
l = {'artist': {'projects': {'project': 'the_value'}}}
print( nested_dict_get(l, 'artist', 'projects', 'project') ) # -> the_value
Note: In Python 3, you'd need to add a from functools import reduce at the top.
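One caveat with the reduce one-liner: if any intermediate key is missing or maps to None, d.get(k) ends up being called on None and raises AttributeError. A minimal Python 3 sketch of a more forgiving variant follows; the name `nested_get` and its `default` parameter are illustrative choices, not something from the answers above.

```python
from functools import reduce


def nested_get(item, *keys, default=None):
    """Follow keys through nested dicts; return default if any step fails."""
    def step(d, k):
        # Stop descending as soon as we hit something that is not a dict.
        return d.get(k) if isinstance(d, dict) else None
    value = reduce(step, keys, item)
    return default if value is None else value


l = {'artist': {'projects': {'project': 'the_value'}}}
print(nested_get(l, 'artist', 'projects', 'project'))   # the_value
print(nested_get(l, 'artist', 'albums', 'title'))       # None
print(nested_get({'artist': None}, 'artist', 'projects', default='n/a'))  # n/a
```

The trade-off is that a stored value of None is indistinguishable from a missing key, which matches the behaviour of the `if a:` checks in the question closely enough for this use.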
I intend to write a single file for each function, but I got stuck inside the nested loop ("loop in the loop").
Everything works except the storage/save part:
right now it writes one file per inner loop (see def t2()).
I would also like to reuse the current 'dic' or 'list' in the next pool function (t'x'() and so on), to avoid having to reopen the CSV along the way.
What's the lesson here? :p
This is my first data scrape, and I'm new to Python!
import
def t0(url): # url
    soup ('http://www.foo.net')
    return soup
def t1(): # 1st_pool
    soup = t0()
    dic = {}
    with open('dic.csv', 'w') as f:
        for x in range(15):
            try:
                collect
                dic[name] = link
                f.write('{0};{1}\n'.format(name, link))
            except:
                pass
    return dic
def t2(): # 2nd_pool
    dic = t1()
    dic2 = {}
    for k,v in dic.items():
        time.sleep(3)
        with open(k+'_dic.csv', 'w') as f:
            for x in range(13):
                try:
                    collect
                    dic2[name] = link
                    f.write('{0};{1}\n'.format(name, link))
                except:
                    pass
    return ###############
def t3(): ... # 3rd_pool
def t4(): ... # 4th_pool
def t5(): ... # 5th_pool
def t6(): ... # full_path /to /details
As I mentioned earlier, the "problem" was only that I was creating an individual *.csv for each loop (so as not to overwrite the previous loop's output). I have now figured out how to create a single file.csv for each function:
def t2(): # 2nd_pool
    dic = t1()
    dic2 = {}
    for k,v in dic.items():
        time.sleep(3)
        ##
        for x in range(13):
            try:
                collect
                dic2[name] = link
                ##
            except:
                pass
    ##
    with open('dic2.csv', 'w') as f:
        for n,j in dic2.items():
            f.write('{0};{1}\n'.format(n, j))
    ##
    return dic2
I simply moved the "*.csv operation" (## marks the changes) to the end of the function, outside the double loop, and the dictionary is also available to the next function, t3(), and so on.
I was trying to achieve that without writing the extra loop, so if someone can provide a better alternative, I would like to learn!
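One possible way to drop the explicit write loop is the standard csv module, whose writer can take all rows at once. The sketch below assumes the rows are (name, link) pairs as in dic2; the helper name `save_pairs` and the sample data are made up for illustration, and an in-memory buffer replaces the real file so the example runs anywhere.

```python
import csv
import io


def save_pairs(f, pairs):
    """Write (name, link) pairs as semicolon-separated rows to file object f."""
    csv.writer(f, delimiter=';', lineterminator='\n').writerows(pairs)


dic2 = {'page1': 'http://www.foo.net/1', 'page2': 'http://www.foo.net/2'}

# io.StringIO stands in for open('dic2.csv', 'w', newline='') here.
buffer = io.StringIO()
save_pairs(buffer, dic2.items())
print(buffer.getvalue())
```

Because the function only needs a file object and an iterable of pairs, each tN() can build its dictionary, call it once at the end, and still return the dictionary to the next function.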
I have a function which returns a list of objects (I used the code below for example). Each object has attribute called text:
def mylist():
    mylist = []
    for i in range(5):
        elem = myobject(i)
        mylist.append(elem)
    return mylist
for obj in mylist():
    print obj.text
How can I rewrite this code so that mylist() returns a new value on each iteration and I iterate over an iterator? In other words, how can I make mylist() lazy in Python, so that I can use it like xrange()?
If I understood right, you're looking for generators:
def mylist():
    for i in range(5):
        elem = myobject(i)
        yield elem
Complete code for you to play with:
class myobject:
    def __init__(self, i):
        self.text = 'hello ' + str(i)
def mylist():
    for i in range(5):
        elem = myobject(i)
        yield elem
for obj in mylist():
    print obj.text
You can also use a generator expression:
mylist = (myobject(i) for i in range(5))
This will give you an actual generator but without having to declare a function beforehand.
Please note the use of parentheses instead of brackets, which denotes a generator expression rather than a list comprehension.
What georg said, or you can return the iter of that list
def mylist():
    mylist = []
    for i in range(5):
        mylist.append(myobject(i))
    return iter(mylist)
probably not a good idea to use your function name as a variable name, though :)
llist = [0,4,5,6]
ii = iter(llist)
while (True):
    try:
        print(next(ii))
    except StopIteration:
        print('End of iteration.')
        break
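The while loop above spells out by hand what a for loop does with any iterator. As a small Python 3 sketch of how that ties back to generators (the function name `numbers` is just an example), note that a generator produces values on demand and is exhausted after a single pass:

```python
def numbers():
    """Generator: produces each value only when the consumer asks for it."""
    for n in [0, 4, 5, 6]:
        yield n


gen = numbers()
first = next(gen)          # runs the generator body up to the first yield
rest = [n for n in gen]    # the for machinery calls next() until StopIteration
again = list(gen)          # a generator is exhausted after one full pass

print(first, rest, again)  # 0 [4, 5, 6] []
```

If the values are needed more than once, call the generator function again or keep the results in a list.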