What's the best way to refactor this code to clean it up? It:
1) Selects from the db, adding a column for the percentage difference between two columns
2) Loops through the returned rows
3) If the date is in the past:
4) If the price is greater than 500 and the percentage difference is less than the first threshold argument, sets the flag to 1
5) Else if the price is less than 500 and the percentage difference is less than the second threshold argument, sets the flag to 1
6) Otherwise keeps the flag as 0
from datetime import datetime as dt

def calculateEmployeeSpend(read_cursor, flag_higher_amount, flag_lower_amount):
    read_cursor.execute("SELECT distinct b.employee_id, b.amount, "
                        "s.spend, b.date, b.amount - s.spend as spend_left, "
                        "100.0*(b.amount - s.spend) / b.amount As PercentDiff FROM employee_budget_upload "
                        "As b JOIN employee_budget_spent As s ON b.employee_id = s.employee_id where b.amount != 0")
    for employee_id, amount, spend, date, spend_left, percent_diff in read_cursor:
        flag = 0
        date_of_amount = dt.strptime(date, "%d/%m/%Y")
        if date_of_amount <= dt.now():
            if amount > 500 and percent_diff < int(flag_higher_amount):
                flag = 1
            elif amount < 500 and percent_diff < int(flag_lower_amount):
                flag = 1
Edit:
I have changed the ifs to one if:
if amount > 500 and percent_diff < int(flag_higher_amount) or amount < 500 and percent_diff < int(flag_lower_amount):
    flag = 1
Extract the SQL query into a .sql file, and pass the function either the path to that file or its text.
Rename flag to reflect its purpose.
Your if and elif both set the flag to 1, so why keep them separate? Combine them into one condition.
500 should be a parameter of the function.
Write a short docstring (""" """) describing what the function does. You can also document each parameter if you want.
Since there seem to be several pieces of state being tracked, it might help in the long run to decouple the logic into classes and methods.
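A minimal sketch of the function with these suggestions applied. The names EMPLOYEE_SPEND_SQL, flag_low_remaining_budget, higher_pct, lower_pct and amount_threshold are hypothetical, the thresholds are assumed to already be numbers (the original called int() on them for every row), and returning a dict of flags is an assumption, since the original snippet doesn't show what happens to the flag.

```python
import sqlite3
from datetime import datetime

# Query text kept separate from the logic; in practice it could be read
# from a .sql file, e.g. Path("employee_spend.sql").read_text().
EMPLOYEE_SPEND_SQL = """
SELECT DISTINCT b.employee_id, b.amount, s.spend, b.date,
       b.amount - s.spend AS spend_left,
       100.0 * (b.amount - s.spend) / b.amount AS percent_diff
FROM employee_budget_upload AS b
JOIN employee_budget_spent AS s ON b.employee_id = s.employee_id
WHERE b.amount != 0
"""


def flag_low_remaining_budget(cursor, sql, higher_pct, lower_pct,
                              amount_threshold=500, date_format="%d/%m/%Y"):
    """Return {employee_id: low_budget_left} for every row of the query.

    cursor: any DB-API cursor; sql: the query text.
    higher_pct / lower_pct: percent-left thresholds for amounts above /
    below amount_threshold. Rows dated in the future, and amounts exactly
    equal to amount_threshold, are never flagged (as in the original).
    """
    cursor.execute(sql)
    flags = {}
    for employee_id, amount, spend, date, spend_left, percent_diff in cursor:
        low_budget_left = 0
        if datetime.strptime(date, date_format) <= datetime.now():
            if ((amount > amount_threshold and percent_diff < higher_pct)
                    or (amount < amount_threshold and percent_diff < lower_pct)):
                low_budget_left = 1
        flags[employee_id] = low_budget_left
    return flags
```

Passing the SQL text in (rather than hard-coding it) also makes the function easy to run against an in-memory SQLite database, as below.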
First of all, the definition of "best" depends on your goal: readability, efficiency, performance, and so on.
In many cases I would prefer to solve a task like this by reading the whole dataset into a pandas DataFrame and using one (or a set of) convenient, expressive pandas idioms.
Or by writing a more sophisticated SQL statement that solves the whole task end to end on the database side.
As for common refactoring best practice, I would recommend moving magic values like "500" or "%d/%m/%Y" into a constant or a method parameter.
Give "flag" a more self-explanatory name.
If an amount of exactly 500 is intentionally meant to leave the flag at zero, that should be stated explicitly in a comment.
To avoid duplicating flag=1, it is better to combine the if statements, like this:
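A rough sketch of the database-side option, assuming SQLite and dates stored as zero-padded dd/mm/yyyy text (rebuilt here as yyyy-mm-dd so they compare correctly with SQLite's date('now'), which is in UTC). The query name LOW_BUDGET_SQL, the function low_budget_flags and the named parameters are hypothetical; other databases would need their own date functions.

```python
import sqlite3

# The whole flag computation happens inside one CASE expression.
LOW_BUDGET_SQL = """
SELECT DISTINCT b.employee_id,
       CASE
         WHEN substr(b.date, 7, 4) || '-' || substr(b.date, 4, 2) || '-'
              || substr(b.date, 1, 2) <= date('now')
          AND ((b.amount > :threshold
                AND 100.0 * (b.amount - s.spend) / b.amount < :higher_pct)
            OR (b.amount < :threshold
                AND 100.0 * (b.amount - s.spend) / b.amount < :lower_pct))
         THEN 1 ELSE 0
       END AS low_budget_left
FROM employee_budget_upload AS b
JOIN employee_budget_spent AS s ON b.employee_id = s.employee_id
WHERE b.amount != 0
"""


def low_budget_flags(conn, higher_pct, lower_pct, threshold=500):
    """Return {employee_id: flag}, computed entirely by the database."""
    params = {"threshold": threshold, "higher_pct": higher_pct,
              "lower_pct": lower_pct}
    # Each row is an (employee_id, flag) pair, so dict() can consume it directly.
    return dict(conn.execute(LOW_BUDGET_SQL, params).fetchall())
```

The Python side shrinks to a single call, at the cost of a query that is harder to read and less portable.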
if amount > 500 and percent_diff < int(flag_higher_amount) or \
        amount < 500 and percent_diff < int(flag_lower_amount):
    flag = 1
You can also create a function with a self-explanatory name and move the whole condition inside it:
if is_percent_inside_amount_bounds(
        percent_diff, amount, flag_lower_amount, flag_higher_amount):
    flag = 1
or just
flag = is_percent_inside_amount_bounds(
    percent_diff, amount, flag_lower_amount, flag_higher_amount)
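One possible body for the is_percent_inside_amount_bounds function named above; this is a sketch, not code from the question. The amount_threshold parameter is a hypothetical addition that externalizes the magic 500, and the equal-to-threshold case is spelled out explicitly, as suggested earlier.

```python
def is_percent_inside_amount_bounds(percent_diff, amount,
                                    flag_lower_amount, flag_higher_amount,
                                    amount_threshold=500):
    """True when the remaining percentage is below the threshold that
    applies to this amount (higher threshold above amount_threshold,
    lower threshold below it)."""
    if amount > amount_threshold:
        return percent_diff < int(flag_higher_amount)
    if amount < amount_threshold:
        return percent_diff < int(flag_lower_amount)
    # amount == amount_threshold: deliberately never flagged,
    # matching the original pair of if statements.
    return False
```

Because bool is a subclass of int in Python, the result can be assigned straight to flag and still compare equal to 1 or 0.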
If an amount of exactly 500 can be treated the same as amount < 500, the condition can be made more concise:
flag = percent_diff < int(
    flag_higher_amount if amount > 500 else flag_lower_amount)
but I would not recommend using the ternary operator in production code in cases like this, because it usually reduces readability.
# total payments = the sum of monthly payments
# object-level method for calculation in Loan class
def totalPayments(self):
    # the monthly payment might be different depending on the period
    t = 0  # initialize the period
    m_sum = 0  # initialize the sum
    while t < self._term:  # run until we reach the total term
        m_sum += self.monthlyPayment(t)  # sum up each monthly payment
        t += 1  # go to next period
    return m_sum
The monthly payment can differ from period to period, so instead of simply multiplying it by the term, I chose to sum each payment individually. Is there an easier way to do this?
My first idea was this:
sum(payment for payment in self.monthlyPayment(t) if term <= t)
But t is never initialized or incremented, so it can't be used to compute each payment. Is there an easier approach that achieves the same thing in a single line or so?
Your variable t increments by 1 each time, so why don't you use a range object?
for t in range(0, self._term):  # You can omit the 0
    ...
So, if you want to keep your comprehension, the best way is this:
sum(self.monthlyPayment(t) for t in range(self._term))
You're close, but you need to iterate over ts here, and range lets you bake in the end condition:
sum(self.monthlyPayment(t) for t in range(self._term))
or, if you like using map (slightly less verbose since you already have a method that does what you want, though less familiar to some, and perhaps trivially faster because it avoids executing bytecode during the loop):
sum(map(self.monthlyPayment, range(self._term)))
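To check that the three versions agree, here is a small sketch with a hypothetical Loan class. The payment schedule (a base payment that grows by a fixed step each period) is an assumption made only so that monthlyPayment(t) actually varies with t.

```python
class Loan:
    """Hypothetical Loan whose payment grows by `step` each period."""

    def __init__(self, term, base_payment, step):
        self._term = term
        self._base = base_payment
        self._step = step

    def monthlyPayment(self, t):
        return self._base + self._step * t

    def totalPayments_loop(self):
        # The question's version: explicit counter and accumulator.
        t = 0
        m_sum = 0
        while t < self._term:
            m_sum += self.monthlyPayment(t)
            t += 1
        return m_sum

    def totalPayments(self):
        # Generator expression: range() supplies every t from 0 to term - 1.
        return sum(self.monthlyPayment(t) for t in range(self._term))

    def totalPayments_map(self):
        # map() calls the bound method once for each t in the range.
        return sum(map(self.monthlyPayment, range(self._term)))
```

For a 12-period loan starting at 100 and rising by 5 per period, all three return 1200 + 5 * (0 + 1 + ... + 11) = 1530.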
I think the proper statement would be
sum(self.monthlyPayment(t) for t in range(self._term))
self.monthlyPayment(t) doesn't return a sequence that you can iterate over. You need to loop over the range of arguments to this function and call it for each.
sum(self.monthlyPayment(t) for t in range(self._term))
That should do it.
m_sum = sum(self.monthlyPayment(t) for t in range(self._term))
How can I make code like the snippet below run faster?
I know a dictionary can replace if-statements that test for equality, but I'm not sure about this one.
Delta = 3
if (x - y) >= Delta:
    pass
elif y < Delta:
    pass
else:
    pass
Here's an example of how a dictionary lookup would look here, if you really wanted to use one:
def do_something():
    pass

def do_something_else_1():
    pass

def do_something_else_2():
    pass

{
    y < Delta: do_something_else_1,
    x - y >= Delta: do_something
}.get(True, do_something_else_2)()
But I can guarantee this will run slower, mainly because all the conditions are now evaluated eagerly instead of lazily. The reason you cannot optimize your existing code with a dictionary lookup is that a dictionary lookup excels where computing a hash and then checking equality within the narrowed search space is faster than checking equality against the entire search space. To get that benefit, you have to pay the upfront cost of constructing the hash table in the first place.
However, you aren't checking equality here. You're using the inequality operators < and >=, which don't fit the concept of a hash table. The hash of a bool (the result of these comparisons) is no quicker to compute than using the bool itself, so constructing the hash table here will outweigh any time you save by using it immediately afterwards. Since x and y may change each time, there's no way to cache this hash table, which means you pay the construction cost every time.
Keep the code as it is.
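A small sketch for measuring this yourself with the standard timeit module. The function names are hypothetical, and both versions return labels instead of calling functions to keep the comparison focused on the branching. No timings are quoted because they vary by machine and Python version.

```python
import timeit

DELTA = 3


def branch_if(x, y):
    # Lazy: the second comparison only runs if the first one is False.
    if (x - y) >= DELTA:
        return "do_something"
    elif y < DELTA:
        return "do_something_else_1"
    else:
        return "do_something_else_2"


def branch_dict(x, y):
    # Eager: both comparisons run and a new dict is built on every call.
    # If both conditions are True, the later entry overwrites the earlier
    # one, so `x - y >= DELTA` still wins, as in the if/elif chain.
    return {
        y < DELTA: "do_something_else_1",
        x - y >= DELTA: "do_something",
    }.get(True, "do_something_else_2")


if __name__ == "__main__":
    for fn in (branch_if, branch_dict):
        seconds = timeit.timeit(lambda: fn(10, 2), number=200_000)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

The two functions give the same answers; the timing loop only shows how much the dict construction costs on your own machine.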
Optimization usually takes advantage of some common expression or common code. You have none here. At the register level, your comparisons look like:
    load r1, x
    sub  r1, y
    sub  r1, 3
    brlt ELIF        # if less than 0, branch to ELIF
    # TRUE-clause stuff, aka "Do something"
    br   ENDIF
ELIF:
    load r1, y
    sub  r1, 3
    brge ELSE        # if >= 0, branch to ELSE
    # ELIF-clause stuff, aka "Do something else #1"
    br   ENDIF
ELSE:
    # ELSE-clause stuff, aka "Do something else #2"
ENDIF:
    # Remainder of program
The only commonality here, in either data or flow, is loading y into a register. Any reasonable optimization level will do this for you: it will change the first expression to load y into r2, a trivial cost in micro-instruction utilization.
There appears to be nothing else to optimize here. The normal flow analysis will recognize that 3 is a constant within this block, and substitute the immediate operand for Delta.
What's the most Pythonic way of giving flags values that are user-friendly and self-evident for whoever is reading the code?
Suppose I have the following method in a class, which, depending on the input, calculates a multiplier and sets a cracking_mode flag on the instance:
from math import sqrt

def evaluate_first_matrix_crack(self, base_sigma22, base_tau12):
    # Mixed transverse tension-shear
    if base_sigma22 > 0 and base_tau12 != 0:
        self.cracking_mode = 2
        multiplier = sqrt(1. / ((base_sigma22 / self.Yt_is) ** 2 + (base_tau12 / self.S_is) ** 2))
    # Pure transverse tension
    elif base_sigma22 > 0 and base_tau12 == 0:
        self.cracking_mode = 0
        multiplier = self.Yt_is / base_sigma22
    # Pure shear
    elif base_sigma22 <= 0 and base_tau12 != 0:
        self.cracking_mode = 1
        multiplier = self.Yt_is / base_sigma22
    return multiplier
The "cracking_mode" flag will affect which methods are called in other sections of the code. I want the flag values to be as user-friendly as possible, so that when they are checked through if statements in other sections of the code, the reader can immediately tell which flag value corresponds to which option.
So rather than having self.cracking_mode = 2 I would ideally have self.cracking_mode = "mixed_transverse_tension_shear".
However, I don't think that's a Pythonic way of doing things, aside from the fact that comparing strings takes longer than comparing integers.
So what would be the most Pythonic (and user-friendly) way of solving the issue?
This is exactly the purpose of enums in other languages. In my opinion, the most "pythonic" way to do things is simply the clearest, so string comparison isn't that bad. Probably a little better would be to create constants that represent the values (MIXED_TRANSVERSE_TENSION_SHEAR = 2) as static class members and then compare against those constants, which is a more efficient integer comparison.
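Python's standard library also has an enum module (Python 3.4+). A minimal sketch, assuming a hypothetical CrackingMode enum whose values match the integers in the question and a hypothetical classify_cracking_mode function that mirrors the method's conditions: IntEnum members still compare equal to plain integers, so existing checks like == 2 keep working, while new code can use the readable names.

```python
from enum import IntEnum


class CrackingMode(IntEnum):
    """Named cracking modes; values match the integers used in the question."""
    PURE_TRANSVERSE_TENSION = 0
    PURE_SHEAR = 1
    MIXED_TRANSVERSE_TENSION_SHEAR = 2


def classify_cracking_mode(base_sigma22, base_tau12):
    """Same conditions as evaluate_first_matrix_crack, returning a named mode."""
    if base_sigma22 > 0 and base_tau12 != 0:
        return CrackingMode.MIXED_TRANSVERSE_TENSION_SHEAR
    if base_sigma22 > 0 and base_tau12 == 0:
        return CrackingMode.PURE_TRANSVERSE_TENSION
    if base_sigma22 <= 0 and base_tau12 != 0:
        return CrackingMode.PURE_SHEAR
    return None  # the original method has no branch for this case
```

Elsewhere, a check then reads as `if self.cracking_mode is CrackingMode.PURE_SHEAR:`, and printing a mode shows its name rather than a bare number.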
I have:
an ordered list of dc objects, each with a float field result;
a limit value for the sum of the results;
pack (not the best name ever), the amount to decrease each result by.
Problem:
I need to sequentially decrease the result of each dc until the sum of all results is less than or equal to limit (without letting any result go below 0).
After some profiling I got this code:
while self.sum > self.limit:
    for dc in self.dc:
        if dc.result > 0:
            # max() too slow here
            result = (
                dc.result - self.pack
                if dc.result - self.pack > 0
                else 0
            )
            # Avoid calling sum() over the whole list on each iteration
            self.sum -= dc.result - result
            dc.result = result
            if self.sum <= self.limit:
                break
But it performs poorly for small self.pack values (the code does too many iterations).
Is there a way to make this method faster?
If you aren't too concerned about how much you remove, as long as the sum ends up within the limit, you could implement dc as a max heap (priority queue) and pop from it until sum <= self.limit. That would significantly speed up processing, especially on big data sets.
Edit:
Since dc is an ordered list, just treat it like a stack and pop from the back and remove from the pack (since the "heaviest" things are at the back).
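A sketch of one reading of the max-heap suggestion, using the standard heapq module: each pop takes the largest remaining result and sets it to 0, so the final sum may undershoot the limit. The function name and the plain list of floats (standing in for the dc objects) are hypothetical.

```python
import heapq


def zero_largest_until_within_limit(results, limit):
    """Set the largest results to 0, one at a time, until sum <= limit.

    Returns (new_results, new_total); the input list is not modified.
    """
    new_results = list(results)
    total = sum(new_results)
    # heapq is a min-heap, so store negated values to pop the largest first;
    # the index records where each value came from.
    heap = [(-value, i) for i, value in enumerate(new_results) if value > 0]
    heapq.heapify(heap)
    while total > limit and heap:
        neg_value, i = heapq.heappop(heap)
        total += neg_value  # same as total -= value
        new_results[i] = 0.0
    return new_results, total
```

As the edit points out, if dc is already sorted in ascending order, walking it from the end gives the same largest-first order without building a heap.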
Given a range [2004, 2008], what is the fastest way to find whether it intersects other ranges?
Specifically, I am dealing with a database table that has two columns: one is the lower bound, the other is the upper bound. The task is to find all rows that intersect a given 2-tuple (like [2004, 2008]).
I am using MongoDB. Is this supported natively (i.e., are there operators for it)?
I have a large user base, so I want this task to complete as fast as possible.
EDIT: To be clearer, the database table contains rows like these:
20 30
10 50
60 90
...
Given the input range (25 40), I want to return the rows whose ranges intersect the given range,
so the result is: (20 30), (10 50)
I don't know MongoDB at all, but you're basically looking for
SELECT * from the_table where not (lower_bound > 2008 or upper_bound < 2004).
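To try this query locally, here is a sketch using Python's built-in sqlite3 module with the example rows from the question. The names the_table, lower_bound and upper_bound come from the answer; the function overlapping_rows is hypothetical. By De Morgan's law, NOT (lower_bound > high OR upper_bound < low) is the same as lower_bound <= high AND upper_bound >= low, so touching ranges count as intersecting.

```python
import sqlite3


def overlapping_rows(conn, low, high):
    """Rows whose [lower_bound, upper_bound] overlaps [low, high], inclusive."""
    return conn.execute(
        "SELECT lower_bound, upper_bound FROM the_table "
        "WHERE NOT (lower_bound > ? OR upper_bound < ?) "
        "ORDER BY rowid",
        (high, low),
    ).fetchall()
```

With the rows (20 30), (10 50) and (60 90) loaded, overlapping_rows(conn, 25, 40) returns the first two, as the question expects.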
Try this, assuming low and high are your bound fields:
db.ranges.find({'low': {'$lt': self.high}, 'high': {'$gt': self.low}})
Substitute $lte and $gte if you want your query to be inclusive rather than exclusive.
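A small sketch that builds this filter for either variant. The function names are hypothetical, and matches() is only a tiny local evaluator for this one filter shape, used to check the logic by hand without a server; with pymongo the dict would be passed to a collection's find() method.

```python
def overlap_filter(low, high, inclusive=False):
    """MongoDB filter for documents whose [low, high] fields overlap [low, high]."""
    lt, gt = ("$lte", "$gte") if inclusive else ("$lt", "$gt")
    return {"low": {lt: high}, "high": {gt: low}}


def matches(doc, flt):
    """Evaluate a filter of the shape {field: {op: value}} against a dict."""
    ops = {
        "$lt": lambda a, b: a < b,
        "$lte": lambda a, b: a <= b,
        "$gt": lambda a, b: a > b,
        "$gte": lambda a, b: a >= b,
    }
    return all(ops[op](doc[field], value)
               for field, cond in flt.items()
               for op, value in cond.items())
```

The only difference between the variants is ranges that merely touch: a document {"low": 40, "high": 45} matches the range (25 40) only with inclusive=True.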
MongoDB does not support intersection. Perform the intersection at the Python level using the intersection() method of sets.
Since you're dealing with lower bounds and upper bounds, you can just check bounds.
def intersects(this, others):
    lower, upper = this
    # Two ranges overlap unless one ends before the other starts
    return [[l, u] for l, u in others
            if l <= upper and u >= lower]
I don't know MongoDB but if you could implement that logic in the database, I can't help but think that it would be faster.
You could use a MongoDB query with a JavaScript expression (assuming lowerbounds and upperbounds are the limits of the range being intersected):
f = function() { return this.lower <= upperbounds && this.upper >= lowerbounds; }
db.ranges.find(f);
This should handle all cases, including when [this.lower, this.upper] is a superset or proper subset of [lowerbounds, upperbounds].