Improve the performance of code that uses for-loops - Python

I am trying to create a list based on some data, but my code is very slow when I run it on large data. I suspect I am not using Python to its full potential. Is there a more efficient and faster way of doing this in Python?
Here is an explanation of the code:
You can think of this problem as a list of games (list_type), each with a list of participating teams, plus the score of each team in each game (list_xx). For each team in the current game, the code first calculates the sum of the score differences from previous games (win_comp_past_difs), including only pairs of teams that both play in the current game. It then updates each pair in the current game with that game's score difference. A defaultdict keeps track of the accumulated difference for each pair and is updated as each game is played.
In the example below, for-loops are used to build a new list, list_zz, from this data.
The data and the for-loop code:
import pandas as pd
import numpy as np
from collections import defaultdict
from itertools import permutations

list_type = [['A', 'B'], ['B'], ['A', 'B', 'C', 'D', 'E'], ['B'], ['A', 'B', 'C'], ['A'], ['B', 'C'], ['A', 'B'], ['C', 'A', 'B'], ['A'], ['B', 'C']]
list_xx = [[1.0, 5.0], [3.0], [2.0, 7.0, 3.0, 1.0, 6.0], [3.0], [5.0, 2.0, 3.0], [1.0], [9.0, 3.0], [2.0, 7.0], [3.0, 6.0, 8.0], [2.0], [7.0, 9.0]]
list_zz = []

# for-loop
wd = defaultdict(float)
for i, x in zip(list_type, list_xx):
    # Games with only one team get NaN
    if len(i) == 1:
        #print('NaN')
        list_zz.append(np.nan)
        continue
    # Pairs and difference generator for the current game (i)
    pairs = list(permutations(i, 2))
    dgen = (value[0] - value[1] for value in permutations(x, 2))
    # Sum of differences from previous games, including only pairs of teams in the current game
    for team, result in zip(i, x):
        win_comp_past_difs = sum(wd[key] for key in pairs if key[0] == team)
        #print(win_comp_past_difs)
        list_zz.append(win_comp_past_difs)
    # Update pair differences for the current game
    for pair, diff in zip(pairs, dgen):
        wd[pair] += diff
print(list_zz)
Which looks like this:
[0.0, 0.0, nan, -4.0, 4.0, 0.0, 0.0, 0.0, nan, -10.0, 13.0, -3.0, nan, 3.0, -3.0, -6.0, 6.0, -10.0, -10.0, 20.0, nan, 14.0, -14.0]
If you could suggest how to make this code more efficient and faster, I would really appreciate it.

Without reviewing the overall design of your code, one improvement pops out at me: move your code to a function.
As currently written, all of the variables you use are global variables. Due to the dynamic nature of the global namespace, Python must look up each global variable every time you access it.(1) In CPython, this corresponds to a hash table (dict) lookup, which can be expensive, particularly if hash collisions are present.
In contrast, local variables are known at compile time, so they are stored in a fixed-size array. Accessing them therefore only involves dereferencing a pointer, which is comparatively much faster.
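One way to see this difference is to disassemble the same loop twice with the standard dis module: once inside a function and once compiled as module-level code. This is only an illustration; exact opcode names vary between CPython versions (3.13, for example, adds combined forms such as LOAD_FAST_LOAD_FAST), so the sketch only looks at name prefixes. The function name summed_locally is hypothetical.

```python
import dis

def summed_locally():
    total = 0
    for i in range(10):
        total += i
    return total

# The same statements compiled as module-level code, where total and i are globals.
module_code = compile("total = 0\nfor i in range(10):\n    total += i\n", "<module>", "exec")

func_ops = {ins.opname for ins in dis.get_instructions(summed_locally)}
module_ops = {ins.opname for ins in dis.get_instructions(module_code)}

# Inside the function, total and i use *_FAST opcodes (slots in a fixed array);
# at module level they use *_NAME opcodes (dictionary lookups).
print(sorted(op for op in func_ops if "FAST" in op))
print(sorted(op for op in module_ops if "NAME" in op))
```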
With this principle in mind, you should be able to boost your performance (somewhere around a 40% drop in run time) by moving all of your code into a "main" function:
def main():
    ...
    # Your code here

if __name__ == '__main__':
    main()
(1) Source
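Beyond moving the code into a function, note that for a game with k teams the question's inner loop makes k passes over all k(k-1) ordered pairs. A minimal sketch, assuming each team appears at most once per game, visits each ordered pair only once: it adds the pair's accumulated difference to the first team's total, then updates that pair with the current game's difference. Because each pair key is touched exactly once per game, reading before updating gives the same totals as the original two-phase loop. The function name past_diff_sums is hypothetical, and math.nan stands in for np.nan.

```python
import math
from collections import defaultdict
from itertools import permutations

def past_diff_sums(list_type, list_xx):
    """Sketch: same result as the question's loop, one pass over the ordered pairs per game.

    Assumes each team appears at most once per game.
    """
    wd = defaultdict(float)            # accumulated score difference per ordered pair
    out = []
    for teams, scores in zip(list_type, list_xx):
        if len(teams) == 1:            # single-team games get NaN, as in the question
            out.append(math.nan)
            continue
        totals = dict.fromkeys(teams, 0.0)
        for (t1, s1), (t2, s2) in permutations(zip(teams, scores), 2):
            totals[t1] += wd[(t1, t2)]     # read the value from previous games first...
            wd[(t1, t2)] += s1 - s2        # ...then add this game's difference
        out.extend(totals[t] for t in teams)
    return out

list_type = [['A', 'B'], ['B'], ['A', 'B', 'C', 'D', 'E'], ['B'], ['A', 'B', 'C'], ['A'],
             ['B', 'C'], ['A', 'B'], ['C', 'A', 'B'], ['A'], ['B', 'C']]
list_xx = [[1.0, 5.0], [3.0], [2.0, 7.0, 3.0, 1.0, 6.0], [3.0], [5.0, 2.0, 3.0], [1.0],
           [9.0, 3.0], [2.0, 7.0], [3.0, 6.0, 8.0], [2.0], [7.0, 9.0]]

result = past_diff_sums(list_type, list_xx)
print(result)
```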

Related

Is it possible to append different values to different keys of a dictionary?

I have a dictionary:
groups = {'group1': array([450, 449.]), 'group2': array([490, 489.]), 'group3': array([568, 567.])}
I have to iterate over a txt file that I have loaded using numpy.loadtxt() with many values:
subjects =
[[1.0, -1.0],
 [2.0, 1.0],
 [3.0, 2.0],
 ...
 [565.0, 564.0],
 [566.0, 565.0],
 [567.0, 566.0],
 [568.0, 567.0]]
What I want to do is check whether the value in the first column of subjects is equal to the second value of each array in my dictionary.
When the condition is met, that row of subjects should be added to the corresponding array in the dictionary.
The output that I expect is this:
groups = {'group1': array([[450., 449.], [449., 448.]]), 'group2': array([[490., 489.], [489., 488.]]), 'group3': array([[568., 567.], [567., 566.]])}
You have to do it in parts.
Note: I've only done this partially, so analyze this code; the result is close to what you expect. I've noticed the array splitting issue and am reading the documentation for a way to split it. Meanwhile, explore this logic and modify it accordingly.
import numpy as np
from numpy import array
groups = {'group1': array([450, 449.]), 'group2': array([490, 489.]), 'group3': array([568, 567.])}
lv = np.array(list(groups.values()), dtype=object)
lv1 =lv.copy()
lv[:, [1, 0]] = lv[:, [0, 1]]
lv=np.concatenate((lv1, lv), axis=1)
groups = {k : [] for k in groups}
groups = {e: lv[i] for i, e in enumerate(groups)}
print(groups)
output #
{'group1': array([450.0, 449.0, 449.0, 450.0], dtype=object), 'group2': array([490.0, 489.0, 489.0, 490.0], dtype=object), 'group3': array([568.0, 567.0, 567.0, 568.0], dtype=object)}
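A minimal sketch of the behaviour the question asks for (one matching row appended per group, as in the expected output), assuming the first-column values in subjects are unique. It builds a lookup dict keyed on the first column, then uses np.vstack to turn each 1-D group array into a 2-D array with the matching row underneath. The helper name extend_groups is hypothetical, and subjects here is a synthetic stand-in for the loadtxt() data.

```python
import numpy as np

def extend_groups(groups, subjects):
    """For each group, append the subjects row whose first column equals
    the group's last second-column value (assumes unique first-column values)."""
    lookup = {row[0]: row for row in subjects}   # first-column value -> whole row
    result = {}
    for name, arr in groups.items():
        rows = np.atleast_2d(arr)                # [450, 449] -> [[450, 449]]
        match = lookup.get(rows[-1, 1])
        if match is not None:
            rows = np.vstack([rows, match])      # stack the matching row underneath
        result[name] = rows
    return result

groups = {'group1': np.array([450, 449.]), 'group2': np.array([490, 489.]),
          'group3': np.array([568, 567.])}
# Synthetic stand-in for the loadtxt() data: [1.0, -1.0], then rows [n, n - 1] for n = 2..568
subjects = np.vstack([[1.0, -1.0],
                      np.column_stack([np.arange(2, 569), np.arange(1, 568)])]).astype(float)

out = extend_groups(groups, subjects)
print(out)
```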

Replacing items in a list with items from another list

I'm trying to create a calculator for a user's weighted GPA. I'm using PyAutoGUI to ask the user for their grades and the type of class they're taking. I want to take that user input and remap it to a different value.
from pyautogui import confirm, prompt

class GPA():
    grades = []
    classtypes = []
    your_format = confirm(text='Choose your grade format: ', title='',
                          buttons=['LETTERS', 'PERCENTAGE', 'QUIT'])
    classnum = int(prompt("Enter the number of classes you have: "))
    for i in range(classnum):
        # i + 1 is the course number shown in the prompt
        grade = prompt(text='Enter your grade for course {}: '.format(i + 1)).lower()
        classtype = prompt(text='Enter the type of Course (Ex. Regular, AP, Honors): ').lower()
        classtypes.append(classtype)
        grades.append(grade)

    def __init__(self):
        self.gradeMap = {'a+': 4.0, 'a': 4.0, 'a-': 3.7, 'b+': 3.3, 'b': 3.0, 'b-': 2.7,
                         'c+': 2.3, 'c': 2.0, 'c-': 1.7, 'd+': 1.3, 'd': 1.0, 'f': 0.0}
        self.weightMap = {'advanced placement': 1.0, 'ap': 1.0, 'honors': 0.5, 'regular': 0.0}
Using the gradeMap dictionary you have defined, you could use what's called a list comprehension.
Here is an example in the Python interpreter:
>>> grades = ['a', 'c-', 'c']
>>> gradeMap = {'a+': 4.0, 'a': 4.0, 'a-': 3.7, 'b+': 3.3, 'b': 3.0,'b-': 2.7,
... 'c+': 2.3, 'c': 2.0, 'c-': 1.7, 'd+': 1.3, 'd': 1.0, 'f': 0.0}
>>> [gradeMap[grade] for grade in grades] #here's the list comprehension
[4.0, 1.7, 2.0]
The downside of this approach is that you must make sure the user only enters grades defined in your gradeMap; otherwise you will get a KeyError.
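If you want unknown grades to produce a placeholder instead of a KeyError, dict.get with a default is one option. A small sketch; the helper name grades_to_points, the None default, and the strip()/lower() normalization of user input are assumptions:

```python
gradeMap = {'a+': 4.0, 'a': 4.0, 'a-': 3.7, 'b+': 3.3, 'b': 3.0, 'b-': 2.7,
            'c+': 2.3, 'c': 2.0, 'c-': 1.7, 'd+': 1.3, 'd': 1.0, 'f': 0.0}

def grades_to_points(grades, default=None):
    # dict.get returns `default` instead of raising KeyError for unknown grades;
    # strip()/lower() tolerate inputs such as " A- ".
    return [gradeMap.get(grade.strip().lower(), default) for grade in grades]

print(grades_to_points(['a', 'C-', 'x']))  # [4.0, 1.7, None]
```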
Another alternative is to use map. map works slightly differently: it takes a function and an iterable (such as a list) and applies that function to each item.
An example with a very simple function that only works with a few grades:
>>> def convert_grade_to_points(grade):
...     if grade == 'a':
...         return 4.0
...     elif grade == 'b':
...         return 3.0
...     else:
...         return 0
...
>>> grades = ['a', 'b', 'b']
>>> list(map(convert_grade_to_points, grades))
[4.0, 3.0, 3.0]
This has the same downside mentioned earlier: the function you define has to handle the case where the user enters an invalid grade.
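You can also replace the items of the list in place. Assigning to the loop variable does not change the list, so index into it with enumerate, and compare strings with == rather than is:
for idx, grade in enumerate(gradeList):
    if your_format == "PERCENTAGE":
        gradeList[idx] = grade * some_factor  # use your logic
    elif your_format == "LETTERS":
        gradeList[idx] = "some other logic"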
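A runnable sketch of in-place replacement that combines the question's gradeMap and weightMap. It assumes a weighted grade is the base grade points plus the course weight (e.g. AP adds 1.0), which the question's weightMap suggests but does not state; remap_in_place is a hypothetical name.

```python
gradeMap = {'a+': 4.0, 'a': 4.0, 'a-': 3.7, 'b+': 3.3, 'b': 3.0, 'b-': 2.7,
            'c+': 2.3, 'c': 2.0, 'c-': 1.7, 'd+': 1.3, 'd': 1.0, 'f': 0.0}
weightMap = {'advanced placement': 1.0, 'ap': 1.0, 'honors': 0.5, 'regular': 0.0}

def remap_in_place(grades, classtypes):
    # Assumption: weighted points = base points + course weight.
    for idx, (grade, ctype) in enumerate(zip(grades, classtypes)):
        grades[idx] = gradeMap[grade] + weightMap[ctype]  # overwrite the list slot itself
    return grades

grades = ['a', 'b+', 'c']
remap_in_place(grades, ['ap', 'regular', 'honors'])
print(grades)  # [5.0, 3.3, 2.5]
```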

Separating nested for loops in list comprehensions

Starting from this dataframe
import pandas as pd
df2 = pd.DataFrame({'t': ['a', 'a', 'a', 'b', 'b', 'b'],
'x': [1.1, 2.2, 3.3, 1.1, 2.2, 3.3],
'y': [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]})
it's possible to simplify these nested for loops:
for t, df in df2.groupby('t'):
    print("t:", t)
    for d in df.to_dict(orient='records'):
        print({'x': d['x'], 'y': d['y']})
by separating the inner loop into a function:
def handle(df):
    for d in df.to_dict(orient='records'):
        print({'x': d['x'], 'y': d['y']})

for t, df in df2.groupby('t'):
    print("t:", t)
    handle(df)
How might I similarly separate a nested comprehension:
mydict = {
    t: [{'x': d['x'], 'y': d['y']} for d in df.to_dict(orient='records')]
    for t, df in df2.groupby('t')
}
into two separate loops?
I'm asking the question with only two levels of nesting, where the need is hardly critical. The motivations are:
By the time there are a few levels, the code becomes tough to read.
Developing and testing smaller blocks guards against (present and future) mistakes at more than the outer level.
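One possible approach, sketched under the assumption that one small helper function per nesting level is acceptable: each level gets its own function, so every comprehension stays one level deep and can be tested on its own. The names xy_records and by_t are hypothetical. Grouping by the plain column name 't' (rather than ['t']) keeps the keys as 'a' and 'b' instead of 1-tuples in recent pandas versions.

```python
import pandas as pd

df2 = pd.DataFrame({'t': ['a', 'a', 'a', 'b', 'b', 'b'],
                    'x': [1.1, 2.2, 3.3, 1.1, 2.2, 3.3],
                    'y': [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]})

def xy_records(df):
    """Inner level: one {'x': ..., 'y': ...} dict per row of a group."""
    return [{'x': d['x'], 'y': d['y']} for d in df.to_dict(orient='records')]

def by_t(frame):
    """Outer level: map each value of 't' to the records of its group."""
    return {t: xy_records(df) for t, df in frame.groupby('t')}

mydict = by_t(df2)
print(mydict)
```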

Build dict from list of tuples combining two multi index dfs and column index

I have two multi-index dataframes: mean and std
import numpy as np
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], ['Z', 'Y', 'X', 'W']]
mean = pd.DataFrame(data={0.0: [np.nan, 2.0, 3.0, 4.0], 60.0: [5.0, np.nan, 7.0, 8.0], 120.0: [9.0, 10.0, np.nan, 12.0]},
                    index=pd.MultiIndex.from_arrays(arrays, names=('id', 'comp')))
mean.columns.name = 'Times'
std = pd.DataFrame(data={0.0: [10.0, 10.0, 10.0, 10.0], 60.0: [10.0, 10.0, 10.0, 10.0], 120.0: [10.0, 10.0, 10.0, 10.0]},
                   index=pd.MultiIndex.from_arrays(arrays, names=('id', 'comp')))
std.columns.name = 'Times'
My task is to combine them into a nested dictionary: the first level is keyed by id, the second level by comp, and each comp maps to a list of (time point, mean, std) tuples. The result should look like this:
{'A': {'Z': [(60.0, 5.0, 10.0),
             (120.0, 9.0, 10.0)],
       'Y': [(0.0, 2.0, 10.0),
             (120.0, 10.0, 10.0)]},
 'B': {'X': [(0.0, 3.0, 10.0),
             (60.0, 7.0, 10.0)],
       'W': [(0.0, 4.0, 10.0),
             (60.0, 8.0, 10.0),
             (120.0, 12.0, 10.0)]}}
Triplets whose mean is NaN are left out: A,Z at time 0, A,Y at time 60, and B,X at time 120.
How do I get there? I have already constructed a dict of dicts of lists of tuples for a single row:
iter=0
{mean.index[iter][0]:{mean.index[iter][1]:list(zip(mean.columns, mean.iloc[iter], std.iloc[iter]))}}
>{'A': {'Z': [(0.0, nan, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]}}
Now I need to extend this to a loop over every row (inner dict) and add each id (outer dict). I started with iterrows() and a dict comprehension, but I have problems indexing with the tuple ('A', 'Z') that iterrows() gives me, and with building the whole dict iteratively.
{mean.index[iter[1]]:list(zip(mean.columns, mean.loc[iter[1]], std.loc[iter[1]])) for (iter,row) in mean.iterrows()}
raises an error, and even if it worked it would only build the inner level:
KeyError: 'the label [Z] is not in the [index]'
Thanks!
EDIT: I changed the numbers in this example to floats, because integers were generated before, which was inconsistent with my real data and would fail in a subsequent JSON dump.
Here is a solution using a defaultdict:
from collections import defaultdict
mean_as_dict = mean.to_dict(orient='index')
std_as_dict = std.to_dict(orient='index')
mean_clean_sorted = {k: sorted([(i, j) for i, j in v.items()]) for k, v in mean_as_dict.items()}
std_clean_sorted = {k: sorted([(i, j) for i, j in v.items()]) for k, v in std_as_dict.items()}
sol = {k: [j + (std_clean_sorted[k][i][1],) for i, j in enumerate(v) if not np.isnan(j[1])] for k, v in mean_clean_sorted.items()}
solution = defaultdict(dict)
for k, v in sol.items():
    solution[k[0]][k[1]] = v
The result is a defaultdict object, which you can easily convert to a plain dict:
solution = dict(solution)
con = pd.concat([mean, std])
primary = dict()
for i in set(con.index.values):
    if i[0] not in primary.keys():
        primary[i[0]] = dict()
    primary[i[0]][i[1]] = list()
    for x in con.columns:
        # the mean and std rows share the (id, comp) label, so this yields (time, mean, std)
        primary[i[0]][i[1]].append((x,) + tuple(con.loc[i[0]].loc[i[1]][x].values))
I found a very compact, comprehension-based way of building this nested dict:
mean_dict_items=mean.to_dict(orient='index').items()
{k[0]:{u[1]:list(zip(mean.columns, mean.loc[u], std.loc[u]))
for u,v in mean_dict_items if (k[0],u[1]) == u} for k,l in mean_dict_items}
creates:
{'A': {'Y': [(0.0, 2.0, 10.0), (60.0, nan, 10.0), (120.0, 10.0, 10.0)],
'Z': [(0.0, nan, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]},
'B': {'W': [(0.0, 4.0, 10.0), (60.0, 8.0, 10.0), (120.0, 12.0, 10.0)],
'X': [(0.0, 3.0, 10.0), (60.0, 7.0, 10.0), (120.0, nan, 10.0)]}}
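The comprehension above keeps the NaN triplets. A sketch that drops them, assuming mean and std share the same index and columns (the reindex call aligns std to mean just in case): it walks the rows of both frames together, skips entries whose mean is NaN, and converts the values to built-in floats. nested_triplets is a hypothetical name.

```python
import numpy as np
import pandas as pd

# The question's example data
arrays = [['A', 'A', 'B', 'B'], ['Z', 'Y', 'X', 'W']]
index = pd.MultiIndex.from_arrays(arrays, names=('id', 'comp'))
mean = pd.DataFrame({0.0: [np.nan, 2.0, 3.0, 4.0], 60.0: [5.0, np.nan, 7.0, 8.0],
                     120.0: [9.0, 10.0, np.nan, 12.0]}, index=index)
std = pd.DataFrame({0.0: [10.0] * 4, 60.0: [10.0] * 4, 120.0: [10.0] * 4}, index=index)

def nested_triplets(mean, std):
    """Build {id: {comp: [(time, mean, std), ...]}}, skipping NaN means."""
    std = std.reindex(index=mean.index, columns=mean.columns)  # line std up with mean
    result = {}
    for (id_, comp), m_row, s_row in zip(mean.index, mean.to_numpy(), std.to_numpy()):
        result.setdefault(id_, {})[comp] = [
            (float(t), float(m), float(s))
            for t, m, s in zip(mean.columns, m_row, s_row)
            if not np.isnan(m)                                  # drop NaN triplets
        ]
    return result

print(nested_triplets(mean, std))
```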

Python code works on one file but fails on another

Hi all, I have this code, which prints the minimum cost and restaurant id for the requested item(s). The customer doesn't want to visit multiple restaurants, so if he asks for "A,B", the code should print the shop that offers them both, instead of scattering the user's requirements across different restaurants (even if some restaurant offers an item more cheaply).
Also, suppose the user asks for a burger. If restaurant 'X' sells a "burger" for $4, whereas restaurant 'Y' sells "burger+tuna+tofu" for $3, then we tell the user to go to restaurant 'Y', even though it includes extra items beyond the burger the user asked for; we are happy to give them extra items as long as it's cheaper.
Everything else works, but the code behaves differently on two input files of the same format: it fails on input.csv but gives the correct output on input-2.csv. This is the only remaining error I need help fixing. Please help me; I think I've hit a wall and can't see past it.
def build_shops(shop_text):
    shops = {}
    for item_info in shop_text:
        shop_id,cost,items = item_info.replace('\n', '').split(',')
        cost = float(cost)
        items = items.split('+')
        if shop_id not in shops:
            shops[shop_id] = {}
        shop_dict = shops[shop_id]
        for item in items:
            if item not in shop_dict:
                shop_dict[item] = []
            shop_dict[item].append([cost,items])
    return shops

def solve_one_shop(shop, items):
    if len(items) == 0:
        return [0.0, []]
    all_possible = []
    first_item = items[0]
    if first_item in shop:
        print "SHOP",shop.get(first_item)
    for (price,combo) in shop[first_item]:
        #print "items,combo=",items,combo
        sub_set = [x for x in items if x not in combo]
        #print "sub_set=",sub_set
        price_sub_set,solution = solve_one_shop(shop, sub_set)
        solution.append([price,combo])
        all_possible.append([price+price_sub_set, solution])
    cheapest = min(all_possible, key=(lambda x: x[0]))
    return cheapest

def solver(input_data, required_items):
    shops = build_shops(input_data)
    #print shops
    result_all_shops = []
    for shop_id,shop_info in shops.iteritems():
        (price, solution) = solve_one_shop(shop_info, required_items)
        result_all_shops.append([shop_id, price, solution])
    shop_id,total_price,solution = min(result_all_shops, key=(lambda x: x[1]))
    print('SHOP_ID=%s' % shop_id)
    sln_str = [','.join(items)+'(%0.2f)'%price for (price,items) in solution]
    sln_str = '+'.join(sln_str)
    print(sln_str + ' = %0.2f' % total_price)

shop_text = open('input-1.csv','rb')
solver(shop_text,['burger'])
=====input-1.csv=====restaurant_id, price, item
1,2.00,burger
1,1.25,tofulog
1,2.00,tofulog
1,1.00,chef_salad
1,1.00,A+B
1,1.50,A+CCC
1,2.50,A
2,3.00,A
2,1.00,B
2,1.20,CCC
2,1.25,D
=====output & error====:
{'1': {'A': [[1.0, ['A', 'B']], [1.5, ['A', 'CCC']], [2.5, ['A', 'D']]], 'B': [[1.0, ['A', 'B']]], 'D': [[2.5, ['A', 'D']]], 'chef_salad': [[1.0, ['chef_salad']]], 'burger': [[2.0, ['burger']]], 'tofulog': [[1.25, ['tofulog']], [2.0, ['tofulog']]], 'CCC': [[1.5, ['A', 'CCC']]]}, '2': {'A': [[3.0, ['A']]], 'B': [[1.0, ['B']]], 'D': [[1.25, ['D']]], 'CCC': [[1.2, ['CCC']]]}}
SHOP [[2.0, ['burger']]]
Traceback (most recent call last):
File "work.py", line 55, in <module>
solver(shop_text,['burger'])
File "work.py", line 43, in solver
(price, solution) = solve_one_shop(shop_info, required_items)
File "work.py", line 26, in solve_one_shop
for (price,combo) in shop[first_item]:
KeyError: 'burger'
Whereas if I run the same code on input-2.csv and query solver(shop_text,['A','CCC']), I get the correct result:
=====input-2.csv======
1,2.00,A
1,1.25,B
1,2.00,B
1,1.00,A
1,1.00,A+B
1,1.50,A+CCC
1,2.50,A+D
2,3.00,A
2,1.00,B
2,1.20,CCC
2,1.25,D
=========output====
{'1': {'A': [[2.0, ['A']], [1.0, ['A']], [1.0, ['A', 'B']], [1.5, ['A', 'CCC']], [2.5, ['A', 'D']]], 'B': [[1.25, ['B']], [2.0, ['B']], [1.0, ['A', 'B']]], 'D': [[2.5, ['A', 'D']]], 'CCC': [[1.5, ['A', 'CCC']]]}, '2': {'A': [[3.0, ['A']]], 'B': [[1.0, ['B']]], 'D': [[1.25, ['D']]], 'CCC': [[1.2, ['CCC']]]}}
SHOP [[2.0, ['A']], [1.0, ['A']], [1.0, ['A', 'B']], [1.5, ['A', 'CCC']], [2.5, ['A', 'D']]]
SHOP [[1.5, ['A', 'CCC']]]
SHOP [[1.5, ['A', 'CCC']]]
SHOP [[1.5, ['A', 'CCC']]]
SHOP [[1.5, ['A', 'CCC']]]
SHOP [[3.0, ['A']]]
SHOP [[1.2, ['CCC']]]
SHOP_ID=1
A,CCC(1.50) = 1.50
You can figure out the error if you do this:
In your solve_one_shop method, print the dictionary shop after the line first_item = items[0]. Doing that will print out:
{'A': [[3.0, ['A']]], 'B': [[1.0, ['B']]], 'D': [[1.25, ['D']]], 'CCC': [[1.2, ['CCC']]]}
So burger is not one of this shop's keys (it does not sell burgers), and hence it throws a KeyError.
Add this line:
2,1.25,burger
to the end of your input.csv file and your code works fine.
Alternatively, read values from the shop dictionary inside a try/except block to handle the case where an item is not present.
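A sketch of that suggestion, assuming the shop dictionaries built by build_shops: the lookup falls back to an empty list, so the caller still has to handle the case where a shop has no offers at all (min() of an empty list would fail), for example by skipping that shop. offers_for is a hypothetical helper.

```python
def offers_for(shop, item):
    """Return the [price, combo] offers for `item`, or an empty list if the shop lacks it."""
    try:
        return shop[item]
    except KeyError:
        return []

# Shop '2' from the question's printed dictionary
shop_2 = {'A': [[3.0, ['A']]], 'B': [[1.0, ['B']]], 'D': [[1.25, ['D']]], 'CCC': [[1.2, ['CCC']]]}
print(offers_for(shop_2, 'burger'))  # []
print(offers_for(shop_2, 'B'))       # [[1.0, ['B']]]
```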
Note:
In your method build_shops the line:
shop_id,cost,items = item_info.replace('\n', '').split(',')
strips off the newline, but not the carriage return ('\r') left over from Windows-style line endings. To fix that, do this:
shop_id,cost,items = item_info.replace('\n', '').replace('\r', '').split(',')
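Another option is str.strip() with no arguments, which removes all surrounding whitespace, including both '\n' and '\r', in one call. A minimal Python 3 sketch; parse_line is a hypothetical helper:

```python
def parse_line(line):
    # strip() removes the trailing '\n' or '\r\n' (and any other surrounding whitespace)
    shop_id, cost, items = line.strip().split(',')
    return shop_id, float(cost), items.split('+')

print(parse_line('1,1.50,A+CCC\r\n'))  # ('1', 1.5, ['A', 'CCC'])
```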
Hope this helps.
I think I've fixed it...
solve_one_shop
The for loop should only run inside the if; otherwise you get a KeyError. I have also changed it so that it only returns a value if all_possible contains anything (an empty list evaluates to False); otherwise the function returns None.
Edit: to prevent a TypeError, I assign the result to a temporary variable, this_subset, and the rest of the loop only runs if it is not None.
def solve_one_shop(shop, items):
    if len(items) == 0:
        return [0.0, []]
    all_possible = []
    first_item = items[0]
    if first_item in shop:
        for (price,combo) in shop[first_item]:
            sub_set = [x for x in items if x not in combo]
            this_subset = solve_one_shop(shop, sub_set)
            if this_subset is not None:
                price_sub_set,solution = this_subset
                solution.append([price,combo])
                all_possible.append([price+price_sub_set, solution])
    if all_possible:
        cheapest = min(all_possible, key=(lambda x: x[0]))
        return cheapest
solver
I have assigned the return value of solve_one_shop to an intermediate variable. If this is None, then the shop is not added to result_all_shops.
Edit: if result_all_shops is empty, print a message instead of trying to find the minimum.
def solver(input_data, required_items):
    shops = build_shops(input_data)
    result_all_shops = []
    for shop_id,shop_info in shops.iteritems():
        this_shop = solve_one_shop(shop_info, required_items)
        if this_shop is not None:
            (price, solution) = this_shop
            result_all_shops.append([shop_id, price, solution])
    if result_all_shops:
        shop_id,total_price,solution = min(result_all_shops, key=(lambda x: x[1]))
        print('SHOP_ID=%s' % shop_id)
        sln_str = [','.join(items)+'(%0.2f)'%price for (price,items) in solution]
        sln_str = '+'.join(sln_str)
        print(sln_str + ' = %0.2f' % total_price)
    else:
        print "Item not available"
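For reference, a Python 3 sketch of the same fix, assuming the rows are passed as an iterable of strings (for example, a file opened in text mode) without the header line. It uses shop.get(item, []) so a missing item simply yields no offers, returns None when no shop can supply everything, and returns the result instead of printing it. The function names mirror the question's, but the bodies are an illustration, not the answer's exact code.

```python
def build_shops(lines):
    """Map shop_id -> {item: [(price, combo_items), ...]}."""
    shops = {}
    for line in lines:
        shop_id, cost, items = line.strip().split(',')   # strip() also drops a trailing '\r'
        items = items.split('+')
        shop = shops.setdefault(shop_id, {})
        for item in items:
            shop.setdefault(item, []).append((float(cost), items))
    return shops

def solve_one_shop(shop, items):
    """Cheapest (price, offers) covering all items in one shop, or None if impossible."""
    if not items:
        return 0.0, []
    best = None
    for price, combo in shop.get(items[0], []):          # missing item -> no offers
        rest = solve_one_shop(shop, [x for x in items if x not in combo])
        if rest is not None:
            total = price + rest[0]
            if best is None or total < best[0]:
                best = (total, rest[1] + [(price, combo)])
    return best

def solver(lines, required_items):
    """Return (total_price, shop_id, offers) for the cheapest shop, or None."""
    results = []
    for shop_id, shop in build_shops(lines).items():
        found = solve_one_shop(shop, required_items)
        if found is not None:
            results.append((found[0], shop_id, found[1]))
    return min(results) if results else None

# Rows from input-1.csv in the question (header line omitted)
input_1 = ["1,2.00,burger", "1,1.25,tofulog", "1,2.00,tofulog", "1,1.00,chef_salad",
           "1,1.00,A+B", "1,1.50,A+CCC", "1,2.50,A", "2,3.00,A", "2,1.00,B",
           "2,1.20,CCC", "2,1.25,D"]

print(solver(input_1, ['burger']))   # (2.0, '1', [(2.0, ['burger'])])
print(solver(input_1, ['pizza']))    # None
```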
