Bin-packing/Knapsack variation: Fitting discrete data into discrete servers - python

I have a Python coding task that appears to be a variation of either the bin-packing or the knapsack problem; I'm not entirely sure. I have a solution that seems to work, but I don't think it's correct per se, as there may be edge cases where it fails. (I'm not a CS or math student, so my knowledge of algorithms/combinatorics is quite rudimentary.)
The problem
A user can choose a configuration of 3 data types:
Small data is 1 GB
Medium data is 1.5 GB
Large data is 2 GB
The console app asks, in order: "How many small pieces do you need? Medium? Large?". I need to fit these pieces of data into the cheapest server configuration:
Small server holds 10 GB, costs $68.84
Medium server holds 24 GB, costs $140.60
Large server holds 54 GB, costs $316.09
So for example, if the user chooses a total of 20 GB of data, the function should note that it would be cheaper to use 2 small servers (2 × $68.84 = $137.68) rather than 1 medium server ($140.60).
The function that I wrote primarily uses division to look for whole numbers, with floor/ceil calls wherever appropriate. I wrote blocks that sequentially go through a configuration with just L servers, then L & M, then L, M & S, etc.
Here is the function:
def allocate_servers(setup):
    '''This function allocates servers based on the user's inputs.'''
    # setup is a dict of type {'S': int, 'M': int, 'L': int}, each the amount of data needed
    # Global variables that initialise to 0
    global COUNTER_S
    global COUNTER_M
    global COUNTER_L
    # Calculate total size needed
    total_size = setup['S'] * PLANET_SIZES['S'] + \
                 setup['M'] * PLANET_SIZES['M'] + \
                 setup['L'] * PLANET_SIZES['L']
    print('\nTotal size requirement: {} GB\n'.format(total_size))
    # Find cheapest server combo
    # 1. Using just large servers
    x = total_size / SERVERS['L']['cap']  # Here and later cap is server capacity, e.g. 54 in this case
    if x <= 1:
        COUNTER_L = 1
    else:
        COUNTER_L = int(ceil(x))
    option = generate_option(COUNTER_S, COUNTER_M, COUNTER_L)  # this function creates a dict and calculates prices
    OPTIONS.append(option)
    reset_counters()
    # 2. Using large and medium servers
    if x <= 1:
        COUNTER_L = 1
    else:
        COUNTER_L = int(floor(x))
    total_size_temp = total_size - SERVERS['L']['cap'] * COUNTER_L
    y = total_size_temp / SERVERS['M']['cap']
    if y <= 1:
        COUNTER_M = 1
    else:
        COUNTER_M = int(ceil(y))
    option = generate_option(COUNTER_S, COUNTER_M, COUNTER_L)
    OPTIONS.append(option)
    reset_counters()
    # 3. Using large, medium and small servers
    if x <= 1:
        COUNTER_L = 1
    else:
        COUNTER_L = int(floor(x))
    total_size_temp = total_size - SERVERS['L']['cap'] * COUNTER_L
    y = total_size_temp / SERVERS['M']['cap']
    if y <= 1:
        COUNTER_M = 1
    else:
        COUNTER_M = int(floor(y))
    total_size_temp = total_size_temp - SERVERS['M']['cap'] * COUNTER_M
    z = total_size_temp / SERVERS['S']['cap']
    if z <= 1:
        COUNTER_S = 1
    else:
        COUNTER_S = int(ceil(z))
    option = generate_option(COUNTER_S, COUNTER_M, COUNTER_L)
    OPTIONS.append(option)
    reset_counters()
    # 4. Using large and small servers
    if x <= 1:
        COUNTER_L = 1
    else:
        COUNTER_L = int(floor(x))
    total_size_temp = total_size - SERVERS['L']['cap'] * COUNTER_L
    z = total_size_temp / SERVERS['S']['cap']
    if z <= 1:
        COUNTER_S = 1
    else:
        COUNTER_S = int(ceil(z))
    option = generate_option(COUNTER_S, COUNTER_M, COUNTER_L)
    OPTIONS.append(option)
    reset_counters()
    # 5. Using just medium servers
    y = total_size / SERVERS['M']['cap']
    if y <= 1:
        COUNTER_M = 1
    else:
        COUNTER_M = int(ceil(y))
    option = generate_option(COUNTER_S, COUNTER_M, COUNTER_L)
    OPTIONS.append(option)
    reset_counters()
    # 6. Using medium and small servers
    if y <= 1:
        COUNTER_M = 1
    else:
        COUNTER_M = int(floor(y))
    total_size_temp = total_size - SERVERS['M']['cap'] * COUNTER_M
    z = total_size_temp / SERVERS['S']['cap']
    if z <= 1:
        COUNTER_S = 1
    else:
        COUNTER_S = int(ceil(z))
    option = generate_option(COUNTER_S, COUNTER_M, COUNTER_L)
    OPTIONS.append(option)
    reset_counters()
    # 7. Using just small servers
    z = total_size / SERVERS['S']['cap']
    if z <= 1:
        COUNTER_S = 1
    else:
        COUNTER_S = int(ceil(z))
    option = generate_option(COUNTER_S, COUNTER_M, COUNTER_L)
    OPTIONS.append(option)
    reset_counters()
    # Comparing prices of options
    cheapest = min(OPTIONS, key=lambda option: option['total_price'])
    return cheapest
I have a sense that something is wrong here. For example, when I input 100 small, 350 medium and 50 large pieces of data, I get this output:
Total size requirement: 725.0 GB
All calculated options:
[{'L': 14, 'M': 0, 'S': 0, 'total_price': 4425.259999999999},
{'L': 13, 'M': 1, 'S': 0, 'total_price': 4249.77},
{'L': 13, 'M': 1, 'S': 0, 'total_price': 4249.77},
{'L': 13, 'M': 0, 'S': 3, 'total_price': 4315.6900000000005},
{'L': 0, 'M': 31, 'S': 0, 'total_price': 4358.599999999999},
{'L': 0, 'M': 30, 'S': 1, 'total_price': 4286.84},
{'L': 0, 'M': 0, 'S': 73, 'total_price': 5025.320000000001}]
For the chosen planets you need:
0 Small servers
1 Medium servers
13 Large servers
Price: $4249.77
The function seems to work as intended; however, I just manually checked that, for example, taking 29 medium servers leaves 725 − 696 = 29 GB, which I could fit onto 3 small servers. The total cost for 29 medium and 3 small is $4283.92, which is cheaper than the M: 30, S: 1 option, but it doesn't even make it into the list.
What am I missing here? I have a feeling that my algorithm is very crude and I'm potentially missing out on more optimal solutions.
Do I need to literally go through every possible option, e.g. for 14/13/12/11/10... large servers, with medium/small combinations also iterating through every option?
EDIT: I had a limited amount of time to solve this problem, so I managed to brute-force it. I added for loops to my function, iterating over every possible result: first the maximum number of Large servers (say, 14), then 13 Large and the rest Medium, then 12 Large and the rest Medium, and so on. It takes a while to run with large numbers (10k of each data type took maybe 20 seconds?), but it seems to work.
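Roughly, that brute force looks like this (a minimal sketch rather than my exact code; the prices and capacities are the ones above):
from math import ceil

def brute_force(total_size):
    """Try every plausible (large, medium, small) combination and keep the cheapest."""
    prices = {'S': 68.84, 'M': 140.60, 'L': 316.09}
    caps = {'S': 10, 'M': 24, 'L': 54}
    best = None
    for l in range(int(ceil(total_size / caps['L'])) + 1):
        rest_after_l = max(0, total_size - l * caps['L'])
        for m in range(int(ceil(rest_after_l / caps['M'])) + 1):
            rest = max(0, rest_after_l - m * caps['M'])
            s = int(ceil(rest / caps['S']))
            price = s * prices['S'] + m * prices['M'] + l * prices['L']
            if best is None or price < best[0]:
                best = (price, s, m, l)
    return best

print(brute_force(725))  # should find the $4249.77 option: 13 large, 1 medium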

You only need to consider configurations with fewer than 12 small servers (because you could replace 12 small with 5 medium) and fewer than 27 medium servers (because you could replace 27 medium with 12 large). You can loop over the number of small and medium servers and then calculate the number of large servers as max(0, ceil((need − 10 · small − 24 · medium) / 54)).
from math import ceil

def cost(cart):
    s, m, l = cart
    return 68.84 * s + 140.6 * m + 316.09 * l

def cheapest(need):
    return min(
        (
            (s, m, max(0, ceil((need - 10 * s - 24 * m) / 54)))
            for s in range(12)
            for m in range(27)
        ),
        key=cost,
    )
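Applied to the 725 GB example from the question, this should return the 13-large, 1-medium combination:
best = cheapest(725)
print(best, round(cost(best), 2))  # expected output: (0, 1, 13) 4249.77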

Related

How to efficiently process a list that is continuously being appended with new items in Python

Objective:
To visualize the population size of a particular organism over finite time.
Assumptions:
The organism has a life span of age_limit days
Only females of age day_lay_egg days can lay eggs, and a female is allowed to lay eggs a maximum of max_lay_egg times. In each breeding session, a maximum of egg_no eggs can be laid, each with a 50% probability of producing male offspring.
The initial population of 3 organisms consists of 2 females and 1 male.
Code Snippets:
Currently, the code below produces the expected output:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

def get_breeding(d, **kwargs):
    if d['lay_egg'] <= kwargs['max_lay_egg'] and d['dborn'] > kwargs['day_lay_egg'] and d['s'] == 1:
        nums = np.random.choice([0, 1], size=kwargs['egg_no'], p=[.5, .5]).tolist()
        npol = [dict(s=x, d=d['d'], lay_egg=0, dborn=0) for x in nums]
        d['lay_egg'] = d['lay_egg'] + 1
        return d, npol
    return d, None

def to_loop_initial_population(**kwargs):
    npol = kwargs['ipol']
    nday = 0
    total_population_per_day = []
    while nday < kwargs['nday_limit']:
        # print(f'Executing day {nday}')
        k = []
        for dpol in npol:
            dpol['d'] += 1
            dpol['dborn'] += 1
            dpol, h = get_breeding(dpol, **kwargs)
            if h is None and dpol['dborn'] <= kwargs['age_limit']:
                # If beyond the age limit, ignore the parent and update only the descendant
                k.append(dpol)
            elif isinstance(h, list) and dpol['dborn'] <= kwargs['age_limit']:
                # If below age limit, append the parent and its offspring
                h.extend([dpol])
                k.extend(h)
        total_population_per_day.append(dict(nsize=len(k), day=nday))
        nday += 1
        npol = k
    return total_population_per_day

## Some specs; store all settings in a dict
numsex = [1, 1, 0]  # 0: Male, 1: Female
# s: sex, d: day, lay_egg: Number of times the female laid an egg, dborn: The organism's age
ipol = [dict(s=x, d=0, lay_egg=0, dborn=0) for x in numsex]  # The initial population
age_limit = 45  # Age limit for the species
egg_no = 3  # Number of eggs
day_lay_egg = 30  # Matured age for egg laying
nday_limit = 360
max_lay_egg = 2
para = dict(nday_limit=nday_limit, ipol=ipol, age_limit=age_limit,
            egg_no=egg_no, day_lay_egg=day_lay_egg, max_lay_egg=max_lay_egg)
dpopulation = to_loop_initial_population(**para)

### make some plot
df = pd.DataFrame(dpopulation)
sns.lineplot(x="day", y="nsize", data=df)
plt.xticks(rotation=15)
plt.title('Day vs population')
plt.show()
Output:
Problem/Question:
The execution time increases exponentially with nday_limit. I need to improve the efficiency of the code. How can I speed up the running time?
Other Thoughts:
I am tempted to apply joblib as below. To my surprise, the execution time is worse.
import itertools
from joblib import Parallel, delayed

def djob(dpol, k, **kwargs):
    dpol['d'] = dpol['d'] + 1
    dpol['dborn'] = dpol['dborn'] + 1
    dpol, h = get_breeding(dpol, **kwargs)
    if h is None and dpol['dborn'] <= kwargs['age_limit']:
        # If beyond the age limit, ignore that particular subject
        k.append(dpol)
    elif isinstance(h, list) and dpol['dborn'] <= kwargs['age_limit']:
        # If below age limit, append the parent and its offspring
        h.extend([dpol])
        k.extend(h)
    return k

def to_loop_initial_population(**kwargs):
    npol = kwargs['ipol']
    nday = 0
    total_population_per_day = []
    while nday < kwargs['nday_limit']:
        k = []
        njob = 1 if len(npol) <= 50 else 4
        if njob == 1:
            print(f'Executing day {nday} with a single cpu')
            for dpols in npol:
                k = djob(dpols, k, **kwargs)
        else:
            print(f'Executing day {nday} in parallel')
            k = Parallel(n_jobs=-1)(delayed(djob)(dpols, k, **kwargs) for dpols in npol)
            k = list(itertools.chain(*k))
        total_population_per_day.append(dict(nsize=len(k), day=nday))
        nday += 1
        npol = k
    return total_population_per_day
for nday_limit=365.
Your code looks alright overall, but I can see several points of improvement that are slowing it down significantly.
It must be noted that you can't entirely prevent the code from slowing down with increasing nday values, since the population you need to track keeps growing and you keep re-populating a list to track it. It's expected that as the number of objects increases, the loops take longer to complete, but you can reduce the time it takes to complete a single loop.
elif isinstance(h, list) and dpol['dborn'] <= kwargs['age_limit']:
Here you check the type of h on every iteration, after already having confirmed whether it's None. You know for a fact that h is going to be a list; if it weren't, your code would have errored before even reaching this line, because the list couldn't have been created.
Furthermore, you have a redundant condition check on the age of dpol, and you redundantly first extend h by dpol and then k by h. Together with the previous issue, this can be simplified to:
if dpol['dborn'] <= kwargs['age_limit']:
    k.append(dpol)
    if h:
        k.extend(h)
The results are identical.
Additionally, you're passing around a lot of **kwargs. This is a sign that your code should be a class instead, where unchanging parameters are saved as self.parameter. You could even use a dataclass here (https://docs.python.org/3/library/dataclasses.html).
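For instance, a minimal sketch of that idea (the class and field names here are illustrative, not taken from your code):
from dataclasses import dataclass

@dataclass
class SimConfig:
    # unchanging parameters that were previously threaded through **kwargs
    age_limit: int = 45
    egg_no: int = 3
    day_lay_egg: int = 30
    max_lay_egg: int = 2
    nday_limit: int = 360

config = SimConfig()
print(config.age_limit)  # attributes instead of repeated kwargs lookups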
Also, you mix responsibilities of functions which is unnecessary and makes your code more confusing. For instance:
def get_breeding(d, **kwargs):
    if d['lay_egg'] <= kwargs['max_lay_egg'] and d['dborn'] > kwargs['day_lay_egg'] and d['s'] == 1:
        nums = np.random.choice([0, 1], size=kwargs['egg_no'], p=[.5, .5]).tolist()
        npol = [dict(s=x, d=d['d'], lay_egg=0, dborn=0) for x in nums]
        d['lay_egg'] = d['lay_egg'] + 1
        return d, npol
    return d, None
This code has two responsibilities: checking whether the breeding conditions are met, and generating new individuals; it returns two different things depending on the outcome.
This would be better done with two separate functions, one that simply checks the conditions, and another that generates the new individuals, as follows:
def check_breeding(d, max_lay_egg, day_lay_egg):
    return d['lay_egg'] <= max_lay_egg and d['dborn'] > day_lay_egg and d['s'] == 1

def get_breeding(d, egg_no):
    nums = np.random.choice([0, 1], size=egg_no, p=[.5, .5]).tolist()
    npol = [dict(s=x, d=d['d'], lay_egg=0, dborn=0) for x in nums]
    return npol
Here d['lay_egg'] can be updated in place while iterating over the list, whenever the condition is met.
You could speed up your code even further this way if you edit the list as you iterate over it. (This is not typically recommended, but it's perfectly fine to do if you know what you're doing. Make sure to do it by index, limit the loop to the previous bounds of the list's length, and decrement the index when an element is removed.)
Example:
i = 0
maxiter = len(npol)
while i < maxiter:
    if check_breeding(npol[i], max_lay_egg, day_lay_egg):
        npol.extend(get_breeding(npol[i], egg_no))
    if npol[i]['dborn'] > age_limit:
        npol.pop(i)
        i -= 1        # stay on the element that slid into this slot
        maxiter -= 1  # the original bounds shrank by one
    i += 1
This could significantly reduce processing time, since you're not making a new list and appending all elements all over again on every iteration.
Finally, you could look into population growth equations and statistical methods, and you could even reduce this whole code to a calculation problem with iterations, though that wouldn't be a simulation anymore.
Edit
I've fully implemented my suggested improvements to your code and timed them in a jupyter notebook using %%time. I separated out the function definitions from both so they wouldn't contribute to the time, and the results are telling. I also made it so females produce another female 100% of the time, to remove randomness; otherwise it would be even faster. I compared the results from both to verify they produce identical results (they do, but I removed the 'dborn' parameter because it's not used in the code apart from being set).
Your implementation, with nday_limit=100 and day_lay_egg=15:
Wall time 23.5s
My implementation with same parameters:
Wall time 18.9s
So you can tell the difference is quite significant, and it grows even further apart for larger nday_limit values.
Full implementation of edited code:
from dataclasses import dataclass
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

@dataclass
class Organism:
    sex: int
    times_laid_eggs: int = 0
    age: int = 0

    def __init__(self, sex):
        self.sex = sex

def check_breeding(d, max_lay_egg, day_lay_egg):
    return d.times_laid_eggs <= max_lay_egg and d.age > day_lay_egg and d.sex == 1

def get_breeding(egg_no):  # Make sure to change probabilities back to 0.5 and 0.5 before using it
    nums = np.random.choice([0, 1], size=egg_no, p=[0.0, 1.0]).tolist()
    npol = [Organism(x) for x in nums]
    return npol

def simulate(organisms, age_limit, egg_no, day_lay_egg, max_lay_egg, nday_limit):
    npol = organisms
    nday = 0
    total_population_per_day = []
    while nday < nday_limit:
        i = 0
        maxiter = len(npol)
        while i < maxiter:
            npol[i].age += 1
            if check_breeding(npol[i], max_lay_egg, day_lay_egg):
                npol.extend(get_breeding(egg_no))
                npol[i].times_laid_eggs += 1
            if npol[i].age > age_limit:
                npol.pop(i)
                maxiter -= 1
                continue
            i += 1
        total_population_per_day.append(dict(nsize=len(npol), day=nday))
        nday += 1
    return total_population_per_day

if __name__ == "__main__":
    numsex = [1, 1, 0]  # 0: Male, 1: Female
    ipol = [Organism(x) for x in numsex]  # The initial population
    age_limit = 45  # Age limit for the species
    egg_no = 3  # Number of eggs
    day_lay_egg = 15  # Matured age for egg laying
    nday_limit = 100
    max_lay_egg = 2

    dpopulation = simulate(ipol, age_limit, egg_no, day_lay_egg, max_lay_egg, nday_limit)

    df = pd.DataFrame(dpopulation)
    sns.lineplot(x="day", y="nsize", data=df)
    plt.xticks(rotation=15)
    plt.title('Day vs population')
    plt.show()
Try structuring your code as a matrix, like state[age][eggs_remaining] = count, instead. It will have age_limit rows and max_lay_egg columns.
Males start in the 0 eggs_remaining column, and every time a female lays an egg she moves down one (3 -> 2 -> 1 -> 0 with your code above).
For each cycle, you just drop the last row, handle the rows that are old enough to lay, and insert a new first row with the numbers of males and females.
If (as in your example) there is only a vanishingly small chance that a female would die of old age before laying all her eggs, you can just collapse everything into a state_alive[age][gender] = count and a state_eggs[eggs_remaining] = count instead, but that shouldn't be necessary unless the age goes really high or you want to run thousands of simulations.
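A rough sketch of that state-matrix idea (a deterministic 50/50 sex split instead of random draws, and simplified breeding bookkeeping; the names are illustrative):
import numpy as np

age_limit, max_lay_egg, day_lay_egg, egg_no, nday_limit = 45, 2, 30, 3, 360

# females[age, eggs_laid] = count of females of that age that have laid that many clutches
females = np.zeros((age_limit + 1, max_lay_egg + 1), dtype=np.int64)
males = np.zeros(age_limit + 1, dtype=np.int64)
females[0, 0], males[0] = 2, 1

for day in range(nday_limit):
    # mature females with clutches left each lay one: shift them one column to the right
    can_lay = females[day_lay_egg + 1:, :max_lay_egg].copy()
    newborn = can_lay.sum() * egg_no
    females[day_lay_egg + 1:, :max_lay_egg] -= can_lay
    females[day_lay_egg + 1:, 1:] += can_lay
    # age everyone one day; the row that wraps past age_limit is cleared (death)
    females = np.roll(females, 1, axis=0)
    males = np.roll(males, 1)
    females[0], males[0] = 0, 0
    # insert newborns at age 0
    females[0, 0] = newborn // 2
    males[0] = newborn - newborn // 2

print(females.sum() + males.sum())  # total population after nday_limit days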
Using numpy array operations as much as possible instead of loops can improve your performance; see the code below, tested in a notebook - https://www.kaggle.com/gfteafun/notebook03118c731b
Note that when comparing the times, the nsize scale matters.
%%time

# s: sex, d: day, lay_egg: Number of times the female laid an egg, dborn: The organism's age
x = np.array([(x, 0, 0, 0) for x in numsex])
iparam = np.array([0, 1, 0, 1])

total_population_per_day = []
for nday in range(nday_limit):
    x = x + iparam
    c = np.all(x < np.array([2, nday_limit, max_lay_egg, age_limit]), axis=1) & np.all(x >= np.array([1, day_lay_egg, 0, day_lay_egg]), axis=1)
    total_population_per_day.append(dict(nsize=len(x[x[:, 3] < age_limit, :]), day=nday))
    n = x[c, 2].shape[0]

    if n > 0:
        x[c, 2] = x[c, 2] + 1
        newborns = np.array([(x, nday, 0, 0) for x in np.random.choice([0, 1], size=egg_no, p=[.5, .5]) for i in range(n)])
        x = np.vstack((x, newborns))

df = pd.DataFrame(total_population_per_day)
sns.lineplot(x="day", y="nsize", data=df)
plt.xticks(rotation=15)
plt.title('Day vs population')
plt.show()

How to properly add gradually increasing/decreasing space between objects?

I've been trying to implement a transition from one amount of space to another, similar to acceleration and deceleration, except I failed and the only thing I got from this was this infinite stack of mess. Here is a screenshot showing this in action:
You can see a very black circle here, which is in reality something like 100 or 200 circles stacked on top of each other.
I reached this result using this piece of code:
def Place_circles(curve, circle_space, cs, draw=True, screen=None):
    curve_acceleration = []
    if type(curve) == tuple:
        curve_acceleration = curve[1][0]
        curve_intensity = curve[1][1]
        curve = curve[0]
    #print(curve_intensity)
    #print(curve_acceleration)
    Circle_list = []
    idx = [0, 0]
    for c in reversed(range(0, len(curve))):
        for p in reversed(range(0, len(curve[c]))):
            user_dist = circle_space[curve_intensity[c]] + curve_acceleration[c] * p
            dist = math.sqrt(math.pow(curve[c][p][0] - curve[idx[0]][idx[1]][0], 2) + math.pow(curve[c][p][1] - curve[idx[0]][idx[1]][1], 2))
            if dist > user_dist:
                idx = [c, p]
                Circle_list.append(circles.circles(round(curve[c][p][0]), round(curve[c][p][1]), cs, draw, screen))
This places circles depending on the intensity (a random number between 0 and 2) of the current curve, which corresponds to an amount of space (let's say between 20 and 30 here: 20 being index 0, 30 being index 2, and a number between these two being index 1).
This creates the stack you see above, which isn't what I want. I also came to the conclusion that I cannot use acceleration, since the amount of time needed to move between 2 points depends on the number of circles I need to click on; there are multiple circles between each pair of points, but not being able to determine how many left me unable to use the classic acceleration formula.
So I'm running out of options and ideas on how to transition from one amount of space to another.
Any idea?
PS: I scrapped the idea above and switched back to my master branch, but the code for this is still available in the branch I created here: https://github.com/Mrcubix/Osu-StreamGenerator/tree/acceleration .
So now I'm back to my normal code, which doesn't have acceleration or deceleration.
TL;DR: I can't use acceleration since I don't know the number of circles that are going to be placed between 2 points, which makes the travel time vary (I need, for example, to click circles at 180 bpm, i.e. one circle every 0.333 s), so I'm looking for another way to generate gradually changing space.
First, I took my function that was generating the intensity for each curve in [0 ; 2].
Then I scrapped the acceleration formula, as it's unusable.
Now I'm using a basic algorithm to determine the maximum number of circles I can place on a curve.
The way my script works now is the following:
I first generate a stream (multiple circles that need to be clicked at high bpm);
this way I obtain the length of each curve (or segment) of the polyline.
I generate an intensity for each curve using the following function:
def generate_intensity(Circle_list: list = None, circle_space: int = None, Args: list = None):
curve_intensity = []
if not Args or Args[0] == "NewProfile":
prompt = True
while prompt:
max_duration_intensity = input("Choose the maximum amount of curve the change in intensity will occur for: ")
if max_duration_intensity.isdigit():
max_duration_intensity = int(max_duration_intensity)
prompt = False
prompt = True
while prompt:
intensity_change_odds = input("Choose the odds of occurence for changes in intensity (1-100): ")
if intensity_change_odds.isdigit():
intensity_change_odds = int(intensity_change_odds)
if 0 < intensity_change_odds <= 100:
prompt = False
prompt = True
while prompt:
min_intensity = input("Choose the lowest amount of spacing a circle will have: ")
if min_intensity.isdigit():
min_intensity = float(min_intensity)
if min_intensity < circle_space:
prompt = False
prompt = True
while prompt:
max_intensity = input("Choose the highest amount of spacing a circle will have: ")
if max_intensity.isdigit():
max_intensity = float(max_intensity)
if max_intensity > circle_space:
prompt = False
prompt = True
if Args:
if Args[0] == "NewProfile":
return [max_duration_intensity, intensity_change_odds, min_intensity, max_intensity]
elif Args[0] == "GenMap":
max_duration_intensity = Args[1]
intensity_change_odds = Args[2]
min_intensity = Args[3]
max_intensity = Args[4]
circle_space = ([min_intensity, circle_space, max_intensity] if not Args else [Args[0][3],circle_space,Args[0][4]])
count = 0
for idx, i in enumerate(Circle_list):
if idx == len(Circle_list) - 1:
if random.randint(0,100) < intensity_change_odds:
if random.randint(0,100) > 50:
curve_intensity.append(2)
else:
curve_intensity.append(0)
else:
curve_intensity.append(1)
if random.randint(0,100) < intensity_change_odds:
if random.randint(0,100) > 50:
curve_intensity.append(2)
count += 1
else:
curve_intensity.append(0)
count += 1
else:
if curve_intensity:
if curve_intensity[-1] == 2 and not count+1 > max_duration_intensity:
curve_intensity.append(2)
count += 1
continue
elif curve_intensity[-1] == 0 and not count+1 > max_duration_intensity:
curve_intensity.append(0)
count += 1
continue
elif count+1 > 2:
curve_intensity.append(1)
count = 0
continue
else:
curve_intensity.append(1)
else:
curve_intensity.append(1)
curve_intensity.reverse()
if curve_intensity.count(curve_intensity[0]) == len(curve_intensity):
print("Intensity didn't change")
return circle_space[1]
print("\n")
return [circle_space, curve_intensity]
With this I obtain 2 lists: one with the spacings I specified, and the second with the randomly generated intensities.
From there I call another function, taking as arguments the polyline, the previously specified spacings and the generated intensities:
def acceleration_algorithm(polyline, circle_space, curve_intensity):
    new_circle_spacing = []
    for idx in range(len(polyline)):  # repeat 4 times
        spacing = []
        Length = 0
        best_spacing = 0
        for p_idx in range(len(polyline[idx]) - 1):  # repeat 1000 times / p_idx in [0 ; 1000]
            # Create multiple lists containing spacing going from circle_space[curve_intensity[idx]] to circle_space[curve_intensity[idx+1]]
            spacing.append(np.linspace(circle_space[curve_intensity[idx]], circle_space[curve_intensity[idx+1]], p_idx).tolist())
            # Sum distances to find the length of the curve
            Length += abs(math.sqrt((polyline[idx][p_idx+1][0] - polyline[idx][p_idx][0]) ** 2 + (polyline[idx][p_idx+1][1] - polyline[idx][p_idx][1]) ** 2))
        for s in range(len(spacing)):  # probably has 1000 lists in 1 list
            length_left = Length  # Make sure to reset length for each iteration
            for dist in spacing[s]:  # subtract the specified spacings in spacing[s]
                length_left -= dist
            if length_left > 0:
                best_spacing = s
            else:  # Since length < 0, use the previous working index (best_spacing), could also just do `s-1`
                if spacing[best_spacing] == []:
                    new_circle_spacing.append([circle_space[1]])
                    continue
                new_circle_spacing.append(spacing[best_spacing])
                break
    return new_circle_spacing
With this I obtain a list with the space between each pair of circles that are going to be placed.
From there I can call Place_circles() again and obtain the new stream:
def Place_circles(polyline, circle_space, cs, DoDrawCircle=True, surface=None):
    Circle_list = []
    curve = []
    next_circle_space = None
    dist = 0
    for c in reversed(range(0, len(polyline))):
        curve = []
        if type(circle_space) == list:
            iter_circle_space = iter(circle_space[c])
            next_circle_space = next(iter_circle_space, circle_space[c][-1])
        for p in reversed(range(len(polyline[c]) - 1)):
            dist += math.sqrt((polyline[c][p+1][0] - polyline[c][p][0]) ** 2 + (polyline[c][p+1][1] - polyline[c][p][1]) ** 2)
            if dist > (circle_space if type(circle_space) == int else next_circle_space):
                dist = 0
                curve.append(circles.circles(round(polyline[c][p][0]), round(polyline[c][p][1]), cs, DoDrawCircle, surface))
                if type(circle_space) == list:
                    next_circle_space = next(iter_circle_space, circle_space[c][-1])
        Circle_list.append(curve)
    return Circle_list
The result is a stream with varying space between circles (so accelerating or decelerating). The only issue left to fix is pygame not updating the screen with the new set of circles after I call Place_circles(), but that's an issue I'm either going to try to fix myself or ask about in another post.
The final code for this feature can be found in my repo: https://github.com/Mrcubix/Osu-StreamGenerator/tree/Acceleration_v02

GC skew method using sliding window concept in python

I have completed a beginner's course in python and I am working on a problem to improve my coding skills. In this problem, I have to calculate the GC-skew by dividing the entire sequence into subsequences of equal length. I am working in a jupyter notebook.
I have to write code that gets the number of C's and G's from the sequence and then calculates the GC skew in each window. Window size = 5 kb, with an increment of 1 kb.
What I have done so far: I first stored the sequence in a list and took user input for the length of the box/window and the increment of the box. Then I tried to create a loop for calculating the number of C's and G's in each window, but here I am facing an issue: instead of getting the number of C's and G's in a window/box, I am getting the number of C's and G's from the entire sequence, repeated for the number of times the loop runs. I want the total number of C's and the total number of G's in each window.
Please suggest how I can get the mentioned character counts and the GC skew for each overlapping sliding window/box. Also, is there any concept of a sliding window in Python which I can use here?
char = []
with open('keratin.txt') as f:
    for line in f:
        line = line.strip()
        for ch in line:
            char.append(ch)
print(char)
len(char)

f1 = open('keratin.txt', 'r')
f2 = open('keratin.txt', 'a+')

lob = input('Enter length of box =')
iob = input('Enter the increment of the box =')

i = 0
lob = 5000
iob = 1000
nob = 1  # no. of boxes

for i in range(0, len(char) - lob):
    b = i
    while b < lob + i and b < len(char):
        nC = 0
        nG = 0
        if char[b] == 'C':
            nC = nC + 1
        elif char[b] == 'G':
            nG = nG + 1
        b = b + 1
    print(nC)
    print(nG)
    i = i + iob
    nob = nob + 1
I hope this helps in understanding:
number_of_C_and_G = []
# Go from 0 to end, skipping length of box plus increment: 0, 6000, 12000 ...
for i in range(0, len(char), lob + iob):
    nC = 0
    nG = 0
    # Go from start to length of box: 0 to 5000, 6000 to 11000 ...
    for j in range(i, min(i + lob, len(char))):
        if char[j] == 'C':
            nC += 1
        elif char[j] == 'G':
            nG += 1
    # Put the value for the box in the list
    number_of_C_and_G.append((nC, nG))
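For the overlapping windows the question asks about (window starts at 0, 1000, 2000, ... with a 5000-character window), a minimal sketch could look like this; the skew formula (G − C)/(G + C) is the standard GC-skew definition:
def gc_skew_windows(seq, lob=5000, iob=1000):
    """Yield (start, skew) for overlapping windows of length lob, stepping by iob."""
    for start in range(0, len(seq) - lob + 1, iob):
        window = seq[start:start + lob]
        nC = window.count('C')
        nG = window.count('G')
        skew = (nG - nC) / (nG + nC) if (nG + nC) > 0 else 0.0
        yield start, skew

# e.g., with the question's char list joined back into a string:
# for start, skew in gc_skew_windows(''.join(char)):
#     print(start, skew)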

Optimizing a Python synth on Raspberry Pi

For the past few weeks I have been working on a project which is all very new to me, and I'm learning as I go. I'm building a synthesizer using a Raspberry Pi 2 and I'm coding it in Python3, as I have some basic knowledge of the language, but not much real experience. I've muddled through pretty well so far, but I have now hit the wall I knew I would hit eventually: performance.
I have been using Pygame and its Sound module to create the sounds I want, and then using my own mathematical algorithms to calculate the ADS(H)R volume envelope for every sound. I tweak this envelope using 8 potentiometers. 3 of them control the length in seconds of the Attack, Decay, Release and another one to set the Sustain level. Then I added 4 more pots which control the curvature of each part of the envelope (except one of them instead sets a Hold value for Sustain). I have a PiTFT screen connected as well which draws up the current shape and length of the entire envelope, as well as prints out the current values of ADSR.
To play sounds I use a 4x4 Adafruit Trellis board and with different button combinations I can play every note between C0 and C8.
I use SciPy and NumPy to create different kinds of soundwaves, as in Sine, Square, Triangle, Sawtooth, Pulse and Noise.
As I have been using regular for loops to change the volume of the sound according to the ADSR envelope, running the function PlaySound takes a while to complete (depending on my ADSR settings, of course). This prompted me to try using threads. I don't know if I'm using them in the best way, or if I should use them at all, but it was the only way I could think of to achieve polyphony; otherwise it had to wait until a sound completed before resuming the main loop. So now I can play several notes at the same time. Well, two notes at least. After that it lags, and the third one doesn't seem to play until one of the previous sounds has finished.
I've done some tests and checks and I should be able to run up to 4 threads at the same time, but I might be missing something. One guess is that the system itself has reserved two threads (cores) for other usage.
I realize also that Python is not the most efficient language to use, and I've been looking into Pure Data as well, but I'm having trouble wrapping my head around it (I prefer code over a click-and-drag-gui). I want to keep using Python for as long as possible. I might look into using pyo, but I think I'd have to mostly start from scratch with my code then (which I am willing to do, but I don't want to give up on my current code just yet).
So, here are my questions: How can I optimize this to be truly polyphonic? Two notes is not enough. Should I skip the threads altogether? Can I implement the ADSR envelope in a better, less costly way? How can I clean up my messy math? What other performance bottlenecks are there that I have overlooked? The Pygame drawing to the screen seems to be negligible at the moment, as there is virtually no difference at all if I disable it completely. Here is my code so far:
import pygame
from pygame.mixer import Sound, get_init, pre_init, get_num_channels
from array import array
import RPi.GPIO as GPIO
import alsaaudio
import time
import Adafruit_Trellis
import Adafruit_MCP3008
import math
import _thread
import os
import multiprocessing
import numpy as np
from scipy import signal as sg
import struct
#print(str(multiprocessing.cpu_count()))
os.putenv('SDL_FBDEV','/dev/fb1')
fps = pygame.time.Clock()
FRAMERATE = 100
MINSEC = 1/FRAMERATE
BLUE = ( 0, 0, 255)
WHITE = (255, 255, 255)
DARKRED = (128, 0, 0)
DARKBLUE = ( 0, 0, 128)
RED = (255, 0, 0)
GREEN = ( 0, 255, 0)
DARKGREEN = ( 0, 128, 0)
YELLOW = (255, 255, 0)
DARKYELLOW = (128, 128, 0)
BLACK = ( 0, 0, 0)
PTCH = [ 1.00, 1.059633027522936, 1.122324159021407, 1.18960244648318,
1.259938837920489, 1.335168195718654, 1.414067278287462,
1.498470948012232, 1.587767584097859, 1.681957186544343,
1.782262996941896, 1.888073394495413, 2.00 ]
FREQ = { # Parsed from http://www.phy.mtu.edu/~suits/notefreqs.html
'C0': 16.35, 'Cs0': 17.32, 'D0': 18.35, 'Ds0': 19.45, 'E0': 20.60,
'F0': 21.83, 'Fs0': 23.12, 'G0': 24.50, 'Gs0': 25.96, 'A0': 27.50,
'As0': 29.14, 'B0': 30.87, 'C1': 32.70, 'Cs1': 34.65, 'D1': 36.71,
'Ds1': 38.89, 'E1': 41.20, 'F1': 43.65, 'Fs1': 46.25, 'G1': 49.00,
'Gs1': 51.91, 'A1': 55.00, 'As1': 58.27, 'B1': 61.74, 'C2': 65.41,
'Cs2': 69.30, 'D2': 73.42, 'Ds2': 77.78, 'E2': 82.41, 'F2': 87.31,
'Fs2': 92.50, 'G2': 98.00, 'Gs2': 103.83, 'A2': 110.00, 'As2': 116.54,
'B2': 123.47, 'C3': 130.81, 'Cs3': 138.59, 'D3': 146.83, 'Ds3': 155.56,
'E3': 164.81, 'F3': 174.61, 'Fs3': 185.00, 'G3': 196.00, 'Gs3': 207.65,
'A3': 220.00, 'As3': 233.08, 'B3': 246.94, 'C4': 261.63, 'Cs4': 277.18,
'D4': 293.66, 'Ds4': 311.13, 'E4': 329.63, 'F4': 349.23, 'Fs4': 369.99,
'G4': 392.00, 'Gs4': 415.30, 'A4': 440.00, 'As4': 466.16, 'B4': 493.88,
'C5': 523.25, 'Cs5': 554.37, 'D5': 587.33, 'Ds5': 622.25, 'E5': 659.26,
'F5': 698.46, 'Fs5': 739.99, 'G5': 783.99, 'Gs5': 830.61, 'A5': 880.00,
'As5': 932.33, 'B5': 987.77, 'C6': 1046.50, 'Cs6': 1108.73, 'D6': 1174.66,
'Ds6': 1244.51, 'E6': 1318.51, 'F6': 1396.91, 'Fs6': 1479.98, 'G6': 1567.98,
'Gs6': 1661.22, 'A6': 1760.00, 'As6': 1864.66, 'B6': 1975.53, 'C7': 2093.00,
'Cs7': 2217.46, 'D7': 2349.32, 'Ds7': 2489.02, 'E7': 2637.02, 'F7': 2793.83,
'Fs7': 2959.96, 'G7': 3135.96, 'Gs7': 3322.44, 'A7': 3520.00,
'As7': 3729.31, 'B7': 3951.07,
'C8': 4186.01, 'Cs8': 4434.92, 'D8': 4698.64, 'Ds8': 4978.03,
}
buttons = ['A',PTCH[9],PTCH[10],PTCH[11],'B',PTCH[6],PTCH[7],PTCH[8],'C',PTCH[3],PTCH[4],PTCH[5],PTCH[12],PTCH[0],PTCH[1],PTCH[2] ]
octaves = { 'BASE':'0', 'A':'1', 'B':'2', 'C':'3', 'AB':'4', 'AC':'5', 'BC':'6', 'ABC':'7' }
class Note(pygame.mixer.Sound):
def __init__(self, frequency, volume=.1):
self.frequency = frequency
self.oktostop = False
Sound.__init__(self, self.build_samples())
self.set_volume(volume)
def playSound(self, Aval, Dval, Sval, Rval, Acurve, Dcurve, Shold, Rcurve, fps):
self.set_volume(0)
self.play(-1)
if Aval >= MINSEC:
Alength = round(Aval*FRAMERATE)
for num in range(0,Alength+1):
fps.tick_busy_loop(FRAMERATE)
volume = (Acurve[1]*pow(num*MINSEC,Acurve[0]))/100
self.set_volume(volume)
#print(fps.get_time()," ",str(volume))
else:
self.set_volume(100)
if Sval <= 1 and Sval > 0 and Dval >= MINSEC:
Dlength = round(Dval*FRAMERATE)
for num in range(0,Dlength+1):
fps.tick_busy_loop(FRAMERATE)
volume = (Dcurve[1]*pow(num*MINSEC,Dcurve[0])+100)/100
self.set_volume(volume)
#print(fps.get_time()," ",str(volume))
elif Sval <= 1 and Sval > 0 and Dval < MINSEC:
self.set_volume(Sval)
else:
self.set_volume(0)
if Shold >= MINSEC:
Slength = round(Shold*FRAMERATE)
for num in range(0,Slength+1):
fps.tick_busy_loop(FRAMERATE)
while True:
if self.oktostop:
if Sval > 0 and Rval >= MINSEC:
Rlength = round(Rval*FRAMERATE)
for num in range(0,Rlength+1):
fps.tick_busy_loop(FRAMERATE)
volume = (Rcurve[1]*pow(num*MINSEC,Rcurve[0])+(Sval*100))/100
self.set_volume(volume)
#print(fps.get_time()," ",str(volume))
self.stop()
break
def stopSound(self):
self.oktostop = True
def build_samples(self):
Fs = get_init()[0]
f = self.frequency
sample = Fs/f
x = np.arange(sample)
# Sine wave
#y = 0.5*np.sin(2*np.pi*f*x/Fs)
# Square wave
y = 0.5*sg.square(2*np.pi*f*x/Fs)
# Pulse wave
#sig = np.sin(2 * np.pi * x)
#y = 0.5*sg.square(2*np.pi*f*x/Fs, duty=(sig + 1)/2)
# Sawtooth wave
#y = 0.5*sg.sawtooth(2*np.pi*f*x/Fs)
# Triangle wave
#y = 0.5*sg.sawtooth(2*np.pi*f*x/Fs,0.5)
# White noise
#y = 0.5*np.random.uniform(-1.000,1.000,sample)
return y
pre_init(44100, -16, 2, 2048)
pygame.init()
screen = pygame.display.set_mode((480, 320))
pygame.mouse.set_visible(False)
CLK = 5
MISO = 6
MOSI = 13
CS = 12
mcp = Adafruit_MCP3008.MCP3008(clk=CLK, cs=CS, miso=MISO, mosi=MOSI)
Asec = 1.0
Dsec = 1.0
Ssec = 1.0
Rsec = 1.0
matrix0 = Adafruit_Trellis.Adafruit_Trellis()
trellis = Adafruit_Trellis.Adafruit_TrellisSet(matrix0)
NUMTRELLIS = 1
numKeys = NUMTRELLIS * 16
I2C_BUS = 1
trellis.begin((0x70, I2C_BUS))
# light up all the LEDs in order
for i in range(int(numKeys)):
trellis.setLED(i)
trellis.writeDisplay()
time.sleep(0.05)
# then turn them off
for i in range(int(numKeys)):
trellis.clrLED(i)
trellis.writeDisplay()
time.sleep(0.05)
posRecord = {'attack': [], 'decay': [], 'sustain': [], 'release': []}
octaval = {'A':False,'B':False,'C':False}
pitch = 0
tone = None
old_tone = None
note = None
volume = 0
#m = alsaaudio.Mixer('PCM')
#mastervol = m.getvolume()
sounds = {}
values = [0]*8
oldvalues = [0]*8
font = pygame.font.SysFont("comicsansms", 22)
while True:
fps.tick_busy_loop(FRAMERATE)
#print(fps.get_time())
update = False
#m.setvolume(int(round(MCP3008(4).value*100)))
#mastervol = m.getvolume()
values = [0]*8
for i in range(8):
# The read_adc function will get the value of the specified channel (0-7).
values[i] = mcp.read_adc(i)/1000
if values[i] >= 1:
values[i] = 1
# Print the ADC values.
#print('| {0:>4} | {1:>4} | {2:>4} | {3:>4} | {4:>4} | {5:>4} | {6:>4} | {7:>4} |'.format(*values))
#print(str(pygame.mixer.Channel(0).get_busy())+" "+str(pygame.mixer.Channel(1).get_busy())+" "+str(pygame.mixer.Channel(2).get_busy())+" "+str(pygame.mixer.Channel(3).get_busy())+" "+str(pygame.mixer.Channel(4).get_busy())+" "+str(pygame.mixer.Channel(5).get_busy())+" "+str(pygame.mixer.Channel(6).get_busy())+" "+str(pygame.mixer.Channel(7).get_busy()))
Sval = values[2]*Ssec
Aval = values[0]*Asec
if Sval == 1:
Dval = 0
else:
Dval = values[1]*Dsec
if Sval < MINSEC:
Rval = 0
else:
Rval = values[3]*Rsec
if Aval > 0:
if values[4] <= MINSEC: values[4] = MINSEC
Acurve = [round(values[4]*4,3),round(100/pow(Aval,(values[4]*4)),3)]
else:
Acurve = False
if Dval > 0:
if values[5] <= MINSEC: values[5] = MINSEC
Dcurve = [round(values[5]*4,3),round(((Sval*100)-100)/pow(Dval,(values[5]*4)),3)]
else:
Dcurve = False
Shold = values[6]*4*Ssec
if Rval > 0 and Sval > 0:
if values[7] <= MINSEC: values[7] = MINSEC
Rcurve = [round(values[7]*4,3),round(-Sval*100/pow(Rval,(values[7]*4)),3)]
else:
Rcurve = False
if update:
screen.fill((0, 0, 0))
scrnvals = ["A: "+str(round(Aval,2))+"s","D: "+str(round(Dval,2))+"s","S: "+str(round(Sval,2)),"R: "+str(round(Rval,2))+"s","H: "+str(round(Shold,2))+"s","ENV: "+str(round(Aval,2)+round(Dval,2)+round(Shold,2)+round(Rval,2))+"s"]
for line in range(len(scrnvals)):
text = font.render(scrnvals[line], True, (0, 128, 0))
screen.blit(text,(60*line+40, 250))
# Width of one second in number of pixels
ASCALE = 20
DSCALE = 20
SSCALE = 20
RSCALE = 20
if Aval >= MINSEC:
if Aval <= 1:
ASCALE = 80
else:
ASCALE = 20
# Attack
for yPos in range(0,101):
xPos = round(pow((yPos/Acurve[1]),(1/Acurve[0]))*ASCALE)
posRecord['attack'].append((int(xPos) + 40, int(-yPos) + 130))
if len(posRecord['attack']) > 1:
pygame.draw.lines(screen, DARKRED, False, posRecord['attack'], 2)
if Dval >= MINSEC:
if Dval <= 1:
DSCALE = 80
else:
DSCALE = 20
# Decay
for yPos in range(100,round(Sval*100)-1,-1):
xPos = round(pow(((yPos-100)/Dcurve[1]),(1/Dcurve[0]))*DSCALE)
#print(str(yPos)+" = "+str(Dcurve[1])+"*"+str(xPos)+"^"+str(Dcurve[0])+"+100")
posRecord['decay'].append((int(xPos) + 40 + round(Aval*ASCALE), int(-yPos) + 130))
if len(posRecord['decay']) > 1:
pygame.draw.lines(screen, DARKGREEN, False, posRecord['decay'], 2)
# Sustain
if Shold >= MINSEC:
for xPos in range(0,round(Shold*SSCALE)):
posRecord['sustain'].append((int(xPos) + 40 + round(Aval*ASCALE) + round(Dval*DSCALE), int(100-Sval*100) + 30))
if len(posRecord['sustain']) > 1:
pygame.draw.lines(screen, DARKYELLOW, False, posRecord['sustain'], 2)
if Rval >= MINSEC:
if Rval <= 1:
RSCALE = 80
else:
RSCALE = 20
# Release
for yPos in range(round(Sval*100),-1,-1):
xPos = round(pow(((yPos-round(Sval*100))/Rcurve[1]),(1/Rcurve[0]))*RSCALE)
#print(str(xPos)+" = (("+str(yPos)+"-"+str(round(Sval*100))+")/"+str(Rcurve[1])+")^(1/"+str(Rcurve[0])+")")
posRecord['release'].append((int(xPos) + 40 + round(Aval*ASCALE) + round(Dval*DSCALE) + round(Shold*SSCALE), int(-yPos) + 130))
if len(posRecord['release']) > 1:
pygame.draw.lines(screen, DARKBLUE, False, posRecord['release'], 2)
posRecord = {'attack': [], 'decay': [], 'sustain': [], 'release': []}
pygame.display.update()
tone = None
pitch = 0
time.sleep(MINSEC)
# If a button was just pressed or released...
if trellis.readSwitches():
# go through every button
for i in range(numKeys):
# if it was pressed, turn it on
if trellis.justPressed(i):
print('v{0}'.format(i))
trellis.setLED(i)
if i == 0:
octaval['A'] = True
elif i == 4:
octaval['B'] = True
elif i == 8:
octaval['C'] = True
else:
pitch = buttons[i]
button = i
# if it was released, turn it off
if trellis.justReleased(i):
print('^{0}'.format(i))
trellis.clrLED(i)
if i == 0:
octaval['A'] = False
elif i == 4:
octaval['B'] = False
elif i == 8:
octaval['C'] = False
else:
sounds[i].stopSound()
# tell the trellis to set the LEDs we requested
trellis.writeDisplay()
octa = ''
if octaval['A']:
octa += 'A'
if octaval['B']:
octa += 'B'
if octaval['C']:
octa += 'C'
if octa == '':
octa = 'BASE'
if pitch > 0:
tone = FREQ['C0']*pow(2,int(octaves[octa]))*pitch
if tone:
sounds[button] = Note(tone)
_thread.start_new_thread(sounds[button].playSound,(Aval, Dval, Sval, Rval, Acurve, Dcurve, Shold, Rcurve, fps))
print(str(tone))
GPIO.cleanup()
What you are doing at the moment is firing a sound and giving up all control until that sound has been played. The general approach here would be to change that: process one sample at a time and push it to a buffer that is played back periodically. That sample would be the sum of all your voices/signals. That way, you can decide for every sample if a new voice is to be triggered, and you can decide how long to play a note while already playing it. One way to do this would be to install a timer that triggers a callback function every 1/48000 s if you want a sampling rate of 48 kHz.
You could still use multithreading for parallel processing if you need to process a lot of voices, but not one thread per voice; that would be overkill in my opinion. Whether that is necessary depends on how much filtering/processing you do and how effective or ineffective your program is.
e.g.
sample_counter = 0
output_buffer = list()

def callback_fct():
    global sample_counter
    pitch_0 = 2
    pitch_1 = 4
    sample_counter += 1  # advances one sample per callback
    signal_0 = waveform(sample_counter * pitch_0)
    signal_1 = waveform(sample_counter * pitch_1)
    signal_out = signal_0 * 0.5 + signal_1 * 0.5
    output_buffer.append(signal_out)
    return 0

if __name__ == "__main__":
    call_this_function_every_ms(callback_fct)
    play_sound_from_outputbuffer()  # plays sound from the output buffer by popping samples from the beginning of the list
Something like that. The waveform() function would give you sample values based on the actual time multiplied by the desired pitch. In C you would do all that with pointers that wrap around at the end of the wavetable, so you won't have to deal with the question of when to reset your sample_counter without getting glitches in the waveform (it will get really big really soon). But I am sure there are more "pythonic" approaches to that. Another good reason to do this in a lower-level language is speed. As soon as you involve real DSP, you will count your processor clock ticks. At that point Python may just have too much overhead.
You are right that python is probably one of the bottlenecks. Commercial soft-synths are, almost without exception, written in C++ to leverage all kinds of optimization - the most pertinent of these is use of vector processing units.
There are, nonetheless, plenty of optimizations open to you in Python:
You are calculating the envelope every sample, and in an expensive way (using pow(), which is not fully hardware accelerated on ARM Cortex CPUs). You can potentially pre-compute the transfer function and simply multiply it with each sample. I also suspect that at 44.1 kHz or higher you don't need to change the envelope every sample - perhaps every 100 or so is good enough.
Your oscillators are also calculated per sample and, as far as I can tell, per note playback. Some of them are fairly cheap, but trig functions less so. Practical soft-synths use oscillator wave-tables and a phase accumulator as an approximation.
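A minimal sketch of the wave-table idea (a single-cycle square table with an integer phase index, single precision throughout; the table size and names are illustrative):
import numpy as np

SR = 44100          # sample rate
TABLE_SIZE = 2048   # samples in one cycle of the stored waveform

# pre-computed single-cycle wave-table (a square wave here), in single precision
t = np.arange(TABLE_SIZE, dtype=np.float32)
table = np.where(t < TABLE_SIZE / 2, 0.5, -0.5).astype(np.float32)

def render(freq, n_samples, envelope):
    """Render n_samples of a note at freq Hz, scaled by a pre-computed envelope."""
    # phase accumulator reduced to integer table indices, wrapped with a modulo
    phase = (np.arange(n_samples) * (freq * TABLE_SIZE / SR)).astype(np.int64) % TABLE_SIZE
    return table[phase] * envelope  # one lookup and one multiply per sample

envelope = np.linspace(0.0, 1.0, SR, dtype=np.float32)  # e.g. a 1-second linear attack
samples = render(440.0, SR, envelope)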
Things you have less control of
Accuracy: You are ultimately generating a 16-bit sample. I suspect that by default Python is using double precision for everything, which has a 53-bit mantissa - about 3 times wider than you need.
Double-precision maths functions are slow on ARM Cortex-A parts - significantly so, in fact. Single precision can go via the VPU, with many operations you would use a lot in DSP, such as MAC (multiply-accumulate), taking a single cycle (although they take something like 16 cycles to clear the pipeline). Double precision is orders of magnitude slower.
@Rantanplan's answer above alludes to the kind of software architecture soft-synths are built with - one which is event-driven, with a render handler called periodically to supply samples. A polyphonic soft-synth can do these in parallel.
In a well-optimized implementation, the processing of each sample for each voice would involve:
* one lookup from the wave-table (having first calculated the buffer offset using integer maths)
* multiplication by the envelope
* mixing the sample with others in the output buffer
The key to performance is that there are almost no flow control statements in this tight loop.
Periodically, possibly per callback interval, the envelope would be updated. This parallelizes across several adjacent samples at once on CPUs with VPUs - two-way on an ARM Cortex-A part.

Python Script slowing down as it progresses?

I have a simulation running that has this basic structure:
from time import time

def CSV(*args):
    # write *args to .CSV file
    return

def timeleft(a, L, period):
    print(#details on how long last period took, ETA#)

for L in range(0, 6, 4):
    for a in range(1, 100):
        timeA = time()
        for t in range(1, 1000):
            ## Manufacturer in Supply Chain ##
            inventory_accounting_lists.append(#simple calculations#)
            # Simulation to determine the optimal B-value (Basestock level)
            for B in range(1, 100):
                for tau in range(1, 1000):
                    ## simple inventory accounting operations ##
            ## Distributor in Supply Chain ##
            inventory_accounting_lists.append(#simple calculations#)
            # Simulation to determine the optimal B-value (Basestock level)
            for B in range(1, 100):
                for tau in range(1, 1000):
                    ## simple inventory accounting operations ##
            ## Wholesaler in Supply Chain ##
            inventory_accounting_lists.append(#simple calculations#)
            # Simulation to determine the optimal B-value (Basestock level)
            for B in range(1, 100):
                for tau in range(1, 1000):
                    ## simple inventory accounting operations ##
            ## Retailer in Supply Chain ##
            inventory_accounting_lists.append(#simple calculations#)
            # Simulation to determine the optimal B-value (Basestock level)
            for B in range(1, 100):
                for tau in range(1, 1000):
                    ## simple inventory accounting operations ##
        CSV(Simulation_Results)
        timeB = time()
        timeleft(a, L, timeB - timeA)
As the script continues, it seems to be getting slower and slower. Here is how long one period takes for various values (and it increases linearly as a increases):
L = 0, a = 1: 1.15 minutes
L = 0, a = 99: 1.7 minutes
L = 2, a = 1: 2.7 minutes
L = 2, a = 99: 5.15 minutes
L = 4, a = 1: 4.5 minutes
L = 4, a = 15: 4.95 minutes (this is the latest value it has reached)
Why would each iteration take longer? Each iteration of the loop essentially resets everything except for a master global list, which is being added to each time. However, loops inside each "period" aren't accessing this master list -- they are accessing the same local list every time.
EDIT 1: I will post the simulation code here, in case anyone wants to wade through it, but I warn you, it is rather long, and the variable names are probably unnecessarily confusing.
#########
a = 0.01
L = 0
total = 1000
sim = 500
inv_cost = 1
bl_cost = 4
#########
# Functions
import random
from time import time
time0 = time()
# function to report ETA etc.
def timeleft(a,L,period_time):
if L==0:
periods_left = ((1-a)*100)-1+2*99
if L==2:
periods_left = ((1-a)*100)-1+99
if L==4:
periods_left = ((1-a)*100)-1+0*99
minute_time = period_time/60
minutes_left = (periods_left*period_time)/60
hours_left = (periods_left*period_time)/3600
percentage_complete = 100*((297-periods_left)/297)
print("Time for last period = ","%.2f" % minute_time," minutes")
print("%.2f" % percentage_complete,"% complete")
if hours_left<1:
print("%.2f" % minutes_left," minutes left")
else:
print("%.2f" % hours_left," hours left")
print("")
return
def dcopy(inList):
if isinstance(inList, list):
return list( map(dcopy, inList) )
return inList
# Save values to .CSV file
def CSV(a,L,I_STD_1,I_STD_2,I_STD_3,I_STD_4,O_STD_0,
O_STD_1,O_STD_2,O_STD_3,O_STD_4):
pass
# Initialization
# These are the global, master lists of data
I_STD_1 = [[0],[0],[0]]
I_STD_2 = [[0],[0],[0]]
I_STD_3 = [[0],[0],[0]]
I_STD_4 = [[0],[0],[0]]
O_STD_0 = [[0],[0],[0]]
O_STD_1 = [[0],[0],[0]]
O_STD_2 = [[0],[0],[0]]
O_STD_3 = [[0],[0],[0]]
O_STD_4 = [[0],[0],[0]]
for L in range(0,6,2):
# These are local lists that are appended to at the end of every period
I_STD_1_L = []
I_STD_2_L = []
I_STD_3_L = []
I_STD_4_L = []
O_STD_0_L = []
O_STD_1_L = []
O_STD_2_L = []
O_STD_3_L = []
O_STD_4_L = []
test = []
for n in range(1,100): # THIS is the start of the 99 value loop
a = n/100
print ("L=",L,", alpha=",a)
# Initialization for each Period
F_1 = [0,10] # Forecast
F_2 = [0,10]
F_3 = [0,10]
F_4 = [0,10]
R_0 = [10] # Items Received
R_1 = [10]
R_2 = [10]
R_3 = [10]
R_4 = [10]
for i in range(L):
R_1.append(10)
R_2.append(10)
R_3.append(10)
R_4.append(10)
I_1 = [10] # Final Inventory
I_2 = [10]
I_3 = [10]
I_4 = [10]
IP_1 = [10+10*L] # Inventory Position
IP_2 = [10+10*L]
IP_3 = [10+10*L]
IP_4 = [10+10*L]
O_1 = [10] # Items Ordered
O_2 = [10]
O_3 = [10]
O_4 = [10]
BL_1 = [0] # Backlog
BL_2 = [0]
BL_3 = [0]
BL_4 = [0]
OH_1 = [20] # Items on Hand
OH_2 = [20]
OH_3 = [20]
OH_4 = [20]
OR_1 = [10] # Order received from customer
OR_2 = [10]
OR_3 = [10]
OR_4 = [10]
Db_1 = [10] # Running Average Demand
Db_2 = [10]
Db_3 = [10]
Db_4 = [10]
var_1 = [0] # Running Variance in Demand
var_2 = [0]
var_3 = [0]
var_4 = [0]
B_1 = [IP_1[0]+10] # Optimal Basestock
B_2 = [IP_2[0]+10]
B_3 = [IP_3[0]+10]
B_4 = [IP_4[0]+10]
D = [0,10] # End constomer demand
for i in range(total+1):
D.append(9)
D.append(12)
D.append(8)
D.append(11)
period = [0]
from time import time
timeA = time()
# 1000 time periods t
for t in range(1,total+1):
period.append(t)
#### MANUFACTURER ####
# Manufacturing order from previous time period put into production
R_4.append(O_4[t-1])
#recieve shipment from supplier, calculate items OH HAND
if I_4[t-1]<0:
OH_4.append(R_4[t])
else:
OH_4.append(I_4[t-1]+R_4[t])
# Recieve and dispatch order, update Inventory and Backlog for time t
if (O_3[t-1] + BL_4[t-1]) <= OH_4[t]: # No Backlog
I_4.append(OH_4[t] - (O_3[t-1] + BL_4[t-1]))
BL_4.append(0)
R_3.append(O_3[t-1]+BL_4[t-1])
else:
I_4.append(OH_4[t] - (O_3[t-1] + BL_4[t-1])) # Backlogged
BL_4.append(-I_4[t])
R_3.append(OH_4[t])
# Update Inventory Position
IP_4.append(IP_4[t-1] + O_4[t-1] - O_3[t-1])
# Use exponential smoothing to forecast future demand
future_demand = (1-a)*F_4[t] + a*O_3[t-1]
F_4.append(future_demand)
# Calculate D_bar(t) and Var(t)
Db_4.append((1/t)*sum(O_3[0:t]))
s = 0
for i in range(0,t):
s+=(O_3[i]-Db_4[t])**2
if t==1:
var_4.append(0) # var(1) = 0
else:
var_4.append((1/(t-1))*s)
# Simulation to determine B(t)
S_BC_4 = [10000000000]*10
Run_4 = [0]*10
for B in range(10,500):
S_OH_4 = OH_4[:]
S_I_4 = I_4[:]
S_R_4 = R_4[:]
S_BL_4 = BL_4[:]
S_IP_4 = IP_4[:]
S_O_4 = O_4[:]
# Update O(t)(the period just before the simulation begins)
# using the B value for the simulation
if B - S_IP_4[t] > 0:
S_O_4.append(B - S_IP_4[t])
else:
S_O_4.append(0)
c = 0
for i in range(t+1,t+sim+1):
S_R_4.append(S_O_4[i-1])
#simulate demand
demand = -1
while demand <0:
demand = random.normalvariate(F_4[t+1],(var_4[t])**(.5))
# Receive simulated shipment, calculate simulated items on hand
if S_I_4[i-1]<0:
S_OH_4.append(S_R_4[i])
else:
S_OH_4.append(S_I_4[i-1]+S_R_4[i])
# Receive and send order, update Inventory and Backlog (simulated)
owed = (demand + S_BL_4[i-1])
S_I_4.append(S_OH_4[i] - owed)
if owed <= S_OH_4[i]: # No Backlog
S_BL_4.append(0)
c += inv_cost*S_I_4[i]
else:
S_BL_4.append(-S_I_4[i]) # Backlogged
c += bl_cost*S_BL_4[i]
# Update Inventory Position
S_IP_4.append(S_IP_4[i-1] + S_O_4[i-1] - demand)
# Update Order, Upstream member dispatches goods
if (B-S_IP_4[i]) > 0:
S_O_4.append(B - S_IP_4[i])
else:
S_O_4.append(0)
# Log Simulation costs for that B-value
S_BC_4.append(c)
# If the simulated costs are increasing, stop
if B>11:
dummy = []
for i in range(0,10):
dummy.append(S_BC_4[B-i]-S_BC_4[B-i-1])
Run_4.append(sum(dummy)/float(len(dummy)))
if Run_4[B-3] > 0 and B>20:
break
else:
Run_4.append(0)
# Use minimum cost as new B(t)
var = min((val, idx) for (idx, val) in enumerate(S_BC_4))
optimal_B = var[1]
B_4.append(optimal_B)
# Calculate O(t)
if B_4[t] - IP_4[t] > 0:
O_4.append(B_4[t] - IP_4[t])
else:
O_4.append(0)
#### DISTRIBUTOR ####
#recieve shipment from supplier, calculate items OH HAND
if I_3[t-1]<0:
OH_3.append(R_3[t])
else:
OH_3.append(I_3[t-1]+R_3[t])
# Recieve and dispatch order, update Inventory and Backlog for time t
if (O_2[t-1] + BL_3[t-1]) <= OH_3[t]: # No Backlog
I_3.append(OH_3[t] - (O_2[t-1] + BL_3[t-1]))
BL_3.append(0)
R_2.append(O_2[t-1]+BL_3[t-1])
else:
I_3.append(OH_3[t] - (O_2[t-1] + BL_3[t-1])) # Backlogged
BL_3.append(-I_3[t])
R_2.append(OH_3[t])
# Update Inventory Position
IP_3.append(IP_3[t-1] + O_3[t-1] - O_2[t-1])
# Use exponential smoothing to forecast future demand
future_demand = (1-a)*F_3[t] + a*O_2[t-1]
F_3.append(future_demand)
# Calculate D_bar(t) and Var(t)
Db_3.append((1/t)*sum(O_2[0:t]))
s = 0
for i in range(0,t):
s+=(O_2[i]-Db_3[t])**2
if t==1:
var_3.append(0) # var(1) = 0
else:
var_3.append((1/(t-1))*s)
# Simulation to determine B(t)
S_BC_3 = [10000000000]*10
Run_3 = [0]*10
for B in range(10,500):
S_OH_3 = OH_3[:]
S_I_3 = I_3[:]
S_R_3 = R_3[:]
S_BL_3 = BL_3[:]
S_IP_3 = IP_3[:]
S_O_3 = O_3[:]
# Update O(t)(the period just before the simulation begins)
# using the B value for the simulation
if B - S_IP_3[t] > 0:
S_O_3.append(B - S_IP_3[t])
else:
S_O_3.append(0)
c = 0
for i in range(t+1,t+sim+1):
#simulate demand
demand = -1
while demand <0:
demand = random.normalvariate(F_3[t+1],(var_3[t])**(.5))
S_R_3.append(S_O_3[i-1])
# Receive simulated shipment, calculate simulated items on hand
if S_I_3[i-1]<0:
S_OH_3.append(S_R_3[i])
else:
S_OH_3.append(S_I_3[i-1]+S_R_3[i])
# Receive and send order, update Inventory and Backlog (simulated)
owed = (demand + S_BL_3[i-1])
S_I_3.append(S_OH_3[i] - owed)
if owed <= S_OH_3[i]: # No Backlog
S_BL_3.append(0)
c += inv_cost*S_I_3[i]
else:
S_BL_3.append(-S_I_3[i]) # Backlogged
c += bl_cost*S_BL_3[i]
# Update Inventory Position
S_IP_3.append(S_IP_3[i-1] + S_O_3[i-1] - demand)
# Update Order, Upstream member dispatches goods
if (B-S_IP_3[i]) > 0:
S_O_3.append(B - S_IP_3[i])
else:
S_O_3.append(0)
# Log Simulation costs for that B-value
S_BC_3.append(c)
# If the simulated costs are increasing, stop
if B>11:
dummy = []
for i in range(0,10):
dummy.append(S_BC_3[B-i]-S_BC_3[B-i-1])
Run_3.append(sum(dummy)/float(len(dummy)))
if Run_3[B-3] > 0 and B>20:
break
else:
Run_3.append(0)
# Use minimum cost as new B(t)
var = min((val, idx) for (idx, val) in enumerate(S_BC_3))
optimal_B = var[1]
B_3.append(optimal_B)
# Calculate O(t)
if B_3[t] - IP_3[t] > 0:
O_3.append(B_3[t] - IP_3[t])
else:
O_3.append(0)
#### WHOLESALER ####
#recieve shipment from supplier, calculate items OH HAND
if I_2[t-1]<0:
OH_2.append(R_2[t])
else:
OH_2.append(I_2[t-1]+R_2[t])
# Recieve and dispatch order, update Inventory and Backlog for time t
if (O_1[t-1] + BL_2[t-1]) <= OH_2[t]: # No Backlog
I_2.append(OH_2[t] - (O_1[t-1] + BL_2[t-1]))
BL_2.append(0)
R_1.append(O_1[t-1]+BL_2[t-1])
else:
I_2.append(OH_2[t] - (O_1[t-1] + BL_2[t-1])) # Backlogged
BL_2.append(-I_2[t])
R_1.append(OH_2[t])
# Update Inventory Position
IP_2.append(IP_2[t-1] + O_2[t-1] - O_1[t-1])
# Use exponential smoothing to forecast future demand
future_demand = (1-a)*F_2[t] + a*O_1[t-1]
F_2.append(future_demand)
# Calculate D_bar(t) and Var(t)
Db_2.append((1/t)*sum(O_1[0:t]))
s = 0
for i in range(0,t):
s+=(O_1[i]-Db_2[t])**2
if t==1:
var_2.append(0) # var(1) = 0
else:
var_2.append((1/(t-1))*s)
# Simulation to determine B(t)
S_BC_2 = [10000000000]*10
Run_2 = [0]*10
for B in range(10,500):
S_OH_2 = OH_2[:]
S_I_2 = I_2[:]
S_R_2 = R_2[:]
S_BL_2 = BL_2[:]
S_IP_2 = IP_2[:]
S_O_2 = O_2[:]
# Update O(t) (the period just before the simulation begins)
# using the B value for the simulation
if B - S_IP_2[t] > 0:
S_O_2.append(B - S_IP_2[t])
else:
S_O_2.append(0)
c = 0
for i in range(t+1,t+sim+1):
#simulate demand
demand = -1
while demand <0:
demand = random.normalvariate(F_2[t+1],(var_2[t])**(.5))
# Receive simulated shipment, calculate simulated items on hand
S_R_2.append(S_O_2[i-1])
if S_I_2[i-1]<0:
S_OH_2.append(S_R_2[i])
else:
S_OH_2.append(S_I_2[i-1]+S_R_2[i])
# Receive and send order, update Inventory and Backlog (simulated)
owed = (demand + S_BL_2[i-1])
S_I_2.append(S_OH_2[i] - owed)
if owed <= S_OH_2[i]: # No Backlog
S_BL_2.append(0)
c += inv_cost*S_I_2[i]
else:
S_BL_2.append(-S_I_2[i]) # Backlogged
c += bl_cost*S_BL_2[i]
# Update Inventory Position
S_IP_2.append(S_IP_2[i-1] + S_O_2[i-1] - demand)
# Update Order, Upstream member dispatches goods
if (B-S_IP_2[i]) > 0:
S_O_2.append(B - S_IP_2[i])
else:
S_O_2.append(0)
# Log Simulation costs for that B-value
S_BC_2.append(c)
# If the simulated costs are increasing, stop
if B>11:
dummy = []
for i in range(0,10):
dummy.append(S_BC_2[B-i]-S_BC_2[B-i-1])
Run_2.append(sum(dummy)/float(len(dummy)))
if Run_2[B-3] > 0 and B>20:
break
else:
Run_2.append(0)
# Use minimum cost as new B(t)
var = min((val, idx) for (idx, val) in enumerate(S_BC_2))
optimal_B = var[1]
B_2.append(optimal_B)
# Calculate O(t)
if B_2[t] - IP_2[t] > 0:
O_2.append(B_2[t] - IP_2[t])
else:
O_2.append(0)
#### RETAILER ####
# Receive shipment from supplier, calculate items ON HAND
if I_1[t-1]<0:
OH_1.append(R_1[t])
else:
OH_1.append(I_1[t-1]+R_1[t])
# Receive and dispatch order, update Inventory and Backlog for time t
if (D[t] +BL_1[t-1]) <= OH_1[t]: # No Backlog
I_1.append(OH_1[t] - (D[t] + BL_1[t-1]))
BL_1.append(0)
R_0.append(D[t]+BL_1[t-1])
else:
I_1.append(OH_1[t] - (D[t] + BL_1[t-1])) # Backlogged
BL_1.append(-I_1[t])
R_0.append(OH_1[t])
# Update Inventory Position
IP_1.append(IP_1[t-1] + O_1[t-1] - D[t])
# Use exponential smoothing to forecast future demand
future_demand = (1-a)*F_1[t] + a*D[t]
F_1.append(future_demand)
# Calculate D_bar(t) and Var(t)
Db_1.append((1/t)*sum(D[1:t+1]))
s = 0
for i in range(1,t+1):
s+=(D[i]-Db_1[t])**2
if t==1: # Var(1) = 0
var_1.append(0)
else:
var_1.append((1/(t-1))*s)
# Simulation to determine B(t)
S_BC_1 = [10000000000]*10
Run_1 = [0]*10
for B in range(10,500):
S_OH_1 = OH_1[:]
S_I_1 = I_1[:]
S_R_1 = R_1[:]
S_BL_1 = BL_1[:]
S_IP_1 = IP_1[:]
S_O_1 = O_1[:]
# Update O(t) (the period just before the simulation begins)
# using the B value for the simulation
if B - S_IP_1[t] > 0:
S_O_1.append(B - S_IP_1[t])
else:
S_O_1.append(0)
c=0
for i in range(t+1,t+sim+1):
#simulate demand
demand = -1
while demand <0:
demand = random.normalvariate(F_1[t+1],(var_1[t])**(.5))
S_R_1.append(S_O_1[i-1])
# Receive simulated shipment, calculate simulated items on hand
if S_I_1[i-1]<0:
S_OH_1.append(S_R_1[i])
else:
S_OH_1.append(S_I_1[i-1]+S_R_1[i])
# Receive and send order, update Inventory and Backlog (simulated)
owed = (demand + S_BL_1[i-1])
S_I_1.append(S_OH_1[i] - owed)
if owed <= S_OH_1[i]: # No Backlog
S_BL_1.append(0)
c += inv_cost*S_I_1[i]
else:
S_BL_1.append(-S_I_1[i]) # Backlogged
c += bl_cost*S_BL_1[i]
# Update Inventory Position
S_IP_1.append(S_IP_1[i-1] + S_O_1[i-1] - demand)
# Update Order, Upstream member dispatches goods
if (B-S_IP_1[i]) > 0:
S_O_1.append(B - S_IP_1[i])
else:
S_O_1.append(0)
# Log Simulation costs for that B-value
S_BC_1.append(c)
# If the simulated costs are increasing, stop
if B>11:
dummy = []
for i in range(0,10):
dummy.append(S_BC_1[B-i]-S_BC_1[B-i-1])
Run_1.append(sum(dummy)/float(len(dummy)))
if Run_1[B-3] > 0 and B>20:
break
else:
Run_1.append(0)
# Use minimum as your new B(t)
var = min((val, idx) for (idx, val) in enumerate(S_BC_1))
optimal_B = var[1]
B_1.append(optimal_B)
# Calculate O(t)
if B_1[t] - IP_1[t] > 0:
O_1.append(B_1[t] - IP_1[t])
else:
O_1.append(0)
### Calculate the Standard Deviation of the last half of the time periods ###
def STD(numbers):
k = len(numbers)
mean = sum(numbers) / k
SD = (sum([dev*dev for dev in [x-mean for x in numbers]])/(k-1))**.5
return SD
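# Note: this is the sample standard deviation (n-1 denominator);
# statistics.stdev(numbers) in the standard library computes the same thing.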
start = (total//2)+1
# Only use the last half of the time periods to calculate the standard deviation
I_STD_1_L.append(STD(I_1[start:]))
I_STD_2_L.append(STD(I_2[start:]))
I_STD_3_L.append(STD(I_3[start:]))
I_STD_4_L.append(STD(I_4[start:]))
O_STD_0_L.append(STD(D[start:]))
O_STD_1_L.append(STD(O_1[start:]))
O_STD_2_L.append(STD(O_2[start:]))
O_STD_3_L.append(STD(O_3[start:]))
O_STD_4_L.append(STD(O_4[start:]))
from time import time
timeB = time()
timeleft(a,L,timeB-timeA)
I_STD_1[L//2] = I_STD_1_L[:]
I_STD_2[L//2] = I_STD_2_L[:]
I_STD_3[L//2] = I_STD_3_L[:]
I_STD_4[L//2] = I_STD_4_L[:]
O_STD_0[L//2] = O_STD_0_L[:]
O_STD_1[L//2] = O_STD_1_L[:]
O_STD_2[L//2] = O_STD_2_L[:]
O_STD_3[L//2] = O_STD_3_L[:]
O_STD_4[L//2] = O_STD_4_L[:]
CSV(a,L,I_STD_1,I_STD_2,I_STD_3,I_STD_4,O_STD_0,
O_STD_1,O_STD_2,O_STD_3,O_STD_4)
from time import time
timeE = time()
print("Run Time: ",(timeE-time0)/3600," hours")
This would be a good time to look at a profiler. You can profile the code to determine where the time is being spent. It seems likely that your issue is in the simulation code, but without being able to see that code, the best help you're likely to get is going to be vague.
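As a minimal sketch, assuming your time-period loop is wrapped in a function (run_simulation below is a placeholder name, not something from your code):
import cProfile
import pstats

def run_simulation():
    ...  # your existing time-period loop goes here

cProfile.run('run_simulation()', 'sim.prof')    # write stats to a file
stats = pstats.Stats('sim.prof')
stats.sort_stats('cumulative').print_stats(20)  # show the 20 costliest call sites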
Edit in light of added code:
You're doing a fair amount of copying of lists, which, while not terribly expensive per copy, can add up to a lot of time; see the sketch below.
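Each pass of the B loop copies six history lists (S_OH_3 = OH_3[:] and friends) whose length grows with t, so the copying work grows quadratically over the whole run. One alternative, assuming the simulation only ever appends to those scratch lists (as it appears to): append during the simulated periods, then truncate, instead of re-copying the whole prefix for every B. A runnable sketch with stand-in names:
# Sketch (my names), not your exact code: roll back appends instead of copying.
history = [5.0, 6.0, 4.5]     # stands in for OH_3 / I_3 / ... at time t
base_len = len(history)       # history length before the simulated periods
scratch = history             # alias the real list; no O(t) copy
for B in range(10, 500):
    scratch.append(float(B))  # stands in for the simulated appends
    # ... cost calculation for this B would go here ...
    del scratch[base_len:]    # drop the simulated entries before the next B
The same trick applies to each of the six series copied at the top of the loop.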
I agree that your code is probably unnecessarily confusing and would advise you to clean it up. Changing the cryptic names to meaningful ones may also help you find where the problem lies.
Finally, it may simply be that your simulation is computationally expensive. You might want to look into NumPy/SciPy, Pandas, or some other Python mathematics package to get better performance and perhaps better tools for expressing the model you're simulating.
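As one hedged example of what that buys you, NumPy can draw all the demand samples for a given B in a single call instead of sim trips through a Python-level loop (values below are stand-ins for F_3[t+1], var_3[t]**0.5, and sim):
import numpy as np

rng = np.random.default_rng()
mean, std, sim = 8.0, 2.0, 52                  # stand-in parameters
demands = rng.normal(mean, std, size=sim)      # one vectorised draw
demands = np.where(demands < 0, 0.0, demands)  # rough stand-in for resampling
Note that clipping negatives to zero only approximates your resample-while-negative loop, which effectively samples a truncated normal; scipy.stats.truncnorm reproduces that behaviour exactly if the difference matters.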
I experienced a similar problem with a Python 3.x script I wrote. The script randomly generated 1,000,000 (one million) JSON objects, writing them out to a file.
My problem was that the program was growing progressively slower as time proceeded. Here is a timestamp trace every 10,000 objects:
So far: Mar23-17:56:46: 0
So far: Mar23-17:56:48: 10000 ( 2 seconds)
So far: Mar23-17:56:50: 20000 ( 2 seconds)
So far: Mar23-17:56:55: 30000 ( 5 seconds)
So far: Mar23-17:57:01: 40000 ( 6 seconds)
So far: Mar23-17:57:09: 50000 ( 8 seconds)
So far: Mar23-17:57:18: 60000 ( 8 seconds)
So far: Mar23-17:57:29: 70000 (11 seconds)
So far: Mar23-17:57:42: 80000 (13 seconds)
So far: Mar23-17:57:56: 90000 (14 seconds)
So far: Mar23-17:58:13: 100000 (17 seconds)
So far: Mar23-17:58:30: 110000 (17 seconds)
So far: Mar23-17:58:51: 120000 (21 seconds)
So far: Mar23-17:59:12: 130000 (21 seconds)
So far: Mar23-17:59:35: 140000 (23 seconds)
As can be seen, the script takes progressively longer to generate groups of 10,000 records.
In my case it turned out to be the way I was generating unique ID numbers, each in the range 10250000000000-10350000000000. To avoid generating the same ID twice, I stored each newly generated ID in a list and checked every new candidate against that list:
import random

trekIdList = []

def GenerateRandomTrek():
    global trekIdList
    while True:
        r = random.randint(10250000000000, 10350000000000)
        if r not in trekIdList:
            trekIdList.append(r)
            return r
The problem is that an unsorted list takes O(n) time to search. As newly generated IDs are appended to the list, the time needed to scan it grows with every insertion.
The solution was to switch to a dictionary (or map):
trekIdList = {}
. . .
def GenerateRandomTrek():
    global trekIdList
    while True:
        r = random.randint(10250000000000, 10350000000000)
        if r not in trekIdList:
            trekIdList[r] = 1
            return r
The improvement was immediate:
So far: Mar23-18:11:30: 0
So far: Mar23-18:11:30: 10000
So far: Mar23-18:11:31: 20000
So far: Mar23-18:11:31: 30000
So far: Mar23-18:11:31: 40000
So far: Mar23-18:11:32: 50000
So far: Mar23-18:11:32: 60000
So far: Mar23-18:11:32: 70000
So far: Mar23-18:11:33: 80000
So far: Mar23-18:11:33: 90000
So far: Mar23-18:11:33: 100000
So far: Mar23-18:11:34: 110000
So far: Mar23-18:11:34: 120000
So far: Mar23-18:11:34: 130000
So far: Mar23-18:11:35: 140000
The reason is that membership testing in a dictionary/map/hash table is O(1) on average.
Moral: when dealing with large numbers of items, use a dictionary/map (or a binary search over a sorted list) rather than a linear scan of an unordered list.
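As a side note, a plain set gives the same average O(1) membership test as the dictionary and states the intent (membership only, no values) more directly; a sketch:
import random

trekIdSet = set()

def GenerateRandomTrek():
    while True:
        r = random.randint(10250000000000, 10350000000000)
        if r not in trekIdSet:
            trekIdSet.add(r)
            return r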
You can use cProfile and the like, but many times it will still be hard to spot the issue. However, knowing that the slowdown grows steadily over time is a huge benefit, since you already roughly know what the problem is, just not exactly where it is.
I'd start by elimination and simplification:
Make a small, fast example that demonstrates the sluggishness, as a separate file.
Run the above and keep removing/commenting out huge portions of the code.
Once you have narrowed things down enough, look for constructs such as values(), items(), in, for, and deepcopy as likely suspects.
By continuously simplifying the example and re-running the test script, you will eventually get down to the core issue; a small timing probe like the sketch below helps bracket each candidate region.
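A minimal wall-clock probe for that process (timed and its arguments are my names, not anything from the original script):
import time

def timed(label, fn, *args, **kwargs):
    # Wrap a suspect region in a function and print how long it takes
    # as you strip the script down.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(label, round(time.perf_counter() - start, 3), 'seconds')
    return result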
Once you have resolved one bottleneck, you might find that the sluggishness returns when you bring back the old code. Most probably there is more than one bottleneck then.
