What is wrong with my algorithm/code? - python

I am trying to figure out what is wrong with my code. I am trying to get the average conductivity for each distinct temperature (e.g. temp 18: average conductivity 225; temp 19: average conductivity 15; etc.).
Could someone tell me whether this is a simple coding mistake or an algorithm mistake, and help me fix it?
temp = [18,18,19,19,20]
conductivity = [200,250,20,10,15]
tempcheck = temp[0];
conductivitysum = 0;
datapoint = 0;
assert len(temp) == len(conductivity)
for i in range(len(temp)):
    if tempcheck == temp[i]:
        datapoint+=1
        conductivitysum+=conductivity[i]
    else:
        print conductivitysum/datapoint
        datapoint=0
        conductivitysum=0
        tempcheck=temp[i]
For some reason, it is printing out
225
10
When it should be printing out
225
15
15

In the else clause, after the print, put:
conductivitysum=0
datapoint=0
tempcheck = temp[i]
conductivitysum+=conductivity[i]
datapoint+=1
because when you reach the else clause, the conductivity at index i is never added to the new group; it doesn't get saved. So save that conductivity before moving on to the next i.
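Even with this fix, the loop prints only 225 and 15: the last group (temperature 20) is never printed, because a group is only printed when the temperature changes. A minimal Python 3 sketch of the same running-sum loop that also flushes the final group after the loop (running_averages is an illustrative name; like the original, it assumes equal temperatures are adjacent):

```python
def running_averages(temp, conductivity):
    """Average conductivity for each run of equal, consecutive temperatures."""
    assert len(temp) == len(conductivity)
    averages = []
    if not temp:
        return averages
    tempcheck = temp[0]
    conductivitysum = 0
    datapoint = 0
    for t, c in zip(temp, conductivity):
        if t != tempcheck:
            # Temperature changed: close the previous group, then start a new one
            averages.append(conductivitysum / datapoint)
            tempcheck = t
            conductivitysum = 0
            datapoint = 0
        # The current reading always belongs to the (possibly new) current group
        conductivitysum += c
        datapoint += 1
    # Flush the final group, which the loop above never prints
    averages.append(conductivitysum / datapoint)
    return averages

print(running_averages([18, 18, 19, 19, 20], [200, 250, 20, 10, 15]))
# [225.0, 15.0, 15.0]
```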

Change the else to:
for i in range(len(temp)):
    if tempcheck == temp[i]:
        datapoint+=1
        conductivitysum+=conductivity[i]
    else:
        print conductivitysum/datapoint
        datapoint=1
        conductivitysum=conductivity[i]
        tempcheck=temp[i]
When you reach the pair (19, 20), you need to keep it and count one datapoint, not 0. At the moment you skip it and only keep the next one, (19, 10).
Alternatively, rewrite it as
>>> temp = [18,18,19,19,20]
>>> conductivity = [200,250,20,10,15]
# build a dictionary to group the conductivities by temperature
>>> groups = {}
>>> for (t, c) in zip(temp, conductivity):
...     groups[t] = groups.get(t, []) + [c]
...
# view it
>>> groups
{18: [200, 250], 19: [20, 10], 20: [15]}
# average the conductivities for each temperature
>>> for t, cs in groups.items():
...     print t, float(sum(cs))/len(cs)
...
18 225.0
19 15.0
20 15.0
>>>
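In the dictionary approach, groups.get(t, []) + [c] builds a new list on every iteration. A sketch of the same grouping with collections.defaultdict in Python 3 (average_by_temp is an illustrative name), which appends in place and, unlike the running-sum loop, does not need equal temperatures to be adjacent:

```python
from collections import defaultdict

def average_by_temp(temp, conductivity):
    """Map each temperature to the mean of its conductivity readings."""
    groups = defaultdict(list)
    for t, c in zip(temp, conductivity):
        groups[t].append(c)  # append in place instead of rebuilding the list
    return {t: sum(cs) / len(cs) for t, cs in groups.items()}

print(average_by_temp([18, 18, 19, 19, 20], [200, 250, 20, 10, 15]))
# {18: 225.0, 19: 15.0, 20: 15.0}
```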

When I saw this code, the first thing that popped into my head was the zip function. I hope the following code is what you want.
temp = [18,18,19,19,20]
conductivity = [200,250,20,10,15]
assert len(temp) == len(conductivity)
# Matches each temp value to its corresponding conductivity value with zip
relations = list(zip(temp, conductivity))
for possible_temp in sorted(set(temp)): # Takes each possible temperature (18,19,20) in order
    total = 0
    divide_by = 0
    # The next four lines of code will check each match and figure out the
    # summed total conductivity value for each temp value and how much it
    # should be divided by to create an average.
    for relation in relations:
        if relation[0] == possible_temp:
            total += relation[1]
            divide_by += 1
    print(int(total / divide_by))

Related

How to efficiently process a list that is continuously being appended with new items in Python

Objective:
To visualize the population size of a particular organism over finite time.
Assumptions:
The organism has a life span of age_limit days
Only females older than day_lay_egg days can lay eggs, and each female may lay eggs at most max_lay_egg times. In each breeding session, a maximum of egg_no eggs can be laid, each with a 50% probability of producing male offspring.
The initial population of 3 organisms consists of 2 females and 1 male.
Code Snippets:
Currently, the code below produces the expected output:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
def get_breeding(d,**kwargs):
    if d['lay_egg'] <= kwargs['max_lay_egg'] and d['dborn'] > kwargs['day_lay_egg'] and d['s'] == 1:
        nums = np.random.choice([0, 1], size=kwargs['egg_no'], p=[.5, .5]).tolist()
        npol=[dict(s=x,d=d['d'], lay_egg=0, dborn=0) for x in nums]
        d['lay_egg'] = d['lay_egg'] + 1
        return d,npol
    return d,None
def to_loop_initial_population(**kwargs):
    npol=kwargs['ipol']
    nday = 0
    total_population_per_day = []
    while nday < kwargs['nday_limit']:
        # print(f'Executing day {nday}')
        k = []
        for dpol in npol:
            dpol['d'] += 1
            dpol['dborn'] += 1
            dpol,h = get_breeding(dpol,**kwargs)
            if h is None and dpol['dborn'] <= kwargs['age_limit']:
                # No offspring: keep the parent only while it is within the age limit
                k.append(dpol)
            elif isinstance(h, list) and dpol['dborn'] <= kwargs['age_limit']:
                # If below age limit, append the parent and its offspring
                h.extend([dpol])
                k.extend(h)
        total_population_per_day.append(dict(nsize=len(k), day=nday))
        nday += 1
        npol = k
    return total_population_per_day
## Some specs; store all settings in a dict
numsex=[1,1,0] # 0: Male, 1: Female
# s: sex, d: day, lay_egg: number of times the female has laid eggs, dborn: the organism's age
ipol=[dict(s=x,d=0, lay_egg=0, dborn=0) for x in numsex] # The initial population
age_limit = 45 # Age limit for the species
egg_no=3 # Number of eggs
day_lay_egg = 30 # Mature age for egg laying
nday_limit=360
max_lay_egg=2
para=dict(nday_limit=nday_limit,ipol=ipol,age_limit=age_limit,
          egg_no=egg_no,day_lay_egg=day_lay_egg,max_lay_egg=max_lay_egg)
dpopulation = to_loop_initial_population(**para)
### make some plot
df = pd.DataFrame(dpopulation)
sns.lineplot(x="day", y="nsize", data=df)
plt.xticks(rotation=15)
plt.title('Day vs population')
plt.show()
Output: (plot of population size vs. day; image not included)
Problem/Question:
The execution time increases exponentially with nday_limit. I need to make the code more efficient. How can I speed it up?
Other Thoughts:
I tried applying joblib as below. To my surprise, the execution time got worse.
import itertools
from joblib import Parallel, delayed
def djob(dpol,k,**kwargs):
    dpol['d'] = dpol['d'] + 1
    dpol['dborn'] = dpol['dborn'] + 1
    dpol,h = get_breeding(dpol,**kwargs)
    if h is None and dpol['dborn'] <= kwargs['age_limit']:
        # No offspring: keep the subject only while it is within the age limit
        k.append(dpol)
    elif isinstance(h, list) and dpol['dborn'] <= kwargs['age_limit']:
        # If below age limit, append the parent and its offspring
        h.extend([dpol])
        k.extend(h)
    return k
def to_loop_initial_population(**kwargs):
    npol=kwargs['ipol']
    nday = 0
    total_population_per_day = []
    while nday < kwargs['nday_limit']:
        k = []
        njob=1 if len(npol)<=50 else 4
        if njob==1:
            print(f'Executing day {nday} with single cpu')
            for dpols in npol:
                k=djob(dpols,k,**kwargs)
        else:
            print(f'Executing day {nday} in parallel')
            k=Parallel(n_jobs=-1)(delayed(djob)(dpols,k,**kwargs) for dpols in npol)
            k = list(itertools.chain(*k))
        total_population_per_day.append(dict(nsize=len(k), day=nday))
        nday += 1
        npol = k
    return total_population_per_day
(for nday_limit=365)
Your code looks alright overall but I can see several points of improvement that are slowing your code down significantly.
It must be noted, though, that you can't fully prevent the code from slowing down as nday grows: the population you need to track keeps growing, and you rebuild a list of it every day. As the number of objects increases, each loop necessarily takes longer to complete, but you can reduce the time a single loop takes.
elif isinstance(h, list) and dpol['dborn'] <= kwargs['age_limit']:
Here you check isinstance(h, list) on every iteration, right after checking whether h is None. Since get_breeding returns either None or a list, h is guaranteed to be a list whenever it is not None, so the type check is redundant.
Furthermore, you check the age of dpol twice, and then first extend h by dpol and then k by h. Together with the previous issue, this can be simplified to:
if dpol['dborn'] <= kwargs['age_limit']:
    k.append(dpol)
    if h:
        k.extend(h)
The results are identical.
Additionally, you're passing a lot of **kwargs around. This is a sign that your code should be a class instead, with the unchanging parameters stored as attributes (self.parameter). You could even use a dataclass here (https://docs.python.org/3/library/dataclasses.html).
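A minimal sketch of that suggestion, assuming a frozen dataclass named SimParams (a hypothetical name) that holds the fixed settings from the question. The breeding condition is the same one used in get_breeding, but the limits are read from the instance instead of from **kwargs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimParams:
    """Hypothetical container for the fixed simulation settings in the question."""
    age_limit: int = 45
    egg_no: int = 3
    day_lay_egg: int = 30
    max_lay_egg: int = 2
    nday_limit: int = 360

    def can_breed(self, d):
        # Same condition as get_breeding() in the question, without **kwargs
        return (d['lay_egg'] <= self.max_lay_egg
                and d['dborn'] > self.day_lay_egg
                and d['s'] == 1)

    def is_alive(self, d):
        return d['dborn'] <= self.age_limit

params = SimParams(nday_limit=365)
female = dict(s=1, d=31, lay_egg=0, dborn=31)
print(params.can_breed(female), params.is_alive(female))  # True True
```

frozen=True makes accidental reassignment of a setting raise an error, which fits parameters that never change during a run.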
You also mix responsibilities within functions, which is unnecessary and makes your code more confusing. For instance:
def get_breeding(d,**kwargs):
    if d['lay_egg'] <= kwargs['max_lay_egg'] and d['dborn'] > kwargs['day_lay_egg'] and d['s'] == 1:
        nums = np.random.choice([0, 1], size=kwargs['egg_no'], p=[.5, .5]).tolist()
        npol=[dict(s=x,d=d['d'], lay_egg=0, dborn=0) for x in nums]
        d['lay_egg'] = d['lay_egg'] + 1
        return d,npol
    return d,None
This function has two responsibilities, checking the breeding conditions and generating new individuals when they are met, and it returns two different things depending on the outcome.
This would be better done through two separate functions, one that only checks the conditions and another that generates the new individuals:
def check_breeding(d, max_lay_egg, day_lay_egg):
    return d['lay_egg'] <= max_lay_egg and d['dborn'] > day_lay_egg and d['s'] == 1
def get_breeding(d, egg_no):
    nums = np.random.choice([0, 1], size=egg_no, p=[.5, .5]).tolist()
    npol=[dict(s=x, d=d['d'], lay_egg=0, dborn=0) for x in nums]
    return npol
d['lay_egg'] can then be updated in place while iterating over the list, whenever the condition is met.
You could speed up your code even further by editing the list while you iterate over it. This is not typically recommended, but it's perfectly fine if you know what you're doing: iterate by index, limit the loop to the original length of the list (so newly appended offspring are not processed on the same day), and adjust the index and the limit whenever an element is removed.
Example:
i = 0
maxiter = len(npol)
while i < maxiter:
    if check_breeding(npol[i], max_lay_egg, day_lay_egg):
        npol.extend(get_breeding(npol[i], egg_no))
    if npol[i]['dborn'] > age_limit:
        npol.pop(i)
        i -= 1
        maxiter -= 1
    i += 1
This could significantly reduce processing time, since you no longer build a new list and re-append every element on every iteration.
Finally, you could look into population growth equations and statistical methods; you could even reduce the whole thing to an iterative calculation, though it wouldn't be a simulation anymore.
Edit
I've fully implemented my suggested improvements to your code and timed both versions in a Jupyter notebook using %%time. I kept the function definitions out of the timed cells so they wouldn't contribute to the time, and the results are telling. To remove randomness, I also made females produce female offspring 100% of the time; otherwise it would be even faster. I compared the results of both versions to verify they are identical (they are, but I removed the 'd' parameter, since apart from being set it isn't used anywhere in the code).
Your implementation, with nday_limit=100 and day_lay_egg=15:
Wall time: 23.5 s
My implementation with the same parameters:
Wall time: 18.9 s
So the difference is quite significant, and the gap grows for larger nday_limit values.
Full implementation of edited code:
from dataclasses import dataclass
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
@dataclass
class Organism:
    # @dataclass generates __init__(sex, times_laid_eggs=0, age=0)
    sex: int
    times_laid_eggs: int = 0
    age: int = 0
def check_breeding(d, max_lay_egg, day_lay_egg):
    return d.times_laid_eggs <= max_lay_egg and d.age > day_lay_egg and d.sex == 1
def get_breeding(egg_no): # Make sure to change probabilities back to 0.5 and 0.5 before using it
    nums = np.random.choice([0, 1], size=egg_no, p=[0.0, 1.0]).tolist()
    npol = [Organism(x) for x in nums]
    return npol
def simulate(organisms, age_limit, egg_no, day_lay_egg, max_lay_egg, nday_limit):
    npol = organisms
    nday = 0
    total_population_per_day = []
    while nday < nday_limit:
        i = 0
        maxiter = len(npol)
        while i < maxiter:
            npol[i].age += 1
            if check_breeding(npol[i], max_lay_egg, day_lay_egg):
                npol.extend(get_breeding(egg_no))
                npol[i].times_laid_eggs += 1
            if npol[i].age > age_limit:
                npol.pop(i)
                maxiter -= 1
                continue
            i += 1
        total_population_per_day.append(dict(nsize=len(npol), day=nday))
        nday += 1
    return total_population_per_day
if __name__ == "__main__":
    numsex = [1, 1, 0] # 0: Male, 1: Female
    ipol = [Organism(x) for x in numsex] # The initial population
    age_limit = 45 # Age limit for the species
    egg_no = 3 # Number of eggs
    day_lay_egg = 15 # Mature age for egg laying
    nday_limit = 100
    max_lay_egg = 2
    dpopulation = simulate(ipol, age_limit, egg_no, day_lay_egg, max_lay_egg, nday_limit)
    df = pd.DataFrame(dpopulation)
    sns.lineplot(x="day", y="nsize", data=df)
    plt.xticks(rotation=15)
    plt.title('Day vs population')
    plt.show()
Try structuring your state as a matrix instead, like state[age][eggs_remaining] = count. It will have age_limit rows and max_lay_egg columns.
Males start in the 0 eggs_remaining column, and every time a female lays eggs she moves one column down (3->2->1->0 with your code above).
For each cycle, you just drop the last row, process the breeding rows (ages above day_lay_egg), and insert a new first row with the number of newborn males and females.
If (as in your example) there is only a vanishingly small chance that a female dies of old age before laying all her eggs, you can collapse everything further into state_alive[age][gender] = count and state_eggs[eggs_remaining] = count, but that shouldn't be necessary unless the age limit gets really high or you want to run thousands of simulations.
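A minimal NumPy sketch of this count-based idea, under assumptions that are not in the question: females are tracked as females[age, layings_left] (the answer's eggs_remaining axis) and males as males[age]; layings_left starts at max_lay_egg + 1, because the question's check lay_egg <= max_lay_egg allows that many layings; each egg's sex is drawn with NumPy's Generator.binomial; and simulate_counts is a hypothetical name. Each day it ages everyone, drops anything older than age_limit, lets mature females lay, and adds newborns at age 0, which is meant to mirror the loop in the question:

```python
import numpy as np

def simulate_counts(age_limit=45, egg_no=3, day_lay_egg=30, max_lay_egg=2,
                    nday_limit=360, init_females=2, init_males=1, seed=None):
    """Population size per day, tracking counts instead of individual organisms."""
    rng = np.random.default_rng(seed)
    n_layings = max_lay_egg + 1  # the question's `lay_egg <= max_lay_egg` allows this many
    # females[a, e]: females of age a with e layings left; males[a]: males of age a
    females = np.zeros((age_limit + 1, n_layings + 1), dtype=np.int64)
    males = np.zeros(age_limit + 1, dtype=np.int64)
    females[0, n_layings] = init_females
    males[0] = init_males
    sizes = []
    for day in range(nday_limit):
        # Everyone ages by one day; the row for age_limit falls off the end (death)
        females[1:] = females[:-1].copy()
        females[0] = 0
        males[1:] = males[:-1].copy()
        males[0] = 0
        # Females older than day_lay_egg with layings left lay egg_no eggs each
        mature = females[day_lay_egg + 1:]   # a view, so edits write back to females
        layers = mature[:, 1:].copy()
        mature[:, 1:] -= layers              # move each laying female from e to e - 1
        mature[:, :-1] += layers
        n_eggs = int(layers.sum()) * egg_no
        # Each egg is female with probability 0.5; newborns start at age 0
        new_females = int(rng.binomial(n_eggs, 0.5))
        females[0, n_layings] = new_females
        males[0] = n_eggs - new_females
        sizes.append(int(females.sum() + males.sum()))
    return sizes

print(simulate_counts(seed=0, nday_limit=40)[28:34])  # [3, 3, 9, 15, 21, 21]
```

Each day now costs a fixed number of small array operations (proportional to age_limit times max_lay_egg), no matter how large the population grows, instead of one Python-level step per organism.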
Using NumPy array operations instead of Python loops wherever possible can improve your performance; see the code below, tested in this notebook: https://www.kaggle.com/gfteafun/notebook03118c731b
Note that when comparing times, the scale of nsize matters.
%%time
# s: sex, d: day, lay_egg: number of times the female has laid eggs, dborn: the organism's age
x = np.array([(x, 0, 0, 0) for x in numsex ] )
iparam = np.array([0, 1, 0, 1])
total_population_per_day = []
for nday in range(nday_limit):
    x = x + iparam
    c = np.all(x < np.array([2, nday_limit, max_lay_egg, age_limit]), axis=1) & np.all(x >= np.array([1, day_lay_egg, 0, day_lay_egg]), axis=1)
    total_population_per_day.append(dict(nsize=len(x[x[:,3]<age_limit, :]), day=nday))
    n = x[c, 2].shape[0]
    if n > 0:
        x[c, 2] = x[c, 2] + 1
        # Draw egg_no sexes independently for each of the n laying females
        newborns = np.array([(s, nday, 0, 0) for s in np.random.choice([0, 1], size=egg_no * n, p=[.5, .5])])
        x = np.vstack((x, newborns))
df = pd.DataFrame(total_population_per_day)
sns.lineplot(x="day", y="nsize", data=df)
plt.xticks(rotation=15)
plt.title('Day vs population')
plt.show()

Minimum number of steps to reach a given number

I need to calculate the minimum number of steps to reach a value, x, from a value, n, by adding or subtracting values from a list, l, to or from n.
For example: value n = 100, value x = 45
List l: 50, 6, 1
The best way to do this is to say:
100-50-6+1 = 45
I want a programme to work this out for any value of x and n given list, l
I am really struggling to outline how I would write this.
I am confused about how to overcome the following issues:
How to tell the programme whether it should attempt an addition or a subtraction, and how many times. For example, I might need to subtract, then add, then subtract again to reach a solution.
How to include enough for/while loops to ensure I can provide a solution for all possible input values.
Has anyone come across a problem like this before, and does anyone have ideas on how I could outline the code for such a solution? (I am using Python, in case that helps point me toward particular functions that could assist me.)
Thanks
This is my attempt so far, but I am stuck:
inputA = ""
while inputA == "":
    inputA = input("""Please enter two numbers, separated by a comma.
The first value should indicate the number of jugs:
The second value should indicate the volume to be measured
""")
itemList = list(inputA.split(","))
valueToMeasure = int(itemList[1])
inputB = ""
while inputB == "":
    inputB = input("Please enter the volumes for the {} jug(s) listed: ".format((itemList[0])))
    if len(inputB.split(",")) != int(itemList[0]):
        inputB = ""
TargetVolume = itemList[1]
jugSizes = inputB.split(",")
print("Calculating: smallest number of steps to get", TargetVolume, "ml using jugs of sizes:", jugSizes)
jugSizes.sort()
jugSizes.reverse()
largestJug = int(jugSizes[0])
ratioTable = {}
for item in jugSizes:
    firstVal = int(jugSizes[0])
    itemV = int(item)
    valueToAssign = firstVal/itemV
    ratioTable[int(item)] = int(valueToAssign)
taskPossible = True
if valueToMeasure > largestJug:
    print ("Impossible task")
    taskPossible = False
newList = jugSizes
if taskPossible == True:
    for item in jugSizes:
        if item < TargetVolume: break
        newList = newList[1:]
        newDict = {}
        for itemA in ratioTable:
            if int(itemA) < int(item):
                newDict[itemA]= ratioTable[itemA]
        print ("Do work with these numbers:", newDict)
This is how I would approach the problem, if I understand it correctly:
X = 45
largest_jug = measured = 100
jug_sizes = [50, 6, 1]
steps = []
jug_to_use = 0
while measured != X:
    if jug_to_use < len(jug_sizes) - 1: # we have smaller jugs in reserve
        error_with_large_jug = min([abs(measured - jug_sizes[jug_to_use] - X), abs(measured + jug_sizes[jug_to_use] - X)])
        error_with_small_jug = min([abs(measured - jug_sizes[jug_to_use + 1] - X), abs(measured + jug_sizes[jug_to_use + 1] - X)])
        if error_with_small_jug < error_with_large_jug:
            jug_to_use += 1
    if measured > X:
        measured -= jug_sizes[jug_to_use]
        steps.append(('-', jug_sizes[jug_to_use]))
    else:
        measured += jug_sizes[jug_to_use]
        steps.append(('+', jug_sizes[jug_to_use]))
print(steps)
Yielding
[('-', 50), ('-', 6), ('+', 1)]
It basically starts by using the largest jug until the measurement is within range of the next size, and so on. Testing it with randomly chosen jug sizes [30, 7, 1] again gives an accurate answer: [('-', 30), ('-', 30), ('+', 7), ('-', 1), ('-', 1)].
Important notes:
jug_sizes should be ordered from largest to smallest
This solution assumes X can be reached with the numbers provided in jug_sizes (otherwise it loops forever)
This doesn't account for a jug size that makes the target unreachable (e.g. with [50, 12, 5] the 12 jug should be skipped, otherwise the solution is unreachable)
This assumes every jug should be used (related to the point above)
I'm sure you could work out solutions to all these problems for your specific circumstances, though.
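The greedy approach above is not guaranteed to find the minimum number of steps. A breadth-first search over the values reachable from n does guarantee a shortest sequence, because it explores every 1-step result before any 2-step result, and so on. A minimal sketch (min_steps is an illustrative name) that limits the search to the window [min(n, x) - max(l), max(n, x) + max(l)]: since additions and subtractions can be applied in any order, a shortest sequence can always be reordered to stay inside that window (always step toward x), and the bound also lets the search stop and return None when x is unreachable.

```python
from collections import deque

def min_steps(n, x, steps):
    """Shortest list of ('+'/'-', value) moves turning n into x, or None if unreachable."""
    if n == x:
        return []
    if not steps:
        return None
    m = max(steps)
    lo, hi = min(n, x) - m, max(n, x) + m   # a shortest path never needs to leave this window
    parent = {n: None}                      # value -> (previous value, move that reached it)
    queue = deque([n])
    while queue:
        cur = queue.popleft()
        for s in steps:
            for sign, nxt in (('+', cur + s), ('-', cur - s)):
                if lo <= nxt <= hi and nxt not in parent:
                    parent[nxt] = (cur, (sign, s))
                    if nxt == x:
                        # Walk the parent links back to n to recover the moves
                        moves = []
                        while parent[nxt] is not None:
                            nxt, move = parent[nxt]
                            moves.append(move)
                        return moves[::-1]
                    queue.append(nxt)
    return None

print(min_steps(100, 45, [50, 6, 1]))   # [('-', 50), ('-', 6), ('+', 1)]
print(min_steps(100, 45, [50, 12, 5]))  # [('-', 50), ('-', 5)]
```

Unlike the greedy loop, this needs no particular ordering of the jug sizes, does not force every size to be used, and handles the [50, 12, 5] case.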

How to cycle through the index of an array?

Line 14 (h.index(1) = h.index(1)+1) is where my main problem is. I need to cycle through each item in the array and use its index to determine whether or not it is a multiple of four, so I can create proper spacing for binary numbers.
def decimalToBinary(hu):
    bits = []
    h = []
    while hu > 0:
        kla = hu%2
        bits.append(kla)
        hu = int(hu/2)
    for i in reversed(bits):
        h.append(i)
    if len(h) <= 4:
        print (''.join(map(str,h)))
    else:
        for j in range(len(h)):
            h.index(1) = h.index(1)+1
            if h.index % 4 != 0:
                print (''.join(map(str,h)))
            elif h.index % 4 == 0:
                print (' '.join(map(str,h)))
decimalToBinary( 23 )
If what you're looking for is the index of the list from range(len(h)) in the for loop, then you can change that line to for idx,j in enumerate(range(len(h))): where idx is the index of the range.
The line h.index(1) = h.index(1)+1 is incorrect: you can't assign to a function call. I modified your function so that at least it executes and generates output, but whether it is correct, I don't know. Anyway, I hope it helps:
def decimalToBinary(hu):
    bits = []
    h = []
    while hu > 0:
        kla = hu%2
        bits.append(kla)
        hu = int(hu/2)
    for i in reversed(bits):
        h.append(i)
    if len(h) <= 4:
        print (''.join(map(str,h)))
    else:
        for j in range(len(h)):
            h_index = h.index(1)+1 # use h_index variable instead of h.index(1)
            if h_index % 4 != 0:
                print (''.join(map(str,h)))
            elif h_index % 4 == 0:
                print (' '.join(map(str,h)))
decimalToBinary( 23 )
# get binary version to check your result against.
print(bin(23))
This results in:
# output from decimalToBinary
10111
10111
10111
10111
10111
# output from bin(23)
0b10111
You're trying to join the bits into a string and separate them every 4 bits. You could modify your code with Marcin's correction (replacing the line with the syntax error and making some other improvements), but I suggest doing it more "Pythonically".
Here's my version:
def decimalToBinary(hu):
    bits = []
    while hu > 0:
        kla = hu%2
        bits.append(kla)
        hu = int(hu/2)
    h = [''.join(map(str, bits[i:i+4])) for i in range(0,len(bits),4)]
    bu = ' '.join(h)
    print bu[::-1]
Explanation for the h assignment line:
range(0,len(bits),4): the numbers from 0 up to the length of bits with step 4, e.g. [0, 4, 8, ...]
[bits[i:i+4] for i in [0, 4, 8]]: a list of lists, each holding four consecutive elements of bits
e.g. [[1,0,1,0], [0,1,0,1], ...]
[''.join(map(str, bits[i:i+4])) for i in range(0,len(bits),4)]: converts each inner list to a string
bu[::-1]: reverses the string (bits is stored least-significant bit first)
If you are learning Python, it's good to do it your way. As @roippi pointed out,
for index, value in enumerate(h):
will give you access to both the index and the value of each member of h in each loop.
To group 4 digits, I would do this:
def decimalToBinary(num):
    binary = str(bin(num))[2:][::-1]
    index = 0
    spaced = ''
    while index + 4 < len(binary):
        spaced += binary[index:index+4]+' '
        index += 4
    else:
        spaced += binary[index:]
    return spaced[::-1]
print decimalToBinary(23)
The result is:
1 0111
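Putting the two suggestions in this answer together (enumerate for the index, and grouping four digits from the right), a Python 3 sketch; spaced_binary is an illustrative name, and format(num, 'b') stands in for bin(num)[2:]:

```python
def spaced_binary(num):
    """Binary digits of a non-negative int, grouped in fours from the right."""
    bits = format(num, 'b')  # e.g. 23 -> '10111'
    out = []
    for index, bit in enumerate(bits):
        # Start a new group when the digits remaining (including this one) are a multiple of 4
        if index and (len(bits) - index) % 4 == 0:
            out.append(' ')
        out.append(bit)
    return ''.join(out)

print(spaced_binary(23))   # 1 0111
print(spaced_binary(255))  # 1111 1111
```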

Determine which numbers in list add up to specified value

I have a quick (hopefully) accounting problem. I just started a new job and the books are a bit of a mess. The books have these lump sums logged, while the bank account lists every individual deposit. I need to determine which deposits belong to each lump sum in the books. I have these four lump sums:
[6884.41, 14382.14, 2988.11, 8501.60]
I then have this larger list of individual deposits (sorted):
[98.56, 98.56, 98.56, 129.44, 160.0, 242.19, 286.87, 290.0, 351.01, 665.0, 675.0, 675.0, 677.45, 677.45, 695.0, 695.0, 695.0, 695.0, 715.0, 720.0, 725.0, 730.0, 745.0, 745.0, 750.0, 750.0, 750.0, 750.0, 758.93, 758.93, 763.85, 765.0, 780.0, 781.34, 781.7, 813.79, 824.97, 827.05, 856.28, 874.08, 874.44, 1498.11, 1580.0, 1600.0, 1600.0]
In Python, how can I determine which sub-set of the longer list sums to one of the lump sum values?
(NOTE: these numbers have the additional problem that the sum of the lump sums is $732.70 more than the sum of the individual accounts. I'm hoping that this doesn't make this problem completely unsolvable)
Here's a pretty good start at a solution:
import datetime as dt
from itertools import groupby
from math import ceil
def _unique_subsets_which_sum_to(target, value_counts, max_sums, index):
    value, count = value_counts[index]
    if index:
        # more values to be considered; solve recursively
        # rem = the most all smaller values together can provide; read it
        # before decrementing index (max_sums[index] = total of value_counts[0:index])
        rem = max_sums[index]
        index -= 1
        # find the minimum amount that this value must provide,
        # and the minimum occurrences that will satisfy that value
        if target <= rem:
            min_k = 0
        else:
            min_k = (target - rem + value - 1) // value # rounded up to next int
        # find the maximum occurrences of this value
        # which result in <= target
        max_k = min(count, target // value)
        # iterate across min..max occurrences
        for k in range(min_k, max_k+1):
            new_target = target - k*value
            if new_target:
                # recurse
                for solution in _unique_subsets_which_sum_to(new_target, value_counts, max_sums, index):
                    yield ((solution + [(value, k)]) if k else solution)
            else:
                # perfect solution, no need to recurse further
                yield [(value, k)]
    else:
        # this must finish the solution
        if target % value == 0:
            yield [(value, target // value)]
def find_subsets_which_sum_to(target, values):
    """
    Find all unique subsets of values which sum to target
        target   integer >= 0, total to be summed to
        values   sequence of integer > 0, possible components of sum
    """
    # this function is basically a shell which prepares
    # the input values for the recursive solution
    # turn sequence into sorted list
    values = sorted(values)
    value_sum = sum(values)
    if value_sum >= target:
        # count how many times each value appears
        value_counts = [(value, len(list(it))) for value,it in groupby(values)]
        # running total to each position
        total = 0
        max_sums = [0]
        for val,num in value_counts:
            total += val * num
            max_sums.append(total)
        start = dt.datetime.utcnow()
        for sol in _unique_subsets_which_sum_to(target, value_counts, max_sums, len(value_counts) - 1):
            yield sol
        end = dt.datetime.utcnow()
        elapsed = end - start
        seconds = elapsed.days * 86400 + elapsed.seconds + elapsed.microseconds * 0.000001
        print("  -> took {:0.1f} seconds.".format(seconds))
# I multiplied each value by 100 so that we can operate on integers
# instead of floating-point; this will eliminate any rounding errors.
values = [
9856, 9856, 9856, 12944, 16000, 24219, 28687, 29000, 35101, 66500,
67500, 67500, 67745, 67745, 69500, 69500, 69500, 69500, 71500, 72000,
72500, 73000, 74500, 74500, 75000, 75000, 75000, 75000, 75893, 75893,
76385, 76500, 78000, 78134, 78170, 81379, 82497, 82705, 85628, 87408,
87444, 149811, 158000, 160000, 160000
]
sum_to = [
298811,
688441,
850160 #,
# 1438214
]
def main():
    subset_sums_to = []
    for target in sum_to:
        print("\nSolutions which sum to {}".format(target))
        res = list(find_subsets_which_sum_to(target, values))
        print("  {} solutions found".format(len(res)))
        subset_sums_to.append(res)
    return subset_sums_to
if __name__=="__main__":
    subsetsA, subsetsB, subsetsC = main()
which on my machine results in
Solutions which sum to 298811
-> took 0.1 seconds.
2 solutions found
Solutions which sum to 688441
-> took 89.8 seconds.
1727 solutions found
Solutions which sum to 850160
-> took 454.0 seconds.
6578 solutions found
# Solutions which sum to 1438214
# -> took 7225.2 seconds.
# 87215 solutions found
The next step is to cross-compare solution subsets and see which ones can coexist together. I think the fastest approach would be to store subsets for the smallest three lump sums, iterate through them and (for compatible combinations) find the remaining values and plug them into the solver for the last lump sum.
Continuing from where I left off (+ a few changes to the above code to grab the return lists for subsums to the first three values).
I wanted a way to easily get the remaining value-coefficients each time:
class NoNegativesDict(dict):
    def __sub__(self, other):
        if set(other) - set(self):
            raise ValueError
        else:
            res = NoNegativesDict()
            # items() works in both Python 2 and 3 (iteritems() is Python 2 only)
            for key,sv in self.items():
                ov = other.get(key, 0)
                if sv < ov:
                    raise ValueError
                # elif sv == ov:
                #     pass
                elif sv > ov:
                    res[key] = sv - ov
            return res
Then I apply it as:
value_counts = [(value, len(list(it))) for value,it in groupby(values)]
vc = NoNegativesDict(value_counts)
nna = [NoNegativesDict(a) for a in subsetsA]
nnb = [NoNegativesDict(b) for b in subsetsB]
nnc = [NoNegativesDict(c) for c in subsetsC]
# this is kind of ugly; with some more effort
# I could probably make it a recursive call also
b_tries = 0
c_tries = 0
sol_count = 0
start = dt.datetime.utcnow()
for a in nna:
    try:
        res_a = vc - a
        sa = str(a)
        for b in nnb:
            try:
                res_b = res_a - b
                b_tries += 1
                sb = str(b)
                for c in nnc:
                    try:
                        res_c = res_b - c
                        c_tries += 1
                        #unpack remaining values
                        res_values = [val for val,num in res_c.items() for i in range(num)]
                        for sol in find_subsets_which_sum_to(1438214, res_values):
                            sol_count += 1
                            print("\n================")
                            print("a =", sa)
                            print("b =", sb)
                            print("c =", str(c))
                            print("d =", str(sol))
                    except ValueError:
                        pass
            except ValueError:
                pass
    except ValueError:
        pass
print("{} solutions found in {} b-tries and {} c-tries".format(sol_count, b_tries, c_tries))
end = dt.datetime.utcnow()
elapsed = end - start
seconds = elapsed.days * 86400 + elapsed.seconds + elapsed.microseconds * 0.000001
print("  -> took {:0.1f} seconds.".format(seconds))
and the final output:
0 solutions found in 1678 b-tries and 93098 c-tries
-> took 73.0 seconds.
So the final answer is there is no solution for your given data.
Hope that helps ;-)
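Two remarks, offered as a sketch rather than a replacement. First, because the four lump sums add up to $732.70 more than all the deposits combined (as the question notes), no set of disjoint deposit subsets can match all four exactly, which is consistent with the result above. Second, if one matching subset per lump sum is enough as a starting point, a subset-sum dynamic programme over integer cents finds one quickly instead of enumerating thousands. find_one_subset is an illustrative name; it assumes non-negative integer values, keeps up to target + 1 reachable sums in memory, and returns only one of possibly many subsets, so it cannot check that the lump sums are jointly consistent.

```python
def find_one_subset(target, values):
    """Indices of one subset of non-negative integer `values` summing to `target`, or None."""
    prev = {0: None}  # reachable sum -> (sum before, index of the value added)
    for i, v in enumerate(values):
        # Iterate over a snapshot so each value is used at most once
        for s in list(prev):
            t = s + v
            if t <= target and t not in prev:
                prev[t] = (s, i)
        if target in prev:
            break
    if target not in prev:
        return None
    # Follow the back-pointers from target down to 0
    indices = []
    s = target
    while prev[s] is not None:
        s, i = prev[s]
        indices.append(i)
    return sorted(indices)

deposits = [9856, 9856, 12944, 16000]   # a few of the deposits above, in cents
idx = find_one_subset(25856, deposits)
print(idx, [deposits[i] for i in idx])  # [0, 3] [9856, 16000]
```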

Python interval intersection

My problem is as follows:
I have a file with a list of intervals:
1 5
2 8
9 12
20 30
and a range of
0 200
I would like to compute an intersection that reports the positions [start end] between my intervals, inside the given range.
For example:
8 9
12 20
30 200
Besides any ideas on how to tackle this, it would also be nice to read some thoughts on optimization, since, as always, the input files are going to be huge.
This solution works as long as the intervals are ordered by their start point, and it does not require creating a list as big as the total range.
Code:
with open("0.txt") as f:
    t=[x.split() for x in f if x.strip()]  # split on any whitespace (tabs or spaces), skip blank lines
intervals=[(int(x[0]),int(x[1])) for x in t]
def find_ints(intervals, mn, mx):
    next_start = mn
    for x in intervals:
        if next_start < x[0]:
            yield next_start,x[0]
            next_start = x[1]
        elif next_start < x[1]:
            next_start = x[1]
    if next_start < mx:
        yield next_start, mx
print(list(find_ints(intervals, 0, 200)))
Output:
(in the case of the example you gave)
[(0, 1), (8, 9), (12, 20), (30, 200)]
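A variant of the same sweep that drops the sorted-input requirement by sorting first, and also clips the gaps to the requested range. gaps and read_intervals are illustrative names, and whitespace-separated "start end" lines are assumed. Sorting keeps all intervals in memory; if the file is already sorted by start, pass assume_sorted=True so the lines are consumed as a stream.

```python
def read_intervals(lines):
    """Parse 'start end' lines (any whitespace) into (int, int) pairs, skipping blanks."""
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            yield int(parts[0]), int(parts[1])

def gaps(intervals, lo, hi, assume_sorted=False):
    """Yield (start, end) stretches of [lo, hi] not covered by any interval."""
    if not assume_sorted:
        intervals = sorted(intervals)     # loads everything into memory
    covered_until = lo                    # [lo, covered_until] is either covered or already yielded
    for start, end in intervals:
        if start > covered_until:
            yield covered_until, min(start, hi)
        covered_until = max(covered_until, end)
        if covered_until >= hi:
            return
    if covered_until < hi:
        yield covered_until, hi

lines = ["1 5\n", "2 8\n", "9 12\n", "20\t30\n"]
print(list(gaps(read_intervals(lines), 0, 200)))
# [(0, 1), (8, 9), (12, 20), (30, 200)]
```

With a real file this becomes list(gaps(read_intervals(f), 0, 200)) inside a with open(...) as f: block.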
Rough algorithm:
Create an array of booleans, all set to False: seen = [False]*200
Iterate over the input file, and for each line start end, set seen[start] .. seen[end] to True.
Once done, you can trivially walk the array to find the unused intervals.
In terms of optimisations, if the list of input ranges is sorted on start number, you can track the lowest number not yet seen (lowest_unseen below) and use it to filter ranges as they are processed,
e.g. something like
for (start,end) in input:
    if end<=lowest_unseen:
        continue
    if start<lowest_unseen:
        start=lowest_unseen
    ...
which (ignoring the cost of the original sort) should make the whole thing O(n): you go through the array once to tag seen/unseen and once to output the unseen ranges.
Seems I'm feeling nice. Here is the (unoptimised) code, assuming your input file is called input
seen = [False]*200
file = open('input','r')
rows = file.readlines()
for row in rows:
    (start,end) = row.split(' ')
    print "%s %s" % (start,end)
    for x in range( int(start)-1, int(end)-1 ):
        seen[x] = True
print seen[0:10]
in_unseen_block=False
start=1
for x in range(1,200):
    val=seen[x-1]
    if val and not in_unseen_block:
        continue
    if not val and in_unseen_block:
        continue
    # Must be at a change point.
    if val:
        # we have reached the end of the block
        print "%s %s" % (start,x)
        in_unseen_block = False
    else:
        # start of new block
        start = x
        in_unseen_block = True
# Handle end block
if in_unseen_block:
    print "%s %s" % (start, 200)
I'm leaving the optimizations as an exercise for the reader.
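The boolean-array idea can also be vectorised with NumPy, so finding the unused stretches needs no Python-level walk over the range. A sketch under these assumptions: each cell k stands for the unit [lo + k, lo + k + 1), an interval (start, end) covers cells start to end - 1, and gaps_bitmap is an illustrative name. Memory grows with the size of the range rather than with the number of intervals, so for very large ranges a sorted sweep is the better fit.

```python
import numpy as np

def gaps_bitmap(intervals, lo, hi):
    """Uncovered (start, end) stretches of [lo, hi), using one boolean per unit cell."""
    covered = np.zeros(hi - lo, dtype=bool)   # covered[k] describes [lo + k, lo + k + 1)
    for start, end in intervals:
        s, e = max(start, lo), min(end, hi)   # clip to the requested range
        if s < e:
            covered[s - lo:e - lo] = True
    # Pad with zeros so every run of free cells has a rising and a falling edge
    free = np.concatenate(([0], (~covered).astype(np.int8), [0]))
    edges = np.flatnonzero(np.diff(free))     # alternating run starts and run ends
    return [(int(a) + lo, int(b) + lo) for a, b in zip(edges[::2], edges[1::2])]

print(gaps_bitmap([(1, 5), (2, 8), (9, 12), (20, 30)], 0, 200))
# [(0, 1), (8, 9), (12, 20), (30, 200)]
```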
If you record every point where one of your input intervals opens or closes, you can collect the keys of opens and closes and sort them into an ordered set. You can then treat each adjacent pair of these points as an interval of its own, and focus all of your logic on these intervals as discrete chunks.
myRange = range(201)
intervals = [(1,5), (2,8), (9,12), (20,30)]
opens = {}
closes = {}
def mark_open(index):  # named mark_open so it does not shadow the built-in open()
    if index not in opens:
        opens[index] = 0
    opens[index] += 1
def mark_close(index):
    if index not in closes:
        closes[index] = 0
    closes[index] += 1
for start, end in intervals:
    if end > start: # Making sure to exclude empty intervals, which can be problematic later
        mark_open(start)
        mark_close(end)
# Sort all the interval-endpoints that we really need to look at
oset = {0:None, 200:None}
for k in opens.keys():
    oset[k] = None
for k in closes.keys():
    oset[k] = None
relevant_indices = sorted(oset.keys())
# Find the clear ranges
state = 0
results = []
for i in range(len(relevant_indices) - 1):
    start = relevant_indices[i]
    end = relevant_indices[i+1]
    start_state = state
    if start in opens:
        start_state += opens[start]
    if start in closes:
        start_state -= closes[start]
    end_state = start_state
    if end in opens:
        end_state += opens[end]
    if end in closes:
        end_state -= closes[end]
    state = end_state
    if start_state == 0:
        result_start = start
        result_end = end
        results.append((result_start, result_end))
for start, end in results:
    print(str(start) + " " + str(end))
This outputs:
0 1
8 9
12 20
30 200
The intervals don't need to be sorted.
This question seems to be a duplicate of Merging intervals in Python.
If I understood the problem correctly, you have a list of intervals (1 5; 2 8; 9 12; 20 30) and a range (0 200), and you want to get the positions outside your intervals but inside the given range. Right?
There's a Python library that can help you on that: python-intervals (also available from PyPI using pip). Disclaimer: I'm the maintainer of that library.
Assuming you import this library as follows:
import intervals as I
It's quite easy to get your answer. Basically, you first want to create a disjunction of intervals based on the ones you provide:
inters = I.closed(1, 5) | I.closed(2, 8) | I.closed(9, 12) | I.closed(20, 30)
Then you compute the complement of these intervals, to get everything that is "outside":
compl = ~inters
Then you intersect the result with [0, 200] (the & operator), as you want to restrict the points to that interval:
print(compl & I.closed(0, 200))
This results in:
[0,1) | (8,9) | (12,20) | (30,200]
