I am trying to solve a recursion exercise and am getting really confused.
The question is as follows:
Say I have an apartment which is n square meters,
i = [1, 2, 3, ..., n] are unit sizes in square meters, and [p1, p2, p3, ..., pn] are the corresponding prices for each unit size (e.g. p1 is the price of a 1 square meter unit, and pn is the price of an n square meter unit).
I want to find the optimal way to divide my apartment, the one that gives me the maximal income.
Example: if I have a 4 square meter apartment, and the price list for the sizes 1, 2, 3, 4 is correspondingly [1, 5, 8, 9], then these are the options:
leave the apartment as one 4 square meter unit (value: 9)
split the 4 square meters into 1,1,1,1 square meters (total value: 4)
split the 4 square meters into 1,1,2 square meters (total value: 7)
split the 4 square meters into 2,2 square meters (total value: 10)
split the 4 square meters into 1,3 square meters (total value: 9)
therefore my function "profit" should return the number 10 for the input:
profit([1,5,8,9], 4)
I have been asked to solve this using the following pattern, where recursive calls may appear only inside the loop:
def profit(value, size):
    ...
    for i in range(size):
        ...
    return ...
I managed to solve this without the loop requirement after a very long time, but it really frustrates me how hard and unintuitive recursive functions are.
I would really appreciate general guidance and tips for this kind of question, or a pointer to other sources that might help me learn this topic better; it's too hard for me to follow sometimes.
And of course, I would appreciate your help with this specific function...
Solved it using the following function:
def profit(value, size):
    if size <= 0:
        return 0
    lst1 = []
    for i in range(size):
        lst1.append(profit(value, size - (i + 1)) + value[i])
    return max(lst1)
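As a side note (my own addition, not part of the original solution): this recursion recomputes the same sub-sizes many times, so for larger inputs it may be worth caching results. A minimal sketch using functools.lru_cache, assuming the price list is converted to a tuple so it is hashable:
from functools import lru_cache

def profit_memo(value, size):
    value = tuple(value)  # make the price list hashable for the cache

    @lru_cache(maxsize=None)
    def best(remaining):
        if remaining <= 0:
            return 0
        # try every possible size for the first unit, then recurse on the rest
        return max(best(remaining - (i + 1)) + value[i] for i in range(remaining))

    return best(size)

print(profit_memo([1, 5, 8, 9], 4))  # expected: 10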
You can create a helper function that generates the possible combinations of unit sizes in the range 1..size. Then, among the combinations that sum to the apartment size, the one with the maximum total price can be found:
def profit(value, size):
    def combinations(d, _size, current=[]):
        if sum(current) == _size:
            yield current
        else:
            for i in d:
                if sum(current + [i]) <= _size:
                    yield from combinations(d, _size, current + [i])
    options = list(combinations(range(1, size + 1), size))
    prices = dict(zip(range(1, size + 1), value))
    result = max(options, key=lambda x: sum(prices[i] for i in x))
    return sum(prices[i] for i in result)
print(profit([1,5,8,9], 4))
Output:
10
I don't want to give you the full answer, since this seems like it could be an assignment. However, I will try to push you in the right direction as to why recursion is optimal here. I've added one line of code to your example that I think will help you out. Before adding it to your code, I suggest you try to completely wrap your head around what is happening here.
def profit(value, size):
    for i in range(size):
        # Get the profit of the size before this combined with a 1
        profit(value, size - 1) + profit(value, 1)
If you're having trouble understanding why this is useful, feel free to leave a comment and I can give you a more in depth explanation later.
EDIT:
A key concept to keep in mind when implementing a recursive function is its base cases.
In this example, you already know what the values are for each size, so incorporate that into your solution.
def profit(value, size):
    # BASE CASE
    if size == 1:
        return value[0]
    # Size > 1
    for i in range(size):
        # Return the maximum value of all given combinations.
        return max(value[size - 1], profit(value, size - 1) + profit(value, 1))
This is an almost complete solution now; there's just one piece missing.
Hint: This code currently fails to test profit(value, 2) + profit(value, 2) (which happens to be the maximum profit in this scenario)
Note: This is a revised version of a post I made some days ago that was closed due to incompleteness. I have now done my best to optimise it, and it should now be a minimal reproducible example.
The question:
"It is easily proved that no equilateral triangle exists with integral length sides and integral area. However, the almost equilateral triangle 5-5-6 has an area of 12 square units.
We shall define an almost equilateral triangle to be a triangle for which two sides are equal and the third differs by no more than one unit.
Find the sum of the perimeters of all almost equilateral triangles with integral side lengths and area and whose perimeters do not exceed one billion (1,000,000,000)."
The answer to the problem is 518408346.
My result is much larger than this number. How come? After looking through the comments on the previous post before it was closed, I believe that it is due to a floating-point error.
I assume that my code generates numbers that are borderline integers, which Python then mistakes for exact integers. That would explain why my result is much larger than the correct one. I have observed that Python does this when the number of leading zeros after the decimal point exceeds 15 (e.g., 3.0000000000000005 is kept as 3.0000000000000005, whereas 3. followed by more than 15 zeros is treated as 3.0). If there were a way to change this behaviour, my method could work. Do you agree? I have thought that the decimal module could prove useful here, but I am not sure how to utilize it for this purpose.
This is my code:
sum_of_p = 0
for i in range(2, 333333334):
    if i % (5*10**6) == 0:
        print(i)
    h = (i**2 - ((i+1)*0.5)**2)**0.5
    if int(h) == h:
        a = 0.5*(i+1)*h
        if int(a) == a:
            sum_of_p += 3*i+1
    h = (i**2 - ((i-1)*0.5)**2)**0.5
    if int(h) == h:
        a = 0.5*(i-1)*h
        if int(a) == a:
            sum_of_p += 3*i-1
print(sum_of_p)
I assume that using floats is not a good idea for a problem about integer values. Here is the solution that I have found. If your version of Python is below 3.8, then you will have to use the slower is_square_ function instead.
import math

def is_square_(apositiveint):
    # Taken from:
    # https://stackoverflow.com/questions/2489435/check-if-a-number-is-a-perfect-square
    x = apositiveint // 2
    seen = set([x])
    while x * x != apositiveint:
        x = (x + (apositiveint // x)) // 2
        if x in seen:
            return False
        seen.add(x)
    return True

def is_square(i: int) -> bool:
    return i == math.isqrt(i) ** 2

def check(a, b, c):
    """Return the perimeter if the area of the triangle with side lengths a, b, c is an integer, else 0."""
    perimeter = a + b + c
    if perimeter % 2 == 1:
        # the perimeter should be even
        return 0
    p = perimeter // 2
    # Use Heron's formula: area**2 = p*(p-a)*(p-b)*(p-c)
    H = p*(p-a)*(p-b)*(p-c)
    if is_square(H):
        return perimeter
    return 0

sum_of_p = 0
max_i = 1000000000 // 3
for i in range(2, max_i + 1):
    if i % (10**5) == 0:
        print(i * 100 / max_i)
    sum_of_p += check(i, i, i+1)
    sum_of_p += check(i, i, i-1)
print(sum_of_p)
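As a quick illustration of why the float-based check in the question can report false positives (my own toy example, not part of the solution above): above 2**53 a Python float can no longer represent every integer, so squaring and taking a square root can silently round to a whole number.
import math

n = 10**8 + 1
x = float(n * n)          # n*n = 10000000200000001, which is larger than 2**53
print(n * n)              # 10000000200000001 (exact integer arithmetic)
print(int(x))             # 10000000200000000 (the float has been rounded)
print(math.sqrt(x) == n)  # True, although x itself is not a perfect square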
I would like to calculate the Teager Energy Kurtosis in a function in Python 3.8. I think this should also work with list comprehension.
I tried it with the following code, but I get an error message that the numpy object is not iterable. The variable data contains a list with measured values from an accelerometer.
def EO(data):
    numerator = pow(len(data),2)*sum((pow(((pow(data[i+1],2) - pow(data[i],2))-(sum(pow(data[i+1],2) - pow(data[i],2))/len(data))),4)) for i in range(len(data)-1))
    denominator = pow(sum(pow(((pow(data[i+1],2) - pow(data[i],2))-(sum(pow(data[i+1],2) - pow(data[i],2))/len(data))),2) for i in range(len(data)-1)),2)
    energy_operator = numerator/denominator
    return energy_operator
What is the general approach for implementing formulas like this, where you have to iterate over the data multiple times, also of course with regard to efficiency? The dataset from which the values are to be calculated contains 133,329 entries.
I guess the main problem is that the sum in the denominator contains another sum which has to be computed first. How do I do that? Without list comprehensions I would iterate through the whole dataset twice with a for loop, first to get the average value and then to calculate the rest in the second pass. But the readability of that is then of course gone.
Any suggestions are welcome !
Cheers,
Gerrit
EDIT:
This is the working code without using list comprehension:
def EO_5(data):
    summe = 0
    num_sum = 0
    den_sum = 0
    for i in range(1, len(data)-1):
        summe += pow(data[i],2) - ((data[i-1])*(data[i+1]))
    ave = summe/len(data)
    for i in range(1, len(data)-1):
        num_sum += pow((pow(data[i],2) - ((data[i-1])*(data[i+1]))) - ave, 4)
        den_sum += pow((pow(data[i],2) - ((data[i-1])*(data[i+1]))) - ave, 2)
    numerator = (len(data)-1)*num_sum
    denominator = pow(den_sum, 2)
    return numerator/denominator
sum(pow(data[i+1],2) - pow(data[i],2))
I think that's the (a) problem. The argument to sum is basically just a number, when it should be something list-like (iterable).
The other problem is that this is badly over-golfed. Lines that run on that long are frowned on, etc.
The other other problem is that the math expressed in the two code blocks you've shared doesn't seem to match. The first, which isn't working, seems to more closely follow what's in the image you linked, but I don't know if that means it's "correct". Do you have a better reference for "Teager Energy Kurtosis"?
I haven't tested this in any way, but it's pretty much how I'd simplify the code you said is working.
def EO_5(data):
    n = len(data) - 1
    deltas = tuple(
        pow(x, 2) - (before * after)
        for (before, x, after)
        in zip(data[:-2], data[1:-1], data[2:])
    )
    ave = sum(deltas) / len(data)
    num_sum = sum(pow(d - ave, 4) for d in deltas)
    den_sum = sum(pow(d - ave, 2) for d in deltas)
    numerator = n * num_sum
    denominator = pow(den_sum, 2)
    return numerator / denominator
If you're having problems with performance, you may be able to get numpy to leverage vector operations to make this even more streamlined, but I have limited experience with that.
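For example, a rough numpy version of the same calculation might look like this (an untested sketch; it assumes data is a 1-D sequence of numbers and keeps the division by len(data) used above):
import numpy as np

def EO_5_np(data):
    x = np.asarray(data, dtype=float)
    # deltas[i] = x[i]**2 - x[i-1]*x[i+1] for the interior points
    deltas = x[1:-1] ** 2 - x[:-2] * x[2:]
    ave = deltas.sum() / len(x)
    num = (len(x) - 1) * np.sum((deltas - ave) ** 4)
    den = np.sum((deltas - ave) ** 2) ** 2
    return num / den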
Here it is.
def EO_5(data):
    ave = (sum([i**2 for i in data[1:-1]]) - sum([i*j for i,j in zip(data[:-2],data[2:])]))/len(data)
    num = (sum([(j**2 - i*k - ave)**4 for i,j,k in zip(data[:-2],data[1:-1],data[2:])]))*(len(data)-1)
    den = (sum([(j**2 - i*k - ave)**2 for i,j,k in zip(data[:-2],data[1:-1],data[2:])]))**2
    return num/den
I wrote an algorithm to solve the 0-1 knapsack problem, which works perfectly, and it is as follows:
def zero_one_knapsack_problem(weight: list, items: list, values: list, total_capacity: int) -> list:
    """
    A function that implements dynamic programming to solve the zero-one knapsack problem. It has exponential
    time complexity, as supposed.
    :param weight: the weight list; each element corresponds to the item at the same index
    :param items: the array of items, ordered the same as the weight list and values list
    :param values: the values list
    :param total_capacity: the total capacity of the knapsack
    :return: How to fill the knapsack
    """
    items_length = len(items) + 1
    total_capacity += 1
    # Create the table
    table = [[0 for w in range(total_capacity)] for y in range(items_length)]
    for i in range(1, items_length):
        for j in range(total_capacity):
            if weight[i-1] > j:
                # Item does not fit: carry over the best value without it
                table[i][j] = table[i-1][j]
            else:
                # Take it or leave it: keep whichever gives the larger value
                table[i][j] = max(values[i-1] + table[i-1][j-weight[i-1]], table[i-1][j])
    print("The optimal value to carry is: ${}".format(table[items_length-1][total_capacity-1]))
From my analysis the time complexity is Θ(items_length * total_capacity), which comes from the two nested loops (ignoring constants). But then I read online that this method has exponential time complexity (not from one source; many blogs say it is exponential as well). I can't see how that follows; for example, consider any of the examples below:
1-) 10 * 100000000000 = 1×10¹²
2-) 11 * 100000000000 = 1.1×10¹²
3-) 12 * 100000000000 = 1.2×10¹²
# difference between each
2 and 3 = 100000000000 = 1.2*10^12 - 1.1*10^12
1 and 2 = 100000000000 = 1.1*10^12 - 1*10^12
As you can see, increasing the input by 1 didn't cause any exponential growth. So how can they say, in a mathematical sense, that this algorithm is exponential?
With a problem size of N bits, you can have, for example, sqrt(N) objects with weights about sqrt(N) bits long, and total_capacity about sqrt(N) bits long.
That makes total_capacity about 2^sqrt(N), and your solution takes O(sqrt(N) * 2^sqrt(N)) time, which is certainly exponential.
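To make that concrete, here is my own small illustration (not from the original answer): the DP table has items_length * (total_capacity + 1) cells, so the work grows with the numeric value of total_capacity, which is exponential in the number of bits used to write it down.
def dp_cells(num_items, capacity_bits):
    # largest capacity expressible with that many bits
    total_capacity = 2 ** capacity_bits - 1
    return num_items * (total_capacity + 1)

for bits in (10, 20, 30, 40):
    # every extra 10 bits of capacity multiplies the work by roughly 1024
    print(bits, dp_cells(10, bits))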
I have an N-body simulation that generates a list of particle positions, for multiple timesteps in the simulation. For a given frame, I want to generate a list of the pairs of particles' indices (i, j) such that dist(p[i], p[j]) < masking_radius. Essentially I'm creating a list of "interaction" pairs, where the pairs are within a certain distance of each other. My current implementation looks something like this:
interaction_pairs = []
# going through each unique pair (order doesn't matter)
for i in range(num_particles):
    for j in range(i + 1, num_particles):
        if dist(p[i], p[j]) < masking_radius:
            interaction_pairs.append((i, j))
Because of the large number of particles, this process takes a long time (>1 hr per test), and it is severely limiting to what I need to do with the data. I was wondering if there was any more efficient way to structure the data such that calculating these pairs would be more efficient instead of comparing every possible combination of particles. I was looking into KDTrees, but I couldn't figure out a way to utilize them to compute this more efficiently. Any help is appreciated, thank you!
Since you are using Python, sklearn has multiple implementations for nearest-neighbour search:
http://scikit-learn.org/stable/modules/neighbors.html
Both KDTree and BallTree are provided.
As for the KDTree, the main point is to push all of your particles into the tree, and then for each particle ask the query: "give me all particles within range X". A KDTree usually does this faster than a brute-force search.
You can read more for example here: https://www.cs.cmu.edu/~ckingsf/bioinfo-lectures/kdtrees.pdf
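A rough sketch of that idea using sklearn (untested; it assumes p can be converted to a (num_particles, 3) array and that masking_radius is already defined):
import numpy as np
from sklearn.neighbors import KDTree

points = np.asarray(p)
tree = KDTree(points)
# for every particle, the indices of all particles within masking_radius
neighbours = tree.query_radius(points, r=masking_radius)

interaction_pairs = [(i, j)
                     for i, idx in enumerate(neighbours)
                     for j in idx
                     if j > i]  # keep each unordered pair once and drop self-matches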
If you are working in 2D or 3D space, another option is to just cut the space into a coarse grid (with cell size equal to the masking radius) and assign each particle to one grid cell. Then you can find possible candidates for interaction just by checking the neighbouring cells (you still have to do a distance check, but for far fewer particle pairs). A sketch of this idea is shown below.
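Here is my own untested sketch of that grid approach for 3D points (it assumes points is a list of (x, y, z) tuples):
from collections import defaultdict
from itertools import product
import math

def grid_pairs(points, masking_radius):
    # bin every particle into a cube of side masking_radius
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (math.floor(x / masking_radius),
               math.floor(y / masking_radius),
               math.floor(z / masking_radius))
        cells[key].append(idx)

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    r2 = masking_radius ** 2
    pairs = []
    for (cx, cy, cz), members in cells.items():
        # candidates live in this cell or one of the 26 neighbouring cells
        candidates = []
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            candidates.extend(cells.get((cx + dx, cy + dy, cz + dz), ()))
        for i in members:
            for j in candidates:
                if j > i and dist2(points[i], points[j]) < r2:
                    pairs.append((i, j))
    return pairs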
Here's a fairly simple technique using plain Python that can reduce the number of comparisons required.
We first sort the points along either the X, Y, or Z axis (selected by axis in the code below). Let's say we choose the X axis. Then we loop over point pairs like your code does, but when we find a pair whose distance is greater than the masking_radius we test whether the difference in their X coordinates is also greater than the masking_radius. If it is, then we can bail out of the inner j loop because all points with a greater j have a greater X coordinate.
My dist2 function calculates the squared distance. This is faster than calculating the actual distance because computing the square root is relatively slow.
I've also included code that behaves similar to your code, i.e., it tests every pair of points, for speed comparison purposes; it also serves to check that the fast code is correct. ;)
from random import seed, uniform
from operator import itemgetter

seed(42)

# Make some fake data
def make_point(hi=10.0):
    return [uniform(-hi, hi) for _ in range(3)]

psize = 1000
points = [make_point() for _ in range(psize)]
masking_radius = 4.0
masking_radius2 = masking_radius ** 2

def dist2(p, q):
    return (p[0] - q[0])**2 + (p[1] - q[1])**2 + (p[2] - q[2])**2

pair_count = 0
test_count = 0
do_fast = 1

if do_fast:
    # Sort the points on one axis
    axis = 0
    points.sort(key=itemgetter(axis))
    # Fast
    for i, p in enumerate(points):
        left, right = i - 1, i + 1
        for j in range(i + 1, psize):
            test_count += 1
            q = points[j]
            if dist2(p, q) < masking_radius2:
                #interaction_pairs.append((i, j))
                pair_count += 1
            elif q[axis] - p[axis] >= masking_radius:
                break
        if i % 100 == 0:
            print('\r {:3} '.format(i), flush=True, end='')
    total_pairs = psize * (psize - 1) // 2
    print('\r {} / {} tests'.format(test_count, total_pairs))
else:
    # Slow
    for i, p in enumerate(points):
        for j in range(i+1, psize):
            q = points[j]
            if dist2(p, q) < masking_radius2:
                #interaction_pairs.append((i, j))
                pair_count += 1
        if i % 100 == 0:
            print('\r {:3} '.format(i), flush=True, end='')

print('\n', pair_count, 'pairs')
output with do_fast = 1
181937 / 499500 tests
13295 pairs
output with do_fast = 0
13295 pairs
Of course, if most of the point pairs are within masking_radius of each other, there won't be much benefit in using this technique. And sorting the points adds a little bit of time, but Python's TimSort is rather efficient, especially if the data is already partially sorted, so if the masking_radius is sufficiently small you should see a noticeable improvement in the speed.
My goal was simple: use a genetic algorithm to reproduce the classic "Hello, World" string.
My code was based on this post. The code mainly contains 4 parts:
Generate the population, which contains several different individuals
Define the fitness and grade functions, which evaluate how good or bad an individual is by comparing it with the target
Filter the population and keep len(pop)*retain individuals
Add some other individuals and mutate randomly
The parents' DNA is passed on to their children to make up the whole population.
I modified the code, and it looks like this:
import numpy as np
import string
from operator import add
from random import random, randint

def population(GENSIZE, target):
    p = []
    for i in range(0, GENSIZE):
        individual = [np.random.choice(list(string.printable[:-5])) for j in range(0, len(target))]
        p.append(individual)
    return p

def fitness(source, target):
    fitval = 0
    for i in range(0, len(source)-1):
        fitval += (ord(target[i]) - ord(source[i])) ** 2
    return (fitval)

def grade(pop, target):
    'Find average fitness for a population.'
    summed = reduce(add, (fitness(x, target) for x in pop))
    return summed / (len(pop) * 1.0)

def evolve(pop, target, retain=0.2, random_select=0.05, mutate=0.01):
    graded = [ (fitness(x, target), x) for x in p]
    graded = [ x[1] for x in sorted(graded)]
    retain_length = int(len(graded)*retain)
    parents = graded[:retain_length]
    # randomly add other individuals to
    # promote genetic diversity
    for individual in graded[retain_length:]:
        if random_select > random():
            parents.append(individual)
    # mutate some individuals
    for individual in parents:
        if mutate > random():
            pos_to_mutate = randint(0, len(individual)-1)
            individual[pos_to_mutate] = chr(ord(individual[pos_to_mutate]) + np.random.randint(-1,1))
    #
    parents_length = len(parents)
    desired_length = len(pop) - parents_length
    children = []
    while len(children) < desired_length:
        male = randint(0, parents_length-1)
        female = randint(0, parents_length-1)
        if male != female:
            male = parents[male]
            female = parents[female]
            half = len(male) / 2
            child = male[:half] + female[half:]
            children.append(child)
    parents.extend(children)
    return parents

GENSIZE = 40
target = "Hello, World"
p = population(GENSIZE, target)
fitness_history = [grade(p, target),]

for i in xrange(20):
    p = evolve(p, target)
    fitness_history.append(grade(p, target))
    # print p

for datum in fitness_history:
    print datum
But it seems that the result can't fit the target well.
I tried to change GENSIZE and the number of generations (a longer loop).
But the result always gets stuck. Sometimes increasing the number of generations helps find an optimal solution, but when I change the loop to a much larger number, like for i in xrange(10000), I get an error like:
individual[pos_to_mutate] = chr(ord(individual[pos_to_mutate]) + np.random.randint(-1,1))
ValueError: chr() arg not in range(256)
Anyway, how can I modify my code to get a good result?
Any advice would be appreciated.
The chr function in Python2 only accepts values in the range 0 <= i < 256.
You are passing:
ord(individual[pos_to_mutate]) + np.random.randint(-1,1)
So you need to check that the result of
ord(individual[pos_to_mutate]) + np.random.randint(-1,1)
is not going to be outside that range, and take corrective action before passing to chr if it is outside that range.
EDIT
A reasonable fix for the ValueError might be to take the amended value modulo 256 before passing to chr:
chr((ord(individual[pos_to_mutate]) + np.random.randint(-1, 1)) % 256)
There is another bug: the fitness calculation doesn't take the final element of the candidate list into account: it should be:
def fitness(source, target):
    fitval = 0
    for i in range(0, len(source)):  # <- len(source), not len(source) - 1
        fitval += (ord(target[i]) - ord(source[i])) ** 2
    return fitval
Given that source and target must be of equal length, the function can be written as:
def fitness(source, target):
    return sum((ord(t) - ord(s)) ** 2 for (t, s) in zip(target, source))
The real question was: why doesn't the provided code evolve random strings until the target string is reached?
The answer, I believe, is that it may, but it will take a lot of iterations to do so.
Consider, in the blog post referenced in the question, each iteration generates a child which replaces the least fit member of the gene pool if the child is fitter. The selection of the child's parent is biased towards fitter parents, increasing the likelihood that the child will enter the gene pool and increase the overall "fitness" of the pool. Consequently the members of the gene pool converge on the desired result within a few thousand iterations.
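A sketch of that mechanism (my own reading of the description above, not the blog post's actual code; lower fitness means better here):
import random

def blog_style_step(pop, target, fitness):
    scored = sorted(pop, key=lambda ind: fitness(ind, target))
    # bias parent selection towards the fitter half of the pool
    mother, father = random.sample(scored[:len(scored) // 2], 2)
    cut = random.randrange(1, len(mother))
    child = mother[:cut] + father[cut:]
    # the child replaces the least fit member only if it is fitter
    worst = scored[-1]
    if fitness(child, target) < fitness(worst, target):
        pop[pop.index(worst)] = child
    return pop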
In the code in the question, the probability of mutation is much lower under the initial conditions, that is, the defaults of the evolve function.
Parents that are retained have only a 1% chance of mutating, and half of the time the "mutation" will not result in any change (np.random.randint(-1, 1) returns only -1 or 0, so zero is a common result).
Discarded parents are replaced by individuals created by merging two retained individuals. Since only 20% of parents are retained, the population can converge on a local minimum where each new child is effectively a copy of an existing parent, and so no diversity is introduced.
So apart from fixing the two bugs, the way to converge more quickly on the target is to experiment with the initial conditions and to consider changing the code in the question to inject more diversity, for example by mutating children as in the original blog post, or by extending the range of possible mutations.
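For example, one possible way to inject more diversity (a sketch under the assumptions of the question's code, not a tested fix) is to mutate a position to any printable character instead of shifting it by at most one, and to apply the mutation to children as well as retained parents:
import string
import numpy as np
from random import random, randint

def mutate_individual(individual, mutate=0.05):
    # occasionally overwrite one position with a completely new printable character
    if mutate > random():
        pos = randint(0, len(individual) - 1)
        individual[pos] = np.random.choice(list(string.printable[:-5]))
    return individual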