Optimal group selection from a dictionary in Python

So I have a dictionary with key:value(tuple) entries, something like {"name": (4, 5), ...}, where
(4, 5) represents two categories (cat1, cat2). Given a maximum total for the second category, I would like to find the combination of dictionary entries that maximizes or minimizes the first category.
For example, if maxCat2 = 15, I want to find some combination of entries from the dictionary such that, when I add the cat2 values of the chosen entries together, I stay under 15. There may be many such combinations. Of these possibilities, I would like to pick the one whose cat1 values add up to more than those of any other possibility.
I thought about writing an algorithm to generate all subsets of the entries in the dictionary, check whether each one meets the maxCat2 criterion, and then see which of those gives me the largest total cat1 value. With 20 entries that means checking 2^20 subsets, which is already over a million and grows exponentially. Is there anything I can do to avoid this? Thanks.

As Jochen Ritzel pointed out, this can be seen as an instance of the knapsack problem.
Typically, you have a set of objects that each have a "weight" (the "second category" in your example) and a "value" (or a "cost", if it is a minimization problem).
The problem consists of picking a subset of the objects such that the sum of their "values" is maximized/minimized, subject to the constraint that the sum of their weights cannot exceed a specified maximum.
Though the problem is NP-hard in general, when the weight limit is bounded there is a dynamic-programming (or memoization) solution whose running time is polynomial in the number of objects and in that limit (a pseudo-polynomial algorithm).
Very broadly, the idea is to define a table of values C[i][j], where C[i][j] denotes the maximum sum of "values" attainable when considering only the first i objects, with the total weight of the chosen subset not exceeding j.
There are two possible choices when computing C[i][j]:
either object i is included in the subset, which is only possible if weight[i] <= j, and then
C[i][j] = value[i] + C[i-1][j - weight[i]]
or object i is not included in the chosen subset, so
C[i][j] = C[i-1][j]
C[i][j] is the maximum of the two.
If n is the number of objects and w is the maximum total weight, the answer ends up in C[n][w].
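For example, here is a minimal sketch of this table applied to the original question's dictionary (name -> (cat1, cat2)), assuming the cat2 values are integers; best_subset is just an illustrative name:

def best_subset(d, max_cat2):
    names = list(d)
    n = len(names)
    # C[i][j] = best cat1 total using only the first i entries with cat2 total <= j
    C = [[0] * (max_cat2 + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cat1, cat2 = d[names[i - 1]]
        for j in range(max_cat2 + 1):
            C[i][j] = C[i - 1][j]                        # entry i not taken
            if cat2 <= j:
                C[i][j] = max(C[i][j], cat1 + C[i - 1][j - cat2])
    # Walk the table backwards to recover which entries were chosen.
    chosen, j = [], max_cat2
    for i in range(n, 0, -1):
        if C[i][j] != C[i - 1][j]:                       # entry i must have been taken
            chosen.append(names[i - 1])
            j -= d[names[i - 1]][1]
    return C[n][max_cat2], chosen

# best_subset({"a": (4, 5), "b": (3, 7), "c": (5, 9)}, 15) -> (9, ['c', 'a'])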

Related

Selecting items with highest value in a list with the right order

I am solving a knapsack problem with a branch and bound algorithm I am working on right now. In the algorithm, I want to start by selecting the items with the highest density (value/weight). I created a list named "density" and made the necessary calculations. I need to pick the maximum value from that list each time, but every time I try, the order gets mixed up. I need to update the variable "a", because every time I delete an item the list gets one element smaller, but I couldn't figure out how to update it. I need help selecting the items in the right order.
weight, value, density are lists. capacity and room are integer values given in the problem.
This is the density list.
What I want is to get the index of the maximum item in this list, then subtract its "weight" from the "capacity" to find how much "room" is left, and add its "value" to "highest" to build up the highest value that can fit in the knapsack. After doing this for the first item, I iterate until there is no (or very little) room left.
def branch_n_bound(value, weight, capacity):
    global highest, size
    size = 0
    room = capacity
    density = [0] * len(items)
    highest = 0
    for i in range(n):
        density[i] = val[i] / weight[i]
    for i in range(n):
        a = density.index(max(density))
        if weight[a] <= room:
            room -= weight[a]
            highest += value[a]
            size += weight[a]
            taken[a] = 1
            del density[a], weight[a], value[a]
        else:
            break
I think the problem you are trying to solve can be handled more easily with a change of data structure. Instead of building the density array, you can build an array of tuples [(density, weight, value), ...] and base your solution on that array. If you don't want to use so much extra memory, and assuming you are OK with changing the input data, you can mark indices as deleted - for example, set the value, weight and density to something negative to record that the data at that index was deleted.
You can also take a look at the heapq module: https://docs.python.org/3/library/heapq.html . You can work with a heap to extract the maximum each time and store indices in it (heapq is a min-heap, so push negated densities).
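For instance, a minimal sketch of the sorted-tuples idea, assuming value and weight lists and an integer capacity as in your code (greedy_fill is just an illustrative name):

def greedy_fill(value, weight, capacity):
    # Sort once by density (value/weight), highest first; the original index
    # is carried along, so nothing ever has to be deleted or re-indexed.
    items = sorted(
        ((v / w, w, v, i) for i, (v, w) in enumerate(zip(value, weight))),
        reverse=True,
    )
    room, highest, taken = capacity, 0, [0] * len(value)
    for density, w, v, i in items:
        if w <= room:
            room -= w
            highest += v
            taken[i] = 1
    return highest, taken

# greedy_fill([8, 4, 9], [2, 4, 7], 10) -> (17, [1, 0, 1])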

How should I group these elements such that overall variance is minimized?

I have a set of elements, for example
x = [250, 255, 273, 180, 400, 309, 257, 368, 349, 248, 401, 178, 149, 189, 46, 277, 293, 149, 298, 223]
I want to partition these into n groups A, B, C, ... such that the sum of all the group variances is minimized. The groups need not have the same number of elements.
I would like an optimization approach in Python or R.
I would sort the numbers into increasing order and then use dynamic programming to work out where to place the boundaries between groups of contiguous elements. If the only constraint is that every number must be in exactly one group, work from left to right. At each stage, for i = 1..n, work out the set of boundaries that gives the minimum total variance when the elements seen so far are split into i groups. For i = 1 there is no choice. For i > 1, consider every possible location for the boundary of the last group, look up the previously computed answer for the best allocation of the items before that boundary into i-1 groups, and use that figure as the contribution of the first i-1 groups, adding the variance of the last group to it. A sketch follows below.
(I haven't done the algebra, but I believe that if you have groups A and B where mean(A) < mean(B) but there are elements a in A and b in B such that a > b, you can reduce the total variance by swapping them between the groups. So the minimum variance must come from groups that are contiguous when the elements are written out in sorted order.)
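A minimal sketch of that dynamic program, assuming groups are contiguous runs of the sorted data and using population variance (min_total_variance is just an illustrative name; recomputing pvariance keeps the sketch short, prefix sums would make it faster):

from statistics import pvariance

def min_total_variance(x, n_groups):
    xs = sorted(x)
    m = len(xs)
    INF = float("inf")
    # dp[k][i] = minimal sum of variances splitting the first i values into k groups
    dp = [[INF] * (m + 1) for _ in range(n_groups + 1)]
    cut = [[0] * (m + 1) for _ in range(n_groups + 1)]
    dp[0][0] = 0.0
    for k in range(1, n_groups + 1):
        for i in range(k, m + 1):
            for j in range(k - 1, i):                    # last group is xs[j:i]
                cand = dp[k - 1][j] + pvariance(xs[j:i])
                if cand < dp[k][i]:
                    dp[k][i], cut[k][i] = cand, j
    # Recover the groups from the stored boundary positions.
    groups, i = [], m
    for k in range(n_groups, 0, -1):
        j = cut[k][i]
        groups.append(xs[j:i])
        i = j
    return dp[n_groups][m], groups[::-1]

# Example: min_total_variance(x, 3) splits the sorted list into 3 contiguous groups.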

Distribute points to objects by a given function

I have an idea, but I'm stuck on the implementation of something basic.
I'm trying to divide a limited number of points among a certain number of objects. What does that mean?
If we assume I have 100 points divided among 5 objects, let's list the objects like this:
[1, 2, 3, 4, 5]
The first place in the list should get the highest number of points, then the second place, and so on.
I want a function that hands out the points in descending order according to a given weighting function (e.g. linear, exponential, constant, etc.).
I hope I explained it well; I did my best :)
Does anyone know a package in Python, or a nice way to implement such a thing?
Let's say you have a list of the objects to which you want to give points. Then you can do:
from math import exp                # needed for the exponential mode

totpoints = 100
score = []                          # holds the score based on position
for i in range(len(lst)):           # lst is the list of objects
    if mode == "linear":
        score.append(i)
    elif mode == "quadratic":
        score.append(i * i)
    elif mode == "exponential":
        score.append(exp(i))
    else:                           # constant
        score.append(1)
totscore = sum(score)               # the sum of all the scores
for i in range(len(lst)):
    lst[i].send(round(score[i] * totpoints / totscore))
    # assuming you send the values by some method of the objects
This actually gives the most points to the objects with a higher index, so you should reverse the score list first to get highest-first.
Obviously, the best way to use this is inside a function that you can then call with different modes, totpoints and lists.
GOTCHA: this may hand out slightly more or fewer points than you wanted, depending on how the values round. If you need to be precise, add a check on the total number of points you send.
I almost forgot: if you need the points as a list, you can do
points=[round(s*totpoints/totscore) for s in score]
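Wrapped in a function as suggested above, a sketch might look like this (distribute is just an illustrative name; scores start at 1 so the last-ranked object still gets something, and the allocation is returned highest-ranked position first):

from math import exp

def distribute(totpoints, n, mode="linear"):
    if mode == "linear":
        score = [i + 1 for i in range(n)]
    elif mode == "quadratic":
        score = [(i + 1) ** 2 for i in range(n)]
    elif mode == "exponential":
        score = [exp(i) for i in range(n)]
    else:                                   # constant
        score = [1] * n
    totscore = sum(score)
    points = [round(s * totpoints / totscore) for s in score]
    return points[::-1]                     # highest-ranked position first

# distribute(100, 5, "linear") -> [33, 27, 20, 13, 7]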

Finding the closest possible values from two dictionaries

Let's suppose you have two existing dictionaries A and B.
If you have already chosen an initial pair of items from A and B with values A1 = 1.0 and B1 = 2.0, respectively, is there any way to find two other existing items in A and B whose values (call them A2 and B2) differ from A1 and B1 and minimize (A2-A1)**2 + (B2-B1)**2?
The number of items in the dictionaries is not fixed and could exceed 100,000.
Edit - this is important: the keys of A and B are the same, but the values corresponding to those keys differ between A and B. A particular choice of key yields an ordered pair (A1, B1) that is different from any other possible ordered pair (A2, B2); different keys have different ordered pairs. For example, both A and B might have the key (3,4), which yields the value 1.0 in A and 2.0 in B. This one key is then compared to every other key to find the other ordered pair (i.e. both the key and the values of the items in A and B) that minimizes the squared differences.
You'll need a specialized data structure, not a standard Python dictionary. Look up quad-trees and k-d trees. You are effectively minimizing the Euclidean distance between two points: your objective function is just a square root away from Euclidean distance, with dictionary A storing the x-coordinates and B the y-coordinates. Computational-geometry people have been studying this for years.
Then again, maybe I am misreading your question and making it harder than it is. Are you saying that you can pick any value from A and any value from B, regardless of whether their keys are the same? For instance, the pick from A could be the key:value pair (3,4): 2.0, and the pick from B could be (5,6): 3.0? Or does it have to be (3,4): 2.0 from A and (3,4): 6.0 from B? If the former, the problem is easy: just run through the values of A and find the one closest to A1, then run through the values of B and find the one closest to B1. If the latter, my first paragraph was the right answer.
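For the former case, a minimal sketch, assuming dictionaries A and B and the initial values A1 and B1 from the question:

# Closest values in A and B, excluding exact matches with the initial picks.
A2 = min((v for v in A.values() if v != A1), key=lambda v: (v - A1) ** 2)
B2 = min((v for v in B.values() if v != B1), key=lambda v: (v - B1) ** 2)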
Your comment says that the harder problem is the one you want to solve, so here is a little more. Sedgewick's slides explain how the static grid, the 2d-tree, and the quad-tree work. http://algs4.cs.princeton.edu/lectures/99GeometricSearch.pdf . Slides 15 through 29 explain mainly the 2d-tree, with 27 through 29 covering the solution to the nearest-neighbor problem. Since you have the constraint that the point the algorithm finds must share neither x- nor y-coordinate with the query point, you might have to implement the algorithm yourself or modify an existing implementation. One alternative strategy is to use a kNN data structure (k nearest neighbors, as opposed to the single nearest neighbor), experiment with k, and hope that your chosen k will always be large enough to find at least one neighbor that meets your constraint.
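For the harder case, here is a rough sketch using scipy.spatial.cKDTree, assuming SciPy is available and that A and B share the same keys as described in the edit; asking for several neighbours and filtering them is the "experiment with k" strategy:

import numpy as np
from scipy.spatial import cKDTree

keys = list(A.keys())                          # assumed shared key set
points = np.array([(A[k], B[k]) for k in keys])
tree = cKDTree(points)

query = np.array([A1, B1])                     # the initially chosen pair
dists, idxs = tree.query(query, k=min(10, len(keys)))
best_key = None
for i in np.atleast_1d(idxs):
    a2, b2 = points[i]
    if a2 != A1 and b2 != B1:                  # enforce "different values"
        best_key = keys[i]
        break
# best_key (if not None) identifies the pair (A2, B2) closest to (A1, B1).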

Keeping Track of Dynamic Programming Steps

I'm teaching myself basic programming principles, and I'm stuck on a dynamic programming problem. Let's take the infamous Knapsack Problem:
Given a set of items, each with a weight and a value, determine the count of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
Let's set the weight limit to 10, and let's give two lists: weights = [2,4,7] and values = [8,4,9] (I just made these up). I can write the code to give the maximum value given the constraint--that's not a problem. But what about if I want to know what values I actually ended up using? Not the total value--the individual values. So for this example, the maximum would be the objects with weights 2 and 7, for a total value of 8 + 9 = 17. I don't want my answer to read "17" though--I want an output of a list like: (8, 9). It might be easy for this problem, but the problem I'm working on uses lists that are much bigger and that have repeat numbers (for example, multiple objects have a value of 8).
Let me know if anyone can think of anything. As always, much love and appreciation to the community.
Consider each partial solution a node. Simply record in each of these nodes whatever items you used; whichever node becomes the answer at the end will then contain the set of items you used.
In other words, each time you find a new optimal solution, you set its list of items to the list of items of the solution it was built from, plus the item just added.
A basic array implementation can help you keep track of which item enabled a new DP state to get its value. For example, if your DP array is w[], you can keep another array p[]. Every time a state w[i] is generated, set p[i] to the item you used to reach w[i]. Then, to output the list of items used for w[n], output p[n] and move to index n - weightOf(p[n]), repeating until you reach 0, so that you output all the items.
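Here is a sketch of that bookkeeping for the 0/1 reading of the example above (weights [2,4,7], values [8,4,9], limit 10); since an item may be used at most once, a 2-D keep table plays the role of p[] (knapsack_with_items is just an illustrative name):

def knapsack_with_items(weights, values, limit):
    n = len(weights)
    dp = [[0] * (limit + 1) for _ in range(n + 1)]
    keep = [[False] * (limit + 1) for _ in range(n + 1)]   # the "p[]" bookkeeping
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for j in range(limit + 1):
            dp[i][j] = dp[i - 1][j]
            if w <= j and v + dp[i - 1][j - w] > dp[i][j]:
                dp[i][j] = v + dp[i - 1][j - w]
                keep[i][j] = True                          # item i was used here
    # Walk back through keep[][] to list the individual values used.
    used, j = [], limit
    for i in range(n, 0, -1):
        if keep[i][j]:
            used.append(values[i - 1])
            j -= weights[i - 1]
    return dp[n][limit], used

# knapsack_with_items([2, 4, 7], [8, 4, 9], 10) -> (17, [9, 8])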
