I have an idea, but I'm stuck on the implementation of something basic.
I'm trying to divide a limited number of points among a certain number of objects. What does that mean?
Say I have 100 points to divide among 5 objects, listed like this:
[1,2,3,4,5]
The first place in the list gets the highest number of points, then the second place, and so on.
I want a function that divides the points in descending order according to a given curve (e.g. linear, exponential, constant, etc.).
I hope I explained well .. I did my best :)
Does anyone know a package in Python or a nice way to implement such a thing?
Let's say you have a list of the objects to which you want to give points. Then you can do:
import math   # needed for exp()

totpoints = 100
score = []      # this list holds the score based on position
totscore = 0    # this will be the sum of all the scores
for i in range(len(lst)):   # lst is the list of objects
    if mode == "linear":
        score.append(i)
    elif mode == "quadratic":
        score.append(i * i)
    elif mode == "exponential":
        score.append(math.exp(i))
    else:   # constant
        score.append(1)
for s in score:
    totscore += s
for i in range(len(lst)):
    # assuming you send the values by some method of the objects
    lst[i].send(round(score[i] * totpoints / totscore))
This actually gives the most points to the ones with a higher index, so you would first reverse the score list to get higher-first.
Obviously the best way to use this is inside a function that you'll then be able to call with different modes, totpoints and lsts.
GOTCHA: this may give out a larger or smaller number of points than you actually wanted, depending on the rounding of the values. If you need to be precise add a check for the total number of points you send.
I almost forgot: if you need the points as a list, you can do
points=[round(s*totpoints/totscore) for s in score]
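To tie it together, here's a minimal sketch of how the whole thing could be wrapped into a reusable function (the name distribute_points and the sum-adjustment at the end are just one way of handling the rounding gotcha above):

import math

def distribute_points(n, totpoints=100, mode="constant"):
    # Build the raw scores for n positions.
    if mode == "linear":
        score = [i + 1 for i in range(n)]
    elif mode == "quadratic":
        score = [(i + 1) ** 2 for i in range(n)]
    elif mode == "exponential":
        score = [math.exp(i) for i in range(n)]
    else:   # constant
        score = [1] * n
    score.reverse()   # highest score goes to the first position
    totscore = sum(score)
    points = [round(s * totpoints / totscore) for s in score]
    # Rounding can make the sum drift from totpoints; absorb the difference
    # into the first (largest) entry so the total is exact.
    points[0] += totpoints - sum(points)
    return points

# Example: distribute_points(5, 100, "linear") returns [33, 27, 20, 13, 7]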
I am solving a knapsack problem using a branch and bound algorithm that I am working on right now. In the algorithm, I want to start by selecting the items with the highest density (value/weight). I created a list named "density" and made the necessary calculations. I need to pick the maximum value from that list each time, but every time I try, the order gets mixed up. I need to update the variable "a" because every time I delete an item the list gets one element smaller, but I couldn't figure out how to update it. I need help selecting the items in the right order.
weight, value, density are lists. capacity and room are integer values given in the problem.
This is the density list.
What I want is to get the index of the maximum item in this list, then subtract its "weight" from the "capacity" to find how much "room" is left, and add its "value" to "highest" to build up the highest value that can fit in the knapsack. After I do this for the first item, I iterate until there is no (or very little) room left.
def branch_n_bound(value, weight, capacity):
    global highest, size
    size = 0
    room = capacity
    density = [0] * len(value)
    highest = 0
    for i in range(len(value)):
        density[i] = value[i] / weight[i]
    for i in range(len(value)):
        a = density.index(max(density))
        if weight[a] <= room:
            room -= weight[a]
            highest += value[a]
            size += weight[a]
            taken[a] = 1   # taken is assumed to be defined elsewhere
            del density[a], weight[a], value[a]
        else:
            break
I think the problem you're trying to solve can be solved more easily with a change of data structure. Instead of building the density array, you can build an array of tuples [(density, weight, value)...] and base your solution on that array, as sketched below. If you don't want to use so much extra memory, and assuming you are OK with changing the input data, you can mark indices as deleted - for example, set the value, weight and density to something negative to record that the data at that index was deleted.
You can also take a look at the heapq module: https://docs.python.org/3/library/heapq.html . You can work with a heap (storing indices in it) to extract the maximum.
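For instance, a minimal sketch of the tuple idea (this only covers the greedy selection step from the question, and the function name is just illustrative):

def pick_by_density(values, weights, capacity):
    # Each tuple carries everything the item needs, so one sort replaces
    # the repeated index(max(...)) lookups and nothing has to be deleted.
    items = sorted(
        ((v / w, w, v, i) for i, (v, w) in enumerate(zip(values, weights))),
        reverse=True,
    )
    taken = [0] * len(values)
    room, highest = capacity, 0
    for density, w, v, i in items:
        if w <= room:
            room -= w
            highest += v
            taken[i] = 1
        else:
            break
    return highest, taken

# Hypothetical data:
# pick_by_density([8, 4, 9], [2, 4, 7], 10) returns (17, [1, 0, 1])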
How do I write recursive code that tells me the maximum number of coins I can collect from a grid, where each cell may or may not contain a coin, moving only down and right? I also have to use memoization.
ex: [[0,0,1],
[0,1,1],
[1,0,0]]
max_coins moving only down and right = 2
First, you'll need the recurrence relation behind this problem: if the maximum number of coins at cell [i][j] is denoted by C[i][j], then
C[i][j] = max(C[i - 1][j], C[i][j - 1]) + No. of coins on cell[i][j]
If you code using this recurrence, there will be many overlaps of the same calls with the same parameters for different cells, and its complexity would be exponential. To avoid this, you can store the results of the intermediate calls in an array and use them when they're needed again. This way, you'll need to calculate the value for a cell only once, and the code would be much faster.
So, first create a 2D array that would contain the maximum number of coins you can have at any cell, then populate it with the appropriate values using the recurrence relation. Go from top row to bottom, left to right.
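Since the question asks for recursion with memoization, here is a hedged top-down sketch of the same recurrence (the helper names are illustrative):

from functools import lru_cache

def max_coins(grid):
    rows, cols = len(grid), len(grid[0])

    @lru_cache(maxsize=None)
    def best(i, j):
        # Maximum coins collectible on a down/right path from (0, 0) to (i, j).
        if i < 0 or j < 0:
            return float("-inf")   # off the grid
        if i == 0 and j == 0:
            return grid[0][0]
        return max(best(i - 1, j), best(i, j - 1)) + grid[i][j]

    return best(rows - 1, cols - 1)

print(max_coins([[0, 0, 1],
                 [0, 1, 1],
                 [1, 0, 0]]))   # 2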
I'm teaching myself basic programming principles, and I'm stuck on a dynamic programming problem. Let's take the infamous Knapsack Problem:
Given a set of items, each with a weight and a value, determine the count of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
Let's set the weight limit to 10, and let's give two lists: weights = [2,4,7] and values = [8,4,9] (I just made these up). I can write the code to give the maximum value given the constraint--that's not a problem. But what about if I want to know what values I actually ended up using? Not the total value--the individual values. So for this example, the maximum would be the objects with weights 2 and 7, for a total value of 8 + 9 = 17. I don't want my answer to read "17" though--I want an output of a list like: (8, 9). It might be easy for this problem, but the problem I'm working on uses lists that are much bigger and that have repeat numbers (for example, multiple objects have a value of 8).
Let me know if anyone can think of anything. As always, much love and appreciation to the community.
Consider each partial solution a Node. Simply add whatever you use into each of these nodes and whichever node becomes the answer at the end will contain the set of items you used.
So each time you find a new solution you just set the list of items to the list of items of the new optimal solution and repeat for each.
A basic array implementation can help you keep track of which item enabled a new DP state to get its value. For example, if your DP array is w[], then you can have another array p[]. Every time a state is generated for w[i], you set p[i] to the item you used to get to w[i]. Then, to output the list of items used for w[n], output p[n], and then move to the index n-weightOf(p[n]), repeating until you reach 0, to output all the items.
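As a hedged illustration of the idea (this one rebuilds the chosen items by walking a full 2D table backwards, a slight variation on the p[] array described above):

def knapsack_items(weights, values, limit):
    n = len(weights)
    # dp[i][w] = best value using the first i items with capacity w.
    dp = [[0] * (limit + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(limit + 1):
            dp[i][w] = dp[i - 1][w]
            if weights[i - 1] <= w:
                dp[i][w] = max(dp[i][w],
                               dp[i - 1][w - weights[i - 1]] + values[i - 1])
    # Walk backwards: if the value changed when item i became available,
    # item i must have been part of the optimal set.
    chosen, w = [], limit
    for i in range(n, 0, -1):
        if dp[i][w] != dp[i - 1][w]:
            chosen.append(values[i - 1])
            w -= weights[i - 1]
    return dp[n][limit], list(reversed(chosen))

print(knapsack_items([2, 4, 7], [8, 4, 9], 10))   # (17, [8, 9])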
So I have a dictionary with key:value(tuple). Something like this. {"name":(4,5),....} where
(4,5) represents two categories (cat1, cat2). Given a maximum number for the second category, I would like to find the optimal combination of dictionary entries such that the 1st category is maximized or minimized.
For example, if maxCat2 = 15, I want to find some combination of entries from the dictionary such that, when I add the cat2 values from each entry together, I'm under 15. There may be many such combinations. Of these possibilities, I would like to pick the one whose cat1 values, added up over the entries, are larger than those of any other possibility.
I thought about writing an algorithm to get all permutations of the entries in the dictionary and then see if each one meets the maxCat2 criteria and then see which one of those gives me the largest total cat1 value. If I have 20 entries, that means I would check 20! combinations, which is a very large number. Is there anything that I can do to avoid this? Thanks.
As Jochen Ritzel pointed out, this can be seen as an instance of the knapsack problem.
Typically, you have a set of objects that have both "weight" (the "second category", in your example) and "value" (or "cost", if it is a minimization problem).
The problem consists in picking a subset of the objects such that the sum of their "values" is maximized/minimized, subject to the constraint the sum of the weights cannot exceed a specified maximum.
Though the problem is intractable in general, if the constraint on the maximum value for the sum of weights is fixed, there exists a polynomial time solution using dynamic programming or memoization.
Very broadly, the idea is to define a set of values C[i][j], where
C[i][j] denotes the maximum sum (of "values") attainable considering only the first i objects, where the total weight (of the chosen subset) cannot exceed j.
There are two possible choices for calculating C[i][j]:
either element i is included in the subset, and then
C[i][j] = value[i] + C[i-1][j - weight[i]]
or element i is not in the subset of chosen objects, so
C[i][j] = C[i-1][j]
The maximum of the two needs to be picked.
If n is the number of elements and w is the maximum total weight, then the answer ends up in C[n][w].
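A hedged sketch tying this recurrence back to the dictionary-of-tuples input from the question (the function name and the sample data are assumptions):

from functools import lru_cache

def best_combination(entries, max_cat2):
    # entries maps name -> (cat1, cat2); maximize the cat1 sum while the
    # cat2 sum stays within max_cat2.
    names = list(entries)

    @lru_cache(maxsize=None)
    def c(i, j):
        # C[i][j]: best cat1 total using the first i entries with budget j,
        # returned together with the chosen names.
        if i == 0:
            return 0, ()
        cat1, cat2 = entries[names[i - 1]]
        skip_val, skip_set = c(i - 1, j)
        if cat2 > j:
            return skip_val, skip_set
        take_val, take_set = c(i - 1, j - cat2)
        if take_val + cat1 > skip_val:
            return take_val + cat1, take_set + (names[i - 1],)
        return skip_val, skip_set

    return c(len(names), max_cat2)

print(best_combination({"a": (4, 5), "b": (10, 9), "c": (3, 7)}, 15))
# (14, ('a', 'b'))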
Company 1 has this vector:
['books','video','photography','food','toothpaste','burgers'] ... ...
Company 2 has this vector:
['video','processor','photography','LCD','power supply', 'books'] ... ...
Suppose this is a frequency distribution (I could make it a tuple but too much to type).
As you can see...these vectors have things that overlap. "video" and "photography" seem to be "similar" between two vectors due to the fact that they are in similar positions. And..."books" is obviously a strong point for company 1.
Ordering and positioning does matter, as this is a frequency distribution.
What algorithms could you use to play around with this? What algorithms could you use that could provide valuable data for these companies, using these vectors?
I am new to text-mining and information-retrieval. Could someone guide me about those topics in relation to this question?
If position is very important, as you emphasize, then the crucial metric will be based on the difference of positions between the same items in the different vectors (you can, for example, sum the absolute values of the differences, or their squares). The big issue that needs to be solved is -- how much to weigh an item that's present (say it's the N-th one) in one vector, and completely absent in the other. Is that a relatively minor issue -- as if the missing item was actually present right after the actual ones, for example -- or a really, really big deal? That's impossible to say without more understanding of the actual application area. You can try various ways to deal with this issue and see what results they give on example cases you care about!
For example, suppose "a missing item is roughly the same as if it were present, right after the actual ones". Then, you can preprocess each input vector into a dict mapping item to position (crucial optimization if you have to compare many pairs of input vectors!):
def makedict(avector):
    return dict((item, i) for i, item in enumerate(avector))
and then, to compare two such dicts:
def comparedicts(d1, d2):
    allitems = set(d1) | set(d2)
    distances = [d1.get(x, len(d1)) - d2.get(x, len(d2)) for x in allitems]
    return sum(d * d for d in distances)
(or, abs(d) instead of the squaring in the last statement). To make missing items weigh more (make dicts, i.e. vectors, be considered further away), you could use twice the lengths instead of just the lengths, or some large constant such as 100, in an otherwise similarly structured program.
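For example, using the two company vectors from the question with the functions above (78 is what the squared-distance version happens to give here):

company1 = ['books', 'video', 'photography', 'food', 'toothpaste', 'burgers']
company2 = ['video', 'processor', 'photography', 'LCD', 'power supply', 'books']
print(comparedicts(makedict(company1), makedict(company2)))   # 78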
I would suggest a book called Programming Collective Intelligence. It's a very nice book on how you can retrieve information from simple data like this. There are code examples included (in Python :)
Edit:
Just replying to gbjbaanb: This is Python!
>>> a = ['books', 'video', 'photography', 'food', 'toothpaste', 'burgers']
>>> b = ['video', 'processor', 'photography', 'LCD', 'power supply', 'books']
>>> a = set(a)
>>> b = set(b)
>>> a.intersection(b)
set(['photography', 'books', 'video'])
>>> b.intersection(a)
set(['photography', 'books', 'video'])
>>> b.difference(a)
set(['LCD', 'power supply', 'processor'])
>>> a.difference(b)
set(['food', 'toothpaste', 'burgers'])
Take a look at Hamming Distance
As mbg mentioned, the Hamming distance is a good start. It basically assigns a bit to every possible item, indicating whether the company carries it.
E.g. toothpaste is 1 for company A, but 0 for company B. You then count the bits which differ between the companies. The Jaccard coefficient is closely related to this.
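A minimal sketch of both measures over the two lists (nothing library-specific here, just set membership):

def hamming(a, b):
    # Count items carried by exactly one of the two companies.
    a, b = set(a), set(b)
    return len(a ^ b)

def jaccard(a, b):
    # Shared items divided by all items seen at either company.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

company1 = ['books', 'video', 'photography', 'food', 'toothpaste', 'burgers']
company2 = ['video', 'processor', 'photography', 'LCD', 'power supply', 'books']
print(hamming(company1, company2))   # 6 items differ
print(jaccard(company1, company2))   # 3 shared / 9 total = 0.333...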
Hamming distance will actually not be able to capture similarity between things like "video" and "photography". Obviously, a company that sells one is also more likely to sell the other than a company that sells toothpaste is.
For this, you can use techniques like LSI (it is also used for dimensionality reduction) or factorial codes (e.g. neural network approaches such as Restricted Boltzmann Machines, Autoencoders or Predictability Minimization) to get more compact representations, which you can then compare using the Euclidean distance.
Pick the rank of each entry (a higher rank is better) and take the sum of geometric means over the matching items.
For two vectors:
sum(sqrt(x * y))   # x, y are the ranks of a matching item in each vector
The sum of ranks over a vector should be the same for each vector (preferably 1).
That way you can compare more than 2 vectors.
If you apply ikkebr's method you can find how similar a is to b;
in that case just use
sum(b[x] for x in b.intersection(a))   # where b maps items to their ranks
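A minimal sketch of the rank idea above (the normalization so each vector's ranks sum to 1 is an assumption):

import math

def rank_scores(vector):
    # Earlier positions get higher ranks; normalize so the ranks sum to 1.
    n = len(vector)
    total = n * (n + 1) / 2
    return {item: (n - i) / total for i, item in enumerate(vector)}

def similarity(v1, v2):
    r1, r2 = rank_scores(v1), rank_scores(v2)
    # Sum of geometric means over the items both vectors share.
    return sum(math.sqrt(r1[x] * r2[x]) for x in set(r1) & set(r2))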
You could use the set_intersection algorithm. The two vectors must be sorted first (use a sort call); then pass in four iterators and you'll get back a collection with the common elements inserted into it. There are a few other algorithms that operate similarly.