Knapsack 0/1 classic problem: getting the least valuable items - Python

The classic knapsack problem asks for the most valuable set of items that fits inside a knapsack with a limited weight capacity.
I am trying to get the least valuable items instead.
The following code works well. It uses recursive dynamic programming and comes from Rosetta Code: http://rosettacode.org/wiki/Knapsack_problem/0-1#Recursive_dynamic_programming_algorithm
def total_value(items, max_weight):
    return sum([x[2] for x in items]) if sum([x[1] for x in items]) <= max_weight else 0
cache = {}
def solve(items, max_weight):
    if not items:
        return ()
    if (items,max_weight) not in cache:
        head = items[0]
        tail = items[1:]
        include = (head,) + solve(tail, max_weight - head[1])
        dont_include = solve(tail, max_weight)
        if total_value(include, max_weight) > total_value(dont_include, max_weight):
            answer = include
        else:
            answer = dont_include
        cache[(items,max_weight)] = answer
    return cache[(items,max_weight)]
items = (
    ("map", 9, 150), ("compass", 13, 35), ("water", 153, 200), ("sandwich", 50, 160),
    ("glucose", 15, 60), ("tin", 68, 45), ("banana", 27, 60), ("apple", 39, 40),
    ("cheese", 23, 30), ("beer", 52, 10), ("suntan cream", 11, 70), ("camera", 32, 30),
    ("t-shirt", 24, 15), ("trousers", 48, 10), ("umbrella", 73, 40),
    ("waterproof trousers", 42, 70), ("waterproof overclothes", 43, 75),
    ("note-case", 22, 80), ("sunglasses", 7, 20), ("towel", 18, 12),
    ("socks", 4, 50), ("book", 30, 10),
)
max_weight = 400
solution = solve(items, max_weight)
print "items:"
for x in solution:
    print x[0]
print "value:", total_value(solution, max_weight)
print "weight:", sum([x[1] for x in solution])
I have been searching the internet for a way to get the least valuable items instead, with no luck, so maybe somebody can help me with that.
I really appreciate your help in advance.

I'll try my best to guide you through what should be done to achieve this.
To change this code so that it finds the least valuable items with which you can fill the bag, write a function that does the following:
1. Take the most valuable items (solution in your code) as the input.
2. Find the items that you will be leaving behind (I'll call them least_items).
3. Check whether the total weight of the items in least_items is greater than max_weight.
4. If yes, find the most valuable items in least_items and remove them from least_items. This is a place where you will have to use some sort of recursion to keep separating the least valuable items from the most valuable ones.
5. If no, you could fill your knapsack with more items. You then have to go back to the most valuable items you had and keep looking for the least valuable ones until the knapsack is full. Again, some sort of recursion will be needed.
Note that you will also have to include a terminating step so that the program stops when it has found the best solution.
This is not the best solution you could make, though. I tried finding something better myself, but unfortunately it demands more time than I thought. Feel free to leave any problems in the comments. I'll be happy to help.
Hope this helps.
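For small inputs, one way to pin the goal down is a brute-force sketch. It assumes that "least valuable" means the cheapest selection that still fills the knapsack, in the sense that no left-over item fits in the remaining capacity. That reading, the function name least_valuable_full_pack and the four sample items are assumptions, not part of the question. It tries every subset, so it is only practical for a handful of items:

```python
from itertools import combinations

def least_valuable_full_pack(items, max_weight):
    """Return (value, chosen_items) for the cheapest 'full' packing.

    A packing counts as full when it fits in max_weight and none of the
    left-over items could still be added (an assumed reading of the goal).
    Items are (name, weight, value) tuples, as in the question.
    """
    best = None
    indices = range(len(items))
    for r in range(len(items) + 1):
        for chosen in combinations(indices, r):
            weight = sum(items[i][1] for i in chosen)
            if weight > max_weight:
                continue  # does not fit at all
            spare = max_weight - weight
            if any(items[i][1] <= spare for i in indices if i not in chosen):
                continue  # another item still fits, so the bag is not full
            value = sum(items[i][2] for i in chosen)
            if best is None or value < best[0]:
                best = (value, tuple(items[i] for i in chosen))
    return best

# Hypothetical sample data, small enough to check by hand
sample = (("a", 5, 10), ("b", 4, 1), ("c", 3, 2), ("d", 2, 8))
print(least_valuable_full_pack(sample, 7))
```

With 22 items there are about four million subsets, so a dynamic-programming formulation would probably be needed for the full list in the question.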

Related

Python: unique value in list array

Imagine this output from fuzzywuzzy (the values could be in a different order):
[('car', 100, 28),
('tree', 80, 5),
('house', 44, 12),
('house', 44, 25),
('house', 44, 27)]
I want to treat the three houses as the same.
What is an efficient way to keep only unique string values and arrive at this result?
(EDIT: since all houses have the same value 44, I don't care which of them ends up in the list. The last value in each house tuple is irrelevant.)
[('car', 100, 28),
('tree', 80, 5),
('house', 44, 12)]
I saw a lot of questions here about uniqueness in lists, but the answers do not work for my example, mostly because the author needs a solution for just one list.
I tried this:
unique = []
for element in domain1:
    if element[0] not in unique:
        unique.append(element)
I thought I could address the first values with element[0] and check whether they exist in unique.
If I print unique, I get the same result as after fuzzywuzzy. It seems I am not on the right path with my idea, so how can I achieve my desired result?
Thanks!
You can use a dict for this, for example:
data = [('car', 100, 28),
('tree', 80, 5),
('house', 44, 12),
('house', 44, 25),
('house', 44, 27)
]
list({x[0]: x for x in reversed(data)}.values())
gives you
[('house', 44, 12), ('tree', 80, 5), ('car', 100, 28)]
The dict makes the entries unique by their first element, and reversed is needed so that the first occurrence ends up in the result (by default the last one seen would win).
You could use dict.setdefault here to store the first item found (using the first item in the tuple as the key):
lst = [
("car", 100, 28),
("tree", 80, 5),
("house", 44, 12),
("house", 44, 25),
("house", 44, 27),
]
d = {}
for x, y, z in lst:
    d.setdefault(x, (x, y, z))
print(list(d.values()))
Or using indexing instead of tuple unpacking:
d = {}
for item in lst:
    d.setdefault(item[0], item)
Output:
[('car', 100, 28), ('tree', 80, 5), ('house', 44, 12)]
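The loop from the question can also be repaired directly. It failed because unique holds whole tuples, so the string element[0] is never found in it. A minimal sketch, assuming a separate set (called seen here, a name that is not in the original) records the names already kept:

```python
domain1 = [
    ("car", 100, 28),
    ("tree", 80, 5),
    ("house", 44, 12),
    ("house", 44, 25),
    ("house", 44, 27),
]

seen = set()   # first elements that have already been kept
unique = []    # first tuple encountered for each name, in input order
for element in domain1:
    if element[0] not in seen:
        seen.add(element[0])
        unique.append(element)

print(unique)
```

This keeps the first tuple for each name and preserves the input order, which matches the desired result in the question.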

How to solve students' marks dashboard kinds of problems - can't we use simple code in Python to solve this problem?

Consider the marks of the students in a class, given in two lists:
Students = ['student1','student2','student3','student4','student5','student6','student7','student8','student9','student10']
Marks = [45, 78, 12, 14, 48, 43, 45, 98, 35, 80]
From the above two lists, Students[0] got Marks[0], Students[1] got Marks[1], and so on.
Who got marks between the 25th percentile and the 75th percentile, in increasing order of marks?
My question:
Can't we use simple code in Python to solve this problem?
I have written the code below. It finds the marks >25 and <75, but I am unable to put them in ascending order. sort() is not working and sorted() is not working either. Please help me extract the particular values and assign them to another list to solve this problem.
for i in range(0,10):
    if Marks[i]>25 and Marks[i]<75:
        print(Students[i],Marks[i])
print(i)
A small addition to your code solves this. Sort the (student, mark) pairs by mark before filtering:
Students = ['student1','student2','student3','student4','student5','student6','student7','student8','student9','student10']
Marks = [45, 78, 12, 14, 48, 43, 45, 98, 35, 80]
Students,Marks=zip(*sorted(zip(Students, Marks), key=lambda pair: pair[1])) # addition to your code: sort by mark
for i in range(0,10):
    if Marks[i]>25 and Marks[i]<75:
        print(Students[i],Marks[i])
25th percentile is "bottom fourth out of those who took the thing", and 75th percentile is "top fourth", regardless of the actual score. So what you need to do is sort the list, then take a slice out of the middle, based on the index.
Here's what I think you're trying to do:
import math
students = ['student1','student2','student3','student4','student5','student6','student7','student8','student9','student10']
marks = [45, 78, 12, 14, 48, 43, 45, 98, 35, 80]
# zip() will bind together corresponding elements of students and marks
# e.g. [('student1', 45), ('student2', 78), ...]
grades = list(zip(students, marks))
# once that's all in one list of 2-tuples, sort it by calling .sort() or using sorted()
# give it a "key", which specifies what criteria it should sort on
# in this case, it should sort on the mark, so the second element (index 1) of the tuple
grades.sort(key=lambda e:e[1])
# [('student3', 12), ('student4', 14), ('student9', 35), ('student6', 43), ('student1', 45), ('student7', 45), ('student5', 48), ('student2', 78), ('student10', 80), ('student8', 98)]
# now, just slice out the 25th and 75th percentile based on the length of that list
twentyfifth = math.ceil(len(grades) / 4)
seventyfifth = math.floor(3 * len(grades) / 4)
middle = grades[twentyfifth : seventyfifth]
print(middle)
# [('student6', 43), ('student1', 45), ('student7', 45), ('student5', 48)]
You have 10 students here, so how you round twentyfifth and seventyfifth is up to you. I chose to include only those strictly within the 25th-75th percentile range by rounding 'inwards'. You could do the opposite by switching ceil and floor, which would give your final list two more elements in this case, or you could round them both the same way.
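The effect of the rounding choice can be seen side by side in a small sketch. The function name middle_half and its inwards flag are made up for illustration; the slicing itself follows the answer above:

```python
import math

def middle_half(students, marks, inwards=True):
    """Return (student, mark) pairs between the 25th and 75th percentile by rank."""
    grades = sorted(zip(students, marks), key=lambda pair: pair[1])
    quarter = len(grades) / 4
    if inwards:
        # round towards the middle: fewer students when the count is not a multiple of 4
        low, high = math.ceil(quarter), math.floor(3 * quarter)
    else:
        # round away from the middle: more students
        low, high = math.floor(quarter), math.ceil(3 * quarter)
    return grades[low:high]

students = ['student%d' % i for i in range(1, 11)]
marks = [45, 78, 12, 14, 48, 43, 45, 98, 35, 80]
print(middle_half(students, marks))                  # 4 students
print(middle_half(students, marks, inwards=False))   # 6 students
```

Rounding outwards adds student9 (35) at the bottom and student2 (78) at the top of the slice.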
It looks like @Green Cloak Guy's answer is the correct one. Anyway, if what you want is the data of the students whose marks fall between two values, I would do it like this:
# Get a dict of students with their marks, filtered to those with a mark between 25 and 75
students_mark = {s: m for s, m in zip(Students, Marks) if m > 25 and m < 75}
# Sort results
res = dict(sorted(students_mark.items(), key=lambda i: i[1]))
# res: {'student9': 35, 'student6': 43, 'student1': 45, 'student7': 45, 'student5': 48}
# In one line
res = {s: m for s, m in sorted(zip(Students, Marks), key=lambda i: i[1]) if m > 25 and m < 75}
In summary: first link each student with their score, then filter and sort. I stored the result as a dictionary because it seems more convenient.

Cluster analysis within a set of integers

Sorry for the broad title, I just do not know how to name this.
I have a list of integers, let's say:
X = [20, 30, 40, 50, 60, 70, 80, 100]
And a second list of tuples of size 2 to 6 made from these integers:
Y = [(20, 30), (40, 50, 80, 100), (100, 100, 100), ...]
Some of the numbers come up quite often in Y, and I'd like to identify the groups of integers that come up often together.
Right now, I'm counting the number of occurrences of each integer. It gives me some information, but nothing about the groups.
Example:
Y = [(20, 40, 80), (30, 60, 80), (60, 80, 100), (60, 80, 100, 20), (40, 60, 80, 20, 100), ...]
In that example, (60, 80) and (60, 80, 100) are combinations which come up often.
I could use itertools.combinations_with_replacement() to generate every combination and then count the number of occurrences, but is there a better way to do this?
Thanks.
I don't know whether this is strictly better or rather similar, but you could check the appearance fraction of subsets. Below is a brute-force way of doing so, storing the results in a dictionary. Quite possibly it would be better to build a tree where you don't search through a branch if the appearance rate of its elements already did not make the cut (i.e. if (20,80) does not appear together often enough, then why search for (20,80,100)?).
import itertools
N=len(Y)
dicter = {}
for i in range(2,7):
    for comb in itertools.combinations(X,i):
        c3 = set(comb)
        d3 = sum([c3.issubset(set(val)) for val in Y])/N
        dicter['{}'.format(c3)] = d3
As an edit: you are probably not interested in all the non-appearances, so I'll throw in a piece of code to cut down the size of the final dictionary. First we define a function that returns a shallow copy of our dictionary with one key removed. This is required to avoid a RuntimeError when looping over the dict.
def removekey(d, key):
    r = dict(d)
    del r[key]
    return r
Then we remove the insignificant "clusters":
for d, v in dicter.items():
    if v < 0.1:
        dicter = removekey(dicter, d)
It will still be unsorted, as itertools and sets do not sort by themselves. I hope this helps you further along.
The approach that you are looking for is called Frequent Itemset Mining. It finds frequent subsets, given a list of sets.
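As a rough illustration of the idea, the sketch below counts only the subsets that actually occur in each tuple of Y, instead of generating every combination of X. It is a plain exhaustive count, not a pruning algorithm such as Apriori, and the function name frequent_itemsets and the min_support and min_size parameters are assumptions made for this example:

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support=0.5, min_size=2):
    """Map each itemset (a sorted tuple) to the fraction of transactions containing it."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore repeats such as (100, 100, 100)
        for size in range(min_size, len(items) + 1):
            # every subset of this transaction is counted once for this transaction
            counts.update(combinations(items, size))
    total = len(transactions)
    return {itemset: n / total for itemset, n in counts.items() if n / total >= min_support}

Y = [(20, 40, 80), (30, 60, 80), (60, 80, 100), (60, 80, 100, 20), (40, 60, 80, 20, 100)]
result = frequent_itemsets(Y)
for itemset, support in sorted(result.items(), key=lambda kv: -kv[1]):
    print(itemset, support)
```

Because the tuples hold at most 6 items, each one contributes at most 57 subsets of size 2 or more, so the count stays cheap even for a long Y.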

Extremely inefficient python code

I have made a program that lets the user input the largest possible hypotenuse of a right-angled triangle, and the program lists all possible sides of such triangles. The problem is that the program takes forever to run when I input a value such as 10000. Any suggestions on how to improve its efficiency?
Code:
largest=0
sets=0
hypotenuse=int(input("Please enter the length of the longest side of the triangle"))
for x in range(3,hypotenuse):
    for y in range(4, hypotenuse):
        for z in range(5,hypotenuse):
            if(x<y<z):
                if(x**2+y**2==z**2):
                    commonFactor=False
                    for w in range(2,x//2):
                        if (x%w==0 and y%w==0 and z%w==0):
                            commonFactor=True
                            break
                    if not(commonFactor):
                        print(x,y,z)
                        if(z>largest):
                            largest=z
                        sets+=1
print("Number of sets: %d"%sets)
print("Largest hypotenuse is %d"%largest)
Thanks!
Like this?
import math
hypothenuse=10000
thesets=[]
for x in xrange(1, hypothenuse):
    a=math.sqrt(hypothenuse**2-x**2)
    if(int(a)==a):
        thesets.append([x,a])
print "amount of sets: ", len(thesets)
for i in range(len(thesets)):
    print thesets[i][0],thesets[i][1], math.sqrt(thesets[i][0]**2+ thesets[i][1]**2)
Edit: changed so that you can print the sets too (this method is O(n), which is the fastest possible method, I guess?). Note: if you want the number of sets, each one is given twice, for example: 15**2 = 9**2 + 12**2 = 12**2 + 9**2.
I'm not sure I understand your code correctly, but if you enter 12, do you then want all possible triangles with a hypotenuse smaller than 12? Or do you want to know the possibilities (one, as far as I know) of writing 12**2 = a**2 + b**2?
If you want all possibilities, then I will edit the code a little bit.
For all possibilities of a**2 + b**2 = c**2, where c < hypothenuse (not sure if that is what you want):
import math
hypothenuse=15
thesets={}
for x in xrange(1,hypothenuse):
    for y in xrange(1,hypothenuse):
        a=math.sqrt(x**2+y**2)
        if(a<hypothenuse and int(a)==a):
            if(x<=y):
                thesets[(x,y)]=True
            else:
                thesets[(y,x)]=True
print len(thesets.keys())
print thesets.keys()
This runs in O(n**2), and your solution does not even work for hypothenuse=15. Your solution gives:
(3, 4, 5)
(5, 12, 13)
Number of sets: 2
while the correct result is:
3
[(5, 12), (3, 4), (6, 8)]
since 5**2+12**2=13**2, 3**2+4**2=5**2, and 6**2+8**2=10**2, and your method does not give this third option.
Edit: changed numpy to math, and my method doesn't give multiples either. I just showed why I get 3 instead of 2 (those 3 are different solutions to the problem, hence all 3 are valid, so your solution to the problem is incomplete?).
Here's a quick attempt using pre-calculated squares and cached square-roots. There are probably many mathematical optimisations.
def find_tri(h_max=10):
    squares = set()
    sq2root = {}
    sq_list = []
    for i in xrange(1,h_max+1):
        sq = i*i
        squares.add(sq)
        sq2root[sq] = i
        sq_list.append(sq)
    #
    tris = []
    for i,v in enumerate(sq_list):
        for x in sq_list[i:]:
            if x+v in squares:
                tris.append((sq2root[v],sq2root[x],sq2root[v+x]))
    return tris
Demo:
>>> find_tri(20)
[(3, 4, 5), (5, 12, 13), (6, 8, 10), (8, 15, 17), (9, 12, 15), (12, 16, 20)]
One very easy optimization is to arbitrarily decide that x <= y. E.g., if (10,15,z) is not a solution, then (15,10,z) will not be a solution either. This also means that if 2*x**2 > hypotenuse**2, you can terminate the algorithm, as no further solutions exist.
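Putting the x <= y ordering and the early termination together, a Python 3 sketch could look like the following. It keeps the original program's behaviour of skipping triples with a common factor and requiring z to be below the entered limit. The function name primitive_triples is made up here, and math.isqrt needs Python 3.8 or later:

```python
from math import gcd, isqrt

def primitive_triples(limit):
    """Return triples (x, y, z) with x < y < z < limit, x*x + y*y == z*z and no common factor."""
    triples = []
    for x in range(3, limit):
        if 2 * x * x > limit * limit:
            break  # y > x forces z*z > 2*x*x, so z would exceed the limit
        for y in range(x + 1, limit):
            z_squared = x * x + y * y
            z = isqrt(z_squared)  # integer square root, rounded down
            if z >= limit:
                break  # a larger y only makes z larger
            # any common factor of x and y also divides z, so gcd(x, y) is enough
            if z * z == z_squared and gcd(x, y) == 1:
                triples.append((x, y, z))
    return triples

print(primitive_triples(15))
print(len(primitive_triples(1000)))
```

This replaces the innermost loop over z with a single square-root check, so the work drops from cubic to roughly quadratic in the limit.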

Better Way to Write This List Comprehension?

I'm parsing a string that doesn't have a delimiter but does have specific indexes where fields start and stop. Here's my list comprehension to generate a list from the string:
field_breaks = [(0,2), (2,10), (10,13), (13, 21), (21, 32), (32, 43), (43, 51), (51, 54), (54, 55), (55, 57), (57, 61), (61, 63), (63, 113), (113, 163), (163, 213), (213, 238), (238, 240), (240, 250), (250, 300)]
s = '4100100297LICACTIVE 09-JUN-198131-DEC-2010P0 Y12490227WYVERN RESTAURANTS INC 1351 HEALDSBURG AVE HEALDSBURG CA95448 ROUND TABLE PIZZA 575 W COLLEGE AVE STE 201 SANTA ROSA CA95401 '
data = [s[x[0]:x[1]].strip() for x in field_breaks]
Any recommendation on how to improve this?
You can cut your field_breaks list in half by doing:
field_breaks = [0, 2, 10, 13, 21, 32, 43, ..., 250, 300]
s = ...
data = [s[x[0]:x[1]].strip() for x in zip(field_breaks[:-1], field_breaks[1:])]
You can use tuple unpacking for cleaner code:
data = [s[a:b].strip() for a,b in field_breaks]
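The two suggestions above combine naturally: keep a flat list of boundaries, pair neighbours with zip, and unpack each pair. A minimal sketch, where the helper name split_fixed and the shortened sample record are assumptions made for illustration:

```python
def split_fixed(s, boundaries):
    """Slice s at consecutive boundary offsets and strip padding from each field."""
    return [s[a:b].strip() for a, b in zip(boundaries, boundaries[1:])]

# Hypothetical short record: the first four fields of the layout in the question
record = "41" + "00100297" + "LIC" + "ACTIVE  "
fields = split_fixed(record, [0, 2, 10, 13, 21])
print(fields)
```

zip stops at the shorter argument, so boundaries[:-1] is not needed for the first argument.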
To be honest, I don't find the parse-by-column-number approach very readable, and I question its maintainability (off by one errors and the like). Though I'm sure the list comprehensions are very virtuous and efficient in this case, and the suggested zip-based solution has a nice functional tweak to it.
Instead, I'm going to throw softballs from out here in left field, since list comprehensions are supposed to be in part about making your code more declarative. For something completely different, consider the following approach based on the pyparsing module:
from pyparsing import Word, Combine, Literal, nums, alphas
def Fixed(chars, width):
    return Word(chars, exact=width)
myDate = Combine(Fixed(nums,2) + Literal('-') + Fixed(alphas,3) + Literal('-')
                 + Fixed(nums,4))
# parentheses let the expression continue on the next line
fullRow = (Fixed(nums,2) + Fixed(nums,8) + Fixed(alphas,3) + Fixed(alphas,8)
           + myDate + myDate + ...)
data = fullRow.parseString(s)
# should be ['41', '00100297', 'LIC', 'ACTIVE ',
# '09-JUN-1981', '31-DEC-2010', ...]
To make this even more declarative, you could name each of the fields as you come across them. I have no idea what the fields actually are, but something like:
someId = Fixed(nums,2)
someOtherId = Fixed(nums,8)
recordType = Fixed(alphas,3)
recordStatus = Fixed(alphas,8)
birthDate = myDate
issueDate = myDate
fullRow = (someId + someOtherId + recordType + recordStatus
           + birthDate + issueDate + ...)
Now an approach like this probably isn't going to break any land speed records. But, holy cow, wouldn't you find this easier to read and maintain?
Here is a way using map (Python 2 only, since str.__getslice__ was removed in Python 3):
data = map(s.__getslice__, *zip(*field_breaks))
