Solving the knapsack problem using a greedy Python algorithm

I'm trying to solve the knapsack problem using Python, implementing a greedy algorithm. The result I'm getting back makes no sense to me.
Knapsack:
The first line gives the number of items, in this case 20. The last line gives the capacity of the knapsack, in this case 524. The remaining lines give the index, value and weight of each item.
20
1 91 29
2 60 65
3 61 71
4 9 60
5 79 45
6 46 71
7 19 22
8 57 97
9 8 6
10 84 91
11 20 57
12 72 60
13 32 49
14 31 89
15 28 2
16 81 30
17 55 90
18 43 25
19 100 82
20 27 19
524
Python code:
import os
def constructive():
    knapsack = []
    Weight = 0
    while(Weight <= cap):
        best = max(values)
        i = values.index(best)
        knapsack.append(i)
        Weight = Weight + weights[i]
        del values[i]
        del weights[i]
    return knapsack, Weight
def read_kfile(fname):
    with open(fname, 'rU') as kfile:
        lines = kfile.readlines() # reads the whole file
    n = int(lines[0])
    c = int(lines[n+1])
    vs = []
    ws = []
    lines = lines[1:n+1] # Removes the first and last line
    for l in lines:
        numbers = l.split() # Converts the string into a list
        vs.append(int(numbers[1])) # Appends value, need to convert to int
        ws.append(int(numbers[2])) # Appends weight, need to convert to int
    return n, c, vs, ws
dir_path = os.path.dirname(os.path.realpath(__file__)) # Get the directory where the file is located
os.chdir(dir_path) # Change the working directory so we can read the file
knapfile = 'knap20.txt'
nitems, cap, values, weights = read_kfile(knapfile)
val1, val2 = constructive()
print('knapsack', val1)
print('weight', val2)
print('cap', cap)
Result:
knapsack [18, 0, 8, 13, 3, 8, 1, 0, 3]
weight 570
cap 524

Welcome. Your program returns a weight over the cap because the loop never checks whether the final item actually fits before putting it in the knapsack. To fix this, add an if statement; you should also check whether the list of values is empty. Note that I append (i+1) because your text file's index starts at 1, while Python list indices start at 0:
def constructive():
    knapsack = []
    Weight = 0
    while(Weight <= cap and values):
        best = max(values)
        i = values.index(best)
        if weights[i] <= cap-Weight:
            knapsack.append(i+1)
            Weight = Weight + weights[i]
        del values[i]
        del weights[i]
    return knapsack, Weight

The problem is that, in the last step, the best item you find exceeds the maximum weight, but since you have already entered the loop you add it anyway.
In the next iteration you recognize that you are over the cap and stop.
I am not sure how you want to proceed once the next best item is too heavy. If you simply want to stop and not add anything more, you can modify your constructive to look as follows:
def constructive():
    knapsack = []
    Weight = 0
    while values:  # also stops cleanly if every item has been packed
        best = max(values)
        i = values.index(best)
        if Weight + weights[i] > cap:
            break  # the next best item does not fit: stop here
        knapsack.append(i)
        Weight = Weight + weights[i]
        del values[i]
        del weights[i]
    return knapsack, Weight
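One more thing worth noting: because items are deleted from `values` and `weights` as the loop runs, the index `i` is a position in the shrinking lists, not the item's number in the file. That is why the original output repeats 0, 3 and 8. Below is a minimal sketch that keeps the original numbering, assuming the same greedy-by-value rule and skipping items that do not fit. The function name `greedy_by_value` is my own, and greedy selection is only a heuristic, so the result is not guaranteed to be optimal.

```python
def greedy_by_value(values, weights, cap):
    """Pick items in order of decreasing value, skipping any that do not fit.

    Returns the 1-based item numbers (as in the input file), the total
    weight and the total value. The input lists are not modified.
    """
    # Sort the original positions by value, highest first.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    chosen, total_weight, total_value = [], 0, 0
    for i in order:
        if total_weight + weights[i] <= cap:
            chosen.append(i + 1)  # file numbering starts at 1
            total_weight += weights[i]
            total_value += values[i]
    return chosen, total_weight, total_value


# Data from the question (values and weights of items 1..20).
values = [91, 60, 61, 9, 79, 46, 19, 57, 8, 84,
          20, 72, 32, 31, 28, 81, 55, 43, 100, 27]
weights = [29, 65, 71, 60, 45, 71, 22, 97, 6, 91,
           57, 60, 49, 89, 2, 30, 90, 25, 82, 19]

items, weight, value = greedy_by_value(values, weights, 524)
print(items, weight, value)
# [19, 1, 10, 16, 5, 12, 3, 2, 18, 15, 20] 519 726
```

Sorting the positions once, instead of calling `max` and `del` repeatedly, also leaves the input lists untouched so they can be reused.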


Finding which person got the highest percentage according to their marks

The first line of the input contains an integer n, the number of lines that follow.
Each of the next n lines is a space-separated list of a person's name and their marks in four subjects.
The output should be the name of the person with the highest percentage.
For example:
Input:
4
Manoj 30 40 45 63
Shivam 38 29 45 60
Siddheshwar 38 35 39 45
Ananya 45 29 30 51
Output:
Manoj
Code:
details_vertical=[]
for ctr in range(4):
    details_vertical.append(input().split())
for name,marks,marks,marks in zip(*details_vertical):
    print(f"{name}")
Is this what you're looking for? Try it first, then ask questions.
There is room to improve it, but this is probably the most straightforward way.
highest = 0
best = None
n = int(input())  # first line: the number of people
for i in range(n):
    details = input().split()
    print(details)  # just for debug, can comment out
    total = sum(int(x) for x in details[1:])  # get this person's total
    # change this print line to get average printout:
    print(total / len(details[1:]))  # works for any number of scores, not only 4
    if total > highest:  # processing along the way, so we don't have to save all the scores...
        highest = total
        best = details[0]  # the person
print(best)  # Manoj
Is this what you are looking for?
data = """4
Manoj 30 40 45 63
Shivam 38 29 45 60
Siddheshwar 38 35 39 45
Ananya 45 29 30 51"""
# load the data, skipping the first row (the count)
x = data.split('\n')[1:]
x = [item.split() for item in x]
# append the average of the 4 subjects to each row
y = [item + [sum(int(subitem) for subitem in item[1:]) / 4] for item in x]
# sort the rows by average, highest first
y.sort(key=lambda row: row[-1], reverse=True)
# the first element of the first row is the name with the highest average
print(y[0][0])
This solution might be an understandable one:
first_inp = int(input("No. of Students"))
no_of_subs = int(input("No. of Subjects :"))
details = {tuple(int(input("Marks")) for i in range(no_of_subs)):input("Name") for i in range(first_inp)}
lis_res = [sum(i) for i in details]
det_val = [v for v in details.values()]
print (det_val[lis_res.index(max(lis_res))])
If you wrap this in a function, you can use the return keyword at the end to return that value instead of printing it.
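For comparison, the same idea can be packaged as a small function that takes the input lines and returns the name. This is only a sketch: the function name `best_student` is my own, and it assumes every subject has the same maximum mark, so the highest total is also the highest percentage.

```python
def best_student(lines):
    """Return the name with the highest total from lines like 'Name m1 m2 ...'.

    lines[0] holds the number of records; ties go to the earliest record.
    """
    n = int(lines[0])
    best_name, best_total = None, float("-inf")
    for line in lines[1:n + 1]:
        name, *marks = line.split()
        total = sum(int(m) for m in marks)
        if total > best_total:
            best_name, best_total = name, total
    return best_name


sample = """4
Manoj 30 40 45 63
Shivam 38 29 45 60
Siddheshwar 38 35 39 45
Ananya 45 29 30 51""".splitlines()

print(best_student(sample))  # Manoj
```

To read from standard input instead, the list could be built with `sys.stdin.read().splitlines()`.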

Read in Values from external file, add them, print results

Problem:
I have 50 text files, each with thousands of lines of text, and each line has a value on it. I am only interested in a small section near the middle (lines 757-827; it is actually lines 745-815 I'm interested in, but the first 12 lines of every file are irrelevant). I would like to read each file in and total the values between those lines. In the end I would like it to print a pair of numbers in the format (((n+1)*18), total count), where n is the number of the file (they are numbered starting at zero). This should repeat for all 50 files, giving 50 pairs of numbers, looking something like:
(18,77),(36,63),(54,50),(72,42),...
Code:
import numpy as np
%matplotlib inline
from numpy import loadtxt, linspace
import glob, os
fileToRun = 'Run0'
location = 'ControlRoom6'
DeadTime = 3
LiveTime = 15
folderId = '\\'
baseFolder = 'C:'+folderId+'Users'+folderId+location+folderId+'Documents'+folderId+'PhD'+folderId+'Ubuntu-Analysis-DCF'+folderId+'DCF-an-b+decay'+folderId+'dcp-ap-27Al'+folderId+''
prefix = 'DECAY_COINC'
folderToAnalyze = baseFolder + fileToRun + '\\'
MaestroT = LiveTime + DeadTime
## Gets number of files
files = []
os.chdir(folderToAnalyze)
for file in glob.glob(prefix + "*.Spe"):
    files.append(file)
numfiles = len(files)
if numfiles<=1:
    print('numfiles is {0}, minimum of 2 is required'.format(numfiles))
    raise SystemExit(0)
xmin = 745
xmax = 815
skips = 12
n=[]
count=[]
for n in range(0, numfiles):
    x = np.linspace(0, 8191, 8192)
    finalprefix = str(n).zfill(3)
    fullprefix = folderToAnalyze + prefix + finalprefix
    y = loadtxt(fullprefix + ".Spe", skiprows = 12, max_rows = 8192)
    for x in range(xmin+skips,xmax+skips):
        count = count + y
    time = MaestroT*(n+1)
    print(time, count)
Current output is:
'ValueError Traceback (most recent call last)
in
84
85 for x in range(xmin+skips,xmax+skips):
---> 86 count = count + y
87 time = MaestroT*(n+1)
88
ValueError: operands could not be broadcast together with shapes (0,) (8192,)'
However, I did previously have this running, but it just printed out thousands of seemingly unconnected numbers. Does anyone know how I can alter the code to achieve the desired result?
EDIT: Data Set
In order to make the example easier to use, I've made a dropbox with some dummy data. The files are named the same as the ones the code reads in, and are written in the same format (the first 12 rows contain irrelevant information). Link is here. I haven't written 8192 dummy numbers, as I thought it would be easier, and produce a nearer facsimile, to just use the actual files with a few numbers changed.
The solution was to edit the code as shown, starting from 'xmin = 745':
xmin = 745
xmax = 815
skip = 12
for n in range(0, numfiles):
    total = 0
    x = np.linspace(0, 8191, 8192)
    finalprefix = str(n).zfill(3)
    fullprefix = folderToAnalyze + prefix + finalprefix
    y = loadtxt(fullprefix + ".Spe", skiprows= xmin + skip, max_rows = xmax - xmin)
    for x in y:
        val = int(x)
        total = total + val
    print(((n+1)*MaestroT), total)
This prints out as:
18 74
36 64
54 62
72 54
90 47
108 39
126 40
144 35
etc., which fits my needs.
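The same windowed total can also be computed without NumPy. Here is a minimal sketch using only the standard library, assuming one integer value per line; the helper name `sum_window` and the demo file are my own and not part of the original code.

```python
import os
import tempfile
from itertools import islice


def sum_window(path, skip, count):
    """Sum `count` integer values that follow the first `skip` lines of a file."""
    with open(path) as handle:
        window = islice(handle, skip, skip + count)
        return sum(int(line) for line in window if line.strip())


# Demo file: 12 header lines followed by the values 0..99, one per line.
with tempfile.NamedTemporaryFile("w", suffix=".Spe", delete=False) as tmp:
    tmp.write("header\n" * 12)
    tmp.write("".join(f"{v}\n" for v in range(100)))
    demo_path = tmp.name

# Skip the 12 header lines plus 10 data lines, then add up the next 5 values.
demo_total = sum_window(demo_path, skip=12 + 10, count=5)
os.remove(demo_path)
print(demo_total)  # 10 + 11 + 12 + 13 + 14 = 60
```

With the names from the solution above, the call would be `sum_window(fullprefix + ".Spe", xmin + skip, xmax - xmin)`, which mirrors the `skiprows` and `max_rows` arguments passed to `loadtxt`.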

Facing a "list index out of range" error when using objects

I want to make a list of objects. My code is:
std_list_instance = list()
for i in range(0, std_cls_1_num): #std_cls_1_num set by user
    std = Student(std_cls_1_list[0][i], std_cls_1_list[1][i], std_cls_1_list[2][i]) #Student has 3 fields that are given from a list
    std_list_instance[i].append(std)
The class definition is:
class Student:
    count = 0
    def __init__(self, age, height, weight):
        self.age = age
        self.height = height
        self.weight = weight
        Student.count += 1
    def get_ave_age(self):
        print('age is %i' %self.age)
std_cls_1_list is:
std_cls_1_list = list()
for i in range(0, 3):
    std_cls_1_list.append([int(std_cls_1_num) for std_cls_1_num in input().split()])
And finally, the user input looks like:
5 #number of student
16 17 15 16 17 #age of 5 std
180 175 172 170 165 #height of 5 std
67 72 59 62 55 #weight of 5 std
Then I get a "list index out of range" error.
I know there is a problem in my code, but I can't fix it.
I believe that you need to swap the indices when accessing the list elements. Instead of [0][i], use [i][0]:
std_list_instance = list()
for i in range(0, std_cls_1_num): #std_cls_1_num set by user
    std = Student(std_cls_1_list[i][0], std_cls_1_list[i][1], std_cls_1_list[i][2]) #Student has 3 fields that are given from a list
    std_list_instance[i].append(std)
But you will still get the error if:
std_cls_1_num > len(std_cls_1_list): std_cls_1_num is greater than the number of elements in std_cls_1_list
not all(len(st) >= 3 for st in std_cls_1_list): not every element of std_cls_1_list has at least 3 elements
You must ensure that “std_cls_1_num” is less than or equal to the number of rows in “std_cls_1_list”. If it is greater than the number of rows, the program will throw an index error.
Also, if possible, please post the related code, definitions and input; only then can we comment.
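Given the input layout shown in the question (three rows holding ages, heights and weights, with one value per student), `std_cls_1_list[0][i]` already addresses the right cell. The line that fails is `std_list_instance[i].append(std)`, because it indexes into a list that is still empty. Here is a minimal sketch, assuming that layout, which appends to the list itself and uses `zip` to walk the three rows in parallel. The data is hard-coded in place of `input()` for illustration.

```python
class Student:
    count = 0

    def __init__(self, age, height, weight):
        self.age = age
        self.height = height
        self.weight = weight
        Student.count += 1


# Rows as in the question's sample input: ages, heights, weights.
std_cls_1_list = [
    [16, 17, 15, 16, 17],
    [180, 175, 172, 170, 165],
    [67, 72, 59, 62, 55],
]

# zip(*rows) yields one (age, height, weight) tuple per student, so no
# index arithmetic is needed and the list grows one object at a time.
std_list_instance = [
    Student(age, height, weight)
    for age, height, weight in zip(*std_cls_1_list)
]

print(len(std_list_instance), std_list_instance[0].age)  # 5 16
```

In the original loop, the equivalent one-line change would be `std_list_instance.append(std)`.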

Python DEAP, how to stop the evolution when the fitness doesn't increase after X generations?

I want to stop the genetic algorithm when the fitness doesn't increase.
I'm using the DEAP library in python.
Typically, I have the following log file:
gen nevals mean max
0 100 0.352431 0.578592
1 83 -0.533964 0.719633
2 82 -0.567494 0.719633
3 81 -0.396759 0.751318
4 74 -0.340427 0.87888
5 80 -0.29756 0.888443
6 86 -0.509486 0.907789
7 85 -0.335586 1.06199
8 69 -0.23967 1.12339
9 73 -0.10727 1.20622
10 88 -0.181696 1.20622
11 77 -0.188449 1.20622
12 72 0.135398 1.25254
13 67 0.0304611 1.26931
14 74 -0.0436463 1.3181
15 70 0.289306 1.37582
16 79 -0.0441134 1.37151
17 73 0.339611 1.37204
18 68 -0.137938 1.37204
19 76 0.000527522 1.40034
20 84 0.198005 1.40078
21 69 0.243705 1.4306
22 74 0.11812 1.4306
23 83 0.16235 1.4306
24 82 0.270455 1.43492
25 76 -0.200259 1.43492
26 77 0.157181 1.43492
27 74 0.210868 1.43492
I initially set ngen = 200, but as you can see, the fitness function reaches a local maximum around the 22nd generation. So I want to stop the genetic algorithm when this happens.
def main():
    random.seed(64)
    pop = toolbox.population(n=100)
    CXPB, MUTPB = 0.5, 0.2
    print("Start of evolution")
    fitnesses = list(map(toolbox.evaluate, pop))
    for ind, fit in zip(pop, fitnesses):
        ind.fitness.values = fit
    print(" Evaluated %i individuals" % len(pop))
    fits = [ind.fitness.values[0] for ind in pop]
    g = 0
    while max(fits) < 0.67 and g < 1000000:
        g = g + 1
        print("-- Generation %i --" % g)
        offspring = toolbox.select(pop, len(pop))
        offspring = list(map(toolbox.clone, offspring))
        for child1, child2 in zip(offspring[::2], offspring[1::2]):
            if random.random() < CXPB:
                toolbox.mate(child1, child2)
                del child1.fitness.values
                del child2.fitness.values
        for mutant in offspring:
            if random.random() < MUTPB:
                toolbox.mutate(mutant)
                del mutant.fitness.values
        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = map(toolbox.evaluate, invalid_ind)
        for ind, fit in zip(invalid_ind, fitnesses):
            ind.fitness.values = fit
        pop[:] = offspring
        fits = [ind.fitness.values[0] for ind in pop]
        print("fitness-- ", max(fits))
    print("-- End of (successful) evolution --")
    best_ind = tools.selBest(pop, 1)[0]
    triangle_to_image(best_ind).save('best.jpg')
This stops the code when the desired fitness value is reached or when a given number of generations have passed.
You can set it up so that it stops when the fitness has not changed for some time, i.e. when it reaches a local maximum and gets stuck there.
See the while condition on line 12 of the code above: this example stops when the fitness crosses 0.67 and then saves the result.
This is the way to do it when you are not using something like the hall of fame.
I don't know how to do it in that case; if you find out, tell me too.
Honestly, I was looking into this issue recently too.
Following the research I've done, here is what I found:
There's a DEAP example which implements the CMA-ES algorithm. It has stopping criteria included (Python DEAP, how to stop the evolution when the fitness doesn't increase after X generations?)
There is a dissertation worth reading on this: https://heal.heuristiclab.com/system/files/diss%20gkr2.pdf
The above-mentioned solution implements what's mentioned in this issue: https://github.com/DEAP/deap/issues/271
I haven't tried any of the above yet, but I'm quite sure it will work.
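The "no improvement for X generations" rule itself can be sketched independently of DEAP. The class below is a hypothetical helper of my own, not part of the DEAP API: it tracks the best fitness seen so far and reports when `patience` consecutive generations have passed without a new best. It is demonstrated on the max column of the log from the question.

```python
class StagnationStopper:
    """Signal a stop after `patience` generations without a new best fitness."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience    # generations to wait without improvement
        self.min_delta = min_delta  # smallest increase that counts as progress
        self.best = float("-inf")
        self.stale = 0

    def update(self, current_best):
        """Record one generation's best fitness; return True when it is time to stop."""
        if current_best > self.best + self.min_delta:
            self.best = current_best
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


# Max-fitness column from the log in the question, generations 0..27.
log_max = [
    0.578592, 0.719633, 0.719633, 0.751318, 0.87888, 0.888443, 0.907789,
    1.06199, 1.12339, 1.20622, 1.20622, 1.20622, 1.25254, 1.26931,
    1.3181, 1.37582, 1.37151, 1.37204, 1.37204, 1.40034, 1.40078,
    1.4306, 1.4306, 1.4306, 1.43492, 1.43492, 1.43492, 1.43492,
]

stopper = StagnationStopper(patience=3)
stopped_at = next(
    (gen for gen, best in enumerate(log_max) if stopper.update(best)), None
)
print(stopped_at)  # 18
```

With `patience=3` the run would already stop at generation 18, before the later improvements, so the patience has to be chosen with the problem in mind. In a hand-written generational loop like the one above, one option is to call `stopper.update(max(fits))` once per generation and `break` when it returns True.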

Query Board challenge on Python, need some pointers

So, I have this challenge on CodeEval, but I don't seem to know where to start, so I need some pointers (and answers if you can) to help me figure out this challenge.
DESCRIPTION:
There is a board (matrix). Every cell of the board contains one integer, which is 0 initially.
The following operations can be applied to the Query Board:
SetRow i x: it means that all values in the cells on row "i" have been changed to value "x" after this operation.
SetCol j x: it means that all values in the cells on column "j" have been changed to value "x" after this operation.
QueryRow i: it means that you should output the sum of values on row "i".
QueryCol j: it means that you should output the sum of values on column "j".
The board's dimensions are 256x256
i and j are integers from 0 to 255
x is an integer from 0 to 31
INPUT SAMPLE:
Your program should accept as its first argument a path to a filename. Each line in this file contains an operation of a query. E.g.
SetCol 32 20
SetRow 15 7
SetRow 16 31
QueryCol 32
SetCol 2 14
QueryRow 10
SetCol 14 0
QueryRow 15
SetRow 10 1
QueryCol 2
OUTPUT SAMPLE:
For each query, output the answer of the query. E.g.
5118
34
1792
3571
I'm not that great on Python, but this challenge is pretty interesting, although I didn't have any clues on how to solve it. So, I need some help from you guys.
Thanks!
You could use a sparse matrix for this, addressed by (row, col) tuples as keys in a dictionary, to save memory. Otherwise 64k cells is a big list (2MB+ on a 64-bit system):
matrix = {}
This is way more efficient, as the challenge is unlikely to set values for all rows and columns on the board.
Setting a column or row is then:
def set_col(col, x):
    for i in range(256):
        matrix[i, col] = x
def set_row(row, x):
    for i in range(256):
        matrix[row, i] = x
and summing a row or column is then:
def get_col(col):
    return sum(matrix.get((i, col), 0) for i in range(256))
def get_row(row):
    return sum(matrix.get((row, i), 0) for i in range(256))
WIDTH, HEIGHT = 256, 256
board = [[0] * WIDTH for i in range(HEIGHT)]
def set_row(i, x):
    global board
    board[i] = [x]*WIDTH
... implement each function, then parse each line of input to decide which function to call,
for line in inf:
    dat = line.split()
    if dat[0] == "SetRow":
        set_row(int(dat[1]), int(dat[2]))
    elif ...
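Filling in the elided parts, a complete version of this dense-board approach might look like the sketch below. The function name `run_queries` and the choice to collect the answers in a list rather than print them are my own; the operations and the sample come from the challenge text.

```python
SIZE = 256  # the board is SIZE x SIZE and starts out filled with zeros


def run_queries(lines, size=SIZE):
    """Apply SetRow/SetCol/QueryRow/QueryCol operations; return the query answers."""
    board = [[0] * size for _ in range(size)]
    answers = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        op, args = parts[0], [int(p) for p in parts[1:]]
        if op == "SetRow":
            i, x = args
            board[i] = [x] * size
        elif op == "SetCol":
            j, x = args
            for row in board:
                row[j] = x
        elif op == "QueryRow":
            answers.append(sum(board[args[0]]))
        elif op == "QueryCol":
            answers.append(sum(row[args[0]] for row in board))
        else:
            raise ValueError("unknown operation: " + op)
    return answers


sample = """SetCol 32 20
SetRow 15 7
SetRow 16 31
QueryCol 32
SetCol 2 14
QueryRow 10
SetCol 14 0
QueryRow 15
SetRow 10 1
QueryCol 2""".splitlines()

answers = run_queries(sample)
print(answers)  # [5118, 34, 1792, 3571]
```

To match the challenge's input format, `lines` could be an open file whose path is taken from `sys.argv[1]`, with each answer printed on its own line.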
Edit: Per Martijn's comments:
total memory usage for board is about 2.1MB. By comparison, after 100 random row/column writes, matrix is 3.1MB (although it tops out there and doesn't get any bigger).
yes, global is unnecessary when modifying a global object (just don't try to assign to it).
while dispatching from a dict is good and efficient, I did not want to inflict it on someone who is "not that great on Python", especially for just four entries.
For sake of comparison, how about
time = 0
WIDTH, HEIGHT = 256, 256
INIT = 0
rows = [(time, INIT) for _ in range(HEIGHT)]  # one (timestamp, value) entry per row
cols = [(time, INIT) for _ in range(WIDTH)]   # one (timestamp, value) entry per column
def set_row(i, x):
    global time
    time += 1
    rows[int(i)] = (time, int(x))
def set_col(i, x):
    global time
    time += 1
    cols[int(i)] = (time, int(x))
def query_row(i):
    rt, rv = rows[int(i)]
    # columns set after this row was last set override the row's value
    total = rv * WIDTH + sum(cv - rv for ct, cv in cols if ct > rt)
    print(total)
def query_col(j):
    ct, cv = cols[int(j)]
    # rows set after this column was last set override the column's value
    total = cv * HEIGHT + sum(rv - cv for rt, rv in rows if rt > ct)
    print(total)
ops = {
    "SetRow": set_row,
    "SetCol": set_col,
    "QueryRow": query_row,
    "QueryCol": query_col
}
inf = """SetCol 32 20
SetRow 15 7
SetRow 16 31
QueryCol 32
SetCol 2 14
QueryRow 10
SetCol 14 0
QueryRow 15
SetRow 10 1
QueryCol 2""".splitlines()
for line in inf:
    line = line.split()
    op = line.pop(0)
    ops[op](*line)
which only uses 4.3k of memory for rows[] and cols[].
Edit2:
using your code from above for matrix, set_row, set_col,
import sys
for n in range(256):
    set_row(n, 1)
    print("{}: {}".format(2*(n+1)-1, sys.getsizeof(matrix)))
    set_col(n, 1)
    print("{}: {}".format(2*(n+1), sys.getsizeof(matrix)))
which returns (condensed):
1: 12560
2: 49424
6: 196880
22: 786704
94: 3146000
... basically the allocated memory quadruples at each step. If I change the memory measure to include key-tuples,
def get_matrix_size():
    return sys.getsizeof(matrix) + sum(sys.getsizeof(key) for key in matrix)
it increases more smoothly, but still takes a big jump at the above points:
5 : 127.9k
6 : 287.7k
21 : 521.4k
22 : 1112.7k
60 : 1672.0k
61 : 1686.1k <-- approx expected size on your reported problem set
93 : 2121.1k
94 : 4438.2k
