An algorithm for efficiently replacing array elements with groups


There are N elements, and each element has its own cost. There are also M groups. Each group contains the indices of several elements of the array and has its own cost.
Example input:
6
100 5
200 5
300 5
400 5
500 5
600 3
2
4 6
100 200 300 700
3 5
300 400 500
The first number, N, is the number of elements. Each of the next N lines contains the index and the cost of one element. Then comes M, the number of groups, followed by 2*M lines: for each group, one line with the number of elements in the group and the cost of the group, and one line with the indices of its elements.
I want to find the minimum total cost at which all N elements can be purchased.
In the example, it is best to take both groups and to purchase the element with index 600 separately. The answer is 14 (6+5+3).
Here is my solution:
from queue import PriorityQueue
N = int(input())
dct = {}
groups = PriorityQueue()
for i in range(N):
    a,c = [int(j) for j in input().split()]
    dct[a] = c
M = int(input())
for i in range(M):
    k,c = [int(j) for j in input().split()]
    s = 0
    tmp = []
    for j in input().split():
        j_=int(j)
        if j_ in dct:
            s+=dct[j_]
            tmp.append(j_)
    d = c-s
    if d<0:
        groups.put([d, c, tmp])
s = 0
while not groups.empty():
    #print(dct)
    #for i in groups.queue:
    #    print(i)
    g = groups.get()
    if g[0]>0:
        break
    #print('G',g)
    #print('-------')
    for i in g[2]:
        if i in dct:
            del(dct[i])
    s += g[1]
    groups_ = PriorityQueue()
    for i in range(len(groups.queue)):
        g_ = groups.get()
        s_ = 0
        tmp_ = []
        for i in g_[2]:
            if i in dct:
                s_+=dct[i]
                tmp_.append(i)
        d = g_[1]-s_
        groups_.put([d, g_[1], tmp_])
    groups = groups_
for i in dct:
    s+=dct[i]
print(s)
But it is not always correct.
For example, for the following test it gives 162, but the correct answer is 160: it is best to take only the first and second groups and to buy the element with index 0 separately.
20
0 24
1 32
2 33
3 57
4 57
5 50
6 50
7 41
8 2
9 73
10 81
11 73
12 55
13 3
14 54
15 43
16 98
17 8
18 41
19 97
5
17 61
17 9 11 15 1 13 14 7 20 2 3 16 12 5 8 4 6
13 75
20 15 5 9 10 11 7 8 18 2 4 19 16
10 96
3 9 4 18 11 6 8 5 2 14
9 92
18 1 6 9 19 8 4 16 10
19 77
14 17 18 3 2 4 7 6 8 9 10 20 13 12 15 19 1 16 5
I also tried a brute-force search over all combinations of groups, but that solution is too slow:
from itertools import combinations
N = int(input())
dct = {}
s = 0
for i in range(N):
    a,c = [int(j) for j in input().split()]
    dct[a] = c
    s += c
m = s
M = int(input())
groups = []
for i in range(M):
    k,c = [int(j) for j in input().split()]
    s = 0
    tmp = []
    for j in input().split():
        j_=int(j)
        if j_ in dct:
            s+=dct[j_]
            tmp.append(j_)
    groups.append( [c, tmp] )
for u in range(1,M+1):
    for i in list(combinations(groups, u)):
        s = 0
        tmp = dct.copy()
        for j in i:
            s += j[0]
            for t in j[1]:
                if t in tmp:
                    del(tmp[t])
        for j in tmp:
            s += tmp[j]
        #print(i,s)
        if s < m:
            m = s
print(m)
I think this problem can be solved with dynamic programming; perhaps it is a variation of the classic knapsack problem. Which algorithm would be best to use?

The set cover problem (which is NP-hard) seems to be a special case of your problem. Therefore, I am afraid that no efficient (polynomial-time) algorithm is known that solves it exactly.

As already stated, this is a hard problem for which no "efficient" algorithm is known.
You can approach this as a graph problem, where the nodes of the graph are all possible combinations of groups (each element on its own also counts as a group). Two nodes u and v are connected by a directed edge when there is a group g such that the union of the keys in u and in g equals the set of keys in v.
Then perform a Dijkstra search in this graph, starting from the node that represents the state where no groups are selected at all (cost 0, no keys). This search minimises the cost, and you can use the extra optimisation that a group g is never considered twice in the same path. As soon as the search visits a state (node) that covers all the keys, you can exit the algorithm, as is typical for Dijkstra's algorithm: that state represents the minimal cost to cover all the keys.
Such an algorithm is still quite costly, because each time an edge is added to a path, a union of keys must be calculated. Quite some memory is also needed to keep all the states in the heap.
Here is a potential implementation:
from collections import namedtuple
import heapq
# Some named tuple types, to make the code more readable
Group = namedtuple("Group", "cost numtodo keys")
Node = namedtuple("Node", "cost numtodo keys nextgroupid")
def collectinput():
    inputNumbers = lambda: [int(j) for j in input().split()]
    groups = []
    keys = []
    N, = inputNumbers()
    for i in range(N):
        key, cost = inputNumbers()
        keys.append(key)
        # Consider these atomic keys also as groups (with one key)
        # The middle element of this tuple may seem superficial, but it improves sorting
        groups.append(Group(cost, N-1, [key]))
    keys = set(keys)
    M, = inputNumbers()
    for i in range(M):
        cost = inputNumbers()[-1]
        groupkeys = [key for key in inputNumbers() if key in keys]
        groups.append(Group(cost, N-len(groupkeys), groupkeys))
    return keys, groups
def solve(keys, groups):
    N = len(keys)
    groups.sort() # sort by cost, if equal, by number of keys left
    # The starting node of the graph search
    heap = [Node(0, N, [], 0)]
    while len(heap):
        node = heapq.heappop(heap)
        if node.numtodo == 0:
            return node.cost
        for i in range(node.nextgroupid, len(groups)):
            group = groups[i]
            unionkeys = list(set(node.keys + group.keys))
            if len(unionkeys) > len(node.keys):
                heapq.heappush(heap, Node(node.cost + group.cost, N-len(unionkeys), unionkeys, i+1))
# Main
keys, groups = collectinput()
cost = solve(keys, groups)
print("solution: {}".format(cost))
This outputs 160 for the second problem you posted.
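When N is small, a bitmask dynamic program is another exact option. The sketch below is not the Dijkstra search above but a different technique. It assumes N is small enough (roughly 20 or fewer) that a table with 2^N entries fits in memory, and the names min_cover_cost, item_costs and groups are made up for illustration. Each table entry holds the cheapest known cost of covering one subset of the elements, and every step covers the lowest element that is still uncovered.

```python
def min_cover_cost(item_costs, groups):
    """item_costs: dict index -> cost; groups: list of (cost, indices)."""
    ids = sorted(item_costs)
    bit = {key: b for b, key in enumerate(ids)}
    full = (1 << len(ids)) - 1
    # Every single element counts as a one-element set.
    sets = [(item_costs[k], 1 << bit[k]) for k in ids]
    for cost, members in groups:
        mask = 0
        for k in members:
            if k in bit:            # indices not in the array are skipped
                mask |= 1 << bit[k]
        if mask:
            sets.append((cost, mask))
    inf = float("inf")
    best = [inf] * (full + 1)       # best[m]: cheapest way to cover subset m
    best[0] = 0
    for covered in range(full):
        base = best[covered]
        if base == inf:
            continue
        low = ~covered & (covered + 1)   # lowest element not yet covered
        for cost, mask in sets:
            if mask & low and base + cost < best[covered | mask]:
                best[covered | mask] = base + cost
    return best[full]

# First example from the question
items = {100: 5, 200: 5, 300: 5, 400: 5, 500: 5, 600: 3}
example_groups = [(6, [100, 200, 300, 700]), (5, [300, 400, 500])]
print(min_cover_cost(items, example_groups))  # prints 14
```

The running time grows with 2^N times the number of sets, so this only helps when N is small; for large N with few groups, enumerating the group combinations remains the simpler choice.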

Related

Remove part of a string with coordinates in python

I have a tuple of index ranges such as:
indexes_to_delete=((6,9),(20,22),(2,4))
and a sequence that I can open using Biopython:
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Using indexes_to_delete, I would like to remove the parts from:
6 to 9
20 to 22
and
2 to 4
With the sequence numbered from 1:
A B C D E F G H I J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
removing these coordinates should give a new_sequence:
A E J  K  L  M  N  O  P  Q  R  S  W  X  Y  Z
1 5 10 11 12 13 14 15 16 17 18 19 23 24 25 26

indexes_to_delete=((6,9),(20,22),(2,4))
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
s = ''.join(ch for i, ch in enumerate(Sequence1, 1) if not any(a <= i <= b for a, b in indexes_to_delete))
print(s)
Prints:
AEJKLMNOPQRSWXYZ

Here is another approach using several modules.
from string import ascii_uppercase
from intspan import intspan
from operator import itemgetter
indexes_to_delete=((6,9),(20,22),(2,4))
# add dummy 'a' so count begins with 1 for uppercase letters
array = ['a'] + list(ascii_uppercase)
indexes_to_keep = intspan.from_ranges(indexes_to_delete).complement(low = 1, high=26)
slice_of = itemgetter(*indexes_to_keep)
print(' '.join(slice_of(array)))
print(' '.join(map(str,indexes_to_keep)))
Prints:
A E J K L M N O P Q R S W X Y Z
1 5 10 11 12 13 14 15 16 17 18 19 23 24 25 26

def delete_indexes(sequence, indexes_to_delete):
    # first convert the sequence to a dictionary
    seq_dict = {i+1: sequence[i] for i in range(len(sequence))}
    # collect all the keys that need to be removed
    keys_to_delete = []
    for index_range in indexes_to_delete:
        start, end = index_range
        keys_to_delete += range(start, end+1)
    if not keys_to_delete:
        return seq_dict
    # remove the keys from the original dictionary
    for key in keys_to_delete:
        seq_dict.pop(key)
    return seq_dict
You can use this function to get the new sequence.
new_sequence = delete_indexes(Sequence1, indexes_to_delete)
Of course, the new_sequence is still a python dictionary. You can convert it to list or str, or whatever. For example, to convert it into a str as the old Sequence1:
print(''.join(list(new_sequence.values())))
Out[7]:
AEJKLMNOPQRSWXYZ
You can get their coordinates using new_sequence.keys().

A bit more readable version:
indexes_to_delete=((6,9),(20,22),(2,4))
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
newSequence1 = ""
for idx, char in enumerate(Sequence1):
    for startIndex, endIndex in indexes_to_delete:
        if startIndex <= idx+1 <= endIndex:
            break
    else:
        newSequence1 += char
print(newSequence1)
Prints: AEJKLMNOPQRSWXYZ
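For long sequences, checking every character against every range can become slow. A minimal sketch of an alternative, assuming the ranges are 1-based and inclusive as in the question (remove_ranges is a hypothetical name), sorts the ranges and joins the slices that lie between them:

```python
def remove_ranges(sequence, ranges):
    """Drop the 1-based, inclusive (start, end) ranges from sequence."""
    pieces = []
    cursor = 0  # 0-based index of the first character not yet handled
    for start, end in sorted(ranges):
        if start - 1 > cursor:
            # keep the stretch between the previous range and this one
            pieces.append(sequence[cursor:start - 1])
        cursor = max(cursor, end)  # max() also copes with overlapping ranges
    pieces.append(sequence[cursor:])
    return "".join(pieces)

indexes_to_delete = ((6, 9), (20, 22), (2, 4))
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(remove_ranges(Sequence1, indexes_to_delete))  # prints AEJKLMNOPQRSWXYZ
```

The work here is proportional to the number of ranges plus the length of the result, rather than to the sequence length times the number of ranges.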

Ulam Spiral (for diagonal numbers) writing program python

I am writing code to generate the diagonal numbers of the Ulam spiral, and this is the code I wrote myself:
t = 1
i = 2
H = [1]
while i < 25691 :
    for n in range(4):
        t += i
        H.append(t)
    i += 2
print(H)
The number "25691" in the code is the side length of the spiral. If it were 7, the spiral would contain 49 numbers, and so on.
Here H contains all the numbers on the diagonals. But I wonder whether there is a much faster way to do this.
For example, if I increase the side length by a large amount, it takes forever to calculate the next H.
Code Example:
t = 1
i = 2
H = [1]
for j in range(25000,26000):
    while i < j :
        for n in range(4):
            t += i
            H.append(t)
        i += 2
My computer cannot finish this calculation, so is there a faster way to do this?

You don't need to calculate the intermediate values:
Diagonal, horizontal, and vertical lines in the number spiral correspond to polynomials of the form
f(n) = 4*n*n + b*n + c
where b and c are integer constants.
(Wikipedia)
You can find b and c by solving a linear system of two equations, using two known numbers on the line.
17 16 15 14 13
18  5  4  3 12 ..
19  6  1  2 11 28
20  7  8  9 10 27
21 22 23 24 25 26
Eg for the line 1,2,11,28 etc:
f(0) = 4*0*0+0*b+c = 1 => c = 1
f(1) = 4*1*1+1*b+1 = 2 => 5+b = 2 => b = -3
f(2) = 4*2*2+2*(-3)+1 = 11
f(3) = 4*3*3+3*(-3)+1 = 28
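The calculation above can be sketched in code. The function names (fit_line, line_value, ring_corners, diagonal_numbers) are made up for illustration. The sketch also relies on an observation that is not stated above: the k-th ring around the centre has side length 2k+1, so its four corners follow directly from (2k-1)^2 without a running total.

```python
def fit_line(f0, f1):
    """Solve f(n) = 4*n*n + b*n + c for (b, c), given f(0) and f(1)."""
    c = f0
    b = f1 - 4 - c
    return b, c

def line_value(n, b, c):
    """Evaluate the polynomial for the n-th number on the line."""
    return 4 * n * n + b * n + c

def ring_corners(k):
    """The four corner (diagonal) numbers of the k-th ring, k >= 1."""
    inner = (2 * k - 1) ** 2  # largest number of the previous ring
    return [inner + 2 * k * j for j in range(1, 5)]

def diagonal_numbers(side):
    """All diagonal numbers of a spiral with odd side length."""
    numbers = [1]
    for k in range(1, side // 2 + 1):
        numbers.extend(ring_corners(k))
    return numbers

b, c = fit_line(1, 2)          # the line 1, 2, 11, 28 from the example
print(b, c)                    # prints -3 1
print(line_value(3, b, c))     # prints 28
print(diagonal_numbers(5))     # prints [1, 3, 5, 7, 9, 13, 17, 21, 25]
```

With ring_corners, the corners of one distant ring (for example the outermost ring of a 25691-wide spiral, ring_corners(12845)) can be computed on their own, without building the whole list first.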

How to make a grid with integers in python?

I have the following code which has to print out a board with numbers according to the size the user specified (for instance 3 means a 3 x 3 board):
n = d * d
count = 1
board = []
for i in range(d):
    for j in range(d):
        number = n - count
        if number >= 0 :
            tile = number
            board.append(tile)
        else:
            exit(1)
        count += 1
print(board)
I need to get this in a grid, so that the board is 3 x 3 in size, like this:
8 7 6
5 4 3
2 1 0
What I tried to do is to get each row in a list (so [8 7 6] [5 4.. etc) and then print those lists in a grid. In order to do that, I guess I would have to create an empty list and then add the numbers to that list, stopping after every d, so that each list is the specified length.
I now have a list of the numbers I want, but how do I separate them into a grid?
I would really appreciate any help!

Here is a function that takes the square size and prints the board.
If you need an explanation, don't hesitate to ask.
def my_print_square(d):
    all_ = d * d
    x = list(range(all_))
    x.sort(reverse=True) # x is the list of all values in descending order
    i=0
    while i < all_:
        print(" ".join(map(str, x[i:i+d])))
        i += d
my_print_square(5)
24 23 22 21 20
19 18 17 16 15
14 13 12 11 10
9 8 7 6 5
4 3 2 1 0

By default the print() function adds "\n" to the end of the string you want to print. You can override this by passing in the end argument.
print(string, end=" ")
In this case we are adding a space instead of a line break.
And then we have to print the linebreaks manually with print() at the end of each row.
n = d * d
count = 1
max_len = len(str(n-1))
form = "%" + str(max_len) + "d"
for i in range(d):
    for j in range(d):
        number = n - count
        if number >= 0 :
            tile = number
        else:
            exit(1)
        count += 1
        print(form%(tile), end=" ")
    print()
EDIT: by figuring out the maximum length of the numbers we can adjust the format in which they're printed. This should support any size of board.

You can create the board as a nested list, where each list is a row in the board. Then concatenate them at the end:
def get_board(n):
    # get the numbers
    numbers = [i for i in range(n * n)]
    # create the nested list representing the board
    rev_board = [numbers[i:i+n][::-1] for i in range(0, len(numbers), n)]
    return rev_board
board = get_board(3)
# print each list(row) of the board, from end to start
print('\n'.join(' '.join(str(x) for x in row) for row in reversed(board)))
Which outputs:
8 7 6
5 4 3
2 1 0
If you want to align the numbers for 4 or 5 sized grids, just use a %d format specifier:
board = get_board(4)
for line in reversed(board):
    for number in line:
        print("%2d" % number, end = " ")
    print()
Which gives an aligned grid:
15 14 13 12
11 10  9  8
 7  6  5  4
 3  2  1  0

Improving runtime of truck pick up and drop off

If anyone can help with improving the runtime, that would be great!
I have a truck with a maximum capacity of C and an initial stock of S1. The truck follows a fixed route: Depot --> 1 --> 2 --> ... --> N-1 --> N --> Depot.
Each station i = 1...N has a current stock of Xi items and a target stock of Xi* items. At each station the truck can drop off or pick up items, as far as the situation allows. Let Yi be the number of items left at station i after the truck has visited it. The total cost is TC (as written in the code).
I implemented a dynamic program in which xd is the number of units dropped off or picked up at a station and s is the number of items on the truck:
for -min(c-s, xi) <= xd <= s: f(i,s) = min over xd of [tc(i,xd) + f(i+1, s-xd)], where a negative xd means the truck took items from the station.
This is the code. The problem is that it runs for days without returning an answer.
Does anyone know a better way to implement it?
n = 50
c=10
s1 = 6
xi = [59,33,14,17,26,31,91,68,3,53,53,73,86,24,98,37,55,14,97,61,57,23,65,24,50,31,39,31,24,60,92,80,48,28,47,81,19,82,3,74,50,89,86,37,98,11,12,94,6,61]
x_star = [35,85,51,88,44,20,79,68,97,7,68,19,50,19,42,45,8,9,61,60,80,4,96,57,100,22,2,51,56,100,6,84,96,69,18,31,86,6,39,6,78,73,14,45,100,43,89,4,76,70]
c_plus = [4.6,1.3,2.7,0.5,2.7,5,2.7,2.6,4.1,4,3.2,3.1,4.8,3.1,0.8,1,0.5,5,5,4.6,2.5,4.1,2.1,2.9,1.4,3.9,0.5,1.7,4.9,0.6,2.8,4.9,3.3,4.7,3.6,2.4,3.4,1.5,1.2,0.5,4.3,4.3,3.9,4.8,1.2,4.8,2,2.2,5,4.5]
c_minus = [8.7,7.5,11.7,6.9,11.7,14.4,7.5,11.1,1.2,1.5,12,8.1,2.7,8.7,9.3,1.5,0.3,1.5,1.2,12.3,5.7,0.6,8.7,8.1,0.6,3.9,0.3,5.4,14.7,0,10.8,6.6,8.4,9.9,14.7,2.7,1.2,10.5,9.3,14.7,11.4,5.4,6,13.2,3.6,7.2,3,4.8,9,8.1]
dict={}
values={}
def tc(i,xd):
    yi = xi[i-1] + xd
    if yi>=x_star[i-1]:
        tc = c_plus[i-1]*(yi-x_star[i-1])
    else:
        tc = c_minus[i-1]*(x_star[i-1]-yi)
    return tc
def func(i,s):
    if i==n+1:
        return 0
    else:
        a=[]
        b=[]
        start = min(c-s,xi[i-1])*-1
        for xd in range(start,s+1):
            cost = tc(i,xd)
            f= func(i+1,s-xd)
            a.append(cost+f)
            b.append(xd)
        min_cost = min(a)
        index = a.index(min_cost)
        xd_optimal = b[index]
        if i in values:
            if values[i]>min_cost:
                dict[i] = xd_optimal
                values[i] = min_cost
        else:
            values[i] = min_cost
            dict[i] = xd_optimal
        return min_cost
best_cost = func(1,s1)
print(best_cost)
print(dict)

First, the solution:
The function is called very often with exactly the same parameters. Thus, I added a cache that avoids repeating the calculations for recurring parameter sets. This returns the answer almost instantly on my computer.
cache = {}
def func(i,s):
    if i==n+1:
        return 0
    else:
        try:
            return cache[(i,s)]
        except KeyError:
            pass
        a=[]
        ...
        cache[(i,s)] = min_cost
        return min_cost
And here is how I found out what to do...
I modified your code to produce some debug output:
...
count = 0
def func(i,s):
    global count
    count += 1
    print(count, ':', i, s)
    ...
Setting n to 2 results in the following output:
1 : 1 6
2 : 2 10
3 : 3 10
4 : 3 9
5 : 3 8
6 : 3 7
7 : 3 6
8 : 3 5
9 : 3 4
10 : 3 3
11 : 3 2
12 : 3 1
13 : 3 0
14 : 2 9
15 : 3 10
16 : 3 9
17 : 3 8
18 : 3 7
19 : 3 6
20 : 3 5
21 : 3 4
22 : 3 3
23 : 3 2
24 : 3 1
25 : 3 0
26 : 2 8
27 : 3 10
28 : 3 9
29 : 3 8
30 : 3 7
31 : 3 6
32 : 3 5
...
You will notice that the function is called very often with the same set of parameters.
After (i=2, s=10) it runs through all combinations of (i=3, s=x). It does that again after (i=2, s=9). The whole thing finishes after 133 recursions. Setting n=3 takes 1464 recursions, and setting n=4 takes 16105 recursions. You can see where that leads to...
Remark: I have absolutely no idea how your optimization works. Instead I simply treated the symptoms :)
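The same caching idea can also be written with functools.lru_cache from the standard library. The following is a minimal self-contained sketch, not the original code: it uses 0-based station indices, the function and parameter names are made up, and it only returns the minimal cost without recording the chosen xd values.

```python
from functools import lru_cache

def min_total_cost(capacity, start_load, stock, target, c_plus, c_minus):
    """Minimal total cost over all stations (hypothetical helper)."""
    n = len(stock)

    def station_cost(i, xd):
        left = stock[i] + xd  # items at station i after the visit
        if left >= target[i]:
            return c_plus[i] * (left - target[i])
        return c_minus[i] * (target[i] - left)

    @lru_cache(maxsize=None)  # each (i, load) pair is solved only once
    def best(i, load):
        if i == n:
            return 0
        lowest = -min(capacity - load, stock[i])  # negative xd: truck picks up
        return min(station_cost(i, xd) + best(i + 1, load - xd)
                   for xd in range(lowest, load + 1))

    return best(0, start_load)

# Tiny made-up instance: empty truck, two stations
print(min_total_cost(10, 0, [10, 0], [4, 10], [1, 1], [1, 3]))  # prints 4
```

With the cache, the number of distinct calls is bounded by the number of stations times (capacity + 1), which is why the answer arrives almost instantly.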

Binning values into groups with a minimum size using pandas

I'm trying to bin a sample of observations into n discrete groups, then combine these groups until each subgroup has a minimum of 6 members. So far, I've generated bins, and grouped my DataFrame into them:
# df is a DataFrame containing 135 measurements
bins = np.linspace(df.heights.min(), df.heights.max(), 21)
grp = df.groupby(np.digitize(df.heights, bins))
grp.size()
1 4
2 1
3 2
4 3
5 2
6 8
7 7
8 6
9 19
10 12
11 13
12 12
13 7
14 12
15 12
16 2
17 3
18 6
19 3
21 1
So I can see that I need to combine groups 1 - 3, 3 - 5, and 16 - 21, while leaving the others intact, but I don't know how to do this programmatically.

You can do this:
df = pd.DataFrame(np.random.random_integers(1,200,135), columns=['heights'])
bins = np.linspace(df.heights.min(), df.heights.max(), 21)
grp = df.groupby(np.digitize(df.heights, bins))
sizes = grp.size()
def f(vals, max):
    sum = 0
    group = 1
    for v in vals:
        sum += v
        if sum <= max:
            yield group
        else:
            group +=1
            sum = v
            yield group
#I've changed 6 by 30 for the example cause I don't have your original dataset
grp.size().groupby([g for g in f(sizes, 30)])
And if you print grp.size().groupby([g for g in f(sizes, 30)]).cumsum(), you will see that the cumulative sums are grouped as expected.
Also if you want to group the original values you can do something like:
dat = np.random.random_integers(0,200,135)
dat = np.array([78,116,146,111,147,78,14,91,196,92,163,144,107,182,58,89,77,134,
83,126,94,70,121,175,174,88,90,42,93,131,91,175,135,8,142,166,
1,112,25,34,119,13,95,182,178,200,97,8,60,189,49,94,191,81,
56,131,30,107,16,48,58,65,78,8,0,11,45,179,151,130,35,64,
143,33,49,25,139,20,53,55,20,3,63,119,153,14,81,93,62,162,
46,29,84,4,186,66,90,174,55,48,172,83,173,167,66,4,197,175,
184,20,23,161,70,153,173,127,51,186,114,27,177,96,93,105,169,158,
83,155,161,29,197,143,122,72,60])
df = pd.DataFrame({'heights':dat})
bins = np.digitize(dat,np.linspace(0,200,21))
grp = df.heights.groupby(bins)
m = 15 #you should put 6 here, the minimum
s = 0
c = 1
def f(x):
    global c,s
    res = pd.Series([c]*x.size,index=x.index)
    s += x.size
    if s>m:
        s = 0
        c += 1
    return res
g = grp.apply(f)
print(df.groupby(g).size())
#another way of doing the same, just a matter of taste
m = 15 #you should put 6 here, the minimum
s = 0
c = 1
def f2(x):
    global c,s
    res = [c]*x.size #here is the main difference with f
    s += x.size
    if s>m:
        s = 0
        c += 1
    return res
g = grp.transform(f2) #call it this way
print(df.groupby(g).size())
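The merging rule itself can be checked without pandas. The sketch below is only one possible rule, not the code above: it walks the bins from left to right, closes a group as soon as it holds at least min_size members, and folds any undersized remainder into the last closed group. The name merge_small_bins is made up, and the sizes are the ones printed in the question.

```python
def merge_small_bins(labels, sizes, min_size):
    """Merge adjacent bins until every group has at least min_size members."""
    groups = []  # list of (labels_in_group, total_size)
    cur_labels, cur_total = [], 0
    for label, size in zip(labels, sizes):
        cur_labels.append(label)
        cur_total += size
        if cur_total >= min_size:
            groups.append((cur_labels, cur_total))
            cur_labels, cur_total = [], 0
    if cur_labels:  # undersized remainder at the end
        if groups:
            last_labels, last_total = groups.pop()
            groups.append((last_labels + cur_labels, last_total + cur_total))
        else:
            groups.append((cur_labels, cur_total))
    return groups

bin_labels = list(range(1, 20)) + [21]
bin_sizes = [4, 1, 2, 3, 2, 8, 7, 6, 19, 12, 13, 12, 7, 12, 12, 2, 3, 6, 3, 1]
for group_labels, total in merge_small_bins(bin_labels, bin_sizes, 6):
    print(group_labels, total)
```

The group number of each original bin can then be used as the key for a second groupby, in the same way as g above.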
