I have this Numpy code:
def uniq(seq):
    """
    Like Unix tool uniq. Removes repeated entries.
    :param seq: numpy.array. (time,) -> label
    :return: seq
    """
    diffs = np.ones_like(seq)
    diffs[1:] = seq[1:] - seq[:-1]
    idx = diffs.nonzero()
    return seq[idx]
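For reference, a quick check of the 1-D version (a minimal example, assuming the usual import numpy as np):

import numpy as np

# uniq() as defined above
seq = np.array([1, 1, 2, 3, 3, 4, 0])
print(uniq(seq))  # -> [1 2 3 4 0]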
Now, I want to extend this to support 2D arrays and make it use Theano. It should be fast on the GPU.
I will get an array with multiple sequences as multiple batches in the format (time,batch), and a time_mask which specifies indirectly the length of each sequence.
My current try:
def uniq_with_lengths(seq, time_mask):
    # seq is (time,batch) -> label
    # time_mask is (time,batch) -> 0 or 1
    num_batches = seq.shape[1]
    diffs = T.ones_like(seq)
    diffs = T.set_subtensor(diffs[1:], seq[1:] - seq[:-1])
    time_range = T.arange(seq.shape[0]).dimshuffle([0] + ['x'] * (seq.ndim - 1))
    idx = T.switch(T.neq(diffs, 0) * time_mask, time_range, -1)
    seq_lens = T.sum(T.ge(idx, 0), axis=0)  # (batch,) -> len
    max_seq_len = T.max(seq_lens)

    # I don't know any better way without scan.
    def step(batch_idx, out_seq_b1):
        out_seq = seq[T.ge(idx[:, batch_idx], 0).nonzero(), batch_idx][0]
        return T.concatenate((out_seq, T.zeros((max_seq_len - out_seq.shape[0],), dtype=seq.dtype)))

    out_seqs, _ = theano.scan(
        step,
        sequences=[T.arange(num_batches)],
        outputs_info=[T.zeros((max_seq_len,), dtype=seq.dtype)]
    )
    # out_seqs is (batch,max_seq_len)
    return out_seqs.T, seq_lens
How can I construct out_seqs directly?
I would do something like out_seqs = seq[idx] but I'm not exactly sure how to express that.
Here's a quick answer that only addresses part of your task:
def compile_theano_uniq(x):
    diffs = x[1:] - x[:-1]
    diffs = tt.concatenate([tt.ones_like([x[0]], dtype=diffs.dtype), diffs])
    y = x[diffs.nonzero()]
    return theano.function(inputs=[x], outputs=y)

theano_uniq = compile_theano_uniq(tt.vector(dtype='int32'))
The key is nonzero(): it gives a symbolic, variable-length index vector, so the output length does not need to be known in advance.
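For example (a quick check; assumes import theano, import theano.tensor as tt and import numpy as np):

import numpy as np

# theano_uniq compiled as above
print(theano_uniq(np.array([1, 1, 2, 3, 3, 4, 0], dtype='int32')))
# -> [1 2 3 4 0]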
Update: I can't imagine any way to do this without using theano.scan. To be clear, and using 0 as padding, I'm assuming that given the input
1 1 2 3 3 4 0
1 2 2 2 3 3 4
1 2 3 4 5 0 0
you would want the output to be
1 2 3 4 0 0 0
1 2 3 4 0 0 0
1 2 3 4 5 0 0
or even
1 2 3 4 0
1 2 3 4 0
1 2 3 4 5
You could identify the indexes of the items you want to keep without using scan. Then either a new tensor needs to be constructed from scratch, or the values you want to keep somehow moved to make the sequences contiguous. Neither approach seems feasible without theano.scan.
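For what it's worth, the scan-free half (finding the keep indexes and the lengths) can be sketched in plain numpy; the layout and 0-padding follow the example above, and deriving the mask from the padding is my assumption:

import numpy as np

# (time, batch): the columns are the three example sequences
seq = np.array([[1, 1, 1],
                [1, 2, 2],
                [2, 2, 3],
                [3, 2, 4],
                [3, 3, 5],
                [4, 3, 0],
                [0, 4, 0]])
time_mask = seq != 0             # assumes 0 marks padding
diffs = np.ones_like(seq)
diffs[1:] = seq[1:] - seq[:-1]
keep = (diffs != 0) & time_mask  # entries that start a new run
seq_lens = keep.sum(axis=0)
print(seq_lens)                  # -> [4 4 5]

Compacting seq[keep] into a dense (batch, max_len) array is the step with no obvious vectorized equivalent, hence the scan.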
This is my Dijkstra implementation. It's passing all the cases in the pytest input.n.txt files, but when I submit to the grading software (which doesn't provide the test or any output) I get an invalid result.
Here's my solution (passes all provided test cases, but not hidden ones).
# Uses python3
import queue
import sys
from math import inf

def dijkstra(adj, cost, s, t):
    seen = set([s])
    dist = [inf] * len(adj)
    dist[s] = 0
    prev = [None] * len(adj)
    prev[s] = s
    q = queue.Queue()
    q.put(s)
    while not q.empty():
        n = q.get()
        # print(n)
        edges = []
        for i, adjacent in enumerate(adj[n]):
            edges.append([adjacent, cost[n][i]])
        for i, edge in enumerate(edges):
            d = dist[n] + edge[1]
            if d < dist[edge[0]]:
                dist[edge[0]] = d
                edge[1] = d
                prev[edge[0]] = n
        edges = sorted(edges, key=lambda x: x[1])
        for (e, w) in edges:
            if not e in seen:
                seen.add(e)
                q.put(e)
    # print(dist)
    # print(prev)
    return dist[t] if dist[t] is not inf else -1

def parse(input):
    data = list(map(int, input.split()))
    n, m = data[0:2]
    data = data[2:]
    edges = list(zip(zip(data[0 : (3 * m) : 3], data[1 : (3 * m) : 3]), data[2 : (3 * m) : 3]))
    data = data[3 * m :]
    adj = [[] for _ in range(n)]
    cost = [[] for _ in range(n)]
    for ((a, b), w) in edges:
        adj[a - 1].append(b - 1)
        cost[a - 1].append(w)
    s, t = data[0] - 1, data[1] - 1
    return dijkstra(adj, cost, s, t)

if __name__ == "__main__":
    input = sys.stdin.read()
    print(parse(input))

def test_parse():
    assert 3 == parse(open("input.txt").read())
    assert 6 == parse(open("input.1.txt").read())
    assert -1 == parse(open("input.2.txt").read())
    assert 3 == parse(open("input.3.txt").read())
    assert 0 == parse(open("input.4.txt").read())
    assert 0 == parse(open("input.5.txt").read())
The format of the input is as follows...
number_of_vertices number_of_edges
from to weight
from to weight
start end
input.txt
4 4
1 2 1
4 1 2
2 3 2
1 3 5
1 3
input.1.txt
5 9
1 2 4
1 3 2
2 3 2
3 2 1
2 4 2
3 5 4
5 4 1
2 5 3
3 4 4
1 5
input.2.txt
3 3
1 2 7
1 3 5
2 3 2
3 2
input.3.txt
5 5
1 2 1
1 3 2
2 3 1
2 4 6
3 4 1
1 4
input.4.txt
5 6
1 2 1
1 3 2
2 3 1
2 4 6
3 4 1
1 1 2
1 1
input.5.txt
4 4
1 2 1
2 3 1
3 4 1
4 1 1
1 1
My program passes ALL of these. And I've tried messing around with all the edge cases I can think of testing but still it fails with a "Wrong answer" error in the testing software.
A comment from the thread, by somebody who DID solve it:
Wow! I really struggled to put this one together, not because I didn't
understand the Dijkstra algorithm but because of the difficulty in
adjusting the priority of an item already added to a Python
PriorityQueue class (whose use was implied by importing queue in the
start code) or by keeping track of its position in the priority queue,
which made translating the algorithm, as presented in the lectures,
verbatim difficult.
In case it is helpful to others, the way I got around this was to move
from thinking in terms of inserting vertices to the priority queue to
inserting references to the vertices, with most updated distance at
the time of insertion as the priority, instead. That way we don't need
to adjust the priority of an item already added to the queue at a
later time.
We may end up inserting several references to the same vertex to the
queue, but we will, of course, encounter the reference with the least
distance first, and we can ignore any future references to the same
vertex that we might encounter afterwards. Further, we can abort the
algorithm as soon as we've popped a reference to the destination
vertex.
This still runs pretty efficiently (for me, a maximum time of about a
twentieth of that allowed), and is, in retrospect, a small adjustment
in viewing the problem.
Your algorithm uses a FIFO queue; Dijkstra's algorithm does not use a FIFO queue.
At each iteration you must select the unconfirmed vertex with the shortest path distance. This can be done using a min-priority queue, where the path distance is the priority, but note also that each vertex may have to be added to the priority queue more than once if it is discovered via different paths of different distances. (Your classmate initially tried to do this the hard way - by updating the priority of a vertex already in the priority queue, instead of just allowing each vertex to be present in the priority queue multiple times.)
So your algorithm is not a proper implementation of Dijkstra's algorithm, because it confirms the vertices in the order they are discovered, rather than in order of path distance from the source vertex.
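A minimal sketch of that scheme using heapq (an illustration of the lazy-deletion idea with your adj/cost layout, not the grader's reference solution):

import heapq
from math import inf

def dijkstra(adj, cost, s, t):
    dist = [inf] * len(adj)
    dist[s] = 0
    pq = [(0, s)]  # (distance, vertex); a vertex may be pushed several times
    while pq:
        d, n = heapq.heappop(pq)
        if n == t:
            return d   # the first pop of t is the shortest distance
        if d > dist[n]:
            continue   # stale entry: a shorter path was confirmed earlier
        for v, w in zip(adj[n], cost[n]):
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return -1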
I have a matrix as shown below (read from a txt file given as an argument), and every cell has neighbors. Once you pick a cell, that cell and all neighboring cells containing the same number disappear.
1 0 4 7 6 8
0 5 4 4 5 5
2 1 4 4 4 6
4 1 3 7 4 4
I've tried to do this using recursion. I split the function into four parts: up(), down(), left() and right(). But I got an error message: RecursionError: maximum recursion depth exceeded in comparison
cmd=input("Row,column:")
cmdlist=cmd.split(",")
row,column=int(cmdlist[0]),int(cmdlist[1])
num=lines[row-1][column-1]

def up(x,y):
    if lines[x-2][y-1]==num and x>1:
        left(x,y)
        right(x,y)
        lines[x-2][y-1]=None

def left(x,y):
    if lines[x-1][y-2]==num and y>1:
        up(x,y)
        down(x,y)
        lines[x-1][y-2]=None

def right(x,y):
    if lines[x-1][y]==num and y<len(lines[row-1]):
        up(x,y)
        down(x,y)
        lines[x-1][y]=None

def down(x,y):
    if lines[x][y-1]==num and x<len(lines):
        left(x,y)
        right(x,y)
        lines[x][y-1]=None

up(row,column)
down(row,column)
for i in lines:
    print(str(i).strip("[]").replace(",","").replace("None"," "))
When I give the input (3,3), which picks one of the "4"s, the output should look like this:
1 0 7 6 8
0 5 5 5
2 1 6
4 1 3 7
I don't need fixed code, just the main idea will be enough. Thanks a lot.
A RecursionError happens when your recursion does not terminate.
You can solve this without recursion, using sets of indexes:
collect all indexes that contain the looked-for number into all_num_idx
add the index you are currently at (your input) to a set tbd (to be deleted)
loop over tbd and add every index from all_num_idx that differs by +1/-1 in row or column from an index that is already in the set
repeat until tbd no longer grows
delete all the indexes in tbd:
t = """4 0 4 7 6 8
0 5 4 4 5 5
2 1 4 4 4 6
4 1 3 7 4 4"""
data = [k.strip().split() for k in t.splitlines()]
row,column=map(int,input("Row,column:").strip().split(";"))
num = data[row][column]
len_r =len(data)
len_c = len(data[0])
all_num_idx = set((r,c) for r in range(len_r) for c in range(len_c) if data[r][c]==num)
tbd = set( [ (row,column)] ) # inital field
tbd_size = 0 # different size to enter while
done = set() # we processed those already
while len(tbd) != tbd_size: # loop while growing
tbd_size=len(tbd)
for t in tbd:
if t in done:
continue
# only 4-piece neighbourhood +1 or -1 in one direction
poss_neighbours = set( [(t[0]+1,t[1]), (t[0],t[1]+1),
(t[0]-1,t[1]), (t[0],t[1]-1)] )
# 8-way neighbourhood with diagonals
# poss_neighbours = set((t[0]+a,t[1]+b) for a in range(-1,2) for b in range(-1,2))
tbd = tbd.union( poss_neighbours & all_num_idx)
# reduce all_num_idx by all those that we already addded
all_num_idx -= tbd
done.add(t)
# delete the indexes we collected
for r,c in tbd:
data[r][c]=None
# output
for line in data:
print(*(c or " " for c in line) , sep=" ")
Output:
Row,column: 3,4
4 0 7 6 8
0 5 5 5
2 1 6
4 1 3 7
This is a variant of a flood fill algorithm, flooding only cells of a certain value. See https://en.wikipedia.org/wiki/Flood_fill
Maybe you should replace

def right(x,y):
    if lines[x-1][y]==num and y<len(lines[row-1]):
        up(x,y)
        down(x,y)
        lines[x-1][y]=None

by

def right(x,y):
    if y<len(lines[row-1]) and lines[x-1][y]==num:
        lines[x-1][y]=None
        up(x,y+1)
        down(x,y+1)
        right(x,y+1)

and do the same for all the other functions.
Setting lines[x-1][y]=None before recursing ensures that your algorithm stops, and shifting the indices ensures that the next step starts from the neighbouring cell; checking the bounds before indexing also avoids an IndexError at the edges.
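If recursion keeps hitting the limit even after that fix, the same mark-then-visit idea also works iteratively with an explicit stack (a sketch; clear_region is a name I made up, using 0-based indices):

def clear_region(lines, row, col):
    # iterative flood fill: clear the picked cell and every 4-connected
    # cell holding the same value
    num = lines[row][col]
    stack = [(row, col)]
    while stack:
        x, y = stack.pop()
        if 0 <= x < len(lines) and 0 <= y < len(lines[x]) and lines[x][y] == num:
            lines[x][y] = None  # mark first so a cell is never revisited
            stack.extend([(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)])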
In Python 2.7 I need a method that returns all possible products of a list or tuple of ints, i.e. if the input is (2, 2, 3, 4), then I'd want an output like
(3, 4, 4), 2 * 2 = 4
(2, 4, 6), 2 * 3 = 6
(2, 3, 8), 2 * 4 = 8
(2, 2, 12), 3 * 4 = 12
(2, 24), 2 * 3 * 4 = 24
(3, 16), 2 * 2 * 4 = 16
(4, 12), 2 * 2 * 3 = 12
(48), 2 * 2 * 3 * 4 = 48
wrapped up in a list or tuple. I figure that a nice implementation is probably possible using combinations from itertools, but I'd appreciate any help. Note that I am only interested in distinct lists, where order of int plays no role.
EDIT
Some further explanation for clarification. Take the first output list. The input is (2, 2, 3, 4) (always). Then I take 2 and 2 out of the list and multiply them, so now I am left with the list (3, 4, 4): 3 and 4 from the input, and the last 4 from the product.
I haven't tried anything yet since I just can't spin my head around that kind of loop. But I can't stop thinking about the problem, so I'll add some code if I do get a suggestion.
Your problem is basically one of finding all subsets of a given set (a multiset in your case). Once you have the subsets, it's straightforward to construct the output you've asked for.
For a set A, find all the subsets [S0, S1, ..., Si]. For each subset Si, take (A - Si) | product(Si), where | is union and - is set difference. You might not be interested in subsets of size 0 and 1, so you can just exclude those.
Finding subsets is a well known problem so I'm sure you can find resources on how to do that. Keep in mind that there are 2**N subsets of a set with N elements.
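A minimal sketch of that approach (subset_products is a name I made up; iterating over index subsets with itertools.combinations makes the duplicate 2s behave as a multiset):

import operator
from functools import reduce  # reduce is a builtin in Python 2.7
from itertools import combinations

def subset_products(nums):
    # for every index subset of size >= 2, replace those elements
    # by their product and keep the rest: (A - Si) | product(Si)
    results = set()
    indices = range(len(nums))
    for size in range(2, len(nums) + 1):
        for chosen in combinations(indices, size):
            rest = [nums[i] for i in indices if i not in chosen]
            prod = reduce(operator.mul, (nums[i] for i in chosen))
            results.add(tuple(sorted(rest + [prod])))
    return results

print(subset_products((2, 2, 3, 4)))
# {(2, 2, 12), (2, 24), (48,), (3, 16), (2, 4, 6), (4, 12), (3, 4, 4), (2, 3, 8)}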
Suppose you have a vector of 4 numbers (for instance (2,2,3,4)).
You can generate a grid (like the one shown below):
0 0 0 0
0 0 0 1
0 0 1 0
0 0 1 1
0 1 0 0
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 0
1 0 0 1
1 0 1 0
1 0 1 1
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 1
Now remove the rows with all '0' and the rows with only one '1'.
0 0 1 1
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 1
1 0 1 0
1 0 1 1
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 1
Now you can substitute the '1' with the respective element in the vector.
If your vector is (2,2,3,4) it becomes:
0 0 3 4
0 2 0 4
0 2 3 0
0 2 3 4
2 0 0 4
2 0 3 0
2 0 3 4
2 2 0 0
2 2 0 4
2 2 3 0
2 2 3 4
Try to implement this in Python.
Below is some pseudocode:

for i from 0 to 2^VECTOR_LEN:
    bin = convert_to_binary(i)
    if sum_binary_digit(bin) > 1:
        print(exec_multiplication(bin, vector))
        # If you want, you can also use the bin vector as a mask for updating
        # your tuple of ints with the result of the product, and append it
        # to a list (as in your example).
        # For example, if bin is (1 1 0 0) you can turn (2 2 3 4) into (4 3 4)
        # and append (4 3 4) to the list, or if it is (1 0 1 0) you can
        # turn (2 2 3 4) into (6 2 4)

WHERE:
vector: the vector containing the numbers
VECTOR_LEN: the length of vector
convert_to_binary(num): a function that converts an integer (num) to binary
sum_binary_digit(bin): a function that sums the 1s in a binary number (bin)
exec_multiplication(vector, bin): takes the vector (vector) and the binary mask (bin) and returns the product of the selected elements
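One possible rendering of this pseudocode in Python (a sketch; the helper names above collapse into plain bit twiddling, and I collect results in a set to keep only distinct tuples):

import operator
from functools import reduce  # reduce is a builtin in Python 2.7

vector = (2, 2, 3, 4)
results = set()
for i in range(1, 2 ** len(vector)):
    mask = [(i >> k) & 1 for k in range(len(vector))]  # one 'grid row' per i
    if sum(mask) > 1:  # skip rows with zero or one '1'
        chosen = [v for v, m in zip(vector, mask) if m]
        rest = [v for v, m in zip(vector, mask) if not m]
        results.add(tuple(sorted(rest + [reduce(operator.mul, chosen)])))
print(results)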
I can't give you the algorithm (as I don't know it myself), but there is a lib which can achieve this task...
Looking at your given input numbers, they seem to be factors, so if we multiply all of these factors we get a number (say x); now using sympy, we can get all of the divisors of that number:

import numpy
from sympy import divisors

ls = [2, 2, 3, 4]
x = numpy.prod(ls)
divisors_x = divisors(x)

Here you go! divisors_x is the list.
You can break this down into three steps:
get all the permutations of the list of numbers
for each of those permutations, create all the possible partitions
for each sublist in the partitions, calculate the product
For the permutations, you can use itertools.permutations, but as far as I know, there is no builtin function for partitions, but that's not too difficult to write (or to find):
def partitions(lst):
    if lst:
        for i in range(1, len(lst) + 1):
            for p in partitions(lst[i:]):
                yield [lst[:i]] + p
    else:
        yield []
For a list like (1,2,3,4), this will generate [(1),(2),(3),(4)], [(1),(2),(3,4)], [(1),(2,3),(4)], [(1),(2,3,4)], and so on, but not, e.g. [(1,3),(2),(4)]; that's why we also need the permutations. However, for all the permutations, this will create many partitions that are effectively duplicates, like [(1,2),(3,4)] and [(4,3),(1,2)] (182 for your data), but unless your lists are particularly long, this should not be too much of a problem.
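For instance, a quick check of the generator above:

print(list(partitions([1, 2, 3])))
# [[[1], [2], [3]], [[1], [2, 3]], [[1, 2], [3]], [[1, 2, 3]]]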
We can combine the second and third step; this way we can weed out all the duplicates as soon as they arise:
import itertools
import operator  # reduce is a builtin in Python 2.7

data = (2, 2, 3, 4)
res = {tuple(sorted(reduce(operator.mul, lst) for lst in partition))
       for permutation in itertools.permutations(data)
       for partition in partitions(permutation)}
Afterwards, res is {(6, 8), (2, 4, 6), (2, 2, 3, 4), (2, 2, 12), (48,), (3, 4, 4), (4, 12), (3, 16), (2, 24), (2, 3, 8)}
Alternatively, you can combine it all into one, slightly more complex algorithm. This still generates some duplicates, due to the two 2s in your data set, which can again be removed by sorting and collecting in a set. The result is the same as above.
def all_partitions(lst):
    if lst:
        x = lst[0]
        for partition in all_partitions(lst[1:]):
            # x can either be a partition itself...
            yield [x] + partition
            # ... or part of any of the other partitions
            for i, _ in enumerate(partition):
                partition[i] *= x
                yield partition
                partition[i] //= x
    else:
        yield []
res = set(tuple(sorted(x)) for x in all_partitions(list(data)))
Working on a project for CS1 that prints out a grid made of 0s and adds shapes of certain numbered sizes to it. Before it adds a shape, it needs to check A) whether it will fit on the grid and B) whether something else is already there. The issue I am having is that, when run, the function that checks that placement is valid handles the first and second shapes correctly, but any shape added after that only "sees" the first shape added when looking for a collision. I checked whether it wasn't taking in the right list after the first time, but that doesn't seem to be it. Example of the issue...
Shape Sizes = 4, 3, 2, 1
Python Outputs:
4 4 4 4 1 2 3 0
4 4 4 4 2 2 3 0
4 4 4 4 3 3 3 0
4 4 4 4 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
It Should Output:
4 4 4 4 3 3 3 1
4 4 4 4 3 3 3 0
4 4 4 4 3 3 3 0
4 4 4 4 2 2 0 0
0 0 0 0 2 2 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
What's going on here? Full Code is below...
def binCreate(size):
    binlist = [[0 for col in range(size)] for row in range(size)]
    return binlist

def binPrint(lst):
    for row in range(len(lst)):
        for col in range(len(lst[row])):
            print(lst[row][col], end = " ")
        print()

def itemCreate(fileName):
    lst = []
    for i in open(fileName):
        i = i.split()
        lst = i
    lst = [int(i) for i in lst]
    return lst

def main():
    size = int(input("Bin Size: "))
    fileName = str(input("Item Size File: "))
    binList = binCreate(size)
    blockList = itemCreate(fileName)
    blockList.sort(reverse = True)
    binList = checker(binList, len(binList), blockList)
    binPrint(binList)

def isSpaceFree(binList, r, c, size):
    if r + size > len(binList[0]):
        return False
    elif c + size > len(binList[0]):
        return False
    for row in range(r, r + size):
        for col in range(c, c + size):
            if binList[r][c] != 0:
                return False
            elif binList[r][c] == size:
                return False
    return True

def checker(binList, gSize, blockList):
    for i in blockList:
        r = 0
        c = 0
        comp = False
        while comp != True:
            check = isSpaceFree(binList, r, c, i)
            if check == True:
                for x in range(c, c + i):
                    for y in range(r, r + i):
                        binList[x][y] = i
                comp = True
            else:
                print(c)
                print(r)
                r += 1
                if r > gSize:
                    r = 0
                    c += 1
                    if c > gSize:
                        print("Imcompadible")
                        comp = True
        print(i)
        binPrint(binList)
        input()
    return binList
Your code to test for open spaces looks in binList[r][c] (where r is a row value and c is a column value). However, the code that sets the values once an open space has been found sets binList[x][y] (where x is a column value and y is a row value).
The latter is wrong. You want to set binList[y][x] instead (indexing by row, then column).
That will get you a working solution, but it will still not be exactly what you say you expect (you'll get a reflection across the diagonal). This is because your code updates r first, then c only when r has exceeded the bin size. If you want to place items to the right first, then below, you need to swap them.
I'd suggest using two for loops for r and c rather than a while, too, but to make that work elegantly you'd probably want to factor out the "find one item's place" code so you can return from the inner loop (rather than needing some complicated code to break out of both nested loops).
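A sketch of that refactoring (placeOne is a name I made up; it folds in a corrected free-space check, indexes row-first, and scans to the right first, then down):

def placeOne(binList, gSize, size):
    # scan positions row by row: to the right first, then below
    for r in range(gSize - size + 1):
        for c in range(gSize - size + 1):
            if all(binList[y][x] == 0
                   for y in range(r, r + size)
                   for x in range(c, c + size)):
                for y in range(r, r + size):
                    for x in range(c, c + size):
                        binList[y][x] = size  # row index first, then column
                return True
    return False

checker then reduces to calling placeOne once per block and reporting any block that doesn't fit.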
I have two arrays (a and b) with n integer elements in the range (0,N).
typo: arrays with 2^n integers where the largest integer takes the value N = 3^n
I want to calculate the sum of every combination of elements in a and b (sum_ij = a_i + b_j for all i, j), then take the modulus N (sum_ij = sum_ij % N), and finally calculate the frequency of the different sums.
In order to do this fast with numpy, without any loops, I tried to use the meshgrid and the bincount function.
import numpy

A, B = numpy.meshgrid(a, b)
A = A + B
A = A % N
A = numpy.reshape(A, A.size)
result = numpy.bincount(A)
Now, the problem is that my input arrays are long. And meshgrid gives me MemoryError when I use inputs with 2^13 elements. I would like to calculate this for arrays with 2^15-2^20 elements.
that is n in the range 15 to 20
Are there any clever tricks to do this with numpy?
Any help will be highly appreciated.
--
jon
Try chunking it. Your meshgrid is an n-by-n matrix: block it up into a 10x10 grid of (n/10)-by-(n/10) submatrices, compute the bincount of each block, and add the 100 partial counts up at the end. Each block only uses ~1% as much memory as doing the whole thing.
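A sketch of that idea (chunked_bincount is a name I made up; it slices b instead of literal 10x10 blocking, which comes to the same thing, and assumes a, b, N as in the question):

import numpy as np

def chunked_bincount(a, b, N, chunk=1024):
    # peak memory is roughly chunk * len(a) elements, so pick chunk to fit RAM
    counts = np.zeros(N, dtype=np.int64)
    for start in range(0, len(b), chunk):
        block = (a[np.newaxis, :] + b[start:start + chunk, np.newaxis]) % N
        counts += np.bincount(block.ravel(), minlength=N)
    return counts

counts[k] then holds the number of pairs (i, j) with (a[i] + b[j]) % N == k.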
Edit in response to jonalm's comment:
jonalm: N~3^n not n~3^N. N is max element in a and n is number of
elements in a.
n is ~ 2^20. If N is ~ 3^n then N is ~ 3^(2^20) > 10^(500207).
Scientists estimate (http://www.stormloader.com/ajy/reallife.html) that there are only around 10^87 particles in the universe. So there is no (naive) way a computer can handle an int of size 10^(500207).
jonalm: I am however a bit curious about the pv() function you define. (I
do not manage to run it as text.find() is not defined (guess it's in another
module)). How does this function work and what is its advantage?
pv is a little helper function I wrote to debug the value of variables. It works like
print() except when you say pv(x) it prints both the literal variable name (or expression string), a colon, and then the variable's value.
If you put
#!/usr/bin/env python
import traceback

def pv(var):
    (filename,line_number,function_name,text)=traceback.extract_stack()[-2]
    print('%s: %s'%(text[text.find('(')+1:-1],var))

x=1
pv(x)
in a script you should get
x: 1
The modest advantage of using pv over print is that it saves you typing. Instead of having to
write
print('x: %s'%x)
you can just slap down
pv(x)
When there are multiple variables to track, it's helpful to label the variables.
I just got tired of writing it all out.
The pv function works by using the traceback module to peek at the line of code
used to call the pv function itself. (See http://docs.python.org/library/traceback.html#module-traceback) That line of code is stored as a string in the variable text.
text.find() is a call to the usual string method find(). For instance, if
text='pv(x)'
then
text.find('(') == 2 # The index of the '(' in string text
text[text.find('(')+1:-1] == 'x' # Everything in between the parentheses
I'm assuming n ~ N^3 (so N ~ n^(1/3), as in the code below), with n ~ 2**20.
The idea is to work modulo N. This cuts down on the size of the arrays.
The second idea (important when n is huge) is to use numpy ndarrays of 'object' dtype, because if you use an integer dtype you run the risk of overflowing the maximum allowed integer.
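A tiny aside showing the difference (the int64 result wraps around; exact behaviour may vary slightly across numpy versions):

import numpy as np

a = np.array([2 ** 62], dtype=np.int64)
print(a + a)  # wraps around: [-9223372036854775808]

b = np.array([2 ** 62], dtype=object)
print(b + b)  # exact Python integers: [9223372036854775808]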
#!/usr/bin/env python
import traceback
import numpy as np

def pv(var):
    (filename,line_number,function_name,text)=traceback.extract_stack()[-2]
    print('%s: %s'%(text[text.find('(')+1:-1],var))
You can change n to be 2**20, but below I show what happens with small n
so the output is easier to read.
n=100
N=int(np.exp(1./3*np.log(n)))
pv(N)
# N: 4
a=np.random.randint(N,size=n)
b=np.random.randint(N,size=n)
pv(a)
pv(b)
# a: [1 0 3 0 1 0 1 2 0 2 1 3 1 0 1 2 2 0 2 3 3 3 1 0 1 1 2 0 1 2 3 1 2 1 0 0 3
# 1 3 2 3 2 1 1 2 2 0 3 0 2 0 0 2 2 1 3 0 2 1 0 2 3 1 0 1 1 0 1 3 0 2 2 0 2
# 0 2 3 0 2 0 1 1 3 2 2 3 2 0 3 1 1 1 1 2 3 3 2 2 3 1]
# b: [1 3 2 1 1 2 1 1 1 3 0 3 0 2 2 3 2 0 1 3 1 0 0 3 3 2 1 1 2 0 1 2 0 3 3 1 0
# 3 3 3 1 1 3 3 3 1 1 0 2 1 0 0 3 0 2 1 0 2 2 0 0 0 1 1 3 1 1 1 2 1 1 3 2 3
# 3 1 2 1 0 0 2 3 1 0 2 1 1 1 1 3 3 0 2 2 3 2 0 1 3 1]
wa holds the number of 0s, 1s, 2s, 3s in a
wb holds the number of 0s, 1s, 2s, 3s in b
wa=np.bincount(a)
wb=np.bincount(b)
pv(wa)
pv(wb)
# wa: [24 28 28 20]
# wb: [21 34 20 25]
result=np.zeros(N,dtype='object')
Think of a 0 as a token or chip. Similarly for 1,2,3.
Think of wa=[24 28 28 20] as meaning there is a bag with 24 0-chips, 28 1-chips, 28 2-chips, 20 3-chips.
You have a wa-bag and a wb-bag. When you draw a chip from each bag, you "add" them together and form a new chip. You "mod" the answer (modulo N).
Imagine taking a 1-chip from the wb-bag and adding it with each chip in the wa-bag.
1-chip + 0-chip = 1-chip
1-chip + 1-chip = 2-chip
1-chip + 2-chip = 3-chip
1-chip + 3-chip = 4-chip = 0-chip (we are mod'ing by N=4)
Since there are 34 1-chips in the wb bag, when you add them against all the chips in the wa=[24 28 28 20] bag, you get
34*24 1-chips
34*28 2-chips
34*28 3-chips
34*20 0-chips
This is just the partial count due to the 34 1-chips. You also have to handle the other
types of chips in the wb-bag, but this shows you the method used below:
for i,count in enumerate(wb):
    partial_count=count*wa
    pv(partial_count)
    shifted_partial_count=np.roll(partial_count,i)
    pv(shifted_partial_count)
    result+=shifted_partial_count
# partial_count: [504 588 588 420]
# shifted_partial_count: [504 588 588 420]
# partial_count: [816 952 952 680]
# shifted_partial_count: [680 816 952 952]
# partial_count: [480 560 560 400]
# shifted_partial_count: [560 400 480 560]
# partial_count: [600 700 700 500]
# shifted_partial_count: [700 700 500 600]
pv(result)
# result: [2444 2504 2520 2532]
This is the final result: 2444 0s, 2504 1s, 2520 2s, 2532 3s.
# This is a test to make sure the result is correct.
# This uses a very memory intensive method.
# c is too huge when n is large.
if n>1000:
    print('n is too large to run the check')
else:
    c=(a[:]+b[:,np.newaxis])
    c=c.ravel()
    c=c%N
    result2=np.bincount(c)
    pv(result2)
    assert(all(r1==r2 for r1,r2 in zip(result,result2)))
# result2: [2444 2504 2520 2532]
Check your math, that's a lot of space you're asking for:
2^20*2^20 = 2^40 = 1 099 511 627 776
If each of your elements was just one byte, that's already one terabyte of memory.
Add a loop or two. This problem is not suited to maxing out your memory and minimizing your computation.