Say I have two lists (always the same length):
l0 = [0, 4, 4, 4, 0, 0, 0, 8, 8, 0]
l1 = [0, 1, 1, 1, 0, 0, 0, 8, 8, 8]
I have the following rules for intersections and unions I need to apply when comparing these lists element-wise:
# union and intersect
uni = [0]*len(l0)
intersec = [0]*len(l0)
for i in range(len(l0)):
    if l0[i] == l1[i]:
        uni[i] = l0[i]
        intersec[i] = l0[i]
    else:
        intersec[i] = 0
        if l0[i] == 0:
            uni[i] = l1[i]
        elif l1[i] == 0:
            uni[i] = l0[i]
        else:
            uni[i] = [l0[i], l1[i]]
Thus, the desired output is:
uni: [0, [4, 1], [4, 1], [4, 1], 0, 0, 0, 8, 8, 8]
intersec: [0, 0, 0, 0, 0, 0, 0, 8, 8, 0]
While this works, I need to do this with several hundred very large lists (each with thousands of elements), so I am looking for a way to vectorize it. I tried np.where and various masking strategies, but that went nowhere fast. Any suggestions would be most welcome.
* EDIT *
Regarding
uni: [0, [4, 1], [4, 1], [4, 1], 0, 0, 0, 8, 8, 8]
versus
uni: [0, [4, 1], [4, 1], [4, 1], 0, 0, 0, 8, 8, [0, 8]]
I'm still fighting the 8 versus [0, 8] in my mind. The lists are derived from BIO tags in system annotations (see IOB labeling of text chunks), where each list element is a character index in a document and the value is an assigned enumerated label. 0 represents no annotation (i.e., it is used for determining negatives in a confusion matrix), while non-zero elements represent assigned enumerated labels for that character. Since I am ignoring true negatives, I think I can say 8 is equivalent to [0, 8]. As to whether this simplifies things, I am not yet sure.
* EDIT 2 *
I'm using [0, 8] to keep things simple and to keep the definitions of intersection and union consistent with set theory.
I would stay away from calling them 'intersection' and 'union', since those operations have well-defined meanings on sets and the operation you're looking to perform is neither of them.
However, to do what you want:
l0 = [0, 4, 4, 4, 0, 0, 0, 8, 8, 0]
l1 = [0, 1, 1, 1, 0, 0, 0, 8, 8, 8]
values = [
    (x if x == y else 0,
     0 if x == y == 0
     else x if y == 0
     else y if x == 0
     else [x, y])
    for x, y in zip(l0, l1)
]
result_a, result_b = map(list, zip(*values))
print(result_a)
print(result_b)
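For the sample lists this prints the intersection-style list first and then the union-style one, matching the desired output above (my quick check, not part of the original answer):

[0, 0, 0, 0, 0, 0, 0, 8, 8, 0]
[0, [4, 1], [4, 1], [4, 1], 0, 0, 0, 8, 8, 8]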
This is more than enough for thousands or even millions of elements, since the operation is so basic. Of course, if we're talking billions, you may want to look at numpy anyway.
A semi-vectorized solution for the union and a fully vectorized one for the intersection:
import numpy as np
l0 = np.array(l0)
l1 = np.array(l1)
intersec = np.zeros(l0.shape[0])
intersec_idx = np.where(l0==l1)
intersec[intersec_idx] = l0[intersec_idx]
intersec = intersec.astype(int).tolist()
union = np.zeros(l0.shape[0])
union_idx = np.where(l0==l1)
union[union_idx] = l0[union_idx]
no_union_idx = np.where(l0!=l1)
union = union.astype(int).tolist()
for idx in no_union_idx[0]:
    union[idx] = [l0[idx], l1[idx]]
and the output:
>>> intersec
[0, 0, 0, 0, 0, 0, 0, 8, 8, 0]
>>> union
[0, [4, 1], [4, 1], [4, 1], 0, 0, 0, 8, 8, [0, 8]]
NB: I think your original union solution is incorrect; see the last element of the output, 8 vs [0, 8].
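For reference, a slightly more compact variant of the same idea (a sketch with my own variable names; it still loops, but only over the disagreeing positions, and keeps the [0, 8] convention from this answer):

import numpy as np

a = np.array(l0)
b = np.array(l1)
agree = a == b

# intersection: keep the value wherever the two lists agree, 0 elsewhere
intersec = np.where(agree, a, 0).tolist()

# union: same agreeing values, but store [x, y] pairs at the disagreements
union = np.where(agree, a, 0).astype(object)
for i in np.flatnonzero(~agree):
    union[i] = [int(a[i]), int(b[i])]
union = union.tolist()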
Related
I would like to know the fastest way to extract the indices of the first n non-zero values per column in a 2D array.
For example, with the following array:
arr = np.array([
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
    [2, 0, 9, 0],
    [6, 0, 0, 0],
    [0, 7, 0, 0],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
])
With n=2 I would have [0, 0, 1, 1, 2] as xs and [0, 3, 2, 5, 3] as ys. 2 values in the first and second columns and 1 in the third.
Here is how it is currently done:
x = []
y = []
n = 2
for i, c in enumerate(arr.T):
    a = c.nonzero()[0][:n]
    if len(a):
        x.extend([i] * len(a))
        y.extend(a)
In practice I have arrays of size (405, 256).
Is there a way to make it faster?
Here is a method that does not require sorting the array (only a linear scan is needed to find the non-null values), although it is a bit convoluted since it chains several functions:
n = 2
# Get indices with non-null values, column indices first
nnull = np.stack(np.where(arr.T != 0))
# Split the indices by unique value of column
cols_ids = np.array_split(range(len(nnull[0])), np.where(np.diff(nnull[0]) > 0)[0] + 1)
# Take n in each (at most) and concatenate the whole
np.concatenate([nnull[:, u[:n]] for u in cols_ids], axis=1)
outputs:
array([[0, 0, 1, 1, 2],
[0, 3, 2, 5, 3]], dtype=int64)
Here is one approach using argsort; note that it gives a different order, though:
n = 2
m = arr!=0
# non-zero values first
idx = np.argsort(~m, axis=0)
# get first 2 and ensure non-zero
m2 = np.take_along_axis(m, idx, axis=0)[:n]
y,x = np.where(m2)
# slice
x, idx[y,x]
# (array([0, 1, 2, 0, 1]), array([0, 2, 3, 3, 5]))
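If you need the same column-major ordering as in the question, a lexsort on (row, column) restores it (my addition, not part of the original snippet):

cols, rows = x, idx[y, x]
order = np.lexsort((rows, cols))
cols[order], rows[order]
# (array([0, 0, 1, 1, 2]), array([0, 3, 2, 5, 3]))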
Use a shifted comparison (offset by n) on the row indices returned by the transposed nonzero:
>>> n = 2
>>> i, j = arr.T.nonzero()
>>> mask = np.concatenate([[True] * n, i[n:] != i[:-n]])
>>> i[mask], j[mask]
(array([0, 0, 1, 1, 2], dtype=int64), array([0, 3, 2, 5, 3], dtype=int64))
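Wrapped into a small helper for reuse (a sketch based on the snippet above; the function name is mine):

import numpy as np

def first_n_nonzero_per_column(arr, n):
    # column indices i and row indices j of all non-zero entries, in column-major order
    i, j = arr.T.nonzero()
    # an entry is kept only if it is among the first n of its column:
    # compare each column index with the one n positions earlier
    mask = np.concatenate([[True] * n, i[n:] != i[:-n]])
    return i[mask], j[mask]

# first_n_nonzero_per_column(arr, 2) -> (array([0, 0, 1, 1, 2]), array([0, 3, 2, 5, 3]))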
I'm trying to create a function that will transform a regular matrix into CSR form (I don't want to use the scipy.sparse one).
To do this, I'm using a nested for-loop to run through a given matrix to create a new matrix with three rows.
The first row ('Values') should contain all non-zero values. The second ('Cols') should contain the column index for each number in 'Values'. The third row should contain the index value in 'Values' for the first non-zero value on each row.
My question regards the second and third rows:
Is there a way of getting the column ID for the element 'i' in the for-loop?
M = array([[4, 0, 39],
           [0, 5, 0],
           [0, 0, 7]])

def Convert(x):
    CSRMatrix = []
    Values = []
    Cols = []
    Rows = []
    for k in x:
        for i in k:
            if i != 0:
                Values.append(i)
                Cols.append({# the column index value of 'i'})
                Rows.append[# the index in 'Values' of the first non-zero element on each row]
    CSRMatrix.append(Values)
    CSRMatrix.append(Cols)
    CSRMatrix.append(Rows)
    return(CSRMatrix)

Convert(M)
I'm not sure exactly what you want for Cols.append(), because of the way you commented it in the code between curly braces.
Is it a dict containing the index:value pairs of all non-zero values? A list of sets containing the indexes of all non-zero values (which would be odd)? Or all the indexes of each row in your array?
Anyway, I put in the two most likely candidates (dict, and list of indexes for each row); test each one and delete the unwanted one, and if neither is right please add some more specifics:
import numpy as np

m = np.array([[4, 0, 39],
              [0, 5, 0],
              [0, 0, 7]])

def Convert(x):
    CSRMatrix = []
    Values = []
    Cols = []
    Rows = []
    for num in x:
        for i in range(len(num)):
            if num[i] != 0:
                Values.append(num[i])
                Cols.append({i: num[i]})  # <- if dict. Remove if not what you wanted
                Rows.append(i)
                Cols.append(i)  # <- list of all indexes in the array for each row. Remove if not what you wanted
    CSRMatrix.append(Values)
    CSRMatrix.append(Cols)
    CSRMatrix.append(Rows)
    return(CSRMatrix)

x = Convert(m)
print(x)
enumerate() gives you an index on every iteration.
The second row can therefore be created by simply appending num2 (the column index).
For the third row you have to check whether you have already added a value for the current row. If not, append the position in Values of the value you just added (len(Values) - 1) and set the non_zero flag to False. For the next row, non_zero is set back to True.
def Convert(x):
    CSRMatrix = []
    Values = []
    Cols = []
    Rows = []
    for num, k in enumerate(x):
        non_zero = True
        for num2, i in enumerate(k):
            if i != 0:
                Values.append(i)
                Cols.append(num2)
                if non_zero:
                    Rows.append(len(Values) - 1)  # position in Values of this row's first non-zero
                    non_zero = False
    CSRMatrix.append(Values)
    CSRMatrix.append(Cols)
    CSRMatrix.append(Rows)
    return CSRMatrix
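A quick check against the 3x3 example from the question (using the corrected Rows bookkeeping above):

M = [[4, 0, 39],
     [0, 5, 0],
     [0, 0, 7]]

print(Convert(M))
# [[4, 39, 5, 7], [0, 2, 1, 2], [0, 2, 3]]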
Here is a numpythonic implementation: use the nonzero method to obtain the row and column indices of the non-zero elements directly, then compare neighbouring row indices to build a mask that marks the first non-zero of each row. Finally, call nonzero on the mask to get the row-pointer indices:
>>> M = np.array([[ 4, 0, 39],
... [ 0, 5, 0],
... [ 0, 0, 7]])
>>> r, c = M.nonzero()
>>> mask = np.concatenate(([True], r[1:] != r[:-1]))
>>> [M[r, c], c, *mask.nonzero()]
[array([ 4, 39, 5, 7]), array([0, 2, 1, 2]), array([0, 2, 3])]
Test of a larger array:
>>> a = np.random.choice(10, size=(8, 8), p=[0.73] + [0.03] * 9)
>>> a
array([[0, 0, 0, 0, 8, 0, 0, 1],
[1, 0, 5, 4, 0, 0, 9, 0],
[0, 0, 9, 0, 0, 0, 0, 1],
[0, 0, 0, 8, 9, 0, 0, 4],
[0, 0, 5, 0, 0, 6, 0, 0],
[0, 8, 0, 0, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 0, 0, 9],
[0, 9, 0, 0, 0, 4, 0, 0]])
>>> r, c = a.nonzero()
>>> mask = np.concatenate(([True], r[1:] != r[:-1]))
>>> pp([a[r, c], c, *mask.nonzero()])
[array([8, 1, 1, 5, 4, 9, 9, 1, 8, 9, 4, 5, 6, 8, 9, 9, 9, 4]),
array([4, 7, 0, 2, 3, 6, 2, 7, 3, 4, 7, 2, 5, 1, 7, 7, 1, 5], dtype=int64),
array([ 0, 2, 6, 8, 11, 13, 15, 16], dtype=int64)]
So let's say I have a list that looks like:
x = [1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
I then have another list with indices that needs to be removed from list x:
x_remove = [1, 4, 5]
I can then use the numpy command delete to remove this from x and end up with:
x_final = np.delete(x, x_remove)
>>> x_final = [1, 0, 1, 0, 0, 0, 0]
So far so good. Now suppose I decide that I don't want to use the entire list x, but to start from index 2 instead. So basically:
x_new = x[2:]
>>> x_new = [0, 1, 1, 1, 0, 0, 0, 0]
I do, however, still need to remove the indices from the x_remove list, but now, as you can see, those indices no longer point at the same elements as before, so the wrong items are removed. The same thing happens if I do it the other way around (i.e. first removing the indices and then slicing from index 2). So basically it will/should look like:
x_new_final = [0, 1, 1, 0, 0] (first use slice, and the remove list)
x_new_final_v2 = [1, 0, 0, 0, 0] (first use remove list, and then slice)
x_new_final_correct_one = [0, 1, 0, 0, 0, 0] (as it should be)
So is there some way in which I can start my list at various indices (through slicing), and still use the delete command to remove the correct indices that would correspond to the full list ?
You could change the x_remove list depending on the slice location. For example:
slice_location = 2
x = [1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
x_remove = [1, 4, 5]
x_new = x[slice_location:]
x_remove = [i - slice_location for i in x_remove if i - slice_location >= 0]
x_new = np.delete(x_new, x_remove)
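For the example data this reproduces the expected result:

print(x_new.tolist())
# [0, 1, 0, 0, 0, 0]  -> matches x_new_final_correct_one from the question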
x = [1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
x_remove = [1, 4, 5]

for index, value in enumerate(x):
    for remove_index in x_remove:
        if index == remove_index:
            x[index] = ""

final_list = [final_value for final_value in x if final_value != ""]
print(final_list)
Try it this simple way...
First let's explore alternatives for the simple removal (without the change-of-starting-position issue):
First make an x with unique and easily recognized values:
In [787]: x = list(range(10))
In [788]: x
Out[788]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
A list comprehension method - maybe not the fastest, but fairly clear and bug free:
In [789]: [v for i,v in enumerate(x) if i not in x_remove]
Out[789]: [0, 2, 3, 6, 7, 8, 9]
Your np.delete approach:
In [790]: np.delete(x, x_remove)
Out[790]: array([0, 2, 3, 6, 7, 8, 9])
That has a downside of converting x to an array, which is not a trivial task (time wise). It also makes a new array. My guess is that it is slower.
Try in-place removal:
In [791]: y=x[:]
In [792]: for i in x_remove:
...: del y[i]
...:
In [793]: y
Out[793]: [0, 2, 3, 4, 6, 8, 9]
Oops - wrong. We need to start from the end (largest index). This is a well-known Python 'recipe':
In [794]: y=x[:]
In [795]: for i in x_remove[::-1]:
...: del y[i]
...:
...:
In [796]: y
Out[796]: [0, 2, 3, 6, 7, 8, 9]
Under the covers np.delete is taking a masked approach:
In [797]: arr = np.array(x)
In [798]: arr
Out[798]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [799]: mask = np.ones(arr.shape, bool)
In [800]: mask[x_remove] = False
In [801]: mask
Out[801]:
array([ True, False, True, True, False, False, True, True, True,
True])
In [802]: arr[mask]
Out[802]: array([0, 2, 3, 6, 7, 8, 9])
Now to the question of applying x_remove to a slice of x. The slice of x does not keep a record of the slice parameters. That is, you can't readily determine that y = x[2:] is missing two values. (Well, I could deduce it by comparing some attributes of x and y, but not from y alone.)
So regardless of how you do the delete, you will have to first adjust the values of x_remove.
In [803]: x2 = np.array(x_remove)-2
In [804]: x2
Out[804]: array([-1, 2, 3])
In [805]: [v for i,v in enumerate(x[2:]) if i not in x2]
Out[805]: [2, 3, 6, 7, 8, 9]
This works OK, but that -1 is potentially a problem. We don't want it to mean the last element. So we have to filter out the negative indices first, to be safe.
In [806]: np.delete(x[2:], x2)
/usr/local/bin/ipython3:1: FutureWarning: in the future negative indices will not be ignored by `numpy.delete`.
#!/usr/bin/python3
Out[806]: array([2, 3, 6, 7, 8, 9])
If delete didn't ignore negative indices, it could get a mask like this - with a False at the end:
In [808]: mask = np.ones(arr[2:].shape, bool)
In [809]: mask[x2] = False
In [810]: mask
Out[810]: array([ True, True, False, False, True, True, True, False])
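Putting those pieces together, a compact version that drops the negative indices before deleting (a sketch with my own variable names):

import numpy as np

x = [1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
x_remove = [1, 4, 5]
start = 2

adj = np.array(x_remove) - start   # shift the removal indices into the sliced frame
adj = adj[adj >= 0]                # drop indices that fall before the slice start
result = np.delete(x[start:], adj)
# result -> array([0, 1, 0, 0, 0, 0])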
I have been trying to implement Sudoku in Python, but the backtracking is not working at all. When I input a 4x4 grid of 0's, I get output, but most of the time it fails to produce a result for a standard 9x9 grid (with 3x3 boxes). This test case progresses correctly until it reaches the last element of the second row.
import math

solution = [[3, 0, 6, 5, 0, 8, 4, 0, 0],
            [5, 2, 0, 0, 0, 0, 0, 0, 0],
            [0, 8, 7, 0, 0, 0, 0, 3, 1],
            [0, 0, 3, 0, 1, 0, 0, 8, 0],
            [9, 0, 0, 8, 6, 3, 0, 0, 5],
            [0, 5, 0, 0, 9, 0, 6, 0, 0],
            [1, 3, 0, 0, 0, 0, 2, 5, 0],
            [0, 0, 0, 0, 0, 0, 0, 7, 4],
            [0, 0, 5, 2, 0, 6, 3, 0, 0]]
#solution = [[0 for x in range(4)] for y in range(4)]
N = 9
row = 0
col = 0

def positionFound():
    global row, col
    for x in range(N):
        for y in range(N):
            if solution[x][y] is 0:
                row, col = x, y
                return row, col
    return False

def isSafe(row, col, num):
    global N
    for c in range(N):
        if solution[row][c] is num:
            return False
    for r in range(N):
        if solution[r][col] is num:
            return False
    r = row - row % int(math.sqrt(N))
    c = col - col % int(math.sqrt(N))
    for x in range(r, r + int(math.sqrt(N))):
        for y in range(c, c + int(math.sqrt(N))):
            if solution[x][y] is num:
                return False
    return True

back = 1

def sudoku(solution):
    global row, col
    if positionFound() is False:
        print('SUCCESS')
        for x in solution:
            print(x)
        return True
    for number in range(1, N + 1):
        if isSafe(row, col, number):
            solution[row][col] = number
            if sudoku(solution) is True:
                return True
            solution[row][col] = 0
    return False

sudoku(solution)
for x in solution:
    print(x)
OUTPUT:
[3, 1, 6, 5, 2, 8, 4, 9, 7]
[5, 2, 4, 1, 3, 7, 8, 6, 0]
[0, 8, 7, 0, 0, 0, 0, 3, 1]
[0, 0, 3, 0, 1, 0, 0, 8, 0]
[9, 0, 0, 8, 6, 3, 0, 0, 5]
[0, 5, 0, 0, 9, 0, 6, 0, 0]
[1, 3, 0, 0, 0, 0, 2, 5, 0]
[0, 0, 0, 0, 0, 0, 0, 7, 4]
[0, 0, 5, 2, 0, 6, 3, 0, 0]
The reason your backtracking isn't working is that you haven't actually implemented backtracking. Once you fail to place a number in a given location, you have no provision for returning your [row, col] cursor to the previous position. You need a way to know what the previously filled position was, so you can resume with the next legal number for that position. Your recursion holds previous board states on the stack, but you've lost the cursor position -- and your retry loop assumes it gets reset.
One strong possibility is to make row and col local variables, keeping them coordinated with the solution grid they describe. Make them part of the parameter passing, so the stack maintains those values for you.
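A minimal sketch of that suggestion: recompute the cursor locally on every recursive call instead of keeping it in globals (function names and structure are mine, not from the question):

import math

def find_empty(board, n):
    # return (row, col) of the first empty cell, or None if the board is full
    for r in range(n):
        for c in range(n):
            if board[r][c] == 0:
                return r, c
    return None

def is_safe(board, n, row, col, num):
    # check the row, the column and the sub-grid for num
    box = int(math.sqrt(n))
    if num in board[row]:
        return False
    if any(board[r][col] == num for r in range(n)):
        return False
    r0, c0 = row - row % box, col - col % box
    for r in range(r0, r0 + box):
        for c in range(c0, c0 + box):
            if board[r][c] == num:
                return False
    return True

def solve(board, n=9):
    pos = find_empty(board, n)
    if pos is None:
        return True                   # no empty cell left: solved
    row, col = pos                    # local cursor, restored automatically on return
    for num in range(1, n + 1):
        if is_safe(board, n, row, col, num):
            board[row][col] = num
            if solve(board, n):
                return True
            board[row][col] = 0       # undo and try the next candidate
    return False

# usage: solve(solution); then print the rows of solution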
I want to write code (in Python 3) that can calculate the sum of all possible combinations of a varying number of lists. The result of each sum needs to be checked against a specified value. For all combinations whose sum equals the specified value, I would like to create a new list containing just those values.
For example:
value = 5
a = [1, 2, 3, 4]
b = [2, 3, 4, 5]
1 + 2 = 3 - x
1 + 3 = 4 - x
1 + 4 = 5 - correct
1 + 5 = 6 - x
2 + 2 = 4 - x
2 + 3 = 5 - correct
...
The result should be for example:
res = [[1, 4], [2, 3], [3, 2], [4, 1]]
I know that a simple option would be to use nested for-loops. The problem is that at the time of writing the code I don't know how many lists there will be, which would mean defining all possible cases; this is something I don't want to do. By the time I am running the code, I do know how many lists there are. The length of the lists will always be the same (26 elements).
The lists that need to be checked are stored in a list in the following way. For example:
list = [[1, 2, 3, 4], [2, 3, 4, 5]]
An example of an actual set of lists that I would like to solve this problem for is:
list = [[0, 2, 0, 0, 5, 0, 0, 8, 0, 0, 11, 0, 0, 14, 0, 0, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 2, 0, 0, 5, 0, 0, 8, 0, 0, 11, 0, 0, 14, 0, 0, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
All the zero values are the result of other refinements to the total number of options that did not meet other criteria.
I hope someone can push me into the right direction.
Thanks in advance!
With some list of lists l (don't name anything list; there's a built-in function named list):
l = [[1, 2, 3, 4], [2, 3, 4, 5]]
We can use itertools.product to get all combinations of items between the lists, then map the sum function onto those combinations. Then checking for membership is easy.
from itertools import product

if value in map(sum, product(*l)):
    print('Yes!')
else:
    print('No :(')
If you want to save the sums for multiple checks, I recommend saving them into a set
sum_set = set(map(sum, product(*l)))

if value in sum_set:
    ...
The * in product(*l) is the unpacking operator. It gives the elements of l to product as individual arguments: product([1,2,3,4], [2,3,4,5]).
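If you also want the combinations themselves (the res list from your example), a comprehension over the same product works; for the small example lists it yields [[1, 4], [2, 3], [3, 2]] (b contains no 1, so there is no [4, 1] pair):

res = [list(combo) for combo in product(*l) if sum(combo) == value]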