vectorized matrix list application in numpy - python

The problem I am trying to solve is as follows. I am given a matrix of arbitrary dimension whose entries are indices into a list, together with the list itself. I would like to get back a matrix in which each index is replaced by the corresponding list element. I can't figure out how to do that in a vectorized way.
For example, if z = [[0,1], [1,0]] and list = [20,10], I'd want [[20,10], [10,20]] returned.

When both are NumPy arrays, you can index one with the other directly:
import numpy as np
z = np.array([[0, 1], [1, 0]])
a = np.array([20, 10])
output = a[z]
print(output)
# [[20 10]
# [10 20]]
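If the values arrive as a plain Python list rather than an array, one option is to convert the list first and then index it. The sketch below assumes the list converts cleanly with np.asarray; the names values, lookup and z3 are illustrative and not from the question. np.take performs the same lookup as a function call, and the index array may have any number of dimensions:

```python
import numpy as np

values = [20, 10]                    # plain Python list (illustrative name)
z = np.array([[0, 1], [1, 0]])       # index matrix from the question

lookup = np.asarray(values)          # convert once so array indexing works
output = lookup[z]                   # result has the same shape as z
same = np.take(values, z)            # equivalent function form

# The same indexing works for an index array of higher dimension.
z3 = np.array([[[0, 1], [1, 1]], [[1, 0], [0, 0]]])
output3 = lookup[z3]                 # shape (2, 2, 2)
```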

Related

How can I find indexes with same elements in 2d numpy array?

I'm working on a machine vision project. I shine laser light onto the scene and use OpenCV to detect the pixels that the laser light falls on. I keep these pixel coordinates as a 2D NumPy array. However, I want to make the x values unique by finding the pixels whose x-axis values are the same and taking the average of them. The pixel values are stored sequentially in the NumPy array.
For example:
[[659 253]
[660 253]
[660 256]
[661 253]
[662 253]
[663 253]
[664 253]
[665 253]]
First of all, my goal is to identify all rows whose first element is the same. When using OpenCV, pixel values are kept in NumPy arrays. I'm trying to write an indexing method myself, so I created a small NumPy array to make things simpler.
x = np.array([[1, 2], [1, 78], [1, 3], [1, 6], [4, 3], [5, 6], [5, 3]], np.int32)
I used the following loop to find the rows of x whose first element is the same:
for i in range(len(x)):
    if x[i] != x[-1] and x[i][0] == x[i + 1][0]:
        print(x[i], x[i + 1])
While iterating over x, I want to check whether the first element of each row also appears in the next row. To avoid an index-out-of-range error, I used x[i] != x[-1]. I was expecting this loop to return the result below.
[1,2] [1,78]
[1,78] [1,3]
[1,3] [1,6]
[5,6] [5,3]
I would later remove duplicate elements from the list but I got
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I am not familiar with numpy arrays so I could not get the solution I wanted. Is it possible to get the result I want using numpy array methods? Thanks for your time.
Approach 1
This is a numpy way to do this:
x_sorted = x[np.argsort(x[:,0])]
marker_idx = np.flatnonzero(np.diff(x_sorted[:,0]))+1
output = np.split(x_sorted, marker_idx)
Approach 2
You can also use the numpy_indexed package, which is designed to solve groupby problems with less code and without loss of performance:
import numpy_indexed as npi
npi.group_by(x[:, 0]).split(x)
Approach 3
You can also get the groups of indices from pandas, but this might not be the best option because it relies on a Python-level list comprehension:
import pandas as pd
[x[idx] for idx in pd.DataFrame(x).groupby([0]).indices.values()]
Output
[array([[ 1,  2],
        [ 1, 78],
        [ 1,  3],
        [ 1,  6]]),
 array([[4, 3]]),
 array([[5, 6],
        [5, 3]])]
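The question's stated end goal was to average the y values that share an x value. A minimal sketch of that step, assuming the second column holds the values to average, uses np.unique with return_inverse together with np.bincount. The names keys, inverse, means and averaged are illustrative:

```python
import numpy as np

x = np.array([[1, 2], [1, 78], [1, 3], [1, 6], [4, 3], [5, 6], [5, 3]], np.int32)

# keys: sorted unique first-column values
# inverse: for every row, the position of its key inside `keys`
keys, inverse = np.unique(x[:, 0], return_inverse=True)

sums = np.bincount(inverse, weights=x[:, 1])   # per-group sum of column 1
counts = np.bincount(inverse)                  # per-group row count
means = sums / counts                          # per-group average

# One row per unique x value: [x, mean of y]
averaged = np.column_stack((keys, means))
```

This avoids building the list of sub-arrays when only a per-group statistic is needed.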
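Try the following, using itertools.groupby. The rows must first be sorted by their first element, keeping each row intact:
import itertools
x_sorted = x[np.argsort(x[:, 0], kind="stable")]
for _, rows in itertools.groupby(x_sorted, key=lambda row: row[0]):
    print([tuple(r.tolist()) for r in rows])
Output:
[(1, 2), (1, 78), (1, 3), (1, 6)]
[(4, 3)]
[(5, 6), (5, 3)]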
You can use np.unique with its return_inverse argument, which gives for every row the position of its group among the unique values, and return_counts, which helps build the split points:
_, ind, cnt = np.unique(x[:, 0], return_inverse=True, return_counts=True)
The index ind arranges the unique values u into x. To go the other way and bring the rows of each group together, sort by that index with np.argsort:
ind = np.argsort(ind, kind="stable")
To get the split points of the data, you can use np.cumsum on the counts. You don't need the last element because it always marks the end of the array:
spp = np.cumsum(cnt[:-1])
Finally, you can use np.split to get the list of sub-arrays that you want:
result = np.split(x[ind, :], spp, axis=0)
TL;DR
_, ind, cnt = np.unique(x[:, 0], return_inverse=True, return_counts=True)
np.split(x[np.argsort(ind, kind="stable"), :], np.cumsum(cnt[:-1]), axis=0)

How can I speed-up a matrix rotation by 90 degrees clockwise?

Rotate a square matrix by 90 degrees with O(1) extra space. I have used Python to solve it. I wanted to know if I can improve my code further.
def rotate_by_90(m):
    # reverse the row order with [::-1], then unpack the rows into zip(*)
    tuples = zip(*m[::-1])
    # convert each tuple to a list
    return [list(i) for i in tuples]

def makeMatrix(array, size):
    # validating size of matrix for given array
    if size**2 != len(array):
        return -1
    else:
        # make sub arrays of length size using array slicing
        matrix = [array[i:i+size] for i in range(0, len(array), size)]
        return rotate_by_90(matrix)

arr = [1,2,3,4]
dimension = 2
result = makeMatrix(arr, dimension)
# Original Matrix: [[1, 2], [3, 4]]
# Result: [[3, 1], [4, 2]]
No external libraries or zip needed. Use this for interviews:
Flip it about the x axis (reverse the order of the rows).
Swap the elements across the main diagonal (transpose in place).
def rotate_by_90(a):
    a.reverse()
    for i in range(len(a)):
        for j in range(i):
            a[i][j], a[j][i] = a[j][i], a[i][j]
    return a
If you are looking for speed, you are using the wrong type of array: switch from lists to NumPy arrays. NumPy of course has a function for operations like this: np.rot90.
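As a small illustration of that suggestion, np.rot90 rotates counterclockwise by default, so a clockwise quarter turn needs a negative k. The variable names below are illustrative:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

# k counts counterclockwise quarter turns, so k=-1 is one clockwise turn.
rotated = np.rot90(m, k=-1)
print(rotated)
```

For the 2x2 matrix from the question this gives [[3, 1], [4, 2]], matching the pure-Python versions.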

Numpy: smart matrix multiplication to sparse result matrix

In python with numpy, say I have two matrices:
S, a sparse x*x matrix
M, a dense x*y matrix
Now I want to do np.dot(M, M.T) which will return a dense x*x matrix S_.
However, I only care about the cells that are nonzero in S, which means that it would not make a difference for my application if I did
S_ = S*S_
Obviously, that would be a waste of operations as I would like to leave out the irrelevant cells given in S altogether. Remember that in this matrix multiplication
S_[i,j] = np.sum(M[i,:]*M[j,:])
So I want to do this operation only for i,j such that S[i,j]=True.
Is this supported somehow by numpy implementations that run in C so that I do not need to implement it with python loops?
EDIT 1 [solved]: I still have this problem, actually M is now also sparse.
Now, given rows and cols of S, I implemented it like this:
data = np.array([ M[rows[i],:].dot(M[cols[i],:]).data[0] for i in xrange(len(rows)) ])
S_ = csr( (data, (rows,cols)) )
... but it is still slow. Any new ideas?
EDIT 2: jdehesa has given a great solution, but I would like to save more memory.
The solution was to do the following:
data = M[rows,:].multiply(M[cols,:]).sum(axis=1)
and then build a new sparse matrix from rows, cols and data.
However, when running the above line, scipy builds a (contiguous) numpy array with as many elements as nnz of the first submatrix plus nnz of the second submatrix, which can lead to MemoryError in my case.
In order to save more memory, I would like to multiply iteratively each row with its respective 'partner' column, then sum over and discard the result vector. Using simple python to implement this, basically I am back to the extremely slow version.
Is there a fast way of solving this problem?
Here is how you can do it with NumPy/SciPy, both for dense and sparse M matrices:
import numpy as np
import scipy.sparse as sp
# Coordinates where S is True
S = np.array([[0, 1],
[3, 6],
[3, 4],
[9, 1],
[4, 7]])
# Dense M matrix
# Random big matrix
M = np.random.random(size=(1000, 2000))
# Take relevant rows and compute values
values = np.sum(M[S[:, 0]] * M[S[:, 1]], axis=1)
# Make result matrix from values
result = np.zeros((len(M), len(M)), dtype=values.dtype)
result[S[:, 0], S[:, 1]] = values
# Sparse M matrix
# Construct sparse M as COO matrix or any other way
M = sp.coo_matrix(([10, 20, 30, 40, 50], # Data
([0, 1, 3, 4, 6], # Rows
[4, 4, 5, 5, 8])), # Columns
shape=(1000, 2000))
# Convert to CSR for fast row slicing
M_csr = M.tocsr()
# Take relevant rows and compute values
values = M_csr[S[:, 0]].multiply(M_csr[S[:, 1]]).sum(axis=1)
values = np.squeeze(np.asarray(values))
# Construct COO sparse matrix from values
result = sp.coo_matrix((values, (S[:, 0], S[:, 1])), shape=(M.shape[0], M.shape[0]))
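As a sanity check on the dense case, the sketch below compares the selected values against the full product on a small matrix. It is only an illustration: the sizes, the coordinates in S and the seed are made up, and np.einsum is swapped in for the multiply-then-sum step so that the intermediate elementwise product array is not materialized (the two row selections are still copied):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((6, 4))                           # small dense stand-in for M
S = np.array([[0, 1], [3, 5], [3, 4], [2, 1]])   # (i, j) pairs of interest
rows, cols = S[:, 0], S[:, 1]

# Row-wise dot products: values[k] = sum over t of M[rows[k], t] * M[cols[k], t]
values = np.einsum("ij,ij->i", M[rows], M[cols])

# Reference: the full product, which the approach above is meant to avoid.
full = M @ M.T
print(np.allclose(values, full[rows, cols]))
```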

NxN python arrays subsets

I need to carry out some operation on a subset of an NxN array. I have the center of the sub-array, x and y, and its size.
So I can easily do:
subset = data[y-size:y+size,x-size:x+size]
And this is fine.
What I ask is if there is the possibility to do the same without writing an explicit loop if x and y are both 1D arrays of positions.
Thanks!
Using a simple example of a 5x5 array and setting size=1 we can get:
import numpy as np
data = np.arange(25).reshape((5,5))
size = 1
x = np.array([1,4])
y = np.array([1,4])
subsets = [data[j-size:j+size,i-size:i+size] for i in x for j in y]
print(subsets)
Which returns a list of numpy arrays:
[array([[0, 1],[5, 6]]),
array([[15, 16],[20, 21]]),
array([[3, 4],[8, 9]]),
array([[18, 19],[23, 24]])]
Which I hope is what you are looking for.
To get the list of subsets, assuming you have the lists of positions xList and yList, this will do the trick:
subsetList = [ data[y-size:y+size,x-size:x+size] for x,y in zip(xList,yList) ]
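If a comprehension still counts as an explicit loop, one loop-free sketch builds broadcast index arrays and takes every window in a single indexing operation. It assumes x and y are paired (one window per pair) and that every window lies fully inside the array, since negative indices would silently wrap around. The names offsets, row_idx and col_idx are illustrative:

```python
import numpy as np

data = np.arange(25).reshape((5, 5))
size = 1
x = np.array([1, 4])
y = np.array([1, 4])

# Offsets covered by the slice [c - size : c + size]
offsets = np.arange(-size, size)

row_idx = y[:, None, None] + offsets[None, :, None]   # shape (n, 2*size, 1)
col_idx = x[:, None, None] + offsets[None, None, :]   # shape (n, 1, 2*size)

# Broadcasting the two index arrays yields one window per (x, y) pair.
subsets = data[row_idx, col_idx]                      # shape (n, 2*size, 2*size)
```

Here subsets[k] equals data[y[k]-size:y[k]+size, x[k]-size:x[k]+size], but the result is a copy rather than a view.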

Create matrix with 2 arrays in numpy

I want to find a NumPy command that multiplies a column vector by a row vector to produce a matrix:
[1,1,1,1 ] ^T * [ 2,3 ] = [[2,3],[2,3],[2,3],[2,3]]
First, let's define your 1-D numpy arrays:
In [5]: one = np.array([ 1,1,1,1 ]); two = np.array([ 2,3 ])
Now, let's multiply them:
In [6]: one[:, np.newaxis] * two[np.newaxis, :]
Out[6]:
array([[2, 3],
[2, 3],
[2, 3],
[2, 3]])
This used numpy's newaxis to add the appropriate axes to get a 4x2 output matrix.
The problem you are encountering is that your vectors are neither column nor row vectors; they're just 1-D vectors. If you look at len(vec.shape), it's 1.
What you can do is use numpy.reshape to turn your column vector into shape (m, 1) and your row vector into shape (1, n).
import numpy as np
colu = np.reshape(u, (u.shape[0], 1))
rowv = np.reshape(v, (1, v.shape[0]))
Now when you multiply colu and rowv you'll get a matrix with shape (m, n).
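A concrete version of the reshape approach, using the numbers from the question (u and v are defined here only for illustration), might look like this. np.outer is included because it computes the same product directly from the 1-D arrays:

```python
import numpy as np

u = np.array([1, 1, 1, 1])
v = np.array([2, 3])

colu = np.reshape(u, (u.shape[0], 1))   # shape (4, 1): column vector
rowv = np.reshape(v, (1, v.shape[0]))   # shape (1, 2): row vector

product = colu * rowv                   # broadcasts to shape (4, 2)
outer = np.outer(u, v)                  # same result without reshaping
```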
If you need a matrix, use matrices. This way you can use your expression nearly verbatim (note that the NumPy documentation no longer recommends the np.matrix class for new code):
np.matrix([1,1,1,1]).T * np.matrix([2,3])
You might want to use numpy.kron(a,b) it takes the Kronecker product of two arrays. You can see the b vector as a block. The function puts this block, multiplied by the corresponding coefficient of the a vector, on the position of that coefficient. You can also use it for matrices.
For your example it would look like:
import numpy as np
vecA = np.array([[1],[1],[1],[1]])
vecB = np.array([2,3])
Out = np.kron(vecA,vecB)
this returns
>>> Out
array([[2, 3],
[2, 3],
[2, 3],
[2, 3]])
Hope this helps you.
