Transform broadcasting into something computable: matrix np.multiply

I'm trying to perform this kind of calculation:
arr = np.arange(4)
# array([0, 1, 2, 3])
arr_t = arr.reshape((-1, 1))
# array([[0],
#        [1],
#        [2],
#        [3]])
mult_arr = np.multiply(arr, arr_t)  # <<< the multiplication
# array([[0, 0, 0, 0],
#        [0, 1, 2, 3],
#        [0, 2, 4, 6],
#        [0, 3, 6, 9]])
Eventually I want to apply it to each row of a bigger matrix and sum all the matrices that the calculation produces:
arr = np.random.random((600,150))
arr_t =arr.reshape((-1,arr.shape[1],1))
mult = np.multiply(arr[:,None],arr_t)
summed = np.sum(mult,axis=0)
summed
So far this works well. The problem starts when I try it on a bigger dataset, for example this array instead:
arr = np.random.random((6000,1500))
I get the following error: MemoryError: Unable to allocate 101. GiB for an array with shape (6000, 1500, 1500) and data type float64
That makes sense, but my question is:
Can I get around this somehow without resorting to loops that slow the whole process down?
My question is mainly about performance; a solution that runs for more than 30 seconds is not an option.

Looks like you are simply trying to perform a dot product:
arr.T @ arr
or
arr.T.dot(arr)
Checking that this is what you want:
arr = np.random.random((600,150))
arr_t = arr.reshape((-1,arr.shape[1],1))
mult = np.multiply(arr[:,None],arr_t)
summed = np.sum(mult,axis=0)
np.allclose((arr.T @ arr), summed)
# True
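To see why the two agree: summing the outer product of every row with itself gives sum_i arr[i, j] * arr[i, k], which is exactly entry (j, k) of arr.T @ arr. The following is a minimal sketch of that equivalence; the array sizes and the variable names gram and gram_einsum are only illustrative, and np.einsum is shown as one more way to spell the same contraction.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((200, 50))  # small illustrative size

# Broadcasting version: materializes a (200, 50, 50) intermediate, then sums it.
summed = (arr[:, :, None] * arr[:, None, :]).sum(axis=0)

# Matrix product: same values, but only the (50, 50) result is allocated.
gram = arr.T @ arr

# The same contraction over the row index i, written with einsum.
gram_einsum = np.einsum("ij,ik->jk", arr, arr)
```

For the (6000, 1500) case from the question, the result is only a (1500, 1500) float64 array, so the (6000, 1500, 1500) intermediate never has to exist.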

Related

Numpy double-slice assignment with integer indexing followed by boolean indexing

I already know that NumPy "double-slice" indexing with fancy indexing creates copies instead of views, and that the solution seems to be to convert it into one single indexing operation (e.g. this question). However, I am facing a particular problem where I need integer indexing followed by boolean indexing, and I am at a loss as to what to do. The problem (simplified) is as follows:
a = np.random.randn(2, 3, 4, 4)
idx_x = np.array([[1, 2], [1, 2], [1, 2]])
idx_y = np.array([[0, 0], [1, 1], [2, 2]])
print(a[..., idx_y, idx_x].shape) # (2, 3, 3, 2)
mask = (np.random.randn(2, 3, 3, 2) > 0)
a[..., idx_y, idx_x][mask] = 1 # assignment doesn't work
How can I make the assignment work?

Not sure, but one idea is to do the broadcasting manually and apply the mask to the indices, just like Tim suggests. idx_x and idx_y both have the same shape (3,2), which is broadcast to the shape (6,6) from the Cartesian product (3*2)^2.
x = np.broadcast_to(idx_x.ravel(), (6,6))
y = np.broadcast_to(idx_y.ravel(), (6,6))
# this should be the same as
x,y = np.meshgrid(idx_x, idx_y)
Now reshape the mask to match the broadcast indices and use it to select:
mask = mask.reshape(6,6)
a[..., x[mask], y[mask]] = 1
The assignment now works, but I am not sure if this is the exact assignment you wanted.

Ok apparently I am making things complicated. No need to combine the indexing. The following code solves the problem elegantly:
b = a[..., idx_y, idx_x]
b[mask] = 1
a[..., idx_y, idx_x] = b
print(a[..., idx_y, idx_x][mask]) # all 1s
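A small sketch of why the chained form fails and the read-modify-write form works, assuming the same shapes as in the question (the random generator, seed, and the names before and unchanged are illustrative additions). The first index expression with integer arrays returns a copy, so the boolean assignment lands on a temporary; the last line of the three-step version is a single indexed assignment on a itself.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3, 4, 4))
idx_x = np.array([[1, 2], [1, 2], [1, 2]])
idx_y = np.array([[0, 0], [1, 1], [2, 2]])
mask = rng.standard_normal((2, 3, 3, 2)) > 0

before = a.copy()

# Chained indexing: a[..., idx_y, idx_x] is a temporary copy,
# so assigning into it leaves `a` untouched.
a[..., idx_y, idx_x][mask] = 1
unchanged = np.array_equal(a, before)

# Read, modify, write back: the final statement assigns into `a` directly.
b = a[..., idx_y, idx_x]
b[mask] = 1
a[..., idx_y, idx_x] = b
```

Note that the write-back is unambiguous here because every (idx_y, idx_x) pair is distinct; with repeated index pairs, which value ends up in a would need more care.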

EDIT: Use @Kevin's solution, which actually gets the dimensions correct!
I haven't tried it specifically on your sample code, but I had a similar issue before. I think I solved it by applying the mask to the indices instead, something like:
a[..., idx_y[mask], idx_x[mask]] = 1
That way, NumPy can assign the values to the a array correctly.
EDIT2: Posting some test code here, since comments remove formatting.
a = np.arange(27).reshape([3, 3, 3])
ind_x = np.array([[0, 0], [1, 2]])
ind_y = np.array([[1, 2], [1, 1]])
x = np.broadcast_to(ind_x.ravel(), (4, 4))
y = np.broadcast_to(ind_y.ravel(), (4, 4)).T
# x1, y2 = np.meshgrid(ind_x, ind_y) # above should be the same as this
mask = a[:, ind_y, ind_x] % 2 == 0 # what should this reshape to?
# a[..., x[mask], y[mask]] = 1 # Then you can mask away (may also need to reshape a or the masked x or y)

Replacing array at i`th dimension

Let's say I have a two-dimensional array
import numpy as np
a = np.array([[1, 1, 1], [2,2,2], [3,3,3]])
and I would like to replace the third vector (in the second dimension) with zeros. I would do
a[:, 2] = np.array([0, 0, 0])
But what if I would like to be able to do that programmatically? I mean, let's say that the variable x = 1 contained the dimension on which I wanted to do the replacing. How would the function replace(arr, dimension, value, arr_to_be_replaced) have to look if I wanted to call it as replace(a, x, 2, np.array([0, 0, 0]))?
numpy has a similar function, insert. However, it doesn't replace at dimension i, it returns a copy with an additional vector.
All solutions are welcome, but I do prefer a solution that doesn't recreate the array as to save memory.

arr[:, 1]
is basically shorthand for
arr[(slice(None), 1)]
that is, a tuple with slice elements and integers.
Knowing that, you can construct a tuple of slice objects manually, adjust the values depending on an axis parameter and use that as your index. So for
import numpy as np
arr = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
axis = 1
idx = 2
arr[:, idx] = np.array([0, 0, 0])
#      ^- axis position
you can use
slices = [slice(None)] * arr.ndim
slices[axis] = idx
arr[tuple(slices)] = np.array([0, 0, 0])
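Wrapped up as a function with the call signature the question asked for, this could look roughly as follows. The name replace_along_axis and its parameter names are hypothetical; the body is just the tuple-of-slices idea from above, and it modifies the array in place rather than creating a new one.

```python
import numpy as np

def replace_along_axis(arr, axis, idx, values):
    """Assign `values` in place at position `idx` along `axis` of `arr`."""
    # Start with a full slice for every axis, then pin the chosen axis to idx.
    slices = [slice(None)] * arr.ndim
    slices[axis] = idx
    arr[tuple(slices)] = values

a = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
replace_along_axis(a, 1, 2, np.array([0, 0, 0]))  # same as a[:, 2] = [0, 0, 0]
```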

Numpy reshape - automatic filling or removal

I would like to find a reshape function that is able to transform my arrays of different dimensions in arrays of the same dimension. Let me explain it:
import numpy as np
a = np.array([[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,3]]])
b = np.array([[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,4]]])
c = np.array([[[1,2,3,3],[1,2,3,3]]])
I would like to be able to make the shapes of b and c equal to the shape of a. However, np.reshape throws an error because, as explained here (Numpy resize or Numpy reshape), the function is explicitly made to handle the same dimensions.
I would like some version of that function that adds zeros at the start of the first dimension if the shape is smaller, or removes the start if the shape is bigger. My example would look like this:
b = np.array([[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,4]]])
c = np.array([[[0,0,0,0],[0,0,0,0]],[[1,2,3,3],[1,2,3,3]]])
Do I need to write my own function to do that?

This is similar to the other answer's solution, but it will also work if the lower dimensions don't match:
def custom_reshape(a, b):
    result = np.zeros_like(a).ravel()
    result[-min(a.size, b.size):] = b.ravel()[-min(a.size, b.size):]
    return result.reshape(a.shape)
custom_reshape(a,b)

I would write a function like this:
def align(a,b):
    out = np.zeros_like(a)
    x = min(a.shape[0], b.shape[0])
    out[-x:] = b[-x:]
    return out
Output:
align(a,b)
# array([[[1, 2, 3, 3],
# [1, 2, 3, 3]],
# [[1, 2, 3, 3],
# [1, 2, 3, 4]]])
align(a,c)
# array([[[0, 0, 0, 0],
# [0, 0, 0, 0]],
# [[1, 2, 3, 3],
# [1, 2, 3, 3]]])
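As a quick comparison of the two answers, here is a sketch that runs both functions on the arrays from the question and on one extra input. The array d is a hypothetical addition whose trailing dimensions differ from those of a; it is there only to show where the two approaches diverge.

```python
import numpy as np

def align(a, b):
    # Pads or trims along the first axis only.
    out = np.zeros_like(a)
    x = min(a.shape[0], b.shape[0])
    out[-x:] = b[-x:]
    return out

def custom_reshape(a, b):
    # Pads or trims on the flattened data, so trailing shapes may differ.
    result = np.zeros_like(a).ravel()
    n = min(a.size, b.size)
    result[-n:] = b.ravel()[-n:]
    return result.reshape(a.shape)

a = np.array([[[1, 2, 3, 3], [1, 2, 3, 3]], [[1, 2, 3, 3], [1, 2, 3, 3]]])
b = np.array([[[1, 2, 3, 3], [1, 2, 3, 3]], [[1, 2, 3, 3], [1, 2, 3, 3]],
              [[1, 2, 3, 3], [1, 2, 3, 4]]])
c = np.array([[[1, 2, 3, 3], [1, 2, 3, 3]]])
d = np.arange(1, 7).reshape(2, 3)  # hypothetical input with a different trailing shape

same_for_b = np.array_equal(align(a, b), custom_reshape(a, b))
same_for_c = np.array_equal(align(a, c), custom_reshape(a, c))

# Only the flattened version accepts d; align(a, d) would raise a ValueError
# because shape (2, 3) cannot be broadcast into shape (2, 2, 4).
padded = custom_reshape(a, d)
```

When only the first dimension differs, both give the same result; the flattened version is more permissive, at the cost of no longer respecting row boundaries when trailing shapes differ.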

creating a scipy.lil_matrix using a python generator efficiently

I have a generator that generates single dimension numpy.arrays of the same length. I would like to have a sparse matrix containing that data. Rows are generated in the same order I'd like to have them in the final matrix. csr matrix is preferable over lil matrix, but I assume the latter will be easier to build in the scenario I'm describing.
Assuming row_gen is a generator yielding numpy.array rows, the following code works as expected.
import numpy
import scipy.sparse
def row_gen():
    yield numpy.array([1, 2, 3])
    yield numpy.array([1, 0, 1])
    yield numpy.array([1, 0, 0])
matrix = scipy.sparse.lil_matrix(list(row_gen()))
Because the list will essentially ruin any advantages of the generator, I'd like the following to have the same end result. More specifically, I cannot hold the entire dense matrix (or a list of all matrix rows) in memory:
def row_gen():
    yield numpy.array([1, 2, 3])
    yield numpy.array([1, 0, 1])
    yield numpy.array([1, 0, 0])
matrix = scipy.sparse.lil_matrix(row_gen())
However it raises the following exception when run:
TypeError: no supported conversion for types: (dtype('O'),)
I also noticed the trace includes the following:
File "/usr/local/lib/python2.7/site-packages/scipy/sparse/lil.py", line 122, in __init__
A = csr_matrix(A, dtype=dtype).tolil()
Which makes me think using scipy.sparse.lil_matrix will end up creating a csr matrix and only then convert that to a lil matrix. In that case I would rather just create csr matrix to begin with.
To recap, my question is: What is the most efficient way to create a scipy.sparse matrix from a python generator or numpy single dimensional arrays?

Let's look at the code for sparse.lil_matrix. It checks the first argument:
if isspmatrix(arg1):  # is it already a sparse matrix
    ...
elif isinstance(arg1,tuple):  # is it the shape tuple
    if isshape(arg1):
        if shape is not None:
            raise ValueError('invalid use of shape parameter')
        M, N = arg1
        self.shape = (M,N)
        self.rows = np.empty((M,), dtype=object)
        self.data = np.empty((M,), dtype=object)
        for i in range(M):
            self.rows[i] = []
            self.data[i] = []
    else:
        raise TypeError('unrecognized lil_matrix constructor usage')
else:
    # assume A is dense
    try:
        A = np.asmatrix(arg1)
    except TypeError:
        raise TypeError('unsupported matrix type')
    else:
        from .csr import csr_matrix
        A = csr_matrix(A, dtype=dtype).tolil()
        self.shape = A.shape
        self.dtype = A.dtype
        self.rows = A.rows
        self.data = A.data
As per the documentation - you can construct it from another sparse matrix, from a shape, and from a dense array. The dense array constructor first makes a csr matrix, and then converts it to lil.
The shape version constructs an empty lil with data like:
In [161]: M=sparse.lil_matrix((3,5),dtype=int)
In [163]: M.data
Out[163]: array([[], [], []], dtype=object)
In [164]: M.rows
Out[164]: array([[], [], []], dtype=object)
It should be obvious that passing a generator isn't going to work: it isn't a dense array.
But having created a lil matrix, you can fill in elements with a regular array assignment:
In [167]: M[0,:]=[1,0,2,0,0]
In [168]: M[1,:]=[0,0,2,0,0]
In [169]: M[2,3:]=[1,1]
In [170]: M.data
Out[170]: array([[1, 2], [2], [1, 1]], dtype=object)
In [171]: M.rows
Out[171]: array([[0, 2], [2], [3, 4]], dtype=object)
In [172]: M.A
Out[172]:
array([[1, 0, 2, 0, 0],
[0, 0, 2, 0, 0],
[0, 0, 0, 1, 1]])
and you can assign values to the sublists directly (I think this is faster, but a little more dangerous):
In [173]: M.data[1]=[1,2,3]
In [174]: M.rows[1]=[0,2,4]
In [176]: M.A
Out[176]:
array([[1, 0, 2, 0, 0],
[1, 0, 2, 0, 3],
[0, 0, 0, 1, 1]])
Another incremental approach is to construct the 3 arrays or lists of coo format, and then make a coo or csr from those.
sparse.bmat is another option, and its code is a good example of building the coo inputs. I'll let you look at that yourself.
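As a rough sketch of the incremental approach, the rows from the generator can be consumed one at a time while only the nonzero entries are kept, and a csr matrix built at the end from the (data, indices, indptr) triple. The helper name csr_from_rows is hypothetical, and it assumes every row has the same length, as stated in the question.

```python
import numpy as np
from scipy import sparse

def row_gen():
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def csr_from_rows(rows):
    """Build a CSR matrix from an iterable of dense 1-D rows (hypothetical helper)."""
    data, indices, indptr = [], [], [0]
    n_cols = 0
    for row in rows:
        cols = np.flatnonzero(row)        # column positions of nonzero entries
        data.extend(row[cols].tolist())   # the nonzero values themselves
        indices.extend(cols.tolist())
        indptr.append(len(indices))       # where the next row starts
        n_cols = len(row)
    shape = (len(indptr) - 1, n_cols)
    return sparse.csr_matrix((data, indices, indptr), shape=shape)

matrix = csr_from_rows(row_gen())
```

Only one dense row is alive at any moment; the Python lists hold just the nonzero values and their column indices. For very large inputs, array-based buffers may be preferable to lists, but that is a tuning choice not covered here.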

NumPy: A General Vectorized Method to Apply a Function Returning a Matrix to Each Row of a Matrix

I am looking for a vectorized method to apply a function returning a 2-dimensional array to each row of a 2-dimensional array and produce a 3-dimensional array.
More specifically, I have a function that takes a vector of length p and returns a 2-dimensional array (m by n). The following is a stylized version of my function:
import numpy as np
def test_func(x, m, n):
    # this function is just an example and does not do anything useful.
    # but, the dimensions of input and output is what I want to convey.
    np.random.seed(x.sum())
    return np.random.randint(5, size=(m, n))
I have a t by p 2-dimensional input data:
t = 5
p = 6
input_data = np.arange(t*p).reshape(t, p)
input_data
Out[403]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]])
I want to apply test_func to each row of the input_data. Since test_func returns a matrix, I expect to create a 3-dimensional (t by m by n) array. I can produce my desired result with the following code:
output_data = np.array([test_func(x, m=3, n=2) for x in input_data])
output_data
Out[405]:
array([[[0, 4],
[0, 4],
[3, 3],
[1, 0]],
[[1, 0],
[1, 0],
[4, 1],
[2, 4]],
[[3, 3],
[3, 0],
[1, 4],
[0, 2]],
[[2, 4],
[2, 1],
[3, 2],
[3, 1]],
[[3, 4],
[4, 3],
[0, 3],
[3, 0]]])
However, this code does not seem optimal. It has an explicit for loop, which reduces the speed, and it uses an intermediate list, which unnecessarily allocates extra memory. So I would like to find a vectorized solution. My best guess was the following code, but it does not work.
output = np.apply_along_axis(test_func, m=3, n=2, axis=1, arr=input_data)
Traceback (most recent call last):
File "<ipython-input-406-5bef44da348f>", line 1, in <module>
output = np.apply_along_axis(test_func, m=3, n=2, axis=1, arr=input_data)
File "C:\Anaconda\lib\site-packages\numpy\lib\shape_base.py", line 117, in apply_along_axis
outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (3,2) into shape (3)
Could you please suggest an efficient way to solve this problem?
UPDATE
Below is the actual function that I want to apply. It performs classical multidimensional scaling. The objective of the question was not to optimize the internal workings of the function, but to find a general method for vectorizing the function application. But, in the spirit of full disclosure, I put the actual function here. Note that this function only works if p == m*(m-1)/2.
def mds_classical_scaling(v, m, n):
    # create a symmetric distance matrix from the elements in vector v
    D = np.zeros((m, m))
    D[np.triu_indices(m, k=1)] = v
    D = (D + D.T)
    # Transform the symmetric matrix
    A = -0.5 * (D**2)
    # Create centering matrix
    H = np.eye(m) - np.ones((m, m))/m
    # Doubly center A and store in B (matrix products, not elementwise)
    B = H @ A @ H
    # B should be positive definite otherwise the function
    # would not work.
    mu, V = np.linalg.eig(B)
    # index of largest eigenvalues
    ndx = (-mu).argsort()
    # calculate the point configuration from largest eigenvalues
    # and corresponding eigenvectors
    Mu1 = np.diag(mu[ndx][:n])
    V1 = V[:, ndx[:n]]
    X = V1 @ np.sqrt(Mu1)
    return X
Any performance boost I get from vectorization is negligible compared to the cost of the actual function. The main reason for asking was learning :)

ali_m's comment is spot-on: for serious speed gains, you should be more specific about what the function does.
That being said, if you still want to use np.apply_along_axis to get a (possibly) small speed-boost, then consider (after rereading that function's docstring) that you can easily
1. wrap your function to produce 1D arrays,
2. use np.apply_along_axis with that wrapper and
3. reshape the resulting array:
def test_func_wrapper(*args, **kwargs):
    return test_func(*args, **kwargs).ravel()
output = np.apply_along_axis(test_func_wrapper, m=3, n=2, axis=1, arr=input_data)
np.allclose(output.reshape(5,3, -1), output_data)
# output: True
Note that this is a generic way to speed up such loops. You'll probably get better performance if you use functionality more specific to the actual problem.
