Numpy reshape - automatic filling or removal - python

I would like to find a reshape function that can transform arrays of different shapes into arrays of the same shape. Let me explain:
import numpy as np
a = np.array([[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,3]]])
b = np.array([[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,4]]])
c = np.array([[[1,2,3,3],[1,2,3,3]]])
I would like to be able to make the shapes of b and c equal to the shape of a. However, np.reshape throws an error because, as explained here (Numpy resize or Numpy reshape), the function only works when the total number of elements stays the same.
I would like some version of that function that adds zeros at the start of the first dimension if the array is smaller, or removes entries from the start if it is bigger. For my example, the result would look like this:
b = np.array([[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,4]]])
c = np.array([[[0,0,0,0],[0,0,0,0]],[[1,2,3,3],[1,2,3,3]]])
Do I need to write my own function to do that?

This is similar to the other solution, but it also works if the lower dimensions don't match:
def custom_reshape(a, b):
    result = np.zeros_like(a).ravel()
    result[-min(a.size, b.size):] = b.ravel()[-min(a.size, b.size):]
    return result.reshape(a.shape)
custom_reshape(a,b)
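To see what "lower dimensions don't match" means in practice, here is a small sketch that pours a 2-D source array into a 3-D target shape. The arrays `template` and `d` are made up for this illustration. Because the function works on the flattened data, the row boundaries of the source are not preserved; only the trailing elements are carried over, in order.

```python
import numpy as np

def custom_reshape(a, b):
    # Same idea as above: copy the trailing elements of b into a
    # zero-filled array that has the shape of a.
    result = np.zeros_like(a).ravel()
    n = min(a.size, b.size)
    result[-n:] = b.ravel()[-n:]
    return result.reshape(a.shape)

template = np.zeros((2, 2, 4), dtype=int)   # illustrative target shape
d = np.arange(1, 7).reshape(2, 3)           # illustrative 2-D source with 6 elements

out = custom_reshape(template, d)
print(out.shape)          # (2, 2, 4)
print(out.ravel()[-6:])   # the six source values, in their original order
```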

I would write a function like this:
def align(a,b):
    out = np.zeros_like(a)
    x = min(a.shape[0], b.shape[0])
    out[-x:] = b[-x:]
    return out
Output:
align(a,b)
# array([[[1, 2, 3, 3],
# [1, 2, 3, 3]],
# [[1, 2, 3, 3],
# [1, 2, 3, 4]]])
align(a,c)
# array([[[0, 0, 0, 0],
# [0, 0, 0, 0]],
# [[1, 2, 3, 3],
# [1, 2, 3, 3]]])
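If no template array is at hand, the same idea can be written against a target length. The sketch below is one possible variant and not part of the answer above: `align_to_length` and its parameter `n` are hypothetical names. It also guards the case where nothing is copied, because a slice such as `out[-0:]` selects the whole array rather than an empty one.

```python
import numpy as np

def align_to_length(arr, n):
    """Left-pad with zeros or drop leading entries so that axis 0 has length n."""
    out = np.zeros((n,) + arr.shape[1:], dtype=arr.dtype)
    k = min(n, arr.shape[0])          # number of trailing entries to keep
    if k > 0:                         # avoid the out[-0:] pitfall
        out[n - k:] = arr[arr.shape[0] - k:]
    return out

b = np.arange(24).reshape(3, 2, 4)    # longer than the target: the first block is dropped
c = np.arange(8).reshape(1, 2, 4)     # shorter than the target: a zero block is prepended

b2 = align_to_length(b, 2)
c2 = align_to_length(c, 2)
print(b2.shape, c2.shape)             # (2, 2, 4) (2, 2, 4)
```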

Related

Transform broadcasting to something calculable: matrix np.multiply

I'm trying to perform this kind of calculation:
arr = np.arange(4)
# array([0, 1, 2, 3])
arr_t =arr.reshape((-1,1))
# array([[0],
# [1],
# [2],
# [3]])
mult_arr = np.multiply(arr,arr_t) # <<< the multiplication
# array([[0, 0, 0, 0],
# [0, 1, 2, 3],
# [0, 2, 4, 6],
# [0, 3, 6, 9]])
The goal is to eventually apply it to every single row of a bigger matrix and to sum all the matrices produced by the calculation:
arr = np.random.random((600,150))
arr_t =arr.reshape((-1,arr.shape[1],1))
mult = np.multiply(arr[:,None],arr_t)
summed = np.sum(mult,axis=0)
summed
So far this works nicely. The problem starts when I try to move to a bigger dataset, for example this array instead:
arr = np.random.random((6000,1500))
I get the following error: MemoryError: Unable to allocate 101. GiB for an array with shape (6000, 1500, 1500) and data type float64
This makes sense, but my question is:
can I get around this without being forced to use loops that slow the whole process down?
My question is mainly about performance; a solution that needs to run for more than 30 seconds is not an option.
It looks like you are simply trying to perform a dot product:
arr.T @ arr
or
arr.T.dot(arr)
Checking that this is what you want:
arr = np.random.random((600,150))
arr_t =arr.reshape((-1,arr.shape[1],1))
mult = np.multiply(arr[:,None],arr_t)
summed = np.sum(mult,axis=0)
np.allclose(arr.T @ arr, summed)
# True
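The reason this works: for every row, the broadcasted product builds the outer product of that row with itself, and summing those outer products over all rows gives entry (i, j) as the sum over n of arr[n, i] * arr[n, j], which is exactly what arr.T @ arr computes. The sketch below checks this on a small array and also shows an equivalent np.einsum spelling. The array size and the name `rng` are arbitrary choices for the illustration. The matrix product never materializes the (rows, cols, cols) intermediate, so only a (cols, cols) result is allocated.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((60, 15))

# Broadcasting version: one (15, 15) outer product per row, then a sum over rows.
summed = (arr[:, None, :] * arr[:, :, None]).sum(axis=0)

# Matrix-product versions: no (60, 15, 15) intermediate is created.
gram = arr.T @ arr
gram_einsum = np.einsum("ni,nj->ij", arr, arr)

print(np.allclose(summed, gram), np.allclose(gram, gram_einsum))
```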

Numpy double-slice assignment with integer indexing followed by boolean indexing

I already know that NumPy "double-slice" with fancy indexing creates copies instead of views, and the solution seems to be to convert them to one single slice (e.g. this question). However, I am facing a particular problem where I need to deal with integer indexing followed by boolean indexing, and I am at a loss as to what to do. The problem (simplified) is as follows:
a = np.random.randn(2, 3, 4, 4)
idx_x = np.array([[1, 2], [1, 2], [1, 2]])
idx_y = np.array([[0, 0], [1, 1], [2, 2]])
print(a[..., idx_y, idx_x].shape) # (2, 3, 3, 2)
mask = (np.random.randn(2, 3, 3, 2) > 0)
a[..., idx_y, idx_x][mask] = 1 # assignment doesn't work
How can I make the assignment work?
Not sure, but one idea is to do the broadcasting manually and apply the mask to the indices, just like Tim suggests. idx_x and idx_y both have the same shape (3,2), which will be broadcast to the shape (6,6) from the Cartesian product (3*2)^2.
x = np.broadcast_to(idx_x.ravel(), (6,6))
y = np.broadcast_to(idx_y.ravel(), (6,6))
# this should be the same as
x,y = np.meshgrid(idx_x, idx_y)
Now reshape the mask to match the broadcast indices and use it to select:
mask = mask.reshape(6,6)
a[..., x[mask], y[mask]] = 1
The assignment now works, but I am not sure if this is the exact assignment you wanted.
OK, apparently I was making things complicated. There is no need to combine the indexing. The following code solves the problem elegantly:
b = a[..., idx_y, idx_x]
b[mask] = 1
a[..., idx_y, idx_x] = b
print(a[..., idx_y, idx_x][mask]) # all 1s
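For context, the original line fails silently because `a[..., idx_y, idx_x]` uses integer-array indexing, which returns a copy; the boolean assignment then modifies that temporary copy rather than `a`. If a single assignment is preferred over copy, modify, write back, one option is to translate the True positions of the mask into indices of `a`. The following is a sketch under the assumption that the (idx_y, idx_x) pairs are unique, as they are in the question; the names `i0`, `i1`, `r`, `c`, and `expected` are introduced here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3, 4, 4))
idx_x = np.array([[1, 2], [1, 2], [1, 2]])
idx_y = np.array([[0, 0], [1, 1], [2, 2]])
mask = rng.standard_normal((2, 3, 3, 2)) > 0

# Reference result using the copy / modify / write-back approach.
expected = a.copy()
b = expected[..., idx_y, idx_x]
b[mask] = 1
expected[..., idx_y, idx_x] = b

# Single assignment: element [i0, i1, r, c] of the indexed result is
# a[i0, i1, idx_y[r, c], idx_x[r, c]], so map the mask positions back to a.
i0, i1, r, c = np.nonzero(mask)
a[i0, i1, idx_y[r, c], idx_x[r, c]] = 1

print(np.array_equal(a, expected))
```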
EDIT: Use @Kevin's solution, which actually gets the dimensions correct!
I haven't tried it specifically on your sample code but I had a similar issue before. I think I solved it by applying the mask to the indices instead, something like:
a[..., idx_y[mask], idx_x[mask]] = 1
That way, NumPy can assign the values to the array a correctly.
EDIT2: Posting some test code here, since comments remove formatting.
a = np.arange(27).reshape([3, 3, 3])
ind_x = np.array([[0, 0], [1, 2]])
ind_y = np.array([[1, 2], [1, 1]])
x = np.broadcast_to(ind_x.ravel(), (4, 4))
y = np.broadcast_to(ind_y.ravel(), (4, 4)).T
# x1, y2 = np.meshgrid(ind_x, ind_y) # above should be the same as this
mask = a[:, ind_y, ind_x] % 2 == 0 # what should this reshape to?
# a[..., x[mask], y[mask]] = 1 # Then you can mask away (may also need to reshape a or the masked x or y)

Replacing an array at the i-th dimension

Let's say I have a two-dimensional array
import numpy as np
a = np.array([[1, 1, 1], [2,2,2], [3,3,3]])
and I would like to replace the third vector (in the second dimension) with zeros. I would do
a[:, 2] = np.array([0, 0, 0])
But what if I would like to be able to do that programmatically? I mean, let's say that a variable x = 1 contained the dimension on which I wanted to do the replacing. How would the function replace(arr, dimension, value, arr_to_be_replaced) have to look if I wanted to call it as replace(a, x, 2, np.array([0, 0, 0]))?
NumPy has a similar function, insert. However, it doesn't replace at dimension i; it returns a copy with an additional vector.
All solutions are welcome, but I prefer a solution that doesn't recreate the array, so as to save memory.
arr[:, 1]
is basically shorthand for
arr[(slice(None), 1)]
that is, a tuple with slice elements and integers.
Knowing that, you can construct a tuple of slice objects manually, adjust the values depending on an axis parameter and use that as your index. So for
import numpy as np
arr = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
axis = 1
idx = 2
arr[:, idx] = np.array([0, 0, 0])
#      ^- axis position
you can use
slices = [slice(None)] * arr.ndim
slices[axis] = idx
arr[tuple(slices)] = np.array([0, 0, 0])
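Wrapped into a function, the tuple-of-slices idea could look like the sketch below. The name `replace_along_axis` and its parameter names are hypothetical and differ from the signature proposed in the question. The assignment writes into the existing array, so no new array is created.

```python
import numpy as np

def replace_along_axis(arr, axis, idx, values):
    """Overwrite, in place, the slice of arr at position idx along axis."""
    index = [slice(None)] * arr.ndim   # start with the equivalent of arr[:, :, ...]
    index[axis] = idx                  # pin the chosen axis to a single position
    arr[tuple(index)] = values
    return arr

a = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
replace_along_axis(a, axis=1, idx=2, values=np.array([0, 0, 0]))
print(a)
```

With axis=1 and idx=2 this is the same as `a[:, 2] = ...`; with axis=0 it would overwrite the row `a[2]` instead.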

How to perform stencil computations element-wise on a matrix in Theano?

I have the following blurring kernel I need to apply to every pixel in an RGB image
[ 0.0625 0.25 0.375 0.25 0.0625 ]
So, the pseudo-code looks something like this in Numpy
for i in range(rows):
    for j in range(cols):
        for k in range(3):
            final[i][j][k] = image[i-2][j][k]*0.0625 + \
                             image[i-1][j][k]*0.25 + \
                             image[i][j][k]*0.375 + \
                             image[i+1][j][k]*0.25 + \
                             image[i+2][j][k]*0.0625
I've tried searching for a question similar to this but never found these sort of data accesses in the computation.
How do I perform the above function for a Theano tensor matrix?
You can use the conv2d function for this task. See the reference here, and maybe also read the example tutorial here. Notes for this solution:
Because your kernel is symmetrical, you can ignore the filter_flip parameter.
conv2d takes a 4D input and a 4D kernel, so you need to reshape both first.
conv2d sums over every channel (I think the 'k' variable in your case is the RGB channel, right?), so you should separate the channels first.
This is my example code; I use a simpler kernel here:
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d
# original image
img = [[[1, 2, 3, 4],    # R channel
        [1, 1, 1, 1],
        [2, 2, 2, 2]],
       [[1, 1, 1, 1],    # G channel
        [2, 2, 2, 2],
        [1, 2, 3, 4]],
       [[1, 1, 1, 1],    # B channel
        [1, 2, 3, 4],
        [2, 2, 2, 2]]]
# separate and reshape each channel to 4D
R = np.asarray([[img[0]]], dtype='float32')
G = np.asarray([[img[1]]], dtype='float32')
B = np.asarray([[img[2]]], dtype='float32')
# 4D kernel from the original : [1,0,1]
kernel = np.asarray([[[[1],[0],[1]]]], dtype='float32')
# theano convolution
t_img = T.ftensor4("t_img")
t_kernel = T.ftensor4("t_kernel")
result = conv2d(
    input=t_img,
    filters=t_kernel,
    # must match the kernel array above, which has shape (1, 1, 3, 1)
    filter_shape=(1, 1, 3, 1),
    border_mode='half')
f = theano.function([t_img,t_kernel],result)
# compute each channel
R = f(R,kernel)
G = f(G,kernel)
B = f(B,kernel)
# reshape again
img = np.asarray([R,G,B])
img = np.reshape(img,(3,3,4))
print(img)
If you have anything to discuss about the code, please comment. Hope it helps.
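When porting a stencil like this, it can help to have a plain NumPy reference to compare the Theano output against. The sketch below is such a reference, not Theano code: it applies the five weights from the question's pseudo-code along the row axis using shifted slices. The function name `blur_rows` is hypothetical, and zero padding at the top and bottom borders is an assumption, since the question does not say how the edges should be handled.

```python
import numpy as np

def blur_rows(image, weights=(0.0625, 0.25, 0.375, 0.25, 0.0625)):
    """Apply a 1-D stencil along axis 0 of a (rows, cols, channels) array."""
    image = np.asarray(image, dtype=float)
    r = len(weights) // 2
    # Assumption: pad with zeros above and below so border pixels have neighbours.
    pad = [(r, r)] + [(0, 0)] * (image.ndim - 1)
    padded = np.pad(image, pad, mode="constant")
    rows = image.shape[0]
    out = np.zeros_like(image)
    for offset, w in enumerate(weights):
        # padded[offset + i] corresponds to image[i + offset - r]
        out += w * padded[offset:offset + rows]
    return out

image = np.ones((6, 4, 3))
out = blur_rows(image)
print(out[:, 0, 0])   # interior rows keep the value 1, border rows are dimmed by the zero padding
```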

NumPy: A General Vectorized Method to Apply a Function Returning a Matrix to Each Row of a Matrix

I am looking for a vectorized method to apply a function returning a 2-dimensional array to each row of a 2-dimensional array and produce a 3-dimensional array.
More specifically, I have a function that takes a vector of length p and returns a 2-dimensional array (m by n). The following is a stylized version of my function:
import numpy as np
def test_func(x, m, n):
    # this function is just an example and does not do anything useful.
    # but, the dimensions of input and output is what I want to convey.
    np.random.seed(x.sum())
    return np.random.randint(5, size=(m, n))
I have a t by p 2-dimensional input data:
t = 5
p = 6
input_data = np.arange(t*p).reshape(t, p)
input_data
Out[403]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]])
I want to apply test_func to each row of the input_data. Since test_func returns a matrix, I expect to create a 3-dimensional (t by m by n) array. I can produce my desired result with the following code:
output_data = np.array([test_func(x, m=3, n=2) for x in input_data])
output_data
Out[405]:
array([[[0, 4],
[0, 4],
[3, 3],
[1, 0]],
[[1, 0],
[1, 0],
[4, 1],
[2, 4]],
[[3, 3],
[3, 0],
[1, 4],
[0, 2]],
[[2, 4],
[2, 1],
[3, 2],
[3, 1]],
[[3, 4],
[4, 3],
[0, 3],
[3, 0]]])
However, this code does not seem optimal. It has an explicit for loop, which reduces the speed, and it uses an intermediate list, which unnecessarily allocates extra memory. So I would like to find a vectorized solution. My best guess was the following code, but it does not work.
output = np.apply_along_axis(test_func, m=3, n=2, axis=1, arr=input_data)
Traceback (most recent call last):
File "<ipython-input-406-5bef44da348f>", line 1, in <module>
output = np.apply_along_axis(test_func, m=3, n=2, axis=1, arr=input_data)
File "C:\Anaconda\lib\site-packages\numpy\lib\shape_base.py", line 117, in apply_along_axis
outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (3,2) into shape (3)
Could you please suggest an efficient way to solve this problem?
UPDATE
Below is the actual function that I want to apply. It performs classical multidimensional scaling. The objective of the question was not to optimize the internal workings of the function, but to find a general method for vectorizing the application of the function. But, in the spirit of full disclosure, I put the actual function here. Note that this function only works if p == m*(m-1)/2.
def mds_classical_scaling(v, m, n):
    # create a symmetric distance matrix from the elements in vector v
    D = np.zeros((m, m))
    D[np.triu_indices(m, k=1)] = v
    D = (D + D.T)
    # Transform the symmetric matrix
    A = -0.5 * (D**2)
    # Create centering matrix
    H = np.eye(m) - np.ones((m, m))/m
    # Doubly center A and store in B (matrix products, not element-wise)
    B = H @ A @ H
    # B should be positive definite otherwise the function
    # would not work.
    mu, V = np.linalg.eig(B)
    # index of largest eigenvalues
    ndx = (-mu).argsort()
    # calculate the point configuration from largest eigenvalues
    # and corresponding eigenvectors
    Mu1 = np.diag(mu[ndx][:n])
    V1 = V[:, ndx[:n]]
    X = V1 @ np.sqrt(Mu1)
    return X
Any performance boost I get from vectorization is negligible compared to the actual function. The main reason was learning :)
ali_m's comment is spot-on: for serious speed gains, you should be more specific about what the function does.
That being said, if you still want to use np.apply_along_axis to get a (possibly) small speed-boost, then consider (after rereading that function's docstring) that you can easily
wrap your function to produce 1D arrays,
use np.apply_along_axis with that wrapper and
reshape the resulting array:
def test_func_wrapper(*args, **kwargs):
    return test_func(*args, **kwargs).ravel()
output = np.apply_along_axis(test_func_wrapper, m=3, n=2, axis=1, arr=input_data)
np.allclose(output.reshape(5,3, -1), output_data)
# output: True
Note that this is a generic way to speed up such loops. You'll probably get better performance if you use functionality more specific to the actual problem.
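To make the wrapper approach easy to verify, the sketch below swaps the random test_func for a deterministic stand-in; `row_to_matrix` and `ravel_wrapper` are hypothetical names used only for this illustration. It also shows a preallocated loop, which avoids the intermediate list mentioned in the question without relying on np.apply_along_axis.

```python
import numpy as np

def row_to_matrix(x, m, n):
    # Hypothetical stand-in for test_func: deterministic, returns an (m, n) array.
    return np.outer(x[:m], x[:n])

def ravel_wrapper(x, m, n):
    # Flatten the (m, n) result so that np.apply_along_axis can store it.
    return row_to_matrix(x, m, n).ravel()

t, p, m, n = 5, 6, 3, 2
input_data = np.arange(t * p).reshape(t, p)

# Baseline: list comprehension, as in the question.
looped = np.array([row_to_matrix(x, m, n) for x in input_data])

# Wrapper + apply_along_axis + reshape, as in the answer.
flat = np.apply_along_axis(ravel_wrapper, axis=1, arr=input_data, m=m, n=n)
reshaped = flat.reshape(t, m, n)

# Preallocated output: still a Python loop, but no intermediate list.
prealloc = np.empty((t, m, n), dtype=input_data.dtype)
for i, x in enumerate(input_data):
    prealloc[i] = row_to_matrix(x, m, n)

print(np.array_equal(looped, reshaped), np.array_equal(looped, prealloc))
```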
