Numpy rank 1 arrays - python

I am a Matlab/Octave user. The NumPy documentation advises using array rather than matrix. Is there a convenient way to deal with rank-1 arrays without constantly reshaping them?
Example:
import numpy as np
data = np.loadtxt("ex1data1.txt", usecols=(0, 1), delimiter=',', dtype=None)
X = data[:, 0]
y = data[:, 1]
m = len(y)
print(X.shape, y.shape)
>>> (97,) (97,)
I can't add a new column to X using concatenate, vstack, or append without first reshaping X (np.c_ works, but it is slower):
X = np.concatenate((np.ones((m, 1)), X), axis=1)
>>> ValueError: all the input arrays must have same number of dimensions
Likewise, X - y can't be done without reshaping y, e.g. np.reshape(y, (-1, 1)).

A simpler equivalent to np.reshape(y, (-1, 1)) is y[:, np.newaxis]. Since np.newaxis is an alias for None, y[:, None] also works. It's also worth mentioning np.expand_dims(y, axis=1).
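For example, a minimal sketch (with a made-up y) of promoting a (m,) array to a column before concatenating:
import numpy as np
y = np.arange(5.0)                        # shape (5,), a rank-1 array
print(y[:, np.newaxis].shape)             # (5, 1)
print(y[:, None].shape)                   # (5, 1), since np.newaxis is None
print(np.expand_dims(y, axis=1).shape)    # (5, 1)
# Once y is 2-D, the concatenation from the question works:
X = np.concatenate((np.ones((len(y), 1)), y[:, None]), axis=1)
print(X.shape)                            # (5, 2)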

Related

Transpose a vector in python

I have to write a Python function where I need to compute the vector
r_n = A xn - (xn^T A xn) xn
where A is n by n and xn is n by 1.
I'm using numpy, but .T doesn't work on 1-D vectors, and when I just do
r_n = A @ xn - (xn @ A @ xn) @ xn
the term xn @ A @ xn gives me a scalar. I've tried swapping A and xn around, but nothing seems to work.
Making what looks like a 3x1 numpy array like this...
import numpy as np
a = np.array([1, 2, 3])
...and then attempting to take its transpose like this...
a_transpose = a.T
...will, confusingly, return this:
# [1 2 3]
If you want to define a (column) vector whose transpose you can meaningfully take, and get a row vector in return, you need to define it like this:
a = np.reshape(np.array([1, 2, 3]), (3, 1))
print(a)
# [[1]
# [2]
# [3]]
a_transpose = a.T
print(a_transpose)
# [[1 2 3]]
If you want to define a 1 x n array whose transpose you can take to get an n x 1 array, you can do it like this:
a = np.array([[1, 2, 3]])
and then get its transpose by calling a.T.
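As a quick check, the (1, n) form round-trips through .T as expected (a minimal sketch):
a = np.array([[1, 2, 3]])   # shape (1, 3)
print(a.T.shape)            # (3, 1)
print(a.T)
# [[1]
#  [2]
#  [3]]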
If A is (n,n) and xn is (n,1):
A @ xn - (xn @ A @ xn) @ xn
(n,n) @ (n,1) - ((n,1) @ (n,n) @ (n,1)) @ (n,1)
(n,1) error (1 does not match n)
If xn @ A @ xn gives a scalar, that's because xn has (n,) shape; per the np.matmul docs, 1-D operands are promoted to 2-D and the extra dimension is removed again afterwards, so:
(n,) @ (n,n) @ (n,) => (n,) @ (n,) -> scalar
I think you want
(1,n) @ (n,n) @ (n,1) => (1,1)
Come to think of it, that (1,1) array should hold the same single value as the scalar.
Sample calculation; 1st with the (n,) shape:
In [6]: A = np.arange(1,10).reshape(3,3); x = np.arange(1,4)
In [7]: A@x
Out[7]: array([14, 32, 50])   # (3,3)@(3,) => (3,)
In [8]: x@A@x   # scalar
Out[8]: 228
In [9]: (x@A@x)@x
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 (x@A@x)@x
ValueError: matmul: Input operand 0 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)
matmul does not like to work with scalars. But we can use np.dot instead, or simply multiply:
In [10]: (x@A@x)*x
Out[10]: array([228, 456, 684])   # (3,)
In [11]: A@x - (x@A@x)*x
Out[11]: array([-214, -424, -634])
Change the array to (3,1):
In [12]: xn = x[:,None]; xn.shape
Out[12]: (3, 1)
In [13]: A@xn - (xn.T@A@xn)*xn
Out[13]:
array([[-214],
       [-424],
       [-634]])   # same numbers but in (3,1) shape
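Putting it together, here is a minimal sketch of the whole computation for a 1-D xn (the function name residual is just illustrative, not from the question):
import numpy as np

def residual(A, xn):
    # r_n = A @ xn - (xn^T A xn) * xn, with xn of shape (n,)
    return A @ xn - (xn @ A @ xn) * xn

A = np.arange(1, 10).reshape(3, 3)
xn = np.arange(1, 4)
print(residual(A, xn))   # [-214 -424 -634]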

Creating 3d Tensor Array from 2d Array (Python)

I have two numpy arrays (4x4 each). I would like to concatenate them to a tensor of (4x4x2) in which the first 'sheet' is the first array, second 'sheet' is the second array, etc. However, when I try np.stack the output of d[1] is not showing the correct values of the first matrix.
import numpy as np
x = np.array([[ 3.38286851e-02, -6.11905173e-05, -9.08147798e-03, -2.46860166e-02],
              [-6.11905173e-05,  1.74237508e-03, -4.52140165e-04, -1.22904439e-03],
              [-9.08147798e-03, -4.52140165e-04,  1.91939979e-01, -1.82406361e-01],
              [-2.46860166e-02, -1.22904439e-03, -1.82406361e-01,  2.08321422e-01]])
print(np.shape(x))  # (4, 4)
y = np.array([[ 6.76573701e-02, -1.22381035e-04, -1.81629560e-02, -4.93720331e-02],
              [-1.22381035e-04,  3.48475015e-03, -9.04280330e-04, -2.45808879e-03],
              [-1.81629560e-02, -9.04280330e-04,  3.83879959e-01, -3.64812722e-01],
              [-4.93720331e-02, -2.45808879e-03, -3.64812722e-01,  4.16642844e-01]])
print(np.shape(y))  # (4, 4)
d = np.dstack((x, y))
np.shape(d)  # indeed it is (4, 4, 2)... but if I do d[1] then it is not the first x matrix.
d[1]  # should be y
If you do np.dstack((x, y)), which is the same as the more explicit np.stack((x, y), axis=-1), you are concatenating along the last, not the first axis (i.e., the one with size 2):
(x == d[..., 0]).all()
(y == d[..., 1]).all()
Ellipsis (...) is a python object that means ": as many times as necessary" when used in an index. For a 3D array, you can equivalently access the leaves as
d[:, :, 0]
d[:, :, 1]
If you want to access the leaves along the first axis, your array must be (2, 4, 4):
d = np.stack((x, y), axis=0)
(x == d[0]).all()
(y == d[1]).all()
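As a small self-contained check of the two layouts (tiny arrays, made up for illustration):
import numpy as np
x = np.zeros((4, 4))
y = np.ones((4, 4))
d_last = np.stack((x, y), axis=-1)    # equivalent to np.dstack((x, y))
d_first = np.stack((x, y), axis=0)
print(d_last.shape, d_first.shape)    # (4, 4, 2) (2, 4, 4)
print((d_last[..., 1] == y).all())    # True
print((d_first[1] == y).all())        # True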
Use np.stack instead of np.dstack:
>>> d = np.stack([y, x])
>>> np.all(d[0] == y)
True
>>> np.all(d[1] == x)
True
>>> d.shape
(2, 4, 4)

Filter multidimensional numpy array using the percentile of each slice

I have a numpy array of shape (x, y, z), which represents z matrices of x by y each. I can slice each of the matrices and then use clip with percentiles to filter out outliers:
mx = array[:, :, 0] # taking the first matrix
filtered_mx = np.clip(mx, np.percentile(mx, 1), np.percentile(mx, 99))
Is there some efficient way to do the same without doing it one slice at a time?
You can pass arrays to np.clip, so it is possible to have different limits across the z dimension of mx:
import numpy as np
# Create random mx
x, y, z = 10, 11, 12
mx = np.random.random((x, y, z))
# Calculate the percentiles across the x and y dimension
perc01 = np.percentile(mx, 1, axis=(0, 1), keepdims=True)
perc99 = np.percentile(mx, 99, axis=(0, 1), keepdims=True)
# Clip array with different limits across the z dimension
filtered_mx = np.clip(mx, a_min=perc01, a_max=perc99)
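Continuing the snippet above, the vectorized result should match the per-slice version from the question:
# Per-slice reference, one z index at a time
expected = np.empty_like(mx)
for k in range(z):
    s = mx[:, :, k]
    expected[:, :, k] = np.clip(s, np.percentile(s, 1), np.percentile(s, 99))
print(np.allclose(filtered_mx, expected))   # True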

Concatenate with broadcast

Consider the following arrays:
a = np.array([0,1])[:,None]
b = np.array([1,2,3])
print(a)
array([[0],
       [1]])
print(b)
array([1, 2, 3])
Is there a simple way to concatenate these two arrays in a way that the latter is broadcast, in order to obtain the following?
array([[0, 1, 2, 3],
       [1, 1, 2, 3]])
I've seen there is this closed issue with a related question. An alternative is proposed involving np.broadcast_arrays, however I cannot manage to adapt it to my example. Is there some way to do this, excluding the np.tile/np.concatenate solution?
You can do it in the following way
import numpy as np
a = np.array([0,1])[:,None]
b = np.array([1,2,3])
b_new = np.broadcast_to(b,(a.shape[0],b.shape[0]))
c = np.concatenate((a,b_new),axis=1)
print(c)
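which should print the desired result:
[[0 1 2 3]
 [1 1 2 3]]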
Here is a more general solution:
def concatenate_broadcast(arrays, axis=-1):
    def broadcast(x, shape):
        shape = [*shape]  # weak copy
        shape[axis] = x.shape[axis]
        return np.broadcast_to(x, shape)

    shapes = [list(a.shape) for a in arrays]
    for s in shapes:
        s[axis] = 1
    broadcast_shape = np.broadcast(*[
        np.broadcast_to(0, s)
        for s in shapes
    ]).shape
    arrays = [broadcast(a, broadcast_shape) for a in arrays]
    return np.concatenate(arrays, axis=axis)
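A quick usage sketch with the arrays from the question (note b can stay 1-D; the helper broadcasts it to the required shape):
a = np.array([0, 1])[:, None]
b = np.array([1, 2, 3])
print(concatenate_broadcast([a, b]))
# [[0 1 2 3]
#  [1 1 2 3]]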
I had a similar problem where I had two matrices x and y of size (X, c1) and (Y, c2) and wanted the result to be the matrix of size (X * Y, c1 + c2) where the rows of the result were all the concatenations of rows from x and rows from y.
I, like the original poster, was disappointed to discover that concatenate() would not do broadcasting for me. I thought of using the solution above, except X and Y could potentially be large, and that solution would use a large temporary array.
I finally came up with the following:
result = np.empty((x.shape[0], y.shape[0], x.shape[1] + y.shape[1]), dtype=x.dtype)
result[..., :x.shape[1]] = x[:, None, :]
result[..., x.shape[1]:] = y[None, :, :]
result = result.reshape((-1, x.shape[1] + y.shape[1]))
I create a result array of size (X, Y, c1 + c2), I broadcast in the contents of x and y, and then reshape the results to the right size.
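For concreteness, here is a runnable sketch of that approach with small made-up shapes:
import numpy as np
x = np.arange(6).reshape(3, 2)     # (X, c1) = (3, 2)
y = np.arange(8).reshape(4, 2)     # (Y, c2) = (4, 2)
result = np.empty((x.shape[0], y.shape[0], x.shape[1] + y.shape[1]), dtype=x.dtype)
result[..., :x.shape[1]] = x[:, None, :]
result[..., x.shape[1]:] = y[None, :, :]
result = result.reshape((-1, x.shape[1] + y.shape[1]))
print(result.shape)                # (12, 4): each row of x paired with each row of y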

Issue while reshaping a torch tensor from [10,200,1] to [2000,1,1]

I am having a problem when trying to reshape a torch tensor Yp of dimensions [10,200,1] to [2000,1,1]. The tensor is obtained from a numpy array y of dimension [2000,1]. I am doing the following:
Yp = torch.reshape(Yp, (-1, 1, 1))
I try to subtract the result to a torch tensor version of y by doing:
Yp[0:2000,0] - torch.from_numpy(y[0:2000,0])
I expect the result to be an array of zeros, but that is not the case. Calling different orders when reshaping (order = 'F' or 'C') does not solve the problem, and strangely outputs the same result when doing the subtraction. I only manage to get an array of zeros by calling on the tensor Yp the ravel method with order = 'F'.
What am I doing wrong? I would like to solve this using reshape!
I concur with @linamnt's comment (though the actual resulting shape is [2000, 1, 2000]).
Here is a small demonstration:
import torch
import numpy as np
# Your inputs according to question:
y = np.random.rand(2000, 1)
y = torch.from_numpy(y[0:2000,0])
Yp = torch.reshape(y, (10,200,1))
# Your reshaping according to question:
Yp = torch.reshape(Yp, (-1,1,1))
# (note: Tensor.view() may suit your need more if you don't want to copy values)
# Your subtraction:
y_diff = Yp - y
print(y_diff.shape)
# > torch.Size([2000, 1, 2000])
# As explained by @linamnt, unwanted broadcasting is done
# since the dims of your tensors don't match
# If you give both your tensors the same shape, e.g. [2000, 1, 1] (or [2000]):
y_diff = Yp - y.view(-1, 1, 1)
print(y_diff.shape)
# > torch.Size([2000, 1, 1])
# Checking the result tensor contains only 0 (by calculing its abs. sum):
print(y_diff.abs().sum())
# > 0
