I have a 3D NumPy array named stocks with shape (A, P, T), where A is the number of stock symbols, P is the number of prices recorded for a stock at a given point in time, and T is the number of time steps.
stocks = np.array([ [ [1,2,3],[4,5,6],[7,8,9] ], [ [10,11,12],[13,14,15],[16,17,18] ], [ [19,20,21],[22,23,24],[25,26,27] ], [ [28,29,30],[31,32,33],[34,35,36] ] ])
I would like to return a 2D NumPy array of shape (P, T) where each element is the average, taken over all A stocks, of that price at that time.
How would I do this? Thanks in advance!
import numpy as np
# shape (A, P, T)
stocks = np.array([ [ [1,2,3],[4,5,6],[7,8,9] ], [ [10,11,12],[13,14,15],[16,17,18] ], [ [19,20,21],[22,23,24],[25,26,27] ], [ [28,29,30],[31,32,33],[34,35,36] ] ])
# this will give you the mean calculated along the first dimension
# the shape will be (P, T)
out_0 = np.mean(stocks, axis=0)
# this will be of the shape (A, T)
out_1 = np.mean(stocks, axis=1)
# this will be of the shape (A, P)
out_2 = np.mean(stocks, axis=2)
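As a quick sanity check of the axis=0 case, here is a minimal sketch. It rebuilds the same values with np.arange (a shortcut that is not in the original post) and confirms the shape and one entry by hand.

```python
import numpy as np

# Same values as the stocks array above, built with arange for brevity.
# Shape is (A, P, T) = (4, 3, 3).
stocks = np.arange(1, 37).reshape(4, 3, 3)

# Averaging over axis 0 collapses the A (stock symbol) dimension.
avg_over_stocks = stocks.mean(axis=0)

print(avg_over_stocks.shape)   # (3, 3), i.e. (P, T)
print(avg_over_stocks[0, 0])   # 14.5, the mean of 1, 10, 19 and 28
```

stocks.mean(axis=0) and np.mean(stocks, axis=0) are interchangeable here.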
I want to apply a transformation matrix to a set of points. So the set of points:
points = np.array([[0 ,20], [0, 575], [0, 460]])
And I want to use the matrix I calculated with cv2.getPerspectiveTransform() which is a 3x3 matrix.
matrix = np.array([
[ -4. , -3. , 1920. ],
[ -2.25 , -1.6875 , 1080. ],
[ -0.0020833, -0.0015625, 1. ]])
Then I pass the array and a matrix to the following function:
def poly_points_transform(poly_points, matrix):
    poly_points_transformed = np.empty_like(poly_points)
    for i in range(len(poly_points)):
        point = np.array([[poly_points[i]]])
        transformed_point = cv2.perspectiveTransform(point, matrix)
        np.append(poly_points_transformed, transformed_point)
    return poly_points_transformed
It doesn't throw an error, but it just copies the src array into poly_points_transformed. It might be something really rudimentary. If so, I am sorry, but could someone give me a hint about what is wrong? Thanks in advance.
We may solve it with one line of code:
transformed_point = cv2.perspectiveTransform(np.array([points], np.float64), matrix)[0]
As Micka commented, cv2.perspectiveTransform takes a list of points (and returns a list of points as output).
np.array([points]) is used because cv2.perspectiveTransform expects a 3D array.
For details see trouble getting cv.transform to work.
np.float64 is used in case the dtype of points is int32 (the method accepts float64 and float32 types).
[0] is used for removing the redundant dimension (convert from 3D to 2D).
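To fix the loop, replace np.append(poly_points_transformed, transformed_point) with:
poly_points_transformed[i] = transformed_point[0]
The output array is already allocated with poly_points_transformed = np.empty_like(poly_points), so np.append() is the wrong tool here: it returns a new, longer array instead of writing into the existing one, and the loop discards that return value.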
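The behaviour of np.append that trips up the original loop can be seen in isolation. This is a minimal sketch with made-up values (the arrays a and b are illustrative, not from the question).

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

# np.append never modifies its first argument; it builds and returns a new array.
b = np.append(a, 4.0)

print(a)  # [1. 2. 3.]     (unchanged)
print(b)  # [1. 2. 3. 4.]  (the new array)

# Writing into a preallocated array uses index assignment instead.
out = np.empty_like(a)
for i in range(len(a)):
    out[i] = a[i] * 2
print(out)  # [2. 4. 6.]
```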
Code sample:
import cv2
import numpy as np
points = np.array([[0.0 ,20.0], [0.0, 575.0], [0.0, 460.0]])
matrix = np.array([
[ -4. , -3. , 1920. ],
[ -2.25 , -1.6875 , 1080. ],
[ -0.0020833, -0.0015625, 1. ]])
# transformed_point = cv2.perspectiveTransform(np.array([points], np.float64), matrix)[0]
def poly_points_transform(poly_points, matrix):
    poly_points_transformed = np.empty_like(poly_points)
    for i in range(len(poly_points)):
        point = np.array([[poly_points[i]]])
        transformed_point = cv2.perspectiveTransform(point, matrix)
        poly_points_transformed[i] = transformed_point[0] #np.append(poly_points_transformed, transformed_point)
    return poly_points_transformed
poly_points_transformed = poly_points_transform(points, matrix)
The result is:
poly_points_transformed =
array([[1920., 1080.],
[1920., 1080.],
[1920., 1080.]])
Why do we get [1920.0, 1080.0] for all the transformed points?
Let's transform the middle point by hand.
Multiply the matrix by the point (with 1 as the third coordinate):
[ -4. , -3. , 1920. ] [ 0]
[ -2.25 , -1.6875 , 1080. ] * [575] =
[ -0.0020833, -0.0015625, 1. ] [ 1]
p = matrix @ np.array([[0.0], [575.0], [1.0]]) =
[1.950000e+02]
[1.096875e+02]
[1.015625e-01]
Now divide the coordinates by the last element (converting homogeneous coordinates to Euclidean coordinates):
[1.950000e+02/1.015625e-01] [1920]
[1.096875e+02/1.015625e-01] = p / p[2] = [1080]
[1.015625e-01/1.015625e-01] [ 1]
The equivalent Euclidean point is [1920, 1080].
The transformation matrix may be wrong, because it maps all the input points (whose x coordinate equals 0) to the same output point...
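The hand calculation above can be repeated for all three points with plain NumPy, without OpenCV. This is only a sketch of the homogeneous-coordinate arithmetic, not of what cv2.perspectiveTransform does internally; the variable names homogeneous, projected and euclidean are illustrative.

```python
import numpy as np

points = np.array([[0.0, 20.0], [0.0, 575.0], [0.0, 460.0]])
matrix = np.array([
    [-4.0,       -3.0,       1920.0],
    [-2.25,      -1.6875,    1080.0],
    [-0.0020833, -0.0015625, 1.0],
])

# Append a 1 to every point: (N, 2) -> (N, 3) homogeneous coordinates.
homogeneous = np.hstack([points, np.ones((len(points), 1))])

# Row i of the result is matrix @ [x_i, y_i, 1].
projected = homogeneous @ matrix.T

# Divide x and y by the third (w) coordinate to get back to Euclidean coordinates.
euclidean = projected[:, :2] / projected[:, 2:3]
print(euclidean)  # every row is approximately [1920, 1080]
```

Because every input has x = 0, only the second and third columns of the matrix contribute, and for this particular matrix their ratio works out to the same point for any y.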
I have two arrays named x and y, and I want to use every combination of their values as input to a pandas calculation.
Here's an example.
Iterating over each x and y and appending the result to the res list is slow.
The calculation takes the exponential of the column scaled by a (np.exp(-df/a)), sums the result, and multiplies it by b. It stands in for any other calculation.
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0,5,size=(5, 1)),columns=['data'])
x = np.linspace(1, 24, 4)
y = np.linspace(10, 1500, 5)
res = []
for a in x:
    for b in y:
        res.append(np.exp(-df/a).sum().values[0]*b)
res = np.array(res).reshape(4, 5)
expected output:
array([[ 11.67676844, 446.63639283, 881.59601721, 1316.5556416 ,
1751.51526599],
[ 37.52524129, 1435.34047927, 2833.15571725, 4230.97095523,
5628.78619321],
[ 42.79406912, 1636.87314392, 3230.95221871, 4825.0312935 ,
6419.1103683 ],
[ 44.93972433, 1718.94445549, 3392.94918665, 5066.95391781,
6740.95864897]])
You can use NumPy broadcasting. For reference, here is the loop result from the question first:
res = np.array(res).reshape(4, 5)
print (res)
[[ 11.67676844 446.63639283 881.59601721 1316.5556416 1751.51526599]
[ 37.52524129 1435.34047927 2833.15571725 4230.97095523 5628.78619321]
[ 42.79406912 1636.87314392 3230.95221871 4825.0312935 6419.1103683 ]
[ 44.93972433 1718.94445549 3392.94918665 5066.95391781 6740.95864897]]
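# Broadcasting version: (5, 1) / (4,) -> (5, 4); sum over the rows, then multiply the (4, 1) column by y
res = np.exp(-df.to_numpy()/x).sum(axis=0)[:, None] * y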
print (res)
[[ 11.67676844 446.63639283 881.59601721 1316.5556416 1751.51526599]
[ 37.52524129 1435.34047927 2833.15571725 4230.97095523 5628.78619321]
[ 42.79406912 1636.87314392 3230.95221871 4825.0312935 6419.1103683 ]
[ 44.93972433 1718.94445549 3392.94918665 5066.95391781 6740.95864897]]
I think what you want is:
z = -df['data'].to_numpy()
res = np.exp(z/x[:, None]).sum(axis=1)[:, None]*y
output:
array([[ 11.67676844, 446.63639283, 881.59601721, 1316.5556416 ,
1751.51526599],
[ 37.52524129, 1435.34047927, 2833.15571725, 4230.97095523,
5628.78619321],
[ 42.79406912, 1636.87314392, 3230.95221871, 4825.0312935 ,
6419.1103683 ],
[ 44.93972433, 1718.94445549, 3392.94918665, 5066.95391781,
6740.95864897]])
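To make the shapes in the broadcasting answers explicit, here is a NumPy-only sketch that compares the double loop with the vectorized form. The hard-coded data array is a hypothetical stand-in for df['data'], so that the example does not depend on pandas.

```python
import numpy as np

# Hypothetical stand-in for df['data'] (five integer samples).
data = np.array([4.0, 0.0, 3.0, 3.0, 3.0])
x = np.linspace(1, 24, 4)      # shape (4,)
y = np.linspace(10, 1500, 5)   # shape (5,)

# Reference: the double loop from the question.
res_loop = np.array([[np.exp(-data / a).sum() * b for b in y] for a in x])

# Vectorized: one row per value of x, one column per data sample.
decay = np.exp(-data[None, :] / x[:, None])   # shape (len(x), len(data)) = (4, 5)
row_sums = decay.sum(axis=1)                  # shape (4,)
res_vec = row_sums[:, None] * y[None, :]      # shape (len(x), len(y)) = (4, 5)

print(np.allclose(res_loop, res_vec))  # True
```

The last step is an outer product, so np.outer(row_sums, y) would give the same array.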
Say I have a matrix Y of random float numbers from 0 to 10 with shape (10, 3):
import numpy as np
np.random.seed(99)
Y = np.random.uniform(0, 10, (10, 3))
print(Y)
Output:
[[6.72278559 4.88078399 8.25495174]
[0.31446388 8.08049963 5.6561742 ]
[2.97622499 0.46695721 9.90627399]
[0.06825733 7.69793028 7.46767101]
[3.77438936 4.94147452 9.28948392]
[3.95454044 9.73956297 5.24414715]
[0.93613093 8.13308413 2.11686786]
[5.54345785 2.92269116 8.1614236 ]
[8.28042566 2.21577372 6.44834702]
[0.95181622 4.11663239 0.96865261]]
I am now given a matrix X with the same shape, which can be seen as obtained by adding a small amount of noise to Y and then shuffling the rows:
X = np.random.normal(Y, scale=0.1)
np.random.shuffle(X)
print(X)
Output:
[[ 4.04067271 9.90959141 5.19126867]
[ 5.59873104 2.84109306 8.11175891]
[ 0.10743952 7.74620162 7.51100441]
[ 3.60396019 4.91708372 9.07551354]
[ 0.9400948 4.15448712 1.04187208]
[ 2.91884302 0.47222752 10.12700505]
[ 0.30995155 8.09263241 5.74876947]
[ 1.11247872 8.02092335 1.99767444]
[ 6.68543696 4.8345869 8.17330513]
[ 8.38904822 2.11830619 6.42013343]]
Now I want to reorder the rows of X so that they line up with the rows of Y. I already know that, for each matching pair of rows, corresponding column values differ by no more than a tolerance of 0.5. I wrote the following code, and it works:
def sort_X_by_Y(X, Y, tol):
    idxs = [next(i for i in range(len(X)) if all(abs(X[i] - row) <= tol)) for row in Y]
    return X[idxs]
print(sort_X_by_Y(X, Y, tol=0.5))
Output:
[[ 6.68543696 4.8345869 8.17330513]
[ 0.30995155 8.09263241 5.74876947]
[ 2.91884302 0.47222752 10.12700505]
[ 0.10743952 7.74620162 7.51100441]
[ 3.60396019 4.91708372 9.07551354]
[ 4.04067271 9.90959141 5.19126867]
[ 1.11247872 8.02092335 1.99767444]
[ 5.59873104 2.84109306 8.11175891]
[ 8.38904822 2.11830619 6.42013343]
[ 0.9400948 4.15448712 1.04187208]]
However, in reality I am sorting (1000, 3) matrices and my code is way too slow. I feel like there should be a more idiomatic NumPy way to write this. Any suggestions?
This is a vectorized version of your algorithm. It runs ~26.5x faster than your implementation for 1000 samples, at the cost of an intermediate boolean array of shape (1000, 1000, 3). Note that if several rows of X fall within the tolerance of the same row of Y, argmax picks the first one, which may be the wrong row.
tol = .5
X[(np.abs(Y[:, np.newaxis] - X) <= tol).all(2).argmax(1)]
Output
array([[ 6.68543696, 4.8345869 , 8.17330513],
[ 0.30995155, 8.09263241, 5.74876947],
[ 2.91884302, 0.47222752, 10.12700505],
[ 0.10743952, 7.74620162, 7.51100441],
[ 3.60396019, 4.91708372, 9.07551354],
[ 4.04067271, 9.90959141, 5.19126867],
[ 1.11247872, 8.02092335, 1.99767444],
[ 5.59873104, 2.84109306, 8.11175891],
[ 8.38904822, 2.11830619, 6.42013343],
[ 0.9400948 , 4.15448712, 1.04187208]])
A more robust alternative picks the nearest row under the L1 norm:
X[np.abs(Y[:, np.newaxis] - X).sum(2).argmin(1)]
Or under the L2 norm:
X[((Y[:, np.newaxis] - X)**2).sum(2).argmin(1)]
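The nearest-row idea can be checked on a small deterministic example. This is a sketch under assumptions: the data, the sine-based "noise" and the fixed permutation perm are all made up so that the expected matching is known in advance.

```python
import numpy as np

# Reference rows, deliberately well separated.
Y = np.arange(30, dtype=float).reshape(10, 3)

# Deterministic small perturbation (|noise| <= 0.1) instead of random noise.
noise = 0.1 * np.sin(np.arange(30)).reshape(10, 3)

# Fixed shuffle of the rows, so the correct answer is known.
perm = np.array([5, 1, 7, 3, 4, 0, 9, 2, 8, 6])
X = (Y + noise)[perm]

# Squared L2 distance between every row of Y and every row of X: shape (10, 10).
dist2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

# For each row of Y, the index of the closest row of X.
idx = dist2.argmin(axis=1)
X_sorted = X[idx]

# argmin does not guarantee a one-to-one matching, so check it explicitly.
is_one_to_one = len(np.unique(idx)) == len(idx)
print(is_one_to_one)                          # True for this data
print(np.abs(X_sorted - Y).max() <= 0.5)      # True: every row is within tolerance
```

If is_one_to_one ever comes out False on real data, two rows of Y were matched to the same row of X and the result should not be trusted.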
I would like to interpolate between two lists, where the first contains numbers and the second contains arrays.
I tried using interp1d from scipy, but it did not work:
from scipy import interpolate
r = [2,3,4]
t = [5,6,7]
f = [r,t]
q = [10,20]
c = interpolate.interp1d(q, f)
I would like to get an array, for example at the value 15, whose entries are interpolated between the r and t arrays.
Error message:
ValueError: x and y arrays must be equal in length along interpolation axis.
In the OP's simple example it makes no difference whether one uses 1D or 2D interpolation. Once more vectors come into play, however, it does. Here are both options, using NumPy arrays with an explicit float dtype.
from scipy.interpolate import interp1d
from scipy.interpolate import interp2d  # deprecated in SciPy 1.10 and removed in SciPy 1.14; needs an older SciPy
import numpy as np
# np.float was removed in NumPy 1.24, so the builtin float is used as the dtype
r = np.array( [ 1, 1, 2], float )
s = np.array( [ 2, 3, 4], float )
t = np.array( [ 5, 6, 12], float ) # length of r,s,t,etc must be equal
f = np.array( [ r, s, t ] )
q = np.array( [ 0, 10, 20 ], float ) # length of q is length of f
def interpolate_my_array1D( x, xData, myArray ):
    # interpolate each column of myArray separately along xData
    out = myArray[0].copy()
    n = len( out )
    for i in range(n):
        vec = myArray[ : , i ]
        func = interp1d( xData, vec )
        out[ i ] = func( x )
    return out
def interpolate_my_array2D( x, xData, myArray ):
    # treat (xData value, column index) as the 2D grid and the array entries as z values
    out = myArray[0].copy()
    n = len( out )
    xDataLoc = np.concatenate( [ [xx] * n for xx in xData ] )
    yDataLoc = np.array( list( range( n ) ) * len( xData ), float )
    zDataLoc = np.concatenate( myArray )
    func = interp2d( xDataLoc, yDataLoc, zDataLoc )
    # interp2d returns a length-1 array for scalar inputs, hence the [0]
    out = np.fromiter( ( func( x, yy )[0] for yy in range(n) ), float )
    return out
print( interpolate_my_array1D( 15., q, f ) )
print( interpolate_my_array2D( 15., q, f ) )
giving
>> [3.5 4.5 5.5]
>> [2.85135135 4.17567568 6.05405405]
See the interp1d entry in the SciPy documentation.
From the docs, x must be a 1-D array and y an N-D array whose length along the interpolation axis (the last axis by default) equals the length of x. Here q has 2 entries while f = [r, t] has shape (2, 3), so the last axis has length 3 and the call raises the error above.
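One way to get the interpolated array for the question's own data is to tell interp1d to interpolate along the first axis. A minimal sketch, assuming a SciPy version that still ships interp1d (it is marked as legacy in recent documentation):

```python
import numpy as np
from scipy.interpolate import interp1d

r = [2, 3, 4]
t = [5, 6, 7]
f = np.array([r, t], dtype=float)   # shape (2, 3): one row per entry of q
q = np.array([10.0, 20.0])          # shape (2,)

# axis=0 makes the length-2 axis of f the interpolation axis, matching len(q).
c = interp1d(q, f, axis=0)

print(c(15))  # [3.5 4.5 5.5], halfway between r and t
```

With the default axis=-1, interp1d would compare len(q) = 2 against the last axis of f (length 3) and raise the ValueError quoted in the question.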
The question:
For this problem, you are given a list of matrices called As, and your job is to find the QR factorization for each of them.
Implement qr_by_gram_schmidt: This function takes as input a matrix A and computes a QR decomposition, returning two variables, Q and R where A=QR, with Q orthogonal and R zero below the diagonal.
A is an n×m matrix with n≥m (i.e. at least as many rows as columns).
You should implement this function using the modified Gram-Schmidt procedure.
INPUT:
As: List of arrays
OUTPUT:
Qs: List of the Q matrices output by qr_by_gram_schmidt, in the same order as As. For a matrix A of shape n×m, Q should have shape n×m.
Rs: List of the R matrices output by qr_by_gram_schmidt, in the same order as As. For a matrix A of shape n×m, R should have shape m×m
I have written the code for the QR factorization which I believe is correct:
import numpy as np
def qr_by_gram_schmidt(A):
    m = np.shape(A)[0]
    n = np.shape(A)[1]
    Q = np.zeros((m, m))
    R = np.zeros((n, n))
    for j in xrange(n):
        v = A[:,j]
        for i in xrange(j):
            R[i,j] = Q[:,i].T * A[:,j]
            v = v.squeeze() - (R[i,j] * Q[:,i])
        R[j,j] = np.linalg.norm(v)
        Q[:,j] = (v / R[j,j]).squeeze()
    return Q, R
How do I write the loop to calculate the QR factorization of each of the matrices in As and store the results in that order?
Edit: the code has an error too. I would appreciate help debugging it.
Thanks
I didn't check your GS code, but I had to make one change (it may not be correct!) to get it to run. You just have to set up a list of your matrices (I made 2 of them), then loop through that list and apply your function.
import numpy as np
def gs(A):
    m = np.shape(A)[0]  # number of rows
    n = np.shape(A)[1]  # number of columns
    Q = np.zeros((m, m))  # fine for the square matrices below; a tall A would need (m, n)
    R = np.zeros((n, n))
    print(m, n, Q, R)
    for j in range(n):  # range replaces Python 2's xrange
        v = A[:,j]
        for i in range(j):
            R[i,j] = np.dot(Q[:,i].T , A[:,j]) # I made an arbitrary change here!!!
            v = v.squeeze() - (R[i,j] * Q[:,i])
        R[j,j] = np.linalg.norm(v)
        Q[:,j] = (v / R[j,j]).squeeze()
    return Q, R
As= np.random.rand(2,3,3) # list of 2 (3x3) matrices
print(As)
for A in As:
    print(gs(A))
Output:
[[[ 0.9599614 0.02213113 0.43343881]
[ 0.44202415 0.6816688 0.88321052]
[ 0.93098107 0.80528361 0.88473308]]
[[ 0.41794678 0.10762796 0.42110659]
[ 0.89598082 0.81225543 0.52947205]
[ 0.0621515 0.59826789 0.14021332]]]
(array([[ 0.68158915, -0.67980134, 0.27075149],
[ 0.31384477, 0.60583989, 0.73106736],
[ 0.66101262, 0.41331364, -0.626286 ]]), array([[ 1.40841649, 0.76132516, 1.15743793],
[ 0. , 0.73077208, 0.60610414],
[ 0. , 0. , 0.20894464]]))
(array([[ 0.42190511, -0.39510208, 0.81602109],
[ 0.90446656, 0.121136 , -0.40898205],
[ 0.06274013, 0.91061541, 0.40846452]]), array([[ 0.99061796, 0.81760207, 0.66535379],
[ 0. , 0.6006613 , 0.02543844],
[ 0. , 0. , 0.18435946]]))
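The assignment asks for the modified Gram-Schmidt procedure and for Q with the same shape as A. The code above projects against the original column A[:, j] (the classical variant) and allocates a square Q. A minimal sketch of the modified variant follows, assuming every A has full column rank. The function name qr_by_modified_gram_schmidt and the random test matrices are illustrative, not part of the assignment.

```python
import numpy as np

def qr_by_modified_gram_schmidt(A):
    """Reduced QR of a tall matrix A (rows >= cols), assuming full column rank."""
    A = np.asarray(A, dtype=float)
    n_rows, n_cols = A.shape
    Q = np.zeros((n_rows, n_cols))   # same shape as A
    R = np.zeros((n_cols, n_cols))   # upper triangular
    for j in range(n_cols):
        v = A[:, j].copy()
        for i in range(j):
            # Modified GS: project the already-updated v, not the original column.
            R[i, j] = Q[:, i] @ v
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

# Hypothetical input list; the real As comes from the assignment.
rng = np.random.default_rng(0)
As = [rng.random((4, 3)), rng.random((5, 2))]

# Collect the factors in the same order as As.
Qs, Rs = [], []
for A in As:
    Q, R = qr_by_modified_gram_schmidt(A)
    Qs.append(Q)
    Rs.append(R)

for A, Q, R in zip(As, Qs, Rs):
    print(np.allclose(Q @ R, A))   # True for each matrix
```

Appending inside the loop keeps Qs and Rs aligned with As; a list comprehension followed by zip(*...) would do the same job.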