I am preprocessing my data to make this work:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, Y)
I am struggling to reshape my numpy.ndarray.
At this point, for Y I have:
Y
array([array([[52593.4410802]]), array([[52593.4410802]])], dtype=object)
Y.shape
(2,)
type(Y)
<class 'numpy.ndarray'>
And for X, I have:
X
array([array([[34.07824204],
[33.36032467],
[24.61158084],
...,
[34.62648953],
[34.49591937],
[34.40951467]]),
array([[ 4.50136316],
[ 7.46307729],
[17.07135805],
...,
[57.98715047],
[54.5733181 ],
[50.13691107]])], dtype=object)
X.shape
(2,)
type(X)
<class 'numpy.ndarray'>
I would like to transform X so that each inner array becomes a column/feature (the idea of a transpose), so each value would become a feature, something like this:
X[0][0]
array([34.07824204])
X[0][1]
array([33.36032467])
# Pseudocode idea:
# X_new = [0][0], [0][1], ...
# X_new = append(X_new, [1][0], [1][1], ...)
What I have tried:
nsamples, nx, ny = X.shape
d2_train_dataset = X.reshape((nsamples,nx*ny))
Also, I tried to reshape and transpose, but it does not give what I need:
X
array([array([[34.07824204],
[33.36032467],
[24.61158084],
...,
[34.62648953],
[34.49591937],
[34.40951467]]),
array([[ 4.50136316],
[ 7.46307729],
[17.07135805],
...,
[57.98715047],
[54.5733181 ],
[50.13691107]])], dtype=object)
X.T
array([array([[34.07824204],
[33.36032467],
[24.61158084],
...,
[34.62648953],
[34.49591937],
[34.40951467]]),
array([[ 4.50136316],
[ 7.46307729],
[17.07135805],
...,
[57.98715047],
[54.5733181 ],
[50.13691107]])], dtype=object)
As suggested in one of the comments, I tried the following, without success (I get the same output as the input):
X.flatten()
array([array([[34.07824204],
[33.36032467],
[24.61158084],
...,
[34.62648953],
[34.49591937],
[34.40951467]]),
array([[ 4.50136316],
[ 7.46307729],
[17.07135805],
...,
[57.98715047],
[54.5733181 ],
[50.13691107]])], dtype=object)
As I can understand from Y, your labels are continuous, not discrete. Your data suggest that you need a regression model, but you are trying to fit logistic regression, which is a binary classifier. As a regression algorithm, you could use linear regression, support vector regression, or any other regression model.
Before reshaping, get rid of the arrays nested inside your arrays. You can do this easily with numpy.stack. For instance:
import numpy
from numpy import array
Y = array([array([[52593.4410802]]), array([[52593.4410802]])], dtype=object)
Y = numpy.stack(Y)
print(Y.shape)
print(Y)
gives:
(2, 1, 1)
[[[52593.4410802]]

 [[52593.4410802]]]
From this, you can reshape to what you need.
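Putting the pieces together, a minimal sketch (using small stand-in arrays, not the asker's actual data) of un-nesting both X and Y with numpy.stack, reshaping to the (n_samples, n_features) layout sklearn expects, and fitting a regression model as suggested above:
import numpy as np
from sklearn.linear_model import LinearRegression

# stand-in object arrays shaped like the question's X and Y: a (2,) object
# array whose elements are (n, 1) float arrays
X_obj = np.empty(2, dtype=object)
X_obj[0] = np.random.rand(5, 1)
X_obj[1] = np.random.rand(5, 1)
Y_obj = np.empty(2, dtype=object)
Y_obj[0] = np.array([[52593.4410802]])
Y_obj[1] = np.array([[52593.4410802]])

# stack un-nests the inner arrays, then reshape gives one row per sample
X2d = np.stack(X_obj).reshape(len(X_obj), -1)   # shape (2, 5)
y1d = np.stack(Y_obj).reshape(len(Y_obj))       # shape (2,)

reg = LinearRegression().fit(X2d, y1d)          # a regressor, not a classifier
print(X2d.shape, y1d.shape)
Note that np.stack only works here because the inner arrays all have the same length; if they differ, you first have to truncate or pad them to a common length.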
I have the following features:
array([[290., 50.],
[290., 46.],
[285., 44.],
...,
[295., 46.],
[299., 46.],
[ 0., 0.]])
after transforming it with:
from sklearn.preprocessing import StandardScaler
self.scaler = StandardScaler()
self.scaled_features = self.scaler.fit_transform(self.features)
I have scaled_features:
array([[ 0.27489919, 0.71822864],
[ 0.27489919, 0.26499222],
[ 0.18021955, 0.03837402],
...,
[ 0.36957884, 0.26499222],
[ 0.44532255, 0.26499222],
[-5.2165202 , -4.94722653]])
Now I wish to scale a new sample with self.scaler, so I pass in my new feature example t:
t = [299.0, 46.0]
new_data = np.array(t).reshape(-1, 1)
new_data_scaled = self.scaler.transform(t)
I get
non-broadcastable output operand with shape (2,1) doesn't match the broadcast shape (2,2)
What am I doing wrong? Why is new_data not scaled?
There are two things: first, you are passing the list t into transform instead of new_data. Second, new_data has shape (2, 1) but should have shape (1, 2). So if you change it to
t = [299.0, 46.0]
new_data = np.array(t).reshape(1, -1)
new_data_scaled = self.scaler.transform(new_data)
you should get scaled data.
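For completeness, a small self-contained sketch of the fix (the features array here is just a stand-in for the asker's data); the scaler was fit on 2 features, so any new sample must be a (1, 2) row:
import numpy as np
from sklearn.preprocessing import StandardScaler

# stand-in training data with 2 features
features = np.array([[290., 50.],
                     [290., 46.],
                     [285., 44.],
                     [299., 46.]])
scaler = StandardScaler()
scaler.fit(features)

t = [299.0, 46.0]
new_data = np.array(t).reshape(1, -1)   # shape (1, 2): 1 sample, 2 features
print(scaler.transform(new_data))       # scaled row, no broadcast error
With reshape(-1, 1) the sample becomes shape (2, 1) - two "samples" with one feature each - which is exactly why transform raised the non-broadcastable error.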
I am working on a polynomial train-test fit problem and want to convert a list object into a numpy array of the form (4, 100) (i.e., 4 rows, 100 columns).
I have the following code:
import numpy as np
from numpy import array
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n) / 5
y = np.sin(x) + x / 6 + np.random.randn(n) / 10
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

results = []
pred_data = np.linspace(0, 10, 100)
degree = [1, 3, 6, 9]
y_train1 = y_train.reshape(-1, 1)
for i in degree:
    # fit a polynomial of the given degree and predict on pred_data
    poly = PolynomialFeatures(degree=i)
    pred_poly1 = poly.fit_transform(pred_data[:, np.newaxis])
    X_F1_poly = poly.fit_transform(X_train[:, np.newaxis])
    linreg = LinearRegression().fit(X_F1_poly, y_train1)
    pred = linreg.predict(pred_poly1)
    results.append(pred)
dataArray = np.array(results).reshape(4, 100)
return dataArray  # (the surrounding function definition is omitted here)
The code works and returns an array of shape (4, 100), but the output looks like it has 100 rows and 4 columns, and once I remove the ".reshape(4, 100)" part from the np.array call, the shape of the output becomes (4, 100, 1). (I apologize for my ignorance, but what does the 1 in (4, 100, 1) stand for?)
I guess there's something wrong with my loop that I can't figure out at the moment. Could anyone point out the error in my code or recommend how to convert/reshape the output array into the desired (4, 100) format?
Thank you.
Let's run a simplified version of your code, leaving out the details of what the sklearn polynomial fit is doing:
In [248]: results = []
...: pred_data = np.linspace(0,10,100)
...: degree = [1,3,6,9]
...:
In [249]: for i in degree:
...: results.append(pred_data[:,np.newaxis])
...:
In [250]: len(results)
Out[250]: 4
In [251]: results[0].shape
Out[251]: (100, 1)
In [252]: arr = np.array(results)
In [253]: arr.shape
Out[253]: (4, 100, 1)
pred_data is (100,) (by linspace construction). newaxis makes it (100, 1). Do something with it and collect the result 4 times, and the result is a list of four (100, 1) arrays. Join those into one array and we get a 3d (4, 100, 1) array.
The display of arr starts as:
array([[[ 0. ],
[ 0.1010101 ],
[ 0.2020202 ],
...
[ 9.7979798 ],
[ 9.8989899 ],
[ 10. ]]])
The inner elements are [...], consistent with that last size 1 dimension.
I can remove the last dimension in various ways:
arr.reshape(4,100)
arr[:,:,0]
np.squeeze(arr)
I don't know enough of the sklearn code to know whether you really need pred_data[:,np.newaxis]. I have seen shapes like (#samples, #features) in other sklearn questions. So a shape like (100,1) might be correct if you have 100 samples and 1 feature.
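As a quick, runnable check of the shape bookkeeping above (using dummy predictions instead of the sklearn pipeline):
import numpy as np

pred_data = np.linspace(0, 10, 100)
# four (100, 1) arrays, standing in for the four polynomial predictions
results = [pred_data[:, np.newaxis] * d for d in [1, 3, 6, 9]]

arr = np.array(results)
print(arr.shape)                   # (4, 100, 1)
print(arr.reshape(4, 100).shape)   # (4, 100)
print(arr[:, :, 0].shape)          # (4, 100)
print(np.squeeze(arr).shape)       # (4, 100)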
my code is:
import numpy as np
import scipy.io as spio
x=np.zeros((22113,1),float)
x= spio.loadmat('C:\\Users\\dell\\Desktop\\Rabia Ahmad spring 2016\\'
'FYP\\1. Matlab Work\\record work\\kk.mat')
print(x)
x = np.reshape(len(x),1);
h = np.array([0.9,0.3,0.1],float)
print(h)
h = h.reshape(len(h),1);
dd = np.convolve(h,x)
and the error I encounter is "ValueError: object too deep for desired array"
Kindly help me in this regard.
{'__globals__': [], '__version__': '1.0',
 'ans': array([[ 0.13580322,  0.13580322],
               [ 0.13638306,  0.13638306],
               [ 0.13345337,  0.13345337],
               ...,
               [-0.09136963, -0.09136963],
               [-0.12442017, -0.12442017],
               [-0.15542603, -0.15542603]])}
See {}? That means x from the loadmat is a dictionary.
x['ans'] will be an array
array([[ 0.13580322,  0.13580322],
       [ 0.13638306,  0.13638306],
       [ 0.13345337,  0.13345337],
       ...]])
which, if I count the [] right, is an (n, 2) array of floats.
The following line does not make sense:
x = np.reshape(len(x),1);
I suspect you mean x = x.reshape(...) as you do with h. But that would give an error with the dictionary x.
When you say the shape of x is (9,) and its dtype is uint16 - where in your code are you verifying that?
x = np.reshape(len(x),1); doesn't do anything useful. That completely discards the data in x, and creates an array of shape (1,), with the only element being len(x).
In your code, you reshape h to (3, 1), which is a 2D array, not a 1D array, which is why convolve complains.
Remove both of your reshapes, and instead just pass squeeze_me=True to scipy.io.loadmat - this is needed because matlab does not have the concept of 1d arrays, and squeeze_me tells scipy to try to flatten (N, 1) and (1, N) arrays to (N,) arrays.
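A sketch of what that might look like (the 'ans' key comes from the printed dict above; taking column 0 is an assumption, since the stored array appears to have shape (N, 2), and the file path is shortened here):
import numpy as np
import scipy.io as spio

mat = spio.loadmat('kk.mat', squeeze_me=True)   # no (N, 1) shapes to fight with
x = np.asarray(mat['ans'], dtype=float)[:, 0]   # pick one channel as a 1d signal

h = np.array([0.9, 0.3, 0.1])                   # keep the kernel 1d - no reshape
dd = np.convolve(h, x)
print(dd.shape)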
I want to solve the linear equation Ax = B, where the A matrices are stacked in a 3d array. For example,
In Ax = B,
Suppose A.shape is (2,3,3)
i.e. [[[1,2,3],[1,2,3],[1,2,3]], [[1,2,3],[1,2,3],[1,2,3]]]
and B.shape is (3,1)
i.e. [1,2,3]^T
And I want to find each 3-vector x of Ax = B, i.e. (x_1, x_2, x_3).
What comes to mind is to multiply B by np.ones((2, 3)) and use dot with the inverse of each element of A. But that needs a loop, which consumes a lot of time when the matrix size gets large (e.g. A[:][:] = [1,2,3]).
How can I solve many Ax = B equations without a loop?
I made the elements of A and B the same, but as you probably know, it is just an example.
For invertible matrices, we could use np.linalg.inv on the 3D array A and then use tensor matrix-multiplication with B so that we lose the last and first axes of those two arrays respectively, like so -
np.tensordot( np.linalg.inv(A), B, axes=((-1),(0)))
Sample run -
In [150]: A
Out[150]:
array([[[ 0.70454189, 0.17544101, 0.24642533],
[ 0.66660371, 0.54608536, 0.37250876],
[ 0.18187631, 0.91397945, 0.55685133]],
[[ 0.81022308, 0.07672197, 0.7427768 ],
[ 0.08990586, 0.93887203, 0.01665071],
[ 0.55230314, 0.54835133, 0.30756205]]])
In [151]: B = np.array([[1],[2],[3]])
In [152]: np.linalg.solve(A[0], B)
Out[152]:
array([[ 0.23594665],
[ 2.07332454],
[ 1.90735086]])
In [153]: np.linalg.solve(A[1], B)
Out[153]:
array([[ 8.43831557],
[ 1.46421396],
[-8.00947932]])
In [154]: np.tensordot( np.linalg.inv(A), B, axes=((-1),(0)))
Out[154]:
array([[[ 0.23594665],
[ 2.07332454],
[ 1.90735086]],
[[ 8.43831557],
[ 1.46421396],
[-8.00947932]]])
Alternatively, the tensor matrix-multiplication could be replaced by np.matmul, like so -
np.matmul(np.linalg.inv(A), B)
On Python 3.5+, we could use the @ operator for the same functionality -
np.linalg.inv(A) @ B
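As a quick sanity check (using random matrices as stand-in data, which are almost surely invertible), the vectorized result matches a per-slice np.linalg.solve:
import numpy as np

A = np.random.rand(2, 3, 3)
B = np.array([[1.], [2.], [3.]])

x_vec = np.matmul(np.linalg.inv(A), B)                            # shape (2, 3, 1)
x_loop = np.stack([np.linalg.solve(A[i], B) for i in range(2)])   # same, via a loop

print(np.allclose(x_vec, x_loop))   # True, up to floating-point error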
I have x,y data:
import numpy as np
x = np.array([ 2.5, 1.25, 0.625, 0.3125, 0.15625, 0.078125])
y = np.array([ 2448636.,1232116.,617889.,310678.,154454.,78338.])
X = np.vstack((x, np.zeros(len(x))))
popt,res,rank,val = np.linalg.lstsq(X.T,y)
popt,res,rank,val
Gives me:
(array([ 981270.29919414, 0. ]),
array([], dtype=float64),
1,
array([ 2.88639894, 0. ]))
Why are the residuals empty? If I add ones instead of zeros, the residuals are calculated:
X = np.vstack((x, np.ones(len(x)))) # added ones instead of zeros
popt,res,rank,val = np.linalg.lstsq(X.T,y)
popt,res,rank,val
(array([ 978897.28500355, 4016.82089552]),
array([ 42727293.12864216]),
2,
array([ 3.49623683, 1.45176681]))
Additionally, if I calculate the sum of squared residuals in Excel, I get 9261214 if the intercept is set to zero and 5478137 if ones are added to x.
lstsq is going to have a tough time fitting to that column of zeros: any value of the corresponding parameter (presumably intercept) will do.
To fix the intercept to 0, if that's what you need to do, just send the x array, but make sure that it's the right shape for lstsq:
In [214]: popt,res,rank,val = np.linalg.lstsq(np.atleast_2d(x).T,y)
In [215]: popt
Out[215]: array([ 981270.29919414])
In [216]: res
Out[216]: array([ 92621214.2278382])
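To see where that number comes from, the residual sum of squares can also be computed by hand (a small sketch; rcond=None just silences the FutureWarning on newer numpy). With the column of zeros the design matrix is rank-deficient, which is why lstsq returns an empty res there:
import numpy as np

x = np.array([2.5, 1.25, 0.625, 0.3125, 0.15625, 0.078125])
y = np.array([2448636., 1232116., 617889., 310678., 154454., 78338.])

X0 = np.vstack((x, np.zeros(len(x)))).T            # rank 1: res comes back empty
popt0, res0, rank0, _ = np.linalg.lstsq(X0, y, rcond=None)
print(rank0, res0, np.sum((X0 @ popt0 - y) ** 2))  # 1, [], ~9.26e7

X1 = np.atleast_2d(x).T                            # intercept fixed at 0
popt1, res1, rank1, _ = np.linalg.lstsq(X1, y, rcond=None)
print(res1, np.sum((X1 @ popt1 - y) ** 2))         # both ~9.26e7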