I am working on a polynomial train/test fit problem and want to convert a list into a NumPy array of shape (4, 100) (i.e., 4 rows, 100 columns).
I have the following code:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
import numpy as np

np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n) / 5
y = np.sin(x) + x / 6 + np.random.randn(n) / 10
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

results = []
pred_data = np.linspace(0, 10, 100)
degree = [1, 3, 6, 9]
y_train1 = y_train.reshape(-1, 1)
for i in degree:
    poly = PolynomialFeatures(degree=i)
    pred_poly1 = poly.fit_transform(pred_data[:, np.newaxis])
    X_F1_poly = poly.fit_transform(X_train[:, np.newaxis])
    linreg = LinearRegression().fit(X_F1_poly, y_train1)
    pred = linreg.predict(pred_poly1)
    results.append(pred)
dataArray = np.array(results).reshape(4, 100)
return dataArray  # (this is inside a function)
The code works fine and returns an array of shape (4, 100), but the output looks like 100 rows and 4 columns, and once I remove the ".reshape(4, 100)" part from the np.array call, the shape of the output becomes (4, 100, 1). (I apologize for my ignorance; what does the 1 in (4, 100, 1) stand for?)
I guess there's something wrong with my loop that I can't figure out at the moment. Could anyone point out the error in my code or recommend how to convert/reshape the output array into the desired (4, 100) format?
Thank you.
Let's run a simplified version of your code, leaving out the details of what the sklearn polynomial fit is doing:
In [248]: results = []
...: pred_data = np.linspace(0,10,100)
...: degree = [1,3,6,9]
...:
In [249]: for i in degree:
...: results.append(pred_data[:,np.newaxis])
...:
In [250]: len(results)
Out[250]: 4
In [251]: results[0].shape
Out[251]: (100, 1)
In [252]: arr = np.array(results)
In [253]: arr.shape
Out[253]: (4, 100, 1)
pred_data is (100,) (by linspace construction). np.newaxis makes it (100, 1). Do something with it and collect the result 4 times, and you have a list of four (100, 1) arrays. Join those into one array and you get a 3-d (4, 100, 1) array.
The display of arr starts as:
array([[[ 0. ],
[ 0.1010101 ],
[ 0.2020202 ],
...
[ 9.7979798 ],
[ 9.8989899 ],
[ 10. ]]])
The inner elements display as [...], consistent with that last size-1 dimension.
I can remove the last dimension in various ways:
arr.reshape(4,100)
arr[:,:,0]
np.squeeze(arr)
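All three yield the same (4, 100) result; a quick check, with a dummy array standing in for the collected predictions:
import numpy as np

arr = np.zeros((4, 100, 1))        # stand-in for np.array(results)
print(arr.reshape(4, 100).shape)   # (4, 100)
print(arr[:, :, 0].shape)          # (4, 100)
print(np.squeeze(arr).shape)       # (4, 100)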
I don't know enough of the sklearn code to know whether you really need pred_data[:,np.newaxis]. I have seen shapes like (#samples, #features) in other sklearn questions. So a shape like (100,1) might be correct if you have 100 samples and 1 feature.
Related
I was taking the Machine Learning course by Andrew Ng and in one of the practice labs, they perform this operation for Linear Regression.
x = np.arange(0, 20, 1)
y = 1 + x**2
X = x.reshape(-1, 1)
I checked the shapes of the arrays after the operation:
>>> print(x.shape,X.shape)
(20,) (20, 1)
What is the difference between x and X, and why can't we simply use x.T instead of reshaping it into X?
X = x.reshape(-1, 1)
gives you a 2-D array with one column. This is because each perceptron (or estimator) takes an array of numbers per sample, not a single number, so you need to pass [1], not 1:
[
 [1],
 [2],
 ...
]
By contrast, x = np.arange(0, 20, 1) is a flat 1-D array:
[1, 2, 3, ...]
This can't be passed to an ANN; you need to reshape it into rows and columns, since it has only one dimension.
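As for x.T: transposing a 1-D array is a no-op in NumPy, because there is no second axis to swap, so it can never produce the (20, 1) column that reshape gives. A quick check:
import numpy as np

x = np.arange(0, 20, 1)
print(x.T.shape)               # (20,): transposing a 1-D array changes nothing
print(x.reshape(-1, 1).shape)  # (20, 1): a proper column vector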
I have several 3-dimensional numpy arrays that I want to join together to feed them as a training set for my LSTM neural network. They are mostly of shape (1,m,n)
I want to join them so that, for example, joining shapes (1, 50, 20) and (1, 50, 20) gives (2, 50, 20), and joining (1, 50, 20) and (3, 50, 20) gives (4, 50, 20).
Which of NumPy's stacking functions suits my problem, or is there another, more efficient way to solve it?
Use numpy.concatenate along the first axis.
import numpy as np
rng = np.random.default_rng()
a = rng.integers(0, 10, (1, 3, 20))
b = rng.integers(-10, -1, (2, 3, 20))
c = np.concatenate((a, b), axis=0)
print(c.shape)
(3, 3, 20)
Use np.vstack
x = np.array([[[2,3,5],[4,5,1]]])
y = np.array([[[1,5,8],[8,0,9]]])
x.shape
(1,2,3)
np.vstack((x,y)).shape
(2,2,3)
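For inputs that are already at least two-dimensional, the two answers agree: vstack is just concatenation along the first axis. A quick check, rebuilding the arrays above:
import numpy as np

x = np.array([[[2, 3, 5], [4, 5, 1]]])
y = np.array([[[1, 5, 8], [8, 0, 9]]])
print(np.array_equal(np.vstack((x, y)), np.concatenate((x, y), axis=0)))  # True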
I am preprocessing my data to make this work:
model = LogisticRegression()
model.fit(X, Y)
I am struggling to reshape my numpy.ndarray.
At this point, for Y I have:
Y
array([array([[52593.4410802]]), array([[52593.4410802]])], dtype=object)
Y.shape
(2,)
type(Y)
<class 'numpy.ndarray'>
And for X, I have:
X
array([array([[34.07824204],
[33.36032467],
[24.61158084],
...,
[34.62648953],
[34.49591937],
[34.40951467]]),
array([[ 4.50136316],
[ 7.46307729],
[17.07135805],
...,
[57.98715047],
[54.5733181 ],
[50.13691107]])], dtype=object)
X.shape
(2,)
type(X)
<class 'numpy.ndarray'>
I would like to transform X so that each value becomes a column/feature (the idea of a transpose), something like this:
X[0][0]
array([34.07824204])
X[0][1]
array([33.36032467])
# Pseudocode idea:
# X_new = [0][0],[0][1],...
# X_new = append(X_new,[1][0],[1][1]...)
What I have tried:
nsamples, nx, ny = X.shape
d2_train_dataset = X.reshape((nsamples,nx*ny))
Also, I tried to reshape and transpose, but it does not give what I need:
X
array([array([[34.07824204],
[33.36032467],
[24.61158084],
...,
[34.62648953],
[34.49591937],
[34.40951467]]),
array([[ 4.50136316],
[ 7.46307729],
[17.07135805],
...,
[57.98715047],
[54.5733181 ],
[50.13691107]])], dtype=object)
X.T
array([array([[34.07824204],
[33.36032467],
[24.61158084],
...,
[34.62648953],
[34.49591937],
[34.40951467]]),
array([[ 4.50136316],
[ 7.46307729],
[17.07135805],
...,
[57.98715047],
[54.5733181 ],
[50.13691107]])], dtype=object)
As suggested in one of the comments, I tried the following, without success (I get the same output as the input):
X.flatten()
array([array([[34.07824204],
[33.36032467],
[24.61158084],
...,
[34.62648953],
[34.49591937],
[34.40951467]]),
array([[ 4.50136316],
[ 7.46307729],
[17.07135805],
...,
[57.98715047],
[54.5733181 ],
[50.13691107]])], dtype=object)
As far as I can tell from Y, your labels are continuous, not discrete. Your data suggest that you need a regression model, but you are trying to fit a binary classifier, logistic regression. As a regression algorithm you could use linear regression, Support Vector Regression, or any other regression model.
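For example, a minimal swap (any regressor with the same fit/predict API would do):
from sklearn.linear_model import LinearRegression

model = LinearRegression()  # a regressor, instead of LogisticRegression()
# model.fit(X, Y) once X and Y are reshaped as shown below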
Before reshaping, get rid of the arrays nested inside your arrays.
You can do this easily with numpy.stack. For instance
import numpy
from numpy import array
Y = array([array([[52593.4410802]]), array([[52593.4410802]])], dtype=object)
Y = numpy.stack(Y)
print(Y.shape)
print(Y)
gives:
(2, 1, 1)
[[[52593.4410802]]
[[52593.4410802]]]
From this, you can reshape to what you need.
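For X, a sketch along the same lines (hypothetical small values; this assumes the two inner arrays of X have equal length, here 2):
import numpy as np

# rebuild a tiny (2,) object array of column vectors, shaped like the question's X
X = np.empty(2, dtype=object)
X[0] = np.array([[34.07824204], [33.36032467]])
X[1] = np.array([[4.50136316], [7.46307729]])

X2 = np.stack(X)[:, :, 0]  # stack -> (2, 2, 1); drop the size-1 axis -> (2, 2)
print(X2.shape)            # (2, 2): 2 samples, each value now a feature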
I've split an image into 16 pieces to plot regression on, and now I want to join them back together into one image.
I've written a for loop to do this, but I'm having trouble understanding the advice from previous questions and where I'm going wrong. Could someone please explain why my input arrays do not have the same number of dimensions?
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
from skimage import color

allArrays = np.array([])
for i in range(len(a)):   # a: the list of 16 image pieces (defined earlier)
    fig = plt.figure()
    ax = fig.add_axes([0., 0., 1., 1.])
    if np.amax(a[i]) > 0:
        x, y = np.where(a[i] > 0)
        f = interpolate.interp1d(y, x)
        xnew = np.linspace(min(y), max(y), num=40)
        ynew = f(xnew)
        plt.plot(xnew, ynew, '-')
        plt.ylim(256, 0)
        plt.xlim(0, 256)
        fig.canvas.draw()
        X = np.array(fig.canvas.renderer._renderer)
        myArray = color.rgb2gray(X)
        print(myArray.shape)
        allArrays = np.concatenate([allArrays, myArray])
        print(allArrays.shape)
    else:
        plt.xlim(0, 256)
        plt.ylim(0, 256)
        fig.canvas.draw()
        X = np.array(fig.canvas.renderer._renderer)
        myArray = color.rgb2gray(X)
        print(myArray.shape)
        allArrays = np.concatenate([allArrays, myArray])
        print(allArrays.shape)
    i += 1  # redundant: the for loop already advances i
Output: myArray.shape (480, 640)
Error message: all the input arrays must have same number of dimensions
I'm sure it's really simple but I can't figure it out. Thanks.
In [226]: allArrays = np.array([])
In [227]: allArrays.shape
Out[227]: (0,)
In [228]: allArrays.ndim
Out[228]: 1
In [229]: myArray=np.ones((480,640))
In [230]: myArray.shape
Out[230]: (480, 640)
In [231]: myArray.ndim
Out[231]: 2
1 does not equal 2 in most worlds!
To concatenate with myArray on the default axis 0, allArrays would have to start as np.zeros((0,640), myArray.dtype). After n iterations it would grow to (n*480, 640).
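For example, continuing the session above:
In [232]: allArrays = np.zeros((0, 640), myArray.dtype)
In [233]: np.concatenate([allArrays, myArray]).shape
Out[233]: (480, 640)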
In the linked answer, the new arrays are all 1d, so starting with shape (0,) is ok. But wim's answer is better - collect all arrays in a list, and do one concatenate at the end.
Repeated concatenate in a loop is hard to get right (you have to understand shapes and dimensions), and slower than list appends.
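A minimal sketch of that list pattern, with dummy (480, 640) frames standing in for the rendered figures:
import numpy as np

frames = [np.full((480, 640), i, dtype=float) for i in range(16)]  # dummy images
allArrays = np.concatenate(frames, axis=0)  # one join at the end
print(allArrays.shape)  # (7680, 640)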
I hit a problem with the simplest example of linear regression: the output coefficients are all zero. What am I doing wrong? Thanks for the help.
import sklearn.linear_model as lm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
x = [25,50,75,100]
y = [10.5,17,23.25,29]
pred = [27,41,22,33]
df = pd.DataFrame({'x':x, 'y':y, 'pred':pred})
x = df['x'].values.reshape(1,-1)
y = df['y'].values.reshape(1,-1)
pred = df['pred'].values.reshape(1,-1)
plt.scatter(x,y,color='black')
clf = lm.LinearRegression(fit_intercept=True)
clf.fit(x, y)
m = clf.coef_[0]
b = clf.intercept_
print("slope=", m, "intercept=", b)
Output:
slope= [ 0. 0. 0. 0.] intercept= [ 10.5 17. 23.25 29. ]
Think it through for a second. The fact that multiple coefficients are returned suggests multiple features. Since this is a single-variable regression, the problem lies in the shape of your input data: your original reshaping made the class think you had 4 variables and only one observation per variable.
Try something like this:
import sklearn.linear_model as lm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
x = np.array([25,99,75,100, 3, 4, 6, 80])[..., np.newaxis]
y = np.array([10.5,17,23.25,29, 1, 2, 33, 4])[..., np.newaxis]
clf = lm.LinearRegression()
clf.fit(x,y)
clf.coef_
Output:
array([[ 0.09399429]])
As @jrjames83 has already explained in his answer, after reshaping (.reshape(1,-1)) you were feeding a data set containing one sample (row) and four features (columns):
In [103]: x.shape
Out[103]: (1, 4)
Most probably you wanted to reshape it this way:
In [104]: x = df['x'].values.reshape(-1, 1)
In [105]: x.shape
Out[105]: (4, 1)
so that you would have four samples and one feature.
Alternatively, you could pass the DataFrame columns to your model as follows (no need to pollute memory with additional variables):
In [98]: clf = lm.LinearRegression(fit_intercept =True)
In [99]: clf.fit(df[['x']],df['y'])
Out[99]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [100]: clf.coef_
Out[100]: array([0.247])
In [101]: clf.intercept_
Out[101]: 4.5