Sup guys, I'm new to Python and new to Neural Networks as well. I'm trying to implement a Neural Network to predict the Close price of Bitcoin in a day, based on Open price in the same day. So I get a CSV file, and I'm trying to use 'Open' column as entry, and 'Close' column as target, you can see this in the code below:
from sklearn.neural_network import MLPClassifier
import numpy as np
import pandas as pd
dataset = pd.read_csv('BTC_USD.csv')
X = dataset['Open']
y = dataset['Close']
NeuralNetwork = MLPClassifier(verbose = True,
max_iter = 1000,
tol = 0,
activation = 'logistic')
NeuralNetwork.fit(X, y)
When I run the code I get this error:
ValueError: Expected 2D array, got 1D array instead:
array=[4.95100000e-02 4.95100000e-02 8.58400000e-02 ... 6.70745996e+03
6.66883984e+03 7.32675977e+03].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
After this error, I did some research here in stackoverflow, and I tried some solutions proposed in other posts, like this one:
from sklearn.neural_network import MLPClassifier
import numpy as np
import pandas as pd
dataset = pd.read_csv('BTC_USD.csv')
X = np.array(dataset[['Open']])
X = X.reshape(-1, 1)
y = np.array(dataset[['Close']])
y = y.reshape(-1, 1)
NeuralNetwork = MLPClassifier(verbose = True,
max_iter = 1000,
tol = 0,
activation = 'logistic')
NeuralNetwork.fit(X, y)
After running this code, I get this new error:
ValueError: Unknown label type: (array([4.95100000e-02, 8.58400000e-02, 8.08000000e-02, ...,
6.66883984e+03, 6.30685010e+03, 7.49379980e+03]),)
and this ''warning'' at the first line (which contains the directory):
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
Could you help me please? I tried many solutions, but any of them worked.
You should use the values attribute of a data frame to get the elements of one column. In addition, what you want to achieve is a regression, not a classification, thus you must use a regressor such as MLPRegressor, following
from sklearn.neural_network import MLPRegressor
import numpy as np
import pandas as pd
dataset = pd.read_csv('BTC_USD.csv')
X = dataset["Open"].values.reshape(-1, 1)
y = dataset["Close"].values
NeuralNetwork = MLPRegressor(verbose = True,
max_iter = 1000,
tol = 0,
activation = "logistic")
NeuralNetwork.fit(X, y)
The code works now, but the results are not correct as you will need to work on the features and your network hyperparameters. But this is beyond the scope of SO.
Related
Trying to learn sklearn in python. But the jupyter ntbk is giving error saying "ValueError: Expected 2D array, got scalar array instead:
array=750.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."
*But I have already defined x to be 2D array using x.values.reshape(-1,1)
You can find the CSV file and screenshot of the Error Code here -> https://github.com/CaptainRD/CSV-for-StackOverflow
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.linear_model import LinearRegression
data = pd.read_csv('1.02. Multiple linear regression.csv')
data.head()
x = data[['SAT','Rand 1,2,3']]
y = data['GPA']
reg = LinearRegression()
reg.fit(x,y)r2 = reg.score(x,y)
n = x.shape[0]
p = x.shape[1]
adjusted_r2 = 1-(1-r2)*(n-1)/(n-p-1)
adjusted_r2
reg.predict(1750)
As you can see in your code, your x has two variables, SAT and Rand 1,2,3. Which means, you need to provide a two dimensional input for your predict method. example:
reg.predict([[1750, 1]])
which returns:
>>> array([1.88])
You are facing this error because you did not provide the second value (for the Rand 1,2,3 variable). Note, if this variable is not important, you should remove it from your x data.
This model is mapping two inputs (SAT and Rand 1,2,3) to a single output (GPA), and thus requires a list of two elements as input for a valid prediction. I'm guessing the 1750 that you're supplying is meant to be the SAT value, but you also need to provide the Rand 1,2,3 value. Something like [1750, 1] would work.
Im trying to make my way to a sligthly more flexible knn input script than the tutorials based of the iris dataset but Im having some trouble (I think) to add the matching 2nd dimension to the numpy array in #6 and when I come to #11. the fitting.
File "G:\PROGRAMMERING\Anaconda\lib\site-packages\sklearn\utils\validation.py", line 212, in check_consistent_length
" samples: %r" % [int(l) for l in lengths]) ValueError: Found input variables with inconsistent numbers of samples: [150, 1]
x is (150,5) and y is (150,1). 150 is the number of samples in both, but they differ in number of fields, is this the problem and if so how do I fix it?
#1. Loading the Pandas libraries as pd
import pandas as pd
import numpy as np
#2. Read data from the file 'custom.csv' placed in your code directory
data = pd.read_csv("custom.csv")
#3. Preview the first 5 lines of the loaded data
print(data.head())
print(type(data))
#4.Test the shape of the data
print(data.shape)
df = pd.DataFrame(data)
print(df)
#5. Convert non-numericals to numericals
print(df.dtypes)
# Any object should be converted to numerical
df['species'] = pd.Categorical(df['species'])
df['species'] = df.species.cat.codes
print("outcome:")
print(df.dtypes)
#6.Convert df to numpy.ndarray
np = df.to_numpy()
print(type(np)) #this should state <class 'numpy.ndarray'>
print(data.shape)
print(np)
x = np.data
y = [df['species']]
print(y)
#K-nearest neighbor (find closest) - searach for the K nearest observations in the dataset
#The model calculates the distance to all, and selects the K nearest ones.
#8. Import the class you plan to use
from sklearn.neighbors import (KNeighborsClassifier)
#9. Pick a value for K
k = 2
#10. Instantiate the "estimator" (make an instance of the model)
knn = KNeighborsClassifier(n_neighbors=k)
print(knn)
#11. fit the model with data/model training
knn.fit(x, y)
#12. Predict the response for a new observation
print(knn.predict([[3, 5, 4, 2]]))```
This is how I used the scikit-learn KNeighborsClassifier to fit the knn model:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
df = datasets.load_iris()
X = pd.DataFrame(df.data)
y = df.target
knn = KNeighborsClassifier(n_neighbors = 2)
knn.fit(X,y)
print(knn.predict([[6, 3, 5, 2]]))
#prints output class [2]
print(knn.predict([[3, 5, 4, 2]]))
#prints output class [1]
From DataFrame you don't need to convert to numpy array, you can directly fit the model on DataFrame, also while converting the DataFrame to numpy array you have named that as np which is also used to import numpy at the top import numpy as np
The input prediction input is 4 columns, leaving the fifth 'species' without prediction. Also, if 'species' was the target it cannot be given as input to the knn at the same time. The pop removes this particular column from the dataFrame df.
#npdf = df.to_numpy()
df = df.apply(lambda x:pd.Series(x))
y = np.asarray(df['species'])
#removes the target from the sample
df.pop('species')
x = df.to_numpy()
"ValueError: all the input array dimensions except for the concatenation axis must match exactly" is the error am getting when am trying to append values.PFB the code. x is a dataset of size [16754,3] and a is an array of just one's with a size of [16754,1]. As far as I understand the axis matches exactly.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('data_monthly_rainfall.csv')
x = dataset.iloc[:, [0,1,2]].values
y = dataset.iloc[:, 3].values
# Apending a coloumn y with 1 for the equation
import statsmodels.api as sm
a = np.ones((16754, 0)).astype(int)
x = np.append(arr = a,values = x, axis = 1)
Can anyone please tell me what am doing wrong here? I am very new to python and ML, in the learning phase. Please let me know if more info is needed.
Link to the dataset
The problem is that the shape of x is (16755, 3) and you are creating a with a shape of (16754, 1). The mismatch occurs on the rows. Change the dimentions of a to (16755, 1):
a = np.ones((16755, 1)).astype(int)
You can avoid this altogether by saving the number of rows in a variable.
m = x.shape[0]
a = np.ones((m, 1)).astype(int)
x = np.append(arr = a,values = x, axis = 1)
The Issue
To begin with I'm pretty new to machine learning. I have decided to test up some of the things that I have learned on some financial datam my machine learning model looks like this:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
df = pd.read_csv("/Users/Documents/Trading.csv")
poly_features = PolynomialFeatures(degree=2, include_bias=False)
linear_reg = LinearRegression(fit_intercept = True)
X = df_copy[["open","volume", "base volume", "RSI_14"]]
X_poly = poly_features.fit_transform(X)[1]
y = df_copy[["high"]]
linear_reg.fit(X_poly, y)
x = linear_reg.predict([[1.905E-05, 18637.07503453,0.35522205, 69.95820948552947]])
print(x)
all works great until the moment I try to implement PolynomialFeatures which brings to be the following error:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Attempts to solve the issue:
Atempt 1
I've tried adding .values to X but the same error still comes up:
X_poly = poly_features.fit_transform(X.values)[1]
Atempt 2
I tried solving this problem by adding reshape(-1, 1) at the end of X_poly:
X_poly = poly_features.fit_transform(X)[1].reshape(-1, 1)
but it just replaces the previous error with this one:
ValueError: Found input variables with inconsistent numbers of samples: [14, 5696]
Thank you very much in advance for your help.
It wants you to transform your input. Try using X_poly = poly_features.fit_transform(X.values.reshape(1,-1))[1]
I have a 14x5 data matrix titled data. The first column (Y) is the dependent variable followed by 4 independent variables (X,S1,S2,S3). When trying to fit a regression model to a subset of the independent variables ['S2'][:T] I get the following error:
ValueError: Found array with dim 3. Estimator expected <= 2.
I'd appreciate any insight on a fix. Code below.
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
data = pd.read_csv('C:/path/Macro.csv')
T=len(data['X'])-1
#Fit variables
X = data['X'][:T]
S1 = data['S1'][:T]
S2 = data['S2'][:T]
S3 = data['S3'][:T]
Y = data['Y'][:T]
regressor = LinearRegression()
regressor.fit([[X,S1,S2,S3]], Y)
You are passing a 3-dimensional array as the first argument to fit(). X, S1, S2, S3 are all Series objects (1-dimensional), so the following
[[X, S1, S2, S3]]
is 3-dimensional. sklearn estimators expect an array of feature vectors (2-dimensional).
Try something like this:
# pandas indexing syntax
# data.ix[ row index/slice, column index/slice ]
X = data.ix[:T, 'X':] # rows up to T, columns from X onward
y = data.ix[:T, 'Y'] # rows up to T, Y column
regressor = LinearRegression()
regressor.fit(X, y)