I get this error , post choosing the epochs: ValueError: Input arrays should have the same number of samples as target arrays. Found 5516 input samples and 12870 target samples. Any suggestions are welcome. Thanks in advance
Im using a dataset with a lot of categorical variables and they add up to 95 after creating the dummy variables, the code until I choose the number of epochs runs flawlessly and then I get this error, what is the reason for this error, its important, one, I could use it in the future and 2, Im unable to proceed :)
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('zrpl_data.csv')
X = dataset.iloc[:, 0:6].values
y = dataset.iloc[:, 6].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 0] = labelencoder_X_1.fit_transform(X[:, 0])
labelencoder_X_2 = LabelEncoder()
X[:, 1] = labelencoder_X_2.fit_transform(X[:, 1])
labelencoder_X_3 = LabelEncoder()
X[:, 2] = labelencoder_X_1.fit_transform(X[:, 2])
labelencoder_X_4 = LabelEncoder()
X[:, 3] = labelencoder_X_1.fit_transform(X[:, 3])
labelencoder_X_5 = LabelEncoder()
X[:, 4] = labelencoder_X_1.fit_transform(X[:, 4])
onehotencoder = OneHotEncoder(categorical_features = [0,1,2,3,4])
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,
random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
import keras
from keras.models import Sequential
from keras.layers import Dense
classifier = Sequential()
classifier.add(Dense(output_dim=47,
init='uniform',activation='relu',input_dim=95))
classifier.add(Dense(output_dim=47, init='uniform',activation='relu'))
classifier.add(Dense(output_dim=1,
init='uniform',activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit (X_train, y_train, batch_size=10,nb_epoch=100)
You have 5516 feature samples and 12870 target samples (you should have equal), before train the model double check their dimensions.
Related
I have a data frame with 3 categorical values (moisture, fertilizer, type) and one numerical value - biomass quantity. I created a regression model to predict the biomass quantity, based on these variables. I got the good accuracy.. The code is below
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.linear_model import LinearRegression
X = dce_stylized_fs.iloc[:, :-1].values
y = dce_stylized_fs.iloc[:, 3].values
labelencoder = LabelEncoder()
X[:, 1] = labelencoder.fit_transform(X[:, 1])
X[:, 2] = labelencoder.fit_transform(X[:, 2])
ct = ColumnTransformer([("moisture", OneHotEncoder(), [0])], remainder = 'passthrough')
X = ct.fit_transform(X)
ct = ColumnTransformer([("fertilizer", OneHotEncoder(), [1])], remainder = 'passthrough')
X = ct.fit_transform(X)
ct = ColumnTransformer([("type", OneHotEncoder(), [2])], remainder = 'passthrough')
X = ct.fit_transform(X)
X = X[:,1:] #avoid dummy variable trap
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
results = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r = r2_score(y_test, y_pred)
What I've been wondering and can't figure out is how to pass some arbitrary data to a model and see prediction. For example, I would like to see what model would predict if I put moisture = 10 (this is like a scale or class), fertilizer = kjx, and type = chermozher (those values already appear in the test and train data, but not in this combination). I know that I need to format those arbitrary values to be in a format like X_train or X_test and call the function predict. But because I perform one-hot encoding I got 17 columns and I don't know which refers to which attribute. I can't see the column names as those are NumPy arrays. Can someone help me?
I'm learning machine learning online. In the multiple regression model when I write the following code:
# multiple linear regression
import pandas as pd
import numpy as np
dataset = pd.read_csv("50_Startups.csv")
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
labelencoder_x = LabelEncoder()
x[:, 3] = labelencoder_x.fit_transform(x[:, 3])
ct = ColumnTransformer(
[('one_hot_encoder', OneHotEncoder(categories="auto"), [3])],
remainder="passthrough"
)
# avoiding the dummy variable trap
x = x[:, 1:]
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
# fitting multiple libnear regresion to the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
# predicting the test set results
y_pred = regressor.predict(x_test)
import statsmodels.formula.api as sm
x = np.append(arr = np.ones((50, 1)).astype(int), values = x, axis = 1)
x_opt = x[:, [0, 1, 2, 3, 4, 5]]
regressor_ols = sm.OLS(endog = y, exog = x_opt).fit()
regressor_ols.summary()
I got the following error:
Traceback (most recent call last):
File "/home/ashutosh/Machine Learning A-Z Template Folder/Part 2 - Regression/Section 5 - Multiple Linear Regression/P14-Multiple-Linear-Regression/Multiple_Linear_Regression/mlr.py", line 35, in <module>
x_opt = x[:, [0, 1, 2, 3, 4, 5]]
IndexError: index 4 is out of bounds for axis 1 with size 4
I checked multiple answers but they don't have the same problem as mine.
What can I do?
You can download the dataset from here: https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/P14-Multiple-Linear-Regression.zip
Here is the complete code
top part runs fine till i import keras.
I have tried installing and uninstalling keras, however the error is still there
Classification template
# Importing the libraries
import numpy as my
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
#Removing 1 Dummy Variable to avoid Dummy Variable Trap
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Part 2: Let's make the ANN
#Importing the keras library
import keras.backend
import keras
from keras.models import Sequential
from keras.layers import Dense
# Initialising the ANN
classifier = Sequential()
AttributeError: module 'tensorflow.python.keras.backend' has no attribute 'get_graph'
Solution (as found in comments) was to install keras version 2.2.4
e.g:
pip install 'keras==2.2.4'
if you are above that version, you may try using this function instead:
keras.backend.image_data_format()
I have created an artificial neural network. I am trying to calculate the accuracy of the model using k-fold cross validation technique but after compiling the last line its not progressing any further, its stuck there for more than 20 mins. I am not able to figure out where I am going wrong. Can anyone please help me with this thing? Below is the code I have used.
Thanks in advance.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
X=X[:,1:]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from keras.models import Sequential #required to initialize ann
from keras.layers import Dense #required to build the layers of ann
def build_classifier():
classifier=Sequential()
classifier.add(Dense(kernel_initializer="uniform", activation="relu", input_dim=11, units=6))
classifier.add(Dense(kernel_initializer="uniform", activation="relu", units=6))
classifier.add(Dense(kernel_initializer="uniform", activation="sigmoid",units=1))
classifier.compile(optimizer='adam', loss='binary_crossentropy',metrics=['accuracy'])
return classifier
classifier=KerasClassifier(build_fn=build_classifier, batch_size=10, nb_epoch=100)
accuracies=cross_val_score(estimator=classifier,X=X_train,y=y_train,cv=10,n_jobs=-1)
I had the same issue with the exact same code. It seems Windows has an issue with "n_jobs", if you remove it by "accuracies = .." , it will start working. It's just that it could take long but it will work and show each epoch being updated.
I am not able to pickle my model below.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os
script_dir = os.path.dirname(__file__)
abs_file_path = os.path.join(script_dir, 'Churn_Modelling.csv')
# Importing the dataset
dataset = pd.read_csv(abs_file_path)
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
onehotencoder = OneHotEncoder(categorical_features=[1])
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Part 2 - Now let's make the ANN!
# Importing the Keras libraries and packages
from tensorflow.contrib.keras.api.keras.models import Sequential
from tensorflow.contrib.keras.api.keras.layers import Dense
from tensorflow.contrib.keras import backend
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))
# Adding the second hidden layer
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
# Adding the output layer
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
# Compiling the ANN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size=10, epochs=100, validation_split=0.1)
# Part 3 - Making predictions and evaluating the model
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
# Predicting a single new observation
new_prediction = classifier.predict(sc.transform(np.array([[0.0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])))
new_prediction = (new_prediction > 0.5)
I have tried using
from sklearn.externals import joblib
joblib.dump(classifier, 'model.pkl')
and
import pickle
with open('classifier.pkl', 'wb') as fid:
pickle.dump(classifier, fid,2)
for both, I am getting PicklingError: Can't pickle : attribute lookup module on builtins failed
What am I doing wrong? Your insights are much appriciated.