I am trying to build a simple regression model using Keras, but it reports a very high loss (around 114378162176.0000).
I tried changing loss='mean_squared_error' to loss='mean_squared_logarithmic_error'; that gave a lower loss, but accuracy stayed at 0.0000e+00. I have also normalized my X_train using StandardScaler(), but it is not working. Please suggest a solution.
(Screenshots of X_train and X_train_scaled omitted.) The training output shows loss: 4956785664.0000 and accuracy: 0.0000e+00.
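For reference, the scaling step would look along these lines (a minimal sketch; the test-set handling is an assumption):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit the scaler on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics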
Here is the code:
import tensorflow as tf
from tensorflow import keras
import pandas as pd
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

model = keras.models.Sequential()
model.add(keras.layers.Dense(64, kernel_initializer='normal', input_dim=6, activation='relu'))
model.add(keras.layers.Dense(128, kernel_initializer='normal', activation='relu'))
model.add(keras.layers.Dense(128, kernel_initializer='normal', activation='relu'))
model.add(keras.layers.Dense(1))

optimizer = keras.optimizers.RMSprop(learning_rate=0.009)
# metrics given as a list; note 'accuracy' is not meaningful for regression
model.compile(loss='mean_squared_error', optimizer=optimizer, metrics=['mae'])
model.summary()

# X_train and y_train are assumed to be defined earlier (X_train scaled with StandardScaler)
model.fit(X_train, y_train, epochs=200)
I am training multiple ML models to pick the best one by accuracy score.
I have noticed that the accuracy scores for LogisticRegression and RandomForestClassifier are the same. Should it be this way, or am I doing something wrong? Is it a good idea to loop over algorithms this way?
FYI: the dataset is the Parkinson's disease detection dataset.
from sklearn.metrics import confusion_matrix,accuracy_score,f1_score,precision_score,recall_score
from sklearn.linear_model import LogisticRegression
from sklearn import svm
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
list_methods=[]
list_accuracy=[]
lr_model=LogisticRegression()
svm_model=svm.SVC(kernel='linear')
dtc_model=DecisionTreeClassifier()
rfc_model=RandomForestClassifier()
models=[lr_model,svm_model,dtc_model,rfc_model]
for model in models:
    model.fit(xtrain, ytrain)
    xtrain_prediction = model.predict(xtrain)
    # accuracy_score expects (y_true, y_pred)
    train_accuracy_score = accuracy_score(ytrain, xtrain_prediction)
    xtest_prediction = model.predict(xtest)
    test_accuracy_score = accuracy_score(ytest, xtest_prediction)
    print(f'{model} - train accuracy score: {train_accuracy_score} and test accuracy score: {test_accuracy_score}')
    list_methods.append(model)
    list_accuracy.append(test_accuracy_score)
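A more robust comparison than a single train/test split is k-fold cross-validation; a minimal sketch reusing the models list and xtrain/ytrain from above:

from sklearn.model_selection import cross_val_score

for model in models:
    # 5-fold cross-validated accuracy smooths out the luck of one particular split
    scores = cross_val_score(model, xtrain, ytrain, cv=5, scoring='accuracy')
    print(f'{type(model).__name__}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})')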
I am trying to perform basic linear regression on the MNIST data using the scikit-learn module, but it crashes with a MemoryError. What am I doing wrong? The shape of the training dataset is (60000, 784).
import numpy as np
from tensorflow.keras.datasets import mnist
from sklearn import linear_model
(xTrain, yTrain), (xTest, yTest) = mnist.load_data()
xTrain2D = xTrain.reshape((len(xTrain), -1))  # flatten 28x28 images to 784 features
xTest2D = xTest.reshape((len(xTest), -1))
reg = linear_model.LinearRegression()
reg.fit(xTrain2D, yTrain)
The problem is with the scikit-learn implementation: older versions of sklearn have issues with resource management here. Try upgrading sklearn.
Another viable option is to run this code on Kaggle or Google Colab.
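If upgrading alone doesn't help, an estimator that learns incrementally sidesteps the large in-memory solve. A minimal sketch using SGDRegressor (the max_iter value is a placeholder; pixels are scaled to [0, 1] in float32, both to halve memory versus float64 and because SGD is sensitive to feature scale):

import numpy as np
from tensorflow.keras.datasets import mnist
from sklearn.linear_model import SGDRegressor

(xTrain, yTrain), (xTest, yTest) = mnist.load_data()
# Flatten, convert to float32, and scale pixel values to [0, 1]
xTrain2D = xTrain.reshape((len(xTrain), -1)).astype(np.float32) / 255.0

reg = SGDRegressor(max_iter=20)  # placeholder; tune iterations and penalty
reg.fit(xTrain2D, yTrain)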
I have a densely connected neural network that was built using the Keras Sequential API. I'm trying to create some partial dependence plots (PDPs) to use for a bit of sensitivity analysis, using the scikit-learn plot_partial_dependence function. I've been getting the following error: ValueError: 'estimator' must be a fitted regressor or classifier. When it first happened, I added the use of KerasClassifier, which I've used successfully in the past to run my Keras model in scikit-learn's GridSearchCV. I'm still getting the same error. I've also tried KerasRegressor.
Can anyone tell me what's wrong and how I could fix it? Do I absolutely need to use scikit-learn's decision-tree-based functions to be able to use the PDP function? If yes, what's the biggest implementation difference between Keras neural networks and decision trees? (I've never used decision trees; my machine learning experience is limited to Keras.)
My relevant code is below, and I'm running Python on Google Colab's GPU. I'm sure there are several issues in that last line, but I can't get past this one to figure them out.
from sklearn.inspection import plot_partial_dependence
from keras.wrappers.scikit_learn import KerasClassifier
from keras.wrappers.scikit_learn import KerasRegressor
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras import optimizers
from keras.backend import sigmoid
from keras.utils.generic_utils import get_custom_objects

def create_model():
    # Register a custom swish activation so the layers can refer to it by name
    def swish(x):
        return x * sigmoid(x)
    get_custom_objects().update({'swish': swish})

    model = Sequential()
    model.add(Dense(1024, activation='swish', input_shape=(6,)))
    model.add(Dropout(.1))
    model.add(Dense(512, activation='swish'))
    model.add(Dense(256, activation='swish'))
    model.add(Dropout(.1))
    model.add(Dense(128, activation='swish'))
    model.add(Dense(64, activation='swish'))
    model.add(Dropout(.1))
    model.add(Dense(32, activation='swish'))
    model.add(Dense(16, activation='swish'))
    model.add(Dropout(.1))
    model.add(Dense(12, activation='softmax'))
    opt = optimizers.Adam(lr=0.05)
    # Pass the configured optimizer; the string 'adam' would ignore the lr set above
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model

from keras.callbacks import LearningRateScheduler
from keras.callbacks import EarlyStopping
import math

def scheduler(epoch, lr):
    # Hold the learning rate for the first 20 epochs, then decay it exponentially
    if epoch < 20:
        return lr
    else:
        return lr * math.exp(-0.1)

callback = LearningRateScheduler(scheduler, verbose=1)
weightsCallback = EarlyStopping(patience=30, monitor='accuracy', restore_best_weights=True, min_delta=1e-5, verbose=1)

modelClassified = KerasClassifier(build_fn=create_model)
modelClassified.fit(X_train, Y_train, batch_size=50, epochs=50, callbacks=[callback, weightsCallback], verbose=1)

disp = plot_partial_dependence(modelClassified, X_holdout, target=1, verbose=1, features=[0, 1, 2, 3, 4, 5], feature_names=['aspect ratio', 'diel inner radius', 'diel outer radius', 'diel separation', 'diel height', 'diel constant'])
I found that this error is actually caused by a bug: my program should have worked as written, but there is an error in the plot_partial_dependence source code.
For much more detail and the workaround I used to make it work, see this answer to another Stack Overflow question: https://stackoverflow.com/a/61485502/13822019
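As a stopgap until the bug is fixed, a one-dimensional partial dependence can be computed by hand: clamp one feature to each value on a grid, average the model's predictions over the dataset at that value, and plot the averages. A minimal sketch (it assumes X_holdout is a NumPy array and reuses the fitted modelClassified from the question):

import numpy as np
import matplotlib.pyplot as plt

def manual_pdp(predict_fn, X, feature, grid_points=50):
    # Sweep `feature` over its observed range, averaging predictions at each stop
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_points)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # clamp the feature to one grid value
        averages.append(predict_fn(X_mod).mean(axis=0))
    return grid, np.array(averages)

# Dependence of the class-1 probability on feature 0 (aspect ratio):
grid, avg = manual_pdp(lambda X: modelClassified.predict_proba(X)[:, 1], X_holdout, feature=0)
plt.plot(grid, avg)
plt.xlabel('aspect ratio')
plt.ylabel('average predicted P(class 1)')
plt.show()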
The program below should import the MNIST data from Keras, but the imported data consists only of zeros. I have tried it on remote servers and it has the same issue there. Does anyone know why?
import tensorflow as tf
from keras.datasets import mnist
import numpy as np
from tempfile import TemporaryFile
(x_train, y_train), (x_test, y_test) = mnist.load_data()
data = np.concatenate((x_train, x_test), axis=0)
MNIST data consists of grayscale images of the 10 digits. The black background is represented by zeros and the white digit strokes by integers from 1 to 255 (255 means white).
You probably printed x_train and saw only zeros in the printed portion of the array, but that is not the case for the whole array.
Try:
import numpy as np
print(np.mean(x_train))
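To confirm visually, one of the images can also be rendered (a quick matplotlib sketch):

import matplotlib.pyplot as plt
plt.imshow(x_train[0], cmap='gray')  # the nonzero pixels form the digit
plt.show()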
I am getting started with naive numerical prediction. Here is the training data:
https://gist.github.com/karimkhanp/75d6d5f9c4fbaaaaffe8258073d00a75
Test data:
https://gist.github.com/karimkhanp/0f93ecf5fe8ec5fccc8a7f360a6c3950
I wrote basic scikit-learn code to train and test:
import pandas as pd
import pylab as pl
import numpy as np
from sklearn.linear_model import LinearRegression

class NumericPrediction(object):
    def dataPrediction(self):
        Train = pd.read_csv("data_scientist_assignment.tsv", sep='\t', parse_dates=['date'])
        Train_visualize = Train  # note: this is an alias, not a copy
        # Convert dates to integer timestamps so they can be used as a numeric feature
        Train['timestamp'] = Train.date.values.astype(np.int64)
        Train_visualize['date'] = Train['timestamp']
        print(Train.describe())
        x1 = ["timestamp", "hr_of_day"]
        test = pd.read_csv("test.tsv", sep='\t', parse_dates=['date'])
        test['timestamp'] = test.date.values.astype(np.int64)
        model = LinearRegression()
        model.fit(Train[x1], Train["vals"])
        # print(model.score(Train[x1], Train["vals"]))
        print(model.predict(test[x1]))
        Train.hist()
        pl.show()

if __name__ == '__main__':
    NumericPrediction().dataPrediction()
But accuracy is very low here because the approach is very naive. Any suggestions for improving accuracy (in terms of algorithm, examples, references, libraries)?
For starters, your 'test' set doesn't look right. Please check it.
Secondly, your model is doomed to fail. Plot your data - what do you see? There is clearly seasonality here, while linear regression assumes that observations are independent. It's important to recognize that you are dealing with a time series.
The R language is excellent for time series, with advanced forecasting packages like bsts. Still, Python will be just as good here, and the pandas module is going to serve you well. Mind that you might not necessarily need machine learning for this: check out ARMA and ARIMA. Bayesian structural time series are also excellent.
Here is a very good article that guides you through the basics of dealing with time series data.
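To illustrate the ARIMA route in Python, here is a minimal sketch with statsmodels (column names are taken from the question's code; the ARIMA order is a placeholder that would need tuning, e.g. via ACF/PACF plots or AIC):

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

train = pd.read_csv("data_scientist_assignment.tsv", sep='\t', parse_dates=['date'])
# Combine date and hour into one timestamp so each observation is unique
train['ts'] = train['date'] + pd.to_timedelta(train['hr_of_day'], unit='h')
series = train.set_index('ts')['vals'].sort_index()

model = ARIMA(series, order=(2, 1, 2))  # placeholder (p, d, q)
fitted = model.fit()
print(fitted.summary())
print(fitted.forecast(steps=24))  # forecast the next 24 hours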