How to make logistic regression with fashion-MNIST dataset - python

I don`t know what to do next , i tried many times, but my teacher wants me to make 10 models.
import pandas as pd
from numpy import reshape
from sklearn import metrics
train = pd.read_csv('fashion_train.csv',header =None)
print(train.head())
label = train[0]
test = pd.read_csv('fashion_test.csv',header = None)
print(test.head())
labelT = test[0]
print(labelT)
X_train = train.iloc[:, 1:]
print(X_train)
y_train = train.iloc[:, 0]
print(y_train)
X_test = test.iloc[:, 1:]
y_test = test.iloc[:, 0]
from sklearn.linear_model import LogisticRegression

Okay So what you need to do now is to train your model. So just add the following lines to your code.
model = LogisticRegression()
model.fit(X_train, y_train)
prediction = model.predict(X_test)
Now you can use any accuracy calculator. For example:
score = sklearn.metrics.classification_report(y_test, prediction)
print(score)
also I suggest change the import to import sklearn

Related

How to apply regression models for separate test and train datasets?

I have test and train subset separately. I need to apply regression models on those datasets. Just tried to make an example but something is wrong. My prediction column for regression is 'survival-time'. Also i have 93 column on each datasets. Here is code.
NOTE: Specially, troubling on 'score = svr.score(X_train,y_train)', should i use test for train argument?
import pandas as pd
import numpy as np
from sklearn.svm import SVR
train = pd.read_csv('training.csv',sep=";")
test = pd.read_csv('test.csv',sep=";")
X_train = train.drop('survival-time',axis=1)
y_train = train['survival-time']
X_test = test.drop('survival-time',axis=1)
y_test = test['survival-time']
svr = SVR().fit(X_train, y_train)
print(svr)
yfit = svr.predict(X_train)
score = svr.score(X_train,y_train)
print("R-squared:", score)
print("MSE:", mean_squared_error(y_train, yfit))
print("RMSLE:", mean_squared_log_error(y_train, yfit))

ANN problem: I am building an ANN model to predict the profit of a new startup based on certain features

The image of the dataset
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
Loading the data set using pandas as data frame format
import pandas as pd
df = pd.read_csv(r"E:\50_Startups.csv")
df.drop(['State'],axis = 1, inplace = True)
from sklearn.preprocessing import MinMaxScaler
mm = MinMaxScaler()
df.iloc[:,:] = mm.fit_transform(df.iloc[:,:])
info = df.describe()
x = df.iloc[:,:-1].values
y = df.iloc[:,-1].values
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split( x,y, test_size=0.2, random_state=42)
Initializing the model
model = Sequential()
model.add(Dense(40,input_dim =3,activation="relu",kernel_initializer='he_normal'))
model.add(Dense(30,activation="relu"))
model.add(Dense(1))
model.compile(loss="mean_squared_error",optimizer="adam",metrics=["accuracy"])
fitting model on train data
model.fit(x=x_train,y=y_train,epochs=150, batch_size=32,verbose=1)
Evaluating the model on test data
eval_score_test = model.evaluate(x_test,y_test,verbose = 1)
I am getting zero accuracy.
The problem is that accuracy is a metric for discrete values (classification).
you should use:
r2 score
mape
smape
instead.
e.g:
model.compile(loss="mean_squared_error",optimizer="adam",metrics=["mean_absolute_percentage_error"])
Adding to the answer of #GuintherKovalski accuracy is not for regression but if you still want to use it then you can use it along with some threshold using following steps:
Set a threshold such that if the absolute difference in the predicted value and the actual value is less than equal to the threshold then you consider that value as correct, otherwise false.
Ex -> predicted values = [0.3, 0.7, 0.8, 0.2], original values = [0.2, 0.8, 0.5, 0.4].
Now abs diff -> [0.1, 0.1, 0.3, 0.2] and let's take a threshold of 0.2. So with this threshold the correct -> [1, 1, 0, 1] and your accuracy will be correct.sum()/len(correct) and that is 3/4 -> 0.75.
This could be implemented in TensorFlow like this
import numpy as np
import tensorflow as tf
from sklearn.datasets import make_regression
data = make_regression(10000)
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(100,))])
def custom_metric(a, b):
threshold = 1 # Choose accordingly
abs_diff = tf.abs(b - a)
correct = abs_diff >= threshold
correct = tf.cast(correct, dtype=tf.float16)
res = tf.math.reduce_mean(correct)
return res
model.compile('adam', 'mae', metrics=[custom_metric])
model.fit(data[0], data[1], epochs=30, batch_size=32)
Just want to say Thank you to everyone who took their precious time to help me. I am posting this code as this worked for me. I hope it helps everyone who is stuck somewhere looking for answers. I got this code after consulting with my friend.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
import pandas as pd
from sklearn.model_selection import train_test_split
# Loading the data set using pandas as data frame format
startups = pd.read_csv(r"E:\0Assignments\DL_assign\50_Startups.csv")
startups = startups.drop("State", axis =1)
train, test = train_test_split(startups, test_size = 0.2)
x_train = train.iloc[:,0:3].values.astype("float32")
x_test = test.iloc[:,0:3].values.astype("float32")
y_train = train.Profit.values.astype("float32")
y_test = test.Profit.values.astype("float32")
def norm_func(i):
x = ((i-i.min())/(i.max()-i.min()))
return (x)
x_train = norm_func(x_train)
x_test = norm_func(x_test)
y_train = norm_func(y_train)
y_test = norm_func(y_test)
# one hot encoding outputs for both train and test data sets
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
# Storing the number of classes into the variable num_of_classes
num_of_classes = y_test.shape[1]
x_train.shape
y_train.shape
x_test.shape
y_test.shape
# Creating a user defined function to return the model for which we are
# giving the input to train the ANN mode
def design_mlp():
# Initializing the model
model = Sequential()
model.add(Dense(500,input_dim =3,activation="relu"))
model.add(Dense(200,activation="tanh"))
model.add(Dense(100,activation="tanh"))
model.add(Dense(50,activation="tanh"))
model.add(Dense(num_of_classes,activation="linear"))
model.compile(loss="mean_squared_error",optimizer="adam",metrics =
["accuracy"])
return model
# building a cnn model using train data set and validating on test data set
model = design_mlp()
# fitting model on train data
model.fit(x=x_train,y=y_train,batch_size=100,epochs=10)
# Evaluating the model on test data
eval_score_test = model.evaluate(x_test,y_test,verbose = 1)
print ("Accuracy: %.3f%%" %(eval_score_test[1]*100))
# accuracy score on train data
eval_score_train = model.evaluate(x_train,y_train,verbose=0)
print ("Accuracy: %.3f%%" %(eval_score_train[1]*100))

KNN model, accuracy(clf.score) returns 0

I am working one a simple KNN model with 3NN to predict a weight,
However, the accuracy is 0.0, I don't know why.
The code can give me a prediction on weight with 58 / 59.
This is the reproducible code
import numpy as np
from sklearn import preprocessing, neighbors
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.metrics import accuracy_score
#Create df
data = {"ID":[i for i in range(1,11)],
"Height":[5,5.11,5.6,5.9,4.8,5.8,5.3,5.8,5.5,5.6],
"Age":[45,26,30,34,40,36,19,28,23,32],
"Weight": [77,47,55,59,72,60,40,60,45,58]
}
df = pd.DataFrame(data, columns = [x for x in data.keys()])
print("This is the original df:")
print(df)
#Feature Engineering
df.drop(["ID"], 1, inplace = True)
X = np.array(df.drop(["Weight"],1))
y = np.array(df["Weight"])
#define training and testing
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size =0.2)
#Build clf with n =3
clf = neighbors.KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
#accuracy
accuracy = clf.score(X_test, y_test)
print("\n accruacy = ", accuracy)
#Prediction on 11th
ans = np.array([5.5,38])
ans = ans.reshape(1,-1)
prediction = clf.predict(ans)
print("\nThis is the ans: ", prediction)
You are classifying Weight which is a continuous (not a discrete) variable. This should be a regression rather than a classification. Try KNeighborsRegressor.
To evaluate your result, use metrics for regression such as R2 score.
If your score is low, that can mean different things: training set too small, test set too different from training set, regression model not adequate...

How to predict a specific Image (from or outside dataset) after training the KNN classifier

I have a simple KNN classification problem, the output of the code below is the accuracy of the classifier resulted after training the classifier and splitting the dataset into "train" and "test".
What I want my system to be like is:
First, train the classifier using dataset;
Upload an image from URL;
Classify it according to the dataset.
For example, the output should be "class 1". I believe it's simple but I am pretty new to python.
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier(n_neighbors=5)
dataset = pd.read_csv(fdes)
X = dataset.iloc[:,:20].values
y = dataset['target'].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
neigh.fit(X_train, y_train)
# Predicting the Test set results
y_pred = neigh.predict(X_test)
y_compare = np.vstack((y_test,y_pred)).T
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
#finding accuracy from the confusion matrix.
a = cm.shape
corrPred = 0
falsePred = 0
#prining results
for row in range(a[0]):
for c in range(a[1]):
if row == c:
corrPred +=cm[row,c]
else:
falsePred += cm[row,c]
kernelRbfAccuracy = corrPred/(cm.sum())
print ('Accuracy of knn : ', corrPred/(cm.sum()))
After all those steps, you can continue with:
from io import BytesIO
import numpy as np
import requests
from PIL import Image
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img = np.array(img).reshape(1, -1)
output_class = neigh.predict(img)[0]
print(output_class)

How to increase the model accuracy of multiple linear regression

This is the custom code
#Custom model for multiple linear regression
import numpy as np
import pandas as pd
dataset = pd.read_csv("50s.csv")
x = dataset.iloc[:,:-1].values
y = dataset.iloc[:,4:5].values
from sklearn.preprocessing import LabelEncoder
lb = LabelEncoder()
x[:,3] = lb.fit_transform(x[:,3])
from sklearn.preprocessing import OneHotEncoder
on = OneHotEncoder(categorical_features=[3])
x = on.fit_transform(x).toarray()
x = x[:,1:]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=1/5, random_state=0)
con = np.matrix(X_train)
z = np.matrix(y_train)
#training model
result1 = con.transpose()*con
result1 = np.linalg.inv(result1)
p = con.transpose()*z
f = result1*p
l = []
for i in range(len(X_test)):
temp = f[0]*X_test[i][0] + f[1]*X_test[i][1] +f[2]*X_test[i][2]+f[3]*X_test[i][3]+f[4]*X_test[i][4]
l.append(temp)
import matplotlib.pyplot as plt
plt.scatter(y_test,l)
plt.show()
Then I created created a model with scikit learn
and compared the results with y_test and l(predicted values of above code)
comparisons are as follows
for i in range(len(prediction)):
print(y_test[i],prediction[i],l[i],sep=' ')
103282.38 103015.20159795816 [[116862.44205399]]
144259.4 132582.27760816005 [[118661.40080974]]
146121.95 132447.73845175043 [[124952.97891882]]
77798.83 71976.09851258533 [[60680.01036438]]
This were the comparison between y_test,scikit-learn model predictions and custom code predictions
please help with the accuracy of model.
blue :Custom model predictions
yellow : scikit-learn model predictions

Categories