TensorFlow better model loading - python

I am trying to load my model without loading my dataset, but I still need the dataset to fit the StandardScaler on the training split. This is my code:
import tensorflow as tf
import pandas as pd
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
dataset = pd.read_csv('database.csv') #reading database
x = dataset.drop(columns=['good/bad']).values
y = dataset['good/bad'].values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
scaler = StandardScaler().fit(x_train)
x_train = scaler.transform(x_train)
model = load_model("model.h5")
#Now I want to predict my data
out = scaler.transform([my_data])
prediction = model.predict(out)
pred = prediction[0][0]
Can I predict my data without loading my dataset?

In your code, you first need to create a model. For research models and techniques, most machine learning and deep learning practitioners follow these steps:
Preprocess the data
Create the model
Save the model
Load the model
Make predictions
More details are covered in the official TensorFlow documentation.
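One way to apply the save/load steps above to the scaler as well, so that the dataset is not needed at prediction time, is to persist the fitted StandardScaler next to the model. This is a minimal sketch, assuming joblib is installed; the data is synthetic, and the file name scaler.joblib and the sample my_data are hypothetical. The Keras model would still be loaded with load_model("model.h5") as in the question.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Training time: fit the scaler on the training features (synthetic stand-in data).
rng = np.random.default_rng(0)
x_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
scaler = StandardScaler().fit(x_train)

# Persist the fitted scaler; "scaler.joblib" is a hypothetical file name.
scaler_path = os.path.join(tempfile.mkdtemp(), "scaler.joblib")
joblib.dump(scaler, scaler_path)

# Prediction time: only the saved scaler (and the saved model) are needed.
loaded_scaler = joblib.load(scaler_path)
my_data = [4.0, 6.5, 5.2]  # hypothetical new sample with the same 3 features
out = loaded_scaler.transform([my_data])
# prediction = model.predict(out)  # with model = load_model("model.h5")
```

The loaded scaler carries the training mean and variance, so new samples are scaled the same way as during training.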

Related

How to log a sklearn pipeline with a Keras step using mlflow.pyfunc.log_model()? TypeError: can't pickle _thread.RLock objects

I would like to log a sklearn pipeline with a Keras step to MLflow.
The pipeline has 2 steps: a sklearn StandardScaler and a Keras TensorFlow model.
I am using mlflow.pyfunc.log_model() as a possible solution, but I am getting this error:
TypeError: can't pickle _thread.RLock objects
---> mlflow.pyfunc.log_model("test1", python_model=wrappedModel, signature=signature)
Here is my code:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import keras
from keras import layers, Input
from keras.wrappers.scikit_learn import KerasRegressor
import mlflow.pyfunc
from sklearn.pipeline import Pipeline
from mlflow.models.signature import infer_signature
#toy dataframe
df1 = pd.DataFrame([[1,2,3,4,5,6], [10,20,30,40,50,60],[100,200,300,400,500,600]] )
#create train test datasets
X_train, X_test = train_test_split(df1, random_state=42, shuffle=True)
#scale X_train
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_train_s = pd.DataFrame(X_train_s)
#wrap the keras model to use it inside of sklearn pipeline
def create_model(optimizer='adam', loss='mean_squared_error', s = X_train.shape[1]):
    input_layer = keras.Input(shape=(s,))
    # "encoded" is the encoded representation of the input
    encoded = layers.Dense(25, activation='relu')(input_layer)
    encoded = layers.Dense(2, activation='relu')(encoded)
    # "decoded" is the lossy reconstruction of the input
    decoded = layers.Dense(2, activation='relu')(encoded)
    decoded = layers.Dense(25, activation='relu')(encoded)
    decoded = layers.Dense(s, activation='linear')(decoded)
    model = keras.Model(input_layer, decoded)
    model.compile(optimizer, loss)
    return model
# wrap the model
model = KerasRegressor(build_fn=create_model, verbose=1)
# create the pipeline
pipe = Pipeline(steps=[
    ('scale', StandardScaler()),
    ('model',model)
])
#function to wrap the pipeline to be logged by mlflow
class SklearnModelWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, model):
        self.model = model
    def predict(self, context, model_input):
        return self.model.predict(model_input)[:,1]
mlflow.end_run()
with mlflow.start_run(run_name='test1'):
    #train the pipeline
    pipe.fit(X_train, X_train_s, model__epochs=2)
    #wrap the model for mlflow log
    wrappedModel = SklearnModelWrapper(pipe)
    # Log the model with a signature that defines the schema of the model's inputs and outputs.
    signature = infer_signature(X_train, wrappedModel.predict(None, X_train))
    mlflow.pyfunc.log_model("test1", python_model=wrappedModel, signature=signature)
From what I googled, this type of error seems to be related to thread concurrency. It could therefore be related to TensorFlow, since it distributes the work during the model training phase.
However, the offending line runs after the training phase. If I remove this line, the rest of the code works, which makes me think the error happens after the concurrent part of model training. I have no idea why I am getting this error in this context.
I am a beginner. Can someone please help me?
Thanks
I think python_model=wrappedModel should be python_model=SklearnModelWrapper().
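For context on the error message itself: Python's pickle module cannot serialize thread locks, so any object that holds one fails to pickle with this TypeError. The sketch below reproduces that with the standard library only. It assumes, without confirmation from this thread, that mlflow.pyfunc.log_model pickles the python_model object and that the wrapped Keras model holds such a lock somewhere inside it. The dictionary and its keys are hypothetical stand-ins.

```python
import pickle
import threading

# Hypothetical stand-in for a model object that keeps a thread lock internally.
state = {"weights": [0.1, 0.2, 0.3], "lock": threading.RLock()}

# Pickling anything that contains an RLock raises TypeError.
try:
    pickle.dumps(state)
    pickling_failed = False
except TypeError:
    pickling_failed = True

# Dropping the lock before pickling leaves only plain data, which round-trips fine.
picklable_state = {key: value for key, value in state.items() if key != "lock"}
restored = pickle.loads(pickle.dumps(picklable_state))
```

If that assumption holds, the error points at what is stored inside the wrapper rather than at the training step, which would match the observation that everything before the log_model call runs fine.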

ValueError: Found input variables with inconsistent numbers of samples: [645471, 78]

How can I fix this error? ValueError: Found input variables with inconsistent numbers of samples: [645471, 78]
The full code is below.
#Importing the numpy to perform Linear Algebraic operations on the data
import numpy as np
#Import pandas library to perform the data preprocessing
import pandas
#importing the Keras deep learning framework of Python
import keras
#Importing the Sequential model from keras
from keras.models import Sequential
#Importing the types of layers in the Neural Network that we are going to have
from keras.layers import Dense
#Importing the train_test_split function which is useful in dividing the dataset into the training and testing data
from sklearn.model_selection import train_test_split
#Importing the StandardScaler function to perform the standardisation/scaling of the data
from sklearn.preprocessing import StandardScaler, LabelEncoder
#Importing the metrics for the performance evaluation of our deep learning model
from sklearn import metrics
from keras.utils import np_utils, normalize, to_categorical
data = pandas.read_csv("C:/Users/bam/train.csv", header=0, dtype=object)
X = data.iloc[:, 0:78]
y = data.iloc[:78]
#I have split the dataset in a ratio of 80:20 between train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 23)
#Creating an object of StandardScaler
sc = StandardScaler()
#Scaling the data using the StandardScaler() object
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
error
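The numbers in the message match the code above: X has 645471 rows, while y = data.iloc[:78] selects the first 78 rows of the frame rather than a column. A minimal sketch of the difference follows, using a small synthetic frame and assuming (this is not stated in the question) that the label is the column at position 78.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for train.csv: 200 rows, 78 feature columns plus 1 label column.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 79)))

X = data.iloc[:, 0:78]    # all rows, first 78 columns
y_rows = data.iloc[:78]   # first 78 ROWS, as in the question
y = data.iloc[:, 78]      # all rows, column at position 78 (assumed label column)

# X and y_rows have different lengths, which is what train_test_split rejects.
print(len(X), len(y_rows), len(y))

# With a column selection, X and y have the same number of samples and the split works.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=23
)
```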

How to test machine learning model with real data in python

I am new to machine learning and Python. I have created a simple linear regression model in Python. I can test the accuracy of my model, but only on the data in my dataset, which is a CSV file relating salary to years of experience. I want to use the model in practice: I enter the years of experience and get the predicted salary as output. Here is what I have done so far:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
# Feature Scaling
"""from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)"""
# Fitting Simple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Predicting the Test set results
y_pred = regressor.predict(X_test)
I want to modify the above code so that I can enter the years of experience and get the expected salary as output.
Thanks in advance.
After training the model, save your model to file and load it later in order to make predictions. In Python, you can use 'pickle' to achieve this.
References:
scikit-learn Model Persistence
save and load machine learning models, an example
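A minimal sketch of the pickle approach described above. The data is synthetic and the file name regressor.pkl is hypothetical; in the question's code, regressor would be the model fitted on Salary_Data.csv.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the salary data: years of experience -> salary.
years = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
salary = np.array([40000.0, 45000.0, 50000.0, 55000.0, 60000.0])
regressor = LinearRegression().fit(years, salary)

# Save the trained model to a file ("regressor.pkl" is a hypothetical name).
model_path = os.path.join(tempfile.mkdtemp(), "regressor.pkl")
with open(model_path, "wb") as f:
    pickle.dump(regressor, f)

# Later, possibly in another script: load the model and predict without retraining.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)
predicted = loaded.predict([[6.0]])  # one sample with one feature
```

Only load pickle files you created yourself or otherwise trust, since unpickling can execute arbitrary code.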
You can use your trained model to make a prediction.
As a previous answer mentioned, you would want to use
regressor.predict([[years_of_xp]])
The input must be a 2D array (one row per sample, one column per feature), hence the double brackets. This asks your model to predict the salary someone will receive, given years_of_xp years of experience.
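As a sketch of how a single user-supplied value could be passed to the model: scikit-learn estimators expect a 2D array of shape (n_samples, n_features), so one value has to be wrapped accordingly. The helper name predict_salary and the training numbers below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data standing in for Salary_Data.csv.
X_train = np.array([[1.1], [2.0], [3.2], [4.5], [5.9]])
y_train = 30000.0 + 9000.0 * X_train[:, 0]
regressor = LinearRegression().fit(X_train, y_train)


def predict_salary(model, years_of_xp):
    """Return the predicted salary for a single years-of-experience value."""
    # Wrap the scalar into shape (1, 1): one sample, one feature.
    features = np.array([[years_of_xp]], dtype=float)
    return float(model.predict(features)[0])


estimate = predict_salary(regressor, 7.5)
```

In an interactive script, the value could come from float(input("Years of experience: ")) before being passed to the helper.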

Load and predict new data sklearn

I trained a Logistic model, cross-validated and saved it to file using joblib module. Now I want to load this model and predict new data with it.
Is this the correct way to do this? Especially the standardization. Should I use scaler.fit() on my new data too? In the tutorials I followed, scaler.fit was only used on the training set, so I'm a bit lost here.
Here is my code:
#Loading the saved model with joblib
model = joblib.load('model.pkl')
# New data to predict
pr = pd.read_csv('set_to_predict.csv')
pred_cols = list(pr.columns.values)[:-1]
# Standardize new data
scaler = StandardScaler()
X_pred = scaler.fit(pr[pred_cols]).transform(pr[pred_cols])
pred = pd.Series(model.predict(X_pred))
print(pred)
No, it's incorrect. All data preparation steps should be fit on the training data only. Otherwise you risk applying the wrong transformations, because the means and variances that StandardScaler estimates will probably differ between the training data and the new data.
The easiest way to train, save, load and apply all the steps simultaneously is to use Pipelines:
At training:
# prepare the pipeline
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import joblib
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
joblib.dump(pipe, 'model.pkl')
At prediction:
#Loading the saved model with joblib
pipe = joblib.load('model.pkl')
# New data to predict
pr = pd.read_csv('set_to_predict.csv')
pred_cols = list(pr.columns.values)[:-1]
# apply the whole pipeline to data
pred = pd.Series(pipe.predict(pr[pred_cols]))
print(pred)
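To illustrate why refitting the scaler on new data is a problem, here is a small sketch with synthetic data in which the new samples are shifted relative to the training samples. All numbers are made up for the illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=10.0, scale=2.0, size=(500, 1))
X_new = rng.normal(loc=20.0, scale=2.0, size=(50, 1))  # new data, shifted upward

# Correct: reuse the statistics estimated on the training data.
train_scaler = StandardScaler().fit(X_train)
scaled_with_train_stats = train_scaler.transform(X_new)

# Incorrect: refit on the new data, which re-centers it around zero.
scaled_with_refit = StandardScaler().fit_transform(X_new)

print(scaled_with_train_stats.mean(), scaled_with_refit.mean())
```

With the training statistics the shift stays visible to the model; after refitting, the new data is centered on zero and the shift is erased, so the model receives inputs on a different scale than the one it was trained on.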

How can I create a Linear Regression Model from a split dataset?

I've just split my data into a training and testing set and my plan is to train a Linear Regression model and be able to check what the performance is like using my testing split.
My current code is:
import pandas as pd
import numpy as np
from sklearn import datasets, linear_model
import matplotlib.pyplot as plt
df = pd.read_csv('C:/Dataset.csv')
df['split'] = np.random.randn(df.shape[0], 1)
split = np.random.rand(len(df)) <= 0.75
training_set = df[split]
testing_set = df[~split]
Is there a proper method I should be using to plot a Linear Regression model from an external file such as a .csv?
Since you want to use scikit-learn, here's an approach using sklearn.linear_model.LinearRegression:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
# x_vars is a list of feature column names and y_var is the target column name
X_train, y_train = training_set[x_vars], training_set[y_var]
X_test, y_test = testing_set[x_vars], testing_set[y_var]
model.fit(X_train, y_train)
predictions = model.predict(X_test)
If you need more descriptive output, you might also look into using statsmodels for linear regression.
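A self-contained sketch of the same approach, including a check of the performance on the testing split. The DataFrame is synthetic and the column names x1, x2 and y are hypothetical; with a real file, df would come from pd.read_csv as in the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in for the CSV: y depends linearly on x1 and x2 plus noise.
rng = np.random.default_rng(42)
df = pd.DataFrame({"x1": rng.uniform(0, 10, 200), "x2": rng.uniform(0, 5, 200)})
df["y"] = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(0, 0.5, 200)

x_vars = ["x1", "x2"]  # feature columns (hypothetical names)
y_var = "y"            # target column (hypothetical name)

# Random split of roughly 75% training and 25% testing, as in the question.
split = rng.random(len(df)) <= 0.75
training_set = df[split]
testing_set = df[~split]

model = LinearRegression()
model.fit(training_set[x_vars], training_set[y_var])
predictions = model.predict(testing_set[x_vars])

# R^2 on the held-out rows measures how well the model generalizes.
r2 = r2_score(testing_set[y_var], predictions)
print(model.coef_, model.intercept_, r2)
```

sklearn.model_selection.train_test_split is an alternative to the manual boolean mask when a fixed split ratio is wanted.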
