Creating a new object every time you call a function in Python

I have a recommender system that I need to train, and I included the entire training procedure inside a function:
def train_model(data):
    model = Recommender()
    Recommender.train(data)
    pred = Recommender.predict(data)
    return pred
Something like this. Now, if I want to train this inside a loop over different datasets, like:
preds_list = []
data_list = [dataset1, dataset2, dataset3...]
for data_subset in data_list:
    preds = train_model(data_subset)
    preds_list += [preds]
How can I make sure that every time I call the train_model function, a brand new instance of a recommender is created, not an old one, trained on the previous dataset?

You are already creating a new instance every time you execute train_model. The problem is that you are not using the new instance.
You probably meant:
def train_model(data):
    model = Recommender()
    model.train(data)
    pred = model.predict(data)
    return pred

Use the instance you've instantiated, not the class
class Recommender:
    def __init__(self):
        self.id = self
    def train(self, data):
        return data
    def predict(self, data):
        return data + str(self.id)

def train_model(data):
    model = Recommender()
    model.train(data)
    return model.predict(data)

data = 'a data '
x = {}
for i in range(3):
    x[i] = train_model(data)
    print(x[i])

# a data <__main__.Recommender object at 0x11cefcd10>
# a data <__main__.Recommender object at 0x11e0471d0>
# a data <__main__.Recommender object at 0x11a064d50>
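If you also want to keep each freshly trained model around, not just its predictions, a small variation of the same idea (a sketch, reusing the Recommender and data_list from the question) is to return the model together with its output:

def train_model(data):
    model = Recommender()  # a brand new instance on every call
    model.train(data)
    return model, model.predict(data)

models = []
preds_list = []
for data_subset in data_list:
    model, preds = train_model(data_subset)
    models.append(model)
    preds_list.append(preds)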

Related

Creating custom dataset in PyTorch

Problem
In PyTorch, I am trying to write a class that could return the entire data and label separately using syntax like dataset.data and dataset.label. The code skeleton looks like:
class MyDataset(object):
    data = _get_data()
    label = _get_label()
    def __init__(self, dir, transforms):
        self.img_list = ...  # all image paths loaded from dir
        # do something
    def __getitem__(self):
        # do something
        return data, label
    def __len__(self):
        return len(self.img_list)
    def _get_data():
        # do something
    def _get_label():
        # do something
However, when I use dataset.data and dataset.label to access the corresponding variables, nothing is returned.
I am wondering why this is the case and how I can fix this.
Edit
Thank you for all of your attention.
I have solved this problem by myself. The solution is pretty straightforward: it just uses class variables.
import torch
from PIL import Image

class FaceDataset(object):
    # class variables
    data = None
    label = None

    def __init__(self, root, transforms=None):
        # read img_list from root
        self.img_list = ...
        self.transforms = ...
        FaceDataset.data = FaceDataset._get_data(self.img_list, self.transforms)
        FaceDataset.label = FaceDataset._get_label(self.img_list)

    @classmethod
    def _get_data(cls, img_list, transforms):
        data_list = []
        for img_path in img_list:
            data_list.append(transforms(Image.open(img_path)).unsqueeze(0))
        return torch.stack(data_list, dim=0)

    @classmethod
    def _get_label(cls, img_list):
        label = torch.zeros(len(img_list))
        for i, img_path in enumerate(img_list):
            label[i] = ...
        return label

    def __getitem__(self, index):
        img_path = self.img_list[index]
        label = ...
        # read image from file
        data = Image.open(img_path)
        # apply the transform defined in __init__
        data = self.transforms(data)
        return data, label

    def __len__(self):
        return len(self.img_list)
The "normal" way to create custom datasets in Python has already been answered here on SO. There happens to be an official PyTorch tutorial for this.
For a simple example, you can read the PyTorch MNIST dataset code here (this dataset is used in this PyTorch example code for further illustration). Finally, you can find other dataset implementations in this torchvision datasets list (click on the dataset name, then on the "source" button in the dataset documentation, to access the dataset's PyTorch implementation).
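For reference, the usual pattern is to subclass torch.utils.data.Dataset and load each sample lazily in __getitem__ rather than materializing everything as class variables. A minimal sketch, where the folder layout and the label rule are assumptions:

import os
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class ImageFolderDataset(Dataset):
    def __init__(self, root, transform=None):
        # assume root contains image files; labels come from some external rule
        self.img_list = sorted(
            os.path.join(root, f) for f in os.listdir(root)
            if f.endswith(('.png', '.jpg'))
        )
        self.transform = transform or transforms.ToTensor()
    def __len__(self):
        return len(self.img_list)
    def __getitem__(self, index):
        img = Image.open(self.img_list[index]).convert('RGB')
        label = 0  # placeholder: derive the real label from the path or annotations
        return self.transform(img), label

# A DataLoader then handles batching and shuffling:
# loader = DataLoader(ImageFolderDataset('data/faces'), batch_size=32, shuffle=True)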

How to use model in batch generator?

I want to use model.predict in a batch generator; what are the possible ways to achieve this?
It seems one option is to load the model on init and on epoch end:
class DataGenerator(keras.utils.Sequence):
    def __init__(self, model_name):
        # Load model
        # ...
    def on_epoch_end(self):
        # Load model
In my experience, predicting with another model while training will bring errors.
You should probably simply append your training model after your generator model.
Suppose you have:
generator_model (the one you want to use inside the generator)
training_model (the one you want to train)
Then
generatorInput = Input(shapeOfTheGeneratorInput)
generatorOutput = generator_model(generatorInput)
trainingOutput = training_model(generatorOutput)
entireModel = Model(generatorInput,trainingOutput)
Make sure that the generator model has all layers untrainable before compiling:
genModel = entireModel.layers[1]
for l in genModel.layers:
    l.trainable = False
entireModel.compile(optimizer=optimizer, loss=loss)
Now use the generator regularly.
Predicting inside the generator:
class DataGenerator(keras.utils.Sequence):
    def __init__(self, model_name, modelInputs, batchSize):
        self.genModel = load_model(model_name)
        self.inputs = modelInputs
        self.batchSize = batchSize

    def __len__(self):
        l, rem = divmod(len(self.inputs), self.batchSize)
        return l + (1 if rem > 0 else 0)

    def __getitem__(self, i):
        items = self.inputs[i*self.batchSize:(i+1)*self.batchSize]
        items = doThingsWithItems(items)
        predItems = self.genModel.predict_on_batch(items)

        # the following is the only reason not to chain models
        predItems = doMoreThingsWithItems(predItems)

        # do something to get y_train_items as well
        return predItems, y_train_items
If you do find the error I mentioned, you can sacrifice the parallel generation capabilities and do some manual loops:
for e in range(epochs):
    for i in range(batches):
        x, y = generator[i]
        model.train_on_batch(x, y)

Zipped variable losing value on 2nd iteration

I'm pretty new to Python, so I'm sure this isn't the most efficient way to code this. The problem I'm having is that I have a second for loop that runs inside another for loop. It works fine the first time, but on the second iteration the inner loop doesn't register the data and skips over it, so it never runs again. I use a zipped tuple, and it looks like it loses its value completely.
class Model:
    def predict(self, data):
        prediction = []
        distances = []
        for item in data:
            distances.clear()
            for trainedItem in self.Train_Data:
                distances.append([(abs((item[0] - trainedItem[0][3])) + abs((item[1] - trainedItem[0][1])) + abs((item[2] - trainedItem[0][2])) + abs((item[3] - trainedItem[0][3]))), trainedItem[1]])
            distances.sort()
            targetNeighbors = []
            for closest in distances[:self.K]:
                targetNeighbors.append(closest[1])
            prediction.append(Counter(targetNeighbors).most_common()[0][0])
        return prediction

class HardcodedClassifier:
    def fit(X_Train, Y_Train, k):
        Model.Train_Data = zip(X_Train, Y_Train)
        Model.K = k
        return Model
The iterator was depleted. Try Model.Train_Data = list(zip(X_Train, Y_Train)) so it can be iterated every time in the inner for loop.
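To see the underlying behaviour in isolation: zip() returns a one-shot iterator, so it is empty after the first pass, while wrapping it in list() gives a sequence that can be iterated as many times as needed:

pairs = zip([1, 2, 3], ['a', 'b', 'c'])
print(list(pairs))  # [(1, 'a'), (2, 'b'), (3, 'c')]
print(list(pairs))  # [] -- the iterator is already exhausted

pairs = list(zip([1, 2, 3], ['a', 'b', 'c']))
print(list(pairs))  # [(1, 'a'), (2, 'b'), (3, 'c')]
print(list(pairs))  # same result again -- a list can be re-iterated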
Based on what I see, you are using the Model class directly instead of instantiating a model object and accessing its data. When you declare a class, the declaration only creates a class object that acts as a constructor: calling it returns a new object of the type you defined.
class Bacon:
    tasty = True
    def __init__(self):
        self.salty = True

Bacon
>> <class '__main__.Bacon'>
Bacon.tasty
>> True
Bacon.salty
>> AttributeError: type object 'Bacon' has no attribute 'salty'
baconstrip = Bacon()
baconstrip
>> <__main__.Bacon object at #memoryaddress>
baconstrip.tasty
>> True
baconstrip.salty
>> True
The baconstrip object is of type Bacon and has its own namespace allocated to it for storing instance variables. The name Bacon refers to the class itself; you can access it like an object too, but it is not an instance of itself.
For your code:
class HardcodedClassifier:
    def __init__(self, model):  # to initialize the class, provide a model
        self.model = model
    def fit(self, X_Train, Y_Train, k):
        self.model.Train_Data = list(zip(X_Train, Y_Train))
        self.model.K = k
        # no need to return a value; the state of the object is preserved

mymodel = Model()
myclassifier = HardcodedClassifier(mymodel)

Handling tensorflow session in a class

I'm using TensorFlow to predict outputs of a neural network. I have a class where I have described the neural network, and a main file where the predictions are made and, based on the results, the weights are updated. However, the predictions seem to be really slow. Here is what my code looks like:
class NNPredictor():
    def __init__(self):
        self.input = tf.placeholder(...)
        ...
        self.output = (...)  # Neural network output
    def predict_output(self, sess, input):
        return sess.run(tf.squeeze(self.output), feed_dict={self.input: input})
Here's what the main file looks like:
sess = tf.Session()
predictor = NNPredictor()
input = # some initial value
for i in range(iter):
    output = predictor.predict_output(sess, input)
    input = # some function of output
However, if I use the following function definition in the class:
def predict_output(self):
    return self.output
And have the main file as follows:
sess = tf.Session()
predictor = NNPredictor()
input = # some initial value
output_op = predictor.predict_output()
for i in range(iter):
    output = np.squeeze(sess.run(output_op, feed_dict={predictor.input: input}))
    input = # some function of output
The code runs almost 20-30x faster. I don't seem to understand how things are working here, and I'd like to know what the best practice would be.
Part of this has to do with the cost of the attribute lookups and method calls that Python performs behind the scenes. Here's some sample code to illustrate the idea:
import time

runs = 10000000

class A:
    def __init__(self):
        self.val = 1
    def get_val(self):
        return self.val

# Using a method call to fetch the attribute
obj = A()
start = time.time()
total = 0
for i in range(runs):
    total += obj.get_val()
end = time.time()
print(end - start)

# Using the object attribute directly
start = time.time()
total = 0
for i in range(runs):
    total += obj.val
end = time.time()
print(end - start)

# Assigning to a local variable first
start = time.time()
total = 0
local_var = obj.get_val()
for i in range(runs):
    total += local_var
end = time.time()
print(end - start)
On my computer, the three loops take the following times (in seconds):
1.49576115608
0.656110048294
0.551875114441
Specific to your case, the first version calls an object method inside the loop while the second one doesn't, and over many iterations that difference adds up. Another likely factor here (judging from how TF1 graphs work) is that calling tf.squeeze(self.output) inside predict_output adds a new op to the graph on every call, so each sess.run becomes progressively more expensive; the second version builds the op once outside the loop and avoids that.
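As for best practice with this kind of class, a common pattern under the TF1 graph/session API used in the question is to build every op exactly once (for example in __init__) and only call sess.run inside the loop. A sketch, where the placeholder shape and the dense layer are stand-ins for the real network:

import numpy as np
import tensorflow as tf

class NNPredictor:
    def __init__(self):
        self.input = tf.placeholder(tf.float32, shape=[None, 10])
        self.output = tf.layers.dense(self.input, 1)  # stand-in for the real network
        # build derived ops once, not on every call
        self.squeezed_output = tf.squeeze(self.output)
    def predict_output(self, sess, x):
        # only run the pre-built op; no new graph nodes are created here
        return sess.run(self.squeezed_output, feed_dict={self.input: x})

sess = tf.Session()
predictor = NNPredictor()
sess.run(tf.global_variables_initializer())

x = np.random.rand(4, 10).astype(np.float32)
for i in range(100):
    output = predictor.predict_output(sess, x)
    x = np.random.rand(4, 10).astype(np.float32)  # in practice, some function of output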

Create classes in a loop

I want to define a class and then make a dynamic number of copies of that class.
Right now, I have this:
class xyz(object):
    def __init__(self):
        self.model_type = ensemble.RandomForestClassifier()
        self.model_types = {}
        self.model = {}
        for x in range(0, 5):
            self.model_types[x] = self.model_type
    def fit_model():
        for x in range(0, 5):
            self.model[x] = self.model_types[x].fit(data[x])
    def score_model():
        for x in range(0, 5):
            self.pred[x] = self.model[x].predict(data[x])
I want to fit 5 different models but I think Python is pointing to the same class 5 times rather than creating 5 different classes in the model dictionary.
This means that when I use the "score_model" method, it is just scoring the LAST model that was fit rather than 5 unique models.
I think that I just need to use inheritance to populate the model[] dictionary with 5 distinct classes but I'm not sure how to do that?
In your original code, you created one instance and used it five times. Instead, you want to instantiate the class each time you add an entry to the model_types dictionary, as in this code.
from sklearn import ensemble

class xyz(object):
    def __init__(self):
        self.model_type = ensemble.RandomForestClassifier
        self.model_types = {}
        self.model = {}
        self.pred = {}
        for x in range(0, 5):
            self.model_types[x] = self.model_type()  # calling the class gives a fresh instance each time
    def fit_model(self):
        for x in range(0, 5):
            self.model[x] = self.model_types[x].fit(data[x])
    def score_model(self):
        for x in range(0, 5):
            self.pred[x] = self.model[x].predict(data[x])
In Python everything is an object, so a variable can refer to a class as well, and that variable can then be called just like the class itself.
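A small standalone illustration of that point, with a toy class in place of RandomForestClassifier:

class Tally:
    def __init__(self):
        self.count = 0

cls = Tally                            # the variable now refers to the class itself
instances = [cls() for _ in range(3)]  # calling it creates a distinct object each time
print(instances[0] is instances[1])    # False: three separate instances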
