I currently have a neural network module:
import torch.nn as nn

class NN(nn.Module):
    def __init__(self, args, lambda_f, nn1, loss, opt):
        super().__init__()
        self.args = args
        self.lambda_f = lambda_f
        self.nn1 = nn1
        self.loss = loss
        self.opt = opt
        # more nn.Params stuff etc...

    def forward(self, x):
        # some code using fields
        return out
I am trying to checkpoint it, but because PyTorch saves using state_dicts, I can't save the lambda functions I was actually using if I checkpoint with the standard torch.save. I literally want to save everything without issue and re-load it to train on GPUs later. I am currently using this:
def save_ckpt(path_to_ckpt):
    from pathlib import Path
    import dill as pickle
    ## Make dir. Throw no exceptions if it already exists
    path_to_ckpt.mkdir(parents=True, exist_ok=True)
    ckpt_path_plus_path = path_to_ckpt / Path('db')
    ## Pickle args
    db = {}  # crazy_mdl is the NN module instance from above
    db['crazy_mdl'] = crazy_mdl
    with open(ckpt_path_plus_path, 'ab') as db_file:
        pickle.dump(db, db_file)
Currently it throws no errors when I checkpoint it, and it saves successfully.
I am worried that when I train it there might be a subtle bug, even if no exceptions/errors are raised, or that something unexpected might happen (e.g. weird saving on disks in the clusters etc., who knows).
Is this safe to do with PyTorch classes/nn models? Especially if we want to resume training with GPUs?
Cross posted:
How does one pickle arbitrary pytorch models that use lambda functions?
https://discuss.pytorch.org/t/how-does-one-pickle-arbitrary-pytorch-models-that-use-lambda-functions/79026
https://www.reddit.com/r/pytorch/comments/gagpjg/how_does_one_pickle_arbitrary_pytorch_models_that/?
https://www.quora.com/unanswered/How-does-one-pickle-arbitrary-PyTorch-models-that-use-lambda-functions
I'm the dill author. I use dill (and klepto) to save classes that contain trained ANNs inside lambda functions. I tend to use combinations of mystic and sklearn, so I can't speak directly to PyTorch, but I assume it works the same.

The place where you have to be careful is if you have a lambda that contains a pointer to an object external to the lambda... so for example y = 4; f = lambda x: x+y. This might seem obvious, but dill will pickle the lambda, and, depending on the rest of the code and the serialization variant, may not serialize the value of y. So I've seen many cases where people serialize a trained estimator inside some function (or lambda, or class) and then the results aren't "correct" when they restore the function from serialization. The overarching cause is that the function wasn't encapsulated so that all objects required for it to yield the correct results would be stored in the pickle. However, even in that case you can get the "correct" results back; you'd just need to recreate the same environment you had when you pickled the estimator (i.e. all the same values it depends on in the surrounding namespace).

The takeaway should be: try to make sure that all variables used in the function are defined within the function. Here's a portion of a class I've recently started to use myself (it should be in the next release of mystic):
import numpy as np  # needed when the lambda below is actually called

class Estimator(object):
    "a container for a trained estimator and transform (not a pipeline)"
    def __init__(self, estimator, transform):
        """a container for a trained estimator and transform

        Input:
            estimator: a fitted sklearn estimator
            transform: a fitted sklearn transform
        """
        self.estimator = estimator
        self.transform = transform
        self.function = lambda *x: float(self.estimator.predict(self.transform.transform(np.array(x).reshape(1, -1))).reshape(-1))

    def __call__(self, *x):
        "f(*x) for x of xtest and predict on fitted estimator(transform(xtest))"
        import numpy as np
        return self.function(*x)
Note that when the function is called, everything it uses (including np) is defined in the surrounding namespace. As long as PyTorch estimators serialize as expected (without external references), you should be fine if you follow the above guidelines.
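As a small, hypothetical illustration of the external-reference pitfall described above (the variable y and the file name are made up for illustration; actual behavior depends on your dill settings):

import dill

y = 4
f = lambda x: x + y   # y lives outside the lambda

with open('f.pkl', 'wb') as fh:
    dill.dump(f, fh)

# In a fresh interpreter, whether the loaded f(1) returns 5 depends on whether
# the value of y was captured; encapsulating it removes the ambiguity:
g = lambda x, y=4: x + y   # everything g needs is defined within g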
Yes, I think it is safe to use dill to pickle lambda functions etc. I have been using torch.save with dill to save the state dict and have had no problems resuming training on GPU as well as CPU, unless the model class was changed. Even if the model class was changed (adding/deleting some parameters), I could load the state dict, modify it, and load it into the model.
Also, usually, people don't save the model objects but only the state dicts, i.e. parameter values, to resume training, along with the hyperparameters/model arguments needed to get the same model object later.
Saving the model object can sometimes be problematic, as changes to the model class (code) can make the saved object useless. If you don't plan on changing your model class/code at all, and hence the model object won't change, then saving objects can work well, but generally it is not recommended to pickle a module object.
This is not a good idea. If you do this, then if your code moves to a different GitHub repo it will be hard to restore models that took a lot of time to train. The cycles spent recovering those, or retraining, are not worth it. I recommend instead doing it the PyTorch way and only saving the weights, as they recommend in the PyTorch docs.
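A minimal sketch of that recommended pattern, assuming the NN module, args, lambdas, and an optimizer from the question's setup exist in scope (the file name is a placeholder):

import torch

# save: weights + optimizer state + plain-data hyperparameters needed to rebuild the model
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'hparams': vars(args),   # plain dict; no lambdas or module objects in here
}, 'checkpoint.pt')

# load: rebuild the model (and its lambdas) from code, then restore the tensors
ckpt = torch.load('checkpoint.pt', map_location='cpu')
model = NN(args, lambda_f, nn1, loss, opt)   # lambdas come from your code, not from the pickle
model.load_state_dict(ckpt['model_state_dict'])
optimizer.load_state_dict(ckpt['optimizer_state_dict'])
model.to('cuda')   # then resume training on the GPU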
Related
How can one prune the weights of a CNN (convolutional neural network) model that are less than a threshold value (say, prune all weights that are <= 1)?
How can we achieve that for a weight file saved in .pth format in PyTorch?
PyTorch since 1.4.0 provides model pruning out of the box; see the official tutorial.
As there is currently no threshold-based pruning method in PyTorch, you have to implement it yourself, though it's fairly easy once you get the overall idea.
Threshold Pruning method
Below is the code performing the pruning:
import torch
from torch.nn.utils import prune

class ThresholdPruning(prune.BasePruningMethod):
    PRUNING_TYPE = "unstructured"

    def __init__(self, threshold):
        self.threshold = threshold

    def compute_mask(self, tensor, default_mask):
        return torch.abs(tensor) > self.threshold
Explanation:
PRUNING_TYPE can be one of global, structured, unstructured. global acts across the whole module (e.g. remove 20% of the weights with the smallest values), structured acts on whole channels/modules. We need unstructured, as we would like to modify each connection in a specific parameter tensor (say weight or bias).
__init__ - pass here whatever you want or need to make it work, normal stuff.
compute_mask - the mask to be used to prune the specific tensor. In our case all parameters below the threshold should be zeroed. I did it with the absolute value as it makes more sense. default_mask is not needed here, but is left as a named parameter as that's what the API currently requires.
Moreover, inheriting from prune.BasePruningMethod defines methods to apply the mask to each parameter, make pruning permanent, etc. See the base class docs for more info.
Example module
Nothing too fancy, you can put anything you want here:
class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.first = torch.nn.Linear(50, 30)
        self.second = torch.nn.Linear(30, 10)

    def forward(self, inputs):
        return self.second(torch.relu(self.first(inputs)))

module = MyModule()
You can also load your module via module = torch.load('checkpoint.pth') if you need to; it doesn't matter here.
Prune module's parameters
We should define which parameter of our module (and whether it's weight or bias) should be pruned, like this:
parameters_to_prune = ((module.first, "weight"), (module.second, "weight"))
Now we can globally apply our unstructured pruning to all the defined parameters (threshold is passed as a kwarg to ThresholdPruning's __init__):
prune.global_unstructured(
    parameters_to_prune, pruning_method=ThresholdPruning, threshold=0.1
)
Results
weight attribute
To see the effect, check the weights of the first submodule simply with:
print(module.first.weight)
It is the weight with our pruning technique applied, but please notice it's not a torch.nn.Parameter anymore! It is now simply an attribute of our module, hence it won't take part in training or evaluation as a parameter on its own.
weight_mask
We can check the created mask via print(module.first.weight_mask) to see that everything was done correctly (it will be binary in this case).
weight_orig
Applying pruning creates a new torch.nn.Parameter holding the original weights, named name + _orig, in this case weight_orig; let's see:
print(module.first.weight_orig)
This is the parameter that will actually be used during training and evaluation! After applying pruning via the methods described above, forward_pre_hooks are added which recompute weight from weight_orig and weight_mask before each forward call.
Thanks to this approach you can define and apply your pruning at any point of training or inference without "destroying" the original weights.
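As a quick sanity check (assuming the module pruned above), the effective weight is simply the original weights with the mask applied:

import torch

# weight is recomputed from weight_orig and weight_mask by the forward_pre_hook
assert torch.allclose(
    module.first.weight, module.first.weight_orig * module.first.weight_mask
)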
Applying pruning permanently
If you wish to apply the pruning permanently, simply issue:
prune.remove(module.first, "weight")
And now our module.first.weight is once again a parameter, with its entries appropriately pruned; module.first.weight_mask is removed and so is module.first.weight_orig. This is probably what you are after.
You can iterate over children to make it permanent:
for child in module.children():
    prune.remove(child, "weight")
You could define parameters_to_prune using the same logic:
parameters_to_prune = [(child, "weight") for child in module.children()]
Or if you want only convolution layers to be pruned (or anything else really):
parameters_to_prune = [
    (child, "weight")
    for child in module.children()
    if isinstance(child, torch.nn.Conv2d)
]
Advantages
uses "PyTorch way of pruning" so it's easier to communicate your intent to other programmers
define pruning on a per-tensor basis, single responsibility instead of going through everything
confine to predefined ways
pruning is not permanent hence you can recover from it if needed. Module can be saved with pruning masks and original weights so it leaves you some space to revert eventual mistake (e.g. threshold was too high and now all your weights are zero rendering results meaningless)
works with original weights during forward calls unless you want to finally change to pruned version (simple call to remove)
Disadvantages
IMO the pruning API could be clearer
you can do it in less code (as provided by Shai, below)
might be confusing for those who do not know such functionality is provided by PyTorch (still, there are tutorials and docs, so I don't think it's a major problem)
You can work directly on the values saved in the state_dict:
import torch

thr = 1.0  # the pruning threshold from the question; set to whatever value you need

sd = torch.load('saved_weights.pth')  # load the state dict
for k in sd.keys():
    if not 'weight' in k:
        continue  # skip biases and other saved parameters
    w = sd[k]
    sd[k] = w * (w > thr)  # set to zero weights smaller than thr
torch.save(sd, 'pruned_weights.pth')
Is it possible to access a fasttext model (gensim) using multithreading?
Currently, I'm trying to load a model once (due to its size and loading time), so it stays in memory, and then access its similarity functions many thousands of times in a row. I want to do that in parallel, and my current approach uses a wrapper class that loads the model and is then passed to the workers. But it looks like it does not return any results.
The wrapper class. Initiated once.
from os import path

from gensim.models.fasttext import load_facebook_model

class FastTextLocalModel:
    def __init__(self):
        self.model_name = "cc.de.300.bin"
        self.model_path = path.join("data", "models", self.model_name)
        self.fast_text = None

    def load_model(self):
        self.fast_text = load_facebook_model(self.model_path)

    def similarity(self, word1: str = None, word2: str = None):
        return self.fast_text.wv.similarity(word1, word2)
And the Processor class makes use of the FastTextLocalModel methods above:
import concurrent.futures
import multiprocessing

fast_text_instance = FastTextLocalModel()
fast_text_instance.load_model()

with concurrent.futures.ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
    docs = corpus.get_documents()  # docs is iterable
    processor = ProcessorClass(model=fast_text_instance)
    executor.map(processor.process, docs)
Using max_workers=1 seems to work.
I have to mention that I have no expertise in python multithreading.
There may be useful ideas for you in this prior answer, which may need adaptation for FastText & latest versions of gensim:
https://stackoverflow.com/a/43067907/130288
(The keys are...
even redundantly loading in different processes may not use redundant memory, if the key memory-consuming arrays are mmapped and thus automatically shared at the OS level; and
you have to do a little extra trickery to prevent the usual recalculation of the normed vectors (after load and before the first similarity ops), which would destroy the sharing
...but messiness in the FastText code might make these a bit harder there.)
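A rough sketch of the memory-mapping idea from that answer, assuming a recent gensim and that re-saving the model in gensim's native format is acceptable (file names and example words are placeholders; the normed-vector caveat above may still need handling):

from gensim.models.fasttext import load_facebook_model, FastText

# one-time conversion: Facebook .bin -> gensim native format (large arrays go to separate .npy files)
model = load_facebook_model("cc.de.300.bin")
model.save("cc.de.300.gensim")

# in each worker process: load with mmap so the large arrays are shared read-only at the OS level
shared = FastText.load("cc.de.300.gensim", mmap="r")
print(shared.wv.similarity("hund", "katze"))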
I'm trying to implement an INN (invertible neural network) with the structure as described in this paper.
I was wondering if it is possible to create a block (as proposed in the paper) as a custom keras layer with two different call functions.
The basic structure would look as follows:
import tensorflow as tf
import tensorflow.keras.layers as layers

class INNBlock(tf.keras.Model):
    # inheriting from Model instead of keras.layers.Layer, because I want to manage
    # the underlying layers as well
    def __init__(self, size):
        super(INNBlock, self).__init__(name='innblock')
        # define layers
        self.denseL1 = layers.Dense(size, activation='relu')

    def call(self, inputs):
        # define the relationship between the layers for a forward call
        out = self.denseL1(inputs)
        return out

    def inverse_call(self, inputs):
        # define the inverse relationship between the layers
        out = -self.denseL1(inputs)  # use the same weights as the forward call
        return out

class INN(tf.keras.Model):
    def __init__(self, size, input_dim, min_clip, max_clip):
        super(INN, self).__init__()
        self.block_1 = INNBlock(size)
        self.block_2 = INNBlock(size)

    def call(self, inputs):
        y = self.block_1(inputs)
        x = self.block_2.inverse_call(y)
        x = self.block_1.inverse_call(x)
        return (y, x)
Solutions I have already thought of (but don't particularly like):
Creating new layers for the inverse call and giving them the same weights as the layers in the forward call.
Adding another dimension to the inputs and having a variable in there that determines whether the inverse call or the forward call is to be executed (but I don't know if this would even be allowed by Keras).
I hope someone knows, if there is a way to implement this.
Thank you in advance :)
There is nothing wrong with your code. You can try it and it will run normally.
The call method is the standard method for when you simply do model_instance(input_tensor) or layer_instance(input_tensor).
But there is nothing wrong if you define another method and use that method inside the model's call method. What will happen is just:
If you use the_block(input_tensor), it will use the_block.call(input_tensor).
If you use the_block.inverse_call(input_tensor) somewhere outside a layer/model, it will fail to build a Keras model (nothing can be outside a layer)
If you use the_block.inverse_call(input_tensor) inside a layer/model (that's what you're doing), it is exactly the same as just writing the operations directly. You just wrapped it inside another function.
For Keras/Tensorflow, there will be nothing special about inverse_call. You can use it anywhere you could use any other keras/tensorflow function.
Will the gradients be updated twice?
Not exactly twice, but the operation will certainly be counted in. When the system calculates the gradient of the loss with respect to the weights, if the loss was built with inverse_call along the way, then it will participate in the gradient calculation.
But the update will be once per batch, as usual.
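As a quick, hypothetical check that inverse_call does participate in the gradient (using the INNBlock class from the question; the sizes and loss are made up):

import tensorflow as tf

block = INNBlock(size=8)
x = tf.random.normal((4, 8))

with tf.GradientTape() as tape:
    y = block(x)                # standard call -> call()
    z = block.inverse_call(y)   # same weights, just another method
    loss = tf.reduce_mean(tf.square(z - x))

# gradients flow through both call() and inverse_call()
grads = tape.gradient(loss, block.trainable_variables)
print([g.shape for g in grads])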
In PyTorch, in the forward function of a model:
class my_model(torch.nn.Module):
    ......
    def forward(self):
        self.a = torch.zeros(1)
        # blabla
After model.cuda(), why is self.a still a CPU tensor?
This is so by design.
Only the tensors which are a part of the model will move with model.cuda() or model.to("cuda").
These tensors are registered with register_parameter or register_buffer. This also includes child modules, and the parameters and buffers registered within them.
Even though self.a = torch.zeros(1) is set as an attribute of the class instance, by design it will not be moved to CUDA; instead you would need to do self.a = self.a.to("cuda") yourself if you haven't used the register_* methods.
One solution:
device = whatever_parameter_in_your_forward_function_args.device
self.a = torch.zeros(1).to(device)
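Alternatively, a minimal sketch of the register_buffer approach mentioned above (the buffer name a and the model class are just for illustration):

import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # registered buffers move with .cuda()/.to() and are saved in the state_dict
        self.register_buffer("a", torch.zeros(1))

    def forward(self, x):
        return x + self.a  # self.a is on the same device as the model

model = MyModel().cuda()   # requires a CUDA-capable GPU
print(model.a.device)      # cuda:0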
One gets an already built TensorFlow dataset object (tf.data.Dataset) named data.
Is there a way to know whether repeat/batch/shuffle was called on this object by inspecting data? (And possibly get other information, like the arguments of batch and repeat.)
(I assume eager execution.)
edit 1: it seems like the str method carries some information. Looking into that.
edit 2: the output_shapes attribute gives information on the batch size and shapes.
The only solution I could think of is digging into the TensorFlow code. gen_dataset_ops.py is generated during building from source, so it can only be found locally.
Another file is dataset_ops.py; it's available at the link below. You just insert a print statement before the relevant function's return. For example, the shuffle function from dataset_ops.py:
def shuffle(self, buffer_size, seed=None, reshuffle_each_iteration=None):
    """Randomly shuffles the elements of this dataset.
    ...
    """
    print('Dataset shuffled')  # inserted print here
    return ShuffleDataset(self, buffer_size, seed, reshuffle_each_iteration)
The Dataset object is wrapped into a DatasetV1Adapter, so you can't know anything about it in advance. The only difference in eager mode is that it supports explicit iteration, but it would be extremely inefficient to do something like:
import numpy as np
import tensorflow as tf

array = np.random.rand(10)
dataset = tf.data.Dataset.from_tensor_slices(array)
if len([i for i in dataset]) != array.shape[0]:
    print('repeated')
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/data/ops/dataset_ops.py
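For completeness, a small illustration (assuming TF 2.x eager mode) of the kind of information the question's edits mention, i.e. what the string representation and element spec reveal; the exact repr text varies between TF versions:

import tensorflow as tf

ds = tf.data.Dataset.range(10).shuffle(5).batch(2).repeat(3)

print(ds)               # the repr names only the outermost transformation (here a repeat dataset)
print(ds.element_spec)  # TensorSpec(shape=(None,), dtype=tf.int64, ...) -> batching shows up in the shape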