Run multiple clones of a model in parallel

So I am trying to implement a reinforcement learning algorithm using Evolution Strategy.
The principle is to clone your original model N times (let's say 100 times), apply some noise on those 100 clones, run them, check which ones are giving the best results and use that to update the original model.
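To make the clone, perturb, evaluate, update cycle concrete, here is a minimal sketch on a toy problem, using only the standard library. It is not the worker code from this question: a flat list of floats stands in for the network weights, the hypothetical `score` function stands in for the environment reward, and `es_step` with its default hyperparameters is an assumption chosen for illustration (one common variant that standardises the rewards before averaging the noise).

```python
import random


def score(weights):
    """Toy stand-in for an episode reward: highest when every weight equals 3.0."""
    return -sum((w - 3.0) ** 2 for w in weights)


def es_step(weights, n_clones=100, sigma=0.1, lr=0.05, rng=random):
    """One Evolution Strategy update of `weights` (a flat list of floats)."""
    noises, rewards = [], []
    for _ in range(n_clones):
        # Clone the original weights and perturb the clone with Gaussian noise.
        noise = [rng.gauss(0.0, 1.0) for _ in weights]
        clone = [w + sigma * n for w, n in zip(weights, noise)]
        noises.append(noise)
        rewards.append(score(clone))

    # Standardise rewards so better-than-average clones get a positive weight.
    mean = sum(rewards) / n_clones
    std = (sum((r - mean) ** 2 for r in rewards) / n_clones) ** 0.5 or 1.0
    advantages = [(r - mean) / std for r in rewards]

    # Move the original weights along the reward-weighted average of the noise.
    return [
        w + lr * sum(a * n[i] for a, n in zip(advantages, noises)) / n_clones
        for i, w in enumerate(weights)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.0, 0.0]
    for _ in range(300):
        weights = es_step(weights, rng=rng)
    print(weights, score(weights))
```

Each clone only needs the original weights, its own noise, and a way to report its reward, which is why the evaluation loop is a natural candidate for running in parallel.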
Now I am trying to put each of these clones in a different thread and run them all in parallel.
Here is my Worker class:
class WorkerThread(Thread):
    def __init__(self, action_dim, img_dim, sigma, sess):
        Thread.__init__(self)
        #sess = tf.Session()
        self.actor = ActorNetwork(sess, action_dim, img_dim)
        self.env = Environment()
        self.reward = 0
        self.N = {}
        self.original_model = None
        self.sigma = sigma
    def setActorModel(self, model):
        self.original_model = model
    def run(self):
        k = 0
        for l in self.actor.model.layers:
            if len(np.array(l.get_weights())) > 0:
                # First generate some noise
                shape = (np.array(l.get_weights()[0])).shape
                if len(shape) == 2:
                    self.N[k] = np.random.randn(shape[0], shape[1])
                else:
                    self.N[k] = np.random.randn(shape[0], shape[1], shape[2], shape[3])
                # 2nd set weights using original model's weights and noise
                la = self.original_model.layers[k]
                self.actor.model.layers[k].set_weights((la.get_weights()[0] + self.sigma * self.N[k], la.get_weights()[1]))
            k += 1
        ob = self.env.reset()
        while True:
            action = self.actor.predict(np.reshape(ob['image'], (1, 480, 480, 3)))
            ob = self.env.step(action[0])
            if ob['done']:
                self.reward = ob['reward']
                break
So each worker thread has its own model, and when it runs I set that model's weights from the original model's weights plus noise.
At that point I get the following error:
File "/usr/local/lib/python3.6/site-packages/keras/engine/topology.py", line 1219, in set_weights
K.batch_set_value(weight_value_tuples)
File "/usr/local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2365, in batch_set_value
assign_op = x.assign(assign_placeholder)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 594, in assign
return state_ops.assign(self._variable, value, use_locking=use_locking)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 276, in assign
validate_shape=validate_shape)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 59, in assign
use_locking=use_locking, name=name)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 350, in _apply_op_helper
g = ops._get_graph_from_inputs(_Flatten(keywords.values()))
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 5055, in _get_graph_from_inputs
_assert_same_graph(original_graph_element, graph_element)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4991, in _assert_same_graph
original_item))
ValueError: Tensor("Placeholder:0", shape=(5, 5, 3, 24), dtype=float32) must be from the same graph as Tensor("conv2d_11/kernel:0", shape=(5, 5, 3, 24), dtype=float32_ref).
In the above code sample I use the same TensorFlow session in all the threads. I tried creating a different session for each thread, but I get the same error.
I have little knowledge of TensorFlow. Does anyone know how to fix this?

You need to use the same graph in all threads. Create a tf.Graph() in your main thread and wrap your per-thread function in "with my_graph.as_default():".
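One way to see why each worker needs its own `with my_graph.as_default():` is that TensorFlow documents the default graph as a property of the current thread, so a default set in the main thread is not inherited by new threads. The following is a toy model of that behaviour in plain Python, not TensorFlow code: `as_default`, `get_default`, `run_in_thread`, and the string "graphs" are hypothetical stand-ins invented for this sketch.

```python
import threading
from contextlib import contextmanager

_state = threading.local()        # each thread sees its own `stack` attribute
GLOBAL_DEFAULT = "global-graph"   # fallback when a thread has pushed nothing


@contextmanager
def as_default(graph):
    """Make `graph` the default for the current thread only."""
    stack = getattr(_state, "stack", None)
    if stack is None:
        stack = _state.stack = []
    stack.append(graph)
    try:
        yield graph
    finally:
        stack.pop()


def get_default():
    """Return the innermost default of this thread, or the global fallback."""
    stack = getattr(_state, "stack", None)
    return stack[-1] if stack else GLOBAL_DEFAULT


def run_in_thread(fn):
    """Run fn in a new thread and return its result."""
    result = []
    thread = threading.Thread(target=lambda: result.append(fn()))
    thread.start()
    thread.join()
    return result[0]


def worker_without_scope():
    # Does not re-enter the graph context, so it sees the fallback default.
    return get_default()


def make_worker_with_scope(graph):
    def worker():
        # Mirrors wrapping the per-thread function in `with my_graph.as_default():`.
        with as_default(graph):
            return get_default()
    return worker


if __name__ == "__main__":
    with as_default("my_graph"):
        print("main thread:        ", get_default())
        print("worker, no scope:   ", run_in_thread(worker_without_scope))
        print("worker, with scope: ", run_in_thread(make_worker_with_scope("my_graph")))
```

In the real code the equivalent step would be passing the graph object created in the main thread to each `WorkerThread` and entering its context at the top of `run`.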

Related

Why does shap Explainer give KeyError: 'class0'?

I have a PyTorch multiclass text classifier built on an XLM-R based architecture. For IP reasons I can't share the architecture code, but I have tried to include as much detail as I can; please point out if more information is needed. The model outputs 28 classes, from 'class0' to 'class27', with probability scores that add up to 1.
I am trying to use shap package to explain the results. I have wrapped my model into huggingface's custom pipeline object and I get the following output for 1 input text:
pipe = CustomPipeline(model = model, tokenizer = base_tokenizer)
output = pipe(list_of_inputs) # list_of_inputs = ['this is test input']
Output:
[[{"label": "class0","score": 0.01500235591083765},{"label": "class1","score": 0.001698049483820796},{"label": "class2","score": 0.0019644589629024267},{"label": "class3","score": 0.0004418794414959848},{"label": "class4","score": 5.9095666074426845e-05},{"label": "class5","score": 0.0007908751722425222},{"label": "class6","score": 0.002379569923505187},{"label": "class7","score": 0.0035733324475586414},{"label": "class8","score": 0.0014360857894644141},{"label": "class9","score": 0.0007365105557255447},{"label": "class10","score": 0.0014471099711954594},{"label": "class11","score": 0.0011013210751116276},{"label": "class12","score": 0.0010048456024378538},{"label": "class13","score": 0.000885132874827832},{"label": "class14","score": 0.0022015925496816635},{"label": "class15","score": 0.0013197452062740922},{"label": "class16","score": 0.0037292027845978737},{"label": "class17","score": 0.004212632775306702},{"label": "class18","score": 0.9481304287910461},{"label": "class19","score": 0.001469381619244814},{"label": "class20","score": 0.0009713817853480577},{"label": "class21","score": 0.0018773127812892199},{"label": "class22","score": 0.0009251375449821353},{"label": "class23","score": 0.0007248060428537428},{"label": "class24","score": 0.00031718137324787676},{"label": "class25","score": 0.0011144360760226846},{"label": "class26","score": 0.0002294857840752229},{"label": "class27","score": 0.00025681318948045373}]]
The output is in same format as the notebook specified by shap package.
Now, when I try to use shap Explainer:
pipe = UDTMPipeline(model = model, tokenizer = base_tokenizer)
explainer = shap.Explainer(pipe)
shap_values = explainer(list_of_inputs)
shap.plots.text(shap_values)
Error:
File "C:\pipeline.py", line 104, in udtm_xai
shap_values = explainer(query_data)
File "C:\Users\Miniconda3\envs\dtlr_udtm\lib\site-packages\shap\explainers\_partition.py", line 136, in __call__
return super().__call__(
File "C:\Users\Miniconda3\envs\dtlr_udtm\lib\site-packages\shap\explainers\_explainer.py", line 266, in __call__
row_result = self.explain_row(
File "C:\Users\Miniconda3\envs\dtlr_udtm\lib\site-packages\shap\explainers\_partition.py", line 161, in explain_row
self._curr_base_value = fm(m00.reshape(1, -1), zero_index=0)[0] # the zero index param tells the masked model what the baseline is
File "C:\Users\Miniconda3\envs\dtlr_udtm\lib\site-packages\shap\utils\_masked_model.py", line 67, in __call__
return self._full_masking_call(masks, batch_size=batch_size)
File "C:\Users\Miniconda3\envs\dtlr_udtm\lib\site-packages\shap\utils\_masked_model.py", line 144, in _full_masking_call
outputs = self.model(*joined_masked_inputs)
File "C:\Users\Miniconda3\envs\dtlr_udtm\lib\site-packages\shap\models\_transformers_pipeline.py", line 35, in __call__
output[i, self.label2id[obj["label"]]] = sp.special.logit(obj["score"]) if self.rescale_to_logits else obj["score"]
KeyError: 'class0'
The code is unable to find 'class0'. In the postprocess function of pipeline class, I read a file containing label mappings, obtain the softmax scores from _forward function of pipeline class and create a dictionary in the final format to send as output:
class CustomPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        self.mapping_json = json.loads(open("mapping_file.json", "r", encoding = "utf-8").read().strip())
        return {}, {}, {}
    def preprocess(self, inputs):
        inputs_df = pd.DataFrame([inputs], columns = ["Query"])
        inference_dataloader = getInferenceDataloader(inputs_df, self.tokenizer, batch_size = 16)
        return inference_dataloader
    def softmax_with_temp(self, input, t):
        ex = torch.exp(input/t)
        sum = torch.sum(ex,0)
        return ex / sum
    def _forward(self, model_inputs):
        if torch.cuda.is_available():
            device = torch.device('cuda')
        else:
            device = torch.device('cpu')
        final_pred_labels = []
        final_scores = []
        batch_count = 0
        self.model.eval()
        for batch in model_inputs:
            batch_count += 1
            b_input_ids = batch[0].to(device)
            b_input_mask = batch[1].to(device)
            b_input_task = torch.full((b_input_ids.shape[0],), -1, dtype=torch.int32).to(device)
            with torch.no_grad():
                result = self.model((b_input_ids, b_input_mask, b_input_task))
            logits = result
            logits = logits.detach().cpu().numpy()
            pred_softmax_t = self.softmax_with_temp(torch.from_numpy(logits[0]), 2).numpy()
        return pred_softmax_t
    def postprocess(self, model_outputs):
        output_list = [{"label":"class0", "score":float(model_outputs[0])}]
        index = 1
        for label in self.mapping_json.keys(): #self.mapping_json contains label names that has been read from a file
            output_list.append({"label":label, "score":float(model_outputs[index])}) #model_outputs is a list of 28 floating scores
            index += 1
        return output_list
Am I missing some label-related variable that I need to define, and is that why I am getting the KeyError for 'class0'?
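The last frame of the traceback indexes a `label2id` dictionary with every label the pipeline returns, so the KeyError means that 'class0' is not a key of that dictionary. The sketch below reproduces only that lookup in plain Python. The helper `scores_to_row` and both example mappings are made up for illustration, and it is an assumption (not confirmed anywhere above) that shap takes `label2id` from the model's configuration rather than from the pipeline output.

```python
def scores_to_row(pipeline_output, label2id):
    """Place each score at the column given by label2id, as the failing line does."""
    row = [0.0] * len(label2id)
    for obj in pipeline_output:
        row[label2id[obj["label"]]] = obj["score"]
    return row


output = [{"label": "class0", "score": 0.2}, {"label": "class1", "score": 0.8}]

# Hypothetical mapping that does not know the custom label names.
default_mapping = {"LABEL_0": 0, "LABEL_1": 1}
try:
    scores_to_row(output, default_mapping)
except KeyError as err:
    print("KeyError:", err)

# A mapping whose keys match the labels emitted by postprocess works.
matching_mapping = {"class0": 0, "class1": 1}
print(scores_to_row(output, matching_mapping))
```

If that assumption holds, comparing the keys of the model's label mapping with the labels produced in `postprocess` would be a reasonable first check.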

Pytorch Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)

I am a beginner in machine learning and am trying to train a model to count how many numbers are below 0.5 in a 1D vector of length 10. The input vectors contain numbers between 0 and 1. I generate the input data and the labels in my script instead of keeping them in a separate file, because the data is so simple.
This is the code:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.lin1 = nn.Linear(10,10)
        self.lin2 = nn.Linear(10,1)
    def forward(self,x):
        x = self.lin1(x)
        x = F.relu(x)
        x = self.lin2(x)
        return x
net = MyNet()
net.to(device)
def train():
    criterion = nn.MSELoss()
    optimizer = optim.SGD(net.parameters(), lr=0.1)
    for epochs in range(100):
        target = 0
        data = torch.rand(10)
        for entry in data:
            if entry < 0.5:
                target += 1
        # print(target)
        # print(data)
        data = data.to(device)
        out = net(data)
        # print(out)
        target = torch.Tensor(target)
        target = target.to(device)
        loss = criterion(out, target)
        print(loss)
        net.zero_grad()
        loss.backward()
        optimizer.step()
def test():
    acc_error = 0
    for i in range(100):
        test_data = torch.rand(10)
        test_data.to(device)
        test_target = 0
        for entry in test_data:
            if entry < 0.5:
                test_target += 1
        out = net(test_data)
        error = test_target - out
        if error < 0:
            error *= -1
        acc_error += error
    overall_error = acc_error / 100
    print(overall_error)
train()
test()
This is the error:
Traceback (most recent call last):
File "test1.py", line 70, in <module>
test()
File "test1.py", line 59, in test
out = net(test_data)
File "/vol/fob-vol7/mi18/radtklau/SP/sem_project/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "test1.py", line 15, in forward
x = self.lin1(x)
File "/vol/fob-vol7/mi18/radtklau/SP/sem_project/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/vol/fob-vol7/mi18/radtklau/SP/sem_project/lib64/python3.6/site-packages/torch/nn/modules/linear.py", line 94, in forward
return F.linear(input, self.weight, self.bias)
File "/vol/fob-vol7/mi18/radtklau/SP/sem_project/lib64/python3.6/site-packages/torch/nn/functional.py", line 1753, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)
The other posts regarding the topic have not solved my problem. Maybe somebody can help. Thanks!

Notice how your error message traces back to test, while train works fine.
You've transferred your data to the device correctly in train:
data = data.to(device)
But not in test:
test_data.to(device)
torch.Tensor.to does not modify the tensor in place; it returns a tensor on the target device, so the result has to be assigned back to test_data:
test_data = test_data.to(device)

Error: _pickle.PicklingError: Can't pickle <function <lambda> at 0x0000002F2175B048>: attribute lookup <lambda> on __main__ failed

I am trying to run the following code, which other users report runs fine, but I get this error.
# -*- coding: utf-8 -*-
# Import the Stuff
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils import data
from torch.utils.data import DataLoader
import torchvision.transforms as transforms
import cv2
import numpy as np
import csv
# Step1: Read from the log file
samples = []
with open('data/driving_log.csv') as csvfile:
    reader = csv.reader(csvfile)
    next(reader, None)
    for line in reader:
        samples.append(line)
# Step2: Divide the data into training set and validation set
train_len = int(0.8*len(samples))
valid_len = len(samples) - train_len
train_samples, validation_samples = data.random_split(samples, lengths=[train_len, valid_len])
# Step3a: Define the augmentation, transformation processes, parameters and dataset for dataloader
def augment(imgName, angle):
    name = 'data/IMG/' + imgName.split('/')[-1]
    current_image = cv2.imread(name)
    current_image = current_image[65:-25, :, :]
    if np.random.rand() < 0.5:
        current_image = cv2.flip(current_image, 1)
        angle = angle * -1.0
    return current_image, angle
class Dataset(data.Dataset):
    def __init__(self, samples, transform=None):
        self.samples = samples
        self.transform = transform
    def __getitem__(self, index):
        batch_samples = self.samples[index]
        steering_angle = float(batch_samples[3])
        center_img, steering_angle_center = augment(batch_samples[0], steering_angle)
        left_img, steering_angle_left = augment(batch_samples[1], steering_angle + 0.4)
        right_img, steering_angle_right = augment(batch_samples[2], steering_angle - 0.4)
        center_img = self.transform(center_img)
        left_img = self.transform(left_img)
        right_img = self.transform(right_img)
        return (center_img, steering_angle_center), (left_img, steering_angle_left), (right_img, steering_angle_right)
    def __len__(self):
        return len(self.samples)
# Step3b: Creating generator using the dataloader to parallelize the process
transformations = transforms.Compose([transforms.Lambda(lambda x: (x / 255.0) - 0.5)])
params = {'batch_size': 32,
          'shuffle': True,
          'num_workers': 4}
training_set = Dataset(train_samples, transformations)
training_generator = data.DataLoader(training_set, **params)
validation_set = Dataset(validation_samples, transformations)
validation_generator = data.DataLoader(validation_set, **params)
# Step4: Define the network
class NetworkDense(nn.Module):
    def __init__(self):
        super(NetworkDense, self).__init__()
        self.conv_layers = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2),
            nn.ELU(),
            nn.Conv2d(24, 36, 5, stride=2),
            nn.ELU(),
            nn.Conv2d(36, 48, 5, stride=2),
            nn.ELU(),
            nn.Conv2d(48, 64, 3),
            nn.ELU(),
            nn.Conv2d(64, 64, 3),
            nn.Dropout(0.25)
        )
        self.linear_layers = nn.Sequential(
            nn.Linear(in_features=64 * 2 * 33, out_features=100),
            nn.ELU(),
            nn.Linear(in_features=100, out_features=50),
            nn.ELU(),
            nn.Linear(in_features=50, out_features=10),
            nn.Linear(in_features=10, out_features=1)
        )
    def forward(self, input):
        input = input.view(input.size(0), 3, 70, 320)
        output = self.conv_layers(input)
        output = output.view(output.size(0), -1)
        output = self.linear_layers(output)
        return output
class NetworkLight(nn.Module):
    def __init__(self):
        super(NetworkLight, self).__init__()
        self.conv_layers = nn.Sequential(
            nn.Conv2d(3, 24, 3, stride=2),
            nn.ELU(),
            nn.Conv2d(24, 48, 3, stride=2),
            nn.MaxPool2d(4, stride=4),
            nn.Dropout(p=0.25)
        )
        self.linear_layers = nn.Sequential(
            nn.Linear(in_features=48*4*19, out_features=50),
            nn.ELU(),
            nn.Linear(in_features=50, out_features=10),
            nn.Linear(in_features=10, out_features=1)
        )
    def forward(self, input):
        input = input.view(input.size(0), 3, 70, 320)
        output = self.conv_layers(input)
        output = output.view(output.size(0), -1)
        output = self.linear_layers(output)
        return output
# Step5: Define optimizer
model = NetworkLight()
optimizer = optim.Adam(model.parameters(), lr=0.0001)
criterion = nn.MSELoss()
# Step6: Check the device and define function to move tensors to that device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('device is: ', device)
def toDevice(datas, device):
    imgs, angles = datas
    return imgs.float().to(device), angles.float().to(device)
# Step7: Train and validate network based on maximum epochs defined
max_epochs = 22
for epoch in range(max_epochs):
    model.to(device)
    # Training
    train_loss = 0
    model.train()
    for local_batch, (centers, lefts, rights) in enumerate(training_generator):
        # Transfer to GPU
        centers, lefts, rights = toDevice(centers, device), toDevice(lefts, device), toDevice(rights, device)
        # Model computations
        optimizer.zero_grad()
        datas = [centers, lefts, rights]
        for data in datas:
            imgs, angles = data
            # print("training image: ", imgs.shape)
            outputs = model(imgs)
            loss = criterion(outputs, angles.unsqueeze(1))
            loss.backward()
            optimizer.step()
            train_loss += loss.data[0].item()
        if local_batch % 100 == 0:
            print('Loss: %.3f '
                  % (train_loss/(local_batch+1)))
    # Validation
    model.eval()
    valid_loss = 0
    with torch.set_grad_enabled(False):
        for local_batch, (centers, lefts, rights) in enumerate(validation_generator):
            # Transfer to GPU
            centers, lefts, rights = toDevice(centers, device), toDevice(lefts, device), toDevice(rights, device)
            # Model computations
            optimizer.zero_grad()
            datas = [centers, lefts, rights]
            for data in datas:
                imgs, angles = data
                # print("Validation image: ", imgs.shape)
                outputs = model(imgs)
                loss = criterion(outputs, angles.unsqueeze(1))
                valid_loss += loss.data[0].item()
            if local_batch % 100 == 0:
                print('Valid Loss: %.3f '
                      % (valid_loss/(local_batch+1)))
# Step8: Define state and save the model wrt to state
state = {
    'model': model.module if device == 'cuda' else model,
}
torch.save(state, 'model.h5')
This is the error message:
"D:\VICO\Back up\venv\Scripts\python.exe" "D:/VICO/Back up/venv/Scripts/self_driving_car.py"
device is: cpu
Traceback (most recent call last):
File "D:/VICO/Back up/venv/Scripts/self_driving_car.py", line 163, in <module>
for local_batch, (centers, lefts, rights) in enumerate(training_generator):
File "D:\VICO\Back up\venv\lib\site-packages\torch\utils\data\dataloader.py", line 291, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "D:\VICO\Back up\venv\lib\site-packages\torch\utils\data\dataloader.py", line 737, in __init__
w.start()
File "C:\Users\isonata\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\isonata\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\isonata\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\isonata\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\isonata\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x0000002F2175B048>: attribute lookup <lambda> on __main__ failed
Process finished with exit code 1
I am not sure what the next step is to resolve the problem.

pickle doesn't pickle function objects themselves. It stores a reference and expects to find the function again by importing its module and looking up its name. Lambdas are anonymous functions (no name), so that lookup fails. The solution is to define a named function at module level. The only lambda I found in your code is
transformations = transforms.Compose([transforms.Lambda(lambda x: (x / 255.0) - 0.5)])
Assuming that's the troublesome function, you can replace it with
def _my_normalization(x):
    return x/255.0 - 0.5
transformations = transforms.Compose([transforms.Lambda(_my_normalization)])
You may have other problems because it looks like you are doing work at module level. If this is a multiprocessing thing and you are running on Windows, the new process will import the file and run all of that module-level code again. This isn't a problem on Linux/Mac, where a forked process already has the modules loaded from the parent.
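The difference between the lambda and a named module-level function can be checked with the standard library alone, without a DataLoader. The sketch below is only an illustration: `scale_pixels` and `can_pickle` are names invented here, and the `if __name__ == "__main__":` guard is shown as one usual way to keep module-level work out of spawned worker processes.

```python
import pickle


def scale_pixels(x):
    """Module-level replacement for `lambda x: (x / 255.0) - 0.5`."""
    return x / 255.0 - 0.5


def can_pickle(obj):
    """Return True if pickle can serialise obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    # Code under this guard does not run again when a spawned worker process
    # imports the file, unlike code written directly at module level.
    print("lambda picklable:        ", can_pickle(lambda x: x / 255.0 - 0.5))
    print("named function picklable:", can_pickle(scale_pixels))
```

Running a check like this on the transform before handing it to a DataLoader with `num_workers > 0` would surface the problem without starting any worker processes.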

How to train data in NiftyNet

I'm trying to train a network using NiftyNet with my own data (CT images and their corresponding labels). I wrote a short Net class modeled on another training setup with similar sample data, following all the NiftyNet documentation I could find and adjusting the parameters to my own data. But I keep getting this error:
"TypeError: __init__() got an unexpected keyword argument 'w_initializer'".
I've tried every change I could think of in my config.ini, Net class, etc., but I can neither make it work nor find the reason. Can anyone help with this error? Or maybe share some guidelines for training my own network from scratch, so that I can at least start an alternative from zero and see whether I find a way out?
Training command:
! net_segment train -c /home/niftynet/extensions/dense_vnet_TC/config.ini --name dense_vnet_TC.net_TC.MyNet
Some values in config.ini:
[NETWORK]
name = dense_vnet
batch_size = 6
volume_padding_size = 0
window_sampling = resize
[TRAINING]
sample_per_volume = 1
lr = 0.001
loss_type = dense_vnet_TC.dice_hinge.dice
starting_iter = 0
save_every_n = 1000
max_iter = 3001
[INFERENCE]
border = (0, 0, 0)
inference_iter = 3000
output_interp_order = 0
spatial_window_size = (512, 512, 40)
save_seg_dir = ./segmentation_output/
############################ Custom configuration
[SEGMENTATION]
image = ct
label = label
label_normalisation = False
output_prob = False
num_classes = 2
Basics of Net class:
from niftynet.network.base_net import BaseNet
class MyNet(BaseNet):
    def __init__(self, num_classes, name='MyNet'):
        super(MyNet, self).__init__(num_classes=num_classes, acti_func=acti_func, name=name)
        # network specific property
        self.hidden_features = 10
    def layer_op(self, images, is_training):
        # create layer instances
        conv_1 = ConvolutionalLayer(self.hidden_features, kernel_size=3, name='conv_input')
        conv_2 = ConvolutionalLayer(self.num_classes, kernel_size=1, acti_func=None, name='conv_output')
        # apply layer instances
        flow = conv_1(images, is_training)
        flow = conv_2(flow, is_training)
        return flow
End of output, after doing some of the processing as expected:
Traceback (most recent call last):
  File "/home/niftynet/bin/net_segment", line 10, in <module>
    sys.exit(main())
  File "/home/niftynet/lib/python3.6/site-packages/niftynet/__init__.py", line 142, in main
    app_driver.run(app_driver.app)
  File "/home/niftynet/lib/python3.6/site-packages/niftynet/engine/application_driver.py", line 189, in run
    is_training_action=self.is_training_action)
  File "/home/niftynet/lib/python3.6/site-packages/niftynet/engine/application_driver.py", line 258, in create_graph
    application.initialise_network()
  File "/home/niftynet/lib/python3.6/site-packages/niftynet/application/segmentation_application.py", line 280, in initialise_network
    acti_func=self.net_param.activation_function)
TypeError: __init__() got an unexpected keyword argument 'w_initializer'

I think you need to change this line (based on a similar problem I had):
super(MyNet, self).__init__(num_classes=num_classes, acti_func=acti_func, name=name)
to this (just add w_regularizer):
super(MyNet, self).__init__(num_classes=num_classes, w_regularizer=w_regularizer, acti_func=acti_func, name=name)
If that is not enough, also add it to the constructor signature:
def __init__(self, num_classes, w_regularizer=None, name='MyNet'):
Since your error names w_initializer, that argument probably needs the same treatment.
I hope it helps.
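The general mechanism behind this kind of TypeError can be reproduced without NiftyNet. In the sketch below every class and function is hypothetical (`BaseNetSketch` is not NiftyNet's `BaseNet`, and `build` only imitates a framework that constructs the network with keyword arguments such as the `w_initializer` named in the traceback); it shows that a subclass constructor has to accept each keyword the caller passes before it can forward it.

```python
class BaseNetSketch:
    """Hypothetical stand-in for a base network class (not NiftyNet's BaseNet)."""

    def __init__(self, num_classes=0, w_initializer=None, w_regularizer=None,
                 acti_func="relu", name="net"):
        self.num_classes = num_classes
        self.w_initializer = w_initializer
        self.w_regularizer = w_regularizer
        self.acti_func = acti_func
        self.name = name


class NarrowNet(BaseNetSketch):
    # Accepts only num_classes and name, so any other keyword raises TypeError.
    def __init__(self, num_classes, name="NarrowNet"):
        super().__init__(num_classes=num_classes, name=name)


class ForwardingNet(BaseNetSketch):
    # Accepts the extra keywords and passes them on to the base class.
    def __init__(self, num_classes, w_initializer=None, w_regularizer=None,
                 acti_func="relu", name="ForwardingNet"):
        super().__init__(num_classes=num_classes, w_initializer=w_initializer,
                         w_regularizer=w_regularizer, acti_func=acti_func,
                         name=name)


def build(net_class):
    """Imitate a framework that constructs the network with keyword arguments."""
    return net_class(num_classes=2, w_initializer="he_normal",
                     w_regularizer=None, acti_func="relu")


if __name__ == "__main__":
    try:
        build(NarrowNet)
    except TypeError as err:
        print("NarrowNet failed:", err)
    net = build(ForwardingNet)
    print("ForwardingNet built with", net.w_initializer)
```

Which keywords the real application passes is best read from the call shown in the last frame of the traceback; the list used in `build` here is only a guess for demonstration.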

Use two different LSTM cell in Tensorflow

I am building a neural machine translator, and I have to use two different LSTM cells (one for the encoder and one for the decoder).
The two cells have different shapes:
the encoder (first one) is fed the tokens of the input sentence and produces a state vector
the decoder (second one) is fed the previous state vector and the tokens it has generated itself
I wrote this in Tensorflow, and when I run the script I get the following error (raised during the decoder phase):
outputs, states = tf.nn.rnn(cell_backward, inputs, initial_state=initial_state)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 158, in rnn
(output, state) = call_cell()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 145, in <lambda>
call_cell = lambda: cell(input_, state)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell.py", line 520, in __call__
dtype, self._num_unit_shards)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell.py", line 357, in _get_concat_variable
sharded_variable = _get_sharded_variable(name, shape, dtype, num_shards)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell.py", line 387, in _get_sharded_variable
dtype=dtype))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 732, in get_variable
partitioner=partitioner, validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 596, in get_variable
partitioner=partitioner, validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 161, in get_variable
caching_device=caching_device, validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 437, in _get_single_variable
name, "".join(traceback.format_list(tb))))
ValueError: Variable backward/RNN/LSTMCell/W_0 already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
File "/home/alexis/Documents/NMT/NMT.py", line 88, in dense_to_vector_state
outputs, states = tf.nn.rnn(cell_forward, inputs, initial_state=initial_state)
How can I explicitly specify that I want to create a totally new LSTM cell?
Thanks in advance!
Alexis

Use variable scopes:
with tf.variable_scope('enc'):
    cell_enc = LSTMCell(hidden_size)
with tf.variable_scope('dec'):
    cell_dec = LSTMCell(hidden_size)

I am trying to do machine translation. Here is my encoder and decoder. You just need to use different variable scopes for each rnn. Rather than using the MultiRNNCell cell for the encoder I unroll each layer manually which lets me alternate directions between layers. See how each layer gets its own scope.
with tf.variable_scope('encoder'):
    rnn_cell = tf.nn.rnn_cell.LSTMCell(512, num_proj = 256, state_is_tuple = True)
    for level in range(3):
        with tf.variable_scope('level_%d' % level) as scope:
            state = [tf.zeros((BATCH_SIZE, sz)) for sz in rnn_cell.state_size]
            for t in range(TIME_STEPS) if level % 2 else reversed(range(TIME_STEPS)):
                y[t], state = rnn_cell(y[t], state)
                scope.reuse_variables()
with tf.variable_scope('decoder') as scope:
    rnn_cell = tf.nn.rnn_cell.MultiRNNCell \
    ([
        tf.nn.rnn_cell.LSTMCell(512, num_proj = 256, state_is_tuple = True),
        tf.nn.rnn_cell.LSTMCell(512, num_proj = WORD_VEC_SIZE, state_is_tuple = True)
    ], state_is_tuple = True)
    state = [[tf.zeros((BATCH_SIZE, sz)) for sz in sz_outer] for sz_outer in rnn_cell.state_size]
    W_soft = tf.get_variable('W_soft', shape = (NWORDS, WORD_VEC_SIZE), initializer = tf.truncated_normal_initializer(0.0, 1 / np.sqrt(WORD_VEC_SIZE)))
    b_soft = tf.get_variable('b_soft', shape = (NWORDS,), initializer = tf.truncated_normal_initializer(0.0, 0.01))
    cost = 0
    output = [None] * TIME_STEPS
    for t in range(TIME_STEPS):
        if t:
            last = y_[t - 1] if TRAINING else y[t - 1]
        else:
            last = tf.zeros((BATCH_SIZE, WORD_VEC_SIZE))
        y[t] = tf.concat(1, (y[t], last))
        y[t], state = rnn_cell(y[t], state)
        cost += tf.reduce_mean(tf.nn.sampled_softmax_loss(W_soft, b_soft, y[t], target_output[:, t : t + 1], 1000, NWORDS))
        output[t] = tf.reshape(tf.nn.softmax(tf.matmul(y[t], W_soft, transpose_b = True) + b_soft), (BATCH_SIZE, 1, NWORDS))
        scope.reuse_variables()
    output = tf.concat(1, output)
    cost /= TIME_STEPS

We Keep Coding
