I am working on implementing an algorithm which approximates the solution of a partial differential equation. The main idea behind this is that I start at time 0 with a guess of the solution u[0], and its gradient z[0], and then use a recursive formula to approximately calculate the solution up to the last time point in a forward manner. The formula looks like this
u[i+1] = u[i] + f(t[i],x[i],u[i],z[i])*dt + z[i]*dW[i]
where the function f, the time discretization, the time step dt, and the increments of a Brownian motion dW are given. The gradient z[i] at time point i is approximated by a deep neural network with input x[i], which I have already implemented with tf.keras using two hidden dense layers. These networks perform quite well. So far, I have N (number of time points) independent neural networks, one approximating z[i] for each time point.
My task is to form a global neural network with input (x, W), where (u[0], z[0]) are given to this network as network parameters, such that this network can then optimize its parameters by minimizing the expected quadratic loss between the output/approximation of u[N] and the given terminal condition of the partial differential equation g(x). u[0] will then be the solution of the PDE. So while my neural networks approximating the gradients have 2 hidden layers each, the global network should have 2*(N-1) layers in total.
My neural networks for the gradients look like this:
import tensorflow as tf

# Input dimension
d = 1
# Output dimension
d_1 = 1
# Number of neurons
m = d + 10
# Batch size
batch_size = 32
# Training data
x_tr = some_simulation()
z_tr = calculated_given(x_tr)
# Test data
x_te = some_simulation()
z_te = calculated_given(x_te)
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(d,), dtype=tf.float32))
model.add(tf.keras.layers.Dense(m, activation=tf.nn.tanh))
model.add(tf.keras.layers.Dense(m, activation=tf.nn.tanh))
model.add(tf.keras.layers.Dense(d_1, activation=tf.keras.activations.linear))
model.compile(optimizer='adam',
              loss='MeanSquaredError',
              metrics=[])
model.fit(x_tr, z_tr, batch_size=batch_size, epochs=10)
val_loss = model.evaluate(x_te, z_te)
print(val_loss)
So I have trained N of them, and saved each as a file using
model.save(path_to_model)
So given the approximations of the gradients, I now want to stack all the subnetworks together to form a global deep neural network based on the recursive formula above, which takes only the N-dimensional vectors x and W as input data, gives the final approximation u[N] as output, and uses (u[0], z[0]) as parameters. But I have been trying for two days to wrap my head around how such a global neural network should be implemented in Python using tensorflow.keras, so maybe someone can give me a push in the right direction?
I assume that you'll pass the tuple (x, dW, t) as input to your model, since t is also indexed. Furthermore, you can always create dW from W using np.diff. I also assume that u0 and z0 are scalars (common to all batches).
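For example, if W stores the sampled Brownian path with one column per time point, the increments could be obtained like this (a small sketch under that assumption):

import numpy as np
dW = np.diff(W, axis=1)  # dW[:, i] = W[:, i+1] - W[:, i]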
With all of that in mind, you can subclass the base Model and override its call() as follows
class GlobalModel(tf.keras.models.Model):
    def __init__(self, u0, z0, dt, subnet_list, **kwargs):
        super().__init__(**kwargs)
        # (u0, z0) are the trainable parameters of the global network
        self.u0 = tf.Variable(u0, trainable=True, dtype=tf.float32)
        self.z0 = tf.Variable(z0, trainable=True, dtype=tf.float32)
        self.dt = tf.constant(dt, dtype=tf.float32)
        self.subnet_list = subnet_list
        # Freeze the pre-trained subnets
        for subnet in subnet_list:
            subnet.trainable = False

    def f(self, t, x, u, z):
        # code of your function f() goes here
        raise NotImplementedError

    def step_update(self, t, x, u, z, dW):
        # One step of the recursion: u[i+1] = u[i] + f(t[i], x[i], u[i], z[i])*dt + z[i]*dW[i]
        return u + self.f(t, x, u, z) * self.dt + z * dW

    def call(self, inputs, training=None):
        x, dW, t = inputs
        # First step, using the trainable (u0, z0)
        x_i = tf.gather(x, 0, axis=1)
        dW_i = tf.gather(dW, 0, axis=1)
        t_i = tf.gather(t, 0, axis=1)
        u_i = self.step_update(t_i, x_i, self.u0, self.z0, dW_i)
        # Subsequent steps, using the frozen subnets to predict z[i]
        for i, subnet in enumerate(self.subnet_list):
            x_i = tf.gather(x, i + 1, axis=1)
            dW_i = tf.gather(dW, i + 1, axis=1)
            t_i = tf.gather(t, i + 1, axis=1)
            z_i = subnet(x_i, training=False)
            u_i = self.step_update(t_i, x_i, u_i, z_i, dW_i)
        return u_i
You initialize this model by
global_model = GlobalModel(init_u0, init_z0, dt, subnet_list)
where subnet_list is a list of your pre-trained subnets, ordered by time index. That is, the subnet responsible for predicting z_i should be at index i-1 in this list.
After compiling, you call fit() on the model by
global_model.fit(x=(x_tr, dW_tr, t_tr), y=y_tr, batch_size=batch_size, epochs=epochs)
where y_tr is your target.
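For completeness, a minimal end-to-end sketch of assembling and training the global model (the file paths, init_u0, init_z0, and the target y_tr = g(x_N) are placeholders for your own values):

# Hypothetical paths: adjust to wherever the N-1 subnets were saved with model.save()
subnet_paths = ["path_to_model_{}".format(i) for i in range(1, N)]
subnet_list = [tf.keras.models.load_model(p) for p in subnet_paths]

global_model = GlobalModel(init_u0, init_z0, dt, subnet_list)
global_model.compile(optimizer='adam', loss='MeanSquaredError')
global_model.fit(x=(x_tr, dW_tr, t_tr), y=y_tr, batch_size=batch_size, epochs=epochs)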
I have a precipitation map time series dataset with input shape (None, seq_length=7, c=75, w=112, h=112) and output shape (None, lead_times=60, c=51, w=28, h=28). The model (Conv downsampler + ConvGRU + Axial Attention) predicts precipitation in a 28x28 region in the middle, with 51 categorical precipitation intervals, and is conditioned on 60 different lead times (5, 10, ..., 300 minutes).
Right now my forward pass looks like this:
def forward(self, imgs):
    """It takes a rank 5 tensor
    - imgs [bs, seq_len, channels, h, w]
    """
    # Compute all timesteps, probably can be parallelized
    res = []
    for i in range(self.forecast_steps):
        x_i = self.encode_timestep(imgs, i)
        out = self.head(x_i)
        res.append(out)
    res = torch.stack(res, dim=1)
    return res
Here imgs is the input tensor without lead time encoding, so only 15 channels. imgs is then one-hot encoded for each respective lead time, and the output is the entire predicted time series (5-300 min). However, this leads to severe memory issues even with batch_size = 1, so I want the forward loop to only do one random lead time at a time. I am training this with a pytorch-lightning module for easier parallelization, so I don't have much control over the training loop.
The issue is that the effective batch size with this training loop is 60*batch_size. The paper solves this by only doing one random lead time per sample, which now makes sense to me. This solves the memory issue by allowing the effective minimum batch size to be 1. How can I pass a random integer (the lead time) to the forward pass and couple it with the correct Y when pytorch-lightning computes the loss?
I want
y_hat = forward(self, X[n], lead_time=random)
...
loss(y_hat-Y[n,lead_time,:,:])
My code is available at https://github.com/ValterFallenius/metnet.
I figured out how to fix it. Once I explained the problem to someone else, I realized how simple the solution was...
def forward(self, imgs, lead_time):
    """It takes a rank 5 tensor
    - imgs [bs, seq_len, channels, h, w]
    - lead_time: random int between 0 and self.forecast_steps
    """
    # Only the requested lead time is computed, so no loop or stacking is needed
    x_i = self.encode_timestep(imgs, lead_time)
    out = self.head(x_i)
    return out
The trick was to simply add the lead_time variable to the training_step method:
def training_step(self, batch, batch_idx):
    x, y = batch
    # Draw one random lead time per training step and pick the matching target slice
    lead_time = np.random.randint(0, self.forecast_steps)
    y_hat = self(x.float(), lead_time)
    loss = F.mse_loss(y_hat, y[:, lead_time])
    pbar = {"training_loss": loss}
    return {"loss": loss, "progress_bar": pbar}
I've been searching for how to perform matrix factorization for the very simple and basic case that I will show, but didn't find anything. I only found complex and long solutions, so I will present what I want to solve:
U x V = A
I would just like to know how to solve this equation in TensorFlow 2, where A is a known sparse matrix, and U and V are two randomly initialized matrices. I would like to find U and V such that their product is approximately equal to A.
For example, having these variables:
# I use this function to build a toy dataset for the sparse matrix
def build_rating_sparse_tensor(ratings):
    indices = ratings[['U_num', 'V_num']].values
    values = ratings['rating'].values
    return tf.SparseTensor(
        indices=indices,
        values=values,
        dense_shape=[ratings.U_num.max()+1, ratings.V_num.max()+1])
# here I create what will be the matrix A
ratings = (pd.DataFrame({'U_num': list(range(0, 10_000)) * 30,
                         'V_num': list(range(0, 60_000)) * 5,
                         'rating': np.random.randint(6, size=300_000)})
           .sample(1000)
           .drop_duplicates(subset=['U_num', 'V_num'])
           .sort_values(['U_num', 'V_num'], ascending=[1, 1]))
# Variables
A = build_rating_sparse_tensor(ratings)

U = tf.Variable(tf.random.normal(
    [A.shape[0], embeddings], stddev=init_stddev))

# this matrix would be transposed in the equation
V = tf.Variable(tf.random.normal(
    [A.shape[1], embeddings], stddev=init_stddev))
# loss function
def sparse_mean_square_error(sparse_ratings, user_embeddings, movie_embeddings):
    predictions = tf.reduce_sum(
        tf.gather(user_embeddings, sparse_ratings.indices[:, 0]) *
        tf.gather(movie_embeddings, sparse_ratings.indices[:, 1]),
        axis=1)
    loss = tf.losses.mean_squared_error(sparse_ratings.values, predictions)
    return loss
Is it possible to do this with a particular loss function, optimizer and learning schedule?
Thank you very much.
A naive and straightforward approach using TensorFlow 2:
Note that rating was converted to float32. TensorFlow cannot calculate gradients over integers, see https://github.com/tensorflow/tensorflow/issues/20524.
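For example, the conversion could be done on the DataFrame before building the sparse tensor (just one possible way):

ratings['rating'] = ratings['rating'].astype(np.float32)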
A = build_rating_sparse_tensor(ratings)
indices = ratings[["U_num", "V_num"]].values

embeddings = 3000
U = tf.Variable(tf.random.normal([A.shape[0], embeddings]), dtype=tf.float32)
V = tf.Variable(tf.random.normal([embeddings, A.shape[1]]), dtype=tf.float32)

optimizer = tf.optimizers.Adam()
trainable_weights = [U, V]

for step in range(100):
    with tf.GradientTape() as tape:
        A_prime = tf.matmul(U, V)
        # indexing the result based on the indices of A that contain a value
        A_prime_sparse = tf.gather(
            tf.reshape(A_prime, [-1]),
            indices[:, 0] * tf.shape(A_prime)[1] + indices[:, 1],
        )
        loss = tf.reduce_sum(tf.metrics.mean_squared_error(A_prime_sparse, A.values))
    grads = tape.gradient(loss, trainable_weights)
    optimizer.apply_gradients(zip(grads, trainable_weights))
    if step % 20 == 0:
        print(f"Training loss at step {step}: {loss:.4f}")
We take advantage of the sparsity of A by calculating the loss only over the actual values of A. However, we still have to allocate two really big dense tensors for the trainable weights U and V. For big numbers like in your example, you will probably encounter some OOM errors.
Maybe it could be worth exploring another representation for your data.
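For instance, instead of materializing the dense product A_prime, one could gather only the embedding rows that correspond to observed entries, which is essentially the formulation of sparse_mean_square_error in your question. A rough sketch (note that here V is stored as (A.shape[1], embeddings), i.e. transposed relative to the snippet above, so the prediction for entry (i, j) is the dot product of row i of U and row j of V):

embeddings = 30  # a much smaller embedding size, for illustration
U = tf.Variable(tf.random.normal([A.shape[0], embeddings]), dtype=tf.float32)
V = tf.Variable(tf.random.normal([A.shape[1], embeddings]), dtype=tf.float32)
optimizer = tf.optimizers.Adam()

for step in range(100):
    with tf.GradientTape() as tape:
        # Only the rows needed for the observed entries of A are touched
        u_rows = tf.gather(U, A.indices[:, 0])
        v_rows = tf.gather(V, A.indices[:, 1])
        preds = tf.reduce_sum(u_rows * v_rows, axis=1)
        loss = tf.reduce_mean(tf.square(preds - A.values))
    grads = tape.gradient(loss, [U, V])
    optimizer.apply_gradients(zip(grads, [U, V]))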
I am trying to feed a pixel vector to a convolutional neural network (CNN), where the pixel vector comes from image data like the cifar-10 dataset. Before feeding the pixel vector to the CNN, I need to expand it with a Maclaurin series. The point is, I figured out how to expand a tensor with one dim, but am not able to get it right for a tensor with dim > 2. Can anyone give me ideas of how to apply the Maclaurin series of a one-dim tensor to a tensor with dim more than 1? Is there any heuristic approach to implement this in either TensorFlow or Keras? Any possible thoughts?
Maclaurin series on CNN:
I figured out a way of expanding a tensor with 1 dim using the Maclaurin series. Here is what the scratch implementation looks like:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda

def cnn_taylor(input_dim, approx_order=2):
    x = Input((input_dim,))

    def pwr(x, approx_order):
        x = x[..., None]
        x = tf.tile(x, multiples=[1, 1, approx_order + 1])
        pw = tf.range(0, approx_order + 1, dtype=tf.float32)
        x_p = tf.pow(x, pw)
        x_p = x_p[..., None]
        return x_p

    x_p = Lambda(lambda x: pwr(x, approx_order))(x)
    h = Dense(1, use_bias=False)(x_p)

    def cumu_sum(h):
        h = tf.squeeze(h, axis=-1)
        s = tf.cumsum(h, axis=-1)
        s = s[..., None]
        return s

    S = Lambda(cumu_sum)(h)
So the above implementation is a sketch of how to expand a CNN with a Taylor expansion using a 1-dim tensor. I am wondering how to do the same thing for a tensor with a multi-dim array (i.e., dim=3).
If I want to expand a CNN with an approximation order of 2 with a Taylor expansion, where the input is a pixel vector from an RGB image, how am I going to accomplish this easily in TensorFlow? Any thoughts? Thanks
If I understand correctly, each x in the provided computational graph is just a scalar (one channel of a pixel). In this case, in order to apply the transformation to each pixel, you could:
Flatten the 4D (b, h, w, c) input coming from the convolutional layer into a tensor of shape (b, h*w*c).
Apply the transformation to the resulting tensor.
Undo the reshaping to get a 4D tensor of shape (b, h, w, c) back, for which the "Taylor expansion" has been applied element-wise.
This could be achieved as follows:
shape_cnn = h.shape # Shape=(bs, h, w, c)
flat_dim = h.shape[1] * h.shape[2] * h.shape[3]
h = tf.reshape(h, (-1, flat_dim))
taylor_model = taylor_expansion_network(input_dim=flat_dim, max_pow=approx_order)
h = taylor_model(h)
h = tf.reshape(h, (-1, shape_cnn[1], shape_cnn[2], shape_cnn[3]))
NOTE: I am borrowing the function taylor_expansion_network from this answer.
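Since that answer is not reproduced here, the following is only a rough sketch of what such a taylor_expansion_network might look like, inferred from the cnn_taylor code above (powers 0..max_pow of each flattened element, one trainable weight per power shared across elements, summed by a Dense(1) layer); treat it as an assumption rather than the original function:

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.models import Model

def taylor_expansion_network(input_dim, max_pow=2):
    x = Input((input_dim,))
    # Powers x^0, ..., x^max_pow of each element: Shape=(batch_size, input_dim, max_pow+1)
    x_p = Lambda(lambda t: tf.pow(t[..., None],
                                  tf.range(0, max_pow + 1, dtype=tf.float32)))(x)
    # One trainable weight per power: weighted sum over the power axis,
    # Shape=(batch_size, input_dim, 1)
    s = Dense(1, use_bias=False)(x_p)
    # Back to Shape=(batch_size, input_dim), so the reshape in the snippet above works
    s = Lambda(lambda t: tf.squeeze(t, axis=-1))(s)
    return Model(inputs=x, outputs=s)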
UPDATE: I still don't clearly understand the end goal, but perhaps this update brings us closer to the desired output. I modified the taylor_expansion_network to apply the first part of the pipeline to RGB images of shape (width, height, nb_channels=3), returning a tensor of shape (width, height, nb_channels=3, max_pow+1):
def taylor_expansion_network_2(width, height, nb_channels=3, max_pow=2):
    input_dim = width * height * nb_channels
    x = Input((width, height, nb_channels,))
    h = tf.reshape(x, (-1, input_dim))

    # Raise input x_i to power p_i for each i in [0, max_pow].
    def raise_power(x, max_pow):
        x_ = x[..., None]  # Shape=(batch_size, input_dim, 1)
        x_ = tf.tile(x_, multiples=[1, 1, max_pow + 1])  # Shape=(batch_size, input_dim, max_pow+1)
        pows = tf.range(0, max_pow + 1, dtype=tf.float32)  # Shape=(max_pow+1,)
        x_p = tf.pow(x_, pows)  # Shape=(batch_size, input_dim, max_pow+1)
        return x_p

    h = raise_power(h, max_pow)

    # Compute s_i for each i in [0, max_pow]
    h = tf.cumsum(h, axis=-1)  # Shape=(batch_size, input_dim, max_pow+1)

    # Get the input format back
    h = tf.reshape(h, (-1, width, height, nb_channels, max_pow + 1))  # Shape=(batch_size, w, h, nb_channels, max_pow+1)

    # Return Taylor expansion model
    model = Model(inputs=x, outputs=h)
    model.summary()
    return model
In this modified model, the last step of the pipeline, namely the sum of w_i * s_i for each i, is not applied. Now, you can use the resulting tensor of shape (width, height, nb_channels=3, max_pow+1) in any way you want.
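For example, a small usage sketch (the image dimensions here are just placeholders):

import numpy as np

taylor_model = taylor_expansion_network_2(width=32, height=32, nb_channels=3, max_pow=2)
imgs = np.random.rand(8, 32, 32, 3).astype(np.float32)
out = taylor_model(imgs)
print(out.shape)  # (8, 32, 32, 3, 3): (batch, width, height, channels, max_pow+1)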
I'm trying to implement the recurrent neural network with numpy.
My current input and output designs are as follows:
x is of shape: (sequence length, batch size, input dimension)
h : (number of layers, number of directions, batch size, hidden size)
initial weight: (number of directions, 2 * hidden size, input size + hidden size)
weight: (number of layers -1, number of directions, hidden size, directions*hidden size + hidden size)
bias: (number of layers, number of directions, hidden size)
I have looked up the PyTorch RNN API as a reference (https://pytorch.org/docs/stable/nn.html?highlight=rnn#torch.nn.RNN), but have slightly changed it to include the initial weight as input. (Output shapes are supposedly the same as in PyTorch.)
While it runs, I cannot determine whether it is behaving correctly, as I am feeding it randomly generated numbers as input.
In particular, I am not so certain whether my input shapes are designed correctly.
Could any expert give me a guidance?
import numpy as np

def rnn(xs, h, w0, w=None, b=None, num_layers=2, nonlinearity='tanh', dropout=0.0, bidirectional=False, training=True):
    num_directions = 2 if bidirectional else 1
    batch_size = xs.shape[1]
    input_size = xs.shape[2]
    hidden_size = h.shape[3]
    hn = []
    y = [None] * len(xs)
    for l in range(num_layers):
        for d in range(num_directions):
            if l == 0 and d == 0:
                wi = w0[d, :hidden_size, :input_size].T
                wh = w0[d, hidden_size:, input_size:].T
                wi = np.reshape(wi, (1,) + wi.shape)
                wh = np.reshape(wh, (1,) + wh.shape)
            else:
                wi = w[max(l - 1, 0), d, :, :hidden_size].T
                wh = w[max(l - 1, 0), d, :, hidden_size:].T
            for i, x in enumerate(xs):
                if l == 0 and d == 0:
                    ht = np.tanh(np.dot(x, wi) + np.dot(h[l, d], wh) + b[l, d][np.newaxis])
                    ht = np.reshape(ht, (batch_size, hidden_size))  # otherwise, shape is (bs, 1, hs)
                else:
                    ht = np.tanh(np.dot(y[i], wi) + np.dot(h[l, d], wh) + b[l, d][np.newaxis])
                y[i] = ht
            hn.append(ht)
    y = np.asarray(y)
    y = np.reshape(y, y.shape + (1,))
    return np.asarray(y), np.asarray(hn)
Regarding the shape, it probably makes sense if that's how PyTorch does it, but the TensorFlow way is a bit more intuitive - (batch_size, seq_length, input_size) - batch_size sequences of length seq_length where each element has size input_size. Both approaches can work, so I guess it's a matter of preference.
To see whether your rnn is behaving appropriately, I'd just print the hidden state at each time step, run it on some small random data (e.g. 5 vectors, 3 elements each) and compare the results with your manual calculations.
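For instance, a minimal single-layer, single-direction check could look like this (a sketch using the standard update h_t = tanh(x_t·Wi + h_{t-1}·Wh + b); the shapes are deliberately simpler than your function signature):

import numpy as np

np.random.seed(0)
input_size, hidden_size = 3, 4
Wi = np.random.randn(input_size, hidden_size)
Wh = np.random.randn(hidden_size, hidden_size)
b = np.random.randn(hidden_size)

xs = np.random.randn(5, input_size)  # 5 time steps, 3 features each
h = np.zeros(hidden_size)
for t, x in enumerate(xs):
    h = np.tanh(x @ Wi + h @ Wh + b)
    print(f"step {t}: h = {h}")
# These printed hidden states can be compared, step by step, against a hand
# calculation or against the output of your rnn() given equivalent weights.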
Looking at your code, I'm unsure if it does what it's supposed to, but instead of doing this on your own based on an existing API, I'd recommend you read and try to replicate this awesome tutorial from wildml (in part 2 there's a pure numpy implementation).