What I am trying to achieve now is to create a custom loss function in Keras that takes in two tensors (y_true, y_pred) with shapes (None, None, None) and (None, None, 3), respectively. However, the None's are so, that the two shapes are always equal for every (y_true, y_pred). From these tensors I want to produce two distance matrices that contain the squared distances between every possible point pair (the third, length 3 dimension contains x, y, and z spatial values) inside them and then return the difference between these distance matrices. The first code I tried was this:
def distanceMatrixLoss1(y_true, y_pred):
distMatrix1 = [[K.sum(K.square(y_true[i] - y_true[j])) for j in range(i + 1, y_true.shape[1])] for j in range(y_true.shape[1])]
distMatrix2 = [[K.sum(K.square(y_pred[i] - y_pred[j])) for j in range(i + 1, y_pred.shape[1])] for j in range(y_pred.shape[1])]
return K.mean(K.square(K.flatten(distMatrix1) - K.flatten(distMatrix2)))
(K is the TensorFlow backend.) Needless to say, I got the following error:
'NoneType' object cannot be interpreted as an integer
This is understandable, since range(None) does not make a lot of sense and y_true.shape[0] or y_pred.shape[0] is None. I searched whether others got somehow the same problem or not and I found that I could use the scan function of TensorFlow:
def distanceMatrixLoss2(y_true, y_pred):
subtractYfromXi = lambda x, y: tf.scan(lambda xi: K.sum(K.square(xi - y)), x)
distMatrix = lambda x, y: K.flatten(tf.scan(lambda yi: subtractYfromXi(x, yi), y))
distMatrix1 = distMatrix(y_true, y_true)
distMatrix2 = distMatrix(y_pred, y_pred)
return K.mean(K.square(distMatrix1-distMatrix2))
What I got from this is a different error, that I do not fully understand.
TypeError: <lambda>() takes 1 positional argument but 2 were given
So this went into the trash too. My last try was using the backend's map_fn function:
def distanceMatrixLoss3(y_true, y_pred):
subtractYfromXi = lambda x, y: K.map_fn(lambda xi: K.sum(K.square(xi - y)), x)
distMatrix = lambda x, y: K.flatten(K.map_fn(lambda yi: subtractYfromXi(x, yi), y))
distMatrix1 = distMatrix(y_true, y_true)
distMatrix2 = distMatrix(y_pred, y_pred)
return K.mean(K.square(distMatrix1-distMatrix2))
This did not throw an error, but when the training started the loss was constant 0 and stayed that way. So now I am out of ideas and I kindly ask you to help me untangle this problem. I have already tried to do the same in Mathematica and also failed (here is the link to the corresponding question, if it helps).
Assuming that dimension 0 is the batch size as usual and you don't want to mix samples.
Assuming that dimension 1 is the one you want to make pairs
Assuming that the last dimension is 3 for all cases although your model returns None.
Iterating tensors is a bad idea. It might be better just to make a 2D matrix from the original 1D, though having repeated values.
def distanceMatrix(true, pred): #shapes (None1, None2, 3)
#------ creating the distance matrices 1D to 2D -- all vs all
true1 = K.expand_dims(true, axis=1) #shapes (None1, 1, None2, 3)
pred1 = K.expand_dims(pred, axis=1)
true2 = K.expand_dims(true, axis=2) #shapes (None1, None2, 1, 3)
pred2 = K.expand_dims(pred, axis=2)
trueMatrix = true1 - true2 #shapes (None1, None2, None2, 3)
predMatrix = pred1 - pred2
#--------- euclidean x, y, z distance
#maybe needs a sqrt?
trueMatrix = K.sum(K.square(trueMatrix), axis=-1) #shapes (None1, None2, None2)
predMatrix = K.sum(K.square(predMatrix), axis=-1)
#-------- loss for each pair
loss = K.square(trueMatrix - predMatrix) #shape (None1, None2, None2)
#----------compensate the duplicated non-diagonals
diagonal = K.eye(K.shape(true)[1]) #shape (None2, None2)
#if Keras complains because the input is a tensor, use `tf.eye`
diagonal = K.expand_dims(diagonal, axis=0) #shape (1, None2, None2)
diagonal = 0.5 + (diagonal / 2.)
loss = loss * diagonal
#--------------
return K.mean(loss, axis =[1,2]) #or just K.mean(loss)
Related
In TensorFlow, I intend to manipulate tensor with Taylor series of sin(x) with certain approximation terms. To do so, I have tried to manipulate the grayscale image (shape of (32,32)) with Taylor series of sin(x) and it works fine. Now I have trouble manipulating the same things that worked for a grayscale image with the shape of (32,32) to RGB image with the shape of (32,32,3), and it doesn't give me the correct array. Intuitively, I am trying to manipulate tensor with Taylor's expansion of sin(x). Can anyone show me the possible way of doing this in tensorflow? Any idea?
my attempt:
here is taylor expansion of sin(x) at x=0: 1- x + x**2/2 - x**3/6 with three expansion term.
from tensorflow.keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
x= X_train[1,:,:,1]
k= 3
func = 'sin(x)'
new_x = np.zeros((x.shape[0], x.shape[1]*k))
new_x = new_x.astype('float32')
nn = 0
for i in range(x.shape[1]):
col_d = x[:,i].ravel()
new_x[:,nn] = col_d
if n_terms > 0:
for j in range(1,k):
if func == 'cos(x)':
new_x[:,nn+j] = new_x[:,nn+j-1]
I think I could do this more efficiently with TensorFlow but that's not quite intuitive for me how to do it. Can anyone suggest a possible workaround to make this work? Any thought?
update:
In 2dim array col_d = x[:,i].ravel() is pixel vector which flattened 2 dim array. Similarly, we could reshape 3dim array to 2 dim by this way: x.transpose(0,1,2).reshape(x.shape[1],-1) in for loop, so it could be x[:,i].transpose(0,1,2).reshape(x.shape[1],-1), but this is still not correct. I think tensorflow might have better way of doing this. How can we manipulate the tensor with taylor series of sin(x) more efficiently? Any thoughts?
goal:
Intuitively, in Taylor series of sin(x), x is tensor, and if we want only 2, 3 approximation terms of Taylor series of sin(x) for each tensor, I want to concatenate them in new tensor. How should we do it efficiently in TensorFlow? Any thoughts?
new_x = np.zeros((x.shape[0], x.shape[1]*n_terms))
This line has no meaning, why allocating space for 96 elements for 3 taylor expansion terms.
(new_x[:, 3:] == 0.0).all() = True # check
For pixelwise taylor expansion with n-terms
def sin_exp_step(x, i):
c1 = 2 * i + 1
c2 = (-1) ** i / np.math.factorial(c1)
t = c2 * (x ** c1)
return t
# validate
x = 45.0
x = (np.pi / 180.0) * x
y = np.sin(x)
approx_y = 0
for i in range(n_terms):
approx_y += sin_exp_step(x, i)
abs(approx_y - y) < 1e-8
x= X_train[1,:,:,:]
n_terms = 3
func = 'sin(x)'
new_x = np.zeros((*x.shape, n_terms))
for i in range(0, n_terms):
if func == 'sin(x)': # sin(x)
new_x[..., i] += sin_exp_step(x, i)
Commonly numerical approximation methods are being avoided, as they are computationally expensive (i.e. factorial) and less stable, so gradient based optimization usually is the best, for a higher order derivatives algorithms such BFGS and LBFGS used to approximate hessian matrix (2nd order derivative). Optimizers such Adam & SGD are sufficient and comes with much less computational consumption. Using neural network, we might be able to find a much better expansions.
Tensorflow solution for n-terms expansion
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import Input, LocallyConnected2D
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = tf.constant(x_train, dtype=tf.float32)
x_test = tf.constant(x_test, dtype=tf.float32)
def expansion_approx_of(func):
def reconstruction_loss(y_true, y_pred):
loss = (y_pred - func(y_true)) ** 2
loss = 0.5 * K.mean(loss)
return loss
return reconstruction_loss
class Expansion2D(LocallyConnected2D): # n-terms expansion layer
def __init__(self, i_shape, n_terms, kernel_size=(1, 1), *args, **kwargs):
if len(i_shape) != 3:
raise ValueError('...')
self.i_shape = i_shape
self.n_terms = n_terms
filters = self.n_terms * self.i_shape[-1]
super(Expansion2D, self).__init__(filters=filters, kernel_size=kernel_size,
use_bias=False, *args, **kwargs)
def call(self, inputs):
shape = (-1, self.i_shape[0], self.i_shape[1], self.i_shape[-1], self.n_terms)
out = super().call(inputs)
expansion = tf.reshape(out, shape)
out = tf.math.reduce_sum(expansion, axis=-1)
return out, expansion
inputs = Input(shape=(32, 32, 3))
# expansion: might be a taylor expansion or something better.
out, expansion = Expansion2D(i_shape=(32, 32, 3), n_terms=3)(inputs)
model = Model(inputs, [out, expansion])
opt = tf.keras.optimizers.Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999)
loss = expansion_approx_of(K.sin)
model.compile(optimizer=opt, loss=[loss])
model.summary()
model.fit(x_train, x_train, batch_size=1563, epochs=100)
x_pred, x_exp = model.predict_on_batch(x_test[:32])
print((x_exp[0].sum(axis=-1) == x_pred[0]).all())
err = abs(x_pred - np.sin(x_test[0])).mean()
print(err)
Put three expansion terms into a tensor at axis=1
x = tf.ones([8, 32, 32, 3], tf.float32) * 0.5 # example batchsize=8, imageshape=[32, 32, 3]
x = tf.stack([x, - (1/6) * tf.math.pow(x, 3), (1/120) * tf.math.pow(x, 5)], axis=1) # expansion of three terms of sin(x), [8, 3, 32, 32, 3]
If you would go with tf.keras Functional API or Sequential API, you might make a Keras custom layer
tf.math.pow
tf.stack
Edit: In the first answer, I recommended tf.keras.layers.Lambda, but it might not work with tf.math.pow or tf.stack (I haven't tried). You would go with Keras custom layer.
I think you can do this for 1D tensor as:
def expend_func(x):
p1 = x
p2 = x - ((x**2)/2)
new_x = K.concatenate([p1, p2], axis=1)
return new_x
note that x is your 1D tensor, new_x with two terms. If you need new_x with three terms, you might modify expend_funcs with three terms. for 2D tensor, you should use tf.stack() which is not the elegant way but that might help.
I saw this question: Implementing custom loss function in keras with condition And I need to do the same thing but with code that seems to need loops.
I have a custom numpy function which calculates the mean Euclid distance from the mean vector. I wrote this based on the paper https://arxiv.org/pdf/1801.05365.pdf:
import numpy as np
def mean_euclid_distance_from_mean_vector(n_vectors):
dists = []
for (i, v) in enumerate(n_vectors):
n_vectors_rest = n_vectors[np.arange(len(n_vectors)) != i]
print("rest of vectors: ")
print(n_vectors_rest)
# calculate mean vector
mean_rest = n_vectors_rest.mean(axis=0)
print("mean rest vector")
print(mean_rest)
dist = v - mean_rest
print("dist vector")
print(dist)
dists.append(dist)
# dists is now a matrix of distance vectors (distance from the mean vector)
dists = np.array(dists)
print("distance vector matrix")
print(dists)
# here we matmult each vector
# sum them up
# and divide by the total number of elements
result = np.sum([np.matmul(d, d) for d in dists]) / dists.size
return result
features = np.array([
[1,2,3,4],
[4,3,2,1]
])
c = mean_euclid_distance_from_mean_vector(features)
print(c)
I need this function however to work inside tensorflow with Keras. So a custom lambda https://www.tensorflow.org/api_docs/python/tf/keras/layers/Lambda
However, I'm not sure how to implement the above in Keras/Tensorflow since it has loops, and the way the paper talked about calculating the m_i seems to require loops like the way I implemented the above.
For reference, the PyTorch version of this code is here: https://github.com/PramuPerera/DeepOneClass
Given a feature map like:
features = np.array([
[1, 2, 3, 4],
[2, 4, 4, 3],
[3, 2, 1, 4],
], dtype=np.float64)
reflecting a batch_size of
batch_size = features.shape[0]
and
k = features.shape[1]
One has that implementing the above Formulas in Tensorflow could be expressed (prototyped) by:
dim = (batch_size, features.shape[1])
def zero(i):
arr = np.ones(dim)
arr[i] = 0
return arr
mapper = [zero(i) for i in range(batch_size)]
elems = (features, mapper)
m = (1 / (batch_size - 1)) * tf.map_fn(lambda x: tf.math.reduce_sum(x[0] * x[1], axis=0), elems, dtype=tf.float64)
pairs = tf.map_fn(lambda x: tf.concat(x, axis=0) , tf.stack([features, m], 1), dtype=tf.float64)
compactness_loss = (1 / (batch_size * k)) * tf.map_fn(lambda x: tf.math.reduce_euclidean_norm(x), pairs, dtype=tf.float64)
with tf.Session() as sess:
print("loss value output is: ", compactness_loss.eval())
Which yields:
loss value output is: [0.64549722 0.79056942 0.64549722]
However a single measure is required for the batch, therefore it is necessary to reduce it; by the summation of all values.
The wanted Compactness Loss function à la Tensorflow is:
def compactness_loss(actual, features):
features = Flatten()(features)
k = 7 * 7 * 512
dim = (batch_size, k)
def zero(i):
z = tf.zeros((1, dim[1]), dtype=tf.dtypes.float32)
o = tf.ones((1, dim[1]), dtype=tf.dtypes.float32)
arr = []
for k in range(dim[0]):
arr.append(o if k != i else z)
res = tf.concat(arr, axis=0)
return res
masks = [zero(i) for i in range(batch_size)]
m = (1 / (batch_size - 1)) * tf.map_fn(
# row-wise summation
lambda mask: tf.math.reduce_sum(features * mask, axis=0),
masks,
dtype=tf.float32,
)
dists = features - m
sqrd_dists = tf.pow(dists, 2)
red_dists = tf.math.reduce_sum(sqrd_dists, axis=1)
compact_loss = (1 / (batch_size * k)) * tf.math.reduce_sum(red_dists)
return compact_loss
Of course the Flatten() could be moved back into the model for convenience and the k could be derived directly from the feature map; this answers your question. You may just have some trouble finding out the the expected values for the model are - feature maps from the VGG16 (or any other architechture) trained against the imagenet for instance?
The paper says:
In our formulation (shown in Figure 2 (e)), starting froma pre-trained deep model, we freeze initial features (gs) and learn (gl) and (hc). Based on the output of the classification sub-network (hc), two losses compactness loss and descriptiveness loss are evaluated. These two losses, introduced in the subsequent sections, are used to assess the quality of the learned deep feature. We use the provided one-class dataset to calculate the compactness loss. An external multi-class reference dataset is used to evaluate the descriptiveness loss.As shown in Figure 3, weights of gl and hc are learned in the proposed method through back-propagation from the composite loss. Once training is converged, system shown in setup in Figure 2(d) is used to perform classification where the resulting model is used as the pre-trained model.
then looking at the "Framework" backbone here plus:
AlexNet Binary and VGG16 Binary (Baseline). A binary CNN is trained by having ImageNet samples and one-class image samples as the two classes using AlexNet andVGG16 architectures, respectively. Testing is performed using k-nearest neighbor, One-class SVM [43], Isolation Forest [3]and Gaussian Mixture Model [3] classifiers.
Makes me wonder whether it would not be reasonable to add suggested the dense layers to both the Secondary and the Reference Networks to a single class output (Sigmoid) or even and binary class output (using Softmax) and using the mean_squared_error as the so called Compactness Loss and binary_cross_entropy as the Descriptveness Loss.
I am trying to feed the pixel vector to the convolutional neural network (CNN), where the pixel vector came from image data like cifar-10 dataset. Before feeding the pixel vector to CNN, I need to expand the pixel vector with maclaurin series. The point is, I figured out how to expand tensor with one dim, but not able to get it right for tensor with dim >2. Can anyone one give me ideas of how to apply maclaurin series of one dim tensor to tensor dim more than 1? is there any heuristics approach to implement this either in TensorFlow or Keras? any possible thought?
maclaurin series on CNN:
I figured out way of expanding tensor with 1 dim using maclaurin series. Here is how to scratch implementation looks like:
def cnn_taylor(input_dim, approx_order=2):
x = Input((input_dim,))
def pwr(x, approx_order):
x = x[..., None]
x = tf.tile(x, multiples=[1, 1, approx_order + 1])
pw = tf.range(0, approx_order + 1, dtype=tf.float32)
x_p = tf.pow(x, pw)
x_p = x_p[..., None]
return x_p
x_p = Lambda(lambda x: pwr(x, approx_order))(x)
h = Dense(1, use_bias=False)(x_p)
def cumu_sum(h):
h = tf.squeeze(h, axis=-1)
s = tf.cumsum(h, axis=-1)
s = s[..., None]
return s
S = Lambda(cumu_sum)(h)
so above implementation is sketch coding attempt on how to expand CNN with Taylor expansion by using 1 dim tensor. I am wondering how to do same thing to tensor with multi dim array (i.e, dim=3).
If I want to expand CNN with an approximation order of 2 with Taylor expansion where input is a pixel vector from RGB image, how am I going to accomplish this easily in TensorFlow? any thought? Thanks
If I understand correctly, each x in the provided computational graph is just a scalar (one channel of a pixel). In this case, in order to apply the transformation to each pixel, you could:
Flatten the 4D (b, h, w, c) input coming from the convolutional layer into a tensor of shape (b, h*w*c).
Apply the transformation to the resulting tensor.
Undo the reshaping to get a 4D tensor of shape (b, h, w, c)` back for which the "Taylor expansion" has been applied element-wise.
This could be achieved as follows:
shape_cnn = h.shape # Shape=(bs, h, w, c)
flat_dim = h.shape[1] * h.shape[2] * h.shape[3]
h = tf.reshape(h, (-1, flat_dim))
taylor_model = taylor_expansion_network(input_dim=flat_dim, max_pow=approx_order)
h = taylor_model(h)
h = tf.reshape(h, (-1, shape_cnn[1], shape_cnn[2], shape_cnn[3]))
NOTE: I am borrowing the function taylor_expansion_network from this answer.
UPDATE: I still don't clearly understand the end goal, but perhaps this update brings us closer to the desired output. I modified the taylor_expansion_network to apply the first part of the pipeline to RGB images of shape (width, height, nb_channels=3), returning a tensor of shape (width, height, nb_channels=3, max_pow+1):
def taylor_expansion_network_2(width, height, nb_channels=3, max_pow=2):
input_dim = width * height * nb_channels
x = Input((width, height, nb_channels,))
h = tf.reshape(x, (-1, input_dim))
# Raise input x_i to power p_i for each i in [0, max_pow].
def raise_power(x, max_pow):
x_ = x[..., None] # Shape=(batch_size, input_dim, 1)
x_ = tf.tile(x_, multiples=[1, 1, max_pow + 1]) # Shape=(batch_size, input_dim, max_pow+1)
pows = tf.range(0, max_pow + 1, dtype=tf.float32) # Shape=(max_pow+1,)
x_p = tf.pow(x_, pows) # Shape=(batch_size, input_dim, max_pow+1)
return x_p
h = raise_power(h, max_pow)
# Compute s_i for each i in [0, max_pow]
h = tf.cumsum(h, axis=-1) # Shape=(batch_size, input_dim, max_pow+1)
# Get the input format back
h = tf.reshape(h, (-1, width, height, nb_channels, max_pow+1)) # Shape=(batch_size, w, h, nb_channels, max_pow+1)
# Return Taylor expansion model
model = Model(inputs=x, outputs=h)
model.summary()
return model
In this modified model, the last step of the pipeline, namely the sum of w_i * s_i for each i, is not applied. Now, you can use the resulting tensor of shape (width, height, nb_channels=3, max_pow+1) in any way you want.
I am writing a custom loss function for semi supervised learning on cifar-10 dataset, for which I need to duplicate columns of my tensor for creating a sort of mask which I then multiply with the activation values to later sum over.
My loss function is a sum of entropy and cross entropy for unlabelled and labeled samples. I add an extra class and set it to 1 for unlabelled samples.
I then create a mask for identifying row indices of unlabelled samples from the y_true tensor. From that I should get a (n_samples, 1) tensor which I need to repeat/duplicate/copy to a (n_samples, 11) tensor that I can multiply with the activation values in y_pred
Loss function code:
a = np.ones((mini_batch_size, 1)) * 10
a_var = K.variable(value=a)
v = K.cast(K.equal(K.cast(K.argmax(y_true, axis=1), 'float32'), a_var), 'float32')
e_loss = K.sum(K.concatenate([v,v,v,v,v,v,v,v,v,v,v], axis=-1) * K.log(y_pred) * y_pred)
m_u = K.sum(K.cast(K.equal(K.cast(K.argmax(y_true, axis=1), 'float32'), a_var), 'float32'))
b = np.ones((mini_batch_size, 1)) * 10
b_var = K.variable(value=b)
v2 = K.cast(K.not_equal(K.cast(K.argmax(y_true, axis=1), 'float32'), b_var), 'float32')
ce_loss = K.sum(K.concatenate([v2, v2, v2, v2, v2, v2, v2, v2, v2, v2, v2], axis=1) * K.log(y_pred))
m_l = K.variable(value=float(mini_batch_size), dtype='float32') #- m_u
return -((e_loss/m_u) + (ce_loss/m_l))
The error I get is:
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Incompatible shapes: [40,11] vs. [40,440]
[[{{node loss_36/dense_74_loss/mul_2}}]]
[[metrics_28/acc/Mean/_2627]]
(1) Invalid argument: Incompatible shapes: [40,11] vs. [40,440]
[[{{node loss_36/dense_74_loss/mul_2}}]]
0 successful operations.
0 derived errors ignored.
My batch size is 40.
I need my concatenated tensor to be of size [40, 11] not [40, 440]
I don't have real data to test whether the loss properly works, but this got rid of that InvalidArgumentError and did work with model.fit() for a dense model.
Few changes I did,
You don't have to repeat your v 11 times to multiply that with y_pred. All you need is reshape it to (-1,1) - (Will save you memory)
Got rid of all the K.variables. Now this is something I want to check with you, you are not trying to optimize a_var and b_var right (i.e. that's not a part of the model)? (Apparently, that's what's causing the issue. I need to dive deeper to see why). It seems the whole point of a_var and b_var is to perform boolean logics equal and not_equal, which works just fine with the constant.
Made m_l a K.constant
def loss_fn(y_true, y_pred):
v = K.cast(K.equal(K.cast(K.argmax(y_true, axis=-1), 'float32'), 10), 'float32')
e_loss = K.sum(K.reshape(v, (-1,1)) * K.log(y_pred) * y_pred)
m_u = K.sum(K.cast(K.equal(K.cast(K.argmax(y_true, axis=-1), 'float32'), 10), 'float32'))
v2 = K.cast(K.not_equal(K.cast(K.argmax(y_true, axis=-1), 'float32'), 10), 'float32')
ce_loss = K.sum(K.reshape(v2, (-1,1)) * K.log(y_pred))
m_l = K.constant(value=float(mini_batch_size), dtype='float32') #- m_u
return -((e_loss/m_u) + (ce_loss/m_l))
Note: Depending on the batch size within the loss function is a bad idea. Try to get rid of any batch_size dependent operations (especially for shape of tensors). You can see that I only have kept mini_batch_size to set m_l. But I would suggest setting this to some constant instead of min_batch_size. Because, if a batch with <40 comes through, you are using a different loss function for that batch. And your results aren't comparable between different batch sizes, as your loss function changes.
I'm trying to use Theano to do some recognition. All my images are different sizes, and I don't want to resize them because they're paintings so they shouldn't be the same size. I was wondering how to pass in a matrix of images of variable image size lengths into the Theano function.
I'm under the impression that this not possible with numpy. Is there an alternative?
def floatX(X):
return np.asarray(X, dtype=theano.config.floatX)
def init_weights(shape):
return theano.shared(floatX(np.random.randn(*shape) * 0.01))
def model(X, w):
return T.nnet.softmax(T.dot(X, w))
X = T.fmatrix()
Y = T.fmatrix()
w = init_weights((784, 10))
py_x = model(X, w)
y_pred = T.argmax(py_x, axis=1)
cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
gradient = T.grad(cost=cost, wrt=w)
update = [[w, w - gradient * 0.05]]
train = theano.function(inputs=[X, Y], outputs=cost, updates=update, allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=y_pred, allow_input_downcast=True)
Unless I'm mistaken in my interpretation of your code, I don't think what you're trying to do makes sense.
If I understand correctly, in model() you are computing a weighted sum over your image pixels using dot(X, w), where I assume that X is an (nimages, npixels) array of image data, and w is a weight matrix with fixed dimensions (784, 10).
In order for that dot product to even be computable, X.shape[1] (the number of pixels in each of your input images) must be equal to w.shape[0].
If the sizes of your input images vary, how can you expect to learn a single weight matrix with fixed dimensions?