Applying softmax to non-zero elements of a matrix across a dimension - python

Perhaps this is trivial, but perhaps it is not. I have spent way too much time trying to figure out how to make this work. Here is the code:
import numpy as np
import tensorflow as tf

# batch x time x events
batch = 2
time = 3
events = 4
tensor = np.random.rand(batch, time, events)
tensor[0][0][2] = 0
tensor[0][0][3] = 0
tensor[0][1][3] = 0
tensor[0][2][1] = 0
tensor[0][2][2] = 0
tensor[0][2][3] = 0
tensor[1][0][3] = 0
non_zero = ~tf.equal(tensor, 0.)
s = tf.Session()
g = tf.global_variables_initializer()
s.run(g)
s.run(non_zero)
I am trying to apply tf.nn.softmax to the non-zero values across each of the time dimensions. However, when I use tf.boolean_mask, it gathers all of the non-zero values together. That is not what I want; I want to preserve the dimensions.
Here is what the tensor looks like (screenshot not reproduced here):
So tf.nn.softmax should be applied to only those groups and it should "put them back" into their original positions. Does anyone know how to do this?
EDIT:
I almost have the solution I need, with your help. But I am still missing one step: assigning the softmax across each time dimension back to the non-zero values:
def apply_sparse_softmax(time_vector):
    non_zeros = ~tf.equal(time_vector, 0.)
    sparse_softmax = tf.nn.softmax(tf.boolean_mask(time_vector, non_zeros))
    new_time_vector = sparse_softmax * tf.cast(non_zeros, tf.float64) # won't work because dimensions are different
    return new_time_vector
Please also note that this solution should handle the case where the whole time dimension is zeros; in that case the vector should just stay the same.
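For clarity, here is a plain-NumPy sketch of the behaviour I am after for a single time vector (the helper name is mine, for illustration only):
import numpy as np

def reference_sparse_softmax(vec):
    # softmax over the non-zero entries only; zeros stay in place
    out = np.zeros_like(vec)
    mask = vec != 0.
    if mask.any():  # an all-zero vector stays all zero
        e = np.exp(vec[mask] - vec[mask].max())
        out[mask] = e / e.sum()
    return out

print(reference_sparse_softmax(np.array([0.5, 0., 0.2, 0.])))
# -> approximately [0.5744, 0., 0.4256, 0.]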

possible duplicate: Applying tf.nn.softmax() only to positive elements of a tensor
With the help of tf.map_fn and tf.where
session.run(tf.map_fn(
    lambda x: tf.where(x > 0, tf.nn.softmax(x, axis=2, name="pidgeon"), x), tensor))
Tested with np.random.seed(1992):
# tensor
[[[0.86018176 0.42148685 0. 0. ]
[0.64714 0.68271286 0.6449022 0. ]
[0.92037941 0. 0. 0. ]]
[[0.38479139 0.26825327 0.43027759 0. ]
[0.56077674 0.49309016 0.2433904 0.85396874]
[0.1267429 0.1861004 0.92251748 0.67904445]]]
# result
[[[0.34841156, 0.33845624, 0. , 0. ],
[0.28155918, 0.43949257, 0.48794109, 0. ],
[0.37002926, 0. , 0. , 0. ]],
[[0.33727059, 0.31513436, 0.2885575 , 0. ],
[0.40216839, 0.39458556, 0.23936921, 0.44145382],
[0.26056102, 0.29028008, 0.47207329, 0.37060957]]])
0.34841156 == np.exp(0.86018176) / (np.exp(0.86018176) + np.exp(0.64714) + np.exp(0.92037941))
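The same check in plain NumPy, using the non-zero entries of the first event column across time:
import numpy as np

col = np.array([0.86018176, 0.64714, 0.92037941])  # non-zero entries of event 0 across time
print(np.exp(col[0]) / np.exp(col).sum())          # ~0.34841156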

This is my approach using numpy and tensorflow:
> tensor
array([[[0.2891092 , 0.76259227, 0. , 0. ],
[0.93660715, 0.18361367, 0.07234135, 0. ],
[0.23128076, 0. , 0. , 0. ]],
[[0.45708066, 0.76883403, 0.7584804 , 0. ],
[0.51019332, 0.73361557, 0.87442305, 0.66796383],
[0.9297317 , 0.22428208, 0.69184613, 0.06162719]]])
Find the mask of non-zero elements
non_zero = ~tf.equal(tensor, 0.)
# convert to numpy
with tf.Session() as sess:
    non_zero_mask = non_zero.eval()
Retrieve the non-zero values
non_zero_val = tensor[non_zero_mask]
> non_zero_val
array([0.2891092 , 0.76259227, 0.93660715, 0.18361367, 0.07234135,
0.23128076, 0.45708066, 0.76883403, 0.7584804 , 0.51019332,
0.73361557, 0.87442305, 0.66796383, 0.9297317 , 0.22428208,
0.69184613, 0.06162719])
Apply softmax on non-zero values
# apply softmax
soft_max = tf.nn.softmax(non_zero_val)
# convert to numpy
with tf.Session() as sess:
    soft_max_np = soft_max.eval()
> soft_max_np
array([0.04394964, 0.07056453, 0.08397696, 0.03954934, 0.0353846 ,
0.04148019, 0.05198816, 0.07100635, 0.07027497, 0.05482403,
0.06854914, 0.07891397, 0.06419332, 0.08340156, 0.0411909 ,
0.06574485, 0.0350075 ])
Update tensor with softmax applied to non-zero elements
tensor[non_zero_mask] = soft_max_np
tensor
array([[[0.04394964, 0.07056453, 0. , 0. ],
[0.08397696, 0.03954934, 0.0353846 , 0. ],
[0.04148019, 0. , 0. , 0. ]],
[[0.05198816, 0.07100635, 0.07027497, 0. ],
[0.05482403, 0.06854914, 0.07891397, 0.06419332],
[0.08340156, 0.0411909 , 0.06574485, 0.0350075 ]]])

OK, I figured out a solution from tenticon's duplicate link and his answer, although it fails when the whole time vector is zeros, so I still need to fix that. Happy to hear your suggestions. But here is the solution:
def sparse_softmax(T):
    # Creating partition based on condition:
    condition_mask = tf.cast(tf.greater(T, 0.), tf.int32)
    partitioned_T = tf.dynamic_partition(T, condition_mask, 2)
    # Applying the operation to the target partition:
    partitioned_T[1] = tf.nn.softmax(partitioned_T[1])
    # Stitching back together, flattening T and its indices to make things easier:
    condition_indices = tf.dynamic_partition(tf.range(tf.size(T)), tf.reshape(condition_mask, [-1]), 2)
    res_T = tf.dynamic_stitch(condition_indices, partitioned_T)
    res_T = tf.reshape(res_T, tf.shape(T))
    return res_T
my_softmax = tf.map_fn(lambda batch:
    tf.map_fn(lambda time_vector: sparse_softmax(time_vector), batch, dtype=tf.float64),
    tensor, dtype=tf.float64)
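A quick check (a sketch, reusing the tensor defined at the top of the question):
with tf.Session() as sess:
    print(sess.run(my_softmax))  # zeros stay in place; the non-zero entries of each time vector sum to 1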
Another solution I came up with that still suffers when the whole vector is zeros:
def softmax(tensor):
    # tensor_ = tf.placeholder(dtype=tf.float64, shape=(4,))
    non_zeros = ~tf.equal(tensor, 0.)
    sparse_softmax = tf.nn.softmax(tf.boolean_mask(tensor, non_zeros))
    sparse_softmax_shape = tf.shape(sparse_softmax)[0]
    orig_shape = tf.shape(tensor)[0]
    shape_ = orig_shape - sparse_softmax_shape
    zeros = tf.zeros(shape=shape_, dtype=tf.float64)
    new_vec = tf.concat([sparse_softmax, zeros], axis=0)
    return new_vec
But this does not work: it is supposed to return a zeros vector when the vector is all zeros, yet instead I get a reshape error about some sort of empty tensor. So I tried a version with tf.cond:
def softmax_(tensor):
    zeros = tf.cast(tf.equal(tensor, 0.), tf.float64)
    cond_ = tf.reduce_sum(zeros)

    def true_fn():
        non_zeros = ~tf.equal(tensor, 0.)
        sparse_softmax = tf.nn.softmax(tf.boolean_mask(tensor, non_zeros))
        sparse_softmax_shape = tf.shape(sparse_softmax)[0]
        orig_shape = tf.shape(tensor)[0]
        shape_ = orig_shape - sparse_softmax_shape
        zeros = tf.zeros(shape=shape_, dtype=tf.float64)
        new_vec = tf.concat([sparse_softmax, zeros], axis=0)
        return new_vec

    def false_fn():
        return tf.zeros(shape=tf.shape(tensor), dtype=tf.float64)

    return tf.cond(tf.equal(cond_, tf.cast(tf.shape(tensor)[0], tf.float64)), false_fn, true_fn)
Still can't make it work for the vector of all zeros. Would be glad to hear about your solutions.
EDIT: actually the last code snippet works exactly how I want.
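A minimal check of the all-zeros case (a sketch):
vec = tf.constant([0., 0., 0., 0.], dtype=tf.float64)
with tf.Session() as sess:
    print(sess.run(softmax_(vec)))  # expected: [0. 0. 0. 0.]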

Related

How to generate 2d gaussian kernel using 2d convolution in python?

From my exercise instructions:
A 2D Gaussian can be formed by convolution of a 1D Gaussian with its transpose.
Here is my 1d gaussian function:
def gauss1d(sigma, filter_length=11):
    # INPUTS
    # # sigma : sigma of gaussian distribution
    # # filter_length : integer denoting the filter length
    # OUTPUTS
    # # gauss_filter : 1D gaussian filter without normalization
    rng = range(-int(filter_length/2), int(filter_length/2)+1)
    gauss_filter = [np.exp((-x**2) / (2*sigma**2)) for x in rng]
    # The formula used above has been given in the instruction.
    return np.array(gauss_filter)
And here is my 2D convolution function, which performs a 2D convolution between image and filt, image being a 2D image:
def myconv2(image, filt):
    # INPUTS
    # # image : 2D image, as numpy array of size mxn
    # # filt : 1D or 2D filter of size kxl
    # OUTPUTS
    # # img_filtered : 2D filtered image, of size (m+k-1)x(n+l-1)
    m, n = image.shape
    k, l = filt.shape
    offsety = k // 2
    offsetx = l // 2
    img_filtered = np.zeros((m+k-1, n+l-1), "double")
    image = np.pad(image, ((offsety, offsety), (offsetx, offsetx)), mode='constant')
    for i in range(offsety, m+offsety):
        for j in range(offsetx, n+offsetx):
            box_vals = image[i-offsety : i+offsety+1, j-offsetx : j+offsetx+1]
            new_val = np.sum(filt * box_vals)
            img_filtered[i][j] = np.sum(new_val)
    return img_filtered
(A diagram illustrating how the function works for a 5x5 input image and a 3x3 filter kernel was shown here.)
With the following 1D Gaussian and its transpose, I call the myconv2 function:
sigma = 3
filter_length = 5
gauss = gauss1d(sigma, filter_length).reshape(1,filter_length)
gauss
array([[0.18073067, 0.20897821, 0.22058223, 0.20897821, 0.18073067]])
gauss_t = np.transpose(gauss)
gauss_t
array([[0.18073067],
[0.20897821],
[0.22058223],
[0.20897821],
[0.18073067]])
myconv2(gauss, gauss_t)
array([[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0.03986597, 0.04609688, 0.04865652, 0.04609688, 0.03986597],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ]])
As you can see, it's not actually a 2D Gaussian kernel and some values are missing.
I don't know what I am missing or what I should change in my code to reach the goal.
Thanks.
You could just do a matrix multiplication. The convolution should also work, just beware of the padding.
gauss2d = gauss.T @ gauss
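For example, with the 1x5 gauss from the question this gives a 5x5 kernel whose centre row matches the non-zero row of the myconv2 output above (a quick check):
import numpy as np

gauss = np.array([[0.18073067, 0.20897821, 0.22058223, 0.20897821, 0.18073067]])
gauss2d = gauss.T @ gauss   # outer product of the 1D kernel with itself
print(gauss2d.shape)        # (5, 5)
print(gauss2d[2])           # [0.03986597 0.04609688 0.04865652 0.04609688 0.03986597]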
Your conv2d implementation does not seem to be right. I suggest you implement a 'valid' convolution (or cross-correlation):
def simple_valid_cross_correlation(img, filt):
    ih, iw = img.shape
    fh, fw = filt.shape
    result = np.zeros((ih - fh + 1, iw - fw + 1))
    for i in range(result.shape[0]):
        for j in range(result.shape[1]):
            result[i, j] = np.sum(filt * img[i:i+fh, j:j+fw])
    return result
gauss_pad = np.pad(gauss.T, ((0, 0), (gauss.shape[1]-1, gauss.shape[1]-1)))
gauss2d = simple_valid_cross_correlation(gauss_pad, gauss)
There is also scipy.signal.convolve2d if you don't want to implement your own conv. I think it may be faster.
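For example, a 'full' 2D convolution of the column vector with the row vector yields the same 5x5 kernel as the matrix multiplication above (a sketch):
from scipy.signal import convolve2d

gauss2d_conv = convolve2d(gauss.T, gauss, mode='full')
print(gauss2d_conv.shape)                          # (5, 5)
print(np.allclose(gauss2d_conv, gauss.T @ gauss))  # True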

Using slicing of a tensor with other tensors to assign value

I have a tensor nextq which is a probability distribution over some question set. I have a potential answer to each question in synthetic_answers, which is either 0 or 1. I want to update a tensor called cur_qinput by:
Finding the index with the largest value in nextq for each vector in batch
If synthetic_answers at that index is 1, then set the 3rd feature of cur_qinput at that index to 1, otherwise set the 2nd feature.
Here is some non-functional code. It is non-functional at the for-loop because I don't know how to properly slice tensors with other tensors or assign values; I just wrote it in Python syntax to make my intent clear.
#nextq shape = batch_size x q_size
#nextq_index shape = batch_size
nextq_index = tf.argmax(nextq,axis=1)
#synthetic_answers shape = batch_size x q_size
#cur_qinput shape = batch_size x q_size x 3
#"iterate over batch", doesn't actually work and I guess needs to be done entirely differently
for k in tf.range(tf.shape(nextq_index)[0]):
    cur_qinput[k, nextq_index[k], 1 + synthetic_answers[k, nextq_index[k]]] = 1
Let me assume that your data are as follows since there are no examples in the question.
import tensorflow as tf
nextq = tf.constant([[1,5,4],[6,8,10]],dtype=tf.float32)
synthetic_answers = tf.constant([[0,1,1],[1,1,0]],dtype=tf.int32)
cur_qinput = tf.random_normal(shape=(tf.shape(nextq)[0],tf.shape(nextq)[1],3))
First you can use tf.one_hot to build a mask describing whether synthetic_answers at that index is equal to 1.
nextq_index = tf.argmax(nextq,axis=1)
# [1 2]
nextq_index_hot = tf.one_hot(nextq_index,depth=nextq.shape[1],dtype=tf.int32)
# [[0 1 0]
# [0 0 1]]
mask = tf.logical_and(tf.equal(nextq_index_hot,synthetic_answers),tf.equal(nextq_index_hot,1))
# [[False True False]
# [False False False]]
Then you expand the mask to the same shape as cur_qinput.
mask = tf.one_hot(tf.cast(mask,dtype=tf.int32)+1,depth=3)
# [[[0. 1. 0.]
# [0. 0. 1.]
# [0. 1. 0.]]
#
# [[0. 1. 0.]
# [0. 1. 0.]
# [0. 1. 0.]]]
Finally you can use tf.where to assign 1 to cur_qinput.
scatter = tf.where(tf.equal(mask,1),tf.ones_like(cur_qinput),cur_qinput)
with tf.Session() as sess:
    cur_qinput_val, scatter_val = sess.run([cur_qinput, scatter])
    print(cur_qinput_val)
    print(scatter_val)
[[[ 1.3651905 -0.96688586 0.74061954]
[-1.1236337 -0.6730857 -0.8439895 ]
[-0.52024084 1.1968751 0.79242617]]
[[ 1.4969068 -0.12403865 0.06582119]
[ 0.79385823 -0.7952771 -0.8562217 ]
[-0.05428046 1.4613343 0.2726114 ]]]
[[[ 1.3651905 1. 0.74061954]
[-1.1236337 -0.6730857 1. ]
[-0.52024084 1. 0.79242617]]
[[ 1.4969068 1. 0.06582119]
[ 0.79385823 1. -0.8562217 ]
[-0.05428046 1. 0.2726114 ]]]

How can I implement locally connected layer in pure Numpy

I would like to build a locally connected weight matrix that represents a locally connected neural network in pure python/numpy without deep learning frameworks like Torch or TensorFlow.
The weight matrix is a non-square 2D matrix with the dimension (number_input, number_output). (an autoencoder in my case; input>hidden)
So the function I would like to build takes the matrix dimensions and the size of the receptive field (number of local connections) and gives the associated weight matrix. I've already created a function like this, but for an input size of 8 and an output size of 4 (and RF = 4) my function outputs:
[[ 0.91822845 0. 0. 0. ]
[-0.24264655 -0.54754138 0. 0. ]
[ 0.55617366 0.12832513 -0.28733965 0. ]
[ 0.27993286 -0.33150324 0.06994107 0.61184121]
[ 0. 0.04286912 -0.20974503 -0.37633903]
[ 0. 0. -0.10386762 0.33553009]
[ 0. 0. 0. 0.09562682]
[ 0. 0. 0. 0. ]]
but I would like :
[[ 0.91822845 0. 0. 0. ]
[-0.24264655 -0.54754138 0. 0. ]
[ 0.55617366 0.12832513 0. 0. ]
[ 0 -0.33150324 0.06994107 0 ]
[ 0. 0.04286912 -0.20974503 0. ]
[ 0. 0. -0.10386762 0.33553009]
[ 0. 0. 0.11581854 0.09562682]
[ 0. 0. 0. 0.03448418]]
Here's my Python code:
import numpy as np

def local_weight(input_size, output_size, RF):
    input_range = 1.0 / input_size ** (1/2)
    w = np.zeros((input_size, output_size))
    for i in range(0, RF):
        for j in range(0, output_size):
            w[j+i, j] = np.random.normal(loc=0, scale=input_range, size=1)
    return w

print(local_weight(8, 4, 4))
I look forward to your response!
The trick is to use a small pad to work more comfortably (and to control the limits).
Then you must define the step you will take with respect to the input (it is nothing more than input / output). Once this is done you just have to fill in the gaps and then remove the pad.
import math
import numpy as np

def local_weight(input_size, output_size, RF):
    input_range = 1.0 / input_size ** (1/2)
    padding = (RF - 1) // 2
    w = np.zeros(shape=(input_size + 2*padding, output_size))
    step = float(w.shape[0] - RF) / (output_size - 1)
    for i in range(output_size):
        j = int(math.ceil(i * step))
        j_next = j + RF
        w[j:j_next, i] = np.random.normal(loc=0, scale=input_range, size=(j_next - j))
    return w[padding:-padding, :]
I hope that is what you are looking for.
EDIT:
I think the implementation was misguided, so I reimplemented the function. Let's go through it step by step (a quick usage check follows after these steps).
I calculate the radius of the receptive field (padding).
Determine the size of the W.
I calculate the step by removing the padding area so that I always stay inside.
I calculate the weights.
Remove the padding.
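As a quick usage check of the sketch above, with the shapes from the question (values are random; only the pattern of non-zeros matters):
print(local_weight(8, 4, 4))
# each column holds a band of consecutive non-zeros that slides down the rows;
# the first and last columns are clipped by the padding removal, matching the desired layout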

How can I include categorical distributions as observed data in PyMC3?

I have a dataset where each observation is basically a series of noisy measurements and one of the measurements contains signal.
Raw observed data y:
[[ 1.93542253e-01 1.39657327e-04 7.53918636e-01 5.23994535e-02]
[ 6.44964587e-02 8.50087384e-01 1.09894665e-02 7.44266910e-02]
[ 1.68387463e-02 5.38121456e-01 6.98554551e-02 3.75184342e-01]
...,
[ 5.79786789e-01 1.47417427e-02 3.15395731e-01 9.00757372e-02]
[ 8.66796124e-02 8.66999904e-02 4.47848127e-02 7.81835584e-01]
[ 8.18765043e-01 3.23448859e-03 5.61247840e-04 1.77439220e-01]]
I want to put each observation into an appropriate cluster based on this measurement data. For example the first datapoint above is drawn from the third column and the second datapoint above is drawn from the second column. If I sample from the known original distribution and provide those samples to the model as inputs to the Categorical I can get back the original distribution.
Sampled from original data y_choice:
[ 2. 3. 3. 1. 2. 2. 2. 2. 3. 3. 1. 2. 3. 0. 2. 0. 3. 1.
3. 0. 2. 0. 3. 0. 2. 0. 1. 0. 3. 0. 2. 0. 0. 0. 3. 0.
2. 0. 0. 3. 3. 1. ...
However this seems like I'm losing information because my choice sampler is outside the PyMC model. How can I supply the actual observed data y directly into the model? I'm guessing it has something to do with another model parameter based on the Dirichlet, but I haven't been able to wrap my head around how that works.
The sample code I'm operating from is below. I want to be able to supply y to the model and get the true_probs back out, but I've only managed to get it to work with y_choice so far.
import numpy as np
from pymc3 import *
import pymc3 as pm
import pandas as pd
print 'pymc3 version: ' + pm.__version__
def generate_noisy_observations():
    y = np.ones((sample_size, k))
    for i in range(sample_size):
        #print("Iteration %d" % i)
        true_category = np.random.choice(k, size=1, p=true_probs)
        true_distribution = np.zeros(k)
        true_distribution[true_category] = 1
        noise_distribution = np.random.dirichlet(np.ones(k))
        noise = np.random.normal(0, 1, k)
        distribution_weights = [0.9, 0.1]
        raw_distribution = (true_distribution*distribution_weights[0] + noise**2*distribution_weights[1]) / \
            (np.sum(true_distribution*distribution_weights[0]) + np.sum(noise**2*distribution_weights[1]))
        y[i] = raw_distribution
    return y
def generate_choices_from_noisy_observations(y):
    y_choice = np.ones(sample_size)
    for i in range(sample_size):
        y_choice[i] = np.random.choice(k, size=1, p=y[i])
    return y_choice
sample_size = 1000
true_probs = [0.2, 0.1, 0.3, 0.4]
k = len(true_probs)
y = generate_noisy_observations()
y_choice = generate_choices_from_noisy_observations(y)
with pm.Model() as multinom_test:
    probs = pm.Dirichlet('a', a=np.ones(k))
    #data = Categorical('data', p=probs, observed=y)
    data = Categorical('data', p=probs, observed=y_choice)
    start = pm.find_MAP()
    trace = pm.sample(50000, pm.Metropolis())
pm.traceplot(trace[500:])
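For reference, one way to check whether the y_choice fit recovers true_probs is to summarise the posterior on a (a sketch; pm.summary is PyMC3's built-in summary helper):
print(pm.summary(trace[500:]))  # the posterior means for 'a' should sit near true_probs = [0.2, 0.1, 0.3, 0.4]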

Does this function compute convolution correctly?

I need to write a basic function that computes a 2D convolution between a matrix and a kernel.
I have recently got into Python, so I'm sorry for my mistakes.
My dissertation advisor said that I should write one myself so that I can understand it better and be able to modify it for future improvements.
I have found an example of this function on a website, but I don't understand how the returned values are obtained.
This is the code (from http://docs.cython.org/src/tutorial/numpy.html )
from __future__ import division
import numpy as np

def naive_convolve(f, g):
    # f is an image and is indexed by (v, w)
    # g is a filter kernel and is indexed by (s, t),
    #   it needs odd dimensions
    # h is the output image and is indexed by (x, y),
    #   it is not cropped
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # smid and tmid are number of pixels between the center pixel
    # and the edge, ie for a 5x5 filter they will be 2.
    #
    # The output size is calculated by adding smid, tmid to each
    # side of the dimensions of the input image.
    vmax = f.shape[0]
    wmax = f.shape[1]
    smax = g.shape[0]
    tmax = g.shape[1]
    smid = smax // 2
    tmid = tmax // 2
    xmax = vmax + 2*smid
    ymax = wmax + 2*tmid
    # Allocate result image.
    h = np.zeros([xmax, ymax], dtype=f.dtype)
    # Do convolution
    for x in range(xmax):
        for y in range(ymax):
            # Calculate pixel value for h at (x,y). Sum one component
            # for each pixel (s, t) of the filter g.
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
I don't know whether this function computes the weighted sum of the input and the filter, because I don't see an explicit sum here.
I applied this with
kernel = np.array([(1, 1, -1), (1, 0, -1), (1, -1, -1)])
file = np.ones((5,5))
naive_convolve(file, kernel)
I got this matrix:
[[ 1. 2. 1. 1. 1. 0. -1.]
[ 2. 3. 1. 1. 1. -1. -2.]
[ 3. 3. 0. 0. 0. -3. -3.]
[ 3. 3. 0. 0. 0. -3. -3.]
[ 3. 3. 0. 0. 0. -3. -3.]
[ 2. 1. -1. -1. -1. -3. -2.]
[ 1. 0. -1. -1. -1. -2. -1.]]
I tried to do a manual calculation (on paper) for the first full iteration of the function and I got h[0,0] = 0, because of the product filter[0, 0] * matrix[0, 0], but the function returns 1. I am very confused by this.
If anyone can help me understand what is going on here, I would be very grateful. Thanks! :)
Yes, that function computes the convolution correctly. You can check this using scipy.signal.convolve2d
import numpy as np
from scipy.signal import convolve2d
kernel = np.array([(1, 1, -1), (1, 0, -1), (1, -1, -1)])
file = np.ones((5,5))
x = convolve2d(file, kernel)
print x
Which gives:
[[ 1. 2. 1. 1. 1. 0. -1.]
[ 2. 3. 1. 1. 1. -1. -2.]
[ 3. 3. 0. 0. 0. -3. -3.]
[ 3. 3. 0. 0. 0. -3. -3.]
[ 3. 3. 0. 0. 0. -3. -3.]
[ 2. 1. -1. -1. -1. -3. -2.]
[ 1. 0. -1. -1. -1. -2. -1.]]
It's hard to know how to explain all of this to you since I don't know where to start, or why the other explanations you have read aren't working for you. I think, though, that you are doing this as a learning exercise so you can figure it out for yourself. From what I've seen on SO, asking big questions is not a substitute for working it through yourself.
Your specific question of why h[0,0] = 0 in your calculation does not match this matrix is a good one. In fact, both are correct. The reason for the mismatch is that the output of the convolution doesn't carry explicit mathematical indices; they are implied. The center, which is mathematically indicated by the indices [0,0], corresponds to x[3,3] in the matrix above.
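A short scipy sketch of that index bookkeeping (illustrative only; it reuses the kernel and image from the question):
import numpy as np
from scipy.signal import convolve2d

kernel = np.array([(1, 1, -1), (1, 0, -1), (1, -1, -1)])
file = np.ones((5, 5))

full = convolve2d(file, kernel, mode='full')   # 7x7, same as naive_convolve
same = convolve2d(file, kernel, mode='same')   # 5x5, cropped around the centre

print(full[3, 3], same[2, 2])   # the mathematical centre [0,0]: both print 0.0
print(full[0, 0])               # the corner: only kernel[0, 0] * file[0, 0] contributes, so 1.0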
