I am working on a custom loss function that uses numpy.digitize() internally. The loss is minimised with respect to a set of parameters, which are the bin edges passed to digitize. To use the TensorFlow optimisers, I would like to know whether there is an equivalent implementation of digitize in TensorFlow. If not, is there a good way to implement a workaround?
Here is a NumPy version:
import numpy as np

def fom_func(b, n):
    # return added: without it the function always yields None
    return np.where((b > 0) & (n > 0), np.sqrt(2*(n*np.log(np.divide(n,b)) + b - n)),0)
def loss(param, X, y):
    param = np.sort(np.asarray(param))
    nbins = param.shape[0]
    score = 0
    y_pred = np.digitize(X, param)
    for c in np.arange(nbins):
        b = np.where((y==0) & (y_pred==c), 1, 0).sum()
        n = np.where((y_pred==c), 1, 0).sum()
        score += fom_func(b,n)**2
    return -np.sqrt(score)
The equivalent of np.digitize is called bucketize in TensorFlow. Quoting from the API doc:
Bucketizes 'input' based on 'boundaries'.
Summary
For example, if the inputs are
boundaries = [0, 10, 100]
input = [[-5, 10000] [150, 10] [5, 100]]
then the output will be
output = [[0, 3] [3, 2] [1, 3]]
Arguments:
scope: A Scope object
input: Any shape of Tensor contains with int or float type.
boundaries: A sorted list of floats gives the boundary of the buckets.
Returns:
Output: Same shape with 'input', each value of input replaced with bucket index.
(numpy) Equivalent to np.digitize.
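The example from the quoted doc can be checked against NumPy directly. This is a small sketch that uses only np.digitize with its default right=False; it does not call the TensorFlow op itself.

```python
import numpy as np

boundaries = [0, 10, 100]
inputs = np.array([[-5, 10000], [150, 10], [5, 100]])

# With the default right=False, a value equal to a boundary
# falls into the bucket that starts at that boundary.
bucket_idx = np.digitize(inputs, boundaries)
print(bucket_idx)
```

The result has the same shape as the input, and each entry is the index of the bucket the value falls into.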
I'm not sure why, but this method is hidden in TensorFlow (see the hidden_ops.txt file), so I wouldn't count on it, even though you can import it with:
from tensorflow.python.ops import math_ops
math_ops._bucketize
This helped me. The only thing to watch is that values are not assigned relative to the left or right of each edge, but to the intervals between the bin edges:
import tensorflow_probability as tfp
tfp.stats.find_bins()
While building code to train a TensorFlow deep model, I am using tf.map_fn and tf.py_function to wrap a SciPy function as a loss function. The loss is computed row by row over a batch of two probability vectors p and q, each of shape [batch_size, num_classes]. When I use the KL divergence over this batch of vectors (p, q), training works fine and there is no shape incompatibility issue:
tf.reduce_sum(p*(tf.log(p + 1e-16) - tf.log(q + 1e-16)), axis=1) #KL divergence
However, when I try to use the wasserstein_distance or energy_distance functions from SciPy, I get an error about incompatible shapes [] and [5000]. Here 5000 is the number of classes (p and q have shape [batch_size, 5000]).
import tensorflow as tf
def compute_kld(p_logit, q_logit, divergence_type):
    p = tf.nn.softmax(p_logit)
    q = tf.nn.softmax(q_logit)
    if divergence_type == "KL_divergence":
        return tf.reduce_sum(p*(tf.log(p + 1e-16) - tf.log(q + 1e-16)), axis=1)
    elif divergence_type == "Wasserstein_distance":
        def wasserstein_distance(x,y):
            import scipy
            from scipy import stats
            return stats.wasserstein_distance(x,y)
        #tf.function
        def func(p,q):
            return tf.map_fn(lambda x: tf.py_function(func=wasserstein_distance, inp=[x[0], x[1]], Tout=tf.float32), (p, q), dtype=(tf.float32)) #, parallel_iterations=10)
        return func(p, q)
    elif divergence_type == "energy_distance": # The Cramer Distance
        def energy_distance(x,y):
            import scipy
            from scipy import stats
            return stats.energy_distance(x,y)
        #tf.function
        def func(p,q):
            return tf.map_fn(lambda x: tf.py_function(func=energy_distance, inp=[x[0], x[1]], Tout=tf.float32), (p, q), dtype=(tf.float32)) #, parallel_iterations=10)
        return func(p, q)
This is the code I used to test the loss functions with a batch of 5 and 3 classes; they all work fine individually:
import tensorflow as tf
p = tf.constant([[1, 2, 3], [1, 2, 3], [14, 50, 61], [71, 83, 79], [110,171,12]])
q = tf.constant([[1, 2, 3], [1.2, 2.3, 3.2], [4.2, 5.3, 6.4], [7.5, 8.6, 9.4], [11.2,10.1,13]])
p = tf.reshape(p, [-1,3])
q = tf.reshape(q, [-1,3])
p = tf.cast(p, tf.float32)
q = tf.cast(q, tf.float32)
with tf.Session() as sess:
    divergence_type = "KL_divergence"
    res = compute_kld(p, q, divergence_type = divergence_type)
    divergence_type = "Wasserstein_distance"
    res2 = compute_kld(p, q, divergence_type = divergence_type)
    divergence_type = "energy_distance"
    res3 = compute_kld(p, q, divergence_type = divergence_type)
    print("############################## p")
    print(sess.run(tf.print(p)))
    print("##")
    print(sess.run(tf.print(tf.shape(p))))
    print("############################## KL_divergence")
    print(sess.run(tf.print(res)))
    print("##")
    print(sess.run(tf.print(tf.shape(res))))
    print("############################## Wasserstein_distance")
    print(sess.run(tf.print(res2)))
    print("##")
    print(sess.run(tf.print(tf.shape(res2))))
    print("############################## energy_distance")
    print(sess.run(tf.print(res3)))
    print("##")
    print(sess.run(tf.print(tf.shape(res3))))
This is the output:
############################## p
[[1 2 3]
[1 2 3]
[14 50 61]
[71 83 79]
[110 171 12]]
None
##
[5 3]
None
############################## KL_divergence
[0 0.000939823687 0.367009342 1.1647588 3.09911442]
None
##
[5]
None
############################## Wasserstein_distance
[0 0.0126344115 0.204870835 0.237718046 0.120362818]
None
##
[5]
None
############################## energy_distance
[0 0.0917765796 0.41313991 0.438246906 0.316672504]
None
##
[5]
None
However, when I use the Wasserstein distance or the energy distance inside my training code, I get an incompatible-shape error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to set a tensor with incompatible shape at a list index. Item element shape: [] list shape: [5000]
[[{{node gradients/TensorArrayV2Read/TensorListGetItem_grad/TensorListSetItem}}]]
I am wondering whether the dtype I am using for tf.map_fn or tf.py_function is wrong, or whether I have to specify or impose a shape somewhere.
Here is a link for the whole code where I tried to replace KL-divergence with Wasserstein distance in method "compute_kld": https://github.com/shenyuanyuan/IMSAT/blob/master/imsat_cluster.py
Thank you in advance for your kind help!
== UPDATE ==
I inspected all the provided batches and the shapes of p and q seem correct
shape(p)
(?, 5000)
shape(q)
(?, 5000)
However, the type of func's returned object is . Thus, I have tried to reshape it with:
return tf.reshape(func(p, q), [p.shape[0]])
However, this doesn't seem to change anything as the error is still the same. After providing the first batch, the code crashes before starting to process the second batch.
Without seeing your training code, the best I can do is quote the docs and try to shed some light.
map_fn Transforms elems by applying fn to each element unstacked on axis 0.
If elems is a tuple (or nested structure) of tensors, then those tensors must all have the same outer-dimension size (num_elems); and fn is used to transform each tuple (or structure) of corresponding slices from elems. E.g., if elems is a tuple (t1, t2, t3), then fn is used to transform each tuple of slices (t1[i], t2[i], t3[i]) (where 0 <= i < num_elems).
energy_distance Computes the energy distance between two 1D distributions.
wasserstein_distance Computes the first Wasserstein distance between two 1D distributions.
To begin, you should make sure you are passing only 2D p_logit and q_logit to compute_kld.
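The row-wise behaviour described in the map_fn docs can be pictured in plain NumPy. This is only an illustrative sketch: row_wise is a hypothetical helper, and the KL divergence stands in for any function that maps two 1D vectors to a scalar. The point is that one scalar per row, stacked over the batch, gives a result of shape [batch_size].

```python
import numpy as np

def kl_divergence(p_row, q_row, eps=1e-16):
    # Maps two 1D probability vectors to a single scalar.
    return np.sum(p_row * (np.log(p_row + eps) - np.log(q_row + eps)))

def row_wise(fn, p, q):
    # Hypothetical helper: applies fn to matching rows of two 2D arrays.
    p, q = np.asarray(p), np.asarray(q)
    if p.ndim != 2 or p.shape != q.shape:
        raise ValueError("p and q must be 2D arrays of the same shape")
    return np.array([fn(p_i, q_i) for p_i, q_i in zip(p, q)])

p = np.array([[0.2, 0.3, 0.5], [0.1, 0.1, 0.8]])
q = np.array([[0.2, 0.3, 0.5], [0.3, 0.3, 0.4]])
result = row_wise(kl_divergence, p, q)
print(result.shape)
```

If the inputs are not 2D, the per-row function no longer receives 1D vectors, which is one way a shape mismatch can arise.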
I am just starting out with the optimization functions from scipy.
I tried to write my code by adapting the solution from "Find optimal vector that minimizes function".
I have an array that contains series in columns. I need to multiply each of them by a weight so that the sum of the last row of these columns multiplied by the weights gives a given number (constraint).
The sum of the series multiplied by the weights gives a new series, from which I extract the max-draw-down, and I want to minimize this mdd.
I wrote my code as best I can (2 months of Python and 3 hours of scipy) and can't resolve the error message raised by the function used to solve the problem.
Here is my code; any help would be much appreciated:
import numpy as np
from scipy.optimize import fmin_slsqp
# based on: https://stackoverflow.com/questions/41145643/find-optimal-vector-that-minimizes-function
# the number of columns (and so of weights) can vary; it should be generic, regardless the number of columns
def mdd(serie): # finding the max-draw-down of a series (put aside not to create add'l problems)
    min = np.nanargmax(np.fmax.accumulate(serie) - serie)
    max = np.nanargmax((serie)[:min])
    return serie[np.nanargmax((serie)[:min])] - serie[min] # max-draw-down
# defining the input data
# mat is an array of 5 columns containing series of independent data
mat = np.array([[1, 0, 0, 1, 1],[2, 0, 5, 3, 4],[3, 2, 4, 3, 7],[4, 1, 3, 3.1, -6],[5, 0, 2, 5, -7],[6, -1, 4, 1, -8]]).astype('float32')
w = np.ndarray(shape=(5)).astype('float32') # 1D vector for the weights to be used for the columns multiplication
w0 = np.array([1/5, 1/5, 1/5, 1/5, 1/5]).astype('float32') # initial weights (all similar as a starting point)
fixed_value = 4.32 # as a result of constraint nb 1
# testing the operations that are going to be used in the minimization
series = np.sum(mat * w0, axis=1)
# objective:
# minimize the mdd of the series by modifying the weights (w)
def test(w, mat):
    series = np.sum(mat * w, axis=1)
    return mdd(series)
# constraints:
def cons1(last, w, fixed_value): # fixed_value = 4.32
    # the sum of the weights multiplied by the last value of each column must be equal to this fixed_value
    return np.sum(mat[-1, :] * w) - fixed_value
def cons2(w): # the sum of the weights must be equal to 1
    return np.sum(w) - 1
# solution:
# looking for the optimal set of weight (w) values that minimize the mdd while respecting the two constraints and the bounds
# all w values must be between 0 and 1
result = fmin_slsqp(test, w0, f_eqcons=[cons1, cons2], bounds=[(0.0, 1.0)]*len(w), args=(mat, fixed_value, w0), full_output=True)
weights, fW, its, imode, smode = result
print(weights)
You weren't far off the mark. The biggest problem lies in the mdd function: when there is no draw-down, the slice serie[:min] is an empty array, and argmax cannot handle an empty array.
def mdd(serie): # finding the max-draw-down of a series (put aside not to create add'l problems)
    i = np.argmax(np.maximum.accumulate(serie) - serie) # end of the period
    start = serie[:i]
    # check if there is a draw-down at all (an empty slice means there is none)
    if start.size == 0:
        return 0
    j = np.argmax(start) # start of period
    return serie[j] - serie[i] # max-draw-down
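A quick check of the two cases, as a self-contained sketch. It restates the draw-down logic with an explicit emptiness test (start.size == 0), which is an assumption about the intended guard; max_drawdown is a name chosen only for this example.

```python
import numpy as np

def max_drawdown(serie):
    serie = np.asarray(serie, dtype=float)
    # Index where the gap to the running maximum is largest (end of the draw-down).
    i = np.argmax(np.maximum.accumulate(serie) - serie)
    start = serie[:i]
    if start.size == 0:
        # The largest gap sits at index 0, so the series never drops.
        return 0.0
    j = np.argmax(start)  # peak before the trough
    return serie[j] - serie[i]

rising = max_drawdown([1, 2, 3])          # never drops
dipping = max_drawdown([1, 3, 2, 5, 1])   # peak 5, later trough 1
print(rising, dipping)
```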
In addition, you must make sure that the parameter list is the same for all functions involved (cost function and constraints).
# objective:
# minimize the mdd of the series by modifying the weights (w)
def test(w, mat, fixed_value):
    series = mat @ w
    return mdd(series)
# constraints:
def cons1(w, mat, fixed_value): # fixed_value = 4.32
    # the sum of the weights multiplied by the last value of each column must be equal to this fixed_value
    return mat[-1, :] @ w - fixed_value
def cons2(w, mat, fixed_value): # the sum of the weights must be equal to 1
    return np.sum(w) - 1
# solution:
# looking for the optimal set of weight (w) values that minimize the mdd while respecting the two constraints and the bounds
# all w values must be between 0 and 1
result = fmin_slsqp(test, w0, eqcons=[cons1, cons2], bounds=[(0.0, 1.0)]*len(w), args=(mat,fixed_value), full_output=True)
One more remark: you can make the matrix-vector multiplications much leaner with the @ operator.
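To illustrate the remark, this sketch compares the two forms on the data from the question. Both give the weighted sum of the columns for every row.

```python
import numpy as np

mat = np.array([[1, 0, 0, 1, 1], [2, 0, 5, 3, 4], [3, 2, 4, 3, 7],
                [4, 1, 3, 3.1, -6], [5, 0, 2, 5, -7], [6, -1, 4, 1, -8]])
w0 = np.full(5, 1 / 5)  # equal starting weights

series_sum = np.sum(mat * w0, axis=1)  # element-wise multiply, then sum each row
series_matmul = mat @ w0               # matrix-vector product, same values
last_row = mat[-1, :] @ w0             # weighted sum of the last row only
print(series_matmul, last_row)
```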
I'm trying to create a 2-layer neural network. For that, I first initialize the weights and biases to random floats between 0 and 1 using numpy.random.rand. However, for some reason this produces floats bigger than 1 for W1 (weight 1), whereas it works correctly for all the other weights and biases. I can't understand why this happens. I thought maybe something outside the function where I initialize the parameters affects it, but I couldn't find any part of the function that could be affected from outside.
import numpy as np
### CONSTANTS DEFINING THE MODEL ####
n_x = 12288 # num_px * num_px * 3
n_h = 7
n_y = 1
layers_dims = (n_x, n_h, n_y)
def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network
    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2":
    """
    np.random.seed(1)
    parameters = {}
    parameters["W1"] = np.random.rand(n_h, n_x) # shape (7, 12288)
    parameters["b1"] = np.random.rand(n_h, 1) # shape (7, 1)
    parameters["W2"] = np.random.rand(n_y, n_h) # shape (1, 7)
    parameters["b2"] = np.random.rand(n_y, 1) # shape (1, 1)
    return parameters
parameters = initialize_parameters_deep(layers_dims)
print(parameters)
Output:
{'W1': array([[4.17022005e-01, 7.20324493e-01, 1.14374817e-04, ...,
3.37562919e-01, 1.12292153e-01, 5.37047221e-01],
[7.07934286e-01, 3.37726007e-01, 7.07954162e-01, ...,
4.22040811e-01, 7.78593215e-01, 3.49866021e-01],
[9.01338451e-01, 7.95132845e-03, 1.03777034e-01, ...,
2.78602449e-01, 5.05813021e-02, 8.26828833e-01],
...,
[5.62717083e-03, 6.58208224e-01, 3.88407263e-01, ...,
5.56312618e-01, 8.69650932e-01, 1.00112287e-01],
[4.16278934e-01, 4.56060621e-01, 9.33378848e-01, ...,
9.52798385e-01, 9.41894584e-01, 4.44342962e-01],
[8.89254832e-01, 6.42558949e-01, 2.29427262e-01, ...,
8.05884494e-01, 1.80676088e-01, 6.12694420e-01]]), 'b1': array([[0.11933315],
[0.50073416],
[0.21336813],
[0.14223935],
[0.60809243],
[0.41994954],
[0.43137737]]), 'W2': array([[0.81360697, 0.44638382, 0.41794085, 0.08649817, 0.29957473,
0.33706742, 0.24721952]]), 'b2': array([[0.92363097]])}
It's not generating floats bigger than 1, it's just representing them differently.
4.17022005e-01 is the same as 0.417022005, and 1.14374817e-04 is the same as 0.000114374817.
The e-01, e-02, e-03, etc. at the end of the W1 numbers just mean that the numbers are written in exponential (scientific) notation. For example, 2.786e-01 is the same as 2.786/10, which is 0.2786. Likewise, 2.786e-03 == 2.786/1000 == 0.002786. In general, e+02 means times 10^2 and e-02 means times 1/(10^2).
Pay attention to the final few characters printed for each entry of your weights array, e.g. e-01. This is a base-10 exponent: the value of a given weight is the number printed times 10 raised to the given power.
All of the exponents are negative, meaning the weights are small positive values in the range [0, 1).
For example, 4.17022005e-01 equals 0.417022005.
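A short check, as a sketch: it confirms that the two notations denote the same number and that the generated values stay below 1, and it uses np.set_printoptions(suppress=True), which asks NumPy to print floats in fixed-point notation instead of scientific notation.

```python
import numpy as np

# Both spellings parse to the same floating-point value.
same_number = float("4.17022005e-01") == 0.417022005

np.random.seed(1)
W1 = np.random.rand(7, 12288)
# rand samples from the half-open interval [0, 1).
in_unit_interval = bool((W1 >= 0).all() and (W1 < 1).all())

np.set_printoptions(suppress=True)  # fixed-point display
print(W1[:2, :3])
print(same_number, in_unit_interval)
```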
I'm working on a model to generate music. All of my training data is in the same key and mode, C Major. I have a numpy array keyspace with shape (n,) that represents the total number of keys on my keyboard (in a chromatic scale). The slots in that array with a 1 are keys that are in C Major; the slots that have 0s are not in C Major.
The model predicts which keys should be pressed as an array y_pred. I want to add a term to my loss function that penalizes the model for pressing keys that aren't in C Major. That said, I don't want to penalize my model for failing to press keys in the keyspace (as not every beat uses every key in the scale!). In numpy, I can do this like so:
import numpy as np
keyspace = np.array( [0, 1, 0, 1, 0, 1] )
y_pred = np.array( [1, 0, 0, 1, 0, 1] )
loss_term = 0
for idx, i in enumerate(y_pred):
    if i:
        if not keyspace[idx]:
            loss_term += 1
loss_term
I'd now like to convert this to Keras backend functions, which means vectorizing this. Does anyone see a good way to do so? Any pointers would be very helpful!
Your code is basically:
((1-keyspace) * y_pred).sum()
Test:
def loop_loss(keyspace, y_pred):
    loss_term = 0
    for idx, i in enumerate(y_pred):
        if i and not keyspace[idx]:
            loss_term += 1
    return loss_term
keyspace, y_pred = np.random.choice([0,1], (2,10))
loop_loss(keyspace, y_pred) == ((1-keyspace) * y_pred).sum()
# True
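The same expression extends to a batch of predictions by summing over the last axis. The sketch below stays in NumPy; penalty is a hypothetical name, and the assumption is that y_pred holds 0/1 values with shape (batch, n). Keras backend functions such as K.sum also take an axis argument, so the expression should carry over, though that is not tested here.

```python
import numpy as np

def penalty(keyspace, y_pred):
    # (1 - keyspace) is 1 exactly where a key is outside the scale,
    # so the product keeps only the out-of-key presses.
    return ((1 - keyspace) * y_pred).sum(axis=-1)

keyspace = np.array([0, 1, 0, 1, 0, 1])
y_pred = np.array([[1, 0, 0, 1, 0, 1],
                   [0, 1, 0, 1, 0, 0],
                   [1, 0, 1, 0, 1, 0]])

per_sample = penalty(keyspace, y_pred)   # one penalty per row
single = penalty(keyspace, y_pred[0])    # also works for a single vector
print(per_sample, single)
```

Keys that are in the scale but not pressed contribute nothing, which matches the requirement not to penalize unused keys.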
I want to loop over a tensor that contains a list of ints and apply a function to each element.
Inside the function, each element is used as a key to look up a value in a Python dict.
I tried the easy way with tf.map_fn, which works for an add function, as in the following code:
import tensorflow as tf
def trans_1(x):
    return x+10
a = tf.constant([1, 2, 3])
b = tf.map_fn(trans_1, a)
with tf.Session() as sess:
    res = sess.run(b)
    print(str(res))
# output: [11 12 13]
But the following code throws the exception KeyError: tf.Tensor'map_8/while/TensorArrayReadV3:0' shape=() dtype=int32:
import tensorflow as tf
kv_dict = {1:11, 2:12, 3:13}
def trans_2(x):
    return kv_dict[x]
a = tf.constant([1, 2, 3])
b = tf.map_fn(trans_2, a)
with tf.Session() as sess:
    res = sess.run(b)
    print(str(res))
My TensorFlow version is 1.13.1. Thanks in advance.
There is a simple way to achieve what you are trying to do.
The problem is that the function passed to map_fn must take tensors as its parameters and return a tensor. Your function trans_2, however, is written to take a plain Python int and return another Python int, so the dict lookup is attempted with a tensor as the key. That's why your code doesn't work.
However, TensorFlow provides a simple way to wrap ordinary Python functions, tf.py_func, which you can use in your case as follows:
import tensorflow as tf
kv_dict = {1:11, 2:12, 3:13}
def trans_2(x):
    return kv_dict[x]
def wrapper(x):
    return tf.cast(tf.py_func(trans_2, [x], tf.int64), tf.int32)
a = tf.constant([1, 2, 3])
b = tf.map_fn(wrapper, a)
with tf.Session() as sess:
    res = sess.run(b)
    print(str(res))
As you can see, I have added a wrapper function that expects a tensor parameter and returns a tensor, which is why it can be used in map_fn. The cast is needed because the Python int returned by trans_2 comes back as a 64-bit integer, whereas tf.constant([1, 2, 3]) is a 32-bit integer tensor and map_fn by default expects the output dtype to match the input.
You cannot use a function like that, because the parameter x is a TensorFlow tensor, not a Python value. So, in order for that to work, you would have to turn your dictionary into a tensor as well, but it's not so simple because keys in the dictionary may not be sequential.
You can instead solve this problem without mapping, by doing something similar to what is proposed here for NumPy. In TensorFlow, you could implement it like this:
import tensorflow as tf
def replace_by_dict(x, d):
    # Get keys and values from dictionary
    keys, values = zip(*d.items())
    keys = tf.constant(keys, x.dtype)
    values = tf.constant(values, x.dtype)
    # Make a sequence for the range of values in the input
    v_min = tf.reduce_min(x)
    v_max = tf.reduce_max(x)
    r = tf.range(v_min, v_max + 1)
    r_shape = tf.shape(r)
    # Mask replacements that are out of the input range
    mask = (keys >= v_min) & (keys <= v_max)
    keys = tf.boolean_mask(keys, mask)
    values = tf.boolean_mask(values, mask)
    # Replace values in the sequence with the corresponding replacements
    scatter_idx = tf.expand_dims(keys, 1) - v_min
    replace_mask = tf.scatter_nd(
        scatter_idx, tf.ones_like(values, dtype=tf.bool), r_shape)
    replace_values = tf.scatter_nd(scatter_idx, values, r_shape)
    replacer = tf.where(replace_mask, replace_values, r)
    # Gather the replacement value or the same value if it was not modified
    return tf.gather(replacer, x - v_min)
# Test
kv_dict = {1: 11, 2: 12, 3: 13}
with tf.Graph().as_default(), tf.Session() as sess:
    a = tf.constant([1, 2, 3])
    print(sess.run(replace_by_dict(a, kv_dict)))
    # [11, 12, 13]
This allows values in the input tensor that have no replacement (they are left as they are), and it does not require every replacement key to be present in the tensor. It should be efficient unless the minimum and maximum values in your input are very far apart.
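For comparison, here is the same lookup-table idea in plain NumPy, as a sketch. replace_by_dict_np is a hypothetical helper that assumes integer inputs and integer dictionary keys; it is not the linked NumPy solution itself.

```python
import numpy as np

def replace_by_dict_np(x, d):
    x = np.asarray(x)
    v_min, v_max = x.min(), x.max()
    # Identity table covering the input range: table[v - v_min] == v.
    table = np.arange(v_min, v_max + 1)
    for key, value in d.items():
        # Skip replacements whose key lies outside the input range.
        if v_min <= key <= v_max:
            table[key - v_min] = value
    # Look every input value up in the table.
    return table[x - v_min]

kv_dict = {1: 11, 2: 12, 3: 13, 9: 99}
replaced = replace_by_dict_np([1, 2, 3, 5], kv_dict)
print(replaced)
```

The value 5 has no entry in the dictionary and is left unchanged, and the key 9 is ignored because it lies outside the input range. The table has one slot per integer between the minimum and maximum input, which is why a very wide input range makes this approach costly.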