I have a Markov chain function implemented in JAX that advances the chain from state s -> s' based on some training data (X_train).
def step(state: dict, key, X_train) -> dict:
    new_state = advance(state, key, X_train)
    return new_state
Here, state is a fairly complicated tree-structured dict of arrays that was generated by Haiku. For example,
state = {
    'layer1': {
        'weights': array(...),
        'bias': array(...),
    },
    'layer2': {
        'weights': array(...),
        'bias': array(...),
    },
}
I would like to run multiple Markov chains, with different states, in parallel. At first glance, the jax.vmap function looks like a good candidate. However, state is not an array but a (tree-structured) dict.
What is the best way to approach this?
Thanks!
Yes, you can use vmap with any pytree, but this is how you should construct it:
states = {'layer1': {'weights': jnp.array([[1, -2, 3],
                                           [4, 5, 6]])},
          'layer2': {'weights': jnp.array([[1, .2, 3],
                                           [.4, 5, 6]])}}
So in your first chain, the weights will be [1, -2, 3] and [1, .2, 3] for layer1 and layer2 respectively (in the second chain they will be [4, 5, 6] and [.4, 5, 6]). The Markov chain itself, however, should be driven by jax.lax.scan, and you can use jit compilation to speed things up. Here is a trivial example; in each step the chain computes the following:
import jax
import jax.numpy as jnp
from functools import partial

@jax.jit
def step(carry, k):
    # This function runs a single step of the chain.
    # X_train dim: (3, 3)
    # w1, w2 dim (per chain): (3,)
    # X_new = log(relu(w1 @ X_old) @ w2) + e
    # e ~ Normal(0, 1)
    state, X_train, rng = carry
    rng, rng_input = jax.random.split(rng)
    e = jax.random.normal(rng)  # pseudorandom draw (conventionally you would consume rng_input here)
    w1 = state['layer1']['weights']           # 1-D weight vector of this chain
    w2 = state['layer2']['weights'][None, :]  # reshaped into a row vector
    X_train = jax.nn.relu(w1 @ X_train)[:, None] + 1
    X_train = jnp.log(X_train @ w2)
    X_train = X_train + e
    return [state, X_train, rng], e

@partial(jax.jit, static_argnums=3)
def fi(state, X_train, key, number_of_steps):
    rng = jax.random.PRNGKey(key)
    carry = [state, X_train, rng]
    carry, random_normals = jax.lax.scan(step, carry, xs=jnp.arange(number_of_steps))
    state, X, rng = carry
    return X
X_train = jnp.array([[1., -1., 0.5],
                     [1., 1., 2.],
                     [4., 2., 0.1]])
states = {'layer1': {'weights': jnp.array([[1, -2, 3],
                                           [4, 5, 6]])},
          'layer2': {'weights': jnp.array([[1, .2, 3],
                                           [.4, 5, 6]])}}
vmap_fi = jax.vmap(fi, (0, None, None, None))  # map only over axis 0 of the first argument
key = 42                # random seed
number_of_steps = 100   # the chain runs for 100 steps
last_states = vmap_fi(states, X_train, key, number_of_steps)
print(last_states)
Output:
[[[ 1.8478627 0.23842478 2.946475 ]
[ 1.3278859 -0.28155205 2.4264982 ]
[ 2.0921988 0.48276085 3.1908112 ]]
[[ 2.9374144 5.4631433 5.645465 ]
[ 3.4333894 5.959118 6.1414394 ]
[ 3.4612248 5.9869533 6.169275 ]]]
In this example, you can make the states dictionaries as complicated as you like; you just need to batch them along their 0th axis.
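If your chains' states start out as separate dicts, here is a minimal sketch (assuming all states share the same tree structure) of how to build such a batched pytree by stacking the corresponding leaves with jax.tree_util.tree_map:
import jax
import jax.numpy as jnp

state_a = {'layer1': {'weights': jnp.array([1., -2., 3.])},
           'layer2': {'weights': jnp.array([1., .2, 3.])}}
state_b = {'layer1': {'weights': jnp.array([4., 5., 6.])},
           'layer2': {'weights': jnp.array([.4, 5., 6.])}}

# Stack matching leaves along a new leading axis; vmap then maps over axis 0.
states = jax.tree_util.tree_map(lambda *leaves: jnp.stack(leaves), state_a, state_b)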
Related
I want to replicate the torch.gather() function in TensorFlow 2.X.
I have a Tensor A (shape: [2, 4, 3]) and a corresponding Index-Tensor I (shape: [2,2,3]).
Using torch.gather() yields the following:
A = torch.tensor([[[10,20,30], [100,200,300], [1000,2000,3000]],
[[50,60,70], [500,600,700], [5000,6000,7000]]])
I = torch.tensor([[[0,1,0], [1,2,1]],
[[2,1,2], [1,0,1]]])
torch.gather(A, 1, I)
>
tensor([[[10, 200, 30], [100, 2000, 300]],
        [[5000, 600, 7000], [500, 60, 700]]])
I have tried using tf.gather(), but this did not yield pytorch-like results. I also tried to play around with tf.gather_nd(), but I could not find a suitable solution.
I found this StackOverflow post, but this seems not to work for me.
Edit:
When using tf.gather_nd(A, I), I get the following result:
tf.gather_nd(A, I)
>
[[100, 6000],
[ 0, 60]]
The result for tf.gather(A, I) is rather lengthy. It has the shape of [2, 2, 3, 4, 3]
torch.gather and tf.gather_nd work differently and will therefore yield different results when given the same indices tensor (in some cases an error will be raised instead). This is what the indices tensor would have to look like to get the same results:
import tensorflow as tf
A = tf.constant([[[10, 20, 30], [100, 200, 300], [1000, 2000, 3000]],
                 [[50, 60, 70], [500, 600, 700], [5000, 6000, 7000]]])
I = tf.constant([[[[0, 0, 0],
                   [0, 1, 1],
                   [0, 0, 2]],
                  [[0, 1, 0],
                   [0, 2, 1],
                   [0, 1, 2]]],
                 [[[1, 2, 0],
                   [1, 1, 1],
                   [1, 2, 2]],
                  [[1, 1, 0],
                   [1, 0, 1],
                   [1, 1, 2]]]])
print(tf.gather_nd(A, I))
tf.Tensor(
[[[ 10 200 30]
[ 100 2000 300]]
[[5000 600 7000]
[ 500 60 700]]], shape=(2, 2, 3), dtype=int32)
So the question is really how you are calculating your indices, or whether they are always hard-coded. Also, check out this post on the differences between the two operations.
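If the indices do come from computation rather than being hard-coded, the gather_nd index tensor above can be built programmatically. A minimal sketch for this particular axis-1 case (the coordinate names b, i, j are just for illustration):
import tensorflow as tf

A = tf.constant([[[10, 20, 30], [100, 200, 300], [1000, 2000, 3000]],
                 [[50, 60, 70], [500, 600, 700], [5000, 6000, 7000]]])
I = tf.constant([[[0, 1, 0], [1, 2, 1]],
                 [[2, 1, 2], [1, 0, 1]]])

# torch.gather(A, 1, I) means out[b, i, j] = A[b, I[b, i, j], j], so we pair
# every entry of I with its batch coordinate b and its column coordinate j.
b, _, j = tf.meshgrid(tf.range(2), tf.range(2), tf.range(3), indexing='ij')
idx = tf.stack([b, I, j], axis=-1)  # shape (2, 2, 3, 3)
print(tf.gather_nd(A, idx))         # same result as the hard-coded index tensor above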
As for the post you linked that didn't work for you, you just need to cast the indices and everything should be fine:
def torch_gather(x, indices, gather_axis):
    # Coordinates of every element of `indices`, enumerated in order.
    all_indices = tf.where(tf.fill(indices.shape, True))
    gather_locations = tf.reshape(indices, [indices.shape.num_elements()])
    # Along the gather axis, substitute the requested indices;
    # along every other axis, keep each element's own coordinate.
    gather_indices = []
    for axis in range(len(indices.shape)):
        if axis == gather_axis:
            gather_indices.append(tf.cast(gather_locations, dtype=tf.int64))
        else:
            gather_indices.append(tf.cast(all_indices[:, axis], dtype=tf.int64))
    gather_indices = tf.stack(gather_indices, axis=-1)
    gathered = tf.gather_nd(x, gather_indices)
    reshaped = tf.reshape(gathered, indices.shape)
    return reshaped
I = tf.constant([[[0, 1, 0], [1, 2, 1]],
                 [[2, 1, 2], [1, 0, 1]]])
A = tf.constant([[[10, 20, 30], [100, 200, 300], [1000, 2000, 3000]],
                 [[50, 60, 70], [500, 600, 700], [5000, 6000, 7000]]])
print(torch_gather(A, I, 1))
tf.Tensor(
[[[ 10 200 30]
[ 100 2000 300]]
[[5000 600 7000]
[ 500 60 700]]], shape=(2, 2, 3), dtype=int32)
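As a side note, newer TensorFlow versions (2.4 and later, as far as I know) also ship tf.experimental.numpy.take_along_axis, which mirrors torch.gather's semantics directly:
import tensorflow as tf

A = tf.constant([[[10, 20, 30], [100, 200, 300], [1000, 2000, 3000]],
                 [[50, 60, 70], [500, 600, 700], [5000, 6000, 7000]]])
I = tf.constant([[[0, 1, 0], [1, 2, 1]],
                 [[2, 1, 2], [1, 0, 1]]])
# Equivalent to torch.gather(A, 1, I).
print(tf.experimental.numpy.take_along_axis(A, I, axis=1))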
You could also try this as an equivalent to torch.gather:
import numpy as np
import tensorflow as tf
import torch

# torch.gather equivalent
def tf_gather(x: tf.Tensor, indices: tf.Tensor, axis: int) -> tf.Tensor:
    # Full coordinates of every element of `indices` (the condition is always true).
    complete_indices = np.array(np.where(indices > -1))
    # Substitute the requested indices along the gather axis.
    complete_indices[axis] = tf.reshape(indices, [-1])
    # Convert multi-dimensional coordinates to flat indices, gather, and restore the shape.
    flat_ind = np.ravel_multi_index(tuple(complete_indices), x.shape)
    return tf.reshape(tf.gather(tf.reshape(x, [-1]), flat_ind), indices.shape)
# ======= test program ========
if __name__ == '__main__':
    a = np.random.rand(2, 5, 3, 4)
    dim = 2  # 0 <= dim < len(a.shape)
    ind = np.expand_dims(np.argmax(a, axis=dim), axis=dim)
    # ========== np: ground truth ==========
    np_max = np.expand_dims(np.max(a, axis=dim), axis=dim)
    # ========= torch: gather =========
    torch_max = torch.gather(torch.tensor(a), dim=dim, index=torch.tensor(ind))
    # ========= tensorflow: torch-like gather =========
    tf_max = tf_gather(tf.convert_to_tensor(a), axis=dim, indices=tf.convert_to_tensor(ind))
    keepdim = False
    if not keepdim:
        np_max = np.squeeze(np_max, axis=dim)
        torch_max = torch.squeeze(torch_max, dim=dim)
        tf_max = tf.squeeze(tf_max, axis=dim)
    # print('np_max:\n', np_max)
    # print('torch_max:\n', torch_max)
    # print('tf_max:\n', tf_max)
    assert np.allclose(np_max, torch_max.numpy()), '\33[1m\33[31mError with torch\33[0m'
    assert np.allclose(np_max, tf_max.numpy()), '\33[1m\33[31mError with tensorflow\33[0m'
    print('\33[1m\33[32mSuccess!\33[0m')
The problem I'm trying to solve is the one in the picture. Given a text sentence with word embeddings, and a fixed set of indexes for each sentence pointing to the words I want to keep, how do I slice the embeddings of interest?
Note: I cannot do it as a preprocess step because the embeddings are the result of several layers.
As a toy example, say that I have 2 input datasets, one containing the data itself as 2D tensors, and another one containing the indices of the words that I'm interested in. So for instance
NUM_SENTENCES = 2
NUM_ENTITIES_PER_REL = 3
LEN_SENTENCE = 5
NUM_H_T = 2
DIM_EMBEDDING = 2
indices = tf.constant([
    [1, 3],
    [0, 4]
])
data = tf.constant(np.reshape(np.arange(NUM_SENTENCES * LEN_SENTENCE * DIM_EMBEDDING),
                              [NUM_SENTENCES, LEN_SENTENCE, DIM_EMBEDDING]))
With the indices as stated, I want to retrieve elements 1 and 3 from the first sentence, and 0 and 4 from the second, resulting in
array([[[ 2, 3],
[ 6, 7]],
[[10, 11],
[18, 19]]])
I can obtain the desired result if I do:
selector = [[[idx, elem]
for elem in arr]
for idx, arr in enumerate(indices)]
tf.gather_nd(data, selector)
but this doesn't work within a model. Here is my code:
input_text = keras.Input(shape=(LEN_SENTENCE, DIM_EMBEDDING), name="input_sentence")
input_ent = keras.Input(shape=(NUM_ENTITIES_PER_REL, 2), dtype=tf.int32, name="entities_to_classify")

class Selector(layers.Layer):
    def __init__(self, **kwargs):
        super(Selector, self).__init__(**kwargs)

    def call(self, inputs):
        h_s = inputs[1]
        indexes = inputs[0]
        idxs = indexes.numpy()
        selector = [[[idx, elem]
                     for elem in arr]
                    for idx, arr in enumerate(idxs)]
        return tf.gather_nd(h_s, selector)

x = Selector(name="selector")([input_ent, input_text])
model = keras.Model(inputs=[input_ent, input_text], outputs=x, name='language_model')
keras.utils.plot_model(model, '/tmp/model.jpg', show_shapes=True)
and this is the result of executing it (I'm using tensorflow==2.0.0-beta1):
AttributeError: 'Tensor' object has no attribute 'numpy'
and I don't know how to solve this chicken-egg problem. Any ideas?
You can do that like this:
import tensorflow as tf
import numpy as np

NUM_SENTENCES = 2
NUM_ENTITIES_PER_REL = 3
LEN_SENTENCE = 5
NUM_H_T = 2
DIM_EMBEDDING = 2

with tf.Graph().as_default(), tf.Session() as sess:
    indices = tf.constant([
        [1, 3],
        [0, 4]
    ])
    data = tf.constant(np.reshape(np.arange(NUM_SENTENCES * LEN_SENTENCE * DIM_EMBEDDING),
                                  [NUM_SENTENCES, LEN_SENTENCE, DIM_EMBEDDING]))
    # Make first-dimension (batch) indices
    s = tf.shape(indices)
    idx0 = tf.tile(tf.expand_dims(tf.range(s[0]), 1), [1, s[1]])
    # Pair them with the given indices to make the full index
    idx_gather = tf.stack([idx0, indices], axis=-1)
    # Gather the result
    result = tf.gather_nd(data, idx_gather)
    print(sess.run(result))
    # [[[ 2  3]
    #   [ 6  7]]
    #
    #  [[10 11]
    #   [18 19]]]
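For completeness, here is a sketch of how the Selector layer itself could be written with purely symbolic ops, avoiding .numpy() entirely. It assumes the index input has shape (batch, k) and that your TensorFlow version supports the batch_dims argument of tf.gather (1.14+/2.x, as far as I know):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class Selector(layers.Layer):
    def call(self, inputs):
        indexes, h_s = inputs
        # For each sentence b, pick the rows h_s[b, indexes[b, i], :].
        return tf.gather(h_s, indexes, batch_dims=1)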
The Tensor can't be cast to NumPy; use the data instead:
idxs = indexes[0].numpy()
I'm generating samples in Tensorflow with tf.multinomial, and I'm looking for a way to return the probability associated with each randomly selected element. So in the following case:
logits = [[-1., 0., 1], [1, 1, 1], [0, 1, 2]]
samples = tf.multinomial(logits, 2)
with tf.Session() as sess:
    sess.run(samples)
Instead of having
[[1, 2], [0, 1], [1, 1]]
as result, I'd like to see something like
[[(1, 0.244728), (2, 0.66524)],
[(0, 0.33333), (1, 0.33333)],
[(1, 0.244728), (1, 0.244728)]]
Is there any way to achieve this?
I'm confused: does TensorFlow do some sort of transformation on the inside that turns your logits into probabilities? The multinomial distribution takes as parameters a set of positional probabilities that determine, probabilistically, the likelihood of each outcome (positionally) being sampled, i.e.
# this is all pseudocode
x = multinomial([.2, .3, .5])
y ~ x
# this will give a value of 0 20% of the time,
# a value of 1 30% of the time,
# and a value of 2 50% of the time
Therefore, your logits might not be probabilities themselves.
Looking at https://www.tensorflow.org/api_docs/python/tf/multinomial,
you see it states they are "unnormalized log probabilities", so if you can apply that transformation (i.e. normalize them via softmax), you have the probabilities.
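To make that concrete, here is a small sketch (using the logits from the question) showing that a softmax recovers exactly the probabilities the question expects, e.g. roughly 0.245 and 0.665 for the second and third entries of the first row:
import tensorflow as tf

logits = tf.constant([[-1., 0., 1.], [1., 1., 1.], [0., 1., 2.]])
# Softmax turns unnormalized log probabilities into probabilities.
probs = tf.nn.softmax(logits, axis=-1)
# First row: approximately [0.090, 0.245, 0.665]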
You can try tf.gather_nd. For example:
>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> probs = tf.constant([[0.5, 0.2, 0.1, 0.2], [0.6, 0.1, 0.1, 0.1]], dtype=tf.float32)
>>> idx = tf.multinomial(probs, 1)  # note: tf.multinomial treats its first argument as logits
>>> row_indices = tf.range(probs.get_shape()[0], dtype=tf.int64)
>>> full_indices = tf.stack([row_indices, tf.squeeze(idx)], axis=1)
>>> rs = tf.gather_nd(probs, full_indices)
Or you can use tf.distributions.Multinomial; the advantage is that you do not need to care about the batch_size as in the above code. It works with a varying batch_size when you set batch_size=None. Here is a simple example:
multinomial = tf.distributions.Multinomial(
    total_count=tf.constant(1, dtype=tf.float32),
    probs=probs)  # `probs` as defined in the snippet above
sampled_actions = multinomial.sample()  # sample one action for each item in the batch
predicted_actions = tf.argmax(sampled_actions, axis=-1)
action_probs = sampled_actions * probs  # the one-hot sample picks out its own probability
action_probs = tf.reduce_sum(action_probs, axis=-1)
I think this is what you want to do. I prefer the latter because it is more flexible and elegant.
I would like to compare two partitions/clusterings (P1 and P2) of a set S, where the two partitions have different numbers of clusters. Example:
S = [1, 2, 3, 4, 5, 6]
P1 = [[1, 2], [3,4], [5,6]]
P2 = [ [1,2,3,4], [5, 6]]
From what I have read, mutual information could be an approach, and it is implemented in scikit-learn. The definition does not state that the partitions must be of the same size (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html).
However, when I try to run my code I get an error due to the different sizes.
from sklearn import metrics
P1 = [[1, 2], [3,4], [5,6]]
P2 = [ [1,2,3,4], [5, 6]]
metrics.mutual_info_score(P1,P2)
ValueErrorTraceback (most recent call last)
<ipython-input-183-d5cb8d32ce7d> in <module>()
2 P2 = [ [1,2,3,4], [5, 6]]
3
----> 4 metrics.mutual_info_score(P1,P2)
/home/user/anaconda2/lib/python2.7/site-packages/sklearn/metrics/cluster/supervised.pyc in mutual_info_score(labels_true, labels_pred, contingency)
556 """
557 if contingency is None:
--> 558 labels_true, labels_pred = check_clusterings(labels_true, labels_pred)
559 contingency = contingency_matrix(labels_true, labels_pred)
560 contingency = np.array(contingency, dtype='float')
/home/user/anaconda2/lib/python2.7/site-packages/sklearn/metrics/cluster/supervised.pyc in check_clusterings(labels_true, labels_pred)
34 if labels_true.ndim != 1:
35 raise ValueError(
---> 36 "labels_true must be 1D: shape is %r" % (labels_true.shape,))
37 if labels_pred.ndim != 1:
38 raise ValueError(
ValueError: labels_true must be 1D: shape is (3, 2)
Is there a way to use scikit-learn and mutual information to see how close these partitions are? Otherwise, is there a way without using mutual information?
The error is in the format in which the information is passed to the function. The correct format is a list giving one label for each element of the global set being partitioned, in this case one label for each element in S. Each label should correspond to the cluster the element belongs to, so elements with the same label are in the same cluster. To solve the example:
S = [1, 2, 3, 4, 5, 6]
P1 = [[1, 2], [3,4], [5,6]]
P2 = [ [1,2,3,4], [5, 6]]
labs_1 = [ 1, 1, 2, 2, 3, 3]
labs_2 = [1, 1, 1, 1, 2, 2]
metrics.mutual_info_score(labs_1, labs_2)
The answer is then:
0.636514168294813
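If you'd rather not write the label lists by hand, a small helper (hypothetical, not part of scikit-learn) can derive them from the partition format; note that mutual_info_score is invariant to how the cluster labels are named:
from sklearn import metrics

S = [1, 2, 3, 4, 5, 6]
P1 = [[1, 2], [3, 4], [5, 6]]
P2 = [[1, 2, 3, 4], [5, 6]]

def partition_to_labels(S, partition):
    # Map each element to the index of the cluster that contains it.
    element_to_cluster = {}
    for cluster_id, cluster in enumerate(partition):
        for element in cluster:
            element_to_cluster[element] = cluster_id
    return [element_to_cluster[s] for s in S]

labs_1 = partition_to_labels(S, P1)  # [0, 0, 1, 1, 2, 2]
labs_2 = partition_to_labels(S, P2)  # [0, 0, 0, 0, 1, 1]
print(metrics.mutual_info_score(labs_1, labs_2))  # same score as above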
Alternatively, if we want to compute the mutual information directly on the partition format originally given, one can use the following code:
from __future__ import division
from sklearn import metrics
import numpy as np

S = [1, 2, 3, 4, 5, 6]
P1 = [[1, 2], [3, 4], [5, 6]]
P2 = [[1, 2, 3, 4], [5, 6]]
set_partition1 = [set(p) for p in P1]
set_partition2 = [set(p) for p in P2]

def prob_dist(clustering, cluster, N):
    return len(clustering[cluster]) / N

def prob_joint_dist(clustering1, clustering2, cluster1, cluster2, N):
    '''
    N(int): total number of elements.
    clustering1(list): first partition
    clustering2(list): second partition
    cluster1(int): index of a cluster of the first partition
    cluster2(int): index of a cluster of the second partition
    '''
    c1 = clustering1[cluster1]
    c2 = clustering2[cluster2]
    n_ij = len(set(c1).intersection(c2))
    return n_ij / N

def mutual_info(clustering1, clustering2, N):
    '''
    clustering1(list): first partition
    clustering2(list): second partition
    Note: for this to work under Python 2, division from __future__ must be imported first.
    '''
    n_clas = len(clustering1)
    n_com = len(clustering2)
    mutual_info = 0
    for i in range(n_clas):
        for j in range(n_com):
            p_i = prob_dist(clustering1, i, N)
            p_j = prob_dist(clustering2, j, N)
            R_ij = prob_joint_dist(clustering1, clustering2, i, j, N)
            if R_ij > 0:
                mutual_info += R_ij * np.log(R_ij / (p_i * p_j))
    return mutual_info

mutual_info(set_partition1, set_partition2, len(S))
which gives the same answer:
0.63651416829481278
Note that we are using the natural logarithm here, not log2; the code can easily be adapted, though.
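As a side note, the raw mutual information has no fixed upper bound, so if you want a score that is easier to interpret as "how close", scikit-learn also provides a normalized variant (a minimal example with the label lists from above):
from sklearn import metrics

labs_1 = [1, 1, 2, 2, 3, 3]
labs_2 = [1, 1, 1, 1, 2, 2]
# Normalized mutual information lies in [0, 1]; 1 means identical partitions.
print(metrics.normalized_mutual_info_score(labs_1, labs_2))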
I was trying to get the same result fitting lasso using Python's scikit-learn and R's glmnet. A helpful link
If I specify normalize=True in Python and standardize=T in R, they give me the same result.
Python:
import numpy as np
from sklearn.linear_model import Lasso

X = np.array([[1, 1, 2], [3, 4, 2], [6, 5, 2], [5, 5, 3]])
y = np.array([1, 0, 0, 1])
reg = Lasso(alpha=0.01, fit_intercept=True, normalize=True)
reg.fit(X, y)
np.hstack((reg.intercept_, reg.coef_))
Out[95]: array([-0.89607695, 0. , -0.24743375, 1.03286824])
R:
reg_glmnet = glmnet(X, y, alpha = 1, lambda = 0.02, standardize = T)
coef(reg_glmnet)
4 x 1 sparse Matrix of class "dgCMatrix"
s0
(Intercept) -0.8960770
V1 .
V2 -0.2474338
V3 1.0328682
However, if I don't want to standardize the variables and set normalize=False and standardize=F, they give me quite different results.
Python:
from sklearn.linear_model import Lasso

Z = np.array([[1, 1, 2], [3, 4, 2], [6, 5, 2], [5, 5, 3]])
y = np.array([1, 0, 0, 1])
reg = Lasso(alpha=0.01, fit_intercept=True, normalize=False)
reg.fit(Z, y)
np.hstack((reg.intercept_, reg.coef_))
Out[96]: array([-0.88 , 0.09384212, -0.36159299, 1.05958478])
R:
reg_glmnet = glmnet(X, y, alpha = 1, lambda = 0.02, standardize = F)
coef(reg_glmnet)
4 x 1 sparse Matrix of class "dgCMatrix"
s0
(Intercept) -0.76000000
V1 0.04441697
V2 -0.29415542
V3 0.97623074
What's the difference between "normalize" in Python's Lasso and "standardize" in R's glmnet?
Currently, with regard to the normalize parameter, the docs state: "If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False."
So evidently, normalize and standardize are not the same in sklearn.linear_model.Lasso. Having read the StandardScaler docs, I fail to pin down the exact difference, but the fact that there is one is implied by the provided description of the normalize parameter.
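As far as I can tell from the parameter descriptions, normalize=True centers each column and divides it by its l2 norm, whereas StandardScaler divides by the standard deviation; the two scalings differ by a factor of sqrt(n_samples), which also changes the effective penalty for a given alpha. A minimal sketch of the recommended StandardScaler route, using the same toy data as above (note the coefficients then live in the scaled feature space, unlike with normalize=True, which reported them back on the original scale):
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 1, 2], [3, 4, 2], [6, 5, 2], [5, 5, 3]])
y = np.array([1, 0, 0, 1])

# Scale explicitly, then fit with normalize=False (the default in recent versions).
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.01, fit_intercept=True))
pipe.fit(X, y)
print(pipe.named_steps['lasso'].coef_)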