TensorFlow - numpy-like tensor indexing - python

In numpy, we can do this:
x = np.random.random((10,10))
a = np.random.randint(0,10,5)
b = np.random.randint(0,10,5)
x[a,b] # gives 5 entries from x, indexed according to the corresponding entries in a and b
When I try something equivalent in TensorFlow:
xt = tf.constant(x)
at = tf.constant(a)
bt = tf.constant(b)
xt[at,bt]
The last line gives a "Bad slice index tensor" exception. It seems TensorFlow doesn't support indexing like numpy or Theano.
Does anybody know if there is a TensorFlow way of doing this (indexing a tensor by arbitrary values)? I've seen the tf.nn.embedding part, but I'm not sure it can be used for this, and even if it can, it's a huge workaround for something this straightforward.
(Right now, I'm feeding the data from x as an input and doing the indexing in numpy, but I hoped to put x inside TensorFlow to get higher efficiency.)

You can actually do that now with tf.gather_nd. Let's say you have a matrix m like the following:
| 1 2 3 4 |
| 5 6 7 8 |
And you want to build a matrix r of size, let's say, 4x2, built from elements of m, like this:
| 3 6 |
| 2 7 |
| 5 3 |
| 1 1 |
Each element of r corresponds to a row and column of m, and you can have matrices rows and cols with these indices (zero-based, since we are programming, not doing math!):
       | 0 1 |         | 2 1 |
rows = | 0 1 |  cols = | 1 2 |
       | 1 0 |         | 0 2 |
       | 0 0 |         | 0 0 |
Which you can stack into a 3-dimensional tensor like this:
| | 0 2 | | 1 1 | |
| | 0 1 | | 1 2 | |
| | 1 0 | | 0 2 | |
| | 0 0 | | 0 0 | |
This way, you can get from m to r through rows and cols as follows:
import numpy as np
import tensorflow as tf
m = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
rows = np.array([[0, 1], [0, 1], [1, 0], [0, 0]])
cols = np.array([[2, 1], [1, 2], [0, 2], [0, 0]])
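# TensorFlow 1.x graph-mode API; in TensorFlow 2.x these live under tf.compat.v1
x = tf.placeholder('float32', (None, None))
idx1 = tf.placeholder('int32', (None, None))
idx2 = tf.placeholder('int32', (None, None))
# Stack the row and column indices along a new last axis so that each
# innermost pair is one (row, col) coordinate into x
result = tf.gather_nd(x, tf.stack((idx1, idx2), -1))
with tf.Session() as sess:
    r = sess.run(result, feed_dict={
        x: m,
        idx1: rows,
        idx2: cols,
    })
print(r)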
Output:
[[ 3. 6.]
[ 2. 7.]
[ 5. 3.]
[ 1. 1.]]
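To make the stacking step more concrete, here is a minimal NumPy-only sketch of the same lookup. It is not the TensorFlow code above, just an illustration of what the stacked index tensor contains. The names idx, r_fancy and r_stacked are made up for this example.

```python
import numpy as np

m = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
rows = np.array([[0, 1], [0, 1], [1, 0], [0, 0]])
cols = np.array([[2, 1], [1, 2], [0, 2], [0, 0]])

# Stack along a new last axis: shape (4, 2, 2).
# Each innermost pair idx[i, j] is one (row, col) coordinate into m.
idx = np.stack((rows, cols), axis=-1)

# NumPy fancy indexing performs the lookup directly from rows and cols...
r_fancy = m[rows, cols]

# ...and splitting the stacked tensor back apart gives the same result,
# which is the lookup tf.gather_nd does with the stacked indices.
r_stacked = m[idx[..., 0], idx[..., 1]]

print(r_fancy)
```

For the one-dimensional a and b from the question, the same idea should presumably carry over as tf.gather_nd(xt, tf.stack((at, bt), axis=-1)), since each stacked pair is again one (row, col) coordinate.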

LDGN's comment is correct. This is not possible at the moment, and is a requested feature. If you follow issue #206 on GitHub, you'll get updated if/when this becomes available. Many people would like this feature.

As of TensorFlow 0.11, basic indexing has been implemented. More advanced indexing (like boolean indexing) is still missing but is apparently planned for future versions.
Advanced indexing can be tracked at https://github.com/tensorflow/tensorflow/issues/4638

Related

Creating an image contour (1) in a zero matrix, numpy

I'm using OpenCV to find the contours of an object. The contours are in a matrix of shape (7873, 1, 2)
(e.g. the matrix test below) in the form [[[x1, y1]], [[x2, y2]], ...], where x and y are pixel indices of an image.
Is it possible, using numpy trickery, to pass a list of all coordinates of the contour and set them to 1?
I'd like to avoid loops as this is time-sensitive. Apart from numpy, is there another time-efficient way to do it?
zero = np.zeros((5, 5))
test = np.array([[[2,1]], [[3, 1]], [[1, 0]]])
zero[test] = 1
Desired output (for this example):
  x  0 1 2 3 4
y   ___________
0 |  0 1 0 0 0
1 |  0 0 1 1 0
2 |  0 0 0 0 0
3 |  0 0 0 0 0
4 |  0 0 0 0 0
You can do:
idx = test.reshape(-1,2).T
zero[idx[1], idx[0]] = 1
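As a self-contained sketch of why this works: the contour array stores (x, y) pairs, while a NumPy image is indexed as [row, column], i.e. [y, x]. Reshaping and transposing separates the x values from the y values, so they can be passed as two index arrays. The example reuses the small arrays from the question.

```python
import numpy as np

zero = np.zeros((5, 5))
test = np.array([[[2, 1]], [[3, 1]], [[1, 0]]])  # (x, y) pairs, shape (3, 1, 2)

# Drop the singleton axis and transpose: idx[0] holds all x, idx[1] all y.
idx = test.reshape(-1, 2).T  # shape (2, 3)

# Rows are y and columns are x, so y goes first.
zero[idx[1], idx[0]] = 1

print(zero)
```

By contrast, zero[test] = 1 treats every number in test as a row index, so it fills whole rows instead of individual pixels.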

Creating a value matrix in python

I have a dataset as follows:
d = {'dist': [100, 200, 200, 400],'id': [1, 2, 3, 4]}
df = pd.DataFrame(data= d)
I would like to create a value matrix over the ids,
with the calculation: dist(id1) - dist(id2)
null | 1 | 2 | 3 | 4
1 | 0 | 100 | 100 | 300
2 |-100 | 0 | 0 | 200
3 |-100 | 0 | 0 | 200
4 |-300 |-200 |-200 | 0
Any advice will be appreciated.
(Edit) Here's the simplified version via the beauty of numpy:
import numpy as np
d = {'dist': [100, 200, 200, 400],'id': [1, 2, 3, 4]}
a = np.array(d['dist']).reshape(1,-1)
b = np.array(a).reshape(-1,1)
# the solution
print(a - b)
# [[ 0 100 100 300]
# [-100 0 0 200]
# [-100 0 0 200]
# [-300 -200 -200 0]]
(Old Answer) You can do it with a little matrix algebra:
import numpy as np
d = {'dist': [100, 200, 200, 400],'id': [1, 2, 3, 4]}
a = np.array(d['dist']).reshape(1,-1)
b = np.array(a).reshape(-1,1)
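# some matrix algebra
c = b.dot(a)  # outer product: c[i, j] = dist[i] * dist[j]
e = c // a    # integer division keeps the result integral under Python 3
f = c // b
# the solution
print(f - e)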
# [[ 0 100 100 300]
# [-100 0 0 200]
# [-100 0 0 200]
# [-300 -200 -200 0]]
I'm not familiar with numpy, but you could create the matrix, given the existing data structure, using this mildly complicated dictionary comprehension:
matrix = {id: {v: d.get("dist")[i] - d.get("dist")[j] for j, v in enumerate(d.get("id"))} for i, id in enumerate(d.get("id"))}
Keys of the matrix are the columns, and keys of each column are the rows. You could probably write this in a much neater fashion, but this is a built-ins-only answer that conforms to your request.
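To tie the two answers together, here is a small sketch that builds the matrix once with NumPy broadcasting and once with built-ins only, then checks that both hold the same numbers. The names dist, mat and nested are chosen for this illustration and do not come from the answers above.

```python
import numpy as np

d = {'dist': [100, 200, 200, 400], 'id': [1, 2, 3, 4]}
dist = np.array(d['dist'])

# Broadcasting a row vector against a column vector:
# entry [r, c] is dist[c] - dist[r].
mat = dist[np.newaxis, :] - dist[:, np.newaxis]

# Built-ins only: nested[col_id][row_id] is dist(col_id) - dist(row_id).
nested = {
    col_id: {row_id: d['dist'][c] - d['dist'][r]
             for r, row_id in enumerate(d['id'])}
    for c, col_id in enumerate(d['id'])
}

# Both layouts agree entry by entry.
for r, row_id in enumerate(d['id']):
    for c, col_id in enumerate(d['id']):
        assert nested[col_id][row_id] == mat[r, c]

print(mat)
```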

Intraclass Correlation in Python Module?

I'm looking to calculate intraclass correlation (ICC) in Python. I haven't been able to find an existing module that has this feature. Is there an alternate name, or should I do it myself? I'm aware this question was asked a year ago on Cross Validated by another user, but there were no replies. I am looking to compare the continuous scores between two raters.
There are several implementations of the ICC in R. These can be used from Python via the rpy2 package. Example:
from rpy2.robjects import DataFrame, FloatVector, IntVector
from rpy2.robjects.packages import importr
from math import isclose
groups = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4,
4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8]
values = [1, 2, 0, 1, 1, 3, 3, 2, 3, 8, 1, 4, 6, 4, 3,
3, 6, 5, 5, 6, 7, 5, 6, 2, 8, 7, 7, 9, 9, 9, 9, 8]
r_icc = importr("ICC")
df = DataFrame({"groups": IntVector(groups),
"values": FloatVector(values)})
icc_res = r_icc.ICCbare("groups", "values", data=df)
icc_val = icc_res[0] # icc_val now holds the icc value
# check whether icc value equals reference value
print(isclose(icc_val, 0.728, abs_tol=0.001))
You can find an implementation at ICC or Brain_Data.icc
The pingouin library computes ICC in 6 different ways, along with the associated confidence intervals and p-values.
You can install it with pip install pingouin or conda install -c conda-forge pingouin
import pingouin as pg
data = pg.read_dataset('icc')
icc = pg.intraclass_corr(data=data, targets='Wine', raters='Judge',
ratings='Scores')
data.head()
| | Wine | Judge | Scores |
|---:|-------:|:--------|---------:|
| 0 | 1 | A | 1 |
| 1 | 2 | A | 1 |
| 2 | 3 | A | 3 |
| 3 | 4 | A | 6 |
| 4 | 5 | A | 6 |
| 5 | 6 | A | 7 |
| 6 | 7 | A | 8 |
| 7 | 8 | A | 9 |
| 8 | 1 | B | 2 |
| 9 | 2 | B | 3 |
icc
| | Type | Description | ICC | F | df1 | df2 | pval | CI95% |
|---:|:-------|:------------------------|------:|-------:|------:|------:|------------:|:-------------|
| 0 | ICC1 | Single raters absolute | 0.773 | 11.199 | 5 | 12 | 0.000346492 | [0.39, 0.96] |
| 1 | ICC2 | Single random raters | 0.783 | 27.966 | 5 | 10 | 1.42573e-05 | [0.25, 0.96] |
| 2 | ICC3 | Single fixed raters | 0.9 | 27.966 | 5 | 10 | 1.42573e-05 | [0.65, 0.98] |
| 3 | ICC1k | Average raters absolute | 0.911 | 11.199 | 5 | 12 | 0.000346492 | [0.65, 0.99] |
| 4 | ICC2k | Average random raters | 0.915 | 27.966 | 5 | 10 | 1.42573e-05 | [0.5, 0.99] |
| 5 | ICC3k | Average fixed raters | 0.964 | 27.966 | 5 | 10 | 1.42573e-05 | [0.85, 0.99] |
The R package psych has an implementation of the Intraclass Correlations (ICC) that calculates many types of variants including ICC(1,1), ICC(1,k), ICC(2,1), ICC(2,k), ICC(3,1) and ICC(3,k) plus other metrics.
This page has a good comparison between the different variants.
You can use the R ICC function via rpy2 package.
Example:
First install psych and lme4 in R:
install.packages("psych")
install.packages("lme4")
Calculate ICC coefficients in Python using rpy2:
import rpy2
from rpy2.robjects import IntVector, pandas2ri
from rpy2.robjects.packages import importr
psych = importr("psych")
values = rpy2.robjects.r.matrix(
IntVector(
[9, 2, 5, 8,
6, 1, 3, 2,
8, 4, 6, 8,
7, 1, 2, 6,
10, 5, 6, 9,
6, 2, 4, 7]),
ncol=4, byrow=True
)
icc = psych.ICC(values)
# Convert to Pandas DataFrame
icc_df = pandas2ri.rpy2py(icc[0])
Results:
type ICC F df1 df2 p lower bound upper bound
Single_raters_absolute ICC1 0.165783 1.794916 5.0 18.0 0.164720 -0.132910 0.722589
Single_random_raters ICC2 0.289790 11.026650 5.0 15.0 0.000135 0.018791 0.761107
Single_fixed_raters ICC3 0.714829 11.026650 5.0 15.0 0.000135 0.342447 0.945855
Average_raters_absolute ICC1k 0.442871 1.794916 5.0 18.0 0.164720 -0.884193 0.912427
Average_random_raters ICC2k 0.620080 11.026650 5.0 15.0 0.000135 0.071153 0.927240
Average_fixed_raters ICC3k 0.909311 11.026650 5.0 15.0 0.000135 0.675657 0.985891
Based on Brain_Data, I modified the code in order to calculate the correlation coefficients ICC(2,1), ICC(2,k), ICC(3,1) or ICC(3,k) for data input as a table Y (subjects in rows and repeated measurements in columns).
import numpy as np

def icc(Y, icc_type='ICC(2,1)'):
    ''' Calculate intraclass correlation coefficient
    ICC Formulas are based on:
    Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in
    assessing rater reliability. Psychological bulletin, 86(2), 420.
    icc1: x_ij = mu + beta_j + w_ij
    icc2/3: x_ij = mu + alpha_i + beta_j + (ab)_ij + epsilon_ij
    Code modified from nipype algorithms.icc
    https://github.com/nipy/nipype/blob/master/nipype/algorithms/icc.py
    Args:
        Y: The data Y are entered as a 'table' ie. subjects are in rows and repeated
            measures in columns
        icc_type: type of ICC to calculate. (ICC(2,1), ICC(2,k), ICC(3,1), ICC(3,k))
    Returns:
        ICC: (np.array) intraclass correlation coefficient
    '''
    [n, k] = Y.shape
    # Degrees of Freedom
    dfc = k - 1
    dfe = (n - 1) * (k - 1)
    dfr = n - 1
    # Sum Square Total
    mean_Y = np.mean(Y)
    SST = ((Y - mean_Y) ** 2).sum()
    # create the design matrix for the different levels
    x = np.kron(np.eye(k), np.ones((n, 1)))  # sessions
    x0 = np.tile(np.eye(n), (k, 1))  # subjects
    X = np.hstack([x, x0])
    # Sum Square Error
    predicted_Y = np.dot(np.dot(np.dot(X, np.linalg.pinv(np.dot(X.T, X))),
                                X.T), Y.flatten('F'))
    residuals = Y.flatten('F') - predicted_Y
    SSE = (residuals ** 2).sum()
    MSE = SSE / dfe
    # Sum square column effect - between columns
    SSC = ((np.mean(Y, 0) - mean_Y) ** 2).sum() * n
    MSC = SSC / dfc  # / n (without n in SPSS results)
    # Sum Square subject effect - between rows/subjects
    SSR = SST - SSC - SSE
    MSR = SSR / dfr
    if icc_type == 'icc1':
        # ICC(1,1) = (MSR - MSRW) / (MSR + (k-1) * MSRW),
        # where MSRW is the within-subject mean square
        raise NotImplementedError("This method isn't implemented yet.")
    elif icc_type == 'ICC(2,1)' or icc_type == 'ICC(2,k)':
        # ICC(2,1) = (mean square subject - mean square error) /
        # (mean square subject + (k-1)*mean square error +
        # k*(mean square columns - mean square error)/n)
        if icc_type == 'ICC(2,k)':
            # setting k = 1 turns the single-measure formula into the
            # average-measure one
            k = 1
        ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    elif icc_type == 'ICC(3,1)' or icc_type == 'ICC(3,k)':
        # ICC(3,1) = (mean square subject - mean square error) /
        # (mean square subject + (k-1)*mean square error)
        if icc_type == 'ICC(3,k)':
            k = 1
        ICC = (MSR - MSE) / (MSR + (k - 1) * MSE)
    else:
        raise ValueError("Unknown icc_type: %r" % (icc_type,))
    return ICC
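As a cross-check of the Shrout and Fleiss formulas quoted in the comments above, here is a minimal sketch that computes ICC(2,1) and ICC(3,1) directly from the two-way ANOVA mean squares, without a design matrix. The helper name icc_from_mean_squares is made up for this illustration, and the ratings matrix is the one passed to psych.ICC in the rpy2 example (6 subjects in rows, 4 raters in columns).

```python
import numpy as np

def icc_from_mean_squares(Y):
    """Hypothetical helper: return (ICC(2,1), ICC(3,1)) for an n x k table."""
    Y = np.asarray(Y, dtype=float)
    n, k = Y.shape
    grand_mean = Y.mean()

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_total = ((Y - grand_mean) ** 2).sum()
    ss_rows = k * ((Y.mean(axis=1) - grand_mean) ** 2).sum()  # subjects
    ss_cols = n * ((Y.mean(axis=0) - grand_mean) ** 2).sum()  # raters
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares.
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    icc21 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc31 = (msr - mse) / (msr + (k - 1) * mse)
    return icc21, icc31

ratings = np.array([[9, 2, 5, 8],
                    [6, 1, 3, 2],
                    [8, 4, 6, 8],
                    [7, 1, 2, 6],
                    [10, 5, 6, 9],
                    [6, 2, 4, 7]])

icc21, icc31 = icc_from_mean_squares(ratings)
print(icc21, icc31)
```

The two values should land close to the ICC2 and ICC3 rows of the psych results table shown earlier, which is a quick way to sanity-check any hand-written implementation.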

Matrix multiplication with SFrame and SArray with Graphlab and/or Numpy

Given a graphlab.SArray named coef:
+-------------+----------------+
| name | value |
+-------------+----------------+
| (intercept) | 87910.0724924 |
| sqft_living | 315.403440552 |
| bedrooms | -65080.2155528 |
| bathrooms | 6944.02019265 |
+-------------+----------------+
[4 rows x 2 columns]
And a graphlab.SFrame (shown below first 10) named x:
+-------------+----------+-----------+-------------+
| sqft_living | bedrooms | bathrooms | (intercept) |
+-------------+----------+-----------+-------------+
| 1430.0 | 3.0 | 1.0 | 1 |
| 2950.0 | 4.0 | 3.0 | 1 |
| 1710.0 | 3.0 | 2.0 | 1 |
| 2320.0 | 3.0 | 2.5 | 1 |
| 1090.0 | 3.0 | 1.0 | 1 |
| 2620.0 | 4.0 | 2.5 | 1 |
| 4220.0 | 4.0 | 2.25 | 1 |
| 2250.0 | 4.0 | 2.5 | 1 |
| 1260.0 | 3.0 | 1.75 | 1 |
| 2750.0 | 4.0 | 2.0 | 1 |
+-------------+----------+-----------+-------------+
[1000 rows x 4 columns]
How do I manipulate SArray and SFrame such that the multiplication will return a single vector SArray that has the first row as computed as below?:
87910.0724924 * 1
+ 315.403440552 * 1430.0
+ -65080.2155528 * 3.0
+ 6944.02019265 * 1.0
= 350640.36601600994
I'm currently doing silly things: converting the SFrame / SArray into lists and then converting those into numpy arrays to do np.multiply. Even after converting into numpy arrays, it's not giving the right matrix-vector multiplication. My current attempt:
import numpy as np
coef # as shown in the SArray above.
x # as shown in the SFrame above.
intercept = list(x['(intercept)'])
sqftliving = list(x['sqft_living'])
bedrooms = list(x['bedrooms'])
bathrooms = list(x['bathrooms'])
x_new = np.column_stack((intercept, sqftliving, bedrooms, bathrooms))
coef_new = np.array(list(coef['value']))
np.multiply(coef_new, x_new)
(wrong) [out]:
[[ 87910.07249236 451026.91998949 -195240.64665846 6944.02019265]
[ 87910.07249236 930440.14962867 -260320.86221128 20832.06057795]
[ 87910.07249236 539339.88334408 -195240.64665846 13888.0403853 ]
...,
[ 87910.07249236 794816.67019127 -260320.86221128 17360.05048162]
[ 87910.07249236 728581.94767533 -260320.86221128 17360.05048162]
[ 87910.07249236 321711.50936313 -130160.43110564 5208.01514449]]
The output of my attempt is wrong too; it should return a single vector of scalar values. There must be an easier way to do it.
How do I manipulate SArray and SFrame such that the multiplication will return a single vector SArray that has the first row as computed as below?
And with numpy DataFrames, how should one perform the matrix-vector multiplication?
I think your best bet is to convert both the SFrame and SArray to numpy arrays and use the numpy dot method.
import graphlab
sf = graphlab.SFrame({'a': [1., 2.], 'b': [3., 5.], 'c': [7., 11]})
sa = graphlab.SArray([1., 2., 3.])
X = sf.to_dataframe().values
y = sa.to_numpy()
ans = X.dot(y)
I'm using simpler data here than what you have, but this should work for you as well. The only complication I can see is that you have to make sure the values in your SArray are in the same order as the columns in your SFrame (in your example they aren't).
I think this can be done with an SFrame apply as well, but unless you have a lot of data, the dot product route is probably simpler.
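The column-order caveat can be handled explicitly. The sketch below uses plain NumPy arrays and Python lists as stand-ins for the SFrame and SArray (so no graphlab calls are involved), with the first two rows and the coefficients copied from the question. The variable names are assumptions made for this illustration.

```python
import numpy as np

# Coefficients in the order shown in the question.
coef_names = ["(intercept)", "sqft_living", "bedrooms", "bathrooms"]
coef_values = np.array([87910.0724924, 315.403440552,
                        -65080.2155528, 6944.02019265])

# Feature matrix in the SFrame's column order, which differs from coef_names.
x_columns = ["sqft_living", "bedrooms", "bathrooms", "(intercept)"]
X = np.array([[1430.0, 3.0, 1.0, 1.0],
              [2950.0, 4.0, 3.0, 1.0]])

# For each coefficient, find the position of its column inside X.
order = [x_columns.index(name) for name in coef_names]
X_aligned = X[:, order]

pred = X_aligned.dot(coef_values)
print(pred)
```

Reordering by name rather than by position means the product stays correct even if the columns of the feature table are later rearranged.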
To perform linear algebra operations on an SArray and an SFrame, you first need to convert them to NumPy arrays. Make sure you get the right dimensions and column order.
(I have coef SArray and features SFrame which is exactly your x)
In [15]: coef = coef.to_numpy()
In [17]: features = features.to_numpy()
Now coef and features are both Numpy arrays. So now multiplying them is as easy as:
In [23]: prod = numpy.dot(features, coef)
In [24]: print prod
[ 350640.36601601 778861.42048755 445897.34956322 641765.45839626
243403.19622833 671306.27500907 1174215.7748441 554607.00200482
302229.79626666 708836.7121845 ]
In [25]: prod.shape
Out[25]: (10,)
In NumPy, multiply() and * perform element-wise multiplication, whereas dot() performs matrix multiplication, which is exactly what you need.
Besides, your output
[[ 87910.07249236 451026.91998949 -195240.64665846 6944.02019265]
[ 87910.07249236 930440.14962867 -260320.86221128 20832.06057795]
[ 87910.07249236 539339.88334408 -195240.64665846 13888.0403853 ]
...,
[ 87910.07249236 794816.67019127 -260320.86221128 17360.05048162]
[ 87910.07249236 728581.94767533 -260320.86221128 17360.05048162]
[ 87910.07249236 321711.50936313 -130160.43110564 5208.01514449]]
is only half wrong. If you sum the values in each row, you get the corresponding element of the result vector. For the first row:
In [26]: 87910.07249236 + 451026.91998949 + (-195240.64665846) + 6944.02019265
Out[26]: 350640.3660160399
But dot() does all this for you, so you don't need to worry.
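The relationship between the two operations can be checked in a few lines. This is a small sketch using the first two rows from the question, with the columns already arranged to match the coefficient order. The variable names are illustrative.

```python
import numpy as np

# Order: (intercept), sqft_living, bedrooms, bathrooms
coef = np.array([87910.0724924, 315.403440552, -65080.2155528, 6944.02019265])
X = np.array([[1.0, 1430.0, 3.0, 1.0],
              [1.0, 2950.0, 4.0, 3.0]])

# Element-wise: coef is broadcast across every row, shape stays (2, 4).
elementwise = np.multiply(X, coef)

# Matrix-vector product: one number per row, shape (2,).
pred = X.dot(coef)

# Summing each row of the element-wise product gives the dot product.
assert np.allclose(elementwise.sum(axis=1), pred)

# The @ operator (Python 3.5+) is equivalent to dot() here.
assert np.allclose(X @ coef, pred)

print(pred)
```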
P.S. Are you taking the Machine Learning Specialization? Me too, that's how I know this :-)

NumPy Tensor / Kronecker product of matrices coming out shuffled

I'm trying to compute the tensor product (update: what I wanted was actually called the Kronecker product, and this naming confusion was why I couldn't find np.kron) of multiple matrices, so that I can apply transformations to vectors that are themselves the tensor product of multiple vectors. I'm running into trouble with flattening the result correctly.
For example, say I want to compute the tensor product of [[0,1],[1,0]] against itself. The result should be something like:
| 0*|0,1| 1*|0,1| |
| |1,0| |1,0| |
| |
| 1*|0,1| 0*|0,1| |
| |1,0| |1,0| |
which I then want to flatten to:
| 0 0 0 1 |
| 0 0 1 0 |
| 0 1 0 0 |
| 1 0 0 0 |
Unfortunately, the things I try all either fail to flatten the matrix or flatten it too much or permute the entries so that some columns are empty. More specifically, the output of the python program:
import numpy as np
flip = np.matrix([[0, 1], [1, 0]])
print(np.tensordot(flip, flip, axes=0))
print(np.reshape(np.tensordot(flip, flip, axes=0), (4, 4)))
is
[[[[0 0]
   [0 0]]
  [[0 1]
   [1 0]]]
 [[[0 1]
   [1 0]]
  [[0 0]
   [0 0]]]]
[[0 0 0 0]
 [0 1 1 0]
 [0 1 1 0]
 [0 0 0 0]]
Neither of which is what I want.
There are a lot of other questions similar to this one, but the things suggested in them haven't worked (or maybe I missed the ones that work). Maybe "tensor product" means something slightly different than I thought; but the example above should make it clear.
From the answers to this and this question, I learned what you want is called the "Kronecker product". It's actually built into Numpy, so just do:
np.kron(flip, flip)
But if you want to make the reshape approach work, first rearrange the rows in the tensor:
flip = [[0,1],[1,0]]
tensor4d = np.tensordot(flip, flip, axes=0)
print(tensor4d.swapaxes(2, 1).reshape((4, 4)))
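To see why the axis swap is needed, note that tensordot with axes=0 produces T[i, j, k, l] = A[i, j] * B[k, l], while the Kronecker product wants rows indexed by (i, k) and columns by (j, l). The following sketch checks the equivalence, both for flip and for a pair of non-square, non-symmetric matrices chosen arbitrarily for this illustration.

```python
import numpy as np

flip = np.array([[0, 1], [1, 0]])

kron = np.kron(flip, flip)

# Outer product has axes (i, j, k, l); swap to (i, k, j, l), then merge pairs.
via_tensordot = np.tensordot(flip, flip, axes=0).swapaxes(1, 2).reshape(4, 4)

# The same reordering written as an einsum.
via_einsum = np.einsum('ij,kl->ikjl', flip, flip).reshape(4, 4)

assert np.array_equal(kron, via_tensordot)
assert np.array_equal(kron, via_einsum)

# A less symmetric check: A is 2x3, B is 2x2, so the result is 4x6.
A = np.arange(6).reshape(2, 3)
B = np.array([[1, 2], [3, 4]])
general = np.tensordot(A, B, axes=0).swapaxes(1, 2).reshape(4, 6)
assert np.array_equal(general, np.kron(A, B))

print(kron)
```

Without the swap, the reshape merges (i, j) into rows and (k, l) into columns, which is the shuffled matrix shown in the question.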
