Is there a sparse version of tf.multiply? - python

Does TensorFlow have a sparse element-wise multiplication, i.e. a sparse version of tf.multiply()?
I only found tf.sparse_tensor_dense_matmul(), but that is matrix multiplication, not element-wise multiplication.

The function you are probably looking for is tf.SparseTensor.__mul__, which is the method the * operator calls on a sparse tensor.
Additional details from the official documentation:
The output locations corresponding to the implicitly zero elements in the sparse tensor will be zero (i.e., will not take up storage space), regardless of the contents of the dense tensor (even if it's +/-INF and that INF*0 == NaN).
Limitation: this Op only broadcasts the dense side to the sparse side, but not the other direction.
Example:
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API
sess = tf.InteractiveSession()  # .eval() below needs a default session
sp_mat = tf.SparseTensor([[0,0],[0,2],[1,2],[2,1]], np.ones(4), [3,3])
const1 = tf.constant([[1,2,3],[4,5,6],[7,8,9]], dtype=tf.float64)
const2 = tf.constant(np.array([1,2,3]),dtype=tf.float64)
# sp_mat.__mul__(x) is what sp_mat * x calls; the result is still a SparseTensor
elementwise_result = sp_mat.__mul__(const1)
broadcast_result = sp_mat.__mul__(const2)
print("Sparse Matrix:\n",tf.sparse_tensor_to_dense(sp_mat).eval())
print("\n\nElementwise:\n",tf.sparse_tensor_to_dense(elementwise_result).eval())
print("\n\nBroadcast:\n",tf.sparse_tensor_to_dense(broadcast_result).eval())
Output:
Sparse Matrix:
[[ 1. 0. 1.]
[ 0. 0. 1.]
[ 0. 1. 0.]]
Elementwise:
[[ 1. 0. 3.]
[ 0. 0. 6.]
[ 0. 8. 0.]]
Broadcast:
[[ 1. 0. 3.]
[ 0. 0. 3.]
[ 0. 2. 0.]]
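For readers on TensorFlow 2.x, the same multiplication can be sketched in eager mode with the * operator. This is a minimal sketch that assumes TensorFlow 2.x is installed; tf.sparse.to_dense stands in for tf.sparse_tensor_to_dense, no session is needed, and the variable names are illustrative.

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

sp_mat = tf.sparse.SparseTensor(
    indices=[[0, 0], [0, 2], [1, 2], [2, 1]],
    values=np.ones(4),
    dense_shape=[3, 3],
)
dense = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=tf.float64)
row = tf.constant([1, 2, 3], dtype=tf.float64)

# SparseTensor * dense tensor dispatches to SparseTensor.__mul__ and stays sparse.
elementwise = sp_mat * dense
# The dense side is broadcast to the shape of the sparse tensor.
broadcast = sp_mat * row

print(tf.sparse.to_dense(elementwise).numpy())
print(tf.sparse.to_dense(broadcast).numpy())
```

Both results are SparseTensor objects, so only the four stored entries are ever multiplied.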

Related

How can I replicate numpy.roll for sparse matrix without converting it into dense matrix?

I am working on a problem that requires rolling the elements of a matrix. Below is an example that uses numpy to roll an array as desired. I want to do the same for a scipy sparse csr_matrix without converting it to a dense matrix, because in my actual use case the sparse matrix is very large.
The numpy version:
A=np.eye(3,3)
print(np.roll(A,[0,3]))
Outputs:
[[0. 0. 1.]
[1. 0. 0.]
[0. 1. 0.]]
The desired functionality must do something like this:
A = np.eye(3, 3)
A = sparse.csr_matrix(A)
print(sparse_roll(A, [0, 3]).todense())
Outputs:
[[0. 0. 1.]
[1. 0. 0.]
[0. 1. 0.]]
where sparse_roll is the function to be implemented.
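One possible way to approach this, offered as a sketch rather than a definitive answer: when np.roll is called without an axis, it rolls the flattened array, and a list of shifts is summed into one total shift. The same effect can be had on a sparse matrix by shifting the flat index of every stored entry. The body of sparse_roll below is an assumption about what the asker wants; it only covers the axis=None case used in the question.

```python
import numpy as np
from scipy import sparse

def sparse_roll(mat, shift):
    """Roll a sparse matrix like np.roll(dense, shift) with axis=None."""
    coo = mat.tocoo()
    n_rows, n_cols = coo.shape
    total = int(np.sum(shift))  # several shifts on the flattened array add up
    # Flat (row-major) position of every stored entry, shifted and wrapped around.
    flat = coo.row.astype(np.int64) * n_cols + coo.col
    flat = (flat + total) % (n_rows * n_cols)
    rows, cols = np.divmod(flat, n_cols)
    return sparse.csr_matrix((coo.data, (rows, cols)), shape=coo.shape)

A = sparse.csr_matrix(np.eye(3, 3))
print(sparse_roll(A, [0, 3]).todense())
```

Only the stored entries are touched, so the cost depends on the number of non-zeros rather than on the full matrix size.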

From 3D world into a 2D screen by lookat matrix?

I would like to get the 2D screen coordinates of a 3D point using a LookAt matrix. Is there a simple function to do this?
For example:
I get one matrix by lookAt:
[[ 1. 0. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 0. 1. -1.]
[ 0. 0. 0. 1.]]
And I have one 3D vector [1,0,1]
What is its "2D screen coordinates"?
Thanks a lot.
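A LookAt (view) matrix on its own only moves the point into camera space; getting pixel coordinates also needs a projection and a viewport size, neither of which the question gives. The sketch below assumes an OpenGL-style perspective projection and an 800x600 screen; perspective and world_to_screen are hypothetical helper names and the field of view and clip planes are made-up values. It does not clip points behind the camera.

```python
import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective matrix (an assumption; the question gives none)."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ])

def world_to_screen(point, view, proj, width, height):
    """Return (x, y) in pixels, or None if the point lies on the camera plane."""
    clip = proj @ view @ np.append(np.asarray(point, dtype=float), 1.0)
    if abs(clip[3]) < 1e-12:  # the perspective divide is undefined for w == 0
        return None
    ndc = clip[:3] / clip[3]
    x = (ndc[0] + 1.0) * 0.5 * width
    y = (1.0 - ndc[1]) * 0.5 * height  # flip y so the origin is the top-left corner
    return x, y

view = np.array([[1.0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, -1], [0, 0, 0, 1]])
proj = perspective(fov_y_deg=60.0, aspect=800 / 600, near=0.1, far=100.0)
print(world_to_screen([0, 0, -1], view, proj, 800, 600))
print(world_to_screen([1, 0, 1], view, proj, 800, 600))
```

With the matrix from the question, the vector [1,0,1] maps to camera-space z = 0, so under a perspective projection it has no screen position; a point in front of the camera, such as [0,0,-1], does.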

Accessing element in a python numpy.matrix

I seem to be stuck with something seemingly trivial: I need to access elements in a numpy.matrix. But the matrix doesn't behave as I expect:
>>> mymatrix
matrix([[0.02700243, 0. , 0. , ..., 0. , 0. ,
0. ]])
>>> type(mymatrix)
<class 'numpy.matrix'>
>>> mymatrix.shape
(1, 10000)
>>> mymatrix[0]
matrix([[0.02700243, 0. , 0. , ..., 0. , 0. ,
0. ]])
>>> mymatrix[0][0]
matrix([[0.02700243, 0. , 0. , ..., 0. , 0. ,
0. ]])
>>> mymatrix[0][0][0]
matrix([[0.02700243, 0. , 0. , ..., 0. , 0. ,
0. ]])
i.e. no matter whether I take the matrix itself, its [0] element, its [0][0] element, or its [0][0][0] element, I always get the same object. How is that possible?
According to NumPy Manual:
A matrix is a specialized 2-D array that retains its 2-D nature
through operations
And:
It is no longer recommended to use this class, even for linear
algebra. Instead use regular arrays. The class may be removed in the
future.
Maybe you could consider using a regular array instead. You can return your matrix as an array using:
mymatrix.A
mymatrix.A[0]
mymatrix.A[0][0]
Alternatively, for a matrix with this shape you can transpose it so that the first index selects the first element (the result is still a 1x1 matrix).
Try:
mymatrix.T[0]
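To make the behaviour concrete, here is a small sketch that uses a 1x4 stand-in for the (1, 10000) matrix from the question. It shows that a single integer index always returns another 2-D matrix, while a (row, column) pair or a conversion to a regular array gives access to individual elements.

```python
import numpy as np

# Small stand-in for the (1, 10000) matrix in the question.
mymatrix = np.matrix([[0.02700243, 0.0, 0.0, 0.0]])

# A single integer index keeps the result 2-D, so the shape never shrinks.
print(mymatrix[0].shape)
print(mymatrix[0][0].shape)

# A (row, column) pair selects one scalar.
print(mymatrix[0, 0])

# Converting to a regular ndarray restores the usual indexing behaviour.
arr = np.asarray(mymatrix)
print(arr[0].shape)
print(arr[0][0])

# .A1 flattens the matrix to a 1-D ndarray.
print(mymatrix.A1[0])
```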

How do I change column type in Python from int to object for sklearn?

I am really new to Python and scikit-learn (sklearn), and I am trying to load a dataset that consists of 7 attribute columns and 1 column with the classification (class/data target). One attribute contains the values [1,2,3,4,5], which actually mark a stage of something, so it is nominal, not numeric. But of course Python recognizes it as numerical data (int64), when in fact I want it to be treated as nominal data (object). How do I change the column type to nominal?
I have done the following.
print(data.dtypes)
data["col_name"]=data["col_name"].astype(object)  # numpy.object was removed in NumPy 1.24; the builtin object is equivalent
print(data.dtypes)
In the first print, data["col_name"] is still recognized as int64, and after the astype line it has changed to object. But it doesn't make any difference to the data: when I create a histogram with matplotlib, it still treats both X and Y as numbers instead of objects.
I have also read about One Hot Encoding and Label Encoding in the documentation, but I figured they are not what I need in my case. I wonder if I have misunderstood something or whether there is another solution.
Thanks
The sklearn documentation is thorough and worth reading, in particular the Preprocessing section on encoding categorical features.
Regarding categorical features represented as an array of integers, i.e. [1,2,3,4,5], it says:
Such integer representation can not be used directly with scikit-learn
estimators, as these expect continuous input, and would interpret the
categories as being ordered, which is often not desired (i.e. the set
of browsers was ordered arbitrarily). One possibility to convert
categorical features to features that can be used with scikit-learn
estimators is to use a one-of-K or one-hot encoding, which is
implemented in OneHotEncoder. This estimator transforms each
categorical feature with m possible values into m binary features,
with only one active.
So what you can do is convert your column into 5 new columns (5 in this case, since you have 5 possible values) using one-hot encoding.
Here is some working code. The input is a column of categorical values [1,2,3,4,5]; the output is a matrix with 5 columns, one for each of the 5 possible values:
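from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
# fit learns the categories present in the column
enc.fit([[1],[2],[3],[4],[5]])
# In the scikit-learn version used here, an interactive session echoed:
# OneHotEncoder(categorical_features='all', dtype='numpy.float64', handle_unknown='error', n_values='auto', sparse=True)
# transform returns a sparse matrix; toarray() makes it dense for printing
print(enc.transform([[1],[2],[3],[4],[5]]).toarray())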
Output:
[[ 1. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0.]
[ 0. 0. 1. 0. 0.]
[ 0. 0. 0. 1. 0.]
[ 0. 0. 0. 0. 1.]]
Say your categorical parameters were in this order: [1,3,2,5,4,3,2,1,3,4,2]. You would get this output:
[[ 1. 0. 0. 0. 0.]
[ 0. 0. 1. 0. 0.]
[ 0. 1. 0. 0. 0.]
[ 0. 0. 0. 0. 1.]
[ 0. 0. 0. 1. 0.]
[ 0. 0. 1. 0. 0.]
[ 0. 1. 0. 0. 0.]
[ 1. 0. 0. 0. 0.]
[ 0. 0. 1. 0. 0.]
[ 0. 0. 0. 1. 0.]
[ 0. 1. 0. 0. 0.]]
So this one column is converted into five columns.
Another answer suggests casting the column to strings instead:
print(data.dtypes)
data["col_name"]=data["col_name"].astype(str)
print(data.dtypes)
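If the data lives in a pandas DataFrame, as the question's code suggests, the column can also be marked as categorical and one-hot encoded directly in pandas. This is a minimal sketch with a made-up DataFrame; the "other" column and the values are placeholders, and pd.get_dummies is used here in place of OneHotEncoder.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataset.
data = pd.DataFrame({
    "col_name": [1, 3, 2, 5, 4],
    "other": [0.1, 0.2, 0.3, 0.4, 0.5],
})

# Tell pandas the stage number is a nominal value, not a quantity.
data["col_name"] = data["col_name"].astype("category")
print(data.dtypes)

# Expand the nominal column into one indicator column per category.
one_hot = pd.get_dummies(data, columns=["col_name"])
print(one_hot.columns.tolist())
```

The resulting frame has one indicator column per stage, which matches the one-of-K representation that scikit-learn estimators expect.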

Sklearn digits dataset

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
digits = datasets.load_digits()
print(digits.data)
classifier = svm.SVC(gamma=0.4, C=100)
x, y = digits.data[:-1], digits.target[:-1]
x = x.reshape(1,-1)
y = y.reshape(-1,1)
print((x))
classifier.fit(x, y)
###
print('Prediction:', classifier.predict(digits.data[-3]))
###
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
I have reshaped x and y as well, but I still get this error:
Found input variables with inconsistent numbers of samples: [1, 1796]
y is an array with 1796 elements, whereas x has many more. Why does it show 1 for x?
Actually scrap what I suggested below:
This link describes the general dataset API. The attribute data is a 2d array of each image, already flattened:
import sklearn.datasets
digits = sklearn.datasets.load_digits()
digits.data.shape
#: (1797, 64)
This is all you need to provide, no reshaping required. Similarly, the attribute target is a 1d array of each label:
digits.target.shape
#: (1797,)
No reshaping necessary. Just split into training and testing and run with it.
Try printing x.shape and y.shape. After the reshapes in the question, x has shape (1, 114944), which scikit reads as a single sample, while y has shape (1796, 1). When calling fit, scikit expects X and y to contain the same number of samples, i.e. the same first dimension.
The clue is that the arguments to the two reshape calls are the opposite way around:
x = x.reshape(1, -1)
y = y.reshape(-1, 1)
Maybe try:
x = x.reshape(-1, 1)
Completely unrelated to your question, but you're predicting on digits.data[-3] when the only element left out of the training set is digits.data[-1]. Not sure if that was intentional.
Regardless, it could be good to check your classifier over more results using the scikit metrics package. This page has an example of using it over the digits dataset.
The reshaping transforms each 8x8 matrix into a 1-dimensional vector, which can be used as a feature vector. You need to reshape the entire X array, not only the training data, since the samples you use for prediction need to have the same format.
The following code shows how:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
digits = datasets.load_digits()
classifier = svm.SVC(gamma=0.4, C=100)
x, y = digits.images, digits.target
# only reshape x, since each image is an 8x8 matrix and needs to be flattened
n_samples = len(digits.images)
x = x.reshape((n_samples, -1))
print("before reshape:" + str(digits.images[0]))
print("After reshape" + str(x[0]))
classifier.fit(x[:-2], y[:-2])
###
print('Prediction:', classifier.predict(x[-2].reshape(1, -1)))  # predict expects a 2D array of samples
###
plt.imshow(digits.images[-2], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
###
print('Prediction:', classifier.predict(x[-1].reshape(1, -1)))  # predict expects a 2D array of samples
###
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
It will output:
before reshape:[[ 0. 0. 5. 13. 9. 1. 0. 0.]
[ 0. 0. 13. 15. 10. 15. 5. 0.]
[ 0. 3. 15. 2. 0. 11. 8. 0.]
[ 0. 4. 12. 0. 0. 8. 8. 0.]
[ 0. 5. 8. 0. 0. 9. 8. 0.]
[ 0. 4. 11. 0. 1. 12. 7. 0.]
[ 0. 2. 14. 5. 10. 12. 0. 0.]
[ 0. 0. 6. 13. 10. 0. 0. 0.]]
After reshape[ 0. 0. 5. 13. 9. 1. 0. 0. 0. 0. 13. 15. 10. 15. 5.
0. 0. 3. 15. 2. 0. 11. 8. 0. 0. 4. 12. 0. 0. 8.
8. 0. 0. 5. 8. 0. 0. 9. 8. 0. 0. 4. 11. 0. 1.
12. 7. 0. 0. 2. 14. 5. 10. 12. 0. 0. 0. 0. 6. 13.
10. 0. 0. 0.]
It also gives a correct prediction for the last 2 images, which weren't used for training. You may, however, want a bigger split between the training and test sets.
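A larger held-out set can be sketched with train_test_split and scored with the metrics module mentioned above. The split ratio, random_state and gamma below are illustrative choices, not values taken from the answers.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# digits.data is already (n_samples, 64) and digits.target is (n_samples,),
# so no reshaping is needed before splitting.
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# gamma is an illustrative value, not the one used in the question.
classifier = svm.SVC(gamma=0.001, C=100)
classifier.fit(x_train, y_train)

# Score the classifier on images it has never seen.
accuracy = accuracy_score(y_test, classifier.predict(x_test))
print("Test accuracy:", accuracy)
```

Scoring on a quarter of the data gives a more reliable picture than checking one or two images by eye.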
