Issues printing an estimator.predict tensor

I am currently using TF 1.4, and I need help inspecting the predictions of a tf.contrib.factorization.KMeansClustering estimator. My current code looks like this:
    km = KMeansClustering(num_clusters=8, initial_clusters=KMeansClustering.KMEANS_PLUS_PLUS_INIT, model_dir=MODEL, relative_tolerance=0.01)
    result = km.train(input_fn=lambda: gen_input(body))
    input_fn = tf.estimator.inputs.pandas_input_fn(x={'x': tst}, shuffle=False)
    y = result.predict(input_fn)
where body and tst are pandas DataFrames. print(y) gives:
    <generator object Estimator.predict at 0x11ebecba0>
Trying things I found by searching, such as calling print(list(y)) or print(next(y)), or iterating through y like this:
    for i in y:
        ...
    for i in y.items():
        ...
    for i in enumerate(y):
        ...
gives the error TypeError: data must be either a numpy array or pandas DataFrame if pandas is installed; got dict. I can't find any other way to print this. Thanks.

There is too little code here to confirm what is wrong or missing, and a full stack trace would help. This answer may turn out to be well off the mark once you add more information.
Are you expecting the call to pandas_input_fn to return something other than what it does? It returns a function with signature () -> (dict of features, target). See the docs for details.
Also, you don't seem to be running a TensorFlow session. Until you do, all tensors and computations (predictions in your case) are just part of a graph; they only have values once a TF session runs them.
See these docs for more details.
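One detail that may explain why print(y) succeeds while every attempt to iterate fails: a Python generator does not run its body until the first item is requested, so an error raised inside it only appears on iteration. A minimal pure-Python sketch, where fake_predict is a hypothetical stand-in and not a TensorFlow function:

```python
def fake_predict(data):
    """Hypothetical stand-in for a generator-returning predict call."""
    # Nothing in this body runs until the caller asks for the first item.
    if isinstance(data, dict):
        raise TypeError("data must be a list; got dict")
    for item in data:
        yield item * 2


y = fake_predict({"x": [1, 2, 3]})  # no error yet: the body has not started
print(type(y).__name__)             # generator

try:
    next(y)                         # the body runs now, so the check fires
except TypeError as err:
    print("raised on iteration:", err)

# With valid input the same generator can be consumed with list().
print(list(fake_predict([1, 2, 3])))
```

If Estimator.predict behaves the same way, the TypeError in the question would originate in the input function when it is first evaluated, not in the printing code. The pandas_input_fn documentation describes x as a pandas DataFrame, so passing tst directly instead of {'x': tst} may be worth trying.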

Related

Is there a way of knowing if `.repeat`/`.batch`/`.shuffle` have been used on a tensorflow dataset?

I am given an already built TensorFlow dataset object (tf.data.Dataset) named data.
Is there a way to know whether repeat/batch/shuffle was called on this object by inspecting data (and possibly to get other information, such as the arguments passed to batch and repeat)?
(I assume eager execution.)
Edit 1: it seems like the str method carries some information. Looking into that.
Edit 2: the output_shapes attribute gives information on the batch size and shapes.

The only solution I could think of is to go into the TensorFlow source code. gen_dataset_ops.py is generated when building from source, so it can only be found locally.
Another file is dataset_ops.py, available at the link below. You can insert a print statement before the relevant function's return. For example, the shuffle function from dataset_ops.py:
    def shuffle(self, buffer_size, seed=None, reshuffle_each_iteration=None):
        """Randomly shuffles the elements of this dataset.
        ...
        """
        print('Dataset shuffled')  # inserted print here
        return ShuffleDataset(self, buffer_size, seed, reshuffle_each_iteration)
The Dataset object is wrapped in a DatasetV1Adapter, so you can't know anything about it in advance. The only difference in eager mode is that it supports explicit iteration, but it would be extremely inefficient to do something like
    array = np.random.rand(10)
    dataset = tf.data.Dataset.from_tensor_slices(array)
    if len([i for i in dataset]) != array.shape[0]:
        print('repeated')
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/data/ops/dataset_ops.py

What is the second argument of TensorFlow's tf.data.filter() that I find no documentation of?

I recently got a TypeError when using
    def lie_filter(line):
        return tf.equal(line['lie_id'], 2)
in
    dataset = (
        tf.data
        .TextLineDataset('shots.csv')
        .skip(1)
        .map(decode_line)
        .filter(lie_filter)
        .cache())
The exact error was TypeError: lie_filter() takes 1 positional argument but 2 were given.
Simply changing the function signature to lie_filter(line, x) made the error go away, and the filtering appears to work as intended. However, it left me wondering what this mysterious second argument is.
The TensorFlow manual for tf.data.filter() only specifies one argument. There are also numerous TensorFlow examples where filtering is done as in my attempt above; see, for example, imports85.py.
Printing the x inside lie_filter yields Tensor("arg12:0", shape=(), dtype=float32).
What is the second argument and where can I find documentation about it?
Thank you!

Okay, of course I finally got it just after submitting the question. As I suspected, this was my own doing. The function passed to map() returns a tuple of (features, label), so filter() calls the predicate with two arguments. The second argument is of course the label as a tensor.
Hopefully this is of help for someone from the future :)
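The behaviour can be imitated in plain Python. The sketch below is an analogy, not TensorFlow's implementation: apply_filter is a hypothetical helper that unpacks tuple elements into positional arguments, which is how a (features, label) element ends up as two arguments to the predicate.

```python
def apply_filter(elements, predicate):
    """Hypothetical helper: keep elements for which predicate is true.

    Tuple elements are unpacked into positional arguments, mirroring how
    a (features, label) pair reaches the predicate as two arguments.
    """
    kept = []
    for element in elements:
        args = element if isinstance(element, tuple) else (element,)
        if predicate(*args):
            kept.append(element)
    return kept


# Each element is a (features, label) tuple, like the output of map() above.
rows = [({"lie_id": 2}, 0.5), ({"lie_id": 1}, 0.7), ({"lie_id": 2}, 0.9)]


def lie_filter(features, label):
    # The second parameter receives the label; it is unused here.
    return features["lie_id"] == 2


print(apply_filter(rows, lie_filter))

# A one-argument predicate fails in the same way as in the question.
try:
    apply_filter(rows, lambda line: line["lie_id"] == 2)
except TypeError as err:
    print("one-argument predicate fails:", err)
```

If the label is not needed downstream, an alternative is to have the mapping function return only the features, in which case a one-argument predicate is enough.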

Typecasting error in TensorFlow

Suppose I am implementing a linear layer on some training data, here a 3-by-5 array of floats. The following code
    import tensorflow as tf
    import numpy as np
    weights = tf.Variable(np.random.uniform(0.0, 1.0, 3))
    bias = tf.Variable(0.0)
    trainingData = np.array(np.arange(15).astype(float).reshape(3,5))
    output = tf.expand_dims(weights, 0) @ trainingData + bias
produces a type error.
This can be fixed by changing the last line to
    tf.cast(tf.expand_dims(weights, 0) @ trainingData, tf.float32) + bias
OK, so it doesn't like adding a float32_ref to a float64, but it's OK with adding a float32_ref to a float32. But I must be doing something wrong, because I'm doing something very simple, and it's throwing an error. (I'm new to TensorFlow.) I understand why it didn't like what I wrote, but what basic mistake am I making that's causing this problem?
I'm looking for an answer like "Oh, you should never initialize bias with a float like 0.0, because..." "That will lead to typecasting errors more generally."

Oh, you should never use tf.Variable unless you have a very good reason. You should use tf.get_variable instead to avoid issues.
Oh, you should never use float64 as the data type unless you have a good reason. NumPy uses float64 by default, so you should write something like
    W = tf.get_variable("w", initializer=np.random.randn(3).astype(np.float32))
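The dtype mismatch can be reproduced and avoided in NumPy alone. This is a minimal sketch that assumes the same shapes as the question; the variable names weights32, training_data32 and bias32 are illustrative and not from the original post.

```python
import numpy as np

# NumPy's random generators and float conversions default to float64.
weights = np.random.uniform(0.0, 1.0, 3)
training_data = np.arange(15).astype(float).reshape(3, 5)
print(weights.dtype, training_data.dtype)  # float64 float64

# Casting once, when the arrays are created, keeps later arithmetic in float32.
weights32 = weights.astype(np.float32)
training_data32 = training_data.astype(np.float32)
bias32 = np.float32(0.0)

# (1, 3) @ (3, 5) gives a (1, 5) row of outputs, all in float32.
output = weights32[np.newaxis, :] @ training_data32 + bias32
print(output.dtype, output.shape)  # float32 (1, 5)
```

Casting the inputs up front means no tf.cast is needed at each arithmetic step, which is the point the answer makes about choosing float32 from the start.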

Making Prediction with tensorflow's estimator.DNNRegressor

I am quite new to TensorFlow, and in order to learn it I am trying to implement a very simple DNNRegressor that predicts the movement of an object in 2D, but I can't seem to get the predict function to work.
For this purpose I have some input data: the x and y coordinates of the object at a number of previous time steps. I want the output to be a reasonable estimate of the location of the object if it continues to move in the same direction at the same speed.
I am using TensorFlow version 1.8.0.
My regressor is defined like this:
    CSV_COLUMN_NAMES = ['X_0', 'X_1', 'X_2', 'X_3', 'X_4', 'Y_0', 'Y_1', 'Y_2', 'Y_3', 'Y_4', 'Y_5']
    my_feature_columns = []
    for key in columnNames:
        my_feature_columns.append(tf.feature_column.numeric_column(key=key))
    regressor = estimator.DNNRegressor(feature_columns=my_feature_columns,
                                       label_dimension=1,
                                       hidden_units=hidden_layers,
                                       model_dir=MODEL_PATH,
                                       dropout=dropout,
                                       config=test_config)
My input is, like the one in the TensorFlow tutorial on premade estimators, a dict with the column names as keys.
An example of this input can be seen here.
regressor.train(arguments) and regressor.evaluate(arguments) seem to work just fine, but predict does not.
Following the code on the TensorFlow site, I tried this:
    y_pred = regressor.predict(input_fn=eval_input_fn(X_test, labels=None, batch_size=1))
and that call seems to work as well.
The problem I'm facing now is that I can't get anything out of the y_pred object.
When I enter print(y_pred) I get <generator object Estimator.predict at 0x7fd9e8899888>, which suggests to me that I should be able to iterate over it, but
    for elem in y_pred:
        print(elem)
results in TypeError: unsupported callable.
Again, I'm quite new to this and I am sorry if the answer is obvious, but I would be very grateful if someone could tell me what I'm doing wrong here.

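The input_fn argument to regressor.predict should be a function, not the result of calling one. See the definition:
    input_fn: A function that constructs the features.
You need to pass a callable, for example by wrapping the call in a lambda:
    y_pred = regressor.predict(input_fn=lambda: eval_input_fn(X_test, labels=None, batch_size=1))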
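The difference between passing a function and passing the result of calling it can be seen without TensorFlow. In this sketch, make_inputs and run_predict are hypothetical stand-ins for an input function and for predict; they are not part of any library.

```python
from functools import partial


def make_inputs(data, batch_size):
    """Hypothetical input function: splits data into batches."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


def run_predict(input_fn):
    """Hypothetical consumer that, like predict, calls input_fn itself."""
    if not callable(input_fn):
        raise TypeError("unsupported callable")
    for batch in input_fn():
        yield sum(batch)


data = [1, 2, 3, 4]

# Wrong: make_inputs runs immediately, so a list is passed instead of a function.
try:
    list(run_predict(make_inputs(data, batch_size=2)))
except TypeError as err:
    print("error:", err)

# Right: defer the call with a lambda or functools.partial.
print(list(run_predict(lambda: make_inputs(data, batch_size=2))))
print(list(run_predict(partial(make_inputs, data, batch_size=2))))
```

Both the lambda and functools.partial produce a zero-argument callable, which is the shape of object the input_fn parameter is documented to expect.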

python statsmodels.predict does not work

    x=range(100)
    y=sin(x)
    result=sm.OLS(x,y).fit()
    result.predict(x)
gives:
    ValueError: matrices are not aligned
This is very simple code, and I'm not sure why it isn't working. I searched lots of forums but could not find an exact solution.

Quick answer:
I think you want x and y reversed: result = sm.OLS(y, x).fit()
The dependent variable (y) comes first, followed by the array of explanatory variables (x).
The call to predict works with statsmodels master, but in an older version you may need a 2-D x (with x as a NumPy array):
    result.predict(x[:,None])
to turn the explanatory variable into a column array. (I don't remember when this was changed for 1-D x.)
Note also that no constant/intercept is added automatically when we don't use the formula interface.
The predictions for the sample (training) data can also be accessed through result.fittedvalues.
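The two points about array shape and the missing intercept can be illustrated with NumPy alone. This sketch uses np.linalg.lstsq as a stand-in for the OLS fit, so it is not the statsmodels API; the names X_const, coef_const and fitted are illustrative.

```python
import numpy as np

x = np.arange(100, dtype=float)
y = np.sin(x)

# A 1-D array has shape (100,); indexing with None adds a column axis.
X = x[:, None]
print(x.shape, X.shape)  # (100,) (100, 1)

# Least squares without an intercept: y is approximated by slope * x.
coef_no_const, *_ = np.linalg.lstsq(X, y, rcond=None)

# Adding a column of ones fits an intercept as well.
X_const = np.column_stack([np.ones_like(x), x])
coef_const, *_ = np.linalg.lstsq(X_const, y, rcond=None)

fitted = X_const @ coef_const  # in-sample predictions
print(coef_no_const.shape, coef_const.shape, fitted.shape)  # (1,) (2,) (100,)
```

In statsmodels, sm.add_constant(x) serves the same purpose as the column of ones built here.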
