tf.constant vs tf.placeholder - python

I am going through Andrew Ng's deep learning course and I don't understand the basic purpose of using constants. When placeholders can do the trick, why do we need constants? Suppose I need to compute a function: the same thing can be done with constants as well as with placeholders. I am very confused, and would be really grateful if anyone could shed some light.

Constants and placeholders are both nodes in the computation graph with zero inputs and one output -- that is, they both represent fixed values.
The difference is when you as the programmer specify those values. With a constant, the value is a part of the computation graph itself, specified when the constant is created: tf.constant(4), for instance. With a placeholder, every time you run the computation graph, you can feed in a different value in your feed_dict.
In machine learning, placeholders are usually used for nodes that hold data, because we may want to run the same graph again and again, in a loop, with different parts of our dataset. (This would be impossible using constants.) People also use placeholders for parameters that change during training, like the learning rate. (Training generally involves running your computation graph over and over again with different placeholder values.) Constants are used only for things that are actually constant. For those things, we don't want to use placeholders, because we don't want to have to specify them over and over every time we run our graph.
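As a minimal sketch of that difference (TensorFlow 1.x-style graph execution; the values here are just for illustration):
import tensorflow as tf
# The constant's value is baked into the graph when it is created.
c = tf.constant(4.0)
# The placeholder has no value yet; it is filled in at run time via feed_dict.
x = tf.placeholder(tf.float32)
y = c * x
with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: 2.0}))   # 8.0
    print(sess.run(y, feed_dict={x: 10.0}))  # 40.0 -- same graph, different input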
If you're curious, this Jupyter notebook has an in-depth explanation of the computation graph and the role played by placeholders, constants, and variables: https://github.com/kevinjliang/Duke-Tsinghua-MLSS-2017/blob/master/01B_TensorFlow_Fundamentals.ipynb

As their names indicate, a placeholder does not have any fixed value; it just 'holds a place for a tensor' that is needed in the computation graph. A constant, on the other hand, also holds a tensor, but one with a fixed value. A constant does not change its value during its lifetime (not just during a session): once defined (at programming time), it stays fixed. A placeholder, by contrast, does not carry any value at graph-definition time; it gets its value fed in when the session run starts. In fact, all placeholders must get their values in this manner:
session.run(a_variable, feed_dict={a_placeholder: [1.0, 2.1]})
Now one might wonder how a placeholder is different from a tf.Variable. The difference is that a placeholder can't be evaluated on its own in a session, the way a variable can be:
session.run(a_tf_variable)
Typical use of placeholders is for input nodes, where we feed in the values for different inputs (and we don't expect them to be evaluated on their own). Typical use of constants is for holding values such as pi, or the areas of geographical blocks/districts in a population study.
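A short sketch of that behaviour (again TensorFlow 1.x; the variable and placeholder here just mirror the snippets above):
import tensorflow as tf
a_tf_variable = tf.Variable(3.0)
a_placeholder = tf.placeholder(tf.float32, shape=[2])
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    print(session.run(a_tf_variable))  # works: 3.0
    # session.run(a_placeholder)       # would fail: a placeholder must be fed
    print(session.run(a_placeholder, feed_dict={a_placeholder: [1.0, 2.1]}))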


Can Doc2Vec training results change with the same input data and the same parameters?

I'm using Doc2Vec from the gensim library to find similar movies, with a movie's name as input.
from gensim.models import doc2vec
model = doc2vec.Doc2Vec(vector_size=100, alpha=0.025, min_alpha=0.025, window=5)
model.build_vocab(tagged_corpus_list)
model.train(tagged_corpus_list, total_examples=model.corpus_count, epochs=50)
I set the parameters like this, and didn't change the preprocessing of the input data or the original data itself.
similar_doc = model.dv.most_similar(input)
I also used this code to find the most similar movie.
When I re-ran the code to train the model, the most similar movie changed, and so did its score.
Is this possible? Why? If so, how can I fix the training result?
Yes, this sort of change from run to run is normal. It's well-explained in question 11 of the Gensim FAQ:
Q11: I've trained my Word2Vec / Doc2Vec / etc model repeatedly using the exact same text corpus, but the vectors are different each time. Is there a bug or have I made a mistake? (*2vec training non-determinism)
Answer: The *2vec models (word2vec, fasttext, doc2vec…) begin with random initialization, then most modes use additional randomization during training. (For example, the training windows are randomly truncated as an efficient way of weighting nearer words higher. The negative examples in the default negative-sampling mode are chosen randomly. And the downsampling of highly-frequent words, as controlled by the sample parameter, is driven by random choices. These behaviors were all defined in the original Word2Vec paper's algorithm description.)
Even when all this randomness comes from a pseudorandom-number-generator that's been seeded to give a reproducible stream of random numbers (which gensim does by default), the usual case of multi-threaded training can further change the exact training-order of text examples, and thus the final model state. (Further, in Python 3.x, the hashing of strings is randomized each re-launch of the Python interpreter - changing the iteration ordering of vocabulary dicts from run to run, and thus making even the same string-of-random-number-draws pick different words in different launches.)
So, it is to be expected that models vary from run to run, even trained on the same data. There's no single "right place" for any word-vector or doc-vector to wind up: just positions that are at progressively more-useful distances & directions from other vectors co-trained inside the same model. (In general, only vectors that were trained together in an interleaved session of contrasting uses become comparable in their coordinates.)
Suitable training parameters should yield models that are roughly as useful, from run-to-run, as each other. Testing and evaluation processes should be tolerant of any shifts in vector positions, and of small "jitter" in the overall utility of models, that arises from the inherent algorithm randomness. (If the observed quality from run-to-run varies a lot, there may be other problems: too little data, poorly-tuned parameters, or errors/weaknesses in the evaluation method.)
You can try to force determinism, by using workers=1 to limit training to a single thread – and, if in Python 3.x, using the PYTHONHASHSEED environment variable to disable its usual string hash randomization. But training will be much slower than with more threads. And, you'd be obscuring the inherent randomness/approximateness of the underlying algorithms, in a way that might make results more fragile and dependent on the luck of a particular setup. It's better to tolerate a little jitter, and use excessive jitter as an indicator of problems elsewhere in the data or model setup – rather than impose a superficial determinism.
If the change between runs is small – nearest neighbors mostly the same, with a few in different positions – it's best to tolerate it.
If the change is big, there's likely some other problem, like insufficient training data or poorly-chosen parameters.
Notably, min_alpha=0.025 isn't a sensible value - the training is supposed to use a gradually-decreasing value, and the usual default (min_alpha=0.0001) usually doesn't need changing. (If you copied this from an online example: that's a bad example! Don't trust that site unless it explains why it's doing an odd thing.)
Increasing the number of training epochs, from the default epochs=5 to something like 10 or 20 may also help make run-to-run results more consistent, especially if you don't have plentiful training data.
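As a rough sketch of those two suggestions (reusing the tagged_corpus_list from the question; the exact epoch count is illustrative, not tuned):
from gensim.models import doc2vec
# Leave min_alpha at its small default so the learning rate decays properly,
# and give the model more passes over the data.
model = doc2vec.Doc2Vec(vector_size=100, window=5, epochs=20)
model.build_vocab(tagged_corpus_list)
model.train(tagged_corpus_list, total_examples=model.corpus_count, epochs=model.epochs)
# If you really need run-to-run determinism (slower, and usually not worth it):
# model = doc2vec.Doc2Vec(vector_size=100, window=5, epochs=20, workers=1, seed=42)
# ...and launch Python with a fixed PYTHONHASHSEED.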

How can I prevent PyTorch from making little changes to my assigned values

PyTorch makes little changes to my assigned values, which causes really different results in my neural network. E.g.:
import torch
a = [234678.5462495405945]
b = torch.tensor(a)
print(b.item())
The output is:
234678.546875
The little change PyTorch made to my variable a caused an entirely different result in my neural network. My neural network is a very sensitive one. How can I prevent PyTorch from making little changes to assigned values?
Your question is pretty broad; you haven't shown us your network. That means none of us can address the real issue. But the code sample you show has a more limited scope: why is PyTorch changing my floats?
PyTorch by default uses single-precision floating point (nowadays called binary32). Python by default uses double-precision floating point (nowadays called binary64). When you convert from a Python float to a PyTorch FloatTensor, you lose precision. (This is called rounding.)
If you want, you can specify the data type, but then your entire network will have to be converted to binary64.
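If you do decide to run everything in double precision, here are a couple of sketch-level options (not a recommendation):
import torch
# Option 1: make newly created floating-point tensors default to float64.
torch.set_default_dtype(torch.float64)
# Option 2: convert an existing model and its inputs explicitly.
# model = model.double()
# inputs = inputs.double()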
Just for your example:
import torch
a = 234678.5462495405945
b = torch.tensor(a, dtype=torch.float64)
print(b.item())
# 234678.54624954058
If your network is that sensitive, you probably have bigger problems. You're likely vastly overfitting, or you're too focused on one training example. A lot of work has been done on quantizing networks and showing that performance holds up as you use lower-precision numbers.

Reinforcement learning DQN environment structure

I am wondering how best to feed the changes my DQN agent makes to its environment back to the agent itself.
I have a battery model in which the agent observes a time-series forecast of 17 steps and 5 features. It then decides whether to charge or discharge.
I want to include its current state of charge (empty, half full, full, etc.) in its observation space (i.e. somewhere within the (17, 5) dataframes I am feeding it).
I have several options: I can set a whole column to the state-of-charge value, set a whole row, or flatten the whole dataframe and set one element to the state-of-charge value.
Are any of these unwise? It seems a little rudimentary to me to set a whole column to a single value, but should it actually impact performance? I am wary of flattening the whole thing, as I plan to use either conv or LSTM layers (although the current model is just dense layers).
You would not want to add unnecessary, repetitive features to the state representation, as they might hamper your RL agent's convergence later when you want to scale your model to larger input sizes (if that is in your plan).
Also, how much information to include in the state representation is mostly an experimental decision. The best way to start would be to provide just a single value for the battery state, as in the sketch below. If the model does not converge, you could then try the other options you mentioned in your question.
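For example, instead of repeating the value across a row or column, you could keep the (17, 5) forecast as-is and append the state of charge as a single extra feature. A sketch with made-up shapes and names:
import numpy as np
forecast = np.random.rand(17, 5)   # stand-in for your 17-step, 5-feature forecast
state_of_charge = 0.5              # e.g. half full, scaled to [0, 1]
# Option A: flatten and append one scalar (fine for dense layers).
obs_flat = np.concatenate([forecast.ravel(), [state_of_charge]])
# Option B (the fallback mentioned above): keep the 2-D shape for conv/LSTM
# layers by repeating the value down one extra column.
soc_column = np.full((17, 1), state_of_charge)
obs_2d = np.concatenate([forecast, soc_column], axis=1)   # shape (17, 6)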

<lifelines> Solving Cox Proportional Hazard after creating interaction variable with time

I am using the lifelines package to do Cox regression. After fitting the model, I checked the CPH assumptions for possible violations, and it returned some problematic variables along with suggested solutions.
One of the solutions I would like to try is the one suggested here:
https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Introduce-time-varying-covariates
However, the example there uses CoxTimeVaryingFitter which, unlike CoxPHFitter, does not have a concordance score to help me gauge model performance. Additionally, CoxTimeVaryingFitter does not have a check_assumptions feature. Does this mean that by putting the data into episodic format, all the assumptions are automatically satisfied?
Alternatively, after reading a SAS textbook on survival analysis, it seemed like their solution is to create the interaction term directly (multiplying the problematic variable by the survival time) without converting to the episodic format (as shown in the link). This way, I was hoping to keep using CoxPHFitter because of its model-scoring capability.
However, after trying this alternative, when I call check_assumptions again on the model with the time-interaction variable, the CPH assumption on the time-interaction variable is still violated.
Now I am torn between:
Using CoxTimeVaryingFitter without knowing what the model performance is (seems like a bad idea)
Using CoxPHFitter, but the assumption is violated on the time-interaction variable (which inherently does not seem to fix the problem)
Any help resolving this confusion is greatly appreciated.
Here is one suggestion:
If you choose the CoxTimeVaryingFitter, then you need to somehow evaluate the quality of your model. Here is one way. Use the regression coefficients B and write down your model. I'll write it as S(t;x;B), where S is an estimator of the survival, t is the time, and x is a vector of covariates (age, wage, education, etc.). Now, for every individual i, you have a vector of covariates x_i. Thus, you have the survival function for each individual. Consequently, you can predict which individual will 'fail' first, which 'second', and so on. This produces a (predicted) ranking of survival. However, you know the real ranking of survival since you know the failure times or times-to-event. Now, quantify how many pairs (predicted survival, true survival) share the same ranking. In essence, you would be estimating the concordance.
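If you go that route with lifelines, its concordance_index utility can do the pairwise ranking comparison for you. A rough sketch (the column names and the long-format dataframe long_df are placeholders for your own data, and scoring each row by its stop time makes this only an approximation of a per-individual concordance):
from lifelines import CoxTimeVaryingFitter
from lifelines.utils import concordance_index
ctv = CoxTimeVaryingFitter()
ctv.fit(long_df, id_col="id", start_col="start", stop_col="stop", event_col="event")
# A higher partial hazard means earlier expected failure, so negate it
# to get a score where higher = longer predicted survival.
scores = -ctv.predict_partial_hazard(long_df)
c_index = concordance_index(long_df["stop"], scores, long_df["event"])
print(c_index)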
If you opt to use CoxPHFitter, I don't think it was meant to be used with time-varying covariates. Instead, you could use two other approaches. One is to stratify your variable, i.e., cph.fit(dataframe, time_column, event_column, strata=['your variable to stratify']). The downside is that you no longer obtain a hazard ratio for that variable. The other approach is to use splines. Both of these methods are explained here.

How do I input nearest object in an artificial life simulation in the inputs of a neural network?

I just started working on an artificial life simulation (again... I lost the other one) in Python and Pygame using Pybrain, and I'm planning how it is going to work. So far I have an environment with some "food pellets"; a food pellet is added every minute. I haven't made my agents (aka "Creatures") yet, but I know I want them to have simple feed-forward neural networks with some inputs, and the outputs will be their movement. I want the inputs to show what's in front of them, sort of like they are seeing the simulated world in front of them. How should I go about this? I either want them to actually "see" the colors in their line of vision, or just input the nearest object into their NN. Which one would be best, and how would I implement it?
Having a full field of vision is technically possible in a neural network, but requires a LOT of inputs and massive processing; not a direction you should expect to be able to evolve in any kind of meaningful way.
A neural network deals with values and thresholds. I'd recommend using two inputs associated with the nearest individual - one holding the distance to it and the other its angle (with zero being directly ahead, less than zero being on the left, and greater than zero being on the right).
Make sure that these values are easy to process into outputs. For example, if one output goes to a rotation actuator, make sure that the input values and output values are on the same scale. Then it will be easy to both turn toward or away from a particular individual.
If you want them to be able to see multiple individuals, simply include multiple pairs of inputs. I was going to suggest putting them in distance order, but it might be easier for them if, as soon as an organism sees something, it keeps coming in on the same pair of inputs until it is no longer tracked.
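A minimal sketch of turning the nearest pellet into those two inputs (the position/heading arguments are assumptions about how your creatures and pellets are represented):
import math

def nearest_object_inputs(creature_x, creature_y, creature_heading, objects):
    """Return (distance, relative_angle) for the nearest object.

    relative_angle is 0 straight ahead, negative to the left, positive to the
    right, scaled to [-1, 1] so it matches a rotation output on the same scale.
    """
    nearest = min(objects, key=lambda o: math.hypot(o[0] - creature_x, o[1] - creature_y))
    dx, dy = nearest[0] - creature_x, nearest[1] - creature_y
    distance = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) - creature_heading
    angle = (angle + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return distance, angle / math.pi

# Example: creature at the origin facing along +x, two pellets.
print(nearest_object_inputs(0.0, 0.0, 0.0, [(5.0, 5.0), (2.0, -1.0)]))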
