I am desperately trying to use Cohen's kappa as either a loss function or an evaluation metric in my Keras neural network. I have tried many different implementations found online and none of them seem to be maintained, in particular because tf.contrib no longer exists in TensorFlow 2.0. Any help pointing me in the right direction of a working implementation would be much appreciated!
When I use the TensorFlow Addons class found here (https://www.tensorflow.org/addons/api_docs/python/tfa/metrics/cohens_kappa), I keep getting the following error and have no idea how to go about debugging it:
ValueError: Number of samples in y_true and y_pred are different
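For reference, this is roughly how I am wiring the metric in (simplified; the number of classes and the model itself are placeholders, not my actual setup):

import tensorflow as tf
import tensorflow_addons as tfa

# num_classes=5 is only a placeholder for illustration
kappa = tfa.metrics.CohenKappa(num_classes=5)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(5, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=[kappa])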
I would like to implement a cost-sensitive loss function in PyTorch. My two-class training dataset is heavily imbalanced: 75% of the samples are labelled '0' and only 25% are labelled '1'.
I am new to PyTorch but my supervisor is adamant that I use it (they have more experience with it).
I found some implementations in Keras, but I am not strong enough at coding to be able to port them over to PyTorch.
I have read around to find some resources to create a cost-sensitive loss function.
This paper uses something which I think might work (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9417097), but I do not understand how the code is implemented despite having access to it here (https://github.com/emadeldeen24/AttnSleep/blob/f993511426900f9fca20594a738bf8bee1116381/utils/util.py).
This website describes the math in great detail, but I do not understand it: https://medium.com/rv-data/how-to-do-cost-sensitive-learning-61848bf4f5e7
Here is an implementation in Keras which I have trouble converting to PyTorch: https://towardsdatascience.com/fraud-detection-with-cost-sensitive-machine-learning-24b8760d35d9
I also found this implementation in PyTorch, but I have trouble understanding it: https://discuss.pytorch.org/t/dealing-with-imbalanced-datasets-in-pytorch/22596/21
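To make my goal concrete, here is my rough understanding of what a class-weighted loss looks like in PyTorch (the 0.25/0.75 weights are only my own guess based on the label distribution, not taken from any of the links above):

import torch
import torch.nn as nn

# weight each class inversely to its frequency: label '0' is 75% of the data,
# label '1' is only 25%, so label '1' gets the larger weight
class_weights = torch.tensor([0.25, 0.75])

# CrossEntropyLoss accepts per-class weights directly
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(8, 2)            # dummy model outputs (batch of 8, 2 classes)
targets = torch.randint(0, 2, (8,))   # dummy integer labels
loss = criterion(logits, targets)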
Could you please help me to understand the last link's implementation of the cost-sensitive loss function?
Thank you.
Does anyone have experience with mixed-precision training using the TensorFlow Estimator API?
I tried casting my inputs to tf.float16 and the results of the network back to tf.float32. For scaling the loss I used tf.contrib.mixed_precision.LossScaleOptimizer.
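My setup looks roughly like this (heavily simplified; the model_fn body, network, and loss-scale value are placeholders):

import tensorflow as tf

def model_fn(features, labels, mode):
    # cast the inputs down to float16 for the forward pass
    x = tf.cast(features, tf.float16)
    logits = tf.layers.dense(x, 10)  # placeholder network
    # cast the network output back to float32 before computing the loss
    logits = tf.cast(logits, tf.float32)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    opt = tf.train.AdamOptimizer(1e-3)
    opt = tf.contrib.mixed_precision.LossScaleOptimizer(
        opt, tf.contrib.mixed_precision.FixedLossScaleManager(128))  # arbitrary scale
    train_op = opt.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)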
The error messages I get are relatively uninformative: "Tried to convert 'x' to a tensor and failed. Error: None values not supported".
I found the issue: I used tf.get_variable to store the learning rate. This variable has no gradient. Normal optimizers do not care, but tf.contrib.mixed_precision.LossScaleOptimizer crashes. Therefore, make sure these variables are not added to tf.GraphKeys.TRAINABLE_VARIABLES.
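Concretely, the fix is along these lines (a sketch; the variable name is just an example):

# Creating the learning-rate variable with trainable=False keeps it out of
# tf.GraphKeys.TRAINABLE_VARIABLES, so the LossScaleOptimizer never tries to
# compute a gradient for it.
learning_rate = tf.get_variable(
    "learning_rate", shape=[], dtype=tf.float32,
    initializer=tf.constant_initializer(1e-3), trainable=False)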
I am trying to convert a Keras model to a TPU model in Google Colab, but this model has another model inside it.
Take a look at the code:
https://colab.research.google.com/drive/1EmIrheKnrNYNNHPp0J7EBjw2WjsPXFVJ
This is a modified version of one of the examples in the google tpu documentation:
https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/fashion_mnist.ipynb
If the sub-model is converted and used directly it works, but if the sub-model is nested inside another model it does not. I need this nested structure because I am trying to train a GAN, which contains two networks (gan = generator + discriminator), so if this test works it will probably work with the GAN too.
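A minimal illustration of the structure I mean, using the conversion call from the documentation notebook (layer sizes and the resolver setup are placeholders):

import os
import tensorflow as tf

# the "sub model"
sub_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# the "parent" model that wraps the sub model
inputs = tf.keras.Input(shape=(784,))
outputs = sub_model(inputs)
parent_model = tf.keras.Model(inputs, outputs)
parent_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# conversion as in the fashion-MNIST notebook
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    parent_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(
            tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])))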
I have tried several things:
Converting the model to TPU without converting the sub-model; in that case, when training starts, an error related to the inputs of the sub-model is raised.
Converting both the model and the sub-model to TPU; in that case an error is raised while converting the "parent" model, and the exception message ends with nothing more than "layers".
Converting only the sub-model to TPU; in that case no error is raised, but training is not accelerated by the TPU and is extremely slow, as if no conversion had been made at all.
Using a fixed batch size or not makes no difference; in both cases the model does not work.
Any ideas? Thanks a lot.
Divide the problem into parts: first use only the sub-model on the TPU. Then put something simple in place of the sub-model and run the full model on the TPU. If this does not work, create something very simple with a similar structure out of models you are sure work, and then step by step add pieces until you converge on the complex model you want to run on the TPU.
I struggle with such things myself. What I did at the very beginning, using MNIST, was to train the model, extract the coefficients, rewrite ReLU, dense, dropout and the network matrices myself, and run the model in NumPy, then CuPy, then PyOpenCL, and finally I replaced those functions with my own raw CUDA C and OpenCL kernels. By going deeper and simpler like this, I can find what is wrong when something does not work. In the end I wrote my own genetic selective training algorithm and learned a lot.
And most importantly, it gave me the opportunity to try some crazy ideas for training, modelling, manipulating, and making sense of the network coefficients.
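As a rough illustration of what I mean by rewriting the forward pass outside the framework (a sketch with made-up shapes, not my actual code):

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dense(x, w, b):
    # a Keras Dense layer is just a matrix multiply plus a bias
    return x @ w + b

# weights would be exported from a trained Keras model, e.g. via model.get_weights()
w1, b1 = np.random.randn(784, 128), np.zeros(128)   # placeholder shapes
w2, b2 = np.random.randn(128, 10), np.zeros(10)     # placeholder shapes

x = np.random.randn(1, 784)                          # one fake MNIST sample
logits = dense(relu(dense(x, w1, b1)), w2, b2)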
The problem, in my opinion, is that TF, Keras and the like are too high level. With the optimizers and solvers there is too much that is unknown, and even the neural networks themselves are not fully under your control. GANs are problematic: training does not converge every time and usually takes days, and even when it does train, you have no idea how it converges. Most of the tricks and techniques that protect you from vanishing gradients are not mathematically backed, yet they nevertheless work amazingly well.
**Go simpler, go deeper, and add complexity step by step. Follow a practice through which you comprehend as much as you can.** It will cost some time and energy, but in my opinion you will benefit from it tremendously.
I am really at my wit's end and don't know where else I can ask, so I am asking here. I know my question may not be of the best quality, but I am hoping for at least some guidance on the direction I should look in to figure out my problem.
I am replicating scikit-learn's implementation of Elastic Net multiple linear regression in TensorFlow and TensorBoard as a learning exercise, so I can eventually move on to implementing and visualizing more difficult machine learning algorithms.
I have some code that does a multiple linear regression using the Elastic Net regularization as the loss function. With gradient descent, it converges to a suboptimal solution compared to scikit-learn's algorithm. Through some searching, I learned that scikit-learn initializes weights using the Xavier method, so I did that in TensorFlow as well. Performance improved slightly but was still nowhere close to sklearn. My next improvement was to change the optimizer to try to match performance, although my research told me that scikit-learn uses coordinate descent, a method that isn't implemented in TensorFlow.
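For reference, my loss is set up roughly along the lines of scikit-learn's Elastic Net objective (a sketch with placeholder shapes; the alpha and l1_ratio values are arbitrary):

import tensorflow as tf

x_data = tf.placeholder(tf.float32, shape=[None, 8])   # 8 features, just an example
y_target = tf.placeholder(tf.float32, shape=[None, 1])

A = tf.get_variable("A", shape=[8, 1],
                    initializer=tf.glorot_uniform_initializer())  # Xavier init
b = tf.Variable(tf.zeros([1]))
model_output = tf.matmul(x_data, A) + b

alpha, l1_ratio = 0.5, 0.5   # arbitrary example values
mse = tf.reduce_mean(tf.square(y_target - model_output))
l1 = tf.reduce_sum(tf.abs(A))
l2 = tf.reduce_sum(tf.square(A))
loss = mse + alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2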
However, this is where I am stuck. Simply switching out one optimizer for another does not seem to work (not that I expected it to, but I'm also having trouble finding material that tells me how to set it up properly). Currently I've simply performed the switch the following way; can anyone give me a hint as to why my gradients are 0?
Thanks!
# Declare optimizer
# previously: gradient descent with a learning rate of 0.001
# my_opt = tf.train.GradientDescentOptimizer(0.001)
# now: Adam, with the learning rate left at its default of 0.001
my_opt = tf.train.AdamOptimizer(epsilon=0.1)
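For completeness, the gradient histogram below is produced roughly like this (simplified; the names are placeholders for my actual variables):

# compute_gradients returns (gradient, variable) pairs for the trainable variables
grads_and_vars = my_opt.compute_gradients(loss)
for grad, var in grads_and_vars:
    if grad is not None:
        tf.summary.histogram(var.name.replace(':', '_') + '/gradient', grad)
train_step = my_opt.apply_gradients(grads_and_vars)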
[Image omitted: histogram of gradients]
[Image omitted: loss curve showing that the Adam optimizer isn't doing anything]
EDIT:
I have updated my learning rate to be higher, but convergence still doesn't seem that great. I think I will proceed to implement coordinate descent in TensorFlow to match scikit-learn's method as closely as possible. I've attached an image of the difference for those curious:
[Image omitted]
In comparison to SGD:
[Image omitted]
I'm trying to use a TensorFlow op inside a Keras model. I previously tried to wrap it with a Lambda layer, but I believe this disables that layer's backpropagation.
More specifically, I'm trying to use the layers from here in a Keras model, without porting them to Keras layers (I hope to deploy to TensorFlow later on). I can compile these layers into a shared library and load it into Python. This gives me TensorFlow ops, and I don't know how to combine them in a Keras model.
A simple example of a Keras MNIST model, where for example one Conv2D layer is replaced by a tf.nn.conv2d op, would be exactly what I'm looking for.
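To make it concrete, this is the kind of substitution I mean, using a Lambda layer around the op (a sketch; the filter variable is handled manually and is not tracked as a Keras weight, which is part of my problem):

import tensorflow as tf
from tensorflow import keras

# a manually managed filter, since tf.nn.conv2d has no weights of its own
conv_filter = tf.Variable(tf.truncated_normal([3, 3, 1, 32], stddev=0.1))

inputs = keras.Input(shape=(28, 28, 1))
x = keras.layers.Lambda(
    lambda t: tf.nn.conv2d(t, conv_filter, strides=[1, 1, 1, 1], padding='SAME'))(inputs)
x = keras.layers.Activation('relu')(x)
x = keras.layers.Flatten()(x)
outputs = keras.layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs, outputs)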
I've seen this tutorial, but it appears to do the opposite of what I'm looking for: it inserts Keras layers into a TensorFlow graph, whereas I want to insert TensorFlow ops into a Keras model.
Best regards,
Hans
Roughly two weeks have passed and it seems I am able to answer my own question now.
It seems that TensorFlow can look up gradients if you register them using this decorator. As of writing, this functionality is not (yet) available in C++, which is what I was looking for. A workaround is to define a normal op in C++ and wrap it in a Python function, using the mentioned decorator to register its gradient. Once these functions and their corresponding gradients are registered with TensorFlow, backpropagation happens 'automagically'.
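Assuming the decorator referred to is tf.RegisterGradient, the workaround looks roughly like this (the op and file names are made up for illustration):

import tensorflow as tf

# load the custom op compiled into a shared library (hypothetical path and op name)
my_module = tf.load_op_library('./my_custom_op.so')

@tf.RegisterGradient("MyCustomOp")
def _my_custom_op_grad(op, grad):
    # sketch: pretend the op behaves like identity, so the gradient passes straight through
    return [grad]

def my_custom_op(x):
    # thin Python wrapper around the C++ op
    return my_module.my_custom_op(x)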