Tensorflow (Contrib.learn) KeyError when fitting model - python

I'm somewhat new to coding and very new to Tensorflow, but I've taken an online machine learning course, so I have some background and an example under my belt.
I am using the Contrib.learn module, which wraps TensorFlow in a scikit-learn-style experience for the user. Anyway, my data set is 20 columns of float64 values and just over 2000 rows long. Each column is named. I set up my feature_columns with:
feature_columns = []
for i in X_train.columns:
    feature_columns.append(tf.contrib.layers.real_valued_column(i, dtype=type(X_train[i][0])))
And then I instantiate my deep neural network model with:
classifier = learn.DNNClassifier(hidden_units=[10, 10, 10],
                                 feature_columns=feature_columns,
                                 n_classes=2)
So far, so good. Then I try to fit my model with:
classifier.fit(X_train, y_train, steps=100, batch_size=32)
And I get a really deep Traceback that ultimately ends in,
KeyError: 'IQR'
which is the name of my 6th data column.
There aren't many examples of people using Contrib.learn, and I'm guessing the few people who are using it aren't as clueless as I am. If anyone happens to know what it might be referring to, I could really use the help, since I'm basically out of ideas. If you need any more information from me, or if you want me to paste the whole Traceback, just let me know.
Thanks for your time!
edit: Link to Traceback (via Pastebin)

I ended up ditching Contrib.learn and doing it "the hard way" using Tensorflow's core functionality to build a multilayer perceptron.
Pretty cool stuff!

The function throws because it looks for a feature named IQR (which is defined in your feature_columns) in the input you pass it.
In order to use predict you need to do something like:
predicted_values = list(m.predict(input_fn=lambda: input_fn(df_test), as_iterable=False))
Where input_fn is the input function and df_test is a pandas DataFrame with the required features.
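A minimal sketch of such an input function (assuming your labels live in a DataFrame column named label — that name and the layout are placeholders to adjust to your data):
import tensorflow as tf

def input_fn(df, label_col="label"):
    # Map each feature column name to a constant tensor; label_col is a hypothetical column name
    features = {k: tf.constant(df[k].values) for k in df.columns if k != label_col}
    labels = tf.constant(df[label_col].values)
    return features, labels
Fitting works the same way, e.g. classifier.fit(input_fn=lambda: input_fn(df_train), steps=100), instead of passing the DataFrame directly.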
Hope this helps.

Related

spaCy: how to add patterns to an existing EntityRuler?

My spacy version is 2.3.7. I have an existing trained custom NER model with NER and Entity Ruler pipes.
I want to update and retrain this existing pipeline.
The code to create the entity ruler pipe was as follows:
ruler = EntityRuler(nlp)
for i in patt_dict:
    ruler.add_patterns(i)
nlp.add_pipe(ruler, name="entity_ruler")
Where patt_dict is the original patterns dictionary I had made.
Now, after finishing the training, I have more input data and want to train the model further with it.
How can I modify the above code to add more patterns to the entity ruler when I later load the spaCy model and retrain it on the new data?
It is generally better to retrain from scratch. If you train only on new data you are likely to run into "catastrophic forgetting", where the model forgets anything not in the new data.
This is covered in detail in this spaCy blog post. As of v3 the approach outlined there is available in spaCy, but it's still experimental and needs some work. In any case, it's still kind of a workaround, and the best thing is to train from scratch with all data.
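For the mechanical part of the question — adding new patterns to the ruler in an already-saved pipeline — something like this should work in spaCy 2.x (the model path and pattern contents here are hypothetical):
import spacy

# Load the previously saved pipeline (path is hypothetical)
nlp = spacy.load("path/to/my_custom_model")

# Retrieve the ruler by the name it was registered under
ruler = nlp.get_pipe("entity_ruler")

# add_patterns expects a list of pattern dicts
new_patterns = [{"label": "PRODUCT", "pattern": "WidgetPro"}]
ruler.add_patterns(new_patterns)
After that you can save the pipeline again with nlp.to_disk and retrain — though, per the above, preferably on all of the data.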
I'd also recommend polm23's suggestion to retrain fully in this situation.
Here is why: we are asking the model to produce inferences based on weights derived from matching input data to labels/classes/whatever, over and over. These weights are adjusted via backprop to reduce the error gradient vis-à-vis the labels/classes/whatever. When the weights, given whatever data, produce errors as close to 0 as possible, the loss eventually reaches an equilibrium, or you simply stop training via hyperparameters (epochs).
However, by only using the new data, you will only optimize for that specific data. The model will generalize poorly, but really only because it is learning exactly what you asked it to learn and nothing else. Given that retraining fully is usually not the end of the world, it just makes sense as a best practice.
(This is my imperfect understanding of the catastrophic forgetting issue; happy to learn more if others have deeper knowledge.)

Understanding the shap_values() function in determining what a machine learning model is doing

I'm currently following a tutorial on determining why my machine learning model made its predictions, using the "shap" Python package.
I'm not fully sure, however, what is happening in the following code: shap_values = explainer.shap_values(X=X_test[:1])
I understand that I am looking for shap_values on the first row of my test data, but what does this mean?
In the tutorial, they also pass nsamples and l1_reg into .shap_values, and I'm not sure what either of these parameters does. Could someone explain simply what these parameters are used for? Everything online is a little too low-level for my understanding.
I have 1270 features and my model is an sklearn SVC model, if this helps in explaining the nsamples parameter.
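For context, a minimal sketch of how those parameters appear in a KernelExplainer call (X_train, y_train, and X_test stand in for the question's data; the specific values are illustrative):
import shap
from sklearn.svm import SVC

# Hypothetical model mirroring the question; probability=True enables predict_proba
model = SVC(probability=True).fit(X_train, y_train)

# A small background sample keeps the model-agnostic KernelExplainer tractable
explainer = shap.KernelExplainer(model.predict_proba, shap.sample(X_train, 100))

# nsamples: how many perturbed copies of the input to evaluate (more = slower but more stable)
# l1_reg: regularization that selects which of the many features receive attribution
shap_values = explainer.shap_values(X=X_test[:1], nsamples=500, l1_reg="num_features(10)")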

How to upload an image and try to predict it using the DL model

I'm kinda new to the whole machine learning/deep learning field and I have a few doubts.
I was doing this tutorial on the TensorFlow page, https://www.tensorflow.org/tutorials/images/classification, and everything went as expected, but the tutorial ends at the validation loss/accuracy.
I was wondering how I could upload a photo of my dog and run it through the model to test it, but I couldn't find an answer to that.
I tried using a few images from the training dataset to predict on, but the classes it predicts, along with the predicted numbers, are weird, as shown here: https://imgur.com/a/45jubOh. My results are really similar to the tutorial's.
Can someone help me with that? Uploading the image, testing it, and interpreting the classes along with the predicted numbers.
Thanks a lot!
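A minimal sketch of loading a single image and classifying it, assuming the model and class_names objects from that tutorial (the file name and the 180x180 size are placeholders; the size must match what the model was trained on):
import numpy as np
import tensorflow as tf

# Load and resize the photo to the shape the model was trained on (assumed 180x180)
img = tf.keras.preprocessing.image.load_img("my_dog.jpg", target_size=(180, 180))
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # the model expects a batch dimension

predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])  # raw logits -> probabilities
print(class_names[np.argmax(score)], float(np.max(score)))
If the model's last layer has no softmax (as in that tutorial), the raw predictions are unnormalized logits, which may be why the numbers look weird; applying softmax turns them into probabilities.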

Election winning prediction of 5 candidates using linear regression in Python

I have a project in which I need to build a prediction model using linear regression in Python. The case study: predict which of 5 candidates wins an election. I don't have any data and need to build the dataset on my own, but I am not able to visualize the parameters. Can anybody help me with data building? It would be highly helpful.
You can start by getting previous years' election winners as your training data; if you don't have any training data, you have a problem using linear regression (or any supervised learning). After that, if you want to use Python, try these steps:
Work through a good beginner tutorial such as https://machinelearningmastery.com/machine-learning-in-python-step-by-step/, and join a community like https://www.kaggle.com/ to get ideas from its kernels about processing the data and tuning parameters.
As I understand this question, you need to create a model based on data you don't have yet. Presumably you will get the data later on, by which time the model should already be implemented. You can create a fake data set using the numpy.random library. We'd need more details on what exactly you're trying to do, though.
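A minimal sketch of faking such a dataset with numpy.random (all column names and relationships here are made up for illustration):
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500  # hypothetical number of records

# Made-up features for an election setting
df = pd.DataFrame({
    "candidate": rng.integers(0, 5, n),        # which of the 5 candidates
    "campaign_spend": rng.uniform(1e5, 1e7, n),
    "poll_share": rng.uniform(0.0, 1.0, n),
})

# Fake target: vote share loosely driven by polls, plus noise
df["vote_share"] = 0.8 * df["poll_share"] + rng.normal(0, 0.05, n)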

How to plot the tree of a LightGBM .joblib model?

I'm very new to machine learning! My problem concerns a model created with LightGBM. I'm not the creator of this model, so I want to see the trees that this model generates. The model is in .joblib format, and I want to get as much information out of it as possible. In the LGBMClassifier documentation I don't find anything that can solve my problem. With the code below, I only learn the number of classes.
model = joblib.load("*.joblib.dat")
model.classes_
Output:
array([0, 1])
I want to know the number of rows, the rules and, if possible, even a plot of the tree. Thank you all!
When you cannot find something in the LightGBM documentation, look for it in the XGBoost documentation. The LightGBM documentation is missing lots of features that the framework actually has.
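If the loaded object is a fitted LGBMClassifier, a sketch like this should expose the trees (plot_tree needs the graphviz package installed; the file name is the one from the question):
import joblib
import lightgbm
import matplotlib.pyplot as plt

model = joblib.load("*.joblib.dat")  # path as in the question
booster = model.booster_             # underlying Booster of the sklearn wrapper

print(booster.num_trees())                   # number of trees in the ensemble
print(booster.dump_model()["tree_info"][0])  # split rules of the first tree, as a dict

lightgbm.plot_tree(booster, tree_index=0)    # draw the first tree
plt.show()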
