Understanding the shap_values() function in determining what a machine learning model is doing - python

I'm currently following a tutorial on determining why my machine learning model made its predictions, using the "shap" python package.
I'm not fully sure, however, what is happening in the following code: shap_values = explainer.shap_values(X=X_test[:1])
I understand that I am asking for the shap values on the first row of my test data, but what does that actually mean?
In the tutorial, nsamples and l1_reg are also passed into .shap_values, and I'm not sure what either of these parameters does. Could someone explain simply what these parameters are used for? Everything online is a little too low level for my understanding.
I have 1270 features and my model is a sklearn SVC model, if that helps in explaining the nsamples parameter.
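For reference, here is a minimal sketch of the kind of setup being described, assuming the tutorial uses shap's KernelExplainer (the model-agnostic explainer usually paired with an SVC); the toy data set and parameter values are illustrative stand-ins, not taken from the tutorial:
import shap
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy stand-in for the question's data (1270 features would work the same way)
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = SVC(probability=True).fit(X_train, y_train)

# Background sample the explainer perturbs against (kept small for speed)
background = shap.sample(X_train, 50)
explainer = shap.KernelExplainer(model.predict_proba, background)

# nsamples: how many perturbed copies of the row the explainer evaluates;
# more samples give more stable shap values but cost more model calls.
# l1_reg: regularization that limits how many features receive nonzero
# attributions; "num_features(10)" keeps the 10 strongest.
shap_values = explainer.shap_values(X_test[:1], nsamples=500, l1_reg="num_features(10)")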

Related

XGBoost multi:softmax objective function

I have a question regarding the multi:softmax objective function in XGBoost. I've been playing around with the objective function a bit in the context of multi-class classification and I've noticed something I don't quite understand.
Suppose we have a multi-class classification problem with three different classes. So I want to use multi:softmax as the objective and set num_class = 3, as recommended in the XGBoost documentation. Everything works as expected.
https://xgboost.readthedocs.io/en/stable/parameter.html
Now I set num_class = 2 for the same problem setting, and XGBoost still works as before.
Why does it still work even though num_class was set incorrectly?
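For reference, a minimal sketch of the setup being described, using the native xgboost API on synthetic data (the data set and round count are illustrative assumptions):
import numpy as np
import xgboost as xgb

# Synthetic three-class problem standing in for the real one
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = rng.integers(0, 3, size=100)  # labels 0, 1, 2

dtrain = xgb.DMatrix(X, label=y)

# The documented setup: num_class matches the number of classes
params = {"objective": "multi:softmax", "num_class": 3}
booster = xgb.train(params, dtrain, num_boost_round=10)

# The puzzling variant from the question: num_class = 2 with three-class labels
# (note: recent XGBoost versions may reject this with a label-range error)
params_wrong = {"objective": "multi:softmax", "num_class": 2}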

How to predict the results of a tobit model (trained using AER library in R) with rpy2?

I have a tobit regression model working completely well in R, where I am also able to predict the actual output values for the test set using the Inverse Mills Ratio. However, the rest of the code for my project is in python, so I wanted to explore the rpy2 API to migrate the code from R to python. I have been able to get as far as model training using AER.tobit() from the R library AER. However, when it comes to predicting on test data, the code is not performing as expected. When I use robjects.r.predict(model, newdata) from rpy2, it just gives me fitted values for the training data instead of responses for the test data. If anybody knows a way around this, it would be a great help! Thanks in advance!
Let me know if you need more clarification on the problem.
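For reference, here is a minimal sketch of the call pattern in question, assuming rpy2 3.x with the pandas converter and a tiny hypothetical data set; one thing worth checking is that newdata reaches R as a named argument and as a proper R data.frame, since if predict() never actually sees it, R returns the training-set fitted values, which matches the symptom above:
import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr

AER = importr("AER")  # the R package AER must be installed

# Hypothetical tiny train/test split standing in for the question's data
df_train = pd.DataFrame({"y": [0.0, 1.2, 0.0, 3.4, 2.1], "x": [1.0, 2.0, 3.0, 4.0, 5.0]})
df_test = pd.DataFrame({"x": [1.5, 2.5]})

conv = robjects.default_converter + pandas2ri.converter
with localconverter(conv):
    r_train = robjects.conversion.py2rpy(df_train)
    r_test = robjects.conversion.py2rpy(df_test)

model = AER.tobit(robjects.Formula("y ~ x"), data=r_train)

# Pass newdata as a *named* argument so R's predict() cannot drop or
# misbind it and silently return fitted values for the training data
predictions = robjects.r["predict"](model, newdata=r_test)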

Scikit-learn: Utilizing ElasticNet with GridSearchCV

I am a novice when it comes to Machine Learning, but I am very interested in this topic. I have a few questions, so bear with me.
This is a time-series analysis.
I am using ElasticNet with GridSearchCV to figure out the best hyperparameters for my model. I went through the feature-selection steps to reduce my features (I am using f_regression at the basic <0.05 significance level). I am not running any test for multicollinearity, because I assume elastic net would use an L1 ratio of 1 (or close to it) to get around this issue. The parameter grids are listed below:
l1_space = np.linspace(.30, .90, 30)
alpha_space = np.logspace(-4, 1, 30)
However, the best parameters I keep getting are l1_ratio = 0.3 and alpha = 0.0001, which defeats the purpose of using ElasticNet, I assume? This gives me a really good adjusted R^2 and RMSE. However, when I change the parameter grids slightly, my metrics are horrible.
Either the model is bad (overfitting), or I just do not understand what is going on. I have read the documentation over and over again, but I am still not understanding.
Thank you in advance!
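For reference, a minimal sketch of the grid search being described, assuming scikit-learn's ElasticNet with TimeSeriesSplit (a common cross-validation choice for time-series data); the toy data is an illustrative stand-in:
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Toy stand-in for the question's time-series features and target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] + rng.normal(size=200)

l1_space = np.linspace(0.30, 0.90, 30)
alpha_space = np.logspace(-4, 1, 30)

grid = GridSearchCV(
    ElasticNet(max_iter=10000),
    param_grid={"l1_ratio": l1_space, "alpha": alpha_space},
    cv=TimeSeriesSplit(n_splits=5),  # preserves temporal ordering across folds
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)
# A best alpha sitting on the grid edge (here 0.0001) usually means the grid
# should be extended in that direction before trusting the result
print(grid.best_params_)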

Election winning prediction of 5 candidates using linear regression in python

I got a project in which I need to build a prediction model using linear regression. The case study is that I need to predict which of 5 candidates will win an election. I don't have any data and need to build the data on my own, but I am not able to visualize the parameters. Can anybody help me with building the data? It would be highly helpful.
You can start by getting previous years' election winners as your training data. If you don't have any training data, you have a problem using linear regression (or any supervised learning). After that, if you want to use python, try these steps:
Use some code from this tutorial: https://machinelearningmastery.com/machine-learning-in-python-step-by-step/ or any other good beginner tutorial. You can also join a community like https://www.kaggle.com/ and get ideas from their kernels regarding data processing and parameter tuning.
As I understand this question, you need to create a model based on data you don't have yet. Presumably you will get the data later on, by which time the model should already be implemented. You can create a fake data set using the numpy.random library. We'd need more details on what exactly you're trying to do, though.
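For example, here is a minimal sketch of faking a data set with numpy.random, as suggested above; the feature names and the relationship to the target are purely hypothetical:
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n_rows = 1000

data = pd.DataFrame({
    "campaign_spend": rng.uniform(1e4, 1e6, size=n_rows),
    "incumbent": rng.integers(0, 2, size=n_rows),
    "poll_share": rng.uniform(0.0, 1.0, size=n_rows),
})

# A synthetic target loosely tied to the features, plus noise
data["vote_share"] = (0.5 * data["poll_share"]
                      + 0.1 * data["incumbent"]
                      + rng.normal(0.0, 0.05, size=n_rows))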

Tensorflow (Contrib.learn) KeyError when fitting model

I'm somewhat new to coding and very new to Tensorflow, but I've taken an online machine learning course, so I have some background and an example under my belt.
I am using the contrib.learn module, which wraps Tensorflow in a scikit-learn-style experience for the user. Anyway, my data set is 20 columns of float64 values and just over 2000 rows long. Each column is named. I set up my feature_columns with:
feature_columns = []
for i in X_train.columns:
    feature_columns.append(tf.contrib.layers.real_valued_column(i, dtype=type(X_train[i][0])))
And then I instantiate my deep neural network model with,
classifier = learn.DNNClassifier(hidden_units=[10, 10, 10],
                                 feature_columns=feature_columns,
                                 n_classes=2)
So far, so good. Then I try to fit my model with,
classifier.fit(X_train, y_train, steps=100, batch_size=32)
And I get a really deep Traceback that ultimately ends in,
KeyError: 'IQR'
which is the name of my 6th data column.
There aren't many examples of people using contrib.learn, and I'm guessing the few people who are using it aren't as clueless as I am. If anyone happens to know what it might be referring to, I could really use the help, since I'm basically out of ideas. If you need any more information from me, or if you want me to paste the whole Traceback, just let me know.
Thanks for your time!
edit: Link to Traceback (via Pastebin)
I ended up ditching Contrib.learn and doing it "the hard way" using Tensorflow's core functionality to build a multilayer perceptron.
Pretty cool stuff!
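For anyone curious, here is a minimal sketch of what "the hard way" can look like in TF 1.x core ops, mirroring the 20-feature, 2-class, [10, 10, 10] setup from the question; this is an illustrative reconstruction, not the asker's actual code:
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 20])  # 20 float columns, cast to float32
y = tf.placeholder(tf.int64, [None])

# Three hidden layers of 10 units each, matching the DNNClassifier above
h = x
for units in (10, 10, 10):
    h = tf.layers.dense(h, units, activation=tf.nn.relu)
logits = tf.layers.dense(h, 2)  # two output classes

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)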
The function throws because it looks for a feature named IQR (which is defined in your feature_columns) but can't find it in the input it was given.
In order to use predict you need to do something like:
predicted_values = list(m.predict(input_fn=lambda: input_fn(df_test), as_iterable=False))
Where input_fn is the input function, df_test is a pandas DataFrame with the required features, and m is your fitted classifier.
Hope this helps.
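For completeness, a minimal sketch of what such an input_fn can look like in the old contrib.learn style, assuming a pandas DataFrame whose column names match the feature_columns (the label column name here is hypothetical):
import tensorflow as tf

def input_fn(df, label_col="label"):
    # Map each named feature column to a tensor, keyed by the column name the
    # feature_columns expect (this is where a missing 'IQR' key would surface)
    features = {name: tf.constant(df[name].values)
                for name in df.columns if name != label_col}
    labels = tf.constant(df[label_col].values)
    return features, labels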
