If I run
from sklearn.datasets import load_breast_cancer
import lightgbm as lgb
breast_cancer = load_breast_cancer()
data = breast_cancer.data
target = breast_cancer.target
params = {
    "task": "convert_model",
    "convert_model_language": "cpp",
    "convert_model": "test.cpp",
}
gbm = lgb.train(params, lgb.Dataset(data, target))
then I expect a file called test.cpp to be created, with the model saved in C++ format.
However, nothing appears in my current directory.
I have read the documentation (https://lightgbm.readthedocs.io/en/latest/Parameters.html#io-parameters), but can't tell what I'm doing wrong.
Here's a real 'for dummies' answer:
Install the CLI version of lightgbm: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html
Make note of your installation path, and find the executable. For example, for me, this was ~/LightGBM/lightgbm.
Run the following in a Jupyter notebook:
from sklearn.datasets import load_breast_cancer
import pandas as pd
breast_cancer = load_breast_cancer()
data = pd.DataFrame(breast_cancer.data)
target = pd.DataFrame(breast_cancer.target)
pd.concat([target, data], axis=1).to_csv("regression.train", header=False, index=False)
train_conf = """
task = train
objective = binary
metric = auc
data = regression.train
output_model = trained_model.txt
"""
with open("train.conf", "w") as f:
f.write(train_conf)
conf_convert = """
task = convert_model
input_model= trained_model.txt
"""
with open("convert.conf", "w") as f:
f.write(conf_convert)
! ~/LightGBM/lightgbm config=train.conf
! ~/LightGBM/lightgbm config=convert.conf
Your model will be saved in your current directory.
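If you also want the C++ file asked about in the question, it should be enough (untested sketch, reusing the parameter names from the question and the linked parameter docs) to extend the convert.conf contents with the two convert parameters:
task = convert_model
input_model = trained_model.txt
convert_model_language = cpp
convert_model = test.cpp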
In the doc they say:
Note: can be used only in CLI version
under the convert_model and convert_model_language parameters.
That means that you should probably use the CLI (Command Line Interface) of LGBM instead of the Python wrapper to do this.
Link to Quick Start CLI version.
Related
I am trying to learn how to use gravityai and, frankly, I am a bit new to this. I followed https://www.youtube.com/watch?v=i6qL3NqFjs4 from Ania Kubow. At the end I encounter the following error message. It appears in gravityai when trying to run the job, i.e. after uploading all the zipped files (three .pkl files, one .py file, one .txt file, one .json file), after Docker is initialized and run:
Error running executable: usage: classify_financial_articles.py [-h] {run,serve} ... classify_financial_articles.py: error: argument subcommand: invalid choice: '/tmp/gai_temp/0675f15ca0b04cf98071474f19e38f3c/76f5cdc86a1241af8c01ce1b4d441b0c' (choose from 'run', 'serve').
I do not understand the error message and therefore cannot fix it. Is it an error in the code, or in the configuration on the gravityai platform? At no point do I run the .py file explicitly, so I conclude that it must come from gravityai. Yet I don't understand the error. Can anyone help me?
I added the .py file, as it is the one throwing the error:
from gravityai import gravityai as grav
import pickle
import pandas as pd
model = pickle.load(open('financial_text_classifier.pkl', 'rb'))
tfidf_vectorizer = pickle.load(open('financial_text_vectorizer.pkl','rb'))
label_encder = pickle.load(open('financial_text_encoder.pkl', 'rb'))
def process(inPath, outPath):
    # read csv input file
    input_df = pd.read_csv(inPath)
    # read the data
    features = tfidf_vectorizer.transform(input_df['body'])
    # predict classes
    predictions = model.predict(features)
    # convert output labels to categories
    input_df['category'] = label_encder.inverse_transform(predictions)
    # save results to csv
    output_df = input_df(['id', 'category'])
    output_df.csv(outPath, index=False)

grav.wait_for_requests(process)
I can't find any errors in the .py file
The error you got comes from the line that imports the gravityai library:
from gravityai import gravityai as grav
I believe you need to upload your project to the gravityai platform, and then you will be able to test it.
To test locally you need to:
1) Add the line "import sys".
2) Comment out or delete the line "grav.wait_for_requests(process)".
3) Add the line "process(inPath=sys.argv[2], outPath=sys.argv[3])".
4) Run from the command line: "python classify_financial_articles.py run test_set.csv test_set_out.csv"
Example code:
from gravityai import gravityai as grav
import pickle
import pandas as pd
import sys
model = pickle.load(open('financial_text_classifier.pkl', 'rb'))
tfidf_vectorizer = pickle.load(open('financial_text_vectorizer.pkl', 'rb'))
label_encoder = pickle.load(open('financial_text_encoder.pkl', 'rb'))

def process(inPath, outPath):
    input_df = pd.read_csv(inPath)
    features = tfidf_vectorizer.transform(input_df['body'])
    predictions = model.predict(features)
    input_df['category'] = label_encoder.inverse_transform(predictions)
    output_df = input_df[['id', 'category']]
    output_df.to_csv(outPath, index=False)

process(inPath=sys.argv[2], outPath=sys.argv[3])
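To try the command in step 4 you need an input CSV with the two columns the script reads, 'id' and 'body'; test_set.csv is just a placeholder name, so for example you could generate one like this (the row contents here are purely illustrative):
import pandas as pd

# Hypothetical two-row input file with the columns process() expects
pd.DataFrame({
    "id": [1, 2],
    "body": ["Quarterly revenue rose sharply.", "The central bank cut interest rates."],
}).to_csv("test_set.csv", index=False)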
I am following an example on DataCamp which uses a deprecated version of rasa_nlu. The sample code of the DataCamp example looks like this:
# Import necessary modules
from rasa_nlu.converters import load_data
from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Trainer
# Create args dictionary
args = {"pipeline": "spacy_sklearn"}
# Create a configuration and trainer
config = RasaNLUConfig(cmdline_args=args)
trainer = Trainer(config)
# Load the training data
training_data = load_data("./training_data.json")
# Create an interpreter by training the model
interpreter = trainer.train(training_data)
# Test the interpreter
print(interpreter.parse("I'm looking for a Mexican restaurant in the North of town"))
This example imports RasaNLUConfig from rasa_nlu.config to create a config and trainer.
My question is: how do I make something like this with the newer Rasa 1.1.x? The code that I wrote looks like this:
from rasa_nlu.training_data import load_data
#Instead of from rasa_nlu import config the deprecated version used 'from rasa_nlu.config import RasaNLUConfig'
from rasa_nlu import config
from rasa_nlu.model import Trainer
#create args dictionary
args = {"pipeline": "spacy_sklearn"}
#create a configuration and trainer
config= RasaNLUConfig(cmdline_args=args)
trainer = Trainer(config)
#load training data
training_data = load_data('/content/nluintent.md')
#Create an interpreter by training the model
interpreter = trainer.train(training_data)
print(interpreter.parse('Hi, can you help me?'))
How would I be able to train the model using the new version of Rasa?
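For reference, a minimal sketch of the same training flow against the rasa.nlu API as it stood around Rasa 1.x; the config.yml and nlu.md paths and the pipeline name are assumptions, and the exact imports may differ between 1.x releases, so treat this as a starting point rather than a definitive answer:
from rasa.nlu import config
from rasa.nlu.model import Trainer
from rasa.nlu.training_data import load_data

# The pipeline now lives in a YAML config file instead of a cmdline_args dict,
# e.g. a config.yml containing a line such as:
#   pipeline: pretrained_embeddings_spacy   (the 1.x counterpart of the old spacy_sklearn preset)
cfg = config.load("config.yml")  # assumed file name
trainer = Trainer(cfg)

# NLU training data in Markdown (or JSON) format
training_data = load_data("nlu.md")  # assumed path

interpreter = trainer.train(training_data)
print(interpreter.parse("Hi, can you help me?"))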
I have the following code, which is called test_build, and it has a test case to save a scikit-learn model along with x_train, y_train and score data, in a tuple object to a ".pkl" file.
from build import *
import os
import pandas as pd
import sklearn
from sklearn import *
import unittest
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import numpy as np
import tempfile
class TestMachineLearningUtils(unittest.TestCase):
    def test_save_model(self):
        X, y = np.arange(10).reshape((5, 2)), range(5)
        model = RandomForestClassifier(n_estimators=300,
                                       oob_score=True,
                                       n_jobs=-1,
                                       random_state=123)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.33, random_state=42)
        clf = model.fit(X_train, y_train)
        score = model.score(X_test, y_test)
        dir_path = os.path.dirname(os.path.realpath(__file__))
        f = tempfile.TemporaryDirectory(dir=dir_path)
        pkl_file_name = f.name + "/" + "pickle_model.pkl"
        tuple_objects = (clf, X_train, y_train, score)
        path_model = save_model(tuple_objects, pkl_file_name)
        exists_model = os.path.exists(path_model)
        self.assertTrue(exists_model)

if __name__ == "__main__":
    unittest.main()
This is the content of the save_model function found in the build module I imported in my test file.
def save_model(tuple_objects, model_path):
    pickle.dump(tuple_objects, open(model_path, 'wb'))
    return model_path
The problem I am running into is that I cannot test whether the file is created within a temporary directory. It is apparently created, but it is cleaned up right after being created, judging from the warning message I receive:
C:\Users\User\AppData\Local\Continuum\miniconda3\envs\geoenv\lib\tempfile.py:798: ResourceWarning: Implicitly cleaning up <TemporaryDirectory>
Does anyone know a solution to this problem? How could one suppress the cleanup of a temporary directory created using the tempfile module in Python?
It looks to me like your code does exactly what you want it to do and you just get confused by the warning. The warning is merely telling you that you should explicitly delete the temporary directory, but that the module is so kind as to do it for you.
To delete the temporary directory yourself, either use it as a context manager or call its cleanup method, as in the sketch below.
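For example, a minimal sketch of the question's test method rewritten to use the context-manager form; it reuses the question's imports and the clf, X_train, y_train and score objects built exactly as in the question:
    def test_save_model(self):
        # clf, X_train, y_train and score are built exactly as in the question
        tuple_objects = (clf, X_train, y_train, score)
        dir_path = os.path.dirname(os.path.realpath(__file__))
        # The context manager deletes the directory (and its contents) when the block exits
        with tempfile.TemporaryDirectory(dir=dir_path) as tmp_dir:
            pkl_file_name = os.path.join(tmp_dir, "pickle_model.pkl")
            path_model = save_model(tuple_objects, pkl_file_name)
            # Check while the directory still exists, i.e. inside the with-block
            self.assertTrue(os.path.exists(path_model))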
You simply don't. If you want the directory to outlast its scope, create something that is not a temporary directory.
Or, more likely when testing: create the directory in the test setup, fill it, run the test, and remove it in the teardown, so that each test is independent; see the sketch below.
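A sketch of that setup/teardown pattern with unittest (the names here are illustrative, not part of the original question):
import os
import shutil
import tempfile
import unittest

class TestMachineLearningUtils(unittest.TestCase):
    def setUp(self):
        # Create a fresh directory before every test
        self.tmp_dir = tempfile.mkdtemp()

    def tearDown(self):
        # Remove it after every test so tests stay independent
        shutil.rmtree(self.tmp_dir)

    def test_save_model(self):
        pkl_file_name = os.path.join(self.tmp_dir, "pickle_model.pkl")
        # ... build tuple_objects and call save_model as in the question ...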
I have trained a CNN using fastai on Kaggle and also on my local machine. After calling learn.fit_one_cycle(1) on Kaggle I get the following table as output:
I executed the exact same code on my local machine (with the Spyder IDE and Python 3.7) and everything works, but I cannot see that output table. How can I display it?
This is the complete code:
from fastai import *
from fastai.vision import *
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
bs = 32
path = 'C:\\DB\\UCMerced_LandUse\\UCMerced_LandUse\\Unfoldered_Images'
pat = r"([^/\d]+)[^/]*$"
fnames = get_image_files(path)
data = ImageDataBunch.from_name_re(path, fnames, pat, ds_tfms=get_transforms(),
size = 224, bs = bs, num_workers = 0).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet34, metrics=[accuracy])
learn.fit_one_cycle(1)
The problem was that the console in Spyder was set to 'execute in current console', which doesn't seem to be able to display the result table. Setting it to 'execute in an external system terminal' solved the problem.
I installed sklearn in my environment and am now running it in a Jupyter notebook on Windows.
How can I avoid the error:
URLError: urlopen error [Errno 11004] getaddrinfo failed
I am running the following code:
import sklearn
import sklearn.ensemble
import sklearn.metrics
from sklearn.datasets import fetch_20newsgroups
categories = ['alt.atheism', 'soc.religion.christian']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
which fails at the fetch_20newsgroups call:
----> 3 newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
I am behind a proxy on my work computer. Is there any option to avoid this error and still be able to use the sample datasets?
According to the source code, scikit-learn will download the file from:
https://ndownloader.figshare.com/files/5975967
I am assuming that you cannot reach this location from behind the proxy.
Can you access the dataset by some other means? If yes, download it manually and keep it at the location:
~/scikit_learn_data/
Here ~ refers to the user home folder. You can use the following code to find the default location of that folder on your system:
from sklearn.datasets import get_data_home
print(get_data_home())
Update: once the archive is in place, use the following script to convert it into the cached form in which scikit-learn keeps its datasets:
import codecs, os, pickle, shutil, tarfile
from sklearn.datasets import load_files

# '~' is not expanded automatically by Python, so expand it explicitly
data_folder = os.path.expanduser('~/scikit_learn_data/')
target_folder = data_folder + '20news_home/'
tarfile.open(data_folder + '20newsbydate.tar.gz', "r:gz").extractall(path=target_folder)
cache = dict(train=load_files(target_folder+'20news-bydate-train', encoding='latin1'),
test=load_files(target_folder+'20news-bydate-test', encoding='latin1'))
compressed_content = codecs.encode(pickle.dumps(cache), 'zlib_codec')
with open(data_folder + '20news-bydate_py3.pkz', 'wb') as f:
    f.write(compressed_content)
shutil.rmtree(target_folder)
Scikit-learn always checks whether the dataset exists locally before attempting to download it from the internet, and for that it checks the location above.
After that you can run the original code normally.
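Once the cache file is in place, the call from the question should load from disk instead of going through the proxy; a quick way to check is something like:
from sklearn.datasets import fetch_20newsgroups

categories = ['alt.atheism', 'soc.religion.christian']
# With the .pkz cache present in ~/scikit_learn_data/, no download is attempted
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
print(len(newsgroups_train.data))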