I installed scikit-learn in my environment and am now running it in a Jupyter notebook on Windows.
How can I avoid the following error:
URLError: urlopen error [Errno 11004] getaddrinfo failed
I am running the following code:
import sklearn
import sklearn.ensemble
import sklearn.metrics
from sklearn.datasets import fetch_20newsgroups
categories = ['alt.atheism', 'soc.religion.christian']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
which raises the error at the fetch_20newsgroups call:
----> 3 newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
I am behind a proxy on my work computer. Is there any option to avoid this error and still be able to use the sample datasets?
According to the source code, scikit-learn will download the file from:
https://ndownloader.figshare.com/files/5975967
I am assuming that you cannot reach this location from behind the proxy.
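If your proxy allows outbound HTTPS, it may be enough to point Python at it first: fetch_20newsgroups downloads through urllib, which honours the standard proxy environment variables. A minimal sketch, with proxy.example.com:8080 as a hypothetical stand-in for your actual proxy address:

import os
# Hypothetical proxy address -- replace with your real proxy host and port
os.environ['HTTP_PROXY'] = 'http://proxy.example.com:8080'
os.environ['HTTPS_PROXY'] = 'http://proxy.example.com:8080'

from sklearn.datasets import fetch_20newsgroups
newsgroups_train = fetch_20newsgroups(subset='train')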
If the proxy blocks the download entirely: can you access the dataset by some other means? If yes, then you can download it manually and keep it at the location:
~/scikit_learn_data/
Here ~ refers to your user home folder (on Windows, typically C:\Users\<username>). You can use the following code to print the default location on your system:
from sklearn.datasets import get_data_home
print(get_data_home())
Update: once the archive is in place, use the following script to convert it into the cached form in which scikit-learn keeps this dataset:
import codecs, os, pickle, shutil, tarfile
from sklearn.datasets import load_files

# '~' is not expanded by open()/tarfile.open, so expand it explicitly
data_folder = os.path.expanduser('~/scikit_learn_data/')
target_folder = data_folder + '20news_home/'

# Extract the manually downloaded archive
tarfile.open(data_folder + '20news-bydate.tar.gz', "r:gz").extractall(path=target_folder)

# Build the train/test cache the same way scikit-learn does internally
cache = dict(train=load_files(target_folder + '20news-bydate-train', encoding='latin1'),
             test=load_files(target_folder + '20news-bydate-test', encoding='latin1'))

# Pickle it, zlib-compress it, and store it under the filename scikit-learn checks for
compressed_content = codecs.encode(pickle.dumps(cache), 'zlib_codec')
with open(data_folder + '20news-bydate_py3.pkz', 'wb') as f:
    f.write(compressed_content)

shutil.rmtree(target_folder)
Scikit-learn always checks whether the dataset exists locally before attempting to download it from the internet, and the location above is where it looks. After that, the fetch will run normally.
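To verify that the cache is picked up without any network access, you can pass download_if_missing=False, which makes fetch_20newsgroups raise an error instead of downloading:

from sklearn.datasets import fetch_20newsgroups
# Succeeds only if the local cache built above is in place
train = fetch_20newsgroups(subset='train', download_if_missing=False)
print(len(train.data))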
Related
It seems that when I call a Python function from an XMLHttpRequest so that its output prints to the browser, the script leads to an error if any of these import statements (the ones shown commented out) are run:
import traceback
import transformers
"""
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification, TrainingArguments, Trainer
import evaluate
import torch
"""
import pandas as pd
import numpy as np
import datasets
from datasets import load_dataset, load_metric, load_from_disk, Value
import cgi, cgitb
print("Content-Type: text/html\n")
print("Hello!")
That is, if I keep them commented out, my hello statement shows in the browser; otherwise I get an internal server error (500). Any help would be greatly appreciated!
I already tried suppressing warnings and logs.
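One way to see what is actually failing, rather than a bare 500: print the Content-Type header first, then wrap the suspect import so its traceback reaches the browser. A minimal diagnostic sketch (cgitb, already imported above, renders uncaught tracebacks as HTML):

import cgitb
cgitb.enable()  # show uncaught tracebacks in the browser instead of a bare 500

print("Content-Type: text/html\n")

try:
    import transformers  # the import suspected of killing the CGI process
    print("transformers imported OK")
except Exception:
    import traceback
    print("<pre>" + traceback.format_exc() + "</pre>")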
I was trying to run this notebook on Colab:
https://colab.research.google.com/github/https-deeplearning-ai/GANs-Public/blob/master/C1W1_(Colab)_Inputs_to_a_pre_trained_GAN.ipynb
but first I got this:
ValueError: Tensorflow 1 is unsupported in Colab.
then I upgraded it using this script:
import tensorflow as tf
!tf_upgrade_v2 \
  --intree stylegan/ \
  --inplace
and I commented out these lines:
%tensorflow_version 1.x
tflib.init_tf()
but then I got this error, which I couldn't solve:
AttributeError: Can't get attribute 'Network' on <module 'dnnlib.tflib.network' from '/content/stylegan/dnnlib/tflib/network.py'>
Can somebody help?
# Clone the official StyleGAN repository from GitHub
!git clone https://github.com/NVlabs/stylegan.git

%tensorflow_version 1.x

import os
import pickle
import numpy as np
import PIL.Image
import stylegan
from stylegan import config
from stylegan.dnnlib import tflib
from tensorflow.python.util import module_wrapper
module_wrapper._PER_MODULE_WARNING_LIMIT = 0

# Initialize TensorFlow
tflib.init_tf()

# Go into that cloned directory
path = 'stylegan/'
if "stylegan" not in os.getcwd():
    os.chdir(path)

# Load the pre-trained network
# url = 'https://drive.google.com/uc?id=1MEGjdvVpUsu1jB4zrXZN7Y4kBBOzizDQ' # Downloads the pickled model file: karras2019stylegan-ffhq-1024x1024.pkl
url = 'https://bitbucket.org/ezelikman/gans/downloads/karras2019stylegan-ffhq-1024x1024.pkl'
with stylegan.dnnlib.util.open_url(url, cache_dir=config.cache_dir) as f:
    print(f)
    _G, _D, Gs = pickle.load(f)
# Gs.print_layers() # Print network details
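For context on the AttributeError: pickle.load resolves classes by importing the module path recorded in the file, so dnnlib.tflib.network must still define Network exactly as it did when the pickle was written; rewriting the sources with tf_upgrade_v2 --inplace can leave that module without the attribute the pickle expects. A minimal sketch of a clean load, assuming a fresh unmodified clone, a TF1 runtime, and the model file already downloaded to the working directory:

import sys, pickle
sys.path.insert(0, 'stylegan')   # make dnnlib importable the way the pickle expects
import dnnlib.tflib as tflib
tflib.init_tf()                  # TF1-style session setup used by StyleGAN
with open('karras2019stylegan-ffhq-1024x1024.pkl', 'rb') as f:
    _G, _D, Gs = pickle.load(f)  # pickle imports dnnlib.tflib.network.Network here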
I am trying to learn the use of gravityai and frankly I am a bit new to this. To that end I followed https://www.youtube.com/watch?v=i6qL3NqFjs4 from Ania Kubow. When I do this, at the end I encounter the error message below. It appears in gravityai when trying to run the job, i.e. after uploading all the zipped files (three .pkl files, one .py file, one .txt file, one .json file) and after Docker is initialized and run:
Error running executable: usage: classify_financial_articles.py [-h] {run,serve} ... classify_financial_articles.py: error: argument subcommand: invalid choice: '/tmp/gai_temp/0675f15ca0b04cf98071474f19e38f3c/76f5cdc86a1241af8c01ce1b4d441b0c' (choose from 'run', 'serve').
I do not understand the error message and therefore cannot fix it. Is it an error in the code, or in the configuration on the gravityai platform? At no point do I run the .py file explicitly, so I conclude that it must come from gravityai. Yet I don't understand the error. Can anyone help me?
I added the .py file, as it is the one throwing the error:
from gravityai import gravityai as grav
import pickle
import pandas as pd
model = pickle.load(open('financial_text_classifier.pkl', 'rb'))
tfidf_vectorizer = pickle.load(open('financial_text_vectorizer.pkl','rb'))
label_encder = pickle.load(open('financial_text_encoder.pkl', 'rb'))
def process(inPath, outPath):
    # read csv input file
    input_df = pd.read_csv(inPath)
    # vectorize the text body
    features = tfidf_vectorizer.transform(input_df['body'])
    # predict classes
    predictions = model.predict(features)
    # convert output labels back to categories
    input_df['category'] = label_encder.inverse_transform(predictions)
    # save results to csv
    output_df = input_df(['id', 'category'])
    output_df.csv(outPath, index=False)

grav.wait_for_requests(process)
I can't find any errors in the .py file.
The error you got comes from the line that imports the gravityai library:
from gravityai import gravityai as grav
I believe you need to upload your project to the gravityai platform, and then you will be able to test it.
To test locally you need to:
1) Add the line import sys.
2) Comment out or delete the line grav.wait_for_requests(process).
3) Add the line process(inPath=sys.argv[2], outPath=sys.argv[3]). (sys.argv[1] is the run subcommand itself, so the input and output paths arrive as argv[2] and argv[3].)
4) Run from the command line: python classify_financial_articles.py run test_set.csv test_set_out.csv
Example of code:
from gravityai import gravityai as grav
import pickle
import pandas as pd
import sys
model = pickle.load(open('financial_text_classifier.pkl', 'rb'))
tfidf_vectorizer = pickle.load(open('financial_text_vectorizer.pkl', 'rb'))
label_encoder = pickle.load(open('financial_text_encoder.pkl', 'rb'))
def process(inPath, outPath):
    input_df = pd.read_csv(inPath)
    features = tfidf_vectorizer.transform(input_df['body'])
    predictions = model.predict(features)
    input_df['category'] = label_encoder.inverse_transform(predictions)
    output_df = input_df[['id', 'category']]
    output_df.to_csv(outPath, index=False)

process(inPath=sys.argv[2], outPath=sys.argv[3])
I have trained a CNN using fastai on Kaggle and also on my local machine. After calling learn.fit_one_cycle(1) on Kaggle I get the usual per-epoch results table (epoch, train loss, validation loss, accuracy) as output.
I executed the exact same code on my local machine (with the Spyder IDE and Python 3.7) and everything works, but I cannot see that output table. How can I display it?
This is the complete code:
from fastai import *
from fastai.vision import *
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
bs = 32
path = 'C:\\DB\\UCMerced_LandUse\\UCMerced_LandUse\\Unfoldered_Images'
pat = r"([^/\d]+)[^/]*$"
fnames = get_image_files(path)
data = ImageDataBunch.from_name_re(path, fnames, pat, ds_tfms=get_transforms(),
                                   size=224, bs=bs, num_workers=0).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet34, metrics=[accuracy])
learn.fit_one_cycle(1)
The problem was that the console in Spyder was set to 'execute in current console', which does not seem to be able to display the result table. Setting it to 'execute in an external system terminal' solved the problem.
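As an alternative when only a plain console is available: in fastai v1 the numbers behind that table are kept on the learner's recorder, so you can print them after training. A small sketch (attribute names as in fastai v1):

# after learn.fit_one_cycle(1)
print(learn.recorder.losses[-1])   # last recorded training loss
print(learn.recorder.val_losses)   # validation loss per epoch
print(learn.recorder.metrics)      # metric values (here: accuracy) per epoch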
If I run
from sklearn.datasets import load_breast_cancer
import lightgbm as lgb
breast_cancer = load_breast_cancer()
data = breast_cancer.data
target = breast_cancer.target
params = {
    "task": "convert_model",
    "convert_model_language": "cpp",
    "convert_model": "test.cpp",
}
gbm = lgb.train(params, lgb.Dataset(data, target))
then I was expecting a file called test.cpp to be created, with the model saved in C++ format.
However, nothing appears in my current directory.
I have read the documentation (https://lightgbm.readthedocs.io/en/latest/Parameters.html#io-parameters), but can't tell what I'm doing wrong.
Here's a real 'for dummies' answer:
Install the CLI version of lightgbm: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html
Make note of your installation path, and find the executable. For example, for me, this was ~/LightGBM/lightgbm.
Run the following in a Jupyter notebook:
from sklearn.datasets import load_breast_cancer
import pandas as pd
breast_cancer = load_breast_cancer()
data = pd.DataFrame(breast_cancer.data)
target = pd.DataFrame(breast_cancer.target)
pd.concat([target, data], axis=1).to_csv("regression.train", header=False, index=False)
train_conf = """
task = train
objective = binary
metric = auc
data = regression.train
output_model = trained_model.txt
"""
with open("train.conf", "w") as f:
f.write(train_conf)
conf_convert = """
task = convert_model
input_model= trained_model.txt
"""
with open("convert.conf", "w") as f:
f.write(conf_convert)
! ~/LightGBM/lightgbm config=train.conf
! ~/LightGBM/lightgbm config=convert.conf
Your model will be saved in your current directory (the output filename comes from the convert_model parameter, which the docs list as defaulting to gbdt_prediction.cpp).
In the doc they say:
Note: can be used only in CLI version
under the convert_model and convert_model_language parameters.
That means you should probably use the CLI (Command Line Interface) version of LightGBM instead of the Python wrapper to do this.
Link to the Quick Start for the CLI version.