I perform many logistic regression analyses with different parameters. From time to time I get an annoying message that the iteration limit is reached.
/home/arnold/bin/anaconda/envs/vehicles/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
I don't want this message; I get thousands of them in my project during a single run. Is there a way to suppress it?
What I'd like instead is some indication that something has gone wrong, e.g. an exception being raised, so that I can check afterwards which analyses were OK and which were not. Is there a way to do that?
The message is a custom warning defined in sklearn.exceptions. You can suppress it (as noted in the comments), and you can also catch it as if it were an error. Catching it lets you record the message, which should help you check afterwards which analyses were okay.
The following code sample should help you get started. It is based on the Python warnings documentation. The with block catches and records the warning produced by the logistic regression.
import warnings
from sklearn import datasets, linear_model, exceptions
import matplotlib.pyplot as plt

# >>> Start: Create dummy data
blob = datasets.make_blobs(n_samples=100, centers=1)[0]
x = blob[:, 0].reshape(-1, 1)
# y needs to be integer for logistic regression
y = blob[:, 1].astype(int)
plt.scatter(x, y)
# <<< End: Create dummy data

# Create the logistic regression; set max_iter to a low number to provoke the warning
lr = linear_model.LogisticRegression(max_iter=2)

with warnings.catch_warnings(record=True) as w:
    # Cause all warnings to always be triggered.
    warnings.simplefilter("always")
    # Trigger a warning.
    lr.fit(x, y)
After running the code, you can check the contents of the variable w:
print(type(w))
print(w[-1].category)
print(w[-1].message)
Output:
<class 'list'>
<class 'sklearn.exceptions.ConvergenceWarning'>
lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
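If you run many analyses, you can apply the same pattern in a loop and record which fits triggered a ConvergenceWarning. The sketch below builds on the example above; the param_grid list of parameter settings is a hypothetical stand-in for whatever your project actually iterates over.

import warnings
from sklearn import datasets, linear_model, exceptions

# Dummy data as above (stand-in for your real data)
blob = datasets.make_blobs(n_samples=100, centers=1)[0]
x = blob[:, 0].reshape(-1, 1)
y = blob[:, 1].astype(int)

# Hypothetical parameter settings for the different analyses
param_grid = [{"max_iter": 2}, {"max_iter": 5}, {"max_iter": 1000}]

failed = []  # indices of analyses that did not converge
for i, params in enumerate(param_grid):
    lr = linear_model.LogisticRegression(**params)
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        lr.fit(x, y)
    # Record this analysis if any captured warning is a ConvergenceWarning
    if any(issubclass(wi.category, exceptions.ConvergenceWarning) for wi in w):
        failed.append(i)

print("analyses that hit the iteration limit:", failed)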
Related
I am using Python to predict values and getting many warnings like:
C:\Users\ASMGX\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
This prevents me from seeing my own printed results.
Is there any way I can stop these warnings from showing?
You can use the warnings module to temporarily suppress warnings, either all of them or only specific ones.
In this case scikit-learn is raising a ConvergenceWarning, so I suggest suppressing exactly that type of warning. That warning class lives in sklearn.exceptions, so import it beforehand, then use the context manager catch_warnings together with simplefilter to ignore the warning, i.e. not print it to the screen:
import warnings
from sklearn.exceptions import ConvergenceWarning
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    optimizer_function_that_creates_warning()
You can also ignore that specific warning globally to avoid using the context-manager:
import warnings
warnings.simplefilter("ignore", category=ConvergenceWarning)
optimizer_function_that_creates_warning()
I suggest using the context manager, though, since then you know exactly where warnings are suppressed and you will not silence warnings coming from unexpected places.
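Applied to the logistic regression from the question, the context-manager version could look like the following sketch; the X and y arrays are hypothetical placeholders for your own data, and max_iter is kept low only so the warning would normally appear.

import warnings
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.exceptions import ConvergenceWarning

# Hypothetical data standing in for your own
X = np.random.rand(100, 3)
y = (X[:, 0] > 0.5).astype(int)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    # The ConvergenceWarning is suppressed only inside this block
    clf = LogisticRegression(max_iter=2).fit(X, y)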
Set the solver and increase max_iter to solve the problem:
from sklearn.linear_model import LogisticRegression
clf=LogisticRegression(solver='lbfgs', max_iter=500000).fit(x_train, y_train)
I have been doing an analysis to see the relation between training time and the maximum number of iterations in SVC. The data I use are some randomly generated numbers, and I plotted the training time against max_iter of the SVC fit. I checked the logs, and each binary classifier reached max_iter (I captured all console output, which showed a detailed warning for each binary classifier, and counted them). However, I assumed that the training time would be strictly linear in the number of iterations, but in fact, when the training data has many labels (say 40), the plot is not linear.
It seems that as the maximum number of iterations goes up, each iteration takes slightly less time than before, while if we change label_size to 2 (which means each fit contains only one binary classifier), the line is straight.
What causes that to happen?
Here is my source code:
# -*- coding: utf-8 -*-
import numpy as np
from sklearn.svm import SVC
import time
import pandas as pd
def main(row_size, label_size):
    np.random.seed(2019)
    y = np.array([i for i in range(label_size) for j in range(row_size / label_size)])
    if len(y) < row_size:
        y = np.append(y, [y[-1]] * (row_size - len(y)))
    X = np.random.rand(row_size, 300)
    print X.shape, y.shape
    return (X, y)

def train_svm(X, y, max_iter):
    best_params = {'C': 1}
    clf = SVC(
        C=best_params['C'],
        kernel=str('linear'),
        probability=False,
        class_weight='balanced',
        max_iter=max_iter,
        random_state=2018,
        verbose=True,
    )
    start = time.time()
    clf.fit(X, y)
    end = time.time()
    return end - start

if __name__ == '__main__':
    row_size = 20000
    m_iter = range(10, 401, 20)
    label_size = [40]
    data = {
        'label_size': [],
        'max_iter': [],
        'row_size': [],
        'time': [],
    }
    for it in m_iter:
        for l in label_size:
            (X, y) = main(row_size, l)
            t = train_svm(X, y, max_iter=it)
            data['label_size'].append(l)
            data['max_iter'].append(it)
            data['row_size'].append(row_size)
            data['time'].append(t)
    df = pd.DataFrame(data)
    df.to_csv('svc_iter.csv', index=None)
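The plot itself is not reproduced here, but assuming the svc_iter.csv written by the script above, the timing curve could be recreated with a short sketch like this (the column names match the data dict above):

import pandas as pd
import matplotlib.pyplot as plt

# Read the timings written by the script above and plot time vs. max_iter
df = pd.read_csv('svc_iter.csv')
plt.plot(df['max_iter'], df['time'], marker='o')
plt.xlabel('max_iter')
plt.ylabel('training time (s)')
plt.show()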
Well, there could be many reasons for that "very slight change". Scikit-learn does not do the heavy computation natively; it is built on top of other libraries and may use a number of optimizers, etc.
Besides, your first graph is very close to linear!
Nevertheless, one big, noticeable factor contributing to those small changes is the decomposition method used in support vector machines.
The idea of the decomposition methodology for classification tasks is to break a complex classification task down into several simpler, more manageable sub-tasks that can be solved with existing induction methods, and then to join their solutions together in order to solve the original problem.
This method is an iterative process, and in each iteration only a few variables are updated.
For more details on the mathematical approach, please refer to this paper, section 6.2, The Decomposition Method.
Moreover, and more specifically, the SVM implementation uses two tricks for the decomposition method, called shrinking and caching.
The shrinking idea is that an optimal solution α of the SVM dual problem may contain some bounded elements (i.e., αi = 0 or C). These elements may already be bounded in the middle of the decomposition iterations. To save training time, the shrinking technique tries to identify and remove some of these bounded elements, so that a smaller optimization problem is solved.
The caching idea is an effective technique for reducing the computational time of the decomposition method: elements are calculated only as needed, and the available memory (called the kernel cache) is used to store some recently used parts of the matrix Qij, so that some kernel elements may not need to be recalculated.
For more details about the mathematical approach, please refer to this paper, section 5 Shrinking and Caching.
Technical Proof:
I repeated your experiment (that's why I asked for your code, so I could follow the same exact approach), with and without the shrinking and caching optimization.
Using Shrinking and Caching
The parameter shrinking in sklearn's SVC defaults to True; keeping it as is produced the following output:
Plotting it gives:
Note how, at some point, the time drops noticeably, reflecting the effect of shrinking and caching.
Without Using Shrinking and Caching
Using the exact same approach, but this time setting the parameter shrinking explicitly to False, as follows:
clf = SVC(
    C=best_params['C'],
    kernel=str('linear'),
    probability=False,
    class_weight='balanced',
    max_iter=max_iter,
    random_state=2018,
    verbose=True,
    shrinking=False
)
This produced the following output:
Plotting it gives:
Note how, unlike the previous plot, there is no noticeable drop in time at any point; there are just very tiny fluctuations along the entire plot.
Comparing Pearson Correlations
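The correlation figures are not reproduced here, but assuming the svc_iter.csv recorded by the script above (one run with shrinking enabled, one without), the Pearson correlation between max_iter and training time could be computed with a sketch like this:

import pandas as pd
from scipy.stats import pearsonr

# Use the CSV written by the timing script above (one file per run)
df = pd.read_csv('svc_iter.csv')

# Pearson correlation between the iteration limit and the measured training time;
# a value close to 1 indicates a nearly linear relationship
r, p_value = pearsonr(df['max_iter'], df['time'])
print('Pearson r = %.4f (p = %.3g)' % (r, p_value))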
In conclusion:
Without shrinking and caching (updated later to account for caching), the linearity improved, although it is not 100% linear. If you take into account that scikit-learn internally uses the libsvm library, wrapped in C and Cython, to handle all computations, you can allow a higher tolerance in your definition of a 'linear' relationship between the maximum number of iterations and time. Also, here is a cool discussion about why algorithms may not give exactly the same running time on every run.
This becomes even clearer if you plot the interval times, so you can see clearly how the drops happen suddenly and noticeably in more than one place when shrinking and caching are enabled,
while the curve keeps almost the same flow without the optimization tricks.
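As a rough sketch (assuming the svc_iter.csv from the script above, and taking "interval times" to mean the differences between consecutive timings), those drops could be visualized like this:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('svc_iter.csv')

# Difference between consecutive training times; sudden drops show up as
# strongly negative spikes
intervals = df['time'].diff()
plt.plot(df['max_iter'], intervals, marker='o')
plt.xlabel('max_iter')
plt.ylabel('change in training time (s)')
plt.show()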
Important Update
It turned out that the aforementioned reason for this issue (i.e. shrinking and caching) is correct; more precisely, it is a very big factor in that phenomenon.
But the thing I missed is the following:
I was talking about shrinking and caching, but I missed the parameter that controls the latter, cache_size, which is set by default to 200 MB.
Repeating the same simulations several times and setting the cache_size parameter to a very small number (zero is not accepted and throws an error), in addition to shrinking=False, resulted in an extremely-close-to-linear pattern between max_iter and time:
clf = SVC(
    C=best_params['C'],
    kernel=str('linear'),
    probability=False,
    class_weight='balanced',
    max_iter=max_iter,
    random_state=2018,
    verbose=False,
    shrinking=False,
    cache_size=0.000000001
)
By the way, you don't need to set verbose=True; you can check whether a fit reached the maximum number of iterations via the ConvergenceWarning instead. You can redirect those warnings to a file, which makes them far easier to follow; just add this code:
import warnings, sys

def customwarn(message, category, filename, lineno, file=None, line=None):
    # Append every warning to warnings.txt instead of printing it to the console
    with open('warnings.txt', 'a') as the_file:
        the_file.write(warnings.formatwarning(message, category, filename, lineno))

warnings.showwarning = customwarn
Also, you don't need to regenerate the dataset on every iteration, so take it out of the loop, like this:

(X, y) = main(row_size, 40)
for it in m_iter:
    ....
    ....
Final Conclusion
The shrinking and caching tricks that come from the decomposition method in SVM play a significant role in improving execution time as the number of iterations increases. Besides that, there are other, smaller factors that may contribute here, such as the internal use of the libsvm library, wrapped in C and Cython, to handle all computations.
I have some code that fits an LDA model on a bunch of CSV lines.
lda_model = LatentDirichletAllocation(
    n_components=20,        # Number of topics
    max_iter=10,            # Max learning iterations
    learning_method='online',
    random_state=100,       # Random state
    batch_size=128,         # n docs in each learning iter
    evaluate_every=-1,      # compute perplexity every n iters, default: Don't
    n_jobs=-1,              # Use all available CPUs
)

if __name__ == "__main__":
    lda_output = lda_model.fit_transform(data_vectorized)
    print(lda_model)
I ran it the first time without the if __name__ == "__main__" line and had no issues. The second time I ran it, I got this error:
ImportError: [joblib] Attempting to do parallel computing without
protecting your import on a system that does not support forking. To
use parallel-computing in a script, you must protect your main loop
using "if __name__ == '__main__'". Please see the joblib documentation
on Parallel for more information.
So I tried to add the if __name__ == "__main__" guard to get it to work, but I'm still having issues. I've tried inserting it in various spots of my code (I'm on Windows) and can't get anything to work. Do I need to add something else?
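One common arrangement on Windows is sketched below: everything that triggers joblib's parallelism (the fit_transform call with n_jobs=-1) is moved inside the if __name__ == "__main__" block, so worker processes can import the module without re-running the fit. The load_data() helper and its CountVectorizer input are hypothetical stand-ins for however data_vectorized is actually built.

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def load_data():
    # Hypothetical stand-in for reading the CSV lines and vectorizing them
    docs = ["document number %d about topic %d" % (i, i % 5) for i in range(100)]
    return CountVectorizer().fit_transform(docs)

def build_model():
    return LatentDirichletAllocation(
        n_components=20,
        max_iter=10,
        learning_method='online',
        random_state=100,
        batch_size=128,
        evaluate_every=-1,
        n_jobs=-1,
    )

if __name__ == "__main__":
    # On Windows there is no fork, so code that starts worker processes
    # must run only in the main process, i.e. inside this guard.
    data_vectorized = load_data()
    lda_model = build_model()
    lda_output = lda_model.fit_transform(data_vectorized)
    print(lda_model)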
I am trying to solve the problem of a least-squares fit of a power law spliced to a third-order polynomial in Python using gradient descent. I have computed the gradients with respect to the parameters in Matlab, and I computed the boundary conditions by hand. I am running into a syntax error in my chi-squared minimization algorithm, which must take the boundary conditions into account. I am doing this for a machine learning class in which I am completing a somewhat self-directed and self-proposed long-term project, but I am stuck because of this syntax error that I am not sure how to overcome. I will not get class credit for this; it is simply something to put on my resume.
def polypowerderiv(x,a1,b1,c1,a2,b2,c2,d2,boundaryx,ydat):
    #need to minimize square of ydat-polypower
    #from Mathematica, to be careful
    gradd2=2*(d2+c2*x+b2*x**2+a2*x**3-ydat)
    gradc2=gradd2*x
    gradb2=gradc2*x
    grada2=gradb2*x
    #again from Mathematica, to be careful
    gradc1=2(c+a1*x**b1-ydat)
    grada1=gradc1*x**b1
    gradb1=grada1*a1*log(x)
    return [np.sum(grada1),np.sum(gradb1),\
            np.sum(gradc1),np.sum(grada2),np.sum(gradb2),\
            np.sum(gradc2),np.sum(gradd2)]

def manualleastabsolutedifference(xdat, ydat, params,seed, maxiter, learningrate):
    chisq=0 #chisq is the L2 error of the fit relative to the ydata
    dof=len(xdat)-len(params)
    xparams=seed
    for step in np.arange(maxiter):
        a1,b1,c1,a2,b2,c2,d2=params
        chisq=polypowerlaw(xdat,params)
        for i in np.arange(len(xdat)):
            grad=np.zeros(len(seed))
            for i in np.arange(seed):
                polypowerlawboundarysolver=\
                    polypowerboundaryconstraint(xdat,a1,b1,c1,a2,b2,c2)
                boundaryx=minimize(polypowerlawboundarysolver,x0=1000)
                #hard coded to be half of len(xdat)
                chisq+=abs(ydat-\
                    polypower(xdat,a1,b1,c1,a2,b2,c2,d2,boundaryx)
                grad=\
                    polypowerderiv(xdat,a1,b1,c1,\
                        a2,b2,c2,d2,boundaryx,ydat)
                params+=learningrate*grad
    return params
The error I get is:
File "", line 14
grad=polypowerderiv(xdat,a1,b1,c1,a2,b2,c2,d2,boundaryx,ydat)
^
SyntaxError: invalid syntax
Also, I'm having some small trouble with formatting. Please help. This is one of my first few posts to Stack Overflow ever, after many years of up and down votes. Thank you for your extensive help, community.
As per Alan-Fey, you forgot a closing bracket:
chisq+=abs(ydat-\
polypower(xdat,a1,b1,c1,a2,b2,c2,d2,boundaryx)
should be
chisq+=abs(ydat-\
polypower(xdat,a1,b1,c1,a2,b2,c2,d2,boundaryx))
I have built a few off-the-shelf classifiers from sklearn and there are some expected scenarios where I know the classifier is bound to perform badly and not predict anything correctly. The sklearn.svm package runs without an error but raises the following warning.
~/anaconda/lib/python3.5/site-packages/sklearn/metrics/classification.py:1074: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 due to no predicted samples.
'precision', 'predicted', average, warn_for)
I wish to suppress this warning and instead replace with a message to stdout, say for instance, "poor classifier performance".
Is there any way to suppress warnings in general?
Suppressing all warnings is easy with -W ignore on the command line, e.g. python -W ignore your_script.py (see the warning flag docs).
The warnings module can do some finer-tuning with filters (ignore just your warning type).
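For instance, a minimal sketch to ignore only the specific UndefinedMetricWarning from the question (UndefinedMetricWarning lives in sklearn.exceptions) would be:

import warnings
from sklearn.exceptions import UndefinedMetricWarning

# Ignore only this warning type; other warnings are still shown
warnings.filterwarnings("ignore", category=UndefinedMetricWarning)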
Capturing just your warning (assuming there isn't some API in the module to tweak it) and doing something special could be done using the warnings.catch_warnings context manager and code adapted from "Testing Warnings":
import warnings

class MyWarning(Warning):
    pass

def something():
    warnings.warn("magic warning", MyWarning)

with warnings.catch_warnings(record=True) as w:
    # Trigger a warning.
    something()
    # Verify some things
    if ((len(w) == 1)
            and issubclass(w[0].category, MyWarning)
            and "magic" in str(w[-1].message)):
        print('something magical')
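Applied to the scenario in the question, a sketch like the one below (the y_true and y_pred arrays are hypothetical) catches the UndefinedMetricWarning around the metric call and prints the replacement message instead of the warning:

import warnings
from sklearn.metrics import f1_score
from sklearn.exceptions import UndefinedMetricWarning

# Hypothetical labels where the classifier never predicts the positive class
y_true = [0, 1, 1, 0]
y_pred = [0, 0, 0, 0]

with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")
    score = f1_score(y_true, y_pred)
    if any(issubclass(wi.category, UndefinedMetricWarning) for wi in w):
        print("poor classifier performance")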