I use a LightGBM model (version 2.2.1). It shows the following warning during training:
[LightGBM] [Warning] Starting from the 2.1.2 version, default value
for the "boost_from_average" parameter in "binary" objective is true.
This may cause significantly different results comparing to the
previous versions of LightGBM. Try to set boost_from_average=false, if
your old models produce bad results
I found what it is about: github link.
But I don't use any old models or legacy code (it's a new project created on version 2.2.1 of LightGBM), so I don't need to see this warning every time.
I also know that I can change verbose and turn off all warnings, but that's not a good solution - some other warnings can be useful!
So my question is: is it possible to turn off (hide) just this warning?
Try setting the parameter boost_from_average explicitly when you create the model, either to True or to False.
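A minimal sketch using the native training API (the synthetic data and round count below are placeholders, not from the question); explicitly passing boost_from_average should stop LightGBM from printing this particular warning:

import numpy as np
import lightgbm as lgb

# Synthetic stand-in data, only for illustration.
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

params = {
    'objective': 'binary',
    'boost_from_average': True,  # set explicitly; the warning is about this default changing
}
train_set = lgb.Dataset(X, label=y)
model = lgb.train(params, train_set, num_boost_round=10)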
I was going through old FastText code and realized it doesn't work anymore and expects different parameters. Looking at the documentation, it appears it has only been partially updated.
As can be seen, size and iter are not in the class definition shown in the docs despite appearing among the parameters. I was wondering if anyone knew the exact version where this change occurred, as it appears I've accidentally updated to something newer.
Most changes occurred in gensim-4.0.0. There is a series of notes on the changes & how to adapt your code on the project wiki page, "Migrating from Gensim 3.x to 4":
https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
In most cases small changes to the method & variable names older code is using will restore full functionality.
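For instance, a minimal sketch (the toy corpus and sizes are made up): the two constructor arguments from the question were renamed in gensim 4.x, size to vector_size and iter to epochs:

from gensim.models import FastText

# Tiny toy corpus, only for illustration.
sentences = [["hello", "world"], ["fasttext", "example"]]

# gensim 3.x style (no longer accepted):
# model = FastText(sentences, size=32, iter=5, min_count=1)

# gensim 4.x style:
model = FastText(sentences, vector_size=32, epochs=5, min_count=1)
print(model.wv["hello"].shape)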
There have been significant fixes and optimizations to the FastText implementation, especially in the realm of reducing memory usage, so you probably don't want to stay on any older version (like gensim-3.8.3) except as a temporary quickie workaround.
I am training a sequence-to-sequence model on Keras with a Tensorflow backend, mostly following the tutorial here.
I'm using TensorFlow v1.2.1 on an IBM Power8 machine with a P100 GPU.
When it hits my model.fit_generator() line, TensorFlow throws the following error:
Object was never used (type <class 'tensorflow.python.ops.tensor_array_ops.TensorArray'>):
<tensorflow.python.ops.tensor_array_ops.TensorArray object at 0x3bfffc096dd8>
If you want to mark it as used call its "mark_used()" method.
I tried looking for unused operations/tensors, but couldn't find any. Then, I marked every operation/tensor as used, but I still cannot get rid of this error.
Usually this error fires when some stateful operation in TensorFlow is never passed to session.run or used as a control dependency, which means some updates will get silently dropped, leading to wrong behavior. That said, try upgrading to see whether the fault lies in some internal library rather than in your code.
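To illustrate the general pattern being described, here is a hedged TF 1.x sketch with made-up names (not code from the question): a stateful op has to reach session.run directly or through a control dependency, otherwise its update is silently dropped:

import tensorflow as tf

counter = tf.Variable(0, name='counter')
increment = tf.assign_add(counter, 1)   # stateful op

# Make a downstream tensor depend on the stateful op so it cannot be dropped.
with tf.control_dependencies([increment]):
    value = tf.identity(counter)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(value))              # runs the increment as well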
I encountered this error while working with TensorArrays. The official documentation of TensorArray (https://www.tensorflow.org/api_docs/python/tf/TensorArray) mentions: "Note: The output of this function should be used. If it is not, a warning will be logged or an error may be raised. To mark the output as used, call its .mark_used() method."
I was facing this error while using the write() method of the TensorArray:
tfa = tf.TensorArray(tf.float32, size=10)
tfa.write(1, 10)
This was later resolved when used in this way:
tfa = tf.TensorArray(tf.float32, size=10)
tfa.write(1, 10).mark_used()
I am debugging a Python (3.5) program with PyCharm (PyCharm Community Edition 2016.2.2 ; Build #PC-162.1812.1, built on August 16, 2016 ; JRE: 1.8.0_76-release-b216 x86 ; JVM: OpenJDK Server VM by JetBrains s.r.o) on Windows 10.
The problem: when stopped at some breakpoints, the Debugger window is stuck at "Collecting data", which eventually times out (with "Unable to display frame variables").
The data to be displayed is neither special nor particularly large. It is somehow available to PyCharm, since a conditional breakpoint on some values of said data works fine (the program breaks) -- it looks like only the process of gathering it for display (as opposed to operational purposes) fails.
When I step into a function around the place where I have my breakpoint, its data is displayed correctly. When I go up the stack (to the calling function, the one I stepped down from and where I initially wanted to have the breakpoint), I am stuck with the "Collecting data" timeout again.
There have been numerous issues raised with the same point since at least 2005. Some were fixed, some not. The fixes were usually updates to the latest version (which I have).
Is there a general direction I can go to in order to fix or work around this family of problems?
EDIT: a year later the problem is still there and there is still no reaction from the devs/support after the bug was raised.
EDIT April 2018: It looks like the problem is solved in the 2018.1 version; the following code, which was hanging when setting a breakpoint on the print line, now works (I can see the variables):
import threading

def worker():
    a = 3
    print('hello')

threading.Thread(target=worker).start()
I had the same issue with PyCharm 2018.2 when working on a complex Flask project with SocketIO.
When I put a breakpoint inside the code and pressed the debug button, it stopped at the breakpoint, but the variables didn't load; it just kept collecting data indefinitely. I enabled Gevent compatibility and it resolved the issue. The setting ("Gevent compatible") can be found in the Python Debugger section of PyCharm's settings.
In case you landed here because you are using PyTorch (or any other deep learning library) and are trying to debug in PyCharm (torch 1.3.1, PyCharm 2019.2 in my case) but it's super slow:
Enable "Gevent compatible" in the Python Debugger settings, as linkliu mayuyu pointed out. The problem might be caused by debugging large deep learning models (a BERT transformer in my case), but I'm not entirely sure about this.
I'm adding this answer as it's the end of 2019 and this doesn't seem to be fixed yet. Furthermore, I think this affects many engineers using deep learning, so I hope my answer formatting triggers their Stack Overflow algorithm :-)
Note (June 2020):
While enabling "Gevent compatible" allows you to debug PyTorch models, it will prevent you from debugging your Flask application in PyCharm! My breakpoints stopped working, and it took me a while to figure out that this flag was the reason. So make sure to enable it only on a per-project basis.
I also had this issue when working on code that uses sympy and the Python module 'Lea' to calculate probability distributions.
The action I took that resolved the timeout issue was to change the 'Variables Loading Policy' in the debug setting from the default 'Asynchronously' to 'Synchronously'.
I think this is caused by some classes having a default __str__() method that is too verbose. PyCharm calls this method to display the local variables when it hits a breakpoint, and it gets stuck while loading the string.
A trick I use to overcome this is to manually edit the class that is causing the problem and replace its __str__() method with something less verbose.
As an example, it happens for the PyTorch _TensorBase class (and all tensor classes extending it), and can be solved by editing the PyTorch source torch/tensor.py, changing the __str__() method as follows:
def __str__(self):
    # All strings are unicode in Python 3, while we have to encode unicode
    # strings in Python 2. If we can't, let Python decide the best
    # characters to replace unicode characters with.
    return str() + ' Use .numpy() to print'
    # if sys.version_info > (3,):
    #     return _tensor_str._str(self)
    # else:
    #     if hasattr(sys.stdout, 'encoding'):
    #         return _tensor_str._str(self).encode(
    #             sys.stdout.encoding or 'UTF-8', 'replace')
    #     else:
    #         return _tensor_str._str(self).encode('UTF-8', 'replace')
Far from optimal, but it comes in handy.
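If you would rather not edit the installed package, the same idea can be sketched as a runtime monkey-patch (an assumption-laden variant, not the original answer's approach; it presumes torch.Tensor is a plain Python class, which it is in recent PyTorch releases):

import torch

# Replace the verbose tensor string with a short summary while debugging.
torch.Tensor.__str__ = lambda self: 'Tensor(shape={}, dtype={})'.format(
    tuple(self.shape), self.dtype)
# Some tools display __repr__ instead, so the same patch can be applied to it if needed:
# torch.Tensor.__repr__ = torch.Tensor.__str__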
UPDATE: The error seems to be solved in the latest PyCharm version (2018.1), at least for the case that was affecting me.
I met the same problem when trying to run some deep learning scripts written in PyTorch (PyCharm 2019.3).
I finally figured out that the problem was that I had set num_workers in DataLoader to a large value (in my case 20).
So, in debug mode, I would suggest setting num_workers to 1.
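A tiny sketch of that change (the dataset here is only a synthetic placeholder):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# num_workers=20 spawns many worker processes, which the debugger struggles with;
# a single worker (or 0, i.e. loading in the main process) keeps debugging responsive.
loader = DataLoader(dataset, batch_size=16, num_workers=1)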
For me, the solution was removing manual watches every time before starting to debug. If there were any existing manual watches in the "Variables" window, it would remain stuck at "Collecting data...".
Using Odoo or Other Large Python Servers
None of the above solutions worked for me, despite trying them all.
It normally works, but occasionally gives this annoying Collecting data... or sometimes Timed Out....
The solution is to restart PyCharm and set as few breakpoints as possible. After that, it starts to work again.
I don't know why this happens (maybe too many breakpoints), but it worked.
I've been using the App Engine Python experimental Search API. It works great. With release 1.7.3 I updated all of the deprecated methods. However, I am now getting this warning:
DeprecationWarning: consistency is deprecated. GLOBALLY_CONSIST
However, I'm not sure how to address it in my code. Can anyone point me in the right direction?
This depends on whether or not you have any globally consistent indexes. If you do, then you should migrate all of your data from those indexes to new, per-document-consistent (which is the default) indexes. To do this:
1. Loop through the documents you have stored in the global index and reindex them in the new index (see the sketch after this list).
2. Change references from the global index to the new per-document index.
3. Ensure everything works, then delete the documents from your global index (not necessary to complete the migration, but still a good idea).
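A rough sketch of steps 1 and 3 (the index names, batch size, and paging are assumptions, not from the original answer):

from google.appengine.api import search

old_index = search.Index(name='my-global-index')        # the globally consistent index
new_index = search.Index(name='my-per-document-index')  # default, per-document consistent

# Page through every document in the old index and put it into the new one.
response = old_index.get_range(limit=100)
while response.results:
    new_index.put(response.results)
    last_id = response.results[-1].doc_id
    response = old_index.get_range(start_id=last_id,
                                   include_start_object=False,
                                   limit=100)

# Once everything is verified, the old documents can be removed, e.g.:
# old_index.delete([doc.doc_id for doc in old_index.get_range(ids_only=True)])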
You then should remove any mention of consistency from your code; the default is per-document consistent, and eventually we will remove the ability to specify a consistency at all.
If you don't have any data in a globally consistent index, you're probably getting the warning because you're specifying a consistency. If you stop specifying the consistency it should go away.
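In other words, something like the following hedged sketch (the old consistency constructor argument is recalled from memory, so double-check it against your own code):

from google.appengine.api import search

# Old style, which triggers the DeprecationWarning:
# index = search.Index(name='my-index', consistency=search.Index.GLOBALLY_CONSISTENT)

# New style, per-document consistent by default:
index = search.Index(name='my-index')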
Note that there is a known issue with the Python API that causes a lot of erroneous deprecation warnings about consistency, so you could be seeing that as well. That issue will be fixed in the next release.
I maintain a Python program that provides advice on certain topics. It does this by applying a complicated algorithm to the input data.
The program code is regularly changed, both to resolve newly found bugs, and to modify the underlying algorithm.
I want to use regression tests. Trouble is, there's no way to tell what the "correct" output is for a certain input - other than by running the program (and even then, only if it has no bugs).
I describe below my current testing process. My question is whether there are tools to help automate this process (and of course, if there is any other feedback on what I'm doing).
The first time the program seemed to run correctly for all my input cases, I saved their outputs in a folder I designated for "validated" outputs. "Validated" means that the output is, to the best of my knowledge, correct for a given version of my program.
If I find a bug, I make whatever changes I think would fix it. I then rerun the program on all the input sets, and manually compare the outputs. Whenever the output changes, I do my best to informally review those changes and figure out whether:
1. the changes are exclusively due to the bug fix, or
2. the changes are due, at least in part, to a new bug I introduced.
In case 1, I increment the internal version counter. I mark the output file with a suffix equal to the version counter and move it to the "validated" folder. I then commit the changes to the Mercurial repository.
If in the future, when this version is no longer current, I decide to branch off it, I'll need these validated outputs as the "correct" ones for this particular version.
In case 2, I of course try to find the newly introduced bug, and fix it. This process continues until I believe the only changes versus the previous validated version are due to the intended bug fixes.
When I modify the code to change the algorithm, I follow a similar process.
Here's the approach I'll probably use.
1. Have Mercurial manage the code, the input files, and the regression test outputs.
2. Start from a certain parent revision.
3. Make and document (preferably as few as possible) modifications.
4. Run regression tests.
5. Review the differences with the parent revision regression test output.
6. If these differences do not match the expectations, try to see whether a new bug was introduced or whether the expectations were incorrect. Either fix the new bug and go to 3, or update the expectations and go to 4.
7. Copy the output of regression tests to the folder designated for validated outputs.
8. Commit the changes to Mercurial (including the code, the input files, and the output files).
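A small helper along these lines (the folder names and the idea of scripting the comparison are my own assumptions, not part of the original workflow) can make the review step a single command by listing which outputs changed relative to the validated set:

import filecmp
from pathlib import Path

VALIDATED = Path('validated')   # outputs blessed for the parent revision
CURRENT = Path('output')        # outputs produced by the working copy

for reference in sorted(VALIDATED.iterdir()):
    candidate = CURRENT / reference.name
    if not candidate.exists():
        print('MISSING ', reference.name)
    elif not filecmp.cmp(reference, candidate, shallow=False):
        print('DIFFERS ', reference.name)
    else:
        print('OK      ', reference.name)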