Appengine Search API - Globally Consistent - python

I've been using the App Engine Python experimental Search API. It works great. With release 1.7.3 I updated all of the deprecated methods. However, I am now getting this warning:
DeprecationWarning: consistency is deprecated. GLOBALLY_CONSIST
However, I'm not sure how to address it in my code. Can anyone point me in the right direction?

This depends on whether or not you have any globally consistent indexes. If you do, then you should migrate all of your data from those indexes to new, per-document-consistent (which is the default) indexes. To do this:
1. Loop through the documents you have stored in the global index and reindex them into the new index (a rough sketch follows these steps).
2. Change references from the global index to the new per-document index.
3. Ensure everything works, then delete the documents from your global index (not necessary to complete the migration, but still a good idea).
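A rough sketch of the reindexing loop in step 1, assuming the legacy google.appengine.api.search client; the index names are made up and batching/error handling is minimal:
from google.appengine.api import search

old_index = search.Index(name='my_global_index')         # hypothetical names
new_index = search.Index(name='my_per_document_index')

start_id = None
while True:
    # get_range pages through documents in doc_id order
    response = old_index.get_range(
        start_id=start_id, include_start_object=start_id is None, limit=100)
    documents = list(response)
    if not documents:
        break
    new_index.put(documents)          # reindex the batch into the new index
    start_id = documents[-1].doc_id
Once everything is verified in the new index, the same batches can be removed from the old one with old_index.delete([d.doc_id for d in documents]).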
You should then remove any mention of consistency from your code; the default is per-document consistent, and eventually we will remove the ability to specify a consistency at all.
If you don't have any data in a globally consistent index, you're probably getting the warning because you're specifying a consistency. If you stop specifying the consistency it should go away.
Note that there is a known issue with the Python API that causes a lot of erroneous deprecation warnings about consistency, so you could be seeing that as well. That issue will be fixed in the next release.


How to get programmatic access to the list of pep8 error codes

I'm having a tough time finding out how to get a list of the pep8 error codes and some table or function to look up their messages or properties.
There is a list here:
https://gist.github.com/mjgreen/bda13692b696669cb3fcd5a0fb682958
I've tried pep8, pycodestyle, flake8, and autoflake8, but I'm not having any luck. Whatever way exists to grab these doesn't exist the way I thought it would (I was expecting a hard-coded message table). I'm probably just being dense. My thought is that flake8 must have a way to get them, since it prints them out.
How do I import flake8 and then just generate some dictionary that maps error codes to error messages and/or properties?
If you're looking specifically for "codes that could be emitted from flake8 plugins" given your current plugin set, there isn't a way to know all of those.
The way plugin registration works is that flake8 is handed a prefix of the error codes a plugin might emit -- the plugin is then free to emit anything in that range (and with any arbitrary string for the message).
The plugin is also free to emit error codes outside of its prefix, but they are ignored (this may become an error in the future; I haven't quite decided what the correct behaviour is yet).
Collecting those prefixes is relatively straightforward (though you'd probably also want to think about flake8:local-plugins as well):
import importlib.metadata

for dist in importlib.metadata.distributions():
    for ep in dist.entry_points or ():
        if ep.group == 'flake8.extension':
            print(ep.name)
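If you want a dictionary rather than printed names, a small variation of the same idea; note the keys are the registered code prefixes and the values are the entry point targets, not human-readable messages:
import importlib.metadata

# map registered code prefix -> entry point target (module:attr string)
prefixes = {}
for dist in importlib.metadata.distributions():
    for ep in dist.entry_points or ():
        if ep.group == 'flake8.extension':
            prefixes[ep.name] = ep.value
print(prefixes)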
Note that pycodestyle is a special case here (due to historical reasons).
pycodestyle (previously named pep8), the tool which implements the "pep8" error codes, also does not provide programmatic access to the codes and messages (they are inlined in the functions which produce the errors).
As such, any such list of codes and messages is unofficial and likely outdated or incorrect.
Disclaimer: I am the current flake8 maintainer and I am one of the maintainers of pycodestyle.

FastText version before most recent change

I was going through old FastText code and started to realize it doesn't work anymore and expects different parameters. Looking at the documentation, it appears the documentation has been only partially updated.
As can be seen, size and iter are not in the class definition shown in the docs despite appearing among the parameters. I was wondering if anyone knew the exact version where this change occurred, as it appears I've accidentally updated to something newer.
Most changes occurred in gensim-4.0.0. There are a series of notes on the changes & how to adapt your code in the project wiki page, "Migrating from Gensim 3.x to 4":
https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
In most cases small changes to the method & variable names older code is using will restore full functionality.
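For example, the two parameters mentioned in the question were simply renamed in 4.0. A minimal sketch, assuming gensim 4.x and a made-up toy corpus:
from gensim.models import FastText

sentences = [["hello", "world"], ["machine", "learning"]]

# gensim 3.x spelling (no longer accepted in 4.0+):
# model = FastText(sentences, size=100, iter=5, min_count=1)

# gensim 4.0+ equivalents: size -> vector_size, iter -> epochs
model = FastText(sentences, vector_size=100, epochs=5, min_count=1)
print(model.wv["hello"].shape)  # (100,)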
There have been significant fixes and optimizations to the FastText implementation, especially in the realm of reducing memory usage, so you probably don't want to stay on any older version (like gensim-3.8.3) except as a temporary quickie workaround.

Difference between approxCountDistinct and approx_count_distinct in Spark functions

Can anyone tell me the difference between pyspark.sql.functions.approxCountDistinct (I know it is deprecated) and pyspark.sql.functions.approx_count_distinct? I have used both versions in a project and have experienced different values.
As you mentioned, pyspark.sql.functions.approxCountDistinct is deprecated. The reason is most likely just a style concern; they probably wanted everything to be in snake case. As you can see in the source code, pyspark.sql.functions.approxCountDistinct simply calls pyspark.sql.functions.approx_count_distinct, doing nothing more except giving you a warning. So regardless of which one you use, the very same code runs in the end.
Also, still according to the source code, approx_count_distinct is based on the HyperLogLog++ algorithm. I am not very familiar with the algorithm, but it is based on repetitive set merging. Therefore, the result will most likely depend on the order in which the various results of the executors are merged. Since this order is not deterministic in Spark, this could explain why you witness different results.
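To illustrate with a minimal sketch (the toy DataFrame is made up): approx_count_distinct returns an estimate, and countDistinct can serve as an exact, deterministic baseline if you need to compare the values you're seeing:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame([(i % 100,) for i in range(10000)], ["value"])

df.select(
    F.approx_count_distinct("value").alias("estimate"),  # HyperLogLog++ estimate
    F.countDistinct("value").alias("exact"),              # exact distinct count
).show()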

Combining -= and += modifiers in buildout scripts

This doesn't seem to work:
[buildout]
extends = buildout.cfg
eggs -= python-ldap
eggs += psycopg2
The behaviour always seems to be as though the eggs += psycopg2 line was not present. It doesn't matter which order the two lines are in.
Is this a bug? Is there a way to achieve this result?
Unfortunately, zc.buildout up to version 1.5.2 doesn't support this use case; only one of the addition or the subtraction will take effect.
What happens internally is this:
For each key, value pair defined in the inheriting section:
- If the key is using +=, take the inherited value, add things, and store it as the new value.
- If the key is using -=, take the inherited value, remove things, and store it as the new value.
After these updates the inherited section is copied, updated with the new values, and this is used as the final result.
The ordering is defined by the usual Python mapping semantics, and is thus undefined; either the addition or the subtraction runs last. Because both operations take their input from the inherited section, modify it, then store the result as the new value, the operation that runs last overwrites the result of the operation that ran before.
I've committed a fix for this; I don't have rights to release a new version of buildout to PyPI though, so I'll have to poke those who do.
Edit: zc.buildout version 1.6 contains this fix.
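Until you're on a fixed release, one workaround that follows from the merge behaviour described above is to split the two operators across two extends levels, so each operator sees the other's result as its inherited value. A sketch (the intermediate file name is made up):
# intermediate.cfg
[buildout]
extends = buildout.cfg
eggs -= python-ldap

# local.cfg
[buildout]
extends = intermediate.cfg
eggs += psycopg2
Here the subtraction is applied when intermediate.cfg extends buildout.cfg, and the addition is then applied against that already-reduced value.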

Is it OK to inspect properties beginning with underscore?

I've been working on a very simple CRUD generator for Pylons. I came up with something that inspects
SomeClass._sa_class_manager.mapper.c
Is it OK to inspect this (or to call methods beginning with an underscore)? I always kind of assumed this is legal, though frowned upon, as it relies heavily on the internal structure of a class/object. But hey, since Python does not really have interfaces in the Java sense, maybe it is OK.
It is intentional (in Python) that there are no "private" scopes. It is a convention that anything starting with an underscore should ideally not be used, and hence you can't complain if its behavior or definition changes in a later version.
In general, this usually indicates that the method is effectively internal, rather than part of the documented interface, and should not be relied on. Future versions of the library are free to rename or remove such methods, so if you care about future compatibility without having to rewrite, avoid doing it.
If it works, why not? You could have problems, though, when _sa_class_manager gets restructured, binding yourself to this specific version of SQLAlchemy or creating more work to track the changes. As SQLAlchemy is a fast-moving target, you may well be there within a year.
The preferable way would be to integrate your desired API into SQLAlchemy itself.
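For what it's worth, later SQLAlchemy releases expose a public inspection API that covers this use case without reaching into underscore attributes. A minimal sketch, assuming SQLAlchemy 1.4+ and a made-up model:
from sqlalchemy import Column, Integer, String, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

# Public, documented route to the mapper and its columns,
# instead of User._sa_class_manager.mapper.c
mapper = inspect(User)
for column in mapper.columns:
    print(column.name, column.type)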
It's generally not a good idea, for reasons already mentioned. However, Python deliberately allows this behaviour in case there is no other way of doing something.
For example, if you have a closed-source compiled Python library where the author didn't think you'd need direct access to a certain object's internal state—but you really do—you can still get at the information you need. You have the same problems mentioned before of keeping up with different versions (if you're lucky enough that it's still maintained) but at least you can actually do what you wanted to do.
