What does "..." means in a python def - python

This might be a very stupid question, but I can't understand what the three dots stand for in a Python def. I was trying to understand the cost of the in operator on a deque object (from the collections module), so I navigated through the code and here's what I found:
I thought it meant the method would use the parent's definition when called, but if I navigate to the overridden method I can't find anything but an abstract method in the Container class, so I still don't get how the in operator works on a deque object.

You are looking at a .pyi stub file. Referring to this post, a stub file, as the name suggests, only describes the interface, not the implementation. Hence, ... in a Python def really means that the file only declares the signature; you will not find the implementation there.
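For illustration, a stub entry looks roughly like this (a sketch, not the exact typeshed source); the three dots are just the built-in Ellipsis object used as a placeholder function body:

from typing import Generic, TypeVar

_T = TypeVar("_T")

# sketch of a collections stub (.pyi): signatures only, every body is "..."
class deque(Generic[_T]):
    def __contains__(self, o: object) -> bool: ...
    def append(self, x: _T) -> None: ...

print(... is Ellipsis)  # True - "..." is an ordinary object, not special syntax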
Regarding your question about the cost of in operator in deque, refer to https://wiki.python.org/moin/TimeComplexity
It mentions deque is represented internally as a doubly-linked list, and also mentions that in operator for a list has O(n) complexity. I don't think it being a doubly-linked list changes the time complexity, as you would still need to go through each element, i.e., O(n).

Related

What is faster: iterating through Python AST to find particular type nodes, or override the visit_type method?

The ast module in Python allows multiple traversal strategies. I want to understand, is there any significant gain in terms of complexity when choosing a specific way of traversal?
Here are two examples:
Example 1
class GlobalVisitor(ast.NodeTransformer):
    def generic_visit(self, tree):
        for node in tree.body:
            if isinstance(node, ast.Global):
                ...  # transform the AST
Example 2
class GlobalVisitor(ast.NodeTransformer):
    def visit_Global(self, tree):
        ...  # transform the AST
In Example 1, I override the generic_visit method, providing my own implementation of how I want to traverse the tree. This, however, happens through visiting every node in the body, so O(n).
In Example 2, I override the visit_Global, and I am thus able to do stuff with all Global type nodes immediately. That's how ast works.
I want to understand: in Example 2, does ast have instant O(1) access to the nodes I specify through overriding visit_field(self, node), or does it just go through the tree again in O(n), looking for the nodes I need in the background and merely simplifying my life a little bit?
Some takeaways from the comments provided by @metatoaster, @user2357112 and @rici:
1. Example 1 is completely wrong. One should not traverse the tree in the way described, because iterating over tree.body is wrong: tree.body is not a collection of every node in the AST. It is an attribute of Module nodes that gives a list of the nodes for the module's top-level statements. It will miss every global statement that matters (since, barring extremely weird exec cases, a correct global statement is never top-level), and it will crash on non-Module node input.
If you want to implement a correct version of Example 1, recursively iterate using ast.iter_child_nodes (see the sketch after this list). However, note that iter_child_nodes is correctly named: it is not iter_descendant_nodes. It does not visit anything other than direct children; the recursive walk must be implemented in the action performed on each child.
2. When implemented correctly, the two approaches are equivalent and both imply a recursive traversal; overriding visit_type(self, node) just saves you writing the traversal yourself. No gain in terms of complexity will be achieved.
3. Only use NodeTransformer if you want to alter the AST, otherwise just use NodeVisitor.
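For illustration (referring back to item 1), here is a minimal sketch of both a corrected manual traversal and the visit_Global approach; the toy source string is an assumption. Both visit every node, the NodeVisitor version just hides the recursion:

import ast

source = """
def f():
    global counter
    counter = 1
"""
tree = ast.parse(source)

# Corrected "manual" version of Example 1: recurse over direct children.
def find_globals(node):
    found = []
    if isinstance(node, ast.Global):
        found.append(node)
    for child in ast.iter_child_nodes(node):
        found.extend(find_globals(child))
    return found

print([g.names for g in find_globals(tree)])   # [['counter']]

# Example 2 style: a NodeVisitor (not NodeTransformer, since nothing is altered).
class GlobalVisitor(ast.NodeVisitor):
    def visit_Global(self, node):
        print(node.names)                      # ['counter']
        self.generic_visit(node)

GlobalVisitor().visit(tree)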
Finally, ast doesn't seem to be documented exhaustively enough; refer to this for more detailed documentation. It is a bit outdated (by about a year), but it explains some fundamentals better than the official ast docs.

How do I iterate through a dictionary/set in SLY?

So, I'm trying to transition my code from my earlier PLY implementation to SLY. Previously, I had some code that loaded a binary file with a wide range of reserved words scraped from documentation of the scripting language I'm trying to implement. However, when I try to iterate through the scraped items in the lexer for SLY, I get an error inside LexerMetaDict's __setitem__ while iterating through the resulting set:
Exception has occurred: AttributeError
Name transition redefined
  File "C:\dev\sly\sly\lex.py", line 126, in __setitem__
    raise AttributeError(f'Name {key} redefined')
  File "C:\dev\sly\example\HeroLab\HeroLab.py", line 24, in HeroLabLexer
    for transition in transition_set:
  File "C:\dev\sly\example\HeroLab\HeroLab.py", line 6, in <module>
    class HeroLabLexer(Lexer):
The code in question:
from transitions import transition_set, reference_set

class HeroLabLexer(Lexer):

    # initial token assignments
    for transition in transition_set:
        tokens.add(transition)
I might not be as surprised if it were happening when trying to add to the tokens, since I'm still trying to figure out how to interface with the SLY method for defining things, but if I change that line to a print statement, it still fails when I iterate through the second item in "transition_set". I've tried renaming the various variables, but to little avail.
The error you get is the result of a modification Sly makes to the Lexer metaclass, which I discuss below. But for a simple answer: I assume tokens is a set, so you can easily avoid the problem with
tokens |= transition_set
If transition_set were an iterable but not a set, you could use the update method, which works with any iterable (and any number of iterable arguments):
tokens.update(transition_set)
tokens doesn't have to be a set. Sly should work with any iterable. But you might need to adjust the above expressions. If tokens is a tuple or a list, you'd use += instead of |= and, in the case of lists, extend instead of update. (There are some minor differences, as with the fact that set.update can be used to merge several sets.)
That doesn't answer your direct question, "How do I iterate... in SLY". I interpret that as asking:
How do I write a for loop at class scope in a class derived from sly.lexer?
and that's a harder question. The games Sly plays with Python namespaces make it difficult to use for loops at class scope in a lexer, because Sly replaces the attribute dictionary of the Lexer class (and its subclasses) with a special dictionary that doesn't allow redefinition of attributes with string values. Since the iteration variable of a for statement lives in the enclosing scope (in this case, the class scope), any for loop whose index variable takes string values and whose body runs more than once will trigger the "Name redefined" error you experienced.
It's also worth noting that if you use a for statement at class scope in any class, the last value of the iteration variable will become a class attribute. That's hardly ever desirable, and, really, that construction is not good practice in any class. But it doesn't usually throw an error.
At class scope, you can use comprehensions (whose iteration variables are effectively locals). Of course, in this case, there's no advantage in writing:
tokens.update(transition for transition in transition_set)
But the construct might be useful for other situations. Note, however, that other variables at class scope (such as tokens) are not visible in the body of the comprehension, which might also create difficulties.
Although it's extremely ugly, you can declare the iteration variable as a global, which makes it a module variable rather than a class variable (and therefore just trades one bad practice for another one, although you can later remove the variable from the module).
You could do the computation in a different scope (such as a global function), or you could write a (global) generator to use with tokens.update(), which is probably the most general solution.
Finally, you can make sure that the index variable is never an instance of a str.
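For illustration, a minimal sketch of the "different scope" option; transition_set and the fixed token names are assumptions, and Sly will still expect a matching pattern for every declared token name (omitted here):

from sly import Lexer
from transitions import transition_set

def _build_tokens():
    # Runs at module scope, so its loop variable never lands in the special
    # class namespace that Sly guards against redefinition.
    names = {'NAME', 'NUMBER'}          # hypothetical fixed tokens
    for transition in transition_set:
        names.add(transition)
    return names

class HeroLabLexer(Lexer):
    tokens = _build_tokens()
    # ... token patterns still go here, as in any Sly lexer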

In python, can I lazily generate copies of an iterator using tee?

I'm trying to create an iterator which lazily creates (potentially infinitely many) copies of an iterator. Is this possible?
I know I can create any fixed finite number of copies by simply doing
from itertools import tee
iter_copies = tee(my_iter, n=10)
but this breaks down if you don't know n ahead of time or if n is infinite.
I would usually try something along the lines of
from itertools import tee
def inf_tee(my_iter):
    while True:
        yield tee(my_iter)[1]
But the documentation states that after using tee on an iterator the original iterator can no longer be used, so this won't work.
In case you're interested in the application: the idea is to create a lazy unzip function, potentially for use in pytoolz. My current implementation can handle a finite number of infinite iterators (which is better than plain zip(*seq)), but not an infinite number of infinite iterators. Here's the pull request if you're interested in the details.
This is only barely touched upon in a single example near the bottom of the Python 2 itertools documentation, but itertools.tee supports copying:
import itertools, copy

def infinite_copies(some_iterable):
    master, copy1 = itertools.tee(some_iterable)
    yield copy1
    while True:
        yield copy.copy(master)
The example in the documentation actually uses the __copy__ magic method, which is the hook used to customize copy.copy behavior. (Apparently tee.__copy__ was added as part of a copyable iterators project that didn't go anywhere.)
Note that this will require storing every element ever produced by the original iterator, which can get very expensive. There is no way to avoid this cost.
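A quick usage sketch of the generator above (hypothetical):

gen = infinite_copies(range(3))
first = next(gen)      # the tee'd copy
second = next(gen)     # a copy.copy() of the master tee object
print(list(first))     # [0, 1, 2]
print(list(second))    # [0, 1, 2]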

Duck typing trouble. Duck typing test for "i-am-like-a-list"

USAGE CONTEXT ADDED AT END
I often want to operate on an abstract object like a list. e.g.
def list_ish(thing):
    for i in xrange(0, len(thing)):
        print thing[i]
Now this is appropriate if thing is a list, but it will fail if thing is a dict, for example. What is the pythonic way to ask "do you behave like a list?"
NOTE:
hasattr(thing, '__getitem__') and not hasattr(thing, 'keys')
this will work for all cases I can think of, but I don't like defining a duck type negatively, as I expect there could be cases that it does not catch.
Really what I want is to ask:
"hey, do you operate on integer indices in the way I expect a list to?" e.g.
thing[i], thing[4:7] = [...], etc.
NOTE: I do not want to simply execute my operations inside of a large try/except, since they are destructive. It is not cool to try and fail here.
USAGE CONTEXT
-- A "point-lists" is a list-like-thing that contains dict-like-things as its elements.
-- A "matrix" is a list-like-thing that contains list-like-things
-- I have a library of functions that operate on point-lists and also in an analogous way on matrix like things.
-- for example, From the users point of view destructive operations like the "spreadsheet-like" operations "column-slice" can operate on both matrix objects and also on point-list objects in an analogous way -- the resulting thing is like the original one, but only has the specified columns.
-- since this particular operation is destructive it would not be cool to proceed as if an object were a matrix, only to find out part way thru the operation, it was really a point-list or none-of-the-above.
-- I want my 'is_matrix' and 'is_point_list' tests to be performant, since they sometimes occur inside inner loops. So I would be satisfied with a test which only investigated element zero for example.
-- I would prefer tests that do not involve construction of temporary objects, just to determine an object's type, but maybe that is not the python way.
in general I find the whole duck typing thing to be kinda messy, and fraught with bugs and slowness, but maybe I don't yet think like a true Pythonista
happy to drink more kool-aid...
One thing you can do, that should work quickly on a normal list and fail on a normal dict, is taking a zero-length slice from the front:
try:
    thing[:0]
except TypeError:
    pass  # probably not list-like
else:
    pass  # probably list-like
The slice fails on dicts because slices are not hashable.
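A quick demonstration of why the zero-length slice discriminates:

print([1, 2, 3][:0])   # [] - lists accept slice objects
try:
    {}[:0]
except TypeError as e:
    print(e)           # unhashable type: 'slice'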
However, str and unicode also pass this test, and you mention that you are doing destructive edits. That means you probably also want to check for __delitem__ and __setitem__:
def supports_slices_and_editing(thing):
    if hasattr(thing, '__setitem__') and hasattr(thing, '__delitem__'):
        try:
            thing[:0]
            return True
        except TypeError:
            pass
    return False
I suggest you organize the requirements you have for your input, and the range of possible inputs you want your function to handle, more explicitly than you have so far in your question. If you really just wanted to handle lists and dicts, you'd be using isinstance, right? Maybe what your method does could only ever delete items, or only ever replace items, so you don't need to check for the other capability. Document these requirements for future reference.
When dealing with built-in types, you can use the Abstract Base Classes. In your case, you may want to test against collections.Sequence or collections.MutableSequence:
if isinstance(your_thing, collections.Sequence):
    ...  # access your_thing as a list
This is supported in all Python versions after (and including) 2.6.
If you are using your own classes to build your_thing, I'd recommend that you inherit from these abstract base classes as well (directly or indirectly). This way, you can ensure that the sequence interface is implemented correctly, and avoid all the typing mess.
And for third-party libraries, there's no simple way to check for a sequence interface if the third-party classes don't inherit from the built-in types or abstract classes. In this case you'll have to check for every interface that you're going to use, and only those you use. For example, your list_ish function uses __len__ and __getitem__, so check only whether these two methods exist. Wrong behavior of __getitem__ (e.g. a dict's) will surface as an exception anyway.
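For illustration, a minimal sketch of the ABC approach; in modern Python these classes live in collections.abc (on 2.6-3.2 they were directly in collections), and the PointList class here is hypothetical:

import collections.abc

print(isinstance([1, 2], collections.abc.MutableSequence))  # True
print(isinstance({'a': 1}, collections.abc.Sequence))       # False
print(isinstance('abc', collections.abc.Sequence))          # True - beware, strings qualify too

class PointList(collections.abc.MutableSequence):
    """Hypothetical list-like container of dict-like 'points'."""
    def __init__(self, points=()):
        self._points = list(points)
    def __getitem__(self, index):
        return self._points[index]
    def __setitem__(self, index, value):
        self._points[index] = value
    def __delitem__(self, index):
        del self._points[index]
    def __len__(self):
        return len(self._points)
    def insert(self, index, value):
        self._points.insert(index, value)

print(isinstance(PointList(), collections.abc.MutableSequence))  # True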
Perhaps there is no ideal pythonic answer here, so I am proposing a 'hack' solution, but I don't know enough about the class structure of python to know if I am getting this right:
def is_list_like(thing):
    return hasattr(thing, '__setslice__')

def is_dict_like(thing):
    return hasattr(thing, 'keys')
My reduced goals here are simply to have performant tests that will:
(1) never call a dict-thing or a string-like-thing a list
(2) return the right answer for built-in python types
(3) return the right answer if someone implements a "full" set of core methods for a list/dict
(4) be fast (ideally not allocating objects during the test)
EDIT: Incorporated ideas from @DanGetz

Best way to store and use a large text-file in python

I'm creating a networked server for a boggle-clone I wrote in python, which accepts users, solves the boards, and scores the player input. The dictionary file I'm using is 1.8MB (the ENABLE2K dictionary), and I need it to be available to several game solver classes. Right now, each class iterates through the file line-by-line and generates a hash table (associative array), but the more solver classes I instantiate, the more memory it takes up.
What I would like to do is import the dictionary file once and pass it to each solver instance as they need it. But what is the best way to do this? Should I import the dictionary in the global space, then access it in the solver class as globals()['dictionary']? Or should I import the dictionary then pass it as an argument to the class constructor? Is one of these better than the other? Is there a third option?
If you create a dictionary.py module, containing code which reads the file and builds a dictionary, this code will only be executed the first time it is imported. Further imports will return a reference to the existing module instance. As such, your classes can:
import dictionary
dictionary.words[whatever]
where dictionary.py has:
words = {}
# read file and add to 'words'
Even though it is essentially a singleton at this point, the usual arguments against globals apply. For a pythonic singleton-substitute, look up the "borg" object.
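A minimal sketch of that module approach; the file name words.txt and the set-of-words representation are assumptions, not the asker's actual format:

# dictionary.py -- this body runs only on the first import
words = set()

with open('words.txt') as f:
    for line in f:
        words.add(line.strip().lower())

# solver.py -- later imports just get a reference to the same module object
import dictionary

def is_word(candidate):
    return candidate.lower() in dictionary.words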
That's really the only difference. Once the dictionary object is created, you are only binding new references as you pass it along, unless you explicitly perform a deep copy. It makes sense for it to be constructed centrally, once and only once, so long as each solver instance does not require a private copy for modification.
Adam, remember that in Python when you say:
a = read_dict_from_file()
b = a
... you are not actually copying a (and thus using more memory); you are merely making b another reference to the same object.
So basically any of the solutions you propose will be far better in terms of memory usage. Basically, read in the dictionary once and then hang on to a reference to that. Whether you do it with a global variable, or pass it to each instance, or something else, you'll be referencing the same object and not duplicating it.
Which one is most Pythonic? That's a whole 'nother can of worms, but here's what I would do personally:
def main(args):
    run_initialization_stuff()
    dictionary = read_dictionary_from_file()
    # 'class' is a reserved word, so a different keyword name is used here
    solvers = [Solver(solver_class=x, dictionary=dictionary)
               for x in range(number_of_solvers)]
HTH.
Depending on what your dict contains, you may be interested in the 'shelve' or 'anydbm' modules. They give you dict-like interfaces (just strings as keys and items for 'anydbm', and strings as keys and any python object as item for 'shelve') but the data is actually in a DBM file (gdbm, ndbm, dbhash, bsddb, depending on what's available on the platform.) You probably still want to share the actual database between classes as you are asking for, but it would avoid the parsing-the-textfile step as well as the keeping-it-all-in-memory bit.
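A rough sketch of the shelve idea; the file names and the one-time conversion step are assumptions:

import shelve

# One-time conversion: parse the word list into a DBM-backed file.
db = shelve.open('enable2k.db')
with open('enable2k.txt') as f:
    for line in f:
        db[line.strip().lower()] = True
db.close()

# Each solver can then open the same file (read-only here) without
# re-parsing the text or holding every word in memory at once.
words = shelve.open('enable2k.db', flag='r')
print('python' in words)   # dict-like membership test backed by the DBM file
words.close()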
