How do I iterate through a dictionary/set in SLY?

So, I'm trying to transition my code from my earlier PLY implementation to SLY. Previously, I had some code that loaded a binary file with a wide range of reserved words scraped from the documentation of the scripting language I'm trying to implement. However, when I try to iterate through the scraped items in the lexer for SLY, I get an error from LexerMetaDict's __setitem__ while iterating through the resulting set:
Exception has occurred: AttributeError
Name transition redefined
  File "C:\dev\sly\sly\lex.py", line 126, in __setitem__
    raise AttributeError(f'Name {key} redefined')
  File "C:\dev\sly\example\HeroLab\HeroLab.py", line 24, in HeroLabLexer
    for transition in transition_set:
  File "C:\dev\sly\example\HeroLab\HeroLab.py", line 6, in <module>
    class HeroLabLexer(Lexer):
The code in question:
from sly import Lexer
from transitions import transition_set, reference_set

class HeroLabLexer(Lexer):
    # initial token assignments
    for transition in transition_set:
        tokens.add(transition)
I might not be so surprised if it were happening when trying to add to tokens, since I'm still trying to figure out how to interface with SLY's way of defining things, but if I change that line to a print statement, it still fails when I reach the second item in transition_set. I've tried renaming the various variables, but to little avail.

The error you get is the result of a modification Sly makes to the Lexer metaclass, which I discuss below. But for a simple answer: I assume tokens is a set, so you can easily avoid the problem with
tokens |= transition_set
If transition_set were an iterable but not a set, you could use the update method, which works with any iterable (and any number of iterable arguments):
tokens.update(transition_set)
tokens doesn't have to be a set. Sly should work with any iterable. But you might need to adjust the above expressions. If tokens is a tuple or a list, you'd use += instead of |= and, in the case of lists, extend instead of update. (There are some minor differences, as with the fact that set.update can be used to merge several sets.)
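For instance, here is a minimal sketch of the fix in context. The NAME token, its regex, and the contents of transition_set are invented for illustration, and the scraped reserved words are remapped off an identifier rule, which is the usual SLY idiom for keywords:
from sly import Lexer

# Hypothetical stand-in for the scraped reserved words
transition_set = {"ATTACK", "DEFEND"}

class HeroLabLexer(Lexer):
    # One set union instead of a class-scope for loop
    tokens = {"NAME"} | transition_set

    ignore = " \t"

    # An identifier rule; each scraped reserved word is remapped
    # onto it so that it gets its own token type
    NAME = r"[A-Za-z_][A-Za-z0-9_]*"
    NAME["ATTACK"] = "ATTACK"
    NAME["DEFEND"] = "DEFEND"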
That doesn't answer your direct question, "How do I iterate... in SLY". I interpret that as asking:
How do I write a for loop at class scope in a class derived from sly.Lexer?
and that's a harder question. The games Sly plays with Python namespaces make it difficult to use for loops at class scope in a lexer, because Sly replaces the attribute dictionary of the Lexer class (and its subclasses) with a special dictionary that doesn't allow redefinition of attributes with string values. Since the iteration variable of a for statement lives in the enclosing scope (in this case, the class scope), any for loop whose index variable is a string and whose body runs more than once will trigger the "Name redefined" error you experienced.
It's also worth noting that if you use a for statement at class scope in any class, the last value of the iteration variable will become a class attribute. That's hardly ever desirable, and, really, that construction is not good practice in any class. But it doesn't usually throw an error.
At class scope, you can use comprehensions (whose iteration variables are effectively locals). Of course, in this case, there's no advantage in writing:
tokens.update(transition for transition in transition_set)
But the construct might be useful for other situations. Note, however, that other variables at class scope (such as tokens) are not visible in the body of the comprehension, which might also create difficulties.
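For example, with an ordinary class, just to show the scoping rule:
class Demo:
    base = ["a", "b"]
    # The leftmost iterable is evaluated in the class scope, so this works:
    upper = [s.upper() for s in base]
    # But class-scope names are not visible inside the comprehension body,
    # so this would raise NameError on the second 'base':
    #   broken = [s for s in base if s in base]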
Although it's extremely ugly, you can declare the iteration variable as a global, which makes it a module variable rather than a class variable (and therefore just trades one bad practice for another one, although you can later remove the variable from the module).
You could do the computation in a different scope (such as a global function), or you could write a (global) generator to use with tokens.update(), which is probably the most general solution.
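A sketch of that last approach; the helper name and the uppercasing are invented here:
def iter_transitions(transitions):
    # A module-level generator: its loop variable lives in its own
    # function scope, out of reach of Sly's special class dictionary
    for t in transitions:
        yield t.upper()

class HeroLabLexer(Lexer):
    tokens = {"NAME"}
    tokens.update(iter_transitions(transition_set))

    NAME = r"[A-Za-z_][A-Za-z0-9_]*"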
Finally, you can make sure that the index variable is never an instance of str.

Related

Python: `locals()` as a default function argument

Suppose I have a module PyFoo.py that has a function bar. I want bar to print all of the local variables associated with the namespace that called it.
For example:
#!/usr/bin/env python
import PyFoo as pf

var1 = 'hi'
print(locals())
pf.bar()
The last two lines would give the same output. So far I've tried defining bar as such:
def bar(x=locals):
    print(x())

def bar(x=locals()):
    print(x)
But neither works. The first ends up being what's local to bar's namespace (which I guess is because that's when it's evaluated), and the second is as if I passed in globals (which I assume is because it's evaluated during import).
Is there a way I can have the default value of argument x of bar be all variables in the namespace which called bar?
EDIT 2018-07-29:
As has been pointed out, what was given was an XY Problem; as such, I'll give the specifics.
The module I'm putting together will allow the user to create various objects that represent different aspects of a numerical problem (e.g. various topology definitions, boundary conditions, constitutive models, etc.) and define how any given object interacts with any other object(s). The idea is for the user to import the module, define the various model entities that they need, and then call a function which will take all objects passed to it, make needed adjustments to ensure compatibility between them, and then write out a file that represents the entire numerical problem as a text file.
The module has a function generate that accepts each of the various types of aspects of the numerical problem. The default value for all arguments is an empty list. If a non-empty list is passed, then generate will use those instances for generating the completed numerical problem. If an argument is an empty list, then I'd like it to take in all instances in the namespace that called generate (which I will then parse out the appropriate instances for the argument).
EDIT 2018-07-29:
Sorry for any lack of understanding on my part (I'm not that strong of a programmer), but I think I might understand what you're saying with respect to an instance being declared or registered.
From my limited understanding, could this be done by creating some sort of registry dataset (like a list or dict) in the module, created when the module is imported, which all module classes take in by default? During class initialization, self could be appended to that dataset, and then the generate function would take the registry as the default value for one of its arguments.
There's no way you can do what you want directly.
locals just returns the local variables in whatever namespace it's called in. As you've seen, you have access to the namespace the function is defined in at the time of definition, and you have access to the namespace of the function itself from within the function, but you don't have access to any other namespaces.
You can do what you want indirectly… but it's almost certainly a bad idea. At least this smells like an XY problem, and whatever it is you're actually trying to do, there's probably a better way to do it.
But occasionally it is necessary, so in case you have one of those cases:
The main good reason to want to know the locals of your caller is for some kind of debugging or other introspection function. And the way to do introspection is almost always through the inspect library.
In this case, what you want to inspect is the interpreter call stack. The calling function will be the first frame on the call stack behind your function's own frame.
You can get the raw stack frame:
inspect.currentframe().f_back
… or you can get a FrameInfo representing it:
inspect.stack()[1]
As explained at the top of the inspect docs, a frame object's local namespace is available as:
frame.f_locals
Note that this has all the same caveats that apply to getting your own locals with locals: what you get isn't the live namespace, but a mapping that, even if it is mutable, can't be used to modify the namespace (or, worse in 2.x, one that may or may not modify the namespace, unpredictably), and that has all cell and free variables flattened into their values rather than their cell references.
Also, see the big warning in the docs about not keeping frame objects alive unnecessarily (or calling their clear method if you need to keep a snapshot but not all of the references, but I think that only exists in 3.x).
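Putting that together, here is a minimal sketch of bar, reading the caller's locals at call time rather than through a default argument:
import inspect

def bar():
    # The caller's frame is one step back from bar's own frame
    frame = inspect.currentframe().f_back
    try:
        print(frame.f_locals)
    finally:
        del frame  # per the docs' warning, don't keep frame references alive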

Why is __code__ for a function (Python) mutable

In a previous question yesterday, in comments, I came to know that the __code__ attribute of a function in Python is mutable. Hence I can write code like the following:
def foo():
    print("Hello")

def foo2():
    print("Hello 2")

foo()
foo.__code__ = foo2.__code__
foo()
Output
Hello
Hello 2
I tried googling, but either because there is no information (I highly doubt this) or because the keyword (__code__) is not easily searchable, I couldn't find a use case for this.
It doesn't seem like "because most things in Python are mutable" is a reasonable answer either, because other attributes of functions — __closure__ and __globals__ — are explicitly read-only (from Objects/funcobject.c):
static PyMemberDef func_memberlist[] = {
    {"__closure__", T_OBJECT, OFF(func_closure),
     RESTRICTED|READONLY},
    {"__doc__",     T_OBJECT, OFF(func_doc), PY_WRITE_RESTRICTED},
    {"__globals__", T_OBJECT, OFF(func_globals),
     RESTRICTED|READONLY},
    {"__module__",  T_OBJECT, OFF(func_module), PY_WRITE_RESTRICTED},
    {NULL}  /* Sentinel */
};
Why would __code__ be writable while other attributes are read-only?
The fact is, most things in Python are mutable. So the real question is, why are __closure__ and __globals__ not?
The answer initially appears simple. Both of these things are containers for variables which the function might need. The code object itself does not carry its closed-over and global variables around with it; it merely knows how to get them from the function. It grabs the actual values out of these two attributes when the function is called.
But the scopes themselves are mutable, so this answer is unsatisfying. We need to explain why modifying these things in particular would break stuff.
For __closure__, we can look to its structure. It is not a mapping, but a tuple of cells. It doesn't know the names of the closed-over variables. When the code object looks up a closed-over variable, it needs to know its position in the tuple; they match up one-to-one with co_freevars which is also read-only. And if the tuple is of the wrong size or not a tuple at all, this mechanism breaks down, probably violently (read: segfaults) if the underlying C code isn't expecting such a situation. Forcing the C code to check the type and size of the tuple is needless busy-work which can be eliminated by making the attribute read-only. If you try to replace __code__ with something taking a different number of free variables, you get an error, so the size is always right.
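You can see that size check in action with a quick demo (the function names here are made up):
def outer():
    x = 1
    def inner():
        return x  # one free variable: x
    return inner

def plain():
    return 42  # zero free variables

f = outer()
try:
    f.__code__ = plain.__code__
except ValueError as e:
    print(e)  # e.g. "inner() requires a code object with 1 free vars, not 0"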
For __globals__, the explanation is less immediately obvious, but I'll speculate. The scope lookup mechanism expects to have access to the global namespace at all times. Indeed, the bytecode may be hard-coded to go straight to the global namespace, if the compiler can prove no other namespace will have a variable with a particular name. If the global namespace were suddenly None or some other non-mapping object, the C code could, once again, violently misbehave. Again, making the code perform needless type checks would be a waste of CPU cycles.
Another possibility is that (normally-declared) functions borrow a reference to the module's global namespace, and making the attribute writable would cause the reference count to get messed up. I could imagine this design, but I'm not really sure it's a great idea since functions can be constructed explicitly with objects whose lifetimes might be shorter than that of the owning module, and these would need to be special-cased.

Is there a python naming convention for avoiding conflicts with standard module names?

PEP 8 recommends using a single trailing underscore to avoid conflicts with Python keywords, but what about conflicts with the names of standard Python modules? Should that also be a single trailing underscore?
I'm imagining something like this:
import time
time_ = time.time()
PEP 8 doesn't seem to address it directly.
The trailing underscore is obviously necessary when you're colliding with a keyword, because your code would otherwise raise a SyntaxError (or, if you're really unlucky, compile to mean something completely different than you intended).
So, even in contexts where you have a class attribute, instance attribute, function parameter, or local variable that you want to name class, you have to go with class_ instead.
But the same isn't true for time. And in those cases, I don't think you should append an underscore to time.
There's precedent for that—multiple classes in the stdlib itself have methods or data attributes named time (and none of them have time_).
Of course, there's the case where you're creating a name at the same scope as the module (usually meaning a global variable or function). Then you've got much more potential for confusion, and you also shadow the time module for the rest of the current scope.
I think 90% of the time, the answer is going to be "That shouldn't be a global".
But that still leaves the other 10%.
And there's also the case where your name lives in a more restricted namespace, but that namespace is a local scope inside a function that also needs to access the time module.
Or, maybe, in a long, complicated function (which you shouldn't have any of, but… sometimes you do). If it wouldn't be obvious to a human reader that time is a local rather than the module, that's just as bad as confusing the interpreter.
Here, I think that 99% of the remaining time, the answer is "Just pick a different name".
For example, look at this code:
def dostuff(iterable):
    time = time.time()  # UnboundLocalError: the assignment makes 'time' local
    for thing in iterable:
        dothing(thing)
    return time.time() - time  # oops!
The obvious answer here is to rename the variable to start or t0 or something else. Besides solving the problem, it's also a more meaningful name.
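For example (dothing is carried over from the sketch above):
import time

def dostuff(iterable):
    start = time.time()  # no shadowing, and a more meaningful name
    for thing in iterable:
        dothing(thing)
    return time.time() - start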
But that still leaves the 1%.
For example, there are libraries that generate Python code out of, say, a protocol specification, or a .NET or ObjC interface, where the names aren't under your control; all you can do is apply some kind of programmatic and unambiguous rule to the translated names. In that case, I think a rule that appends _ to stdlib module names as well as keywords might be a good idea.
You can probably come up with other examples where the variable can't just be arbitrarily renamed, and has to (at least potentially) live in the same scope as the time module, and so on. In any such cases, I'd go for the _ suffix.

Is it bad style to reassign long variables as a local abbreviation?

I prefer to use long identifiers to keep my code semantically clear, but in the case of repeated references to the same identifier, I'd like for it to "get out of the way" in the current scope. Take this example in Python:
def define_many_mappings_1(self):
    self.define_bidirectional_parameter_mapping("status", "current_status")
    self.define_bidirectional_parameter_mapping("id", "unique_id")
    self.define_bidirectional_parameter_mapping("location", "coordinates")
    # etc...
Let's assume that I really want to stick with this long method name, and that these arguments are always going to be hard-coded.
Implementation 1 feels wrong because most of each line is taken up with a repetition of characters. The lines are also rather long in general, and will exceed 80 characters easily when nested inside of a class definition and/or a try/except block, resulting in ugly line wrapping. Let's try using a for loop:
def define_many_mappings_2(self):
    mappings = [("status", "current_status"),
                ("id", "unique_id"),
                ("location", "coordinates")]
    for mapping in mappings:
        self.define_parameter_mapping(*mapping)
I'm going to lump together all similar iterative techniques under the umbrella of Implementation 2, which has the improvement of separating the "unique" arguments from the "repeated" method name. However, I dislike that this has the effect of placing the arguments before the method they're being passed into, which is confusing. I would prefer to retain the "verb followed by direct object" syntax.
I've found myself using the following as a compromise:
def define_many_mappings_3(self):
    d = self.define_bidirectional_parameter_mapping
    d("status", "current_status")
    d("id", "unique_id")
    d("location", "coordinates")
In Implementation 3, the long method name is aliased to an extremely short "abbreviation" variable. I like this approach because it is immediately recognizable at first glance as a set of repeated method calls, while having fewer redundant characters and much shorter lines. The drawback is the use of an extremely short and semantically unclear identifier, d.
What is the most readable solution? Is the usage of an "abbreviation variable" acceptable if it is explicitly assigned from an unabbreviated version in the local scope?
itertools to the rescue again! Try using starmap - here's a simple demo:
list(itertools.starmap(min,[(1,2),(2,2),(3,2)]))
prints
[1,2,2]
starmap returns a lazy iterator, so to actually invoke the methods you have to consume it, here by wrapping it in list.
import itertools

def define_many_mappings_4(self):
    list(itertools.starmap(
        self.define_parameter_mapping,
        [
            ("status", "current_status"),
            ("id", "unique_id"),
            ("location", "coordinates"),
        ]))
Normally I'm not a fan of using a dummy list construction to invoke a sequence of functions, but this arrangement seems to address most of your concerns.
If define_parameter_mapping returns None, then you can replace list with any, and then all of the function calls will get made, and you won't have to construct that dummy list.
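That is, something like this, in the same method context as above:
# any() stops only at the first truthy result, and None is falsy,
# so every call gets made and no result list is built
any(itertools.starmap(self.define_parameter_mapping, mappings))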
I would go with Implementation 2, but it is a close call.
I think #2 and #3 are equally readable. Imagine if you had 100s of mappings... Either way, I cannot tell what the code at the bottom is doing without scrolling to the top. In #2 you are giving a name to the data; in #3, you are giving a name to the function. It's basically a wash.
Changing the data is also a wash, since either way you just add one line in the same pattern as what is already there.
The difference comes if you want to change what you are doing to the data. For example, say you decide to add a debug message for each mapping you define. With #2, you add a statement to the loop, and it is still easy to read. With #3, you have to create a lambda or something. Nothing wrong with lambdas -- I love Lisp as much as anybody -- but I think I would still find #2 easier to read and modify.
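For example, with #2 the debug message is a one-line addition (using a bare print here for simplicity):
for mapping in mappings:
    print("defining mapping %r" % (mapping,))
    self.define_parameter_mapping(*mapping)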
But it is a close call, and your taste might be different.
I think #3 is not bad, although I might pick a slightly longer identifier than d. But often this type of thing becomes data-driven, so you would find yourself using a variation of #2 where you loop over the result of a database query or something from a config file.
There's no right answer, so you'll get opinions on all sides here, but I would by far prefer to see #2 in any code I was responsible for maintaining.
#1 is verbose, repetitive, and difficult to change (e.g., say you need to call two methods on each pair, or add logging: then you must change every line). But this is often how code evolves, and it is a fairly familiar and harmless pattern.
#3 suffers the same problem as #1, but is slightly more concise at the cost of requiring what is basically a macro and thus new and slightly unfamiliar terms.
#2 is simple and clear. It lays out your mappings in data form, and then iterates them using basic language constructs. To add new mappings, you only need add a line to the array. You might end up loading your mappings from an external file or URL down the line, and that would be an easy change. To change what is done with them, you only need change the body of your for loop (which itself could be made into a separate function if the need arose).
Your complaint of #2 of "object before verb" doesn't bother me at all. In scanning that function, I would basically first assume the verb does what it's supposed to do and focus on the object, which is now clear and immediately visible and maintainable. Only if there were problems would I look at the verb, and it would be immediately evident what it is doing.

Best way to store and use a large text-file in python

I'm creating a networked server for a boggle-clone I wrote in Python, which accepts users, solves the boards, and scores the player input. The dictionary file I'm using is 1.8MB (the ENABLE2K dictionary), and I need it to be available to several game-solver classes. Right now, each class iterates through the file line by line and builds a hash table (associative array), but the more solver classes I instantiate, the more memory it takes up.
What I would like to do is import the dictionary file once and pass it to each solver instance as they need it. But what is the best way to do this? Should I import the dictionary in the global space, then access it in the solver class as globals()['dictionary']? Or should I import the dictionary then pass it as an argument to the class constructor? Is one of these better than the other? Is there a third option?
If you create a dictionary.py module, containing code which reads the file and builds a dictionary, this code will only be executed the first time it is imported. Further imports will return a reference to the existing module instance. As such, your classes can:
import dictionary
dictionary.words[whatever]
where dictionary.py has:
words = {}
# read file and add to 'words'
Even though it is essentially a singleton at this point, the usual arguments against globals apply. For a Pythonic singleton substitute, look up the "borg" object.
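Fleshed out, dictionary.py might look something like this (the word-list filename is an assumption):
# dictionary.py -- built once on first import, shared by all later imports
words = {}

with open("enable2k.txt") as f:
    for line in f:
        word = line.strip()
        if word:
            words[word] = True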
That's really the only difference. Once the dictionary object is created, you are only binding new references as you pass it along, unless you explicitly perform a deep copy. It makes sense for it to be constructed centrally once and only once, so long as each solver instance does not require a private copy for modification.
Adam, remember that in Python when you say:
a = read_dict_from_file()
b = a
... you are not actually copying a, and thus using more memory; you are merely making b another reference to the same object.
So basically any of the solutions you propose will be far better in terms of memory usage. Basically, read in the dictionary once and then hang on to a reference to that. Whether you do it with a global variable, or pass it to each instance, or something else, you'll be referencing the same object and not duplicating it.
Which one is most Pythonic? That's a whole 'nother can of worms, but here's what I would do personally:
def main(args):
    run_initialization_stuff()
    dictionary = read_dictionary_from_file()
    # 'class' is a reserved word, so the keyword argument is renamed here
    solvers = [Solver(solver_class=x, dictionary=dictionary)
               for x in range(number_of_solvers)]
HTH.
Depending on what your dict contains, you may be interested in the 'shelve' or 'anydbm' modules. They give you dict-like interfaces (just strings as keys and items for 'anydbm', and strings as keys and any python object as item for 'shelve') but the data is actually in a DBM file (gdbm, ndbm, dbhash, bsddb, depending on what's available on the platform.) You probably still want to share the actual database between classes as you are asking for, but it would avoid the parsing-the-textfile step as well as the keeping-it-all-in-memory bit.
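For instance, with shelve (a sketch; the filename is an assumption):
import shelve

# Store the words once...
with shelve.open("words") as db:
    db["python"] = True

# ...and look them up later without holding the whole dict in memory
with shelve.open("words") as db:
    print("python" in db)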
