Methods of creating syntax highlighting in textX? - python

As I cannot find any guidelines about syntax highlighting, I decided to prepare simple write-as-plain-text-and-then-highlight-everything-in-html-preview, which is enough for my scope at the moment.
By overriding many custom meta-model classes I have to_source method, which actually reimplements the whole syntax in reverse, as reverse parsing is not yet available. It's fine, but it ignores user formatting.
To retain user formatting we can use only available thing: _tx_position and _tx_position_end. Descending from main textX rule to its children by stored custom meta-model classes attributes works for most cases, but it fails with primitives.
# textX meta-model file
NonsenseProgram:
"begin" foo=Foo "," count=INT "end";
;
Foo:
"fancy" a=ID "separator" b=ID "finished"
;
# textX custom meta-model classes
class NonsenseProgram():
def __init__(foo, count):
self.foo = foo
self.count = count
def to_source(self):
pass # some recursive magic that use _tx_position and _tx_position_end
class Foo():
def __init__(parent, a, b):
self.parent = parent
self.a = a
self.b = b
def to_source(self):
pass # some recursive magic that use _tx_position and _tx_position_end
Let's consider given example. As we have NonsenseProgram and Foo classes that we can override, we are in control about it's returning source as a whole. We can modify NonsenseProgram generated code, NonsenseProgram.foo fragment (by overriding Foo), by accessing its _tx_* attributes. We can't do the same with NonsenseProgram.count, Foo.a and Foo.b as we have primitive string or int value.
Depending of the usage of primitives is out grammar we have following options:
Wrap every primitive with rule that contains only that primitive and nothing else.
Pros: It just works right now!
Cons: Produces massive overhead of nested values that our grammar toolchain need to handle. It's actually messing with grammar only for being pretty...
Ignore syntax from user and use only our reverse parsing rules.
Pros: It just works too!
Cons: You need reimplement your syntax with nearly every grammar element. It's forces code reformat on every highlight try.
Use some external rules of highlighting.
Pros: It would work...
Cons: Again grammar reimplementation.
Use language server.
Pros: Would be the best option on long run.
Cons: It's only mentioned once without any in-depth docs.
Any suggestions about any other options?

You are right. There is no information on position for primitive types. It seems that you have covered available options at the moment.
What would be an easy to implement option is to add bookkeeping of position directly to textX of all attributes as a special structure on each created object (e.g. a dict keyed by attribute name). It should be straightforward to implement so you can register a feature request in the issue tracker if you wish.
There was some work in the past to support full language services to the textX based languages. The idea is to get all the features you would expect from a decent code editor/IDE for any language specified using textX.
The work staled for a while but resumed recently as the full rewrite. It should be officially supported by the textX team. You can follow the progress here. Although, the project doesn't mention syntax highlighting at the moment, it is on our agenda.

Related

Best practice when defining instance variables

I'm fairly new to Python and have a question regarding the following class:
class Configuration:
def __init__(self):
parser = SafeConfigParser()
try:
if parser.read(CONFIG_FILE) is None:
raise IOError('Cannot open configuration file')
except IOError, error:
sys.exit(error)
else:
self.__parser = parser
self.fileName = CONFIG_FILE
def get_section(self):
p = self.__parser
result = []
for s in p.sections():
result.append('{0}'.format(s))
return result
def get_info(self, config_section):
p = self.__parser
self.section = config_section
self.url = p.get(config_section, 'url')
self.imgexpr = p.get(config_section, 'imgexpr')
self.imgattr1 = p.get(config_section, 'imgattr1')
self.imgattr2 = p.get(config_section, 'imgattr2')
self.destination = p.get(config_section, 'destination')
self.createzip = p.get(config_section, 'createzip')
self.pagesnumber = p.get(config_section, 'pagesnumber')
Is it OK to add more instance variables in another function, get_info in this example, or is it best practice to define all instance variables in the constructor? Couldn't it lead to spaghetti code if I define new instance variables all over the place?
EDIT: I'm using this code with a simple image scraper. Via get_section I return all sections in the config file, and then iterate through them to visit each site that I'm scraping images from. For each iteration I make a call to get_section to get the configuration settings for each section in the config file.
If anyone can come up with another approach it'll be fine! Thanks!
I would definitely declare all instance variables in __init__. To not do so leads to increased complexity and potential unexpected side effects.
To provide an alternate point of view from David Hall in terms of access, this is from the Google Python style guide.
Access Control:
If an accessor function would be trivial you should use public
variables instead of accessor functions to avoid the extra cost of
function calls in Python. When more functionality is added you can use
property to keep the syntax consistent
On the other hand, if access is more complex, or the cost of accessing
the variable is significant, you should use function calls (following
the Naming guidelines) such as get_foo() and set_foo(). If the past
behavior allowed access through a property, do not bind the new
accessor functions to the property. Any code still attempting to
access the variable by the old method should break visibly so they are
made aware of the change in complexity.
From PEP8
For simple public data attributes, it is best to expose just the
attribute name, without complicated accessor/mutator methods. Keep in
mind that Python provides an easy path to future enhancement, should
you find that a simple data attribute needs to grow functional
behavior. In that case, use properties to hide functional
implementation behind simple data attribute access syntax.
Note 1: Properties only work on new-style classes.
Note 2: Try to keep the functional behavior side-effect free, although
side-effects such as caching are generally fine.
Note 3: Avoid using properties for computationally expensive
operations; the attribute notation makes the caller believe that
access is (relatively) cheap.
Python isn't java/C#, and it has very strong ideas about how code should look and be written. If you are coding in python, it makes sense to make it look and feel like python. Other people will be able to understand your code more easily and you'll be able to understand other python code better as well.
I would favour setting all the instance variables in the constructor over having functions like get_info() that are required to put the class in a valid state.
With public instance variables that are only instantiated by calls to methods such as your get_info() you create a class that is a bit of a minefield to use.
If you are worried about have certain configuration values which are not always needed and are expensive to calculate (which I guess is why you have get_info(), allowing for deferred execution), then I'd either consider refactoring that subset of config into a second class or introducting properties or functions that return values.
With properties or get style functions you encourage consumers of the class to go through a defined interface and improve the encapsulation 1.
Once you have that encapsulation of the instance variables you give yourself the option to do something more than simply throw a NameError exception - you can perhaps call get_info() yourself, or throw a custom exception.
1.You can't provide 100% encapsulation with Python since private instance variables denoted by a leading double underscore are only private by convention

A idiom or design pattern for class template?

My code base is in Python. Let's say I have a fairly generic class called Report. It takes a large number of parameters
class Report(object):
def __init__(self, title, data_source, columns, format, ...many more...)
And there are many many instantiation of the Report. These instantiation are not entirely unrelated. Many reports share similar set of parameters, differ only with minor variation, like having the same data_source and columns but with a different title.
Rather than duplicating the parameters, some programming construct is applied to make expression this structure easier. And I'm trying to find some help to sort my head to identify some idiom or design pattern for this.
If a subcategory of report need some extra processing code, subclass seems to be a good choice. Say we have a subcategory of ExpenseReport.
class ExpenseReport(Report):
def __init__(self, title, ... a small number of parameters ...)
# some parameters are fixed, while others are specific to this instance
super(ExpenseReport,self).__init__(
title,
EXPENSE_DATA_SOURCE,
EXPENSE_COLUMNS,
EXPENSE_FORMAT,
... a small number of parameters...)
def processing(self):
... extra processing specific to ExpenseReport ...
But in a lot of cases, the subcategory merely fix some parameters without any extra processing. It could easily be done with partial function.
ExpenseReport = functools.partial(Report,
data_source = EXPENSE_DATA_SOURCE,
columns = EXPENSE_COLUMNS,
format = EXPENSE_FORMAT,
)
And in some case, there isn't even any difference. We simply need 2 copies of the same object to be used in different environment, like to be embedded in different page.
expense_report = Report("Total Expense", EXPENSE_DATA_SOURCE, ...)
page1.add(expense_report)
...
page2.add(clone(expense_report))
And in my code base, an ugly technique is used. Because we need 2 separate instances for each page, and because we don't want to duplicate the code with long list of parameter that creates report, we just clone (deepcopy in Python) the report for page 2. Not only is the need of cloning not apparent, neglecting to clone the object and instead sharing one instance creates a lot of hidden problem and subtle bugs in our system.
Is there any guidance in this situation? Subclass, partial function or other idiom? My desire is for this construct to be light and transparent. I'm slight wary of subclassing because it is likely to result in a jungle of subclass. And it induces programmer to add special processing code like what I have in ExpenseReport. If there is a need I rather analyze the code to see if it can be generalized and push to the Report layer. So that Report becomes more expressive without needing special processing in lower layers.
Additional Info
We do use keyword parameter. The problem is more in how to manage and organize the instantiation. We have a large number of instantiation with common patterns:
expense_report = Report("Expense", data_source=EXPENSE, ..other common pattern..)
expense_report_usd = Report("USD Expense", data_source=EXPENSE, format=USD, ..other common pattern..)
expense_report_euro = Report("Euro Expense", data_source=EXPENSE, format=EURO, ..other common pattern..)
...
lot more reports
...
page1.add(expense_report_usd)
page2.add(expense_report_usd) # oops, page1 and page2 shared the same instance?!
...
lots of pages
...
Why don't you just use keyword arguments and collect them all into a dict:
class Report(object):
def __init__(self, **params):
self.params = params
...
I see no reason why you shouldn't just use a partial function.
If your main problem is common arguments in class constructos, possible solution is to write something like:
common_arguments = dict(arg=value, another_arg=anoter_value, ...)
expense_report = Report("Expense", data_source=EXPENSE, **common_arguments)
args_for_shared_usd_instance = dict(title="USD Expense", data_source=EXPENSE, format=USD)
args_for_shared_usd_instance.update(common_arguments)
expense_report_usd = Report(**args_for_shared_usd_instance)
page1.add(Report(**args_for_shared_usd_instance))
page2.add(Report(**args_for_shared_usd_instance))
Better naming, can make it convenient. Maybe there is better design solution.
I found some information myself.
I. curry -- associating parameters with a function « Python recipes « ActiveState Code
http://code.activestate.com/recipes/52549-curry-associating-parameters-with-a-function/
See the entire dicussion. Nick Perkins' comment on 'Lightweight' subclasses is similar to what I've described.
II. PEP 309 -- Partial Function Application
http://www.python.org/dev/peps/pep-0309/
The question is quite old, but this might still help someone who stumbles onto it...
I made a small library called classical to simplify class inheritance cases like this (Python 3 only).
Simple example:
from classical.descriptors import ArgumentedSubclass
class CustomReport(Report):
Expense = ArgumentedSubclass(data_source=EXPENSE, **OTHER_EXPENSE_KWARGS)
Usd = ArgumentedSubclass(format=USD)
Euro = ArgumentedSubclass(format=EURO)
PatternN = ArgumentedSubclass(**PATTERN_N_KWARGS)
PatternM = ArgumentedSubclass(**PATTERN_M_KWARGS)
# Now you can chain these in any combination (and with additional arguments):
my_report_1 = CustomReport.Expense.Usd(**kwargs)
my_report_2 = CustomReport.Expense.Euro(**kwargs)
my_report_3 = CustomReport.Expense.PatternM.PatternN(**kwargs)
In this example it's not really necessary to separate Report and CustomReport classes, but might be a good idea to keep the original class "clean".
Hope this helps :)

Have well-defined, narrowly-focused classes ... now how do I get anything done in my program?

I'm coding a poker hand evaluator as my first programming project. I've made it through three classes, each of which accomplishes its narrowly-defined task very well:
HandRange = a string-like object (e.g. "AA"). getHands() returns a list of tuples for each specific hand within the string:
[(Ad,Ac),(Ad,Ah),(Ad,As),(Ac,Ah),(Ac,As),(Ah,As)]
Translation = a dictionary that maps the return list from getHands to values that are useful for a given evaluator (yes, this can probably be refactored into another class).
{'As':52, 'Ad':51, ...}
Evaluator = takes a list from HandRange (as translated by Translator), enumerates all possible hand matchups and provides win % for each.
My question: what should my "domain" class for using all these classes look like, given that I may want to connect to it via either a shell UI or a GUI? Right now, it looks like an assembly line process:
user_input = HandRange()
x = Translation.translateList(user_input)
y = Evaluator.getEquities(x)
This smells funny in that it feels like it's procedural when I ought to be using OO.
In a more general way: if I've spent so much time ensuring that my classes are well defined, narrowly focused, orthogonal, whatever ... how do I actually manage work flow in my program when I need to use all of them in a row?
Thanks,
Mike
Don't make a fetish of object orientation -- Python supports multiple paradigms, after all! Think of your user-defined types, AKA classes, as building blocks that gradually give you a "language" that's closer to your domain rather than to general purpose language / library primitives.
At some point you'll want to code "verbs" (actions) that use your building blocks to perform something (under command from whatever interface you'll supply -- command line, RPC, web, GUI, ...) -- and those may be module-level functions as well as methods within some encompassing class. You'll surely want a class if you need multiple instances, and most likely also if the actions involve updating "state" (instance variables of a class being much nicer than globals) or if inheritance and/or polomorphism come into play; but, there is no a priori reason to prefer classes to functions otherwise.
If you find yourself writing static methods, yearning for a singleton (or Borg) design pattern, writing a class with no state (just methods) -- these are all "code smells" that should prompt you to check whether you really need a class for that subset of your code, or rather whether you may be overcomplicating things and should use a module with functions for that part of your code. (Sometimes after due consideration you'll unearth some different reason for preferring a class, and that's allright too, but the point is, don't just pick a class over a module w/functions "by reflex", without critically thinking about it!).
You could create a Poker class that ties these all together and intialize all of that stuff in the __init__() method:
class Poker(object):
def __init__(self, user_input=HandRange()):
self.user_input = user_input
self.translation = Translation.translateList(user_input)
self.evaluator = Evaluator.getEquities(x)
# and so on...
p = Poker()
# etc, etc...

Python code readability

I have a programming experience with statically typed languages. Now writing code in Python I feel difficulties with its readability. Lets say I have a class Host:
class Host(object):
def __init__(self, name, network_interface):
self.name = name
self.network_interface = network_interface
I don't understand from this definition, what "network_interface" should be. Is it a string, like "eth0" or is it an instance of a class NetworkInterface? The only way I'm thinking about to solve this is a documenting the code with a "docstring". Something like this:
class Host(object):
''' Attributes:
#name: a string
#network_interface: an instance of class NetworkInterface'''
Or may be there are name conventions for things like that?
Using dynamic languages will teach you something about static languages: all the help you got from the static language that you now miss in the dynamic language, it wasn't all that helpful.
To use your example, in a static language, you'd know that the parameter was a string, and in Python you don't. So in Python you write a docstring. And while you're writing it, you realize you had more to say about it than, "it's a string". You need to say what data is in the string, and what format it should have, and what the default is, and something about error conditions.
And then you realize you should have written all that down for your static language as well. Sure, Java would force you know that it was a string, but there's all these other details that need to be specified, and you have to manually do that work in any language.
The docstring conventions are at PEP 257.
The example there follows this format for specifying arguments, you can add the types if they matter:
def complex(real=0.0, imag=0.0):
"""Form a complex number.
Keyword arguments:
real -- the real part (default 0.0)
imag -- the imaginary part (default 0.0)
"""
if imag == 0.0 and real == 0.0: return complex_zero
...
There was also a rejected PEP for docstrings for attributes ( rather than constructor arguments ).
The most pythonic solution is to document with examples. If possible, state what operations an object must support to be acceptable, rather than a specific type.
class Host(object):
def __init__(self, name, network_interface)
"""Initialise host with given name and network_interface.
network_interface -- must support the same operations as NetworkInterface
>>> network_interface = NetworkInterface()
>>> host = Host("my_host", network_interface)
"""
...
At this point, hook your source up to doctest to make sure your doc examples continue to work in future.
Personally I found very usefull to use pylint to validate my code.
If you follow pylint suggestion almost automatically your code become more readable,
you will improve your python writing skills, respect naming conventions. You can also define your own naming conventions and so on. It's very useful specially for a python beginner.
I suggest you to use.
Python, though not as overtly typed as C or Java, is still typed and will throw exceptions if you're doing things with types that simply do not play nice together.
To that end, if you're concerned about your code being used correctly, maintained correctly, etc. simply use docstrings, comments, or even more explicit variable names to indicate what the type should be.
Even better yet, include code that will allow it to handle whichever type it may be passed as long as it yields a usable result.
One benefit of static typing is that types are a form of documentation. When programming in Python, you can document more flexibly and fluently. Of course in your example you want to say that network_interface should implement NetworkInterface, but in many cases the type is obvious from the context, variable name, or by convention, and in these cases by omitting the obvious you can produce more readable code. Common is to describe the meaning of a parameter and implicitly giving the type.
For example:
def Bar(foo, count):
"""Bar the foo the given number of times."""
...
This describes the function tersely and precisely. What foo and bar mean will be obvious from context, and that count is a (positive) integer is implicit.
For your example, I'd just mention the type in the document string:
"""Create a named host on the given NetworkInterface."""
This is shorter, more readable, and contains more information than a listing of the types.

When and how to use the builtin function property() in python

It appears to me that except for a little syntactic sugar, property() does nothing good.
Sure, it's nice to be able to write a.b=2 instead of a.setB(2), but hiding the fact that a.b=2 isn't a simple assignment looks like a recipe for trouble, either because some unexpected result can happen, such as a.b=2 actually causes a.b to be 1. Or an exception is raised. Or a performance problem. Or just being confusing.
Can you give me a concrete example for a good usage of it? (using it to patch problematic code doesn't count ;-)
In languages that rely on getters and setters, like Java, they're not supposed nor expected to do anything but what they say -- it would be astonishing if x.getB() did anything but return the current value of logical attribute b, or if x.setB(2) did anything but whatever small amount of internal work is needed to make x.getB() return 2.
However, there are no language-imposed guarantees about this expected behavior, i.e., compiler-enforced constraints on the body of methods whose names start with get or set: rather, it's left up to common sense, social convention, "style guides", and testing.
The behavior of x.b accesses, and assignments such as x.b = 2, in languages which do have properties (a set of languages which includes but is not limited to Python) is exactly the same as for getter and setter methods in, e.g., Java: the same expectations, the same lack of language-enforced guarantees.
The first win for properties is syntax and readability. Having to write, e.g.,
x.setB(x.getB() + 1)
instead of the obvious
x.b += 1
cries out for vengeance to the gods. In languages which support properties, there is absolutely no good reason to force users of the class to go through the gyrations of such Byzantine boilerplate, impacting their code's readability with no upside whatsoever.
In Python specifically, there's one more great upside to using properties (or other descriptors) in lieu of getters and setters: if and when you reorganize your class so that the underlying setter and getter are not needed anymore, you can (without breaking the class's published API) simply eliminate those methods and the property that relies on them, making b a normal "stored" attribute of x's class rather than a "logical" one obtained and set computationally.
In Python, doing things directly (when feasible) instead of via methods is an important optimization, and systematically using properties enables you to perform this optimization whenever feasible (always exposing "normal stored attributes" directly, and only ones which do need computation upon access and/or setting via methods and properties).
So, if you use getters and setters instead of properties, beyond impacting the readability of your users' code, you are also gratuitously wasting machine cycles (and the energy that goes to their computer during those cycles;-), again for no good reason whatsoever.
Your only argument against properties is e.g. that "an outside user wouldn't expect any side effects as a result of an assignment, usually"; but you miss the fact that the same user (in a language such as Java where getters and setters are pervasive) wouldn't expect (observable) "side effects" as a result of calling a setter, either (and even less for a getter;-). They're reasonable expectations and it's up to you, as the class author, to try and accommodate them -- whether your setter and getter are used directly or through a property, makes no difference. If you have methods with important observable side effects, do not name them getThis, setThat, and do not use them via properties.
The complaint that properties "hide the implementation" is wholly unjustified: most all of OOP is about implementing information hiding -- making a class responsible for presenting a logical interface to the outside world and implementing it internally as best it can. Getters and setters, exactly like properties, are tools towards this goal. Properties just do a better job at it (in languages that support them;-).
The idea is to allow you to avoid having to write getters and setters until you actually need them.
So, to start off you write:
class MyClass(object):
def __init__(self):
self.myval = 4
Obviously you can now write myobj.myval = 5.
But later on, you decide that you do need a setter, as you want to do something clever at the same time. But you don't want to have to change all the code that uses your class - so you wrap the setter in the #property decorator, and it all just works.
but hiding the fact that a.b=2 isn't a
simple assignment looks like a recipe
for trouble
You're not hiding that fact though; that fact was never there to begin with. This is python -- a high-level language; not assembly. Few of the "simple" statements in it boil down to single CPU instructions. To read simplicity into an assignment is to read things that aren't there.
When you say x.b = c, probably all you should think is that "whatever just happened, x.b should now be c".
A basic reason is really simply that it looks better. It is more pythonic. Especially for libraries. something.getValue() looks less nice than something.value
In plone (a pretty big CMS), you used to have document.setTitle() which does a lot of things like storing the value, indexing it again and so. Just doing document.title = 'something' is nicer. You know that a lot is happening behind the scenes anyway.
You are correct, it is just syntactic sugar. It may be that there are no good uses of it depending on your definition of problematic code.
Consider that you have a class Foo that is widely used in your application. Now this application has got quite large and further lets say it's a webapp that has become very popular.
You identify that Foo is causing a bottleneck. Perhaps it is possible to add some caching to Foo to speed it up. Using properties will let you do that without changing any code or tests outside of Foo.
Yes of course this is problematic code, but you just saved a lot of $$ fixing it quickly.
What if Foo is in a library that you have hundreds or thousands of users for? Well you saved yourself having to tell them to do an expensive refactor when they upgrade to the newest version of Foo.
The release notes have a lineitem about Foo instead of a paragraph porting guide.
Experienced Python programmers don't expect much from a.b=2 other than a.b==2, but they know even that may not be true. What happens inside the class is it's own business.
Here's an old example of mine. I wrapped a C library which had functions like "void dt_setcharge(int atom_handle, int new_charge)" and "int dt_getcharge(int atom_handle)". I wanted at the Python level to do "atom.charge = atom.charge + 1".
The "property" decorator makes that easy. Something like:
class Atom(object):
def __init__(self, handle):
self.handle = handle
def _get_charge(self):
return dt_getcharge(self.handle)
def _set_charge(self, charge):
dt_setcharge(self.handle, charge)
charge = property(_get_charge, _set_charge)
10 years ago, when I wrote this package, I had to use __getattr__ and __setattr__ which made it possible, but the implementation was a lot more error prone.
class Atom:
def __init__(self, handle):
self.handle = handle
def __getattr__(self, name):
if name == "charge":
return dt_getcharge(self.handle)
raise AttributeError(name)
def __setattr__(self, name, value):
if name == "charge":
dt_setcharge(self.handle, value)
else:
self.__dict__[name] = value
getters and setters are needed for many purposes, and are very useful because they are transparent to the code. Having object Something the property height, you assign a value as Something.height = 10, but if height has a getter and setter then at the time you do assign that value you can do many things in the procedures, like validating a min or max value, like triggering an event because the height changed, automatically setting other values in function of the new height value, all that may occur at the moment Something.height value was assigned. Remember, you don't need to call them in your code, they are auto executed at the moment you read or write the property value. In some way they are like event procedures, when the property X changes value and when the property X value is read.
It is useful when you try to replace inheritance with delegation in refactoring. The following is a toy example. Stack was a subclass in Vector.
class Vector:
def __init__(self, data):
self.data = data
#staticmethod
def get_model_with_dict():
return Vector([0, 1])
class Stack:
def __init__(self):
self.model = Vector.get_model_with_dict()
self.data = self.model.data
class NewStack:
def __init__(self):
self.model = Vector.get_model_with_dict()
#property
def data(self):
return self.model.data
#data.setter
def data(self, value):
self.model.data = value
if __name__ == '__main__':
c = Stack()
print(f'init: {c.data}') #init: [0, 1]
c.data = [0, 1, 2, 3]
print(f'data in model: {c.model.data} vs data in controller: {c.data}')
#data in model: [0, 1] vs data in controller: [0, 1, 2, 3]
c_n = NewStack()
c_n.data = [0, 1, 2, 3]
print(f'data in model: {c_n.model.data} vs data in controller: {c_n.data}')
#data in model: [0, 1, 2, 3] vs data in controller: [0, 1, 2, 3]
Note if you do use directly access instead of property, the self.model.data does not equal self.data, which is out of our expectation.
You can take codes before __name__=='__main__' as a library.

Categories