Approaches to changing language at runtime with python gettext - python

I have read lots of posts about using Python gettext, but none of them addressed the issue of changing languages at runtime.
Using gettext, strings are translated by the function _() which is added globally to builtins. The definition of _ is language-specific and will change during execution when the language setting changes. At certain points in the code, I need strings in an object to be translated to a certain language. This happens by:
1. (Re)define the _ function in builtins to translate to the chosen language.
2. (Re)evaluate the desired object using the new _ function, guaranteeing that any calls to _ within the object definition are evaluated using the current definition of _.
3. Return the object.
I am wondering about different approaches to step 2. I thought of several but they all seem to have fundamental flaws.
What is the best way to achieve step 2 in practice?
Is it theoretically possible to achieve step 2 for any arbitrary object, without knowledge of its implementation?
Possible approaches
If all translated text is defined in functions that can be called in step 2, then it's straightforward: calling the function will evaluate using the current definition of _. But there are lots of situations where that's not the case, for instance, translated strings could be module-level variables evaluated at import time, or attributes evaluated when instantiating an object.
A minimal example of this problem with module-level variables is here.
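A self-contained illustration of the problem, as a minimal sketch: the catalogs and the set_language helper below are hypothetical stand-ins for real .mo files and for step 1. A value computed once keeps the language that was active at that moment, while a function looks up the current _ each time it is called.

```python
import builtins

# Hypothetical in-memory catalogs standing in for compiled .mo files.
CATALOGS = {"fr": {"Hello": "Bonjour"}, "de": {"Hello": "Hallo"}}


def set_language(lang):
    """Step 1: (re)define _ in builtins for the chosen language."""
    catalog = CATALOGS[lang]
    builtins._ = lambda message: catalog.get(message, message)


set_language("fr")
GREETING = _("Hello")  # like a module-level variable: translated once, right now


def greeting():
    return _("Hello")  # looks up the current builtins._ on every call


set_language("de")
print(GREETING, greeting())  # Bonjour Hallo
```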
Re-evaluation
Manually reload modules
Module-level variables can be re-evaluated at the desired time using importlib.reload. This gets more complicated if the module imports another module that also has translated strings. You have to reload every module that's a (nested) dependency.
With knowledge of the module's implementation, you can manually reload the dependencies in the right order: if A imports B,
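import importlib
import A, B  # hypothetical modules from the example; A imports B
importlib.reload(B)  # reload the dependency first
importlib.reload(A)  # then the module that imports it
# use A...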
Problems: Requires knowledge of the module's implementation. Only reloads module-level variables.
Automatically reload modules
Without knowledge of the module's implementation, you'd need to automate reloading dependencies in the right order. You could do this for every module in the package, or just the (recursive) dependencies. To handle more complex situations, you'd need to generate a dependency graph and reload modules in breadth-first order from the roots.
Problems: Requires complex reloading algorithm. There are likely edge cases where it's not possible (cyclic dependencies, unusual package structure, from X import Y-style imports). Only reloads module-level variables.
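The automatic variant can be sketched with the standard library's graphlib (Python 3.9+). This is a minimal sketch under stated assumptions: package_dependencies and reload_package are hypothetical helpers, dependencies are guessed from each loaded module's globals, and modules are reloaded in topological order (dependencies before the modules that use them) rather than the breadth-first order mentioned above. It inherits the listed problems; in particular, a cycle makes TopologicalSorter raise CycleError.

```python
import importlib
import sys
import types
from graphlib import TopologicalSorter  # Python 3.9+


def in_package(name, prefix):
    return name == prefix or name.startswith(prefix + ".")


def package_dependencies(prefix):
    """Guess which package modules each loaded package module depends on.

    A dependency is a module object found in the module's globals, or a global
    whose __module__ names another package module (this catches most
    `from X import Y` imports, but not imports made inside functions).
    """
    graph = {}
    for name, module in list(sys.modules.items()):
        if module is None or not in_package(name, prefix):
            continue
        deps = set()
        for value in vars(module).values():
            if isinstance(value, types.ModuleType):
                dep = value.__name__
            else:
                dep = getattr(value, "__module__", None)
            if isinstance(dep, str) and dep != name and in_package(dep, prefix):
                deps.add(dep)
        graph[name] = deps
    return graph


def reload_package(prefix):
    """Reload loaded package modules so that dependencies come first.

    static_order() raises graphlib.CycleError on cyclic imports, one of the
    edge cases this approach cannot handle.
    """
    reloaded = []
    for name in TopologicalSorter(package_dependencies(prefix)).static_order():
        if name in sys.modules:
            importlib.reload(sys.modules[name])
            reloaded.append(name)
    return reloaded
```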
Re-evaluate only the desired object?
eval allows you to evaluate dynamically generated expressions. Instead could you re-evaluate an existing object's static expression, given a dynamic context (builtins._)? I guess this would involve recursively re-evaluating the object, and every object referenced in its definition, and every object referenced in their definitions...
I looked through the inspect module and didn't find any obvious solution.
Problems: Not sure if this is possible. Security issues with eval and similar.
Delayed evaluation
Lazy evaluation
The Flask-Babel project provides a LazyString that delays evaluation of a translated string. If it could be completely delayed until step 2, that seems like the cleanest solution.
Problems: A LazyString can still get evaluated before it's supposed to. Lots of things may call its __str__ function and trigger evaluation, such as string formatting and concatenating.
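To see how early evaluation happens, here is a minimal hypothetical LazyString (not Flask-Babel's implementation), with a module-level current_language standing in for the installed _. Formatting it calls __str__ immediately, so the formatted result keeps the old language even though the lazy object itself still follows the current one.

```python
# Hypothetical catalogs and a module-level "current language" standing in
# for whatever _ is installed in builtins at a given moment.
CATALOGS = {"fr": {"Hello": "Bonjour"}, "de": {"Hello": "Hallo"}}
current_language = "fr"


def translate(message):
    return CATALOGS[current_language].get(message, message)


class LazyString:
    """Minimal lazy string: the lookup happens only when str() is called."""

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return translate(self.message)


lazy = LazyString("Hello")
label = f"{lazy}!"  # formatting calls __str__ now, freezing the French text

current_language = "de"
print(str(lazy), label)  # Hallo Bonjour!
```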
Deferred translation
The python gettext docs demonstrate temporarily re-defining the _ function, and only calling the actual translation function when the translated string is needed.
Problems: Requires knowledge of the object's structure, and code customized to each object, to find the strings to translate. Doesn't allow concatenation or formatting of translated strings.
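The deferred-translation pattern from the gettext documentation looks roughly like this. The DictTranslations class is a hypothetical in-memory catalog used so the sketch runs without .mo files; NullTranslations.install() is the real gettext call that puts the instance's gettext method into builtins as _. The list stores untranslated message ids, and translation happens only where they are used.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    def __init__(self, catalog):
        super().__init__()
        self.catalog = catalog

    def gettext(self, message):
        return self.catalog.get(message, message)


def _(message):
    return message  # no-op: only marks the string for extraction tools


animals = [_("mollusk"), _("albatross")]  # stored untranslated
del _  # from here on, _ resolves to the builtins._ installed below


def translated_animals():
    return [_(a) for a in animals]  # translate at the point of use


DictTranslations({"mollusk": "mollusque", "albatross": "albatros"}).install()
print(translated_animals())  # ['mollusque', 'albatros']

DictTranslations({"mollusk": "Weichtier", "albatross": "Albatros"}).install()
print(translated_animals())  # ['Weichtier', 'Albatros']
```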
Refactoring
All translated strings could be factored out into a separate module, or moved to functions such that they can be completely evaluated at a given time.
Problems: As I understand it the point of gettext and the global _ function is to minimize the impact of translation on existing code. Refactoring like this could require significant design changes and make the code more confusing.

The only plausible, general approach is to rewrite all relevant code to not only use _ to request translation but to never cache the result. That’s not a fun idea and it’s not a new idea—you already list Refactoring and Deferred translation that rely on the cooperation of the gettext clients—but it is the “best way […] in practice”.
You can try to do a super-reload by removing many things from sys.modules and then doing a real reimport. This approach avoids understanding the import relationships, but works only if the relevant modules are all written in Python and you can guarantee that the state of your program will retain no references to any objects (including types and modules) that used the old language. (I’ve done this, but only in a context where the overarching program was a sort of supervisor utterly uninterested in the features of the discarded modules.)
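A super-reload along those lines might look like the sketch below (super_reload is a hypothetical helper). It relies only on sys.modules and importlib.import_module; the hard part, guaranteeing that nothing still references the discarded modules, is not something code like this can check.

```python
import importlib
import sys


def super_reload(prefix):
    """Forget every cached module under `prefix`, then import `prefix` afresh.

    Only code that imports the package again gets the new objects; anything
    still holding an old module, class, or string keeps the old language.
    """
    for name in list(sys.modules):
        if name == prefix or name.startswith(prefix + "."):
            del sys.modules[name]
    importlib.invalidate_caches()
    return importlib.import_module(prefix)
```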
You can try to walk the whole object graph and replace the strings, but even aside from the intrinsic technical difficulty of such an algorithm (consider __slots__ in base classes and co_consts for just the mildest taste), it would involve untranslating them, which changes from hard to impossible when some sort of transformation has already been performed. That transformation might just be concatenating the translated strings, or it might be pre-substituting known values to format, or padding the string, or storing a hash of it: it’s certainly undecidable in general. (I’ve done this too for other data types, but only with data constructed by a file reader whose output used known, simple structures.)
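A toy version of the object-graph walk shows where untranslation breaks down. This hypothetical retranslate helper only handles lists, tuples and dict values, builds a new structure instead of patching existing references, and untranslates by exact reverse lookup in the old catalog, so a concatenated string is silently left in the old language.

```python
def retranslate(obj, old_catalog, new_catalog):
    """Swap translated strings inside lists, tuples, and dict values.

    Untranslation relies on an exact reverse lookup in `old_catalog`, so any
    string that was transformed after translation is left untouched.
    """
    reverse = {translated: msgid for msgid, translated in old_catalog.items()}

    def walk(value):
        if isinstance(value, str):
            msgid = reverse.get(value)
            return value if msgid is None else new_catalog.get(msgid, msgid)
        if isinstance(value, list):
            return [walk(item) for item in value]
        if isinstance(value, tuple):
            return tuple(walk(item) for item in value)
        if isinstance(value, dict):
            return {key: walk(item) for key, item in value.items()}
        return value  # objects, __slots__, closures, code constants: not handled

    return walk(obj)


FR = {"Hello": "Bonjour", "World": "Monde"}
DE = {"Hello": "Hallo", "World": "Welt"}
data = {"title": "Bonjour", "banner": "Bonjour " + "Monde"}
print(retranslate(data, FR, DE))  # {'title': 'Hallo', 'banner': 'Bonjour Monde'}
```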
Any approach based on partial reevaluation combines the problems of the methods above.
The only other possible approach is a super-LazyString that refuses to translate for longer by implementing operations like + to return objects that encode the transformations to eventually apply, but it’s impossible to know when to force those operations unless you control all mechanisms used to display or transmit strings. It’s also impossible to defer past, say, if len(_("…"))>80:.
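A minimal sketch of such a super-LazyString, under the assumption that every display point calls force() with the translation function to use. DeferredText and its methods are hypothetical; it records + and .format() as closures and applies them only when forced. Anything that needs a concrete answer right away, such as len(), cannot be recorded this way.

```python
class DeferredText:
    """Hypothetical super-LazyString: records operations, translates only in force()."""

    def __init__(self, render):
        self._render = render  # callable: translate function -> str

    @classmethod
    def message(cls, msgid):
        return cls(lambda translate: translate(msgid))

    @staticmethod
    def _renderer(value):
        if isinstance(value, DeferredText):
            return value._render
        return lambda translate: str(value)

    def __add__(self, other):
        left, right = self._render, self._renderer(other)
        return DeferredText(lambda t: left(t) + right(t))

    def __radd__(self, other):
        left, right = self._renderer(other), self._render
        return DeferredText(lambda t: left(t) + right(t))

    def format(self, *args, **kwargs):
        render = self._render
        return DeferredText(lambda t: render(t).format(*args, **kwargs))

    def force(self, translate):
        """Apply the recorded operations under the given translation function."""
        return self._render(translate)

    # len() must return an int immediately, so it cannot be deferred:
    # `if len(text) > 80:` would force a choice of language at that point.


FR = {"Hello": "Bonjour", "{} new messages": "{} nouveaux messages"}
line = DeferredText.message("Hello") + ", " + DeferredText.message("{} new messages").format(3)
print(line.force(lambda m: FR.get(m, m)))  # Bonjour, 3 nouveaux messages
print(line.force(lambda m: m))  # Hello, 3 new messages
```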

Related

Interpret Python bytecode in C# (with fine control)

For a project idea of mine, I have the following need, which is quite precise:
I would like to be able to execute Python code (pre-compiled before hand if necessary) on a per-bytecode-instruction basis. I also need to access what's inside the Python VM (frame stack, data stacks, etc.). Ideally, I would also like to remove a lot of Python built-in features and reimplement a few of them my own way (such as file writing).
All of this must be coded in C# (I'm using Unity).
I'm okay with losing a few of Python's actual features, especially complicated things like imports. However, I would like most of it to stay intact.
I looked a little bit into IronPython's code but it remains very obscure to me and it seems quite enormous too. I began translating Byterun (a Python bytecode interpreter written in Python) but I face a lot of difficulties as Byterun leverages a lot of Python's features to... interpret Python.
I'm not asking for a pre-made solution (unless you have one in mind), but rather for some advice, places to look at, etc. Do you have any ideas about what I should research first?
I've tried to do my own implementation of the Python VM in the distant past and learned a lot but never came even close to a fully working implementation. I used the C implementation as a starting point, specifically everything in https://github.com/python/cpython/tree/main/Objects and
https://github.com/python/cpython/blob/main/Python/ceval.c (look for switch(opcode))
Here are some pointers:
Come to grips with the Python object model. Implement an abstract PyObject class with the necessary methods for instancing, attribute access, indexing and slicing, calling, comparisons, arithmetic operations and representation. Provide concrete implementations for None, booleans, ints, floats, strings, tuples, lists and dictionaries.
Implement the core of your VM: a Frame object that loops over the opcodes and dispatches, using a giant switch statement (following the C implementation here), to the corresponding methods of the PyObject. The frame should maintain a stack of PyObjects for the operands of the opcodes. Depending on the opcode, arguments are popped from and pushed onto this stack. A dict can be used to store and retrieve local variables. Use the Frame object to create a PyObject for function objects.
Get familiar with the idea of a namespace and the way Python builds on the concept of namespaces. Implement a module, a class and an instance object, using a dict to map (attribute) names to objects.
Finally, add as many builtin functions as you think you need to get a useful implementation.
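The core loop from the second step can be sketched compactly. It is written in Python here to match the rest of this page, but the same structure maps onto a C# switch. The (opcode, argument) format and opcode names are a simplified, hypothetical instruction set loosely modeled on CPython's (real CPython bytecode differs between versions), and native Python values stand in for PyObject.

```python
class Frame:
    """Toy frame: an instruction pointer, an operand stack, and a dict of locals."""

    def __init__(self, code):
        self.code = code  # list of (opcode, argument) tuples; arguments are inline
        self.stack = []
        self.locals = {}

    def run(self):
        pc = 0
        while True:
            op, arg = self.code[pc]
            pc += 1
            # The "giant switch": dispatch on the opcode.
            if op == "LOAD_CONST":
                self.stack.append(arg)
            elif op == "LOAD_NAME":
                self.stack.append(self.locals[arg])
            elif op == "STORE_NAME":
                self.locals[arg] = self.stack.pop()
            elif op in ("BINARY_ADD", "BINARY_SUB"):
                right = self.stack.pop()
                left = self.stack.pop()
                self.stack.append(left + right if op == "BINARY_ADD" else left - right)
            elif op == "JUMP_IF_FALSE":
                if not self.stack.pop():
                    pc = arg
            elif op == "JUMP":
                pc = arg
            elif op == "RETURN_VALUE":
                return self.stack.pop()
            else:
                raise ValueError(f"unknown opcode {op!r}")


# total = 0; i = 3; while i: total = total + i; i = i - 1; return total
program = [
    ("LOAD_CONST", 0), ("STORE_NAME", "total"),
    ("LOAD_CONST", 3), ("STORE_NAME", "i"),
    ("LOAD_NAME", "i"), ("JUMP_IF_FALSE", 15),  # index 4: loop test
    ("LOAD_NAME", "total"), ("LOAD_NAME", "i"), ("BINARY_ADD", None), ("STORE_NAME", "total"),
    ("LOAD_NAME", "i"), ("LOAD_CONST", 1), ("BINARY_SUB", None), ("STORE_NAME", "i"),
    ("JUMP", 4),
    ("LOAD_NAME", "total"), ("RETURN_VALUE", None),  # index 15: after the loop
]
print(Frame(program).run())  # 6
```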
I think it is easy to underestimate the amount of work you're getting yourself into, but ... have fun!

Safe implementation strategy for embedded user-defined expressions

I am designing/prototyping a Domain Specific Language... in Python, for now, at least. The design is straightforward, but it requires support for specifying an arbitrary function (whose domain is a map from labels to integers and whose range is the integers). In many cases, the function will merely select a label in the domain to yield a result... but I want to allow the specification of any function that could be easily (and efficiently) implemented in a general purpose programming language.
A caveat is that I want the function to be 'safe'... by this I mean:
A 'pure' function: deterministic with no side effects. (i.e. no external state; no interaction with files, I/O, devices - etc.)
Terminating - either successfully, or after specific (small-scale) allocated computational resources have expired.
I am keen that this function should be implemented efficiently - I expect definitions to be provided infrequently - and evaluated very frequently. I would also like the functions to be defined using a familiar syntax.
I've considered supporting the implementation of functions in Python... I'm aware that I could impose restrictions using the eval() function, and I've found the ast module, suggesting an approach that parses the source to an AST and then interprets (or verifies, prior to evaluation) the tree. I've also read about pyparsing and considered implementing a bespoke, interpreted language.
I can't help thinking that trying to block undesirable behaviour from eval() is tackling the problem "backwards" (blocking undesirable functionality after the fact), whereas implementing a bespoke language would involve reinventing the wheel.
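One way to make the AST route a whitelist rather than a blacklist is sketched below. compile_expression and ALLOWED_NODES are hypothetical names, and the allowed node set is an assumption about what the DSL needs: integer arithmetic, comparisons, boolean logic and conditional expressions over the labels. With no calls, attributes, loops or comprehensions, the number of evaluation steps is bounded by the size of the expression, though big-integer arithmetic can still make single steps slow; ** is left out for that reason.

```python
import ast

# Assumed whitelist: anything not listed here is rejected before compilation.
ALLOWED_NODES = (
    ast.Expression, ast.Constant, ast.Name, ast.Load,
    ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare, ast.IfExp,
    ast.Add, ast.Sub, ast.Mult, ast.FloorDiv, ast.Mod,
    ast.UAdd, ast.USub, ast.Not, ast.And, ast.Or,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


def compile_expression(source, labels):
    """Validate `source` against the whitelist, then compile it once."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in labels:
            raise ValueError(f"unknown label: {node.id}")
        if isinstance(node, ast.Constant) and type(node.value) is not int:
            raise ValueError("only integer constants are allowed")
    code = compile(tree, "<expression>", "eval")

    def evaluate(env):
        # Only whitelisted syntax reaches this point, so empty builtins suffice here.
        return eval(code, {"__builtins__": {}}, dict(env))

    return evaluate


f = compile_expression("a if a > b else b * 2", {"a", "b"})
print(f({"a": 1, "b": 3}), f({"a": 5, "b": 3}))  # 6 5
```

Compiling once and evaluating many times matches the "defined infrequently, evaluated very frequently" requirement; runtime errors such as division by zero still need to be caught by the caller.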
Does Python already have a safe, efficient, embeddable, expression interpreter?
PyPy has a sandbox.
If you're running this in the web browser (the usual place for untrusted code concerns) consider running it client-side with something like Brython. No-one cares if the user hacks his own machine.
If you do implement a bespoke interpreter, you don't have to re-implement all of the wheel. It's thought to be relatively safe to use compile() on untrusted code, but beware of large constants eating time and memory. Run the compiler in a separate process you can kill. Then you just need to write a Python bytecode interpreter that lacks access to anything important.
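The "compile in a separate process you can kill" step might look like this minimal sketch (compile_in_subprocess is a hypothetical helper). The child interpreter compiles the source and sends the code object back via marshal, and subprocess.run kills it if the timeout expires. The eval call in the usage line only shows that the code object works: an empty __builtins__ is not a sandbox, so evaluation itself still belongs in the restricted bytecode interpreter described above.

```python
import marshal
import subprocess
import sys

# Child program: read source from stdin, compile it, write the marshalled code object to stdout.
_CHILD = (
    "import marshal, sys\n"
    "source = sys.stdin.buffer.read().decode('utf-8')\n"
    "code = compile(source, '<user expression>', 'eval')\n"
    "sys.stdout.buffer.write(marshal.dumps(code))\n"
)


def compile_in_subprocess(source, timeout=2.0):
    """Compile untrusted source in a child interpreter that is killed on timeout.

    marshal data is only guaranteed to load in the same Python version,
    which holds here because the child runs sys.executable.
    """
    result = subprocess.run(
        [sys.executable, "-I", "-c", _CHILD],  # -I: isolated mode, ignores env vars and user site
        input=source.encode("utf-8"),
        capture_output=True,
        timeout=timeout,  # on expiry, subprocess.run kills the child and raises TimeoutExpired
    )
    if result.returncode != 0:
        raise ValueError(result.stderr.decode("utf-8", "replace"))
    return marshal.loads(result.stdout)


code = compile_in_subprocess("x * 2 + 1")
print(eval(code, {"__builtins__": {}}, {"x": 20}))  # 41
```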

Use of eval in Python, MATLAB, etc [duplicate]

This question already has answers here:
Why is using 'eval' a bad practice?
(8 answers)
Closed 9 years ago.
I know that one shouldn't use eval, for all the obvious reasons (performance, maintainability, etc.). My question is more of a side one: is there a legitimate use for it, where one should use it rather than implement the code another way?
Since it is implemented in several languages and can lead to bad programming style, I assume there is a reason why it's still available.
First, here is MathWorks' list of alternatives to eval.
You could also be clever and use eval() in a compiled application to build your mCode interpreter, but the Matlab compiler doesn't allow that for obvious reasons.
One place where I have found a reasonable use of eval is in obtaining small predicates of code that consumers of my software need to be able to supply as part of a parameter file.
For example, there might be an item called "Data" that has a location for reading and writing the data, but also requires some predicate applied to it upon load. In a Yaml file, this might look like:
Data:
  Name: CustomerID
  ReadLoc: some_server.some_table
  WriteLoc: write_server.write_table
  Predicate: "lambda x: x[:4]"
Upon loading and parsing the objects from Yaml, I can use eval to turn the predicate string into a callable lambda function. In this case, it implies that CustomerID is a long string and only the first 4 characters are needed in this particular instance.
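A sketch of that loading step, assuming the Yaml file has already been parsed into nested dicts (the config literal below stands in for the loader's output, and load_predicate is a hypothetical helper). The callable() check is the kind of basic validation mentioned below; it does not make eval safe.

```python
# The parsed parameter file, as a Yaml loader would return it.
config = {
    "Data": {
        "Name": "CustomerID",
        "ReadLoc": "some_server.some_table",
        "WriteLoc": "write_server.write_table",
        "Predicate": "lambda x: x[:4]",
    }
}


def load_predicate(source):
    """Turn a predicate string into a callable and fail early if it isn't one.

    Emptying __builtins__ only removes the easy mistakes; it is NOT a sandbox,
    which is why this relies on trusting a small group of users.
    """
    predicate = eval(source, {"__builtins__": {}}, {})
    if not callable(predicate):
        raise TypeError(f"Predicate must evaluate to a callable, got {predicate!r}")
    return predicate


predicate = load_predicate(config["Data"]["Predicate"])
print(predicate("ABCD1234XYZ"))  # ABCD
```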
Yaml offers some clunky ways to magically invoke object constructors (e.g. using something like !Data in my code above, and then having defined a class for Data in the code that appropriately uses Yaml hooks into the constructor). In fact, one of the biggest criticisms I have of the Yaml magic object construction is that it is effectively like making your whole parameter file into one giant eval statement. And this is very problematic if you need to validate things and if you need flexibility in the way multiple parts of the code absorb multiple parts of the parameter file. It also doesn't lend itself easily to templating with Mako, whereas my approach above makes that easy.
I think this simpler design, which can be easily parsed with any YAML tools, is better, and using eval lets me allow the user to pass in whatever arbitrary callable they want.
A couple of notes on why this works in my case:
The users of the code are not Python programmers. They don't have the ability to write their own functions and then just pass a module location, function name, and argument signature (although, putting all that in a parameter file is another way to solve this that wouldn't rely on eval if the consumers can be trusted to write code.)
The users are responsible for their bad lambda functions. I can do some validation that eval works on the passed predicate, and maybe even create some tests on the fly or have a nice failure mode, but at the end of the day I am allowed to tell them that it's their job to supply valid predicates and to ensure the data can be manipulated with simple predicates. If this constraint wasn't in place, I'd have to shuck this for a different system.
The users of these parameter files compose a small group mostly willing to conform to conventions. If that weren't true, it would be risky that folks would hijack the predicate field to do many inappropriate things, and this would be hard to guard against. On big projects, it would not be a great idea.
I don't know if my points apply very generally, but I would say that using eval to add flexibility to a parameter file is good if you can guarantee your users are a small group of convention-upholders (a rare feat, I know).
In MATLAB the eval function is useful when functions make use of the name of the input argument via the inputname function. For example, overloading the builtin display function (which is sensitive to the name of the input argument) requires eval. To call the builtin display from an overloaded display you would do
function display(X)
eval([inputname(1), ' = X;']);
eval(['builtin(''display'', ', inputname(1), ');']);
end
In MATLAB there is also evalc. From the documentation:
T = evalc(S) is the same as EVAL(S) except that anything that would
normally be written to the command window, except for error messages,
is captured and returned in the character array T (lines in T are
separated by '\n' characters).
If you still consider this eval, then it is very powerful when dealing with closed source code that displays useful information in the command window and you need to capture and parse that output.
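For comparison, Python has no evalc, but a rough analogue can be built from contextlib.redirect_stdout; the evalc function below is a hypothetical helper, not a standard API. It captures only what goes through sys.stdout, and unlike MATLAB's evalc it lets exceptions propagate.

```python
import contextlib
import io


def evalc(source):
    """Run `source` and return everything it printed to sys.stdout.

    Output written directly to file descriptor 1 (e.g. by C extensions)
    bypasses sys.stdout and is not captured.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


output = evalc("for i in range(3): print('step', i)")
print(output.splitlines())  # ['step 0', 'step 1', 'step 2']
```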

Python Circular import: solutions and best practice

I'm writing an application for scientific data analysis and I'm wondering what's the best way to structure the code to avoid (or address) the circular import problem. Currently I'm using a mix of OO and procedural programming.
Other questions address this issue but in a more abstract way. Here I'm looking for a solution that is optimal in a more specific context.
I have a class Container defined in DataLib.py whose data consists of lists and/or arrays. With all its methods and supporting functions, DataLib.py is quite large (~1000 lines).
I have a second module SelectionLib.py (~400 lines) that contains only functions to "filter" the data in Container according to different criteria. These functions return new Container objects (with filtered data) and thus SelectionLib.py needs to import Container from DataLib.py. Note that, logically, these functions are "methods" for "Container", they are just implemented using python functions.
Now, I want to add some high-level methods to Container so that a complex analysis can be performed with a single function or method call. By "complex analysis" I mean an arbitrary number of Container method calls, local functions (defined in DataLib.py) and filter functions (defined in SelectionLib.py).
So the problem is that DataLib.py needs to import SelectionLib.py to use the filter functions, but SelectionLib.py already imports DataLib.py.
Right now my hackish solution is to run the two files with run -i ... from IPython, so it is like having one big file and I avoid the circular import. But these scripts are difficult to integrate, for example, into a GUI.
How do you suggest to solve this problem:
use pure OO and inheritance and split the object in 3: CoreContainer -> SelectionContainer -> HighLevelContainer
Restructuring the code (everything in one file?)
Some sort of Import trickery (put imports at the end)
Any feedback is appreciated!
If functions in SelectionLib are, as you say, "methods" for Container, it seems reasonable that DataLib imports SelectionLib, not the other way around.
Then the user code would just import DataLib. This would require some refactoring. One possibility to minimize the disruption to the user code would be to rename your existing DataLib and SelectionLib to _DataLib and _SelectionLib, and have a new DataLib to import the necessary bits from either (or both).
As an aside, it's better to follow the PEP-8 conventions and name your modules in lowercase_with_underscores.
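A minimal sketch of that layout, using PEP 8 style lowercase names in place of _DataLib and _SelectionLib. The module bodies are hypothetical, and the sketch writes them to a temporary directory so it runs standalone. One assumption goes beyond the answer: the filter functions build their results with type(container) so that they never need to import Container, which keeps the imports one-directional.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical module bodies, written to a temporary directory so the sketch runs standalone.
FILES = {
    # Filter functions: they never import Container; results are built with type(container).
    "_selection_lib.py": """
        def select_positive(container):
            return type(container)([x for x in container.data if x > 0])
    """,
    # Container imports the filters, so the dependency points one way only.
    "_data_lib.py": """
        from _selection_lib import select_positive


        class Container:
            def __init__(self, data):
                self.data = list(data)

            def positive_total(self):
                # High-level method combining a filter with ordinary code.
                return sum(select_positive(self).data)
    """,
    # Public facade: user code imports only this module.
    "data_lib.py": """
        from _data_lib import Container
        from _selection_lib import select_positive
    """,
}

directory = tempfile.mkdtemp()
for filename, body in FILES.items():
    with open(os.path.join(directory, filename), "w") as f:
        f.write(textwrap.dedent(body))
sys.path.insert(0, directory)
importlib.invalidate_caches()

data_lib = importlib.import_module("data_lib")
print(data_lib.Container([3, -1, 4]).positive_total())  # 7
```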

Is it costly in Python to put classes in different files?

I am a Java programmer and I have always created a separate file for each class. I am attempting to learn Python and I want to learn it right.
Is it costly in Python to put classes in different files, i.e. one class per file? I read in a blog that it is costly because resolution of the . operator happens at runtime in Python (it happens at compile time in Java).
Note: I did read in other posts that we can put them in separate files, but they don't mention whether that is costlier in any way.
It is slightly more costly, but not to an extent you are likely to care. You can negate this extra cost by doing:
from module import Class
As then the class will be assigned to a variable in the local namespace, meaning it doesn't have to do the lookup through the module.
In reality, however, this is unlikely to be important. The cost of looking up something like this is going to be tiny, and you should focus on doing what makes your code the most readable. Split classes across modules and packages as is logical for your program, and as it keeps them clear.
If, for example, you are using something repeatedly in a loop which is a bottleneck for your program, you can assign it to a local variable for that loop, e.g.:
import module
...
some_important_thing = module.some_important_thing
# Bottleneck loop
for item in items:
    # module.some_important_thing()
    some_important_thing()
Note that this kind of optimisation is unlikely to be the important thing, and you should only ever optimise where you have proof you need to do so.
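If you do want to measure it, a rough comparison with timeit is sketched below; math.sqrt stands in for module.some_important_thing, and the timings vary by machine and Python version, so no numbers are shown here.

```python
import math
import timeit


def with_attribute_lookup(values):
    total = 0.0
    for v in values:
        total += math.sqrt(v)  # global lookup of `math`, then attribute lookup, every iteration
    return total


def with_local_alias(values):
    sqrt = math.sqrt  # looked up once; the loop then uses a fast local variable
    total = 0.0
    for v in values:
        total += sqrt(v)
    return total


values = list(range(10_000))
for fn in (with_attribute_lookup, with_local_alias):
    seconds = timeit.timeit(lambda: fn(values), number=100)
    print(f"{fn.__name__}: {seconds:.3f} s")
```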
