I'm writing an application for scientific data analysis and I'm wondering what's the best way to structure the code to avoid (or address) the circular import problem. Currently I'm using a mix of OO and procedural programming.
Other questions address this issue but in a more abstract way. Here I'm looking for a solution that is optimal in a more specific context.
I have a class Container defined in DataLib.py whose data consist of lists and/or arrays. With all its methods and supporting functions, DataLib.py is quite large (~1000 lines).
I have a second module SelectionLib.py (~400 lines) that contains only functions to "filter" the data in Container according to different criteria. These functions return new Container objects (with the filtered data), and thus SelectionLib.py needs to import Container from DataLib.py. Note that, logically, these functions are "methods" of Container; they are just implemented as Python functions.
Now I want to add some high-level methods to Container so that a complex analysis can be performed with a single function or method call. And by "complex analysis" I mean an arbitrary number of Container method calls, local functions (defined in DataLib.py), and filter functions (defined in SelectionLib.py).
So the problem is that DataLib.py needs to import SelectionLib.py to use the filter functions, but SelectionLib.py already imports DataLib.py.
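To make the structure concrete, here is a minimal sketch of the cycle (the filter and method names below are just illustrative, not my real code):

# DataLib.py (sketch)
from SelectionLib import filter_by_size   # needed by the new high-level methods

class Container:
    def full_analysis(self):
        selected = filter_by_size(self, threshold=10)
        ...

# SelectionLib.py (sketch)
from DataLib import Container   # needed to build the filtered result

def filter_by_size(container, threshold):
    ...
    return Container(...)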
Right now my hackish solution is to run the two files with run -i ... from IPython, so it is like having one big file and I avoid the circular import. But at the same time these scripts are difficult to integrate into, for example, a GUI.
How do you suggest solving this problem?
Use pure OO and inheritance and split the object in three: CoreContainer -> SelectionContainer -> HighLevelContainer
Restructure the code (everything in one file?)
Use some sort of import trickery (put the imports at the end)
Any feedback is appreciated!
If functions in SelectionLib are, as you say, "methods" for Container, it seems reasonable that DataLib imports SelectionLib, not the other way around.
Then the user code would just import DataLib. This would require some refactoring. One possibility to minimize the disruption to the user code would be to rename your existing DataLib and SelectionLib to _DataLib and _SelectionLib, and have a new DataLib to import the necessary bits from either (or both).
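For instance, a sketch of the facade option (where _DataLib and _SelectionLib are the renamed modules, and filter_by_size is a placeholder for one of your filter functions):

# DataLib.py -- thin facade; user code keeps doing `import DataLib`
from _DataLib import Container             # core class, no dependency on selections
from _SelectionLib import filter_by_size   # free functions taking and returning Containers

# Optionally attach the filters as methods, since logically they are Container methods:
Container.filter_by_size = filter_by_size

With this layout only _SelectionLib imports _DataLib, so there is no cycle, and the high-level "complex analysis" methods can live in (or be attached from) the facade module.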
As an aside, it's better to follow the PEP-8 conventions and name your modules in lowercase_with_underscores.
I have read lots of posts about using Python gettext, but none of them addressed the issue of changing languages at runtime.
Using gettext, strings are translated by the function _() which is added globally to builtins. The definition of _ is language-specific and will change during execution when the language setting changes. At certain points in the code, I need strings in an object to be translated to a certain language. This happens by:
(Re)define the _ function in builtins to translate to the chosen language
(Re)evaluate the desired object using the new _ function - guaranteeing that any calls to _ within the object definition are evaluated using the current definition of _.
Return the object
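For reference, a minimal sketch of step 1, assuming a 'myapp' gettext domain with compiled catalogs under locale/ (names are illustrative):

import builtins
import gettext

def set_language(lang):
    # Rebind the global _ so that any *future* call to _() uses the new catalog.
    translation = gettext.translation('myapp', localedir='locale',
                                      languages=[lang], fallback=True)
    builtins._ = translation.gettext

set_language('de')   # step 1; the hard part is step 2, re-evaluating existing objects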
I am wondering about different approaches to step 2. I thought of several but they all seem to have fundamental flaws.
What is the best way to achieve step 2 in practice?
Is it theoretically possible to achieve step 2 for any arbitrary object, without knowledge of its implementation?
Possible approaches
If all translated text is defined in functions that can be called in step 2, then it's straightforward: calling the function will evaluate using the current definition of _. But there are lots of situations where that's not the case, for instance, translated strings could be module-level variables evaluated at import time, or attributes evaluated when instantiating an object.
Minimal example of this problem with module-level variables is here.
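For illustration, a sketch of the kind of thing that goes wrong with module-level variables (reusing the hypothetical set_language sketch above, and assuming real catalogs exist):

# messages.py (hypothetical)
GREETING = _("Hello")        # evaluated once, at import time, with whatever _ is bound then

# main.py
set_language('en')
import messages              # GREETING is now frozen as the English translation
set_language('de')
print(messages.GREETING)     # still the English string: rebinding _ does not re-evaluate the module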
Re-evaluation
Manually reload modules
Module-level variables can be re-evaluated at the desired time using importlib.reload. This gets more complicated if the module imports another module that also has translated strings. You have to reload every module that's a (nested) dependency.
With knowledge of the module's implementation, you can manually reload the dependencies in the right order: if A imports B,
import importlib
import A, B

importlib.reload(B)   # reload the dependency first
importlib.reload(A)   # then the module that imports it
# use A...
Problems: Requires knowledge of the module's implementation. Only reloads module-level variables.
Automatically reload modules
Without knowledge of the module's implementation, you'd need to automate reloading dependencies in the right order. You could do this for every module in the package, or just the (recursive) dependencies. To handle more complex situations, you'd need to generate a dependency graph and reload modules in breadth-first order from the roots.
Problems: Requires complex reloading algorithm. There are likely edge cases where it's not possible (cyclic dependencies, unusual package structure, from X import Y-style imports). Only reloads module-level variables.
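A rough sketch of what such automated reloading might look like (it ignores cycles and from X import Y bindings, which is exactly where it breaks down):

import importlib
import sys
import types

def reload_package(prefix):
    # Reload every loaded module whose name starts with `prefix`,
    # dependencies first (post-order walk over module namespaces).
    package_modules = [m for name, m in sys.modules.items()
                       if m is not None and name.startswith(prefix)]
    done = set()

    def visit(module):
        if module.__name__ in done:
            return
        done.add(module.__name__)
        # Submodules referenced from this module's namespace count as dependencies.
        for value in vars(module).values():
            if isinstance(value, types.ModuleType) and value.__name__.startswith(prefix):
                visit(value)
        importlib.reload(module)

    for module in package_modules:
        visit(module)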
Re-evaluate only the desired object?
eval allows you to evaluate dynamically generated expressions. Instead could you re-evaluate an existing object's static expression, given a dynamic context (builtins._)? I guess this would involve recursively re-evaluating the object, and every object referenced in its definition, and every object referenced in their definitions...
I looked through the inspect module and didn't find any obvious solution.
Problems: Not sure if this is possible. Security issues with eval and similar.
Delayed evaluation
Lazy evaluation
The Flask-Babel project provides a LazyString that delays evaluation of a translated string. If it could be completely delayed until step 2, that seems like the cleanest solution.
Problems: A LazyString can still get evaluated before it's supposed to. Lots of things may call its __str__ method and trigger evaluation, such as string formatting and concatenation.
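Roughly, what such a lazy string does is defer the _() call until the string is rendered; a minimal sketch (not Flask-Babel's actual implementation):

import builtins

class LazyTranslatedString:
    def __init__(self, message):
        self.message = message           # store the untranslated message id
    def __str__(self):
        return builtins._(self.message)  # translated with whatever _ is bound *now*

GREETING = LazyTranslatedString("Hello")   # safe at import time
# print(GREETING) translates late, but any earlier str()/format()/concatenation
# also triggers __str__ and freezes the result too early.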
Deferred translation
The Python gettext docs demonstrate temporarily re-defining the _ function, and only calling the actual translation function when the translated string is needed.
Problems: Requires knowledge of the object's structure, and code customized to each object, to find the strings to translate. Doesn't allow concatenation or formatting of translated strings.
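Condensed, that pattern looks something like this (N_ is just a conventional no-op marker so the string extractor still finds the messages):

def N_(message):
    return message               # mark for extraction, but do not translate yet

animals = [N_('mollusk'), N_('albatross')]   # stored untranslated at import time

def print_animals():
    for a in animals:
        print(_(a))              # translated only here, with the current _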
Refactoring
All translated strings could be factored out into a separate module, or moved to functions such that they can be completely evaluated at a given time.
Problems: As I understand it the point of gettext and the global _ function is to minimize the impact of translation on existing code. Refactoring like this could require significant design changes and make the code more confusing.
The only plausible, general approach is to rewrite all relevant code to not only use _ to request translation but to never cache the result. That’s not a fun idea and it’s not a new idea—you already list Refactoring and Deferred translation that rely on the cooperation of the gettext clients—but it is the “best way […] in practice”.
You can try to do a super-reload by removing many things from sys.modules and then doing a real reimport. This approach avoids understanding the import relationships, but works only if the relevant modules are all written in Python and you can guarantee that the state of your program will retain no references to any objects (including types and modules) that used the old language. (I’ve done this, but only in a context where the overarching program was a sort of supervisor utterly uninterested in the features of the discarded modules.)
You can try to walk the whole object graph and replace the strings, but even aside from the intrinsic technical difficulty of such an algorithm (consider __slots__ in base classes and co_consts for just the mildest taste), it would involve untranslating them, which changes from hard to impossible when some sort of transformation has already been performed. That transformation might just be concatenating the translated strings, or it might be pre-substituting known values to format, or padding the string, or storing a hash of it: it’s certainly undecidable in general. (I’ve done this too for other data types, but only with data constructed by a file reader whose output used known, simple structures.)
Any approach based on partial reevaluation combines the problems of the methods above.
The only other possible approach is a super-LazyString that refuses to translate for longer by implementing operations like + to return objects that encode the transformations to eventually apply, but it’s impossible to know when to force those operations unless you control all mechanisms used to display or transmit strings. It’s also impossible to defer past, say, if len(_("…"))>80:.
As I learn more about Python I am starting to get into the realm of classes. I have been reading about how to properly call a class and how to import the module or package.module, but I was wondering if it is really necessary to do this.
My question is this: Is it required to move your class to a separate module for a functional reason, or is it solely for readability? I can perform all the same tasks using functions defined within my main module, so what is the need for a class, if any, outside of readability?
Modules are structuring tools that provide encapsulation. In other words, a module is a structure that combines your logic and data into one compartment, the module itself. When you code a module, you should be consistent. To make a module consistent you must define its purpose: does my module provide tools? What type of tools? String tools? Numerical tools?
For example, let's assume you're coding a program that processes numbers. Typically, you would use the builtin math module, and for some specialized purposes you might need to code some functions and classes that process your numbers according to your needs. If you read the documentation of the builtin math module, you'll find that math defines classes and functions that relate to math, but no classes or functions that process strings, for instance. This is cohesion: unifying the purpose of your module. Keep in mind that maximizing cohesion minimizes coupling. That is, when you keep your module unified, you make it less likely to depend on other modules.
Is it required to move your Class to a separate module for a functional reason or is it solely for readability?
If that specific class doesn't relate to your module, then you're probably better off moving it to another module. Of course, this is not valid all the time. If you're coding a relatively small program and don't really need to define a large number of tools, coding your class in your main module doesn't hurt at all. In larger applications where you need to write dozens of tools, on the other hand, it's better to split your program into modules with specific purposes: myStringTools, myMath, main, and so on. Structuring your program with modules and packages enhances maintenance.
If you need to delve deeper read about Modular programming, it'll help you grasp the idea even better.
You can do as you please. If the code for your classes is short, putting them all in your main script is fine. If they're longish, then splitting them out into separate files is a useful organizing technique (which has the added benefit that the code in them doesn't get recompiled into byte-code every time the script that uses them is run).
Putting them in modules also encourages their reuse since they're no longer mixed in with a lot of other unrelated stuff.
Lastly, they may be useful because modules are essentially singleton objects, meaning there's only one instance of them in your program, created the first time the module is imported. Later imports in other modules just reuse the existing instance. This can be a nice way to do initialization that only has to happen once.
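For example, a sketch of one-time initialization via a module (the names are made up):

# config.py (hypothetical)
print("loading configuration...")      # runs only on the first import
settings = {"verbose": True}

# elsewhere
import config                          # triggers the initialization
import config                          # no-op: the cached module in sys.modules is reused
print(config.settings["verbose"])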
I am a Java programmer and I have always created separate files for Classes, I am attempting to learn python and I want to learn it right.
Is it costly in Python to put classes in different files, meaning one file contains only one class? I read in a blog that it is costly because resolution of the . operator happens at runtime in Python (it happens at compile time for Java).
Note: I did read in other posts that we can put them in separate files, but they don't mention whether doing so is costlier in any way.
It is slightly more costly, but not to an extent you are likely to care. You can negate this extra cost by doing:
from module import Class
The class is then bound to a name in the importing namespace, meaning the lookup through the module doesn't have to be done each time.
In reality, however, this is unlikely to be important. The cost of looking up something like this is going to be tiny, and you should focus on doing what makes your code the most readable. Split classes across modules and packages as is logical for your program, and as it keeps them clear.
If, for example, you are using something repeatedly in a loop which is a bottleneck for your program, you can assign it to a local variable for that loop, e.g:
import module
...
some_important_thing = module.some_important_thing

# Bottleneck loop
for item in items:
    # module.some_important_thing()
    some_important_thing()
Note that this kind of optimisation is unlikely to be the important thing, and you should only ever optimise where you have proof you need to do so.
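If you do want that proof, timeit makes the comparison easy; here is a small sketch using math.sqrt as a stand-in:

import timeit

# attribute lookup through the module on every call
print(timeit.timeit('math.sqrt(2.0)', setup='import math'))
# name bound directly, no module lookup
print(timeit.timeit('sqrt(2.0)', setup='from math import sqrt'))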
In Java, this question is easy (if a little tedious) - every class requires its own file. So the number of .java files in a project is the number of classes (not counting anonymous/nested classes).
In Python, though, I can define multiple classes in the same file, and I'm not quite sure how to find the point at which I split things up. It seems wrong to make a file for every class, but it also feels wrong just to leave everything in the same file by default. How do I know where to break a program up?
Remember that in Python, a file is a module that you will most likely import in order to use the classes contained therein. Also remember one of the basic principles of software development "the unit of packaging is the unit of reuse", which basically means:
If classes are most likely used together, or if using one class leads to using another, they belong in a common package.
As I see it, this is really a question about reuse and abstraction. If you have a problem that you can solve in a very general way, so that the resulting code would be useful in many other programs, put it in its own module.
For example: a while ago I wrote a (bad) mpd client. I wanted to make configuration file and option parsing easy, so I created a class that combined ConfigParser and optparse functionality in a way I thought was sensible. It needed a couple of support classes, so I put them all together in a module. I never use the client, but I've reused the configuration module in other projects.
EDIT: Also, a more cynical answer just occurred to me: if you can only solve a problem in a really ugly way, hide the ugliness in a module. :)
In Java ... every class requires its own file.
On the flip side, sometimes a Java file will also include enums, subclasses, or interfaces within the main class because they are "closely related."
not counting anonymous/nested classes
Anonymous classes shouldn't be counted, but I think tasteful use of nested classes is a choice much like the one you're asking about Python.
(Occasionally a Java file will have two classes, not nested, which is allowed, but yuck don't do it.)
Python actually gives you the choice to package your code in the way you see fit.
The analogy between Python and Java is that a file, i.e. a .py file, in Python is equivalent to a package in Java, in that it can contain many related classes and functions.
For good examples, have a look at the Python built-in modules. Just download the source and check them out. The rule of thumb I follow is: when you have very tightly coupled classes or functions, keep them in a single file; otherwise, break them up.
My first "serious" language was Java, so I have comprehended object-oriented programming in sense that elemental brick of program is a class.
Now I write on VBA and Python. There are module languages and I am feeling persistent discomfort: I don't know how should I decompose program in a modules/classes.
I understand that one module corresponds to one knowledge domain, one module should ba able to test separately...
Should I apprehend module as namespace(c++) only?
I don't do VBA, but in Python, modules are fundamental. As you say, they can be viewed as namespaces, but they are also objects in their own right. They are not classes, however, so you cannot inherit from them (at least not directly).
I find that it's a good rule to keep a module concerned with one domain area. The rule I use for deciding whether something is a module-level function or a class method is to ask myself whether it could meaningfully be used on any objects that satisfy the 'interface' its arguments take. If so, I free it from a class hierarchy and make it a module-level function. If its usefulness truly is restricted to a particular class hierarchy, then I make it a method.
If you need it to work on all instances of a class hierarchy and you make it a module-level function, just remember that all the subclasses still need to implement the given interface with the given semantics. This is one of the tradeoffs of stepping away from methods: you can no longer make a slight modification and call super. On the other hand, if subclasses are likely to redefine the interface and its semantics, then maybe that particular class hierarchy isn't a very good abstraction and should be rethought.
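As an illustration of that rule (with made-up names): anything that relies only on a small 'interface' of its arguments can be a module-level function, while behavior tied to one class's own data stays a method:

def total_area(shapes):
    # Works on *any* iterable of objects exposing .area(); no class hierarchy required,
    # so it lives at module level.
    return sum(shape.area() for shape in shapes)

class Circle:
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        # Tied to this class's own data, so it stays a method.
        return 3.14159 * self.radius ** 2

print(total_area([Circle(1.0), Circle(2.0)]))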
It is a matter of taste. If you use modules, your program will be more procedurally oriented; if you choose classes, it will be more or less object-oriented. I have been working with Excel for a couple of months, and personally I choose classes whenever I can because it is more comfortable for me. If you stop thinking about objects and think of them as components, you can use them with elegance. The main reason I prefer classes is that you can have more than one of them: you can't have two instances of a module. Classes also allow me to use encapsulation and get better code reuse.
For example, let's assume that you would like to have some kind of logger, to log actions done by your program during execution. You could write a module for that. It could have, for example, a global variable indicating on which particular sheet the logging will be done. But consider the following hypothetical situation: your client wants you to include some fancy report-generation functionality in your program. You are smart, so you figure out that you can reuse your logging code to prepare the reports. But you can't log and report simultaneously with one module, whereas you can with two instances of a logging component, without any changes to their code.
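In Python terms, the same argument looks like this (a sketch with made-up names): a Logger class gives you two independent instances, which a single module cannot:

class Logger:
    def __init__(self, sheet_name):
        self.sheet_name = sheet_name      # each instance has its own target sheet
        self.rows = []
    def log(self, message):
        self.rows.append(message)         # stand-in for writing to the sheet

activity_log = Logger("Log")
report = Logger("Report")                 # second, independent instance;
activity_log.log("started analysis")      # a module could only ever give you one
report.log("Totals: ...")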
Languages have different idioms, and that is why a problem solved in different languages takes different approaches.
"C" is all about procedural decomposition.
The main idiom in Java is "class or object" decomposition. Functions are not absent, but they become part of the behavior exhibited by these classes.
Python supports both class-based and procedural decomposition.
All of these use files, packages, or modules as the concept for organizing large pieces of code together. There is nothing that restricts you to one module per knowledge domain.
These are decomposition and organizing techniques and can be applied based on the problem at hand.
If you are comfortable with OO, you should be able to use it very well in Python.
VBA also allows the use of classes. Unfortunately, those classes don't support all the features of a full-fledged object-oriented language. In particular, inheritance is not supported.
But you can work with interfaces, at least up to a certain degree.
I have only used modules as "one module = one singleton": my modules contain "static" or even stateless methods. So in my opinion a VBA module is not a namespace; more often, a bunch of classes and modules together form a "namespace". I often create a new project (DLL, DVB, or something similar) for such a "namespace".