I am a Java programmer and I have always created separate files for classes. I am attempting to learn Python and I want to learn it right.
Is it costly in Python to put classes in different files, meaning one file contains only one class? I read in a blog that it is costly because resolution of the . operator happens at runtime in Python (it happens at compile time in Java).
Note: I did read in other posts that we can put them in separate files, but they don't mention whether doing so is costlier in any way.
It is slightly more costly, but not to an extent you are likely to care about. You can avoid this extra cost by doing:
from module import Class
As then the class will be assigned to a variable in the local namespace, meaning it doesn't have to do the lookup through the module.
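For illustration (the module and class names here are made up), the only difference is where the lookup happens:

# attribute lookup through the module object on every use
import shapes
c = shapes.Circle(radius=2)

# the class is bound directly to a name in the importing namespace
from shapes import Circle
c = Circle(radius=2)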
In reality, however, this is unlikely to be important. The cost of looking up something like this is going to be tiny, and you should focus on doing what makes your code the most readable. Split classes across modules and packages as is logical for your program, and as it keeps them clear.
If, for example, you are using something repeatedly in a loop which is a bottleneck for your program, you can assign it to a local variable before that loop, e.g.:
import module
...
some_important_thing = module.some_important_thing

# Bottleneck loop
for item in items:
    # instead of module.some_important_thing()
    some_important_thing()
Note that this kind of optimisation is unlikely to be the important thing, and you should only ever optimise where you have proof you need to do so.
I have read lots of posts about using Python gettext, but none of them addressed the issue of changing languages at runtime.
Using gettext, strings are translated by the function _() which is added globally to builtins. The definition of _ is language-specific and will change during execution when the language setting changes. At certain points in the code, I need strings in an object to be translated to a certain language. This happens by:
1. (Re)define the _ function in builtins to translate to the chosen language (see the sketch after this list).
2. (Re)evaluate the desired object using the new _ function, guaranteeing that any calls to _ within the object definition are evaluated using the current definition of _.
3. Return the object.
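A minimal sketch of step 1, assuming compiled .mo catalogs under a local locale directory and a made-up domain name myapp:

import gettext

def set_language(lang):
    # Installs _ into builtins for the chosen language; every _() call made
    # after this uses the new catalog, until set_language() runs again.
    translation = gettext.translation('myapp', localedir='locale',
                                      languages=[lang], fallback=True)
    translation.install()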
I am wondering about different approaches to step 2. I thought of several but they all seem to have fundamental flaws.
What is the best way to achieve step 2 in practice?
Is it theoretically possible to achieve step 2 for any arbitrary object, without knowledge of its implementation?
Possible approaches
If all translated text is defined in functions that can be called in step 2, then it's straightforward: calling the function will evaluate using the current definition of _. But there are lots of situations where that's not the case, for instance, translated strings could be module-level variables evaluated at import time, or attributes evaluated when instantiating an object.
Minimal example of this problem with module-level variables is here.
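For concreteness, a hypothetical module with a module-level translated string looks like this; GREETING is evaluated once, at import time, with whatever _ is installed at that moment:

# messages.py (hypothetical)
# GREETING is frozen at import time; switching the language later
# does not change it unless the module is re-evaluated.
GREETING = _("Hello, world!")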
Re-evaluation
Manually reload modules
Module-level variables can be re-evaluated at the desired time using importlib.reload. This gets more complicated if the module imports another module that also has translated strings. You have to reload every module that's a (nested) dependency.
With knowledge of the module's implementation, you can manually reload the dependencies in the right order: if A imports B,
import importlib
import A, B   # A imports B internally

importlib.reload(B)
importlib.reload(A)
# use A...
Problems: Requires knowledge of the module's implementation. Only reloads module-level variables.
Automatically reload modules
Without knowledge of the module's implementation, you'd need to automate reloading dependencies in the right order. You could do this for every module in the package, or just the (recursive) dependencies. To handle more complex situations, you'd need to generate a dependency graph and reload modules in breadth-first order from the roots.
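As a rough sketch of the simplest variant, here is a naive reloader that ignores ordering, cycles and from-imports entirely (the package name is whatever you pass in):

import importlib
import sys

def reload_package(package_name):
    # Naive: reloads every already-imported module under package_name in
    # arbitrary order. A robust version would build the dependency graph
    # and reload modules starting from the leaves.
    for name, module in list(sys.modules.items()):
        if module is None:
            continue
        if name == package_name or name.startswith(package_name + '.'):
            importlib.reload(module)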
Problems: Requires complex reloading algorithm. There are likely edge cases where it's not possible (cyclic dependencies, unusual package structure, from X import Y-style imports). Only reloads module-level variables.
Re-evaluate only the desired object?
eval allows you to evaluate dynamically generated expressions. Instead could you re-evaluate an existing object's static expression, given a dynamic context (builtins._)? I guess this would involve recursively re-evaluating the object, and every object referenced in its definition, and every object referenced in their definitions...
I looked through the inspect module and didn't find any obvious solution.
Problems: Not sure if this is possible. Security issues with eval and similar.
Delayed evaluation
Lazy evaluation
The Flask-Babel project provides a LazyString that delays evaluation of a translated string. If it could be completely delayed until step 2, that seems like the cleanest solution.
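A stripped-down version of the idea (not Flask-Babel's actual implementation; it assumes _ has already been installed in builtins by gettext):

class LazyTranslation:
    # Stores the message id and only calls the current _ when the object
    # is rendered as a string.
    def __init__(self, message):
        self.message = message

    def __str__(self):
        return _(self.message)

WELCOME = LazyTranslation("Welcome")   # no translation happens here
print(WELCOME)                         # translated with whatever _ is installed now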
Problems: A LazyString can still get evaluated before it's supposed to. Lots of things may call its __str__ method and trigger evaluation, such as string formatting and concatenation.
Deferred translation
The Python gettext docs demonstrate temporarily re-defining the _ function, and only calling the actual translation function when the translated string is needed.
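A condensed version of that pattern, assuming the real translation function has been installed into builtins (e.g. via gettext.install()) by the time the strings are displayed:

def _(message):
    return message   # temporary no-op _: just marks the strings

animals = [_('mollusk'), _('albatross'), _('rat')]

del _   # stop shadowing the real builtins._

for a in animals:
    print(_(a))   # now translated by the installed _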
Problems: Requires knowledge of the object's structure, and code customized to each object, to find the strings to translate. Doesn't allow concatenation or formatting of translated strings.
Refactoring
All translated strings could be factored out into a separate module, or moved to functions such that they can be completely evaluated at a given time.
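For example (names made up), strings moved into a function are re-evaluated on every call and therefore always use the currently installed _:

def get_messages():
    # evaluated at call time, not import time
    return {
        'greeting': _("Hello"),
        'farewell': _("Goodbye"),
    }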
Problems: As I understand it, the point of gettext and the global _ function is to minimize the impact of translation on existing code. Refactoring like this could require significant design changes and make the code more confusing.
The only plausible, general approach is to rewrite all relevant code to not only use _ to request translation but to never cache the result. That’s not a fun idea and it’s not a new idea—you already list Refactoring and Deferred translation that rely on the cooperation of the gettext clients—but it is the “best way […] in practice”.
You can try to do a super-reload by removing many things from sys.modules and then doing a real reimport. This approach avoids understanding the import relationships, but works only if the relevant modules are all written in Python and you can guarantee that the state of your program will retain no references to any objects (including types and modules) that used the old language. (I’ve done this, but only in a context where the overarching program was a sort of supervisor utterly uninterested in the features of the discarded modules.)
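A sketch of that kind of super-reload, assuming everything to be discarded lives under a single package name:

import sys

def purge_package(prefix):
    # Drop the cached modules so the next import re-executes them from
    # scratch. Only safe if nothing else still references the old modules
    # or any objects created from them.
    for name in list(sys.modules):
        if name == prefix or name.startswith(prefix + '.'):
            del sys.modules[name]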
You can try to walk the whole object graph and replace the strings, but even aside from the intrinsic technical difficulty of such an algorithm (consider __slots__ in base classes and co_consts for just the mildest taste), it would involve untranslating them, which changes from hard to impossible when some sort of transformation has already been performed. That transformation might just be concatenating the translated strings, or it might be pre-substituting known values to format, or padding the string, or storing a hash of it: it’s certainly undecidable in general. (I’ve done this too for other data types, but only with data constructed by a file reader whose output used known, simple structures.)
Any approach based on partial reevaluation combines the problems of the methods above.
The only other possible approach is a super-LazyString that refuses to translate for longer by implementing operations like + to return objects that encode the transformations to eventually apply, but it’s impossible to know when to force those operations unless you control all mechanisms used to display or transmit strings. It’s also impossible to defer past, say, if len(_("…"))>80:.
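A sketch of what such an object might look like: it records a concatenation instead of performing it, and only translates when something finally forces a real str (which is exactly the part you cannot always control):

class DeferredText:
    def __init__(self, message):
        self.message = message

    def __add__(self, other):
        return DeferredConcat(self, other)

    def __str__(self):
        return _(self.message)   # uses whichever _ is installed now

class DeferredConcat:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def __add__(self, other):
        return DeferredConcat(self, other)

    def __str__(self):
        return str(self.left) + str(self.right)

title = DeferredText("Report") + ": " + DeferredText("Summary")   # still untranslated
print(title)   # forced here, with the current language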
As I learn more about Python, I am starting to get into the realm of classes. I have been reading about how to properly call a class and how to import the module or package.module, but I was wondering if it is really needed to do this.
My question is this: is it required to move your class to a separate module for a functional reason, or is it solely for readability? I can perform all the same tasks using functions defined within my main module, so what is the need for the class, if any, outside of readability?
Modules are structuring tools that provide encapsulation. In other words, modules are structures that combine your logic and data into one compartment, the module itself. When you code a module, you should be consistent. To make a module consistent you must define its purpose: does my module provide tools? What type of tools? String tools? Numerical tools?
For example, let's assume you're coding a program that processes numbers. Typically, you would use the builtin math module, and for some specialized purposes you might need to code some functions and classes that process your numbers according to your needs. If you read the documentation of the builtin math module, you'll find that math defines classes and functions that relate to math, but no classes or functions that process strings, for instance. This is cohesion: unifying the purpose of your module. Keep in mind that maximizing cohesion minimizes coupling. That is, when you keep your module unified, you make it less likely to depend on other modules.
Is it required to move your Class to a separate module for a functional reason or is it solely for readability?
If that specific class doesn't relate to your module, then you're probably better off moving that class to another module. That said, this is not a valid statement all the time. If you're coding a relatively small program and you don't really need to define a large number of tools, coding your class in your main module doesn't hurt at all. In larger applications where you need to write dozens of tools, on the other hand, it's better to split your program into modules with specific purposes: myStringTools, myMath, main and many other modules. Structuring your program with modules and packages enhances maintenance.
If you need to delve deeper, read about modular programming; it'll help you grasp the idea even better.
You can do as you please. If the code for your classes is short, putting them all in your main script is fine. If they're longish, then splitting them out into separate files is a useful organizing technique (which has the added benefit that the code in them doesn't get recompiled into byte-code every time the script they are used in is run).
Putting them in modules also encourages their reuse since they're no longer mixed in with a lot of other unrelated stuff.
Lastly, they may be useful because modules are essentially singleton objects, meaning that there's only one instance of them in your program, which is created the first time the module is imported. Later imports in other modules will just reuse the existing instance. This can be a nice way to do initialization that only has to be done once.
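For example (the file and variable names are made up), one-time setup can simply live at module level:

# config.py: this code runs only on the first import; every later
# "import config" anywhere in the program reuses the same module object,
# and therefore the same SETTINGS dict.
import json

with open("settings.json") as f:
    SETTINGS = json.load(f)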
What are the pros and cons of importing a Python module and/or function inside of a function, with respect to efficiency of speed and of memory?
Does it re-import every time the function is run, or perhaps just once at the beginning whether or not the function is run?
Does it re-import every time the function is run?
No; or rather, Python modules are cached after the first time they are imported, so importing a second (or third, or fourth...) time doesn't actually force them to go through the whole import process again.
Does it import once at the beginning whether or not the function is run?
No, it is only imported if and when the function is executed.
As for the benefits: it depends, I guess. If you may only run a function very rarely and don't need the module imported anywhere else, it may be beneficial to only import it in that function. Or if there is a name clash or other reason you don't want the module or symbols from the module available everywhere, you may only want to import it in a specific function. (Of course, there's always from my_module import my_function as f for those cases.)
In general practice, it's probably not that beneficial. In fact, most Python style guides encourage programmers to place all imports at the beginning of the module file.
The very first time you import goo from anywhere (inside or outside a function), goo.py (or another importable form) is loaded and sys.modules['goo'] is set to the module object thus built. Any future import within the same run of the program (again, whether inside or outside a function) just looks up sys.modules['goo'] and binds it to the bare name goo in the appropriate scope. The dict lookup and name binding are very fast operations.
Assuming the very first import gets totally amortized over the program's run anyway, having the "appropriate scope" be module-level means each use of goo.this, goo.that, etc, is two dict lookups -- one for goo and one for the attribute name. Having it be "function level" pays one extra local-variable setting per run of the function (even faster than the dictionary lookup part!) but saves one dict lookup (exchanging it for a local-variable lookup, blazingly fast) for each goo.this (etc) access, basically halving the time such lookups take.
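In code (goo and goo.this are placeholders), the two variants look like this:

import goo

def hot_loop(items):
    for item in items:
        goo.this(item)      # two dict lookups per iteration: 'goo', then 'this'

def hot_loop_local(items):
    import goo              # already cached; this just rebinds the name
    this = goo.this         # pay the attribute lookup once
    for item in items:
        this(item)          # a single, very fast local-variable lookup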
We're talking about a few nanoseconds one way or another, so it's hardly a worthwhile optimization. The one potentially substantial advantage of having the import within a function is when that function may well not be needed at all in a given run of the program, e.g., that function deals with errors, anomalies, and rare situations in general; if that's the case, any run that does not need the functionality will not even perform the import (and that's a saving of microseconds, not just nanoseconds), only runs that do need the functionality will pay the (modest but measurable) price.
It's still an optimization that's only worthwhile in pretty extreme situations, and there are many others I would consider before trying to squeeze out microseconds in this way.
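A typical case, with made-up function and module choices:

def report_failure(exc):
    # Only runs, and therefore only imports, on the error path; normal runs
    # never touch it. traceback itself is cheap to load, but the same
    # pattern applies to genuinely heavy modules.
    import traceback
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))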
It imports once, the first time the function executes.
Pros:
imports related to the function they're used in
easy to move functions around the package
Cons:
you can't see at a glance which modules this module might depend on
Might I suggest in general that instead of asking, "Will X improve my performance?" you use profiling to determine where your program is actually spending its time and then apply optimizations according to where you'll get the most benefit?
And then you can use profiling to assure that your optimizations have actually benefited you, too.
Importing inside a function will effectively import the module once: the first time the function is run.
It ought to import just as fast whether you import it at the top or when the function is run. This isn't generally a good reason to import in a def. Pros? It won't be imported if the function isn't called. This is actually a reasonable reason if your module only requires the user to have a certain module installed when they use specific functions of yours...
If that's not the reason you're doing this, it's almost certainly a yucky idea.
It imports once when the function is called for the first time.
I could imagine doing it this way if I had a function in an imported module that is used very seldom and is the only one requiring the import. Looks rather far-fetched, though...
I'm writing an application for scientific data analysis and I'm wondering what's the best way to structure the code to avoid (or address) the circular import problem. Currently I'm using a mix of OO and procedural programming.
Other questions address this issue but in a more abstract way. Here I'm looking for a solution that is optimal in a more specific context.
I have a class Container defined in DataLib.py whose data consist of lists and/or arrays. With all its methods and supporting functions, DataLib.py is quite large (~1000 lines).
I have a second module SelectionLib.py (~400 lines) that contains only functions to "filter" the data in Container according to different criteria. These functions return new Container objects (with filtered data) and thus SelectionLib.py needs to import Container from DataLib.py. Note that, logically, these functions are "methods" for "Container", they are just implemented using python functions.
Now, I want to add some high-level methods to Container so that a complex analysis can be performed with a single function or method call. And by "complex analysis" I mean an arbitrary number of Container method calls, local functions (defined in DataLib.py) and filter functions (defined in SelectionLib.py).
So the problem is that DataLib.py needs to import SelectionLib.py to use the filter functions, but SelectionLib.py already imports DataLib.py.
Right now my hackish solution is to run the two files with run -i ... from IPython, so it is like having one big single file and I avoid the circular import. But at the same time these scripts are difficult to integrate, for example, into a GUI.
How do you suggest solving this problem:
Use pure OO and inheritance and split the object in 3: CoreContainer -> SelectionContainer -> HighLevelContainer
Restructuring the code (everything in one file?)
Some sort of Import trickery (put imports at the end)
Any feedback is appreciated!
If functions in SelectionLib are, as you say, "methods" for Container, it seems reasonable that DataLib imports SelectionLib, not the other way around.
Then the user code would just import DataLib. This would require some refactoring. One possibility to minimize the disruption to the user code would be to rename your existing DataLib and SelectionLib to _DataLib and _SelectionLib, and have a new DataLib import the necessary bits from either (or both).
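A sketch of that layout (the filter name is made up):

# _DataLib.py       defines Container and imports nothing from the selection code
# _SelectionLib.py  does "from _DataLib import Container" and defines the filters
# DataLib.py        is a thin facade, so user code keeps a single import:

from _DataLib import Container
from _SelectionLib import select_above_threshold

def run_analysis(container, threshold):
    # high-level helper that can freely combine both layers
    return select_above_threshold(container, threshold)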
As an aside, it's better to follow the PEP-8 conventions and name your modules in lowercase_with_underscores.
In Java, this question is easy (if a little tedious) - every class requires its own file. So the number of .java files in a project is the number of classes (not counting anonymous/nested classes).
In Python, though, I can define multiple classes in the same file, and I'm not quite sure how to find the point at which I split things up. It seems wrong to make a file for every class, but it also feels wrong just to leave everything in the same file by default. How do I know where to break a program up?
Remember that in Python, a file is a module that you will most likely import in order to use the classes contained therein. Also remember one of the basic principles of software development "the unit of packaging is the unit of reuse", which basically means:
If classes are most likely used together, or if using one class leads to using another, they belong in a common package.
As I see it, this is really a question about reuse and abstraction. If you have a problem that you can solve in a very general way, so that the resulting code would be useful in many other programs, put it in its own module.
For example: a while ago I wrote a (bad) mpd client. I wanted to make configuration file and option parsing easy, so I created a class that combined ConfigParser and optparse functionality in a way I thought was sensible. It needed a couple of support classes, so I put them all together in a module. I never use the client, but I've reused the configuration module in other projects.
EDIT: Also, a more cynical answer just occurred to me: if you can only solve a problem in a really ugly way, hide the ugliness in a module. :)
In Java ... every class requires its own file.
On the flip side, sometimes a Java file will also include enums or subclasses or interfaces within the main class because they are "closely related."
not counting anonymous/nested classes
Anonymous classes shouldn't be counted, but I think tasteful use of nested classes is a choice much like the one you're asking about Python.
(Occasionally a Java file will have two classes, not nested, which is allowed, but yuck don't do it.)
Python actually gives you the choice to package your code in the way you see fit.
The analogy between Python and Java is that a file, i.e. a .py file, in Python is equivalent to a package in Java, in that it can contain many related classes and functions.
For good examples, have a look in the Python built-in modules.
Just download the source and check them out. The rule of thumb I follow is: when you have very tightly coupled classes or functions, keep them in a single file; otherwise, break them up.
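For instance, a handful of tightly coupled classes sit comfortably in one file:

# shapes.py (illustrative): these classes only make sense together,
# so a single module is the natural unit.
import math

class Shape:
    def area(self):
        raise NotImplementedError

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2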