writing python libraries: structure, naming and import best practices

writing python libraries: structure, naming and import best practices - python

Let's say I have to write several small to medium libraries for my company.
Does it make sense, in a pythonic way, to use the Java approach and prefix all of them with a common high-level package (ahem, module) such as achieving the following structure:
mycompany.mylibrary1.moduleA
mycompany.mylibrary1.moduleB.moduleD
mycompany.mylibrary2.moduleC
or is it better to simply go for:
mylibrary1.moduleA
mylibrary1.moduleB.moduleD
mylibrary2.moduleC
I see that most of the time the 2nd approach is used, but I was looking for a confirmation (or not) that's the way to go.
I could not find anything in this respect in PEP 008, beside:
The naming conventions of Python's library are a bit of a mess, so we'll never get this completely consistent [...]
and then we get only modules and class naming indications, as well as the fact that relative imports are discouraged.
The fact that absolute imports are the way to go make the decision of how you organize your libraries really important (and I'm not here to discuss whether avoiding relative imports is good or bad).
I do like the Java approach that namespaces all of your libraries, but I have the impression it's not pythonic... what's the suggested way to go?
PS.
While in general 'best practice' questions are considered as subjective on SO, in Python the existance of the PEP makes them in my opinion very objective! Though the answer might be... there's not best practice for organizing your libraries..

There is a cognitive issue with python imports and names. Since there are so many different kinds of first-order objects (primitives, modules, classes, functions, classes pretending to be modules, modules pretending to be objects, et cetera) people tend to use dot-notation to indicate that an entity has "arrived" from an external source. Thus
import foo.bar
e = foo.bar.make_entity()
is felt to be clearer than
from foo.bar import make_entity
e = make_entity()
However, this means that you will frequently have to type the full path of whatever you need to access. "com.example.some.lib.module.frobnicate() gets tiresome quickly. Thus, a polite library supplies a reasonably short name for its access.

Related

Should I treat my own single underscored attributes as private?

If I use a single leading underscore for an attribute in a class, would it be wrong of me to access it from a different object? Is a single underscore saying "I will use this as I please but you the user shouldn't touch it" or should even the developer treat it as though it were private?

The single underscore indicates that it's not for public consumption; code within the same package is welcome to poke and prod it.

I think you should refrain from doing so as much as reasonably possible because it breaks encapsulation which one of the benefits of the object-oriented paradigm. While some feel it's fine as certain higher levels of scoping, say module or package, I've found avoiding doing so a useful rule-of-thumb even within the various methods of a single class -- at least if it's fairly complicated, subtle, or just an aspect I think I might want to change later.
The reason is every time you do something with one, you're creating a dependency between one object's internals and where you're coding, be it another part of the same object or module or whatever. The more of this there is, the harder it will be later to maintain or enhance things.
Another aspect to consider is the fact that whenever you feel a need to do so, it may be indicative of a need for a better design -- in which cause you can treat it a warning sign and react accordingly.
Creating an interface for doing operations that deal with internal details often means having to design and implement more code than not doing so would. So in each case, the benefits must be weighed against the costs to determine if it's worth it. Eventually experience (and education if it's ongoing) makes such decisions easier maybe even instinctual.

When should a Python script be split into multiple files/modules?

In Java, this question is easy (if a little tedious) - every class requires its own file. So the number of .java files in a project is the number of classes (not counting anonymous/nested classes).
In Python, though, I can define multiple classes in the same file, and I'm not quite sure how to find the point at which I split things up. It seems wrong to make a file for every class, but it also feels wrong just to leave everything in the same file by default. How do I know where to break a program up?

Remember that in Python, a file is a module that you will most likely import in order to use the classes contained therein. Also remember one of the basic principles of software development "the unit of packaging is the unit of reuse", which basically means:
If classes are most likely used together, or if using one class leads to using another, they belong in a common package.

As I see it, this is really a question about reuse and abstraction. If you have a problem that you can solve in a very general way, so that the resulting code would be useful in many other programs, put it in its own module.
For example: a while ago I wrote a (bad) mpd client. I wanted to make configuration file and option parsing easy, so I created a class that combined ConfigParser and optparse functionality in a way I thought was sensible. It needed a couple of support classes, so I put them all together in a module. I never use the client, but I've reused the configuration module in other projects.
EDIT: Also, a more cynical answer just occurred to me: if you can only solve a problem in a really ugly way, hide the ugliness in a module. :)

In Java ... every class requires its own file.
On the flipside, sometimes a Java file, also, will include enums or subclasses or interfaces, within the main class because they are "closely related."
not counting anonymous/nested classes
Anonymous classes shouldn't be counted, but I think tasteful use of nested classes is a choice much like the one you're asking about Python.
(Occasionally a Java file will have two classes, not nested, which is allowed, but yuck don't do it.)

Python actually gives you the choice to package your code in the way you see fit.
The analogy between Python and Java is that a file i.e., the .py file in Python is
equivalent to a package in Java as in it can contain many related classes and functions.
For good examples, have a look in the Python built-in modules.
Just download the source and check them out, the rule of thumb I follow is
when you have very tightly coupled classes or functions you keep them in a single file
else you break them up.

Module vs object-oriented programming in vba

My first "serious" language was Java, so I have comprehended object-oriented programming in sense that elemental brick of program is a class.
Now I write on VBA and Python. There are module languages and I am feeling persistent discomfort: I don't know how should I decompose program in a modules/classes.
I understand that one module corresponds to one knowledge domain, one module should ba able to test separately...
Should I apprehend module as namespace(c++) only?

I don't do VBA but in python, modules are fundamental. As you say, the can be viewed as namespaces but they are also objects in their own right. They are not classes however, so you cannot inherit from them (at least not directly).
I find that it's a good rule to keep a module concerned with one domain area. The rule that I use for deciding if something is a module level function or a class method is to ask myself if it could meaningfully be used on any objects that satisfy the 'interface' that it's arguments take. If so, then I free it from a class hierarchy and make it a module level function. If its usefulness truly is restricted to a particular class hierarchy, then I make it a method.
If you need it work on all instances of a class hierarchy and you make it a module level function, just remember that all the the subclasses still need to implement the given interface with the given semantics. This is one of the tradeoffs of stepping away from methods: you can no longer make a slight modification and call super. On the other hand, if subclasses are likely to redefine the interface and its semantics, then maybe that particular class hierarchy isn't a very good abstraction and should be rethought.

It is matter of taste. If you use modules your 'program' will be more procedural oriented. If you choose classes it will be more or less object oriented. I'm working with Excel for couple of months and personally I choose classes whenever I can because it is more comfortable to me. If you stop thinking about objects and think of them as Components you can use them with elegance. The main reason why I prefer classes is that you can have it more that one. You can't have two instances of module. It allows me use encapsulation and better code reuse.
For example let's assume that you like to have some kind of logger, to log actions that were done by your program during execution. You can write a module for that. It can have for example a global variable indicating on which particular sheet logging will be done. But consider the following hypothetical situation: your client wants you to include some fancy report generation functionality in your program. You are smart so you figure out that you can use your logging code to prepare them. But you can't do log and report simultaneously by one module. And you can with two instances of logging Component without any changes in their code.

Idioms of languages are different and thats the reason a problem solved in different languages take different approaches.
"C" is all about procedural decomposition.
Main idiom in Java is about "class or Object" decomposition. Functions are not absent, but they become a part of exhibited behavior of these classes.
"Python" provides support for both Class based problem decomposition as well as procedural based.
All of these uses files, packages or modules as concept for organizing large code pieces together. There is nothing that restricts you to have one module for one knowledge domain.
These are decomposition and organizing techniques and can be applied based on the problem at hand.
If you are comfortable with OO, you should be able to use it very well in Python.

VBA also allows the use of classes. Unfortunately, those classes don't support all the features of a full-fleged object oriented language. Especially inheritance is not supported.
But you can work with interfaces, at least up to a certain degree.
I only used modules like "one module = one singleton". My modules contain "static" or even stateless methods. So in my opinion a VBa module is not namespace. More often a bunch of classes and modules would form a "namespace". I often create a new project (DLL, DVB or something similar) for such a "namespace".

Is monkeypatching stdlib methods a good practice in Python? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
Over time I found the need to override several stdlib methods from Python in order to overcome limitation or to add some missing functionality.
In all cases I added a wrapper function and replaced the original method from the module with my wrapper (the wrapper was calling the original method).
Why I did this? Just to be sure that all the calls to the method are using my new versions, even if these are called from other third-party modules.
I know that monkeypatching can be a bad thing but my question is if this is useful if you use it with care? Meaning that:
you still call the original methods, assuring that you are not missing anything when the original module is updated
you are not changing the original "meaning" of the methods
Examples:
add coloring support to python logging module.
make open() be able to recognize Unicode BOM masks when using text mode
adding logging support to os.system() or subprocess.Popen() - letting you output to console or/and redirect to another file.
implementing methods that are missing on your platform like os.chown() or os.lchown() that are missing on Windows.
Doing things like these appear to me as decent overrides but I would like to see how others are seeing them and specially what should be considered as an acceptable monkeypatch and what not.

None of these things seem to require monkeypatching. All of them seem to have better, more robust and reliable solutions.
Adding a logging handler is easy. No monkeypatch.
Fixing open is done this way.
from io import open
That was easy. No patch.
Logging to os.system()? I'd think that a simple "wrapper" function would be far better than a complex patch. Further, I'd use subprocess.Popen, since that's the recommended replacement.
Adding missing methods to mask OS differences (like os.chown()) seems like a better use for try/except. But that's just me. I like explicit rather than implicit.
On balance, I still can't see a good reason for monkeypatching.
I'd hate to be locked in to legacy code (like os.system) because I was too dependent on my monkeypatches.
The concept of "subclass" applies to modules as well as classes. You can easily write your own modules which (a) import and (b) extend existing modules. You then use your new modules because they provided extra features. You don't need to monkeypatch.
even if these are called from other third-party modules
Dreadful idea. You can easily break another module by altering built-in features. If you have read the other module and are sure the monkeypatches won't break then what you've found is this.
The "other" module should have had room for customization. It should have had a place for a "dependency injection" or Strategy design pattern. Good thinking.
Once you've found this, the "other" module can be fixed to allow this customization. It may be as simple as a documentation change explaining how to modify an object. It may be an additional
parameter for construction to insert your customization.
You can then provide the revised module to the authors to see if they'll support your small update to their module. Many classes can use extra help supporting a "dependency injection" or Strategy design for extensions.
If you have not read the other module and are not sure your monkeypatches work... well... we still have hope that the monkeypatches don't break anything.

Monkeypatching can be "the least of evils", sometimes -- mostly, when you need to test code which uses a subsystem that is not well designed for testability (doesn't support dependency injection &c). In those cases you will be monkeypatching (very temporarily, fortunately) in your test harness, and almost invariably monkeypatching with mocks or fakes for the purpose of isolating tests (i.e., making them unit tests, rather than integration tests).
This "bad but could be worse" use case does not appear to apply to your examples -- they can all be better architected by editing the application level code to call your appropriate wrapper functions (say myos.chown rather than the bare os.chown, for example) and putting your wrapper functions in your own intermediate modules (such as myown) that stand between the application level code and the standard library (or third-party extensions that you are thus wrapping -- there's nothing special about the standard library in this respect).
One problematic situation might arise when the "application level code" isn't really under your control -- it's a third party subsystem that you'd rather not modify. Nevertheless, I have found that in such situations modifying the third party subsystem to call wrappers (rather than the standard library functions directly) is way more productive in the long run -- then of course you submit the change to the maintainers of the third party subsystem in question, they roll your change into their subsystem's next release, and life gets better for everybody (you included, since once your changes are accepted they'll get routinely maintained and tested by others!-).
(As a side note, such wrappers may also be worth submitting as diffs to the standard library, but that is a different case since the standard library evolves very very slowly and cautiously, and in particular on the Python 2 line will never evolve any longer, since 2.7 is the last of that line and it's feature-frozen).
Of course, all of this presupposes an open-source culture. If for some mysterious reasons you're using a closed-source third party subsystem, therefore one which you cannot possibly maintain, then you are in another situation where monkey patching may be the lesser evil (but that's just because the evil of losing strategic control of your development by trusting in code you can't possibly maintain is such a bigger evil in itself;-). I've never found myself in this situation with a third-party package that was both closed-source and itself written in Python (if the latter condition doesn't hold your monkeypatches would do you no good;-).
Note that here the working definition of "closed-source" is really very strict: for example, even Microsoft 12+ years ago distributed sources of libraries such as MFC with Visual C++ (as their product was then called) -- closed-source because you couldn't redistribute their sources, but still, you DID have sources at hand, so when you met some terrible limitation or bug you COULD fix it (and submit the change to them for a future release, as well as publishing your change as a diff as long as it included absolutely none of their copyrighted code -- not trivial, but feasible).
Monkeypatching well beyond the strict confines within which such an approach is "the least of evil" is a frequent mistake of users of dynamic languages -- be careful not to fall into that trap yourself!

Code organization in Python: Where is a good place to put obscure methods?

I have a class called Path for which there are defined about 10 methods, in a dedicated module Path.py. Recently I had a need to write 5 more methods for Path, however these new methods are quite obscure and technical and 90% of the time they are irrelevant.
Where would be a good place to put them so their context is clear? Of course I can just put them with the class definition, but I don't like that because I like to keep the important things separate from the obscure things.
Currently I have these methods as functions that are defined in a separate module, just to keep things separate, but it would be better to have them as bound methods. (Currently they take the Path instance as an explicit parameter.)
Does anyone have a suggestion?

If the method is relevant to the Path - no matter how obscure - I think it should reside within the class itself.
If you have multiple places where you have path-related functionality, it leads to problems. For example, if you want to check if some functionality already exists, how will a new programmer know to check the other, less obvious places?
I think a good practice might be to order functions by importance. As you may have heard, some suggest putting public members of classes first, and private/protected ones after. You could consider putting the common methods in your class higher than the obscure ones.

If you're keen to put those methods in a different source file at any cost, AND keen to have them at methods at any cost, you can achieve both goals by using the different source file to define a mixin class and having your Path class import that method and multiply-inherit from that mixin. So, technically, it's quite feasible.
However, I would not recommend this course of action: it's worth using "the big guns" (such as multiple inheritance) only to serve important goals (such as reuse and removing repetition), and separating methods out in this way is not really a particularly crucial goal.
If those "obscure methods" played no role you would not be implementing them, so they must have SOME importance, after all; so I'd just clarify in docstrings and comments what they're for, maybe explicitly mention that they're rarely needed, and leave it at that.

I would just prepend the names with an underscore _, to show that the reader shouldn't bother about them.
It's conventionally the same thing as private members in other languages.

Put them in the Path class, and document that they are "obscure" either with comments or docstrings. Separate them at the end if you like.

Oh wait, I thought of something -- I can just define them in the Path.py module, where every obscure method will be a one-liner that will call the function from the separate module that currently exists. With this compromise, the obscure methods will comprise of maybe 10 lines in the end of the file instead of 50% of its bulk.

I suggest making them accessible from a property of the Path class called something like "Utilties". For example: Path.Utilities.RazzleDazzle. This will help with auto-completion tools and general maintenance.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.