Is monkeypatching stdlib methods a good practice in Python? [closed]


Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
Over time I found the need to override several stdlib methods in Python in order to overcome limitations or to add some missing functionality.
In all cases I added a wrapper function and replaced the original method from the module with my wrapper (the wrapper was calling the original method).
Why did I do this? Just to be sure that all calls to the method use my new version, even if they come from third-party modules.
I know that monkeypatching can be a bad thing, but my question is whether it is useful if you use it with care. Meaning that:
you still call the original methods, assuring that you are not missing anything when the original module is updated
you are not changing the original "meaning" of the methods
Examples:
adding coloring support to Python's logging module.
making open() recognize Unicode BOM marks when using text mode.
adding logging support to os.system() or subprocess.Popen(), letting you output to the console and/or redirect to another file.
implementing methods that are missing on your platform, like os.chown() or os.lchown() on Windows.
Overrides like these seem decent to me, but I would like to see how others view them, and especially what should be considered an acceptable monkeypatch and what should not.

None of these things seem to require monkeypatching. All of them seem to have better, more robust and reliable solutions.
Adding a logging handler is easy. No monkeypatch.
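As a rough illustration of the handler route, here is a minimal sketch that colors output with a custom Formatter attached to an ordinary handler. The names ColorFormatter, COLORS and color_demo are made up for this example, and the ANSI palette is an arbitrary assumption; nothing in the logging module is replaced.

```python
import io
import logging

# ANSI escape codes per level; the palette is an arbitrary choice.
COLORS = {
    logging.WARNING: "\033[33m",  # yellow
    logging.ERROR: "\033[31m",    # red
}
RESET = "\033[0m"


class ColorFormatter(logging.Formatter):
    """Wrap the formatted message in a color chosen by level."""

    def format(self, record):
        text = super().format(record)
        color = COLORS.get(record.levelno)
        return f"{color}{text}{RESET}" if color else text


stream = io.StringIO()  # stand-in for sys.stderr so the result can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(ColorFormatter("%(levelname)s: %(message)s"))

log = logging.getLogger("color_demo")  # hypothetical logger name
log.addHandler(handler)
log.propagate = False  # keep the demo output out of the root logger
log.warning("disk almost full")
```

In real use the handler would write to a terminal rather than a StringIO, and you would probably want to skip the colors when the stream is not a TTY.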
Fixing open is done this way.
from io import open
That was easy. No patch.
Logging to os.system()? I'd think that a simple "wrapper" function would be far better than a complex patch. Further, I'd use subprocess.Popen, since that's the recommended replacement.
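A simple wrapper of that kind might look like the sketch below. The function name run_logged and the logger name are invented for illustration; callers opt in by calling the wrapper instead of subprocess directly.

```python
import logging
import subprocess
import sys

log = logging.getLogger("commands")  # hypothetical logger name


def run_logged(args):
    """Run a command, log its invocation and output, return the exit code."""
    log.info("running %s", args)
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured stream
        text=True,
    )
    output, _ = proc.communicate()
    for line in output.splitlines():
        log.info("out: %s", line)
    return proc.returncode


# Run the current interpreter so the example does not depend on the platform.
code = run_logged([sys.executable, "-c", "print('hello')"])
```

Where the output goes (console, file, both) is then a matter of configuring handlers on the logger, not of patching subprocess.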
Adding missing methods to mask OS differences (like os.chown()) seems like a better use for try/except. But that's just me. I like explicit rather than implicit.
On balance, I still can't see a good reason for monkeypatching.
I'd hate to be locked in to legacy code (like os.system) because I was too dependent on my monkeypatches.
The concept of "subclass" applies to modules as well as classes. You can easily write your own modules which (a) import and (b) extend existing modules. You then use your new modules because they provided extra features. You don't need to monkeypatch.
even if these are called from other third-party modules
Dreadful idea. You can easily break another module by altering built-in features. If you have read the other module and are sure the monkeypatches won't break then what you've found is this.
The "other" module should have had room for customization. It should have had a place for a "dependency injection" or Strategy design pattern. Good thinking.
Once you've found this, the "other" module can be fixed to allow this customization. It may be as simple as a documentation change explaining how to modify an object. It may be an additional parameter for construction to insert your customization.
You can then provide the revised module to the authors to see if they'll support your small update to their module. Many classes can use extra help supporting a "dependency injection" or Strategy design for extensions.
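To make the "additional parameter for construction" idea concrete, here is a small sketch. ConfigReader, its opener parameter and fake_opener are all hypothetical names; the point is only that the class accepts the function it would otherwise have hard-coded.

```python
import io


class ConfigReader:
    """Reads a file; the file-opening strategy is injectable."""

    def __init__(self, opener=open):
        # Default to the builtin, so existing callers are unaffected.
        self._open = opener

    def read(self, path):
        with self._open(path) as handle:
            return handle.read()


def fake_opener(path):
    """Replacement strategy: return canned text instead of touching disk."""
    return io.StringIO(f"contents of {path}")


reader = ConfigReader(opener=fake_opener)
text = reader.read("settings.ini")
```

A caller who wants BOM handling, logging, or anything else passes a different opener; nobody has to replace the builtin open for the whole process.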
If you have not read the other module and are not sure your monkeypatches work... well... we still have hope that the monkeypatches don't break anything.

Monkeypatching can be "the least of evils", sometimes -- mostly, when you need to test code which uses a subsystem that is not well designed for testability (doesn't support dependency injection &c). In those cases you will be monkeypatching (very temporarily, fortunately) in your test harness, and almost invariably monkeypatching with mocks or fakes for the purpose of isolating tests (i.e., making them unit tests, rather than integration tests).
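For the test-harness case, a minimal sketch using the standard unittest.mock.patch looks like this. The function describe_cwd is a made-up stand-in for code that calls into a subsystem directly.

```python
import os
from unittest import mock


def describe_cwd():
    """Code under test: depends directly on os.getcwd()."""
    return f"working in {os.getcwd()}"


# The patch is active only inside the with-block and is undone on exit.
with mock.patch("os.getcwd", return_value="/fake/dir"):
    patched = describe_cwd()

# Outside the block the real os.getcwd is back in place.
restored = describe_cwd()
```

The important property is the scoping: the replacement never outlives the test that needed it.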
This "bad but could be worse" use case does not appear to apply to your examples -- they can all be better architected by editing the application level code to call your appropriate wrapper functions (say myos.chown rather than the bare os.chown, for example) and putting your wrapper functions in your own intermediate modules (such as myos) that stand between the application level code and the standard library (or third-party extensions that you are thus wrapping -- there's nothing special about the standard library in this respect).
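As a sketch of what such an intermediate module could contain, the function below would live in a hypothetical myos.py. The decision to return False when the platform has no os.chown is an assumption made for the example; raising or logging would be just as defensible.

```python
import os
import tempfile


def chown(path, uid, gid):
    """Change ownership where the platform supports it.

    Returns True if os.chown was called, False if it is unavailable.
    """
    try:
        platform_chown = os.chown
    except AttributeError:  # os.chown is not defined on Windows
        return False
    platform_chown(path, uid, gid)
    return True


with tempfile.NamedTemporaryFile() as tmp:
    # On Unix, -1 leaves the corresponding id unchanged.
    changed = chown(tmp.name, -1, -1)
```

Application code then calls myos.chown(...) explicitly, and the os module itself stays exactly as shipped.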
One problematic situation might arise when the "application level code" isn't really under your control -- it's a third party subsystem that you'd rather not modify. Nevertheless, I have found that in such situations modifying the third party subsystem to call wrappers (rather than the standard library functions directly) is way more productive in the long run -- then of course you submit the change to the maintainers of the third party subsystem in question, they roll your change into their subsystem's next release, and life gets better for everybody (you included, since once your changes are accepted they'll get routinely maintained and tested by others!-).
(As a side note, such wrappers may also be worth submitting as diffs to the standard library, but that is a different case since the standard library evolves very very slowly and cautiously, and in particular on the Python 2 line will never evolve any longer, since 2.7 is the last of that line and it's feature-frozen).
Of course, all of this presupposes an open-source culture. If for some mysterious reasons you're using a closed-source third party subsystem, therefore one which you cannot possibly maintain, then you are in another situation where monkey patching may be the lesser evil (but that's just because the evil of losing strategic control of your development by trusting in code you can't possibly maintain is such a bigger evil in itself;-). I've never found myself in this situation with a third-party package that was both closed-source and itself written in Python (if the latter condition doesn't hold your monkeypatches would do you no good;-).
Note that here the working definition of "closed-source" is really very strict: for example, even Microsoft 12+ years ago distributed sources of libraries such as MFC with Visual C++ (as their product was then called) -- closed-source because you couldn't redistribute their sources, but still, you DID have sources at hand, so when you met some terrible limitation or bug you COULD fix it (and submit the change to them for a future release, as well as publishing your change as a diff as long as it included absolutely none of their copyrighted code -- not trivial, but feasible).
Monkeypatching well beyond the strict confines within which such an approach is "the least of evil" is a frequent mistake of users of dynamic languages -- be careful not to fall into that trap yourself!

Related

practical considerations about backporting a feature from python 3.3 to 2.7 versus monkeypatching

I'm interested in a new smtplib feature introduced in Python 3.3: the ability to bind to a specific IP address on a multihomed machine (or a machine with multiple IP addresses).
Many building blocks I would like to use are not ported to 3.3 or have very unstable ports, so the idea of using 3.3 just because of this feature is not practical.
In order to backport this feature, I can patch or monkeypatch. I'm inclined to subclass smtplib.SMTP and monkeypatch the underlying socket, because it simplifies deployment, seems to be unlikely to affect the base class and is easier than a politically correct backport.
In the Ruby world, monkeypatching is more tolerated, but in most Python circles this dangerous-yet-frequently-useful technique is frowned upon.
My question is: have you ever faced such decision and/or would like to share some advice?
(I'm interested in the pros and cons of each approach)
[update]
pps actually, thinking some more, i always assumed monkey patching meant somehow modifying existing classes in-place so that the new code is invoked when loading from the standard location (i must admit, now that i think about it, i have no idea how you could do this). that is not what i am suggesting here - this subclass would be a new class, in my own module, used only by my own code. [andrew cooke]
Andrew, thanks for taking the time to answer. This way I would be forking some SMTP.connect code, and it is also frowned upon because when the original library is updated, my forked code will not incorporate the change. I think of monkeypatching as something more surgical, but such updates also have potential to break the code if there is any refactoring around the monkeypatched code. Either forking or monkeypatching my jedi masters like not, to the dark side they lead. :-)
[update]
In the end I just wrote an SMTP proxy which accepts an extended EHLO syntax that lets the client choose the outgoing IP address:
from smtplib import SMTP
s = SMTP('localhost', 8025)
# will use 173.22.213.16 as the outgoing IP address
s.ehlo('example.com 173.22.213.16')
Using Twisted, it came to under 40 SLOC. Twisted is amazing for network code, and I can do everything in 2.7, at the expense of running another process.

what i would do is subclass SMTP and replace the SMTP.connect() method with one that was almost identical, but which called self.sock.connect() with the source_address (and an extra argument to set that).
i am not completely sure what monkey-patching means (i have never used ruby), but i think the above is completely normal for python. if you weren't meant to be able to overwrite that method then it would be named with a leading underscore. i haven't used SMTP myself, but i have used the HTTP lib and have done similar things to add authentication to an HTTP server. this was paid, professional work, and i had no worries whatsoever about doing so (the simple HTTP server isn't intended for heavy use, but it's useful in certain cases).
(i can't really provide any discussion of pros and cons because i don't see what the alternative is - do you mean dig out the patch and apply it? i guess you could do that, but it might contain many more changes - if i knew a patch existed i would probably give it a quick read to make sure it was consistent with what i was doing, but that's all. creating a new, patched, "official" smtplib seems like a lot more work for no real gain - this isn't rocket-science code, it's just a bind parameter.)
ps i would provide any extra argument with a useful default value so that if called with old code, the call would still work.
source of SMTP, although i am unsure what version; socket docs; smtplib docs
pps actually, thinking some more, i always assumed monkey patching meant somehow modifying existing classes in-place so that the new code is invoked when loading from the standard location (i must admit, now that i think about it, i have no idea how you could do this). that is not what i am suggesting here - this subclass would be a new class, in my own module, used only by my own code.
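A rough sketch of the subclass idea follows, written for Python 3 for readability. It makes several assumptions: the class name SourceBoundSMTP and the bind_address argument are invented, and instead of copying connect() it overrides _get_socket, which is an undocumented hook in CPython's smtplib and could change between versions. (Python 3.3+ already accepts source_address directly, so this only illustrates the shape of the approach.)

```python
import smtplib
import socket


class SourceBoundSMTP(smtplib.SMTP):
    """SMTP client whose outgoing connection binds to a chosen local address."""

    def __init__(self, *args, bind_address=None, **kwargs):
        # Set before calling the base __init__, which may connect immediately.
        # The default of None keeps old call sites working unchanged.
        self.bind_address = bind_address
        super().__init__(*args, **kwargs)

    def _get_socket(self, host, port, timeout):
        # Assumption: smtplib calls this private hook to open the socket.
        # bind_address is a (host, port) tuple or None.
        return socket.create_connection((host, port), timeout, self.bind_address)


# No host is given, so nothing is connected yet; port 0 lets the OS pick one.
client = SourceBoundSMTP(local_hostname="localhost", bind_address=("127.0.0.1", 0))
```

This is a new class in your own module, used only by your own code, so code that imports smtplib.SMTP from the standard location is unaffected.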

writing python libraries: structure, naming and import best practices

Let's say I have to write several small to medium libraries for my company.
Does it make sense, in a pythonic way, to use the Java approach and prefix all of them with a common high-level package (ahem, module) such as achieving the following structure:
mycompany.mylibrary1.moduleA
mycompany.mylibrary1.moduleB.moduleD
mycompany.mylibrary2.moduleC
or is it better to simply go for:
mylibrary1.moduleA
mylibrary1.moduleB.moduleD
mylibrary2.moduleC
I see that most of the time the 2nd approach is used, but I was looking for a confirmation (or not) that's the way to go.
I could not find anything in this respect in PEP 8, besides:
The naming conventions of Python's library are a bit of a mess, so we'll never get this completely consistent [...]
and then we get only modules and class naming indications, as well as the fact that relative imports are discouraged.
The fact that absolute imports are the way to go make the decision of how you organize your libraries really important (and I'm not here to discuss whether avoiding relative imports is good or bad).
I do like the Java approach that namespaces all of your libraries, but I have the impression it's not pythonic... what's the suggested way to go?
PS.
While in general 'best practice' questions are considered subjective on SO, in Python the existence of the PEP makes them, in my opinion, very objective! Though the answer might be... there's no best practice for organizing your libraries.

There is a cognitive issue with python imports and names. Since there are so many different kinds of first-order objects (primitives, modules, classes, functions, classes pretending to be modules, modules pretending to be objects, et cetera) people tend to use dot-notation to indicate that an entity has "arrived" from an external source. Thus
import foo.bar
e = foo.bar.make_entity()
is felt to be clearer than
from foo.bar import make_entity
e = make_entity()
However, this means that you will frequently have to type the full path of whatever you need to access. "com.example.some.lib.module.frobnicate()" gets tiresome quickly. Thus, a polite library supplies a reasonably short name for its access.
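On the caller's side, an import alias gives a middle ground: the dotted prefix still shows where a name came from, but it stays short. The sketch below uses the standard library's xml.etree.ElementTree with its conventional alias purely as an illustration.

```python
# The alias keeps the "this came from elsewhere" signal without the full path.
import xml.etree.ElementTree as ET

root = ET.fromstring("<config><name>demo</name></config>")
name = root.find("name").text
```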

What are the most frustrating Python hacks to unwind, rewrite, etc.? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
My impression of Python from the short time I've been developing with it is that it's incredibly powerful and flexible, but I can't help but feel like "with great power comes great responsibility." So while I've read numerous blog posts about simple and elegant Python snippets that solve a problem, I wonder if there are design patterns or abuses of Python language features that, once built into an application or library, cause the code to be incredibly brittle and near impossible to refactor.
So the question is basically what are the most frustrating, but somewhat common, Python "hacks" or language feature abuses that someone can introduce that will cause nightmares for future maintainers of that code?

Excessive use of from module import *.
With many such imports in a module, you don't know where each name came from and have to look through all the imported modules. Searching doesn't help much in this case.

Magic that works, but not always. For example, when metaclasses are abused to create a DSL. Such a DSL may suit most tasks but break horribly on a complex one that its author did not anticipate.

Using eval or exec on user input may be the most common abuse of Python features.
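When the input is only supposed to be a Python literal, ast.literal_eval from the standard library is the usual safer substitute. The helper name parse_literal below is made up, and returning None on rejection is just one possible policy.

```python
import ast


def parse_literal(text):
    """Parse a Python literal (numbers, strings, lists, dicts, ...).

    Unlike eval(), ast.literal_eval() refuses calls, attribute access
    and other expressions, so hostile input is rejected, not executed.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


numbers = parse_literal("[1, 2, 3]")
rejected = parse_literal("__import__('os').getcwd()")
```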

It's not a hack, but there's been a somewhat large issue with Python 2.x's print statement.
People would rely on print for output throughout an entire project, and then when it finally came time to, say, send output to a file as well as stdout, they'd have to go in and replace all those print statements with a custom output function.
Python 3 solved this by making print an actual function rather than a statement (so output is loosely coupled to the rest of the system), so if need be you can replace the original print with a new print that does more than just write to stdout.
See PEP 3105 for the specific reasoning from Guido and more details.
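A minimal sketch of what "a new print" can look like in Python 3 is below. The names tee_print and log_buffer are invented for the example; a StringIO stands in for a real log file.

```python
import builtins
import io

log_buffer = io.StringIO()  # stand-in for a log file


def tee_print(*args, **kwargs):
    """Print as usual, and also write a copy to log_buffer."""
    builtins.print(*args, **kwargs)
    if "file" not in kwargs:
        # Only duplicate output that was headed for stdout.
        builtins.print(*args, **{**kwargs, "file": log_buffer})


# Rebinding the name in this module only; builtins.print is untouched.
print = tee_print
print("hello", "world")
```

Because the rebinding is local to the module, other modules keep the ordinary print, which is usually what you want.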

..what are the most frustrating, but somewhat common, Python "hacks" or language feature abuses that someone can introduce that will cause nightmares for future maintainers of that code?
Hard to refactor:
nested list comprehensions (as in: multiple levels deep).
Most people (when learning Python) are fascinated by the power and utility of list comprehensions. This can cause a tendency to over-use them and build deeply nested, complicated ones. Most of the time the same code should have been written with simple loops for readability and maintainability. I consider three levels already too deeply nested.
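A small illustration with made-up data: both versions below compute the same list, and this is only two levels deep.

```python
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Dense: two loops and a filter packed into one expression.
dense = [cell * 2 for row in grid for cell in row if cell % 2 == 0]

# The same result with explicit loops; each step can be read,
# commented, or breakpointed on its own.
plain = []
for row in grid:
    for cell in row:
        if cell % 2 == 0:
            plain.append(cell * 2)
```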
--
And also (not so hard to refactor but mostly irritating):
trying to use Python as if it was another language (without its own specific constructs); e.g.:
for i in range(len(mylist)):
    item = mylist[i]
    # do stuff with item
instead of
for i, item in enumerate(mylist):
    # do stuff with item
or even (why do you need the index anyway):
for item in mylist:
    # do stuff with item
This includes: reinventing the wheel (badly) when functionality is already (aptly named) in the rich standard library.
And type-checking, making stuff impossible to subclass, etc...

The single biggest issue I've come across is use of double-leading-underscore attributes. The perpetrators are practically always new Python programmers or programmers who prefer another language (in particular Java, for some reason.) Double leading underscores causes the attributes to be name-mangled (using the current class name), avoiding collisions in subclasses. It's too frequently seen as 'private', even though it isn't. (See this answer I once wrote.) The same classes are usually littered with accessors -- not properties, but regular methods called directly -- to get at these name-mangled attributes. The end result is always a horribly convoluted class that's impossible to subclass to specialize or bugfix or monkeypatch or test.
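The mangling is easy to see in a small experiment. The class names Account and AuditedAccount below are invented for the demonstration.

```python
class Account:
    def __init__(self):
        self.__balance = 100  # actually stored as _Account__balance


class AuditedAccount(Account):
    def peek(self):
        # Inside this class, __balance mangles to _AuditedAccount__balance,
        # which was never set.
        return self.__balance


acct = AuditedAccount()
try:
    acct.peek()
    subclass_can_see = True
except AttributeError:
    subclass_can_see = False

# The attribute is not private, just renamed: anyone can still reach it.
mangled_value = acct._Account__balance
```

So the subclass is locked out of the attribute by accident, while outside code that knows the mangled name is not locked out at all.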

Protection from accidentally misnaming object attributes in Python?

A friend was "burned" when starting to learn Python, and now sees the language as perhaps fatally flawed.
He was using a library and changed the value of an object's attribute (the class being in the library), but he used the wrong abbreviation for the attribute name. It took him "forever" to figure out what was wrong. His objection to Python is thus that it allows one to accidentally add attributes to an object.
Unit tests don't provide a solution to this. One doesn't write unit tests against an API being used. One may have a mock for the class, but the mock could have the same typo or incorrect assumption about the attribute name.
It's possible to use __setattr__() to guard against this, but (as far as I know), no one does.
The only thing I've been able to tell my friend is that after several years of writing Python code full-time, I don't recall ever being burned by this. What else can I tell him?

"changed the value of an object's attribute" Can lead to problems. This is pretty well known. You know it, now, also. That doesn't indict the language. It simply says that you've learned an important lesson in dynamic language programming.
Unit testing absolutely discovers this. You are not forced to mock all library classes. Some folks say it's only a unit test when it's tested in complete isolation. This is silly. You have to trust the library modules -- it's a feature of your architecture. Rather than mock them, just use them. (It is important to write mocks for your own newly-developed libraries. It's also important to mock libraries that make expensive API calls.)
In most cases, you can (and should) test your classes with the real library modules. This will find the misspelled attribute name.
Also, now that you know that attributes are dynamic, it's really easy to verify that the attribute exists. How?
Use interactive Python to explore the classes before writing too much code.
Remember, Python is not Java and it's not C. You can execute Python interactively and determine immediately if you've spelled something wrong. Writing a lot of code without doing any interactive confirmation is -- simply -- the wrong way to use Python.
A little interactive exploration will find misspelled attribute names.
Finally -- for your own classes -- you can wrap updatable attributes as properties. This makes it easier to debug any misspelled attribute names. Again, you know to check for this. You can use interactive development to confirm the attribute names.
Fussing around with __setattr__ creates problems. In some cases, we actually need to add attributes to an object. Why? It's simpler than creating a whole subclass for one special case where we have to maintain more state information.
Other things you can say:
I was burned by a C program that absolutely could not be made to work because of ______. [Insert any known C-language problem you want here. No array bounds checking, for example] Does that make C fatally flawed?
I was burned by a DBA who changed a column name and all the SQL broke. It's painful to unit test all of it. Does that make the relational database fatally flawed?
I was burned by a sys admin who changed a directory's permissions and my application broke. It was nearly impossible to find. Does that make the OS fatally flawed?
I was burned by a COBOL program where someone changed the copybook, forgot to recompile the program, and we couldn't debug it because the source looked perfect. COBOL, however, actually is fatally flawed, so this isn't a good example.

There are code analyzers like pylint that will warn you if you add an attribute outside of __init__. PyDev has nice support for it. Such errors are very easy to find with a debugger too.

If the possibility to make mistakes is enough for him to consider a language "fatally flawed", I don't think you can convince him otherwise. The more you can do with a language, the more you can do wrong with the language. It's a caveat of flexibility—but that's true for any language.

You can use the __slots__ class attribute to limit the attributes that instances have. Attempting to set an attribute that's not explicitly listed will raise an AttributeError. There are some complications that arise with subclassing. See the Python data model reference for details.
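A minimal sketch of that behaviour, with a made-up Point class:

```python
class Point:
    __slots__ = ("x", "y")  # the only attribute names instances may have

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
try:
    p.z = 3  # typo-style assignment to an unlisted name
    typo_accepted = True
except AttributeError:
    typo_accepted = False
```

Note that a subclass which does not declare its own __slots__ gets a normal __dict__ again, so the restriction does not carry over automatically.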

A tool like pylint or pychecker may be able to detect this.

He's effectively ruling out an entire class of programming languages -- dynamically-typed languages -- because of one hard lesson learned. He can use only statically-typed languages if he wishes and still have a very productive career as a programmer, but he is certainly going to have deep frustrations with them as well. Will he then conclude that they are fatally-flawed?

I think your friend has misplaced his frustration onto the language. His real problem is a lack of debugging technique. Teach him how to break a program down into small pieces and examine the output of each, like a manual unit test. That way any inconsistency is found and every assumption is either proven or discarded.

I had a similar bad experience with Python when I first started ... took me 3 months to get over it. Having a tool which warns would have been nice back then ...

Is it OK to inspect properties beginning with underscore?

I've been working on a very simple crud generator for pylons. I came up with something that inspects
SomeClass._sa_class_manager.mapper.c
Is it OK to inspect this (or to call methods beginning with an underscore)? I always kind of assumed this is legal though frowned upon, as it relies heavily on the internal structure of a class/object. But hey, since Python does not really have interfaces in the Java sense, maybe it is OK.

It is intentional (in Python) that there are no "private" scopes. By convention, anything that starts with an underscore should ideally not be used, and hence you may not complain if its behavior or definition changes in the next version.

In general, this indicates that the method is effectively internal, rather than part of the documented interface, and should not be relied on. Future versions of the library are free to rename or remove such methods, so if you care about future compatibility without having to rewrite, avoid doing it.

If it works, why not? You could have problems, though, when _sa_class_manager gets restructured: you would either be bound to this specific version of SQLAlchemy or have more work tracking the changes. As SQLAlchemy is a fast-moving target, that could happen within a year.
The preferable way would be to integrate your desired API into SQLAlchemy itself.

It's generally not a good idea, for reasons already mentioned. However, Python deliberately allows this behaviour in case there is no other way of doing something.
For example, if you have a closed-source compiled Python library where the author didn't think you'd need direct access to a certain object's internal state—but you really do—you can still get at the information you need. You have the same problems mentioned before of keeping up with different versions (if you're lucky enough that it's still maintained) but at least you can actually do what you wanted to do.
