I've been working on a very simple CRUD generator for Pylons. I came up with something that inspects
SomeClass._sa_class_manager.mapper.c
Is it OK to inspect this (or to call methods beginning with an underscore)? I always kind of assumed this is legal, though frowned upon, since it relies heavily on the internal structure of a class/object. But hey, since Python does not really have interfaces in the Java sense, maybe it is OK.
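For reference, a minimal sketch of the kind of introspection I mean; _sa_class_manager is exactly the undocumented internal I'm asking about, so this is illustrative only (the .name and .type column attributes are what I observed on my SQLAlchemy version):

def list_columns(mapped_class):
    # Walks SQLAlchemy internals: _sa_class_manager is private API,
    # which is precisely the practice in question.
    mapper = mapped_class._sa_class_manager.mapper
    return [(col.name, col.type) for col in mapper.c]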
It is intentional (in Python) that there are no truly "private" scopes. It is a convention that anything that starts with an underscore should ideally not be used, and hence you may not complain if its behavior or definition changes in a future version.
In general, this usually indicates that the method is effectively internal, rather than part of the documented interface, and should not be relied on. Future versions of the library are free to rename or remove such methods, so if you care about compatibility with future versions without having to rewrite, avoid doing it.
If it works, why not? You could run into problems when _sa_class_manager gets restructured, though, binding yourself to this specific version of SQLAlchemy or creating more work to track the changes. As SQLAlchemy is a fast-moving target, you may be there within a year.
The preferable way would be to integrate your desired API into SQLAlchemy itself.
It's generally not a good idea, for reasons already mentioned. However, Python deliberately allows this behaviour in case there is no other way of doing something.
For example, if you have a closed-source compiled Python library where the author didn't think you'd need direct access to a certain object's internal state—but you really do—you can still get at the information you need. You have the same problems mentioned before of keeping up with different versions (if you're lucky enough that it's still maintained) but at least you can actually do what you wanted to do.
I am new to dynamic languages in general, and I have discovered that languages like Python prefer simple data structures, like dictionaries, for sending data between parts of a system (across functions, modules, etc).
In the C# world, when two parts of a system communicate, the developer defines a class (possibly one that implements an interface) that contains properties (like a Person class with a Name, Birth date, etc.) where the sender in the system instantiates the class and assigns values to the properties. The receiver then accesses these properties. The class is called a DTO, and it is "well-defined" and explicit. If I remove a property from the DTO's class, the compiler will instantly warn me of all parts of the code that use that DTO and are attempting to access what is now a non-existent property. I know exactly what has broken in my codebase.
In Python, functions that produce data (senders) create implicit DTOs by building up dictionaries and returning them. Coming from a compiled world, this scares me. I immediately think of the scenario of a large code base where a function producing a dictionary has the name of a key changed (or a key removed altogether) and boom: tons of potential KeyErrors begin to crop up, as pieces of the code base that work with that dictionary and expect a key are no longer able to access the data they were expecting. Without unit testing, the developer would have no reliable way of knowing where these errors will appear.
Maybe I misunderstand altogether. Are dictionaries a best practice tool for passing data around? If so, how do developers solve this kind of problem? How are implicit data structures and the functions that use them maintained? How do I become less afraid of what seems like a huge amount of uncertainty?
Coming from a compiled world, this scares me. I immediately think of the scenario of a large code base where a function producing a dictionary has the name of a key changed (or a key removed altogether) and boom: tons of potential KeyErrors begin to crop up, as pieces of the code base that work with that dictionary and expect a key are no longer able to access the data they were expecting.
I would just like to highlight this part of your question, because I feel this is the main point you are trying to understand.
Python's development philosophy is a bit different: since objects can mutate without raising errors (for example, you can add attributes to instances without having them declared in the class), a common programming practice in Python is EAFP:
EAFP
Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.
The LBYL referred to in the quote above is "Look Before You Leap":
LBYL
Look before you leap. This coding style explicitly tests for pre-conditions before making calls or lookups. This style contrasts with the EAFP approach and is characterized by the presence of many if statements.
In a multi-threaded environment, the LBYL approach can risk introducing a race condition between “the looking” and “the leaping”. For example, the code if key in mapping: return mapping[key] can fail if another thread removes key from mapping after the test, but before the lookup. This issue can be solved with locks or by using the EAFP approach.
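To make the contrast concrete, here is a minimal sketch of the same lookup written both ways (the dictionary contents and key names are made up for illustration):

record = {'name': 'Ada'}

# LBYL: test for the pre-condition, then act
if 'birth_date' in record:
    birth_date = record['birth_date']
else:
    birth_date = None

# EAFP: just act, and handle the failure if it comes
try:
    birth_date = record['birth_date']
except KeyError:
    birth_date = None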
So I would say this is a bit of the norm: in Python you expect objects to behave well and handle themselves with grace (mainly by raising plenty of exceptions when something goes wrong). Traditional "object hiding" and "interface contracts" are not what Python is about. It is just like learning anything else: you have to acclimate to the programming environment and its rules.
The other part of your question:
Are dictionaries a best practice tool for passing data around? If so, how do developers solve this kind of problem?
The answer here depends on your problem domain. If your problem domain does not lend itself to custom objects, then you can pass around any kind of container (lists, tuples, dictionaries). If, however, what you pass around is decorated, "rich" data, then objects are the natural fit, and your code becomes littered with classes that define no behavior, only the properties of things.
Oh, by the way: this problem of getting keys and raising KeyError is already solved, as Python dictionaries have a get method, which can return a default value (None, by default) when a key doesn't exist:
>>> d = {'a': 'b'}
>>> d['b']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'b'
>>> d.get('b')  # returns None, the one value the interactive interpreter does not echo
>>> d.get('b', 'default')
'default'
When using Python for a large project, automated testing is a must; otherwise you would never dare to do any serious refactoring, and the code base would rot in no time, because every change would try to touch as little as possible, leading to bad solutions (simply because you'd be too scared to implement the correct solution instead).
Indeed, the above is true even with C++ or, as often happens with large projects, with mixed-language solutions.
No longer than a few HOURS ago, for example, I had to make a branch for a four-line bugfix (one of the lines being a single brace) for a specific customer, because the trunk had evolved too far from the version he has in production, and the person in charge of the release process told me his use cases had not yet been covered by manual testing in the current version, so I could not upgrade his installation.
The compiler can tell you something, but it cannot provide any confidence in stability after a refactoring if the software is complex. The idea that if a piece of code compiles then it's correct is bogus (with the possible exception of hello_world.cpp).
That said, you normally don't use dictionaries in Python for everything, unless you really care about the dynamic aspect (and in that case the code doesn't access the dictionary with a literal key). If your Python code has a lot of d["foo"] instead of d[k] when using dicts, then I'd say there is the smell of a design problem.
I don't think passing dictionaries is the only way to pass structured data across parts of a system. I've seen lots of people use classes for that. Actually, namedtuple is a good fit for that as well.
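For example, a minimal sketch of an explicit, lightweight record built with collections.namedtuple (the Person fields are made up for illustration):

from collections import namedtuple

Person = namedtuple('Person', ['name', 'birth_date'])

p = Person(name='Ada Lovelace', birth_date='1815-12-10')
print(p.name)    # explicit attribute access, like a class
# p.birth_dtae   # a misspelling raises AttributeError immediately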
Without unit testing, the developer would have no reliable way of knowing where these errors will appear
Now why would you not write unit tests?
In Python you don't rely on a compiler to catch your errors. If you really need static checking of your code, you can use one of several static analysis tools out there (see this question)
If I use a single leading underscore for an attribute in a class, would it be wrong of me to access it from a different object? Is a single underscore saying "I will use this as I please but you the user shouldn't touch it" or should even the developer treat it as though it were private?
The single underscore indicates that it's not for public consumption; code within the same package is welcome to poke and prod it.
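A minimal sketch of what that convention means in practice (the Account class here is made up):

class Account(object):
    def __init__(self):
        self._balance = 0  # single underscore: internal by convention

acct = Account()
print(acct._balance)  # nothing stops you; it's a signal of intent, not enforcement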
I think you should refrain from doing so as much as reasonably possible, because it breaks encapsulation, which is one of the benefits of the object-oriented paradigm. While some feel it's fine at certain higher levels of scoping, say module or package, I've found avoiding it a useful rule of thumb even within the various methods of a single class -- at least if the class is fairly complicated, subtle, or just an aspect I think I might want to change later.
The reason is that every time you touch one, you create a dependency between one object's internals and the place where you're coding, be it another part of the same object, or a module, or whatever. The more of this there is, the harder it will be to maintain or enhance things later.
Another aspect to consider is that whenever you feel a need to do so, it may indicate a need for a better design -- in which case you can treat it as a warning sign and react accordingly.
Creating an interface for operations that deal with internal details often means designing and implementing more code than not doing so would. So in each case, the benefits must be weighed against the costs to determine whether it's worth it. Eventually experience (and education, if it's ongoing) makes such decisions easier, maybe even instinctive.
I'm interested in a new smtplib feature introduced in Python 3.3: the ability to bind to a specific IP address on a multihomed machine (or a machine with multiple IP addresses).
Many building blocks I would like to use are not ported to 3.3 or have very unstable ports, so the idea of using 3.3 just because of this feature is not practical.
In order to backport this feature, I can patch or monkeypatch. I'm inclined to subclass smtplib.SMTP and monkeypatch the underlying socket, because it simplifies deployment, seems unlikely to affect the base class, and is easier than a politically correct backport.
In the Ruby world, monkeypatching is more tolerated, but in most Python circles this dangerous-yet-frequently-useful technique is frowned upon.
My question is: have you ever faced such decision and/or would like to share some advice?
(I'm interested in the pros and cons of each approach)
[update]
pps actually, thinking some more, i always assumed monkey patching meant somehow modifying existing classes in-place so that the new code is invoked when loading from the standard location (i must admit, now that i think about it, i have no idea how you could do this). that is not what i am suggesting here - this subclass would be a new class, in my own module, used only by my own code. [andrew cooke]
Andrew, thanks for taking the time to answer. This way I would be forking some of the SMTP.connect code, and that is also frowned upon, because when the original library is updated my forked code will not incorporate the change. I think of monkeypatching as something more surgical, but such updates also have the potential to break the code if there is any refactoring around the monkeypatched code. Either forking or monkeypatching my jedi masters like not, to the dark side they lead. :-)
[update]
In the end I just wrote an SMTP proxy which accepts an extended EHLO syntax allowing the client to choose the outgoing IP address:
s = SMTP('localhost', 8025)
# will use 173.22.213.16 as the outgoing IP address
s.ehlo('example.com 173.22.213.16')
Using Twisted it was under 40 SLOC; Twisted is amazing for network code, and I can do everything in 2.7, at the expense of running another process.
what i would do is subclass SMTP and replace the SMTP.connect() method with one that was almost identical, but which called self.sock.connect() with the source_address (and an extra argument to set that).
i am not completely sure what monkey-patching means (i have never used ruby), but i think the above is completely normal for python. if you weren't meant to be able to overwrite that method then it would be named with a leading underscore. i haven't used SMTP myself, but i have used the HTTP lib and have done similar things to add authentication to an HTTP server. this was paid, professional work, and i had no worries whatsoever about doing so (the simple HTTP server isn't intended for heavy use, but it's useful in certain cases).
(i can't really provide any discussion of pros and cons because i don't see what the alternative is - do you mean dig out the patch and apply it? i guess you could do that, but it might contain many more changes - if i knew a patch existed i would probably give it a quick read to make sure it was consistent with what i was doing, but that's all. creating a new, patched, "official" smtplib seems like a lot more work for no real gain - this isn't rocket-science code, it's just a bind parameter.)
ps i would provide any extra argument with a useful default value so that if called with old code, the call would still work.
source of SMTP, although i am unsure what version; socket docs; smtplib docs
pps actually, thinking some more, i always assumed monkey patching meant somehow modifying existing classes in-place so that the new code is invoked when loading from the standard location (i must admit, now that i think about it, i have no idea how you could do this). that is not what i am suggesting here - this subclass would be a new class, in my own module, used only by my own code.
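a minimal sketch of that subclass (the class name and keyword argument are invented here), assuming python 2.7, where socket.create_connection() already accepts a source_address argument even though smtplib does not expose it:

import smtplib
import socket

class BindingSMTP(smtplib.SMTP):
    """smtplib.SMTP that binds the outgoing socket to a local address."""

    def __init__(self, host='', port=0, local_hostname=None,
                 timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
                 source_address=None):
        # source_address is a (host, port) pair, e.g. ('173.22.213.16', 0);
        # the None default keeps old calling code working unchanged.
        self.source_address = source_address
        smtplib.SMTP.__init__(self, host, port, local_hostname, timeout)

    def _get_socket(self, host, port, timeout):
        # same job as the stdlib method, plus the source_address pass-through.
        return socket.create_connection((host, port), timeout,
                                        self.source_address)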
Over time I have found the need to override several stdlib methods in Python, in order to overcome limitations or to add missing functionality.
In all cases I added a wrapper function and replaced the original method in the module with my wrapper (the wrapper calling the original method).
Why did I do this? Just to be sure that all calls to the method use my new version, even if they are made from other third-party modules.
I know that monkeypatching can be a bad thing, but my question is whether it can be acceptable if you use it with care? Meaning that:
you still call the original methods, assuring that you are not missing anything when the original module is updated
you are not changing the original "meaning" of the methods
Examples:
adding coloring support to the Python logging module
making open() able to recognize Unicode BOMs when using text mode
adding logging support to os.system() or subprocess.Popen(), letting you output to the console and/or redirect to another file
implementing methods like os.chown() or os.lchown() that are missing on your platform (they don't exist on Windows)
Things like these appear to me to be decent overrides, but I would like to see how others see them, and especially what should be considered an acceptable monkeypatch and what should not.
None of these things seem to require monkeypatching. All of them seem to have better, more robust and reliable solutions.
Adding a logging handler is easy. No monkeypatch.
Fixing open is done this way.
from io import open
That was easy. No patch.
Adding logging to os.system()? I'd think that a simple "wrapper" function would be far better than a complex patch. Further, I'd use subprocess.Popen, since that's the recommended replacement.
Adding missing methods to mask OS differences (like os.chown()) seems like a better use for try/except. But that's just me. I like explicit rather than implicit.
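A minimal sketch of that try/except approach (the no-op fallback is just one possible policy, invented for illustration):

import os

try:
    chown = os.chown
except AttributeError:
    # os.chown doesn't exist on Windows; substitute a no-op.
    # Callers import this wrapper explicitly -- the os module is untouched.
    def chown(path, uid, gid):
        pass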
On balance, I still can't see a good reason for monkeypatching.
I'd hate to be locked in to legacy code (like os.system) because I was too dependent on my monkeypatches.
The concept of "subclass" applies to modules as well as classes. You can easily write your own modules which (a) import and (b) extend existing modules. You then use your new modules because they provided extra features. You don't need to monkeypatch.
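For instance, a minimal sketch of such an extending module (mysubprocess and logged_call are made-up names); application code imports it instead of the bare stdlib module:

# mysubprocess.py -- (a) imports and (b) extends the stdlib module
import logging
import subprocess
from subprocess import *  # re-export the normal names unchanged

def logged_call(args, **kwargs):
    """Like subprocess.call, but logs the command first."""
    logging.getLogger(__name__).info('running %r', args)
    return subprocess.call(args, **kwargs)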
even if these are called from other third-party modules
Dreadful idea. You can easily break another module by altering built-in features. If you have read the other module and are sure your monkeypatches won't break anything, then what you've found is this: the "other" module should have had room for customization. It should have had a place for "dependency injection" or the Strategy design pattern. Good thinking.
Once you've found this, the "other" module can be fixed to allow this customization. It may be as simple as a documentation change explaining how to modify an object. It may be an additional parameter for construction to insert your customization.
You can then provide the revised module to the authors to see if they'll support your small update. Many classes can use extra help supporting a "dependency injection" or Strategy design for extensions.
If you have not read the other module and are not sure your monkeypatches work... well... we still have hope that the monkeypatches don't break anything.
Monkeypatching can be "the least of evils", sometimes -- mostly, when you need to test code which uses a subsystem that is not well designed for testability (doesn't support dependency injection &c). In those cases you will be monkeypatching (very temporarily, fortunately) in your test harness, and almost invariably monkeypatching with mocks or fakes for the purpose of isolating tests (i.e., making them unit tests, rather than integration tests).
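A minimal sketch of that test-only use, assuming a hypothetical myapp module whose notify() sends mail through a module-level myapp.send_mail function:

import unittest
import myapp  # hypothetical module under test

class NotifyTest(unittest.TestCase):
    def test_notify_sends_mail(self):
        sent = []
        original = myapp.send_mail
        # Temporary monkeypatch: a fake replaces the real sender, so the
        # test stays isolated (a unit test, not an integration test).
        myapp.send_mail = lambda to, body: sent.append((to, body))
        try:
            myapp.notify('bob@example.com')
            self.assertEqual(1, len(sent))
        finally:
            myapp.send_mail = original  # always restore the real function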
This "bad but could be worse" use case does not appear to apply to your examples -- they can all be better architected by editing the application level code to call your appropriate wrapper functions (say myos.chown rather than the bare os.chown, for example) and putting your wrapper functions in your own intermediate modules (such as myown) that stand between the application level code and the standard library (or third-party extensions that you are thus wrapping -- there's nothing special about the standard library in this respect).
One problematic situation might arise when the "application level code" isn't really under your control -- it's a third party subsystem that you'd rather not modify. Nevertheless, I have found that in such situations modifying the third party subsystem to call wrappers (rather than the standard library functions directly) is way more productive in the long run -- then of course you submit the change to the maintainers of the third party subsystem in question, they roll your change into their subsystem's next release, and life gets better for everybody (you included, since once your changes are accepted they'll get routinely maintained and tested by others!-).
(As a side note, such wrappers may also be worth submitting as diffs to the standard library, but that is a different case, since the standard library evolves very slowly and cautiously; in particular the Python 2 line will never evolve further, since 2.7 is the last of that line and it's feature-frozen.)
Of course, all of this presupposes an open-source culture. If for some mysterious reason you're using a closed-source third-party subsystem, one which you cannot possibly maintain, then you are in another situation where monkeypatching may be the lesser evil (but only because the evil of losing strategic control of your development, by trusting code you can't possibly maintain, is an even bigger evil in itself;-). I've never found myself in this situation with a third-party package that was both closed-source and itself written in Python (if the latter condition doesn't hold, your monkeypatches would do you no good;-).
Note that here the working definition of "closed-source" is really very strict: for example, even Microsoft 12+ years ago distributed sources of libraries such as MFC with Visual C++ (as their product was then called) -- closed-source because you couldn't redistribute their sources, but still, you DID have sources at hand, so when you met some terrible limitation or bug you COULD fix it (and submit the change to them for a future release, as well as publishing your change as a diff as long as it included absolutely none of their copyrighted code -- not trivial, but feasible).
Monkeypatching well beyond the strict confines within which such an approach is "the least of evil" is a frequent mistake of users of dynamic languages -- be careful not to fall into that trap yourself!
A friend was "burned" when starting to learn Python, and now sees the language as perhaps fatally flawed.
He was using a library and changed the value of an object's attribute (the class being in the library), but he used the wrong abbreviation for the attribute name. It took him "forever" to figure out what was wrong. His objection to Python is thus that it allows one to accidentally add attributes to an object.
Unit tests don't provide a solution to this. One doesn't write unit tests against an API being used. One may have a mock for the class, but the mock could have the same typo or incorrect assumption about the attribute name.
It's possible to use __setattr__() to guard against this, but (as far as I know), no one does.
The only thing I've been able to tell my friend is that after several years of writing Python code full-time, I don't recall ever being burned by this. What else can I tell him?
"changed the value of an object's attribute" Can lead to problems. This is pretty well known. You know it, now, also. That doesn't indict the language. It simply says that you've learned an important lesson in dynamic language programming.
Unit testing absolutely discovers this. You are not forced to mock all library classes. Some folks say it's only a unit test when it's tested in complete isolation. This is silly. You have to trust the library modules -- it's a feature of your architecture. Rather than mock them, just use them. (It is important to write mocks for your own newly-developed libraries. It's also important to mock libraries that make expensive API calls.)
In most cases, you can (and should) test your classes with the real library modules. This will find the misspelled attribute name.
Also, now that you know that attributes are dynamic, it's really easy to verify that the attribute exists. How?
Use interactive Python to explore the classes before writing too much code.
Remember, Python is not Java and it's not C. You can execute Python interactively and determine immediately if you've spelled something wrong. Writing a lot of code without doing any interactive confirmation is -- simply -- the wrong way to use Python.
A little interactive exploration will find misspelled attribute names.
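For instance, a quick interactive session (the library and attribute names here are hypothetical) catches the typo before it lands in any source file:

>>> import some_library           # hypothetical library
>>> obj = some_library.Thing()
>>> [a for a in dir(obj) if not a.startswith('_')]
['birth_date', 'name']
>>> hasattr(obj, 'birth_dt')      # the abbreviation you guessed
False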
Finally -- for your own classes -- you can wrap updatable attributes as properties. This makes it easier to debug any misspelled attribute names. Again, you know to check for this. You can use interactive development to confirm the attribute names.
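A minimal sketch of that technique (Person and its attribute are made up): every legitimate update funnels through one method, giving you a single place to log or set a breakpoint:

class Person(object):
    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # single choke point for every legitimate update
        self._name = value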
Fussing around with __setattr__ creates problems. In some cases, we actually need to add attributes to an object. Why? It's simpler than creating a whole subclass for one special case where we have to maintain more state information.
Other things you can say:
I was burned by a C program that absolutely could not be made to work because of ______. [Insert any known C-language problem you want here. No array bounds checking, for example] Does that make C fatally flawed?
I was burned by a DBA who changed a column name and all the SQL broke. It's painful to unit test all of it. Does that make the relational database fatally flawed?
I was burned by a sys admin who changed a directory's permissions and my application broke. It was nearly impossible to find. Does that make the OS fatally flawed?
I was burned by a COBOL program where someone changed the copybook, forgot to recompile the program, and we couldn't debug it because the source looked perfect. COBOL, however, actually is fatally flawed, so this isn't a good example.
There are code analyzers like pylint that will warn you if you add an attribute outside of __init__. PyDev has nice support for that. Such errors are also very easy to find with a debugger.
If the possibility to make mistakes is enough for him to consider a language "fatally flawed", I don't think you can convince him otherwise. The more you can do with a language, the more you can do wrong with the language. It's a caveat of flexibility—but that's true for any language.
You can use the __slots__ class attribute to limit the attributes that instances can have. Attempting to set an attribute that's not explicitly listed will raise an AttributeError. There are some complications that arise with subclassing; see the Python data model reference for details.
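A minimal sketch (Person and its slots are made up for illustration):

class Person(object):
    __slots__ = ('name', 'birth_date')  # the only attributes instances may have

p = Person()
p.name = 'Ada'
p.nmae = 'Grace'  # raises AttributeError: 'Person' object has no attribute 'nmae'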
A tool like pylint or pychecker may be able to detect this.
He's effectively ruling out an entire class of programming languages -- dynamically-typed languages -- because of one hard lesson learned. He can use only statically-typed languages if he wishes and still have a very productive career as a programmer, but he is certainly going to have deep frustrations with them as well. Will he then conclude that they are fatally-flawed?
I think your friend has misplaced his frustration with the language. His real problem is a lack of debugging techniques. Teach him how to break a program down into small pieces and examine the output. Like a manual unit test, this finds any inconsistency, and proves or discards any assumption.
I had a similar bad experience with Python when I first started; it took me three months to get over it. Having a tool that warned about this would have been nice back then...