Possible Duplicate: Python: Get object by id
(This question was closed as a duplicate 10 years ago.)
Typically when you see the string representation of an instance it looks something like <module.space.Class at 0x108181bc>. I am curious if it's possible to take this string and get a handle on the instance.
Something like
obj_instance = get_instance_from_repr("<module.space.Class at 0x1aa031b>")
I don't believe it's possible, but if it is, it would be really useful.
You can do it, at least for the subset of objects tracked by the garbage collector, in a nasty and unreliable way.
import gc

def lookup_object_by_repr(myrep):
    # Scan every object the garbage collector knows about and compare reprs.
    for obj in gc.get_objects():
        if repr(obj) == myrep:
            return obj
You can do even more things if you write a simple C extension and inspect the memory address.
Of course there's no way to get an object from its repr in general, because any class can return anything it wants in its repr. But in the specific case where the repr has a pointer in it, you're basically asking how to get an object from a pointer. Which (in CPython, at least) is the exact same thing as asking how to get an object from its id.
There's no built-in way to do this, even though it would be pretty simple. And that's intentional, because this is almost always a bad idea.
If you think you have a practical purpose for getting an object by id, you're probably wrong. Especially since you have to deal with object lifecycle issues that are usually taken care of automatically (and behind your back, which makes it hard to take care of them manually even if you try). For example, if an object goes away, a weakref to that object nulls out, but the id still points at deallocated memory, or at another object created later, or at half of one object and half of another. And then of course whatever you do in CPython isn't going to work in PyPy, Jython, or IronPython…
But if you're just tinkering around to learn how the CPython runtime works, then this is a legitimate question, and there's a legitimate answer. Probably the best way to do it is to create a C extension module. If you don't know how to create an extension module, you really ought to learn that first and come back to this question later. If you do, the function you want to implement is pretty simple:
static PyObject *objectFromId(PyObject *self, PyObject *args) {
    PyObject *obj;
    /* "n" parses a Py_ssize_t; writing it into a PyObject* slot
       reinterprets the integer id as an object pointer. This only
       works in CPython, and only if the object is still alive. */
    if (!PyArg_ParseTuple(args, "n", &obj)) return NULL;
    Py_INCREF(obj);  /* hand the caller a new reference */
    return obj;
}
You could do this in Pyrex/Cython instead of C, if you wanted. Or you could just call PyObj_FromPtr from the _ctypes module directly. But if you're trying to learn how things work at this level, it makes more sense to make what's happening explicit, and to explicitly put your "dealing with C pointers" code in C.
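For reference, here's roughly what the _ctypes shortcut looks like; this is a CPython-only tinkering sketch, and it is only safe while the object behind the address is still alive:

from _ctypes import PyObj_FromPtr  # CPython implementation detail

x = [1, 2, 3]
# In CPython, id(x) is the object's memory address; reinterpret it
# as an object pointer and get back a new reference.
same_x = PyObj_FromPtr(id(x))
assert same_x is x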
On further thought, if you wanted to build a sort of best-guess objectFromRepr for tinkering purposes, you could. Basically:
Use a regex that matches the common angle-bracket form; if it matches, call objectFromId with the address.
Otherwise, call eval.
You might want to put an intermediate step in there: for many types, the repr is in the form of a constructor call like Class(arg1, arg2), and in that case it might be better to match that form with another regex and call the constructor directly, instead of using eval. If nothing else, it's probably more instructive, and that's the point of this exercise, right?
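Putting the pieces together, a minimal best-guess sketch (using the _ctypes shortcut in place of the C extension; the regex only covers the common "<... at 0x...>" form, and both paths are CPython-only tinkering):

import re
from _ctypes import PyObj_FromPtr  # CPython only

_ADDR_RE = re.compile(r"<.+ at 0x([0-9a-fA-F]+)>$")

def object_from_repr(rep):
    # Step 1: the angle-bracket form carries an address we can reuse,
    # provided the original object is still alive.
    m = _ADDR_RE.match(rep)
    if m:
        return PyObj_FromPtr(int(m.group(1), 16))
    # Step 2: fall back to eval for constructor-style reprs.
    return eval(rep)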
Obviously this is an even more terrible idea in real-life code than objectFromId. It's almost always bad to use eval, and eval(repr(x)) == x is not actually guaranteed to be true, and you're actually getting a new reference to the same object in the id case but a new object with the same value in the eval case, and…
PS: After you learn how all of this works in CPython, it's probably worth doing a similar exercise in another interpreter like Jython or IronPython (especially for the repr of a native Java/.NET/etc. object).
Why is everything in Python an object? According to what I read, everything, including functions, is an object. It's not the same in other languages. So what prompted this shift of approach, to treat everything, even functions, as objects?
The power of everything being an object is that you can define behavior for each object. For example, a function being an object gives you an easy way to access its documentation for introspection:

print(function.__doc__)
The alternative would be to provide a library of functions that take a function and return its interesting properties:

import function_lib

print(function_lib.get_doc(function))
Making int, str, etc. classes means that you can extend those provided types in interesting ways for your problem domain.
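For instance, here's a quick sketch of extending a built-in type (the Path class is made up for illustration):

class Path(str):
    # str is an ordinary class, so it can be subclassed like any other.
    def join_child(self, other):
        return Path(self.rstrip("/") + "/" + other)

p = Path("/usr").join_child("local")
print(p)                   # /usr/local
print(isinstance(p, str))  # True: still usable wherever a str is expected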
In my opinion, the "everything is an object" approach is great in Python. In this language, you don't care about what kind of objects you have to handle, but about how they can interact. A function is just an object that you can __call__, a list is just an object that you can __iter__. Why should we divide data into non-overlapping groups? An object can behave like a function when we call it, but also like an array when we access it.
This means that you don't think of your "function" as "I want an array of integers and I return the sum of it" but more as "I will try to iterate over the thing someone gave me and try to add the elements together; if something goes wrong, I will tell the caller by raising an error, and they will have to modify their behavior".
The most interesting example is __add__. When you try something like Object1 + Object2, Python will ask (nicely ^^) Object1 to try to add itself to Object2 (Object1.__add__(Object2)). There are two scenarios here: either Object1 knows how to add itself to Object2 and everything is fine, or it returns NotImplemented and Python asks Object2 to __radd__ itself to Object1. Just with this mechanism, you can teach your objects to add themselves to any other object, and you can manage commutativity, ...
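A minimal sketch of that protocol (Meters is a made-up class):

class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Only knows how to add other Meters; otherwise let Python
        # try the right-hand operand.
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented

    def __radd__(self, other):
        # Called when the left operand's __add__ returned NotImplemented,
        # e.g. the implicit 0 + Meters(1) at the start of sum().
        if other == 0:
            return self
        return NotImplemented

print(sum([Meters(1), Meters(2)]).value)  # 3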
why is everything in Python, an object?
Python (unlike many other languages) is a truly object-oriented language.
When everything is an object, it becomes easier to search, manipulate, or access things. (But everything comes at the cost of speed.)
what prompted this shift of approach, to treat everything including, even functions, as objects?
"Necessity is the mother of invention"
I currently need to partially create a Python object and be able to update it for some time. However, it must no longer be updatable once the object has been used as a dictionary key.
Of course there is the solution of marking the fields as private, which is mostly a warning for the programmer, and I will actually go for that solution.
But I stumbled on another solution and I want to know if this could be a good idea, or if it could simply go horribly wrong. Here it is:
class FooIsNowImmutable(Exception):
    pass

class Foo():
    def __init__(self, bar):
        self._bar = bar
        self._has_been_hashed = False

    def __hash__(self):
        self._has_been_hashed = True
        return self._bar.__hash__()

    def __eq__(self, other):
        return self._bar == other._bar

    def __copy__(self):
        return Foo(self._bar)

    def set_bar(self, bar):
        if self._has_been_hashed:
            raise FooIsNowImmutable
        else:
            self._bar = bar
Some testing proved it to work as desired: I can no longer use set_bar once I have, say, used my object as a dictionary key.
What do you think? Is it a good idea? Will it turn against me? Is there an easier way? And is it somehow a bad practice?
Doing it that way is a bit fragile, since you never know when something might be used as a dictionary key, or when its hash might be called for some other reason. An object isn't supposed to "know" whether it's being used as a dictionary key. It will be confusing to have code that may raise an exception just because some other code somewhere else put the object in a dictionary.
Following the Python philosophy of "explicit is better than implicit", it would be safer to just give your object a method called .finalize() or .lock() or something, which would set a flag indicating the object is immutable. You could also reverse the exception-raising logic, so that __hash__ raises an exception if the object is not yet locked (rather than mutation raising an exception if the object has been hashed).
You would then call .lock() when you're ready to make the object immutable. It makes more sense to explicitly set it immutable when you're done with whatever mutating you need to do, rather than implicitly assuming that as soon as you use it in a dictionary, you're done mutating it.
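A sketch of that reversed logic (the method names and the choice of TypeError are mine, not from the question):

class Foo:
    def __init__(self, bar):
        self._bar = bar
        self._locked = False

    def lock(self):
        self._locked = True

    def __hash__(self):
        if not self._locked:
            raise TypeError("Foo must be lock()ed before it can be hashed")
        return hash(self._bar)

    def __eq__(self, other):
        return self._bar == other._bar

    def set_bar(self, bar):
        if self._locked:
            raise TypeError("Foo is locked and can no longer be mutated")
        self._bar = bar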
You can do that, but I'm not sure I'd recommend it. Why do you need it in a dictionary?
It requires a lot more awareness of the state of the object... think of a file object. Would you put one in a dictionary? It has to be open for a lot of the functions to work, and once it's closed, you can't use them anymore. The user has to be aware, in the surrounding code, of which state the object is in.
For files, that makes sense; after all, you don't normally hold files open across large parts of your program, or if you do, they have well-defined init and close code. Something similar has to make sense for your object, especially if you have some APIs that take the object but expect an immutable version, and others that take the same object but expect to change it...
I have used the lock method before, and it works well for complex, read-only objects that you want to initialize once and then make sure no one is messing with. E.g., you load a copy of a (say, English) dictionary from disk... it has to be mutable while you are populating it, but you don't want anyone to accidentally modify it, so locking it is a great idea. I would only use it if it was a one-time lock though; something you are locking and unlocking seems like a recipe for disaster.
There are two solutions IMHO if you just want to create a version you can use in hashable places. First is to explicitly create an immutable copy when you put it in a dictionary - tuple and frozenset are examples of this sort of behaviour... if you want to put a list in a dict, you can't, but you can create a tuple from it first, and that can be hashed. Create a frozen version of your object, then it's very clear by looking at the object type whether it's expected to be mutable or immutable, and so cases where it was used incorrectly are easily seen.
Second, if you really want it to be hashable but need it to be mutable... that's actually legal, but implemented a little differently. It goes back to the idea of hashing... hashing is used both for optimized lookups and for equality.
The first is to ensure you can get objects back... you put something in a dictionary, and it hashes to a value of 4 - goes in slot 4. Then you modify it. Then you go to look it up again, and now it hashes to 9 - there's nothing in slot 9, or worse, a different object, and you're broken.
Second is equality - for things like sets, I need to know if my object is already in there. I can hash, but if you know anything about hashing, you still need to check equality to check for hash collisions.
That doesn't preclude supporting __hash__ and being mutable, but it's unusual. You need to decide what makes your item the same item, even though it's mutable. What you need to do then is give each object a unique id. Technically, you may be able to get away with id(self), but something like the uuid module is probably a better possibility. The UUID4 (or technically, the hash of the UUID4) is what determines both the hash and equality; two objects that contain the same UUID4 should be the exact same object, and two objects that have the exact same data but different UUID4s would be different objects.
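A sketch of that identity-based approach (Record is a made-up class):

import uuid

class Record:
    # Mutable but hashable: identity is fixed by a UUID at creation,
    # so the hash stays stable no matter how the data changes.
    def __init__(self, data):
        self.data = data            # freely mutable
        self._uid = uuid.uuid4()    # never changes

    def __hash__(self):
        return hash(self._uid)

    def __eq__(self, other):
        return isinstance(other, Record) and self._uid == other._uid

r = Record([1, 2])
d = {r: "row"}
r.data.append(3)  # mutation doesn't break the dictionary lookup
print(d[r])       # row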
I have a class holding a table (a list of lists). This class should return a row pointer similar to SQL's. For this row pointer I would like to weakref the table row (a list) with a weakref.proxy. However, I would like to add additional capabilities to the row pointer, e.g. overwrite the __getitem__ method to allow access via, say, the column names.
Is there an easy way to get the same behaviour (translating access on my object to the object being referenced), or do I have to reimplement all the special methods?
As an easy way I could think of inheritance, but since I found no doc on weakref.ProxyType, I won't even try to inherit from that (how would I init it?). The other option could be to define some special method to always redirect "special" (__xxx__) function calls to the referred object, though that seems to be impossible.
I researched some more and found this:
http://code.activestate.com/recipes/496741-object-proxying/
http://pypi.python.org/pypi/ProxyTypes
So in short, one can forward all calls (I think the recipe on ActiveState is better), but I have not found a way to implement:
a = proxy([1, 2, 3])
b = a
print(type(b))  # desired output: <class 'list'>
I will settle for just working with an object which pretty much behaves like the list.
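For what it's worth, a minimal sketch of the forwarding approach (Row and RowProxy are made-up names). Two caveats: plain lists can't be weakly referenced in CPython, so this uses a list subclass, and special methods must be defined on the proxy class itself, because Python looks them up on the type, not the instance (which is why the recipes above generate them):

import weakref

class Row(list):
    # Subclasses of list, unlike list itself, support weak references.
    pass

class RowProxy:
    def __init__(self, row, column_names):
        self._ref = weakref.ref(row)
        self._columns = column_names

    def _target(self):
        row = self._ref()
        if row is None:
            raise ReferenceError("row no longer exists")
        return row

    def __getattr__(self, name):
        # Forward ordinary attribute access to the underlying row.
        return getattr(self._target(), name)

    def __getitem__(self, key):
        # Extra capability: allow lookup by column name.
        if isinstance(key, str):
            key = self._columns.index(key)
        return self._target()[key]

    def __len__(self):
        return len(self._target())

row = Row([1, "Alice"])
p = RowProxy(row, ["id", "name"])
print(p["name"])  # Alice
print(len(p))     # 2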
Is it common in Python to keep testing for types when working in an OOP fashion?
class Foo():
    def __init__(self, barObject):
        self.setBarObject(barObject)

    def setBarObject(self, barObject):
        if isinstance(barObject, Bar):
            self.bar = barObject
        else:
            # throw exception, log, etc.
            raise TypeError("expected a Bar instance")

class Bar():
    pass
Or I can use a looser approach, like:
class Foo():
    def __init__(self, barObject):
        self.bar = barObject

class Bar():
    pass
Nope, in fact it's overwhelmingly common not to test for types, as in your second approach. The idea is that a client of your code (i.e. some other programmer who uses your class) should be able to pass any kind of object that has all the appropriate methods or properties. If it doesn't happen to be an instance of some particular class, that's fine; your code never needs to know the difference. This is called duck typing, because of the adage "If it quacks like a duck and flies like a duck, it might as well be a duck" (well, that's not the actual adage, but I got the gist of it, I think).
One place you'll see this a lot is in the standard library, with any functions that handle file input or output. Instead of requiring an actual file object, they'll take anything that implements the read() or readline() method (depending on the function), or write() for writing. In fact you'll often see this in the documentation, e.g. with tokenize.generate_tokens, which I just happened to be looking at earlier today:
The generate_tokens() generator requires one argument, readline, which must be a callable object which provides the same interface as the readline() method of built-in file objects (see section File Objects). Each call to the function should return one line of input as a string.
This allows you to use a StringIO object (like an in-memory file), or something wackier like a dialog box, in place of a real file.
In your own code, just access whatever properties of an object you need, and if it's the wrong kind of object, one of the properties you need won't be there and it'll throw an exception.
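A small sketch of the same idea (count_lines is a made-up function):

from io import StringIO

def count_lines(f):
    # Anything that can be iterated over line by line works here:
    # a real file, a StringIO, a socket wrapped with makefile(), ...
    return sum(1 for _ in f)

print(count_lines(StringIO("a\nb\nc\n")))  # 3, no real file needed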
I think that it's good practice to check input for type. It's reasonable to assume that if you asked a user to give one data type they might give you another, so you should code to defend against this.
However, it seems like a waste of time (both writing and running the program) to check the types of values the program generates itself, independent of input. Checking types the way a strongly typed language does isn't an important defense against programmer error.
So basically, check input but nothing else so that code can run smoothly and users don't have to wonder why they got an exception rather than a result.
If your alternative to the type check is an else containing exception handling, then you should really consider duck typing one tier up: support any object that provides the methods you require from the input, and work inside a try.
You can then except that (and except as specifically as possible).
The final result wouldn't be unlike what you have there, but a lot more versatile and Pythonic.
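A sketch of that EAFP style (sumBar and its error message are hypothetical):

def sumBar(bar):
    # No up-front type check: just use the interface, and translate
    # failures into one specific, catchable error for the caller.
    try:
        return sum(bar.values)
    except (AttributeError, TypeError) as exc:
        raise TypeError("sumBar() needs an object with an iterable "
                        "'values' attribute of numbers") from exc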
Everything else that needed to be said about the actual question, whether it's common/good practice or not, I think has been answered excellently by David's answer.
I agree with some of the above answers, in that I generally never check for type from one function to another.
However, as someone else mentioned, anything accepted from a user should be checked, and for things like this I use regular expressions. The nice thing about using regular expressions to validate user input is that not only can you verify that the data is in the correct format, but you can parse the input into a more convenient form, like a string into a dictionary.
I'm not sure if I like Python's dynamic-ness. It often results in me forgetting to check a type, trying to call an attribute, and getting the "NoneType (or any other) has no attribute x" error. A lot of them are pretty harmless, but if not handled correctly they can bring down your entire app/process/etc.
Over time I got better predicting where these could pop up and adding explicit type checking, but because I'm only human I miss one occasionally and then some end-user finds it.
So I'm interested in your strategy to avoid these. Do you use type-checking decorators? Maybe special object wrappers?
Please share...
forgetting to check a type
This doesn't make much sense. You so rarely need to "check" a type. You simply run unit tests and if you've provided the wrong type object, things fail. You never need to "check" much, in my experience.
trying to call an attribute and getting the NoneType (or any other) has no attribute x error.
Unexpected None is a plain-old bug. 80% of the time, I omitted the return. Unit tests always reveal these.
Of those that remain, 80% of the time, they're plain old bugs due to an "early exit" which returns None because someone wrote an incomplete return statement. These if foo: return structures are easy to detect with unit tests. In some cases, they should have been if foo: return somethingMeaningful, and in still other cases, they should have been if foo: raise Exception("Foo").
The rest are dumb mistakes from misreading the APIs. Generally, mutator functions don't return anything. Sometimes I forget. Unit tests find these quickly, since basically nothing works right.
That covers the "unexpected None" cases pretty solidly. Easy to unit test for. Most of the mistakes involve fairly trivial-to-write tests for some pretty obvious species of mistakes: wrong return; failure to raise an exception.
Other "has no attribute X" errors are really wild mistakes where a totally wrong type was used. That's either really wrong assignment statements or really wrong function (or method) calls. They always fail elaborately during unit testing, requiring very little effort to fix.
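For illustration, here's the kind of trivial test that flushes out an "early exit" returning None (categorize is a made-up function):

import unittest

def categorize(n):
    # Buggy: falls off the end (implicitly returns None) for n == 0.
    if n > 0:
        return "positive"
    if n < 0:
        return "negative"

class TestCategorize(unittest.TestCase):
    def test_zero(self):
        self.assertEqual(categorize(0), "zero")  # fails loudly: None != "zero"

if __name__ == "__main__":
    unittest.main()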
A lot of them are pretty harmless but if not handled correctly they can bring down your entire app/process/etc.
Um... Harmless? If it's a bug, I pray that it brings down my entire app as quickly as possible so I can find it. A bug that doesn't crash my app is the most horrible situation imaginable. "Harmless" isn't a word I'd use for a bug that fails to crash my app.
If you write good unit tests for all of your code, you should find the errors very quickly when testing code.
You can also use decorators to enforce the types of arguments and return values.
>>> @accepts(int, int, int)
... @returns(float)
... def average(x, y, z):
...     return (x + y + z) / 2
...
>>> average(5.5, 10, 15.0)
TypeWarning: 'average' method accepts (int, int, int), but was given (float, int, float)
15.25
>>> average(5, 10, 15)
TypeWarning: 'average' method returns (float), but result is (int)
15
I'm not really a fan of them, but I can see their usefulness.
One tool to help you keep your pieces fitting together well is interfaces. zope.interface is the most notable package in the Python world for using interfaces. Check out http://wiki.zope.org/zope3/WhatAreInterfaces and http://glyph.twistedmatrix.com/2009/02/explaining-why-interfaces-are-great.html to start to get an idea how interfaces and z.i in particular work. Interfaces can prove very useful in large Python codebases.
Interfaces are no substitute for testing. Reasonably comprehensive testing is especially important in highly dynamic languages like Python, where there are kinds of bugs that could not exist in a statically typed language. Tests will also help you catch the sorts of bugs that are not unique to dynamic languages. Fortunately, developing in Python means that testing is easy (due to the flexibility), and you have plenty of time to write tests with all the time you saved by using Python.
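A small sketch of what zope.interface usage looks like (IWriter and ConsoleWriter are made up; zope.interface is a third-party package):

from zope.interface import Interface, implementer
from zope.interface.verify import verifyClass

class IWriter(Interface):
    def write(data):
        """Write a chunk of data."""

@implementer(IWriter)
class ConsoleWriter:
    def write(self, data):
        print(data)

# Catches drift between the interface and the class at test time,
# rather than deep inside production code:
verifyClass(IWriter, ConsoleWriter)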
One advantage of TDD is that you end up writing code that is easier to write tests for.
Writing code first and then the tests can result in code that superficially works the same, but is much harder to write 100% coverage tests for.
Each case is likely to be different.
It might make sense to have a decorator to check whether a particular parameter is None (or some other unexpected value) if you use it in a bunch of places.
Maybe it is appropriate to use the Null pattern: if the code is blowing up because you are setting the initial value to None, you could instead set the initial value to a null version of the object.
More and more wrappers can add up to quite a performance hit, though, so it's always better to write code from the start that avoids the corner cases.
forgetting to check a type
With duck typing, it shouldn't be necessary to check a type. But that's theory; in reality you will often want to validate input parameters (e.g. checking a UUID with a regex). For that purpose, I created myself some handy decorators for simple type and return type checking, which are called like this:
@decorators.params(0, int, 2, str)    # first parameter must be an integer, the third a string
@decorators.returnsOrNone(int, long)  # must return an int/long value or None
def doSomething(integerParam, noMatterWhatParam, stringParam):
    ...
For everything else I mostly use assertions. Of course one often forgets to check a parameter, so it's necessary to test and to test often.
trying to call an attribute
Happens to me very seldom. Actually I often use methods instead of direct access to attributes (the "good" old getter/setter approach sometimes).
because I'm only human I miss one occasionally and then some end-user finds it
"Software is always completed at the customers'." - An anti-pattern which you should solve with unit tests that handle all possible cases in a function. Easier said than done, but it helps...
As for other common Python mistakes (mistyped names, wrong imports, ...), I'm using Eclipse with PyDev for projects (not for small scripts). PyDev warns you about most of the simple kinds of mistakes.
I haven’t done a lot of Python programming, but I’ve done no programming at all in statically typed languages, so I don’t tend to think about things in terms of variable types. That might explain why I haven’t come across this problem much. (Although the small amount of Python programming I’ve done might explain that too.)
I do enjoy Python 3’s revised handling of strings (i.e. all strings are unicode, everything else is just a stream of bytes), because in Python 2 you might not notice TypeErrors until dealing with unusual real world string values.
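A quick illustration of that Python 3 behavior:

# Python 3 refuses to mix text and bytes up front, instead of failing
# only on unusual runtime values the way Python 2's str/unicode mix did:
b"header: " + "value"  # TypeError: can't concat str to bytes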
You can hint your IDE via function docs, for example: http://www.pydev.org/manual_adv_type_hints.html; in JavaScript, jsDoc helps in a similar way.
But at some point you will face errors that a typed language would catch immediately, without unit tests (via the IDE's compilation and type inference).
Of course this does not remove the benefit of unit tests, static analysis, and assertions. For larger projects I tend to use statically typed languages, because they have very good IDE support (excellent autocompletion, heavy refactoring...). You can still use scripting or a DSL for some sub-parts of the project.
Something you can use to simplify your code is using the Null Object Design Pattern (to which I was introduced in Python Cookbook).
Roughly, the goal with Null objects is to provide an 'intelligent' replacement for the often used primitive data type None in Python or Null (or Null pointers) in other languages. These are used for many purposes including the important case where one member of some group of otherwise similar elements is special for whatever reason. Most often this results in conditional statements to distinguish between ordinary elements and the primitive Null value.
This object just eats the "lack of attribute" errors, so you can avoid checking for attributes' existence.
It's nothing more than
class Null(object):
    def __init__(self, *args, **kwargs):
        "Ignore parameters."
        return None

    def __call__(self, *args, **kwargs):
        "Ignore method calls."
        return self

    def __getattr__(self, mname):
        "Ignore attribute requests."
        return self

    def __setattr__(self, name, value):
        "Ignore attribute setting."
        return self

    def __delattr__(self, name):
        "Ignore deleting attributes."
        return self

    def __repr__(self):
        "Return a string representation."
        return "<Null>"

    def __str__(self):
        "Convert to a string and return it."
        return "Null"
With this, if you do Null("any", "params", "you", "want").attribute_that_doesnt_exists() it won't explode, but just silently become the equivalent of pass.
Normally you'd do something like
if obj.attr:
    obj.attr()
With this, you just do:
obj.attr()
and forget about it. Beware that extensive use of the Null object can potentially hide bugs in your code.
I tend to use
if x is None:
    raise ValueError('x cannot be None')
But this will only work with the actual None value.
A more general approach is to test for the necessary attributes before you try to use them. For example:
def write_data(f):
    # Here we expect f is a file-like object. But what if it's not?
    if not hasattr(f, 'write'):
        raise ValueError('write_data requires a file-like object')
    # Now we can do stuff with f that assumes it is a file-like object
The point of this code is that instead of getting an error message like "NoneType has no attribute write", you get "write_data requires a file-like object". The actual bug isn't in write_data(), and isn't really a problem with NoneType at all. The actual bug is in the code that calls write_data(). The key is to communicate that information as directly as possible.