Proper way of having a unique identifier in Python? - python

Basically, I have a list like: [START, 'foo', 'bar', 'spam', 'eggs', END] and the START/END identifiers are necessary so I can compare against them later on. Right now, I have it set up like this:
START = object()
END = object()
This works fine, but it suffers from the problem of not working with pickling. I tried doing it the following way, but it seems like a terrible method of accomplishing this:
class START(object):pass
class END(object):pass
Could anybody share a better means of doing this? Also, the example I have set up above is just an oversimplification of a different problem.

If you want an object that's guaranteed to be unique, and also guaranteed to be restored to exactly the same identity if pickled and unpickled right back, then top-level functions, classes, class instances, and (if you care about is rather than ==) also lists and other mutables, are all fine. I.e., any of:
# work for == as well as is
class START(object): pass
def START(): pass
class Whatever(object): pass
START = Whatever()
# if you don't care for "accidental" == and only check with `is`
START = []
START = {}
START = set()
None of these is terrible, and none has any special advantage (depending on whether you care about == or just is). Probably def wins by dint of generality, conciseness, and lighter weight.
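As a quick sanity check (a sketch of my own, with module-level definitions assumed), here's why the ==-versus-is distinction above matters and how the pickle round trip behaves:
import pickle

class START(object): pass

restored = pickle.loads(pickle.dumps(START))
print(restored is START)   # True -- classes pickle by reference, identity survives
print([] == [])            # True -- "accidental" equality, which is why lists need `is`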

You can define a Symbol class for handling START and END.
class Symbol:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return isinstance(other, Symbol) and other.value == self.value
    def __repr__(self):
        return "<sym: %r>" % self.value
    def __str__(self):
        return str(self.value)
START = Symbol("START")
END = Symbol("END")
# test pickle
import pickle
assert START == pickle.loads(pickle.dumps(START))
assert END == pickle.loads(pickle.dumps(END))

Actually, I like your solution.
A while back I was hacking on a Python module, and I wanted to have a special magical value that could not appear anywhere else. I spent some time thinking about it and the best I came up with is the same trick you used: declare a class, and use the class object as the special magical value.
When you are checking for the sentinel, you should of course use the is operator, for object identity:
for x in my_list:
    if x is START:
        pass  # handle start of list
    elif x is END:
        pass  # handle end of list
    else:
        pass  # handle item from list

If your list didn't have strings, I'd just use "start", "end" as Python makes the comparison O(1) due to interning.
If you do need strings, but not tuples, the complete cheapskate method is:
[("START",), 'foo', 'bar', 'spam', eggs', ("END",)]
PS: I was sure your list was numbers before, not strings, but I can't see any revisions so I must have imagined it
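A quick sanity check of the cheapskate marker (my own sketch, using the list above):
import pickle
items = [("START",), 'foo', 'bar', 'spam', 'eggs', ("END",)]
restored = pickle.loads(pickle.dumps(items))
print(restored[0] == ("START",))   # True -- the tuple marker survives pickling by value
print(("START",) == "START")       # False -- cannot collide with a plain string item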

I think maybe this would be easier to answer if you were more explicit about what you need this for, but my inclination if faced with a problem like this would be something like:
>>> import os
>>> START = os.urandom(16).encode('hex')
>>> END = os.urandom(16).encode('hex')
Pros of this approach, as I'm seeing it
Your markers are strings (can pickle or otherwise easily serialize, eg to JSON or a DB, without any special effort)
Very unlikely to collide either accidentally or on purpose
Will serialize and deserialize to identical values, even across process restarts, which (I think) would not be the case for object() or an empty class.
Cons(?)
Each time they are newly chosen they will be completely different. (This being good or bad depends on details you have not provided, I would think).
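Note that .encode('hex') is Python 2 only; assuming Python 3, the same idea would look like this (a sketch, not part of the original answer):
import os
START = os.urandom(16).hex()   # Python 3 spelling of .encode('hex')
END = os.urandom(16).hex()
print(START, END)              # two 32-character hex strings, new ones each run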

Related

Error "decoding str is not supported" Custom str Class Python

I was building a class that would work like a switch;
such a custom class would be a string with an inner list, where the string identifies the current position of the switch ("on") and the list covers all the possible positions (["on", "off", "halfway on", "kinda dead but still on", ...]);
While initializing an instance of this class, I would accept an indefinite number of possible positions;
Hence the code
class switch(str):
    pos = []
    def __init__(self, *positions):
        self.pos = [str(el) for el in positions]
        self = self.pos[0]
    def swap(self):
        try:
            self = self.pos[self.pos.index(self)+1]
        except IndexError:
            self = self.pos[0]
and the call
mood=switch("upbeat","depressed","horny")
That quite doesn't work. The error goes like this:
TypeError: decoding str is not supported
Of course there are better ways to build this Switch class (this was just a quick draft), and eventually I did; still, I didn't understand the origin of that error - searching Google wasn't useful - and it annoyed me enough that I finally decided to sign up here on Stack and post about it.
What do you guys think about it?
From Googling, it seems the error shows up when strings are combined like "str1","str2" rather than "str1"+"str2"; so I think it's safe to keep my suspicions around the *positions part, with it not unpacking the strings as separate variables but guessing that I wanted them joined into one, and failing because I wasn't using the + concatenator.
That's what came to my mind.
Otherwise, if I'm wrong, I've only seen the error pop up while playing with formats and the like, which seems pretty far off in this case.
You can subclass str. But if you are passing more than a single argument, you need to intercept that at __new__() so python doesn't try to interpret the other arguments when creating the object.
Note this is just to demonstrate — it won't work for your problem
class switch(str):
    def __new__(cls, *content):
        return str.__new__(cls, content[0])
    def __init__(self, *positions):
        self.pos = [str(el) for el in positions]

mood = switch("upbeat", "depressed", "horny")
# prints as expected and has string methods
print(mood, mood.upper())
# upbeat UPBEAT
# has your instance attribute
print(mood.pos)
# ['upbeat', 'depressed', 'horny']
The problem is that strings are not mutable. So this is doomed from the beginning if the idea is to change the value of the string in-place. You can instead use collections.UserString for this. This acts like a string but gives you a data property to store the actual value. With this, your idea might work:
from collections import UserString

class Switch(UserString):
    def __init__(self, *positions):
        self.pos = [str(el) for el in positions]
        # store the actual string in .data
        self.data = self.pos[0]
    def swap(self):
        try:
            self.data = self.pos[self.pos.index(self) + 1]
        except IndexError:
            self.data = self.pos[0]
mood=Switch("upbeat", "depressed", "horny")
# still acts like a string
print(mood, mood.upper())
# upbeat UPBEAT
# but now you can swap
mood.swap()
print(mood, mood.upper())
# depressed DEPRESSED
I think Python thinks you're using the second constructor of str() (https://docs.python.org/3/library/stdtypes.html#str).
e.g. switch(b"upbeat", "utf-8") would work.

Why does `for x in list[None:None]:` work?

I have a script that attempts to read the begin and end point for a subset via a binary search, these values are then used to create a slice for further processing.
I noticed that when these variables did not get set (the search returned None) the code would still run and in the end I noticed that a slice spanning from None to None works as if examining the entire list (see example below).
#! /usr/bin/env python
list = [1,2,3,4,5,6,7,8,9,10]
for x in list[None:None]:
    print x
Does anyone know why the choice was made to treat list[None:None] simply as list[:]? At least that's what I think happens (correct me if I'm wrong). I personally would think that throwing a TypeError would be desirable in such a case.
Because None is the default for slice positions. You can use either None or omit the value altogether, at which point None is passed in for you.
None is the default because you can use a negative stride, at which point the default start and end positions change. Compare list[0:len(list):-1] to list[None:None:-1], for example.
Python uses None for 'value not specified' throughout the standard library; this is no exception.
Note that if your class implements the object.__getitem__ hook, you'll get passed a slice() object with its start, stop and step attributes set to None as well:
>>> class Foo(object):
...     def __getitem__(self, key):
...         print key
...
>>> Foo()[:]
slice(None, None, None)
Since Foo() doesn't even implement a __len__, having the defaults use None is entirely logical here.
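For a concrete illustration (a throwaway list of my own), here is how the None defaults adapt to a negative stride where explicit 0/len defaults would not:
lst = [1, 2, 3, 4, 5]
print(lst[None:None])      # [1, 2, 3, 4, 5] -- same as lst[:]
print(lst[None:None:-1])   # [5, 4, 3, 2, 1] -- same as lst[::-1]
print(lst[0:len(lst):-1])  # [] -- explicit 0/len defaults break with a negative stride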
I also think that list[None:None] is interpreted as list[:]. This is handy behavior because you can do something like this:
return list[some_params.get('start'):some_params.get('end')]
If the list slicing wouldn't work with None, you would have to check if start and end were None yourself:
if some_params.get('start') and some_params.get('end'):
    return list[some_params.get('start'):some_params.get('end')]
elif some_params.get('start'):
    return list[some_params.get('start'):]
elif some_params.get('end'):
    return list[:some_params.get('end')]
else:
    return list[:]
Fortunately this is not the case in Python :).
None is the usual representation for "parameter not given", so you can communicate the fact to a function. You will often see functions or methods declared like this
def f(p=None):
    if p is None:
        p = some_default_value()
I guess this makes the choice clear: by using None you can tell the slicer to use its default values.
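Tying the two points together, a tiny sketch (the helper name is mine) of a function that simply forwards unspecified bounds to the slice defaults:
def window(seq, start=None, end=None):
    # None falls straight through to the slice defaults
    return seq[start:end]

print(window([1, 2, 3, 4, 5]))          # [1, 2, 3, 4, 5]
print(window([1, 2, 3, 4, 5], end=3))   # [1, 2, 3]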

Multiple Value Return Pattern in Python (not tuple, list, dict, or object solutions)

There were several discussions on "returning multiple values in Python", e.g. [1], [2].
This is not the "multiple-value-return" pattern I'm trying to find here.
No matter what you use (tuple, list, dict, an object), it is still a single return value and you need to parse that return value (structure) somehow.
The real benefit of multiple return value is in the upgrade process. For example,
originally, you have
def func():
    return 1

print func() + func()
Then you decided that func() can return some extra information but you don't want to break previous code (or modify them one by one). It looks like
def func():
    return 1, "extra info"
value, extra = func()
print value # 1 (expected)
print extra # extra info (expected)
print func() + func() # (1, 'extra info', 1, 'extra info') (not expected, we want the previous behaviour, i.e. 2)
The previous code (func() + func()) is now broken, and you have to fix it.
I don't know whether I made the question clear... You can see the CLISP example. Is there an equivalent way to implement this pattern in Python?
EDIT: I put the above clisp snippets online for your quick reference.
Let me put two use cases here for multiple return value pattern. Probably someone can have alternative solutions to the two cases:
Better support smooth upgrade. This is shown in the above example.
Have simpler client-side code. See the alternative solutions I have so far below. Using exceptions can make the upgrade process smooth, but it costs more code.
Current alternatives: (they are not "multi-value-return" constructions, but they can be engineering solutions that satisfy some of the points listed above)
tuple, list, dict, an object. As is said, you need certain parsing on the client side, e.g. if ret.success == True: blabla. You need ret = func() before that. It's much cleaner to write if func() == True: blabla.
Use Exception. As is discussed in this thread, when the "False" case is rare, it's a nice solution. Even in this case, the client side code is still too heavy.
Use an arg, e.g. def func(main_arg, detail=[]). The detail can be a list or dict or even an object depending on your design. func() returns only the original simple value; details go into the detail argument. The problem is that the client needs to create a variable before the invocation in order to hold the details.
Use a "verbose" indicator, e.g. def func(main_arg, verbose=False). When verbose == False (default; and the way client is using func()), return original simple value. When verbose == True, return an object which contains simple value and the details.
Use a "version" indicator. Same as "verbose" but we extend the idea there. In this way, you can upgrade the returned object for multiple times.
Use global detail_msg. This is like the old C-style error_msg. In this way, functions can always return simple values. The client side can refer to detail_msg when necessary. One can put detail_msg in global scope, class scope, or object scope depending on the use cases.
Use a generator: yield simple_return and then yield detailed_return. This solution is nice on the callee's side. However, the caller has to do something like func().next() and func().next().next(). You can wrap it with an object and override __call__ to simplify it a bit, e.g. func()(), but it looks unnatural from the caller's side.
Use a wrapper class for the return value. Override the class's methods to mimic the behaviour of original simple return value. Put detailed data in the class. We have adopted this alternative in our project in dealing with bool return type. see the relevant commit: https://github.com/fqj1994/snsapi/commit/589f0097912782ca670568fe027830f21ed1f6fc (I don't have enough reputation to put more links in the post... -_-//)
Here are some solutions:
Based on @yupbank's answer, I formalized it into a decorator, see github.com/hupili/multiret
The 8th alternative above says we can wrap a class. This is the current engineering solution we adopted. In order to wrap more complex return values, we may use a metaclass to generate the required wrapper class on demand. I have not tried it, but this sounds like a robust solution; a minimal sketch of the wrapping idea follows below.
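Here is a minimal sketch of that wrapper idea (the names are hypothetical and not from the linked project; the project wrapped bool, which cannot be subclassed, so int stands in here):
class IntWithDetail(int):
    def __new__(cls, value, detail=None):
        obj = int.__new__(cls, value)
        obj.detail = detail          # extra information rides along
        return obj

def func():
    return IntWithDetail(1, "extra info")

print(func() + func())   # 2 -- old func() + func() callers keep working
print(func().detail)     # extra info -- new callers can dig deeper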
Try inspect?
I did some experimenting, and it's not very elegant, but at least it's doable... and it works :)
import inspect
from functools import wraps
import re

def f1(*args):
    return 2

def f2(*args):
    return 3, 3

PATTERN = dict()
PATTERN[re.compile('(\w+) f()')] = f1
PATTERN[re.compile('(\w+), (\w+) = f()')] = f2

def execute_method_for(call_str):
    for regex, f in PATTERN.iteritems():
        if regex.findall(call_str):
            return f()

def multi(f1, f2):
    def liu(func):
        @wraps(func)
        def _(*args, **kwargs):
            frame, filename, line_number, function_name, lines, index = \
                inspect.getouterframes(inspect.currentframe())[1]
            call_str = lines[0].strip()
            return execute_method_for(call_str)
        return _
    return liu

@multi(f1, f2)
def f():
    return 1

if __name__ == '__main__':
    print f()
    a, b = f()
    print a, b
Your case does need code editing. However, if you need a hack, you can use function attributes to return extra values without modifying the return values.
def attr_store(varname, value):
    def decorate(func):
        setattr(func, varname, value)
        return func
    return decorate

@attr_store('extra', None)
def func(input_str):
    func.extra = {'hello': input_str + " ,How r you?", 'num': 2}
    return 1

print(func("John") + func("Matt"))
print(func.extra)
Demo : http://codepad.org/0hJOVFcC
However, be aware that function attributes will behave like static variables, and you will need to assign values to them with care; appends and other modifiers will act on previously saved values.
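To make that caution concrete, a tiny follow-up using the func defined above:
# the attribute only ever holds the most recent call's details
func("John")
func("Matt")
print(func.extra['hello'])   # "Matt ,How r you?" -- the "John" details are gone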
The trick is to use a design pattern that avoids hard-coding the actual operation when you process the result; instead, pass the operation in as a parameter. For your case, you can use the following code:
def x():
    # return 1
    return 1, 'x' * 1

def f(op, f1, f2):
    print eval(str(f1) + op + str(f2))

f('+', x(), x())
If you want a generic solution for more complicated situations, you can extend the f function and specify the processing operation via the op parameter.

Python: Efficient way to put multiple variables through a function

I have a bunch of variables that are equal to values pulled from a database. Sometimes, the database doesn't have a value and returns "NoneType". I'm taking these variables and using them to build an XML file. When the variable is NoneType, it causes the XML value to read "None" rather than blank as I'd prefer.
My question is: Is there an efficient way to go through all the variables at once and search for a NoneType and, if found, turn it to a blank string?
ex.
from types import *
[Connection to database omitted]
color = database.color
size = database.size
shape = database.shape
name = database.name
... etc
I could obviously do something like this:
if type(color) is NoneType:
    color = ""
but that would become tedious for the 15+ variables I have. Is there a more efficient way to go through and check each variable for its type and then correct it, if necessary? Something like creating a function to do the check/correction and having an automated way of passing each variable through that function?
All the solutions given here will make your code shorter and less tedious, but if you really have a lot of variables I think you will appreciate this, since it won't make you add even a single extra character of code for each variable:
class NoneWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped
    def __getattr__(self, name):
        value = getattr(self.wrapped, name)
        if value is None:
            return ''
        else:
            return value
mydb = NoneWrapper(database)
color = mydb.color
size = mydb.size
shape = mydb.shape
name = mydb.name
# All of these will be set to an empty string if their
# original value in the database is none
Edit
I thought it was obvious, but I keep forgetting it takes time until all the fun Python magickery becomes second nature. :) So how does NoneWrapper do its magic? It's very simple, really. Each Python class can define some "special" method names that are easy to identify, because they are always surrounded by two underscores on each side. The most common and well-known of these methods is __init__(), which initializes each instance of the class, but there are many other useful special methods, and one of them is __getattr__(). This method is called whenever someone tries to access an attribute of an instance of your class, and you can override it to customize attribute access.
What NoneWrapper does is override __getattr__, so whenever someone tries to read an attribute of mydb (which is a NoneWrapper instance), it reads the attribute with the specified name from the wrapped object (in this case, database) and returns it - unless its value is None, in which case it returns an empty string.
I should add here that both object variables and methods are attributes, and, in fact, for Python they are essentially the same thing: all attributes are variables that could be changed, and methods just happen to be variables whose value is set to a function of a special type (a bound method). So you can also use __getattr__() to control access to methods, which could lead to many interesting uses.
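For example, here is a hypothetical variation of the wrapper (my own sketch, not part of the solution above) that uses __getattr__ to log every method call on the wrapped object:
class LoggingWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped
    def __getattr__(self, name):
        value = getattr(self.wrapped, name)
        if callable(value):
            def logged(*args, **kwargs):
                print("calling %s" % name)
                return value(*args, **kwargs)
            return logged
        return value

text = LoggingWrapper("hello")
print(text.upper())   # prints "calling upper", then "HELLO"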
The way I would do it, although I don't know if it is the best, would be to put the variables you want to check into a list and then use a for statement to iterate through it.
check_vars = [color, size, shape, name]
for var in check_vars:
    if type(var) is NoneType:
        var = ""
To add variables all you have to do is add them to the list.
If you're already getting them one at a time, it's not that much longer to write:
def none_to_blank(value):
    if value is None:
        return ""
    return value
color = none_to_blank(database.color)
size = none_to_blank(database.size)
shape = none_to_blank(database.shape)
name = none_to_blank(database.name)
Incidentally, use of "import *" is generally discouraged. Import only what you're using.
you can simply use:
color = database.color or ""
Another way is to use a function:
def filter_None(var):
    return "" if var is None else var
color = filter_None(database.color)
I don't know how the database object is structured but another solution is to modify the database object like:
def myget(self, varname):
    value = self.__dict__[varname]
    return "" if (value is None) else value
DataBase.myget = myget
database = DataBase(...)
[...]
color = database.myget("color")
You can do better using descriptors or properties.
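For instance, a minimal property-based sketch (assuming you control the DataBase class and it keeps the raw value in an attribute like _color; the names here are hypothetical):
class DataBase(object):
    def __init__(self, color=None):
        self._color = color    # raw value, may be None

    @property
    def color(self):
        # translate None to "" once, here, instead of at every call site
        return "" if self._color is None else self._color

database = DataBase()
print(repr(database.color))   # '' instead of None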

pythonic way to rewrite an assignment in an if statement

Is there a pythonic preferred way to do this that I would do in C++:
for s in str:
    if r = regex.match(s):
        print r.groups()
I really like that syntax, imo it's a lot cleaner than having temporary variables everywhere. The only other way that's not overly complex is
for s in str:
    r = regex.match(s)
    if r:
        print r.groups()
I guess I'm complaining about a pretty pedantic issue. I just miss the former syntax.
How about
for r in [regex.match(s) for s in str]:
    if r:
        print r.groups()
or a bit more functional
for r in filter(None, map(regex.match, str)):
    print r.groups()
Perhaps it's a bit hacky, but using a function object's attributes to store the last result allows you to do something along these lines:
def fn(regex, s):
    fn.match = regex.match(s)  # save result
    return fn.match

for s in strings:
    if fn(regex, s):
        print fn.match.groups()
Or more generically:
def cache(value):
    cache.value = value
    return value

for s in strings:
    if cache(regex.match(s)):
        print cache.value.groups()
Note that although the "value" saved can be a collection of a number of things, this approach is limited to holding only one such at a time, so more than one function may be required to handle situations where multiple values need to be saved simultaneously, such as in nested function calls, loops or other threads. So, in accordance with the DRY principle, rather than writing each one, a factory function can help:
def Cache():
    def cache(value):
        cache.value = value
        return value
    return cache

cache1 = Cache()
for s in strings:
    if cache1(regex.match(s)):
        # use another at same time
        cache2 = Cache()
        if cache2(somethingelse) != cache1.value:
            process(cache2.value)
        print cache1.value.groups()
        ...
There's a recipe to make an assignment expression but it's very hacky. Your first option doesn't compile so your second option is the way to go.
## {{{ http://code.activestate.com/recipes/202234/ (r2)
import sys

def set(**kw):
    assert len(kw) == 1
    a = sys._getframe(1)
    a.f_locals.update(kw)
    return kw.values()[0]

#
# sample
#
A = range(10)
while set(x=A.pop()):
    print x
## end of http://code.activestate.com/recipes/202234/ }}}
As you can see, production code shouldn't touch this hack with a ten foot, double bagged stick.
This might be an overly simplistic answer, but would you consider this:
for s in str:
    if regex.match(s):
        print regex.match(s).groups()
There is no pythonic way to do something that is not pythonic, and it's that way for a reason. First, allowing statements in the conditional part of an if statement would make the grammar pretty ugly: if you allowed assignment statements in if conditions, why not also allow if statements? How would you actually write that? C-like languages don't have this problem, because they don't have assignment statements; they make do with just assignment expressions and expression statements.
The second reason is that the way
if foo = bar:
    pass
looks very similar to
if foo == bar:
    pass
Even if you are clever enough to type the correct one, and even if most of the members of your team are sharp enough to notice it, are you sure that the one you are looking at now is exactly what is supposed to be there? It's not unreasonable for a new dev to see this and just "fix" it (one way or the other), and now it's definitely wrong.
Whenever I find that my loop logic is getting complex I do what I would with any other bit of logic: I extract it to a function. In Python it is a lot easier than some other languages to do this cleanly.
So extract the code that just generates the items of interest:
def matching(strings, regex):
    for s in strings:
        r = regex.match(s)
        if r: yield r
and then when you want to use it, the loop itself is as simple as they get:
for r in matching(strings, regex):
    print r.groups()
Yet another answer is to use the "Assign and test" recipe for allowing assigning and testing in a single statement published in O'Reilly Media's July 2002 1st edition of the Python Cookbook and also online at Activestate. It's object-oriented, the crux of which is this:
# from http://code.activestate.com/recipes/66061
class DataHolder:
    def __init__(self, value=None):
        self.value = value
    def set(self, value):
        self.value = value
        return value
    def get(self):
        return self.value
This can optionally be modified slightly by adding the custom __call__() method shown below to provide an alternative way to retrieve instances' values -- which, while less explicit, seems like a completely logical thing for a 'DataHolder' object to do when called, I think.
    def __call__(self):
        return self.value
Allowing your example to be re-written:
r = DataHolder()
for s in strings:
    if r.set(regex.match(s)):
        print r.get().groups()
        # or
        print r().groups()
As also noted in the original recipe, if you use it a lot, adding the class and/or an instance of it to the __builtin__ module to make it globally available is very tempting despite the potential downsides:
import __builtin__
__builtin__.DataHolder = DataHolder
__builtin__.data = DataHolder()
As I mentioned in my other answer to this question, it must be noted that this approach is limited to holding only one result/value at a time, so more than one instance is required to handle situations where multiple values need to be saved simultaneously, such as in nested function calls, loops or other threads. That doesn't mean you should use it or the other answer, just that more effort will be required.
