We have a subclass of str (call it MyStr), and I need to be able to control how str.join interacts with my subclass.
At minimum, a join of all MyStr's should produce another MyStr, and joining of MyStr and "plain" str should throw a TypeError.
Currently, this is what happens: (MyStr subclasses unicode)
>>> m = MyStr(':')
>>> m.join( [MyStr('A'), MyStr('B')] )
u'A:B'
>>> ':'.join( [MyStr('A'), 'B', u'C'] )
u'A:B:C'
Couldn't your class just override join:
class MyStr(unicode):
    def join(self, strs):
        # your code here
This will at least cover the case of MyStr(...).join(...)
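A minimal sketch of such an override (assuming you want every item to be a MyStr and the result to be a MyStr; the exact policy is up to you):
class MyStr(unicode):
    def join(self, strs):
        strs = list(strs)  # materialize, in case a generator was passed
        # Reject anything that isn't a MyStr, including plain str/unicode
        for item in strs:
            if not isinstance(item, MyStr):
                raise TypeError("can only join MyStr instances, got %s"
                                % type(item).__name__)
        # Delegate to the built-in join, then re-wrap the result
        return MyStr(unicode.join(self, strs))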
After @bukzor's comment, I looked up how this works, and it looks like join is a C function that always returns a unicode object when called with a unicode separator.
The code can be seen here. Take a look at the PyUnicode_Join function, especially this line:
res = _PyUnicode_New(res_alloc);
So, the result of PyUnicode_Join will always be an instance of PyUnicode.
The only error case I can see is if the input isn't unicode:
/* Convert item to Unicode. */
if (! PyUnicode_Check(item) && ! PyString_Check(item)) {
    PyErr_Format(PyExc_TypeError,
                 "sequence item %zd: expected string or Unicode,"
                 " %.80s found",
                 i, Py_TYPE(item)->tp_name);
    goto onError;
}
So I don't think it's possible to make this case fail (at least, not while your object extends from unicode):
':'.join( [MyStr('A'), 'B', u'C'] )
join() is a str method. If you want to end up with a MyStr object afterwards, you need to use a MyStr object to do the join.
If you want a TypeError you'll have to not inherit from str and provide all of str's methods yourself (at least the ones that you need). It's quite possible, though, that this will make them largely useless for normal string operations.
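As a rough sketch of that approach (illustrative only; the MyStr name, the wrapped-value attribute and the tiny method list are all assumptions, and you would have to re-expose every str method you rely on):
class MyStr(object):
    """String-like wrapper that deliberately does NOT inherit from str/unicode."""

    def __init__(self, value):
        self._value = unicode(value)

    def join(self, items):
        items = list(items)
        # Only MyStr items are allowed; anything else raises TypeError
        if not all(isinstance(i, MyStr) for i in items):
            raise TypeError("MyStr.join only accepts MyStr items")
        return MyStr(self._value.join(i._value for i in items))

    def upper(self):
        # Every str method you need has to be re-exposed like this
        return MyStr(self._value.upper())

    def __unicode__(self):
        return self._value
Because MyStr no longer subclasses unicode, ':'.join([MyStr('A'), 'B']) now also fails with the "expected string or Unicode" TypeError shown in the C code above.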
I've noticed that when an instance with an overloaded __str__ method is passed to the print function as an argument, it prints as intended. However, when passing a container that contains one of those instances to print, it uses the __repr__ method instead. That is to say, print(x) displays the correct string representation of x, and print(x, y) works correctly, but print([x]) or print((x, y)) prints the __repr__ representation instead.
First off, why does this happen? Secondly, is there a way to correct that behavior of print in this circumstance?
The problem with the container using the objects' __str__ would be the total ambiguity -- what would it mean, say, if print L showed [1, 2]? L could be ['1, 2'] (a single item list whose string item contains a comma) or any of four 2-item lists (since each item can be a string or int). The ambiguity of type is common for print of course, but the total ambiguity for number of items (since each comma could be delimiting items or part of a string item) was the decisive consideration.
I'm not sure why exactly the __str__ method of a list returns the __repr__ of the objects contained within - so I looked it up: [Python-3000] PEP: str(container) should call str(item), not repr(item)
Arguments for it:
-- containers refuse to guess what the user wants to see on str(container) - surroundings, delimiters, and so on;
-- repr(item) usually displays type information - apostrophes around strings, class names, etc.
So it's more clear about what exactly is in the list (since the object's string representation could have commas, etc.). The behavior is not going away, per Guido "BDFL" van Rossum:
Let me just save everyone a lot of time and say that I'm opposed to this change, and that I believe that it would cause way too much disturbance to be accepted this close to beta.
Now, there are two ways to resolve this issue for your code.
The first is to subclass list and implement your own __str__ method.
class StrList(list):
    def __str__(self):
        string = "["
        for index, item in enumerate(self):
            string += str(item)
            if index != len(self) - 1:
                string += ", "
        return string + "]"

class myClass(object):
    def __str__(self):
        return "myClass"
    def __repr__(self):
        return object.__repr__(self)
And now to test it:
>>> objects = [myClass() for _ in xrange(10)]
>>> print objects
[<__main__.myClass object at 0x02880DB0>, #...
>>> objects = StrList(objects)
>>> print objects
[myClass, myClass, myClass #...
>>> import random
>>> sample = random.sample(objects, 4)
>>> print sample
[<__main__.myClass object at 0x02880F10>, ...
I personally think this is a terrible idea. Some functions, such as random.sample as demonstrated, return plain list objects even when you pass in a subclass. So if you take this route there may be a lot of result = StrList(function(mylist)) calls, which could be inefficient. It's also a bad idea because you'll probably end up with half of your code using regular list objects (since you don't print them) and the other half using StrList objects, which can make your code messier and more confusing. Still, the option is there, and this is the only way to get the print function (or statement, for 2.x) to behave the way you want it to.
The other solution is just to write your own function strList() which returns the string the way you want it:
def strList(theList):
    string = "["
    for index, item in enumerate(theList):
        string += str(item)
        if index != len(theList) - 1:
            string += ", "
    return string + "]"
>>> mylist = [myClass() for _ in xrange(10)]
>>> print strList(mylist)
[myClass, myClass, myClass #...
Both solutions require that you refactor existing code, unfortunately - but the behavior of str(container) is here to stay.
Because when you print the list, generally you're looking from the programmer's perspective, or debugging. If you meant to display the list, you'd process its items in a meaningful way, so repr is used.
If you want your objects to be printed while in containers, define __repr__:
class MyObject:
    def __str__(self): return ""
    __repr__ = __str__
Of course, repr should return a string that could be used as code to recreate your object, but you can do what you want.
Without subclassing dict, what would a class need to be considered a mapping, so that it can be passed to a function with **?
from abc import ABCMeta

class uobj:
    __metaclass__ = ABCMeta

uobj.register(dict)

def f(**k): return k

o = uobj()
f(**o)
# outputs: f() argument after ** must be a mapping, not uobj
At least to the point where it throws errors about missing mapping functionality, so I can begin implementing it.
I reviewed Emulating container types, but simply defining the magic methods has no effect, and using ABCMeta to register it as a dict validates the subclass assertions but fails isinstance(o, dict). Ideally, I don't even want to use ABCMeta.
The __getitem__() and keys() methods will suffice:
>>> class D:
        def keys(self):
            return ['a', 'b']
        def __getitem__(self, key):
            return key.upper()

>>> def f(**kwds):
        print kwds

>>> f(**D())
{'a': 'A', 'b': 'B'}
If you're trying to create a Mapping — not just satisfy the requirements for passing to a function — then you really should inherit from collections.abc.Mapping. As described in the documentation, you need to implement just:
__getitem__
__len__
__iter__
The Mixin will implement everything else for you: __contains__, keys, items, values, get, __eq__, and __ne__.
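For example, a minimal sketch built on collections.abc.Mapping might look like the following (the class name and the key-lowercasing behaviour are just illustrative):
from collections.abc import Mapping

class LowerKeyMapping(Mapping):
    """Read-only mapping that lowercases its keys (illustrative only)."""

    def __init__(self, data):
        self._data = {k.lower(): v for k, v in data.items()}

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(self._data)

def accepts(**kwargs):
    return kwargs

m = LowerKeyMapping({'A': 1, 'B': 2})
print(accepts(**m))              # {'a': 1, 'b': 2} -- usable with **
print('a' in m, m.get('c', 0))   # __contains__ and get come free from the mixin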
The answer can be found by digging through the source.
When attempting to use a non-mapping object with **, the following error is given:
TypeError: 'Foo' object is not a mapping
If we search CPython's source for that error, we can find the code that causes that error to be raised:
case TARGET(DICT_UPDATE): {
    PyObject *update = POP();
    PyObject *dict = PEEK(oparg);
    if (PyDict_Update(dict, update) < 0) {
        if (_PyErr_ExceptionMatches(tstate, PyExc_AttributeError)) {
            _PyErr_Format(tstate, PyExc_TypeError,
                          "'%.200s' object is not a mapping",
                          Py_TYPE(update)->tp_name);
PyDict_Update is actually dict_merge, and the error is thrown when dict_merge returns a negative number. If we check the source for dict_merge, we can see what leads to -1 being returned:
/* We accept for the argument either a concrete dictionary object,
* or an abstract "mapping" object. For the former, we can do
* things quite efficiently. For the latter, we only require that
* PyMapping_Keys() and PyObject_GetItem() be supported.
*/
if (a == NULL || !PyDict_Check(a) || b == NULL) {
    PyErr_BadInternalCall();
    return -1;
The key part being:
For the latter, we only require that PyMapping_Keys() and PyObject_GetItem() be supported.
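To see that requirement in practice, here's a small Python 3 check (the class names are purely illustrative): an object with only keys() and __getitem__ unpacks fine, while a bare object raises the TypeError quoted above.
class Foo:
    pass

class KeyedFoo:
    # Only keys() and __getitem__ are defined -- no dict subclassing, no ABCs
    def keys(self):
        return ['x', 'y']
    def __getitem__(self, key):
        return key.upper()

def show(**kwargs):
    return kwargs

print(show(**KeyedFoo()))   # {'x': 'X', 'y': 'Y'}
print({**KeyedFoo()})       # {'x': 'X', 'y': 'Y'} -- dict unpacking works too

try:
    print({**Foo()})
except TypeError as exc:
    print(exc)              # 'Foo' object is not a mapping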
I find that in lots of different projects I'm writing a lot of code where I need to evaluate a (moderately complex, possibly costly-to-evaluate) expression and then do something with it (e.g. use it for string formatting), but only if the expression is True/non-None.
For example in lots of places I end up doing something like the following:
result += '%s '%( <complexExpressionForGettingX> ) if <complexExpressionForGettingX> else ''
... which I guess is basically a special-case of the more general problem of wanting to return some function of an expression, but only if that expression is True, i.e.:
f( e() ) if e() else somedefault
but without re-typing the expression (or re-evaluating it, in case it's a costly function call).
Obviously the required logic can be achieved easily enough in various long-winded ways (e.g. by splitting the expression into multiple statements and assigning the expression to a temporary variable), but that's a bit grungy. Since this seems like quite a generic problem, and since Python is pretty cool (especially for functional stuff), I wondered if there's a nice, elegant, concise way to do it?
My current best options are either defining a short-lived lambda to take care of it (better than multiple statements, but a bit hard to read):
(lambda e: '%s ' % e if e else '')( <complexExpressionForGettingX> )
or writing my own utility function like:
def conditional(expr, formatStringIfTrue, default='')
... but since I'm doing this in lots of different code-bases I'd much rather use a built-in library function or some clever python syntax if such a thing exists
I like one-liners, definitely. But sometimes they are the wrong solution.
In professional software development, if the team size is > 2, you spend more time understanding code someone else wrote than writing new code. The one-liners presented here are definitely confusing, so just use two lines (even though you mentioned multiple statements in your post):
X = <complexExpressionForGettingX>
result += '%s ' % X if X else ''
This is clear, concise, and everybody immediately understands what's going on here.
Python doesn't have expression scope (Is there a Python equivalent of the Haskell 'let'), presumably because the abuses and confusion of the syntax outweigh the advantages.
If you absolutely have to use an expression scope, the least worst option is to abuse a generator comprehension:
result += next('%s '%(e) if e else '' for e in (<complexExpressionForGettingX>,))
You could define a conditional formatting function once, and use it repeatedly:
def cond_format(expr, form, alt):
    if expr:
        return form % expr
    else:
        return alt
Usage:
result += cond_format(<costly_expression>, '%s ', '')
After hearing the responses (thanks guys!) I'm now convinced there's no way to achieve what I want in Python without defining a new function (or lambda function), since that's the only way to introduce a new scope.
For best clarity I decided this needed to be implemented as a reusable function (not a lambda), so for the benefit of others I thought I'd share the function I finally came up with. It's flexible enough to cope with multiple additional format-string arguments (in addition to the main argument used to decide whether to do the formatting at all), and it comes with doctests to show correctness and illustrate usage. (If you're not sure how the **kwargs thing works, just ignore it; it's an implementation detail and was the only way I could see to support an optional defaultValue= keyword argument after the variable list of format-string arguments.)
def condFormat(formatIfTrue, expr, *otherFormatArgs, **kwargs):
    """ Helper for returning the result of string.format() on a specified
    expression if the expression's bool(expr) is True (i.e. it's not None,
    an empty list, an empty string or the number zero), or a default string
    (typically '') if not.

    For more complicated cases where the operation on expr is more complicated
    than a format string, or where a different condition is required, use:
    (lambda e=myexpr: '' if not e else '%s ' % e)

    formatIfTrue -- a format string suitable for use with string.format(), e.g.
        "{}, {}" or "{1}, {0:d}".
    expr -- the expression to evaluate. May be of any type.
    defaultValue -- set this keyword arg to override the default return value of ''.

    >>> 'x' + condFormat(', {}.', 'foobar')
    'x, foobar.'
    >>> 'x' + condFormat(', {}.', [])
    'x'
    >>> condFormat('{}; {}', 123, 456, defaultValue=None)
    '123; 456'
    >>> condFormat('{0:,d}; {2:d}; {1:d}', 12345, 678, 9, defaultValue=None)
    '12,345; 9; 678'
    >>> condFormat('{}; {}; {}', 0, 678, 9, defaultValue=None) == None
    True
    """
    defaultValue = kwargs.pop('defaultValue', '')
    assert not kwargs, 'unexpected kwargs: %s' % kwargs
    if not bool(expr):
        return defaultValue
    if otherFormatArgs:
        return formatIfTrue.format(*((expr,) + otherFormatArgs))
    else:
        return formatIfTrue.format(expr)
Presumably, you want to do this repeatedly to build up a string. With a more global view, you might find that filter (or itertools.ifilter) does what you want to the collection of values.
You'll wind up with something like this:
' '.join(map(str, filter(None, <iterable of <complexExpressionForGettingX>>)))
Passing None as the first argument to filter means it keeps any value that is true. As a concrete example with a simple expression:
>>> ' '.join(map(str, filter(None, range(-3, 3))))
'-3 -2 -1 1 2'
Depending on how you're calculating the values, it may be that an equivalent list or generator comprehension would be more readable.
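For instance, a sketch of the comprehension form (values_iterable is a stand-in for whatever produces your X values):
# Generator-expression equivalent of the filter/map one-liner above
result = ' '.join(str(x) for x in values_iterable if x)

# With the concrete range example:
print(' '.join(str(x) for x in range(-3, 3) if x))   # -3 -2 -1 1 2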
I have a small Python script which I use every day. It basically reads a file, and for each line I apply different string functions like strip(), replace(), etc. I'm constantly editing the script and commenting things out to change the functions. Depending on the file I'm dealing with, I use different functions. For example, I've got a file where, for each line, I need to use line.replace(' ', '') and line.strip().
What's the best way to make all of these part of my script? So I can just assign numbers to each function and say "apply functions 1 and 4 to each line".
First of all, many functions in the string module – including string.strip and string.replace – are deprecated. The following answer uses string methods instead. (Instead of string.strip(" Hello "), I use the equivalent " Hello ".strip().)
Here's some code that will simplify the job for you. The following code assumes that whatever methods you call on your string, that method will return another string.
class O(object):
    c = str.capitalize
    r = str.replace
    s = str.strip

def process_line(line, *ops):
    i = iter(ops)
    while True:
        try:
            op = i.next()
            args = i.next()
        except StopIteration:
            break
        line = op(line, *args)
    return line
The O class exists so that your highly abbreviated method names don't pollute your namespace. When you want to add more string methods, you add them to O in the same format as those given.
The process_line function is where all the interesting things happen. First, here is a description of the argument format:
The first argument is the string to be processed.
The remaining arguments must be given in pairs.
The first argument of the pair is a string method. Use the shortened method names here.
The second argument of the pair is a list representing the arguments to that particular string method.
The process_line function returns the string that emerges after all these operations have performed.
Here is some example code showing how you would use the above code in your own scripts. I've separated the arguments of process_line across multiple lines to show the grouping of the arguments. Of course, if you're just hacking away and using this code in day-to-day scripts, you can compress all the arguments onto one line; this actually makes it a little easier to read.
f = open("parrot_sketch.txt")
for line in f:
p = process_line(
line,
O.r, ["He's resting...", "This is an ex-parrot!"],
O.c, [],
O.s, []
)
print p
Of course, if you very specifically wanted to use numerals, you could name your functions O.f1, O.f2, O.f3… but I'm assuming that wasn't the spirit of your question.
If you insist on numbers, you can't do much better than a dict (as gimel suggests) or list of functions (with indices zero and up). With names, though, you don't necessarily need an auxiliary data structure (such as gimel's suggested dict), since you can simply use getattr to retrieve the method to call from the object itself or its type. E.g.:
def all_lines(somefile, methods):
    """Apply a sequence of methods to all lines of some file and yield the results.

    Args:
      somefile: an open file or other iterable yielding lines
      methods: a string that's a whitespace-separated sequence of method names.
        (note that the methods must be callable without arguments beyond the
        str to which they're being applied)
    """
    tobecalled = [getattr(str, name) for name in methods.split()]
    for line in somefile:
        for tocall in tobecalled: line = tocall(line)
        yield line
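A hypothetical usage of all_lines might look like this (the file name and the method names are just examples):
# Strip whitespace from, then lowercase, every line of an (assumed) input file
with open('input.txt') as f:
    for cleaned in all_lines(f, 'strip lower'):
        print cleaned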
It is possible to map string operations to numbers:
>>> import string
>>> ops = {1:string.split, 2:string.replace}
>>> my = "a,b,c"
>>> ops[1](",", my)
[',']
>>> ops[1](my, ",")
['a', 'b', 'c']
>>> ops[2](my, ",", "-")
'a-b-c'
>>>
But maybe string descriptions of the operations will be more readable.
>>> ops2={"split":string.split, "replace":string.replace}
>>> ops2["split"](my, ",")
['a', 'b', 'c']
>>>
Note:
Instead of using the string module, you can use the str type for the same effect.
>>> ops={1:str.split, 2:str.replace}
To map names (or numbers) to different string operations, I'd do something like
OPERATIONS = dict(
    strip = str.strip,
    lower = str.lower,
    removespaces = lambda s: s.replace(' ', ''),
    maketitle = lambda s: s.title().center(80, '-'),
    # etc
)

def process(myfile, ops):
    for line in myfile:
        for op in ops:
            line = OPERATIONS[op](line)
        yield line
which you use like this
for line in process(afile, ['strip', 'removespaces']):
    ...
My idea of program:
I have a dictionary:
options = { 'string' : select_fun(function pointer),
            'float'  : select_fun(function pointer),
            'double' : select_fun(function pointer)
          }
Whatever type comes in, the single function select_fun(function pointer) gets called.
Inside select_fun(function pointer), I will have different functions for float, double and so on.
Depending on the function pointer, the specified function will get called.
I don't know whether my programming knowledge is good or bad, but still I need help.
Could you be more specific on what you're trying to do? You don't have to do anything special to get function pointers in Python -- you can pass around functions like regular objects:
def plus_1(x):
    return x + 1

def minus_1(x):
    return x - 1

func_map = {'+': plus_1, '-': minus_1}

func_map['+'](3)  # returns plus_1(3) ==> 4
func_map['-'](3)  # returns minus_1(3) ==> 2
You can use the type() built-in function to detect the type of a value.
Say, if you want to check if a certain name hold a string data, you could do this:
if type(this_is_string) == type('some random string'):
    # this_is_string is indeed a string
So in your case, you could do it like this:
options = { 'some string'    : string_function,
            (float)(123.456) : float_function,
            (int)(123)       : int_function
          }

def call_option(arg):
    # loop through the dictionary
    for (k, v) in options.iteritems():
        # if found a matching type...
        if type(k) == type(arg):
            # call the matching function
            func = options[k]
            func(arg)
Then you can use it like this:
call_option('123') # string_function gets called
call_option(123.456) # float_function gets called
call_option(123) # int_function gets called
I don't have a python interpreter nearby and I don't program in Python much so there may be some errors, but you should get the idea.
EDIT: As per @Adam's suggestion, there are built-in type constants that you can check against directly, so a better approach would be:
import types

options = { types.StringType : string_function,
            types.FloatType  : float_function,
            types.IntType    : int_function,
            types.LongType   : long_function
          }

def call_option(arg):
    for (k, v) in options.iteritems():
        # check if arg is of type k
        if type(arg) == k:
            # call the matching function
            func = options[k]
            func(arg)
And since the key itself is comparable to the value of the type() function, you can just do this:
def call_option(arg):
    func = options[type(arg)]
    func(arg)
Which is more elegant :-) save for some error-checking.
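If you want that error-checking, one hedged way to add it (the fallback behaviour chosen here is only an example) is to use dict.get:
def call_option(arg):
    # Look up a handler for the exact type; fail with a clearer message
    # than a bare KeyError if none is registered
    func = options.get(type(arg))
    if func is None:
        raise TypeError("no handler registered for %s" % type(arg).__name__)
    return func(arg)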
EDIT: And for ctypes support, after some fiddling around, I've found that the ctypes.[type_name_here] types are actually implemented as classes. So this method still works; you just need to use the ctypes.c_xxx type classes.
options = { ctypes.c_long    : c_long_processor,
            ctypes.c_ulong   : c_unsigned_long_processor,
            types.StringType : python_string_processor
          }
call_option = lambda x: options[type(x)](x)
Looking at your example, it seems to be some C procedure directly translated into Python.
For this reason, I think there could be a design issue, because usually in Python you do not care about the type of an object, but only about the messages you can send to it.
Of course, there are plenty of exceptions to this approach, but in this case I would still try encapsulating it in some polymorphism; e.g.
class StringSomething(object):
    data = None
    def data_function(self):
        string_function_pointer(self.data)

class FloatSomething(object):
    data = None
    def data_function(self):
        float_function_pointer(self.data)
etc.
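A sketch of how these wrappers might be used (assuming the *_function_pointer callables exist and that data gets assigned somewhere):
# Dispatch happens through ordinary method lookup, with no type checks
things = [StringSomething(), FloatSomething()]
things[0].data = "hello"
things[1].data = 1.5
for thing in things:
    thing.data_function()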
Again, all of this under the assumption you are translating from a procedural language to python; if it is not the case, then discard my answer :-)
Functions are first-class objects in Python, so you can pass them as arguments to other functions just as you would any other object, such as a string or an integer.
There is no single-precision floating point type in Python. Python's float corresponds to C's double.
def process(anobject):
    if isinstance(anobject, basestring):
        # anobject is a string
        fun = process_string
    elif isinstance(anobject, (float, int, long, complex)):
        # anobject is a number
        fun = process_number
    else:
        raise TypeError("expected string or number but received: '%s'" % (
            type(anobject),))
    return fun(anobject)
There is functools.singledispatch, which allows you to create a generic function:
from functools import singledispatch
from numbers import Number

@singledispatch
def process(anobject):  # default implementation
    raise TypeError("'%s' type is not supported" % type(anobject))

@process.register(str)
def _(anobject):
    # handle strings here
    return process_string(anobject)

process.register(Number)(process_number)  # use existing function for numbers
On Python 2, similar functionality is available as pkgutil.simplegeneric().
Here are a couple of code examples of using generic functions; a short usage sketch of process() itself follows after the list:
Remove whitespaces and newlines from JSON file
Make my_average(a, b) work with any a and b for which f_add and d_div are defined. As well as builtins
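A hedged usage sketch of the process() generic function defined above (it assumes the process_string and process_number helpers from the earlier snippet exist):
process("some text")   # dispatches to the str implementation
process(3.14)          # float is a Number, so process_number handles it
process([1, 2, 3])     # no registered implementation -> default raises TypeError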
Maybe you want to call the same select_fun() every time, with a different argument. If that is what you mean, you need a different dictionary:
>>> options = {'string' : str, 'float' : float, 'double' : float }
>>> options
{'double': <type 'float'>, 'float': <type 'float'>, 'string': <type 'str'>}
>>> def call_option(val, func):
... return func(val)
...
>>> call_option('555',options['float'])
555.0
>>>