I've been playing for a bit with startswith() and I've discovered something interesting:
>>> tup = ('1', '2', '3')
>>> lis = ['1', '2', '3', '4']
>>> '1'.startswith(tup)
True
>>> '1'.startswith(lis)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str or a tuple of str, not list
Now, the error is obvious and casting the list into a tuple will work just fine as it did in the first place:
>>> '1'.startswith(tuple(lis))
True
Now, my question is: why must the first argument be str or a tuple of str prefixes, but not a list of str prefixes?
AFAIK, the Python code for startswith() might look like this:
def startswith(src, prefix):
    return src[:len(prefix)] == prefix
But that just confuses me more, because even with it in mind, it still shouldn't make any difference whether the argument is a list or a tuple. What am I missing?
There is technically no reason to accept other sequence types, no. The source code roughly does this:
if isinstance(prefix, tuple):
    for substring in prefix:
        if not isinstance(substring, str):
            raise TypeError(...)
    return tailmatch(...)
elif not isinstance(prefix, str):
    raise TypeError(...)
return tailmatch(...)
(where tailmatch(...) does the actual matching work).
So yes, any iterable would do for that for loop. But, all the other string test APIs (as well as isinstance() and issubclass()) that take multiple values also only accept tuples, and this tells you as a user of the API that it is safe to assume that the value won't be mutated. You can't mutate a tuple but the method could in theory mutate the list.
Also note that you usually test for a fixed number of prefixes or suffixes or classes (in the case of isinstance() and issubclass()); the implementation is not suited for a large number of elements. A tuple implies that you have a limited number of elements, while lists can be arbitrarily large.
Next, if any iterable or sequence type would be acceptable, then that would include strings; a single string is also a sequence. Should then a single string argument be treated as separate characters, or as a single prefix?
So in other words, it's a limitation to self-document that the sequence won't be mutated, is consistent with other APIs, it carries an implication of a limited number of items to test against, and removes ambiguity as to how a single string argument should be treated.
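A quick sketch (mine, not from the C source) of the behavior those rules produce: a string argument is always treated as one single prefix, never as separate characters, while a tuple tries each of its elements in turn.

```python
# A string argument is a single prefix, not a collection of characters.
print('spam'.startswith('sp'))        # True: 'sp' matched as one prefix
print('spam'.startswith(('s', 'x')))  # True: the tuple tries each prefix; 's' matches
print('spam'.startswith('xs'))        # False: 'x' and 's' are NOT tried separately
```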
Note that this was brought up before on the Python Ideas list; see this thread; Guido van Rossum's main argument there is that you either special case for single strings or for only accepting a tuple. He picked the latter and doesn't see a need to change this.
This has already been suggested on Python-ideas a couple of years back; see str.startswith taking any iterator instead of just tuple, where GvR had this to say:
The current behavior is intentional, and the ambiguity of strings
themselves being iterables is the main reason. Since startswith() is
almost always called with a literal or tuple of literals anyway, I see
little need to extend the semantics.
In addition to that, there seemed to be no real motivation as to why to do this.
The current approach keeps things simple and fast,
unicode_startswith (and endswith) check for a tuple argument and then for a string one. They then call tailmatch in the appropriate direction. This is, arguably, very easy to understand in its current state, even for strangers to C code.
Adding other cases will only lead to more bloated and complex code for little benefit while also requiring similar changes to any other parts of the unicode object.
On a similar note, here is an excerpt from a talk by core developer Raymond Hettinger discussing API design choices regarding certain string methods, including changes to the str.startswith signature. While he only briefly mentions that str.startswith accepts a string or tuple of strings and does not expound on it, the talk is informative on the decisions and pain points both core developers and contributors have dealt with leading up to the present API.
Related
This has always confused me. It seems like this would be nicer:
["Hello", "world"].join("-")
Than this:
"-".join(["Hello", "world"])
Is there a specific reason it is like this?
It's because any iterable can be joined (e.g., list, tuple, dict, set), but its contents and the "joiner" must be strings.
For example:
>>> '_'.join(['welcome', 'to', 'stack', 'overflow'])
'welcome_to_stack_overflow'
>>> '_'.join(('welcome', 'to', 'stack', 'overflow'))
'welcome_to_stack_overflow'
Using something other than strings will raise the following error:
TypeError: sequence item 0: expected str instance, int found
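A minimal illustration of that error and the usual fix (converting the items with map, a common idiom rather than anything mandated by the API):

```python
nums = [1, 2, 3]

# Joining non-string items raises TypeError...
try:
    '_'.join(nums)
except TypeError as e:
    print(e)  # sequence item 0: expected str instance, int found

# ...so convert them to str first.
print('_'.join(map(str, nums)))  # 1_2_3
```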
This was discussed in the String methods... finally thread in the Python-Dev archive, and was accepted by Guido. This thread began in Jun 1999, and str.join was included in Python 1.6, which was released in Sep 2000 (and supported Unicode). Python 2.0 (which supported str methods including join) was released in Oct 2000.
There were four options proposed in this thread:
str.join(seq)
seq.join(str)
seq.reduce(str)
join as a built-in function
Guido wanted to support not only lists and tuples, but all sequences/iterables.
seq.reduce(str) is difficult for newcomers.
seq.join(str) introduces unexpected dependency from sequences to str/unicode.
join() as a free-standing built-in function would support only specific data types, so cluttering the built-in namespace with it is not good. If join() were to support many data types, creating an optimized implementation would be difficult: if implemented using the __add__ method, it would be O(n²).
The separator string (sep) should not be omitted. Explicit is better than implicit.
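The O(n²) point above can be sketched in plain Python (my illustration, not from the thread): repeated concatenation copies the growing result on every step, while join computes the total length once and copies each piece exactly once.

```python
parts = [str(i) for i in range(1000)]

def concat_loop():
    # O(n^2) in principle: each + builds a new string, copying everything so far
    s = parts[0]
    for p in parts[1:]:
        s = s + '-' + p
    return s

def use_join():
    # O(n): one pass to compute the total size, one pass to copy the pieces
    return '-'.join(parts)

assert concat_loop() == use_join()  # same result, very different cost model
```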
Here are some additional thoughts (my own, and my friend's):
Unicode support was coming, but it was not final. At that time UTF-8 seemed the most likely candidate to replace UCS-2/-4. To calculate the total buffer length for UTF-8 strings, the method needs to know the character encoding.
At that time, Python had already decided on a common sequence interface rule whereby a user could create a sequence-like (iterable) class. But Python didn't support extending built-in types until 2.2, so at that time it was difficult to provide a basic iterable class (which is mentioned in another comment).
Guido's decision is recorded in a historical mail, deciding on str.join(seq):
Funny, but it does seem right! Barry, go for it...
Guido van Rossum
Because the join() method is in the string class, instead of the list class.
See http://www.faqs.org/docs/diveintopython/odbchelper_join.html:
Historical note. When I first learned
Python, I expected join to be a method
of a list, which would take the
delimiter as an argument. Lots of
people feel the same way, and there’s
a story behind the join method. Prior
to Python 1.6, strings didn’t have all
these useful methods. There was a
separate string module which contained
all the string functions; each
function took a string as its first
argument. The functions were deemed
important enough to put onto the
strings themselves, which made sense
for functions like lower, upper, and
split. But many hard-core Python
programmers objected to the new join
method, arguing that it should be a
method of the list instead, or that it
shouldn’t move at all but simply stay
a part of the old string module (which
still has lots of useful stuff in it).
I use the new join method exclusively,
but you will see code written either
way, and if it really bothers you, you
can use the old string.join function
instead.
--- Mark Pilgrim, Dive into Python
I agree that it's counterintuitive at first, but there's a good reason. Join can't be a method of a list because:
it must work for different iterables too (tuples, generators, etc.)
it must have different behavior between different types of strings.
There are actually two join methods (Python 3.0):
>>> b"".join
<built-in method join of bytes object at 0x00A46800>
>>> "".join
<built-in method join of str object at 0x00A28D40>
If join was a method of a list, then it would have to inspect its arguments to decide which one of them to call. And you can't join byte and str together, so the way they have it now makes sense.
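A short demonstration of that last point (my sketch): the two join methods are distinct, and str.join rejects bytes items outright.

```python
# bytes.join and str.join are separate methods with separate element types.
print(b'-'.join([b'a', b'b']))   # b'a-b'
print('-'.join(['a', 'b']))      # a-b

try:
    '-'.join([b'a', 'b'])        # str.join refuses bytes items
except TypeError as e:
    print(type(e).__name__)      # TypeError
```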
Why is it string.join(list) instead of list.join(string)?
This is because join is a "string" method! It creates a string from any iterable. If we stuck the method on lists, what about when we have iterables that aren't lists?
What if you have a tuple of strings? If this were a list method, you would have to convert every such iterable of strings to a list before you could join the elements into a single string! For example:
some_strings = ('foo', 'bar', 'baz')
Let's roll our own list join method:
class OurList(list):
    def join(self, s):
        return s.join(self)
And to use it, note that we have to first create a list from each iterable to join the strings in that iterable, wasting both memory and processing power:
>>> l = OurList(some_strings) # step 1, create our list
>>> l.join(', ') # step 2, use our list join method!
'foo, bar, baz'
So we see we have to add an extra step to use our list method, instead of just using the builtin string method:
>>> ' | '.join(some_strings) # a single step!
'foo | bar | baz'
Performance Caveat for Generators
The algorithm Python uses to create the final string with str.join actually has to pass over the iterable twice, so if you provide it a generator expression, it has to materialize it into a list first before it can create the final string.
Thus, while passing around generators is usually better than list comprehensions, str.join is an exception:
>>> import timeit
>>> min(timeit.repeat(lambda: ''.join(str(i) for i in range(10) if i)))
3.839168446022086
>>> min(timeit.repeat(lambda: ''.join([str(i) for i in range(10) if i])))
3.339879313018173
Nevertheless, the str.join operation is still semantically a "string" operation, so it still makes sense to have it on the str object rather than on miscellaneous iterables.
Think of it as the natural orthogonal operation to split.
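That orthogonality can be shown directly (a small sketch of my own): join and split are roughly inverse operations, as long as the separator doesn't occur inside the pieces themselves.

```python
# join undoes split, and split undoes join, for a safe separator.
s = 'a,b,c'
assert ','.join(s.split(',')) == s

parts = ['a', 'b', 'c']
assert ','.join(parts).split(',') == parts
```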
I understand why it is applicable to anything iterable and so can't easily be implemented just on list.
For readability, I'd like to see it in the language but I don't think that is actually feasible - if iterability were an interface then it could be added to the interface but it is just a convention and so there's no central way to add it to the set of things which are iterable.
The "-" in "-".join(my_list) declares that you are producing a string by joining the elements of a list. It's result-oriented (just an aid to memory and understanding).
I made an exhaustive cheatsheet of string methods for your reference.
string_methods_44 = {
    'convert': ['join', 'split', 'rsplit', 'splitlines', 'partition', 'rpartition'],
    'edit': ['replace', 'lstrip', 'rstrip', 'strip'],
    'search': ['endswith', 'startswith', 'count', 'index', 'find', 'rindex', 'rfind'],
    'condition': ['isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isnumeric', 'isidentifier',
                  'islower', 'istitle', 'isupper', 'isprintable', 'isspace'],
    'text': ['lower', 'upper', 'capitalize', 'title', 'swapcase',
             'center', 'ljust', 'rjust', 'zfill', 'expandtabs', 'casefold'],
    'encode': ['translate', 'maketrans', 'encode'],
    'format': ['format', 'format_map']}
Primarily because the result of someString.join() is a string.
The sequence (list or tuple or whatever) doesn't appear in the result, just a string. Because the result is a string, it makes sense as a method of a string.
The variables my_list and "-" are both objects. Specifically, they're instances of the classes list and str, respectively. The join function belongs to the class str. Therefore, the syntax "-".join(my_list) is used because the object "-" is taking my_list as an input.
You can join not just lists and tuples; you can join almost any iterable.
And iterables include generators, maps, filters etc
>>> '-'.join(chr(x) for x in range(48, 55))
'0-1-2-3-4-5-6'
>>> '-'.join(map(str, (1, 10, 100)))
'1-10-100'
And the beauty of using generators, maps, filters etc is that they cost little memory, and are created almost instantaneously.
Just another reason why it's conceptually:
str.join(<iterator>)
It's more efficient to grant only str this ability than to grant join to all the iterables: list, tuple, set, dict, generator, map, filter, all of which have only object as a common parent.
Of course range() and zip() objects are also iterables, but they never yield str, so they cannot be used with str.join():
>>> '-'.join(range(48, 55))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found
I 100% agree with your issue. If we boil down all the answers and comments here, the explanation comes down to "historical reasons".
str.join isn't just confusing or not nice-looking; it's impractical in real-world code. It defeats readable function or method chaining, because the separator is rarely (ever?) the result of some previous computation. In my experience, it's always a constant, hard-coded value like ", ".
I clean up my code, enabling reading it in one direction, using toolz.functoolz:
>>> from toolz.functoolz import curry, pipe
>>> join = curry(str.join)
>>>
>>> a = ["one", "two", "three"]
>>> pipe(
...     a,
...     join("; ")
... )
'one; two; three'
I'll have several other functions in the pipe as well. The result is that it reads easily in just one direction, from beginning to end as a chain of functions. Currying map helps a lot.
I want to write a function that accepts a parameter which can be either a sequence or a single value. The type of value is str, int, etc., but I don't want it to be restricted to a hardcoded list.
In other words, I want to know if the parameter X is a sequence or something I have to convert to a sequence to avoid special-casing later. I could do
type(X) in (list, tuple)
but there may be other sequence types I'm not aware of, and no common base class.
-N.
Edit: See my "answer" below for why most of these answers don't help me. Maybe you have something better to suggest.
As of 2.6, use abstract base classes.
>>> import collections
>>> isinstance([], collections.Sequence)
True
>>> isinstance(0, collections.Sequence)
False
Furthermore, ABCs can be customized to account for exceptions, such as not considering strings to be sequences. Here is an example:
import abc
import collections

class Atomic(object):
    __metaclass__ = abc.ABCMeta

    @classmethod
    def __subclasshook__(cls, other):
        return not issubclass(other, collections.Sequence) or NotImplemented

Atomic.register(basestring)
After registration the Atomic class can be used with isinstance and issubclass:
assert isinstance("hello", Atomic) == True
This is still much better than a hard-coded list, because you only need to register the exceptions to the rule, and external users of the code can register their own.
Note that in Python 3 the syntax for specifying metaclasses changed and the basestring abstract superclass was removed, which requires something like the following to be used instead:
class Atomic(metaclass=abc.ABCMeta):
    @classmethod
    def __subclasshook__(cls, other):
        return not issubclass(other, collections.Sequence) or NotImplemented

Atomic.register(str)
If desired, it's possible to write code which is compatible with both Python 2.6+ and 3.x, but doing so requires a slightly more complicated technique which dynamically creates the needed abstract base class, thereby avoiding syntax errors due to the metaclass syntax difference. This is essentially the same as what Benjamin Peterson's six module's with_metaclass() function does.
class _AtomicBase(object):
    @classmethod
    def __subclasshook__(cls, other):
        return not issubclass(other, collections.Sequence) or NotImplemented

class Atomic(abc.ABCMeta("NewMeta", (_AtomicBase,), {})):
    pass
try:
    unicode = unicode
except NameError:  # 'unicode' is undefined, assume Python >= 3
    Atomic.register(str)    # str includes unicode in Py3, make both Atomic
    Atomic.register(bytes)  # bytes will also be considered Atomic (optional)
else:
    # basestring is the abstract superclass of both str and unicode types
    Atomic.register(basestring)  # make both types of strings Atomic
In versions before 2.6, there are type checkers in the operator module.
>>> import operator
>>> operator.isSequenceType([])
True
>>> operator.isSequenceType(0)
False
The problem with all of the above mentioned ways is that str is considered a sequence (it's iterable, has __getitem__, etc.) yet it's usually treated as a single item.
For example, a function may accept an argument that can either be a filename or a list of filenames. What's the most Pythonic way for the function to detect the first from the latter?
Based on the revised question, it sounds like what you want is something more like:
def to_sequence(arg):
    '''
    Determine whether an arg should be treated as a "unit" or a "sequence".
    If it's a unit, return a 1-tuple with the arg.
    '''
    def _multiple(x):
        return hasattr(x, "__iter__")
    if _multiple(arg):
        return arg
    else:
        return (arg,)
>>> to_sequence("a string")
('a string',)
>>> to_sequence( (1,2,3) )
(1, 2, 3)
>>> to_sequence( xrange(5) )
xrange(5)
This isn't guaranteed to handle all types, but it handles the cases you mention quite well, and should do the right thing for most of the built-in types.
When using it, make sure whatever receives the output of this can handle iterables.
IMHO, the python way is to pass the list as *list. As in:
myfunc(item)
myfunc(*items)
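A small sketch of that style (myfunc is hypothetical): a var-args function needs no single-value-versus-sequence check at all, because the caller unpacks any sequence with *.

```python
# A hypothetical var-args function: one signature covers both call styles.
def myfunc(*items):
    return [str(i) for i in items]

print(myfunc('one'))            # called with a single value
print(myfunc(*['one', 'two']))  # called with a list, unpacked by the caller
```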
Sequences are described here:
https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange
So sequences are not the same as iterable objects. I think a sequence must implement
__getitem__, whereas iterable objects must implement __iter__.
So, for example, strings are sequences and don't implement __iter__, and xrange objects are sequences and don't implement __getslice__.
But from what you seem to want to do, I'm not sure you want sequences, but rather iterable objects.
So go for hasattr(X, "__getitem__") if you want sequences, but go rather for hasattr(X, "__iter__") if you don't want strings, for example.
In cases like this, I prefer to just always take the sequence type or always take the scalar. Strings won't be the only types that would behave poorly in this setup; rather, any type that has an aggregate use and allows iteration over its parts might misbehave.
The simplest method would be to check if you can turn it into an iterator, i.e.
try:
    it = iter(X)
    # Iterable
except TypeError:
    pass  # Not iterable
If you need to ensure that it's a restartable or random-access sequence (i.e. not a generator, etc.), this approach won't be sufficient, however.
As others have noted, strings are also iterable, so if you need to exclude them (particularly important if recursing through items, as list(iter('a')) gives ['a'] again), then you may need to specifically exclude them with:
if not isinstance(X, basestring)
I'm new here, so I don't know the correct way to do this. I want to respond to the answers:
The problem with all of the above mentioned ways is that str is considered a sequence (it's iterable, has __getitem__, etc.) yet it's usually treated as a single item.
For example, a function may accept an argument that can either be a filename or a list of filenames. What's the most Pythonic way for the function to detect the first from the latter?
Should I post this as a new question? Edit the original one?
I think what I would do is check whether the object has certain methods that indicate it is a sequence. I'm not sure if there is an official definition of what makes a sequence. The best I can think of is, it must support slicing. So you could say:
is_sequence = '__getslice__' in dir(X)
You might also check for the particular functionality you're going to be using.
As pi pointed out in the comment, one issue is that a string is a sequence, but you probably don't want to treat it as one. You could add an explicit test that the type is not str.
If strings are the problem, detect a sequence and filter out the special case of strings:
def is_iterable(x):
    if type(x) == str:
        return False
    try:
        iter(x)
        return True
    except TypeError:
        return False
You're asking the wrong question. You don't try to detect types in Python; you detect behavior.
Write another function that handles a single value. (let's call it _use_single_val).
Write one function that handles a sequence parameter. (let's call it _use_sequence).
Write a third parent function that calls the two above (call it use_seq_or_val). Surround each call with an exception handler to catch an invalid parameter (i.e. not a single value or sequence).
Write unit tests to pass correct & incorrect parameters to the parent function to make sure it catches the exceptions properly.
def _use_single_val(v):
    print v + 1  # this will fail if v is not a value type

def _use_sequence(s):
    print s[0]  # this will fail if s is not indexable

def use_seq_or_val(item):
    try:
        _use_single_val(item)
        return
    except TypeError:
        pass
    try:
        _use_sequence(item)
        return
    except TypeError:
        pass
    raise TypeError, "item not a single value or sequence"
EDIT: Revised to handle the "sequence or single value" asked about in the question.
Revised answer:
I don't know if your idea of "sequence" matches what the Python manuals call a "Sequence Type", but in case it does, you should look for the __contains__ method. That is the method Python uses to implement the check "if something in object:"
if hasattr(X, '__contains__'):
    print "X is a sequence"
My original answer:
I would check if the object that you received implements an iterator interface:
if hasattr(X, '__iter__'):
    print "X is a sequence"
For me, that's the closest match to your definition of sequence since that would allow you to do something like:
for each in X:
    print each
You could pass your parameter to the built-in len() function and check whether this causes an error. As others said, the string type requires special handling.
According to the documentation the len function can accept a sequence (string, list, tuple) or a dictionary.
You could check that an object is a string with the following code:
x.__class__ == "".__class__
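For comparison (my note, not part of the original answer): the __class__ comparison does work, but isinstance is the idiomatic check, and it also accepts subclasses of str.

```python
# Both checks succeed for a plain str; isinstance additionally covers subclasses.
x = 'hi'
assert x.__class__ == ''.__class__
assert isinstance(x, str)

class MyStr(str):
    pass

assert isinstance(MyStr('hi'), str)          # subclass passes isinstance...
assert MyStr('hi').__class__ != ''.__class__  # ...but fails the __class__ test
```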
I cannot understand why the isinstance function needs a tuple, rather than some other iterable, as its second parameter.
isinstance(some_object, (some_class1, some_class2))
works fine, but
isinstance(some_object, [some_class1, some_class2])
raises a TypeError.
The reason seems to be "allowing only tuples is enough, it's simpler, it avoids the danger of some corner cases, and it seemed neater to the BDFL" (i.e. Guido). (Kudos to @Caleb for posting the key link in the comments.)
Here is an excerpt from this email conversation with Guido van Rossum that specifically addresses the case of other iterables for the isinstance function. (Click on the link for the complete conversation.)
On Thu, Jan 2, 2014 at 1:37 PM, James Powell wrote:
This is driven by a real-world example wherein a large number of
prefixes stored in a set, necessitating:
any('spam'.startswith(c) for c in prefixes)
# or
'spam'.startswith(tuple(prefixes))
Neither of these strikes me as bad. Also, depending on whether the set
of prefixes itself changes dynamically, it may be best to lift the
tuple() call out of the startswith() call.
...
However, .startswith doesn't seem to be the only example of this, and
the other examples are free of the string/iterable ambiguity:
isinstance(x, {int, float})
But this is even less likely to have a dynamically generated argument.
And there could still be another ambiguity here: a metaclass could
conceivably make its instances (i.e. classes) iterable.
It is exactly as it should behave, according to the docs: https://docs.python.org/3/library/functions.html#isinstance
If classinfo is a tuple of type objects (or recursively, other such tuples), return true if object is an instance of any of the types. If classinfo is not a type or tuple of types and such tuples, a TypeError exception is raised.
Because a string is also "some iterable". So you could write:
isinstance(some_object, 'foobar')
and it would check if some_object is an instance of f, o, b, a or r.
This wouldn't work obviously, so isinstance would need to check if the second argument is not a string. Since isinstance needs to do a type check, it might as well make sure the second argument is always a tuple.
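That documented contract is easy to check directly (a small sketch of my own): a tuple of types works, nested tuples work, and anything else, including a list, is rejected.

```python
# isinstance accepts a type, or a (possibly nested) tuple of types.
assert isinstance(1.0, (int, float))
assert isinstance(True, (str, (bytes, (int,))))  # nested tuples are searched recursively

# Any other iterable, such as a list, raises TypeError.
try:
    isinstance(1, [int, float])
except TypeError as e:
    print(type(e).__name__)  # TypeError
```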
Because this is the way the language was designed...
When you write code that can accept more than one type, it is easier to have fixed types that you can directly test than to use duck typing. For example, as strings are iterable, when you want to accept either a string or a sequence of strings, you must first test for the string type.
Here I can imagine no strong reason for limiting to the tuple type, but no strong reason either to extend it to any sequence. You could try to propose it on the python-ideas list.
At a high level, you need a container type for isinstance checks, and the built-in containers are tuples, lists, sets, and dicts. Most likely, they decided on a tuple over a set because the expected use case for isinstance is a small number of types, and a small tuple is faster to check for containment than a set.
Mutability really isn't a consideration. If they really needed immutability, they could have just re-wrapped the iterable into a tuple before processing.
I'm trying to learn Python, and, while I managed to stumble on the answer to my current problem, I'd like to know how I can better find answers in the future.
My goal was to take a list of strings as input, and return a string whose characters were the union of the characters in the strings, e.g.
unionStrings( ("ab", "bc"))
would return "abc".
I implemented it like this:
def unionStrings( strings ):
    # Input: A list of strings
    # Output: A string that is the (set) union of input strings
    all = set()
    for s in strings:
        all = all.union(set(s))
    return "".join(sorted(list(all)))
I felt the for loop was unnecessary, and searched for neater, more Pythonic(?) improvements.
First question: I stumbled on using the class method set.union(), instead of set1.union(set2). Should I have been able to find this in the standard Python docs? I've not been able to find it there.
So I tried using set.union() like this:
>>> set.union( [set(x) for x in ("ab","bc")] )
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: descriptor 'union' requires a 'set' object but received a 'list'
Again, I stumbled around and finally found that I should be calling it like this:
>>> set.union( *[set(x) for x in ("ab","bc")] )
set(['a', 'c', 'b'])
Second question: I think this means that set.union is (effectively) declared as
set.union( *sets)
and not
set.union( setsList )
Is that correct? (I'm still learning how to use splat '*'.)
Third question: Where could I find documentation on the signature of set.union()? I didn't see it in the set/frozenset docs, and I couldn't get the inspect module to give me anything. I'm not even sure set is a module; it seems to be a type. Is it defined in a module, or what?
Thanks for reading my complicated question. It's more "How do I navigate Python documentation?" than "How do I do this in Python code?".
Responding to jonrsharpe's comment:
Ohhhhh! I'm so used to C++ where you define separate static and instance methods. Now that you explain it I can really see what's happening.
The only thing I might do different is write it as
t = set.union( *[set(x) for x in strings] )
return "".join(sorted(t))
because it bugs me to treat strings[0] differently from the strings in strings[1:] when, functionally, they don't play different roles. If I have to call set() on one of them, I'd rather call it on all of them, since union() is going to do it anyways. But that's just style, right?
There are several questions here. Firstly, you should know that:
Class.method(instance, arg)
is equivalent to:
instance.method(arg)
for instance methods. You can call the method on the class and explicitly provide the instance, or just call it on the instance.
For historical reasons, many of the standard library and built-in types don't follow the UppercaseWords convention for class names, but they are classes. Therefore
set.union(aset, anotherset)
is the same as
aset.union(anotherset)
set methods specifically can be tricky, because of the way they're often used. set.method(arg1, arg2, ...) requires arg1 to already be a set, the instance for the method, but all the other arguments will be converted (from 2.6 on).
This isn't directly covered in the set docs, because it's true for everything; Python is pretty consistent.
In terms of needing to "splat", note that the docs say:
union(other, ...)
rather than
union(others)
i.e. each iterable is a separate argument, hence you need to unpack your list of iterables.
Your function could therefore be:
def union_strings(strings):
    if not strings:
        return ""
    return "".join(sorted(set(strings[0]).union(*strings[1:])))
or, avoiding the special-casing of strings[0]:
def union_strings(strings):
    if not strings:
        return ""
    return "".join(sorted(set.union(*map(set, strings))))
I'm constantly wrapping my str.join() arguments in a list, e.g.
'.'.join([str_one, str_two])
The extra list wrapper always seems superfluous to me. I'd like to do...
'.'.join(str_one, str_two, str_three, ...)
... or if I have a list ...
'.'.join(*list_of_strings)
Yes I'm a minimalist, yes I'm picky, but mostly I'm just curious about the history here, or whether I'm missing something. Maybe there was a time before splats?
Edit:
I'd just like to note that max() handles both versions:
max(iterable[, key])
max(arg1, arg2, *args[, key])
For short lists this won't matter and it costs you exactly 2 characters to type. But the most common use-case (I think) for str.join() is following:
''.join(process(x) for x in some_input)
# or
result = []
for x in some_input:
result.append(process(x))
''.join(result)
where some_input can have thousands of entries and you just want to generate the output string efficiently.
If join accepted variable arguments instead of an iterable, this would have to be spelled as:
''.join(*(process(x) for x in some_input))
# or
''.join(*result)
which would create a (possibly long) tuple, just to pass it as *args.
So that's 2 characters in a short case vs. being wasteful in large data case.
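The waste in the large-data case can be shown concretely (a sketch of mine, with a hypothetical takes_varargs helper): * unpacking builds the entire argument tuple before the call even happens, defeating the point of a lazy generator.

```python
# * unpacking materializes the whole generator into a tuple up front.
def takes_varargs(*args):
    return args              # args arrives as an already-built tuple

gen = (str(i) for i in range(5))
result = takes_varargs(*gen)

assert isinstance(result, tuple)
assert ''.join(result) == '01234'
```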
History note
(Second Edit: based on HISTORY file which contains missing release from all releases. Thanks Don.)
The *args in function definitions were added in Python long time ago:
==> Release 0.9.8 (9 Jan 1993) <==
Case (a) was needed to accommodate variable-length argument lists;
there is now an explicit "varargs" feature (precede the last argument
with a '*'). Case (b) was needed for compatibility with old class
definitions: up to release 0.9.4 a method with more than one argument
had to be declared as "def meth(self, (arg1, arg2, ...)): ...".
A proper way to pass a list to such functions was using a built-in function apply(callable, sequence). (Note, this doesn't mention **kwargs which can be first seen in docs for version 1.4).
The ability to call a function with * syntax is first mentioned in release notes for 1.6:
There's now special syntax that you can use instead of the apply()
function. f(*args, **kwds) is equivalent to apply(f, args, kwds). You
can also use variations f(a1, a2, *args, **kwds) and you can leave one
or the other out: f(args), f(*kwds).
But it's missing from grammar docs until version 2.2.
Before 2.0, str.join() did not even exist and you had to do from string import join.
You'd have to write your own function to do that.
>>> def my_join(separator, *args):
...     return separator.join(args)
...
>>> my_join('.', '1', '2', '3')
'1.2.3'
Note that this doesn't avoid the creation of an extra object, it just hides that an extra object is being created. If you inspect the type of args, you'll see that it's a tuple.
If you don't want to create a function and you have a fixed list of strings then it would be possible to use format instead of join:
'{}.{}.{}.{}'.format(str_one, str_two, str_three, str_four)
It's better to just stick with '.'.join((a, b, c)).
Argh, now this is a hard question! Try arguing which style is more minimalist... Hard to give a good answer without being too subjective, since it's all about convention.
The problem is: We have a function that accepts an ordered collection; should it accept it as a single argument or as a variable-length argument list?
Python usually answers: Single argument; VLAL if you really have a reason to. Let's see how Python libs reflect this:
The standard library has a couple examples for VLAL, most notably:
when the function can be called with an arbitrary number of separate sequences - like zip or map or itertools.chain,
when there's one sequence to pass, but you don't really expect the caller to have the whole of it as a single variable. This seems to fit str.format.
And the common case for using a single argument:
When you want to do some generic data processing on a single sequence. This fits the functional trio (map*, reduce, filter), and specialized offspring thereof, like sum or str.join. Also stateful transforms like enumerate.
The pattern is "consume an iterable, give another iterable" or "consume an iterable, give a result".
Hope this answers your question.
Note: map is technically var-arg, but the common use case is just map(func, sequence) -> sequence which falls into one bucket with reduce and filter.
*The obscure case, map(func, *sequences) is conceptually like map(func, izip_longest(sequences)) - and the reason for zips to follow the var-arg convention was explained before.
I hope you follow my thinking here; after all, it's all a matter of programming style. I'm just pointing at some patterns in Python's library functions.