How does str.startswith really work? - python

I've been playing for a bit with startswith() and I've discovered something interesting:
>>> tup = ('1', '2', '3')
>>> lis = ['1', '2', '3', '4']
>>> '1'.startswith(tup)
True
>>> '1'.startswith(lis)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str or a tuple of str, not list
Now, the error is obvious and casting the list into a tuple will work just fine as it did in the first place:
>>> '1'.startswith(tuple(lis))
True
Now, my question is: why must the first argument be a str or a tuple of str prefixes, but not a list of str prefixes?
AFAIK, the Python code for startswith() might look like this:
def startswith(src, prefix):
    return src[:len(prefix)] == prefix
But that just confuses me more, because even with it in mind, it still shouldn't make any difference whether it is a list or a tuple. What am I missing?

There is technically no reason to accept other sequence types, no. The source code roughly does this:
if isinstance(prefix, tuple):
    for substring in prefix:
        if not isinstance(substring, str):
            raise TypeError(...)
        return tailmatch(...)
elif not isinstance(prefix, str):
    raise TypeError(...)
return tailmatch(...)
(where tailmatch(...) does the actual matching work).
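A runnable pure-Python model of that dispatch may make it easier to follow. This is a sketch only: `startswith_model` and `_tailmatch` are made-up names, the real implementation is written in C, and the error messages here are paraphrased.

```python
def _tailmatch(src, prefix):
    # Stand-in for the C helper: compare the head of src with prefix.
    return src[:len(prefix)] == prefix


def startswith_model(src, prefix):
    """Rough model of the type checks described above (not the real code)."""
    if isinstance(prefix, tuple):
        for candidate in prefix:
            if not isinstance(candidate, str):
                raise TypeError("tuple for startswith must only contain str")
            if _tailmatch(src, candidate):
                return True
        return False
    if not isinstance(prefix, str):
        raise TypeError("startswith first arg must be str or a tuple of str")
    return _tailmatch(src, prefix)


print(startswith_model("1", ("1", "2", "3")))
try:
    startswith_model("1", ["1", "2", "3"])
except TypeError as exc:
    print("TypeError:", exc)
```

Only the two `isinstance` checks decide what is accepted; the loop itself would work with any iterable.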
So yes, any iterable would do for that for loop. But, all the other string test APIs (as well as isinstance() and issubclass()) that take multiple values also only accept tuples, and this tells you as a user of the API that it is safe to assume that the value won't be mutated. You can't mutate a tuple but the method could in theory mutate the list.
Also note that you usually test for a fixed number of prefixes or suffixes or classes (in the case of isinstance() and issubclass()); the implementation is not suited for a large number of elements. A tuple implies that you have a limited number of elements, while lists can be arbitrarily large.
Next, if any iterable or sequence type would be acceptable, then that would include strings; a single string is also a sequence. Should then a single string argument be treated as separate characters, or as a single prefix?
So in other words, the limitation self-documents that the sequence won't be mutated, is consistent with other APIs, carries an implication of a limited number of items to test against, and removes ambiguity as to how a single string argument should be treated.
Note that this was brought up before on the Python Ideas list; see this thread; Guido van Rossum's main argument there is that you either special case for single strings or for only accepting a tuple. He picked the latter and doesn't see a need to change this.
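If the prefixes do live in a list, converting once with tuple() is the usual workaround. The sketch below also illustrates the ambiguity around a single string: a helper that accepts any iterable has to decide what a bare string means. The helper name `startswith_any` is hypothetical, not part of the standard library.

```python
def startswith_any(text, prefixes):
    """Hypothetical helper: accept a str or any iterable of str prefixes."""
    if isinstance(prefixes, str):
        # Special case: treat a lone string as ONE prefix, not as characters.
        prefixes = (prefixes,)
    return text.startswith(tuple(prefixes))


print(startswith_any("spam", ["sp", "eg"]))  # True: list of prefixes
print(startswith_any("spam", "ms"))          # False: one prefix "ms"
print("spam".startswith(tuple("ms")))        # True: characters "m" and "s"
```

The last two lines give different answers for the same string "ms", which is exactly the choice the built-in avoids by accepting only str or tuple.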

This was already suggested on Python-ideas a couple of years back (see: str.startswith taking any iterator instead of just tuple), and GvR had this to say:
The current behavior is intentional, and the ambiguity of strings
themselves being iterables is the main reason. Since startswith() is
almost always called with a literal or tuple of literals anyway, I see
little need to extend the semantics.
In addition to that, there seemed to be no real motivation as to why to do this.
The current approach keeps things simple and fast: unicode_startswith (and endswith) check for a tuple argument and then for a string one. They then call tailmatch in the appropriate direction. This is, arguably, very easy to understand in its current state, even for strangers to C code.
Adding other cases will only lead to more bloated and complex code for little benefit while also requiring similar changes to any other parts of the unicode object.

On a similar note, here is an excerpt from a talk by core developer Raymond Hettinger discussing API design choices for certain string methods, including recent changes to the str.startswith signature. He only briefly mentions that str.startswith accepts a string or a tuple of strings and does not elaborate, but the talk is informative about the decisions and pain points that core developers and contributors have dealt with on the way to the present API.

Related

Python match case shows error with list element as case. Expected ":"

Hi I was using the newly added match case function but I encountered this problem.
Here's my code:
from typing import List

class Wee():
    def __init__(self) -> None:
        self.lololol: List[str] = ["car", "goes", "brrrr"]

    def func1(self):
        self.weee = input()
        try:
            match self.weee:
                case self.lololol[0]:
                    print(self.lololol[0])
                case self.lololol[1]:
                    print(self.lololol[1])
                case _:
                    print(self.lololol[2])
        except SyntaxError as e:
            print(e)

waa = Wee()
waa.func1()
At line 11 and 13, errors show up saying SyntaxError: expected ':'. However, when I change case self.lololol[0]: to case "car":, the errors disappear. What is happening?
You can’t use arbitrary expressions as patterns (since that would sometimes be ambiguous), only a certain subset.
If you want to match against the elements of a list, you should probably, depending on the situation, either make them separate variables, or use list.index.
This is because the match case statement is intended for structural matching. Specifically PEP 635 says:
Although patterns might superficially look like expressions, it is important to keep in mind that there is a clear distinction. In fact, no pattern is or contains an expression. It is more productive to think of patterns as declarative elements similar to the formal parameters in a function definition.
Specifically, plain variable names are used as capture patterns and not as value patterns (definitely not what a C/C++ programmer would expect), and function calls or subscripts are simply not allowed.
From a practical point of view, only literals or dotted names can be used to match against a specific value.

Why isn't it possible to create a list with multiple arguments

I got curious about why we can do this:
l = [1,2,3,4]
But get an error when trying this:
l = list(1,2,3,4)
Also, although most people suggest that passing the arguments of the latter as a tuple solves the problem, why doesn't this work either?
t = tuple(1,2,3,4)
list() and tuple() are used to convert a different type of object into a list or tuple, respectively. Any iterable will do there. The two functions are not intended to create an object from a discrete number of inputs; that's what the literal notations are for.
So if you have a fixed number of elements, each produceable with an expression, the right way to create a list is to use the [...] literal syntax. If you have a variable number of elements produced by a single iterable object, use list(). The two use-cases differ.
If list() accepted multiple arguments, then you wouldn't need to have the [...] syntax anymore; there is no point in having two different syntaxes fill the same use case.
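A short demonstration of the two use cases; the variable name `error` exists only so the exception can be inspected afterwards.

```python
# Conversion: exactly one iterable argument.
print(list((1, 2, 3, 4)))   # [1, 2, 3, 4]
print(list("abc"))          # ['a', 'b', 'c']
print(tuple(range(3)))      # (0, 1, 2)

# Several positional arguments are rejected.
try:
    list(1, 2, 3, 4)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```

So list((1, 2, 3, 4)) works because the inner parentheses build a single tuple, which is then converted.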

Does a trailing comma after an n-tuple in Python change its value?

This is a very basic doubt that came to my mind. When we use the threading module in Python to start a new thread, I have seen two different ways in which arguments are passed with the call:
Version 1:
thread = threading.Thread(target=tar,args=(4,0.25,))
Version 2:
thread = threading.Thread(target=tar,args=(4,0.25))
The difference is the addition of , at the end of the argument tuple in the version 1 call. Both versions work fine, but I want to know if there's any significant difference between the two, and if so, which one is the better way to write it? If there's no difference, then why do a lot of people and articles choose version 1 and add a redundant , at the end of the argument list?
The two forms of writing a 2-tuple are equivalent. Proof:
>>> (4,0.25,) == (4,0.25)
True
For an elaboration on valid tuple syntax in Python, see https://wiki.python.org/moin/TupleSyntax. Specifically:
In Python, multiple-element tuples look like:
1,2,3
The essential elements are the commas between each element of
the tuple. Multiple-element tuples may be written with a trailing
comma, e.g.
1,2,3,
but the trailing comma is completely optional.
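One plausible reason for the habit (an assumption; neither the question nor the wiki page says so) is that the comma is mandatory for a one-element tuple, so some people always write it. A small sketch, where `tar` is a stand-in target that just records its arguments:

```python
import threading

calls = []


def tar(count, delay=0.0):
    # Hypothetical thread target: record what it was called with.
    calls.append((count, delay))


# Two elements: the trailing comma changes nothing.
print((4, 0.25,) == (4, 0.25))   # True

# One element: the comma is what makes it a tuple at all.
print(type((4,)))   # <class 'tuple'>
print(type((4)))    # <class 'int'>

for args in [(4, 0.25,), (4, 0.25), (4,)]:
    thread = threading.Thread(target=tar, args=args)
    thread.start()
    thread.join()

print(calls)   # [(4, 0.25), (4, 0.25), (4, 0.0)]
```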

In Python, are strings considered as lists?

I am just curious whether strings are considered as lists.
Strings and lists are both sequences, so for loops can iterate over them, but they are definitely two different types.
for c in 'abcd':
    print(c)
for i in [1, 2, 3, 4]:
    print(i)
Strings are lists like a cat is a marshmallow. They're similar in a lot of ways - you can pet them, they're both soft, they can be real sweet to you, and they can make a mess on your floor.
And you can eat a marshmallow, but if you try to eat a cat, it will throw a big sharp, bloody error in your face.
So it is with strings and lists. You can do many of the same things to them, because they are both sequences, but some things you should only do with strings, and some things only with lists.
In many languages, strings are indeed sequences of characters, so in that sense strings are list-like. However, strings are their own entities. They have their own methods, and not all list methods can be used on strings. There is an overlap, though: you can slice, iterate over and concatenate strings as if they were lists, and a few operations work on both (e.g. len() and .index()). However, the biggest difference is that strings in Python are not mutable. With a list you can do my_list[5] = "a". If you try this with a string, you'll receive a TypeError.
EDIT:
As is mentioned in the comment of another answer, immutability may not be the biggest difference (that's a matter of opinion), but something I don't see mentioned anywhere else is the fact that lists can be multidimensional. While you can easily have a two, three or even four dimensional list, something like that is not possible with strings (though arguably my_list = ["foo", "bar"] could be looked at as multidimensional since you can call my_list[1][2], it is not solely a String, it's a combination of Strings and lists). I would be thoroughly impressed if someone could produce a "String Of Strings" like you can a "List Of Lists."
In addition to what others have said: strings are immutable and hashable, so you cannot change a string in place, but strings can be keys in dictionaries and members of sets. Lists are mutable and not hashable, so you can change a list in place, but lists cannot be keys in a dictionary or members of a set.
# Hashability
>>> {['a', 'b'] : 1} # With lists: fails
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> {'ab' : 1} # With strings: works
{'ab': 1}
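The mutability difference can be checked in the same way. A minimal sketch; the variable names are arbitrary:

```python
text = "abcd"
items = list(text)   # ['a', 'b', 'c', 'd']

# Shared sequence behaviour: indexing, slicing, len().
print(text[1:3], items[1:3])
print(len(text) == len(items))

# Item assignment works on the list...
items[0] = "z"

# ...but not on the string.
try:
    text[0] = "z"
except TypeError as exc:
    print("TypeError:", exc)

# To "change" a string you build a new one instead.
changed = "z" + text[1:]
print(changed, items)
```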

Python: how to ensure method gets a list

I have a method that takes a list of strings. Unfortunately, if the list is only one item long, Python treats this list as a string.
This post has a method for checking whether something is a string, and converting it to a list if it is:
Python: How to define a function that gets a list of strings OR a string
But this seems an incredibly redundant way of getting the item that I passed in as a list to in fact be a list. Is there a better way?
You are probably using tuples, not lists, and forgetting to add the comma to the one-item tuple literal(s). ('foo') is interpreted as simply 'foo' which would match what you are saying. However, adding a comma to one-item tuple literals will solve this. ('foo',) returns, well, ('foo',).
I'm not sure I believe you, python shouldn't behave that way, and it doesn't appear to:
>>> def foo(lst):
...     print(type(lst))
...
>>> foo(['bar'])
<class 'list'>
That post was about a different thing, they wanted the ability to pass a single string or a list of strings and handle both cases as if they were lists of strings. If you're only passing in a list, always treating it as a list should be fine.
Python shouldn't do that with a list. A singleton tuple, though, has syntax different from singleton lists:
(1) == 1 # (1) is 1
[1] != 1 # [1] is a singleton list
(1,) != 1 # (1,) is a singleton tuple
You are mistaken. Python does no such transformation to lists of a single element.
Double- and triple-check that you are putting the [] around the item you are passing.
If you still can't get it working show us the code!
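If the function really does need to tolerate a bare string, for example because callers write ('foo') when they mean ('foo',), a small normalising step is a common pattern. This is only a sketch, and `ensure_list` is a hypothetical name:

```python
def ensure_list(value):
    """Hypothetical helper: wrap a lone string, convert other iterables."""
    if isinstance(value, str):
        return [value]
    return list(value)


print(ensure_list(('foo')))       # ('foo') is just 'foo' -> ['foo']
print(ensure_list(('foo',)))      # one-item tuple        -> ['foo']
print(ensure_list(['a', 'b']))    # already a list        -> ['a', 'b']
```

Without the str check, list('foo') would split the string into ['f', 'o', 'o'].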
