I am just curious whether strings are considered as lists.
Strings and lists are both sequences, so for loops can iterate over them, but they are definitely two different types.
for c in 'abcd':
    print(c)
for i in [1,2,3,4]:
    print(i)
Strings are lists like a cat is a marshmallow. They're similar in a lot of ways - you can pet them, they're both soft, they can be real sweet to you, and they can make a mess on your floor.
And you can eat a marshmallow, but if you try to eat a cat, it will throw a big sharp, bloody error in your face.
So it is with strings and lists. You can do many of the same things to them, because they are both sequences, but some things you should only do with strings, and some things only with lists.
In many languages, strings are indeed sequences of characters, so in that loose sense strings are list-like. However, strings are their own type. They have their own methods, and not all list methods can be used on strings. There is some overlap: you can slice, iterate over, and concatenate strings as if they were lists, and some operations work on both (e.g. len() and the index method). The biggest difference is that strings in Python are not mutable. With a list you can do my_list[5] = "a". If you try this with a string, you'll receive a TypeError.
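A minimal sketch of the mutability difference described above. The variable names are made up for illustration:

```python
my_list = ["x", "y", "z"]
my_list[1] = "a"            # lists support item assignment
print(my_list)

my_string = "xyz"
try:
    my_string[1] = "a"      # strings do not support item assignment
except TypeError as exc:
    print("TypeError:", exc)

# To "change" a string, build a new one from slices instead.
new_string = my_string[:1] + "a" + my_string[2:]
print(new_string)
```

The original string is left untouched; only the name new_string refers to the modified text.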
EDIT:
As is mentioned in the comment of another answer, immutability may not be the biggest difference (that's a matter of opinion), but something I don't see mentioned anywhere else is the fact that lists can be multidimensional. While you can easily have a two, three or even four dimensional list, something like that is not possible with strings (though arguably my_list = ["foo", "bar"] could be looked at as multidimensional since you can call my_list[1][2], it is not solely a String, it's a combination of Strings and lists). I would be thoroughly impressed if someone could produce a "String Of Strings" like you can a "List Of Lists."
In addition to what others have said: strings are immutable and hashable, so you cannot change a string in place, but strings can be keys in dictionaries and members of sets. Lists are mutable and not hashable, so you can change a list in place, but lists cannot be keys in a dictionary or members of sets.
# Hashability
>>> {['a', 'b'] : 1} # With lists: fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> {'ab' : 1} # With strings: works
{'ab': 1}
I wrote a bit of code to iterate over a list to see if a line contained one or more keywords:
STRINGS_TO_MATCH = [ "Foo",
                     "Bar",
                     "Oggle" ]
for string in STRINGS_TO_MATCH:
    if string in line:
        val_from_line = line.split(' ')[-1]
Does anyone happen to know if there is a way to make this more readable? Would a list comprehension be a better fit here?
The thing to remember here is that comprehensions are expressions, whose purpose is to create a value - list comprehensions create lists, dict comprehensions create dicts, and set comprehensions create sets. They are unlikely to help in this case, because you aren't creating any such object.
Your code sample is incomplete, because it doesn't do anything with the val_from_line values that it extracts. I am presuming that you want to extract the last "word" from a line which contains any of the strings in STRINGS_TO_MATCH, but it's difficult to work with such incomplete information so this answer might, for all I know, be totally useless.
Assuming I'm correct, the easiest way to find out if line contains any of the STRINGS_TO_MATCH is to use the expression
any(s in line for s in STRINGS_TO_MATCH)
This uses a so-called generator expression, which is similar to a list comprehension - the interpreter can iterate over it to produce a sequence of values - but it doesn't go as far as creating a list of the values, it just creates them as the client code (in this case the any built-in function) requests them. So I might rewrite your code as
if any(s in line for s in STRINGS_TO_MATCH):
    val_from_line = line.split(' ')[-1]
I'll leave you to decide what you actually want to do after that, with the warning note that after this code executes val_from_line may or may not exist (depending on whether or not the condition was true), which is never an entirely comfortable situation.
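One way to avoid the "may or may not exist" problem is to wrap the check in a function that always returns something. This is only a sketch: the function name last_word_if_match and the sample lines are hypothetical, and it assumes that returning None for a non-matching line suits your use case.

```python
STRINGS_TO_MATCH = ["Foo", "Bar", "Oggle"]

def last_word_if_match(line, keywords=STRINGS_TO_MATCH):
    """Return the last space-separated word of line if any keyword occurs in it, else None."""
    if any(keyword in line for keyword in keywords):
        return line.split(' ')[-1]
    return None

print(last_word_if_match("Foo is set to 42"))   # 42
print(last_word_if_match("nothing to see"))     # None
```

The caller can then test the result against None instead of wondering whether a variable was ever assigned.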
I've been playing for a bit with startswith() and I've discovered something interesting:
>>> tup = ('1', '2', '3')
>>> lis = ['1', '2', '3', '4']
>>> '1'.startswith(tup)
True
>>> '1'.startswith(lis)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str or a tuple of str, not list
Now, the error is obvious and casting the list into a tuple will work just fine as it did in the first place:
>>> '1'.startswith(tuple(lis))
True
Now, my question is: why must the first argument be a str or a tuple of str prefixes, but not a list of str prefixes?
AFAIK, the Python code for startswith() might look like this:
def startswith(src, prefix):
    return src[:len(prefix)] == prefix
But that just confuses me more, because even with it in mind, it still shouldn't make any difference whether it is a list or a tuple. What am I missing?
There is technically no reason to accept other sequence types, no. The source code roughly does this:
if isinstance(prefix, tuple):
    for substring in prefix:
        if not isinstance(substring, str):
            raise TypeError(...)
        return tailmatch(...)
elif not isinstance(prefix, str):
    raise TypeError(...)
return tailmatch(...)
(where tailmatch(...) does the actual matching work).
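The checks described above can be imitated in plain Python. This is a rough sketch, not the CPython source: startswith_sketch is a hypothetical name, the error messages are paraphrased, and a slice comparison stands in for tailmatch.

```python
def startswith_sketch(src, prefix):
    """Rough pure-Python imitation of str.startswith's argument handling."""
    if isinstance(prefix, tuple):
        for candidate in prefix:
            if not isinstance(candidate, str):
                raise TypeError("tuple for startswith must only contain str")
            if src[:len(candidate)] == candidate:
                return True     # first matching prefix wins
        return False
    if not isinstance(prefix, str):
        raise TypeError("first arg must be str or a tuple of str")
    return src[:len(prefix)] == prefix

print(startswith_sketch("123", ("9", "1")))   # True
print(startswith_sketch("123", "2"))          # False
```

Passing a list to this sketch raises a TypeError for the same reason the built-in does: it is neither a str nor a tuple.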
So yes, any iterable would do for that for loop. But, all the other string test APIs (as well as isinstance() and issubclass()) that take multiple values also only accept tuples, and this tells you as a user of the API that it is safe to assume that the value won't be mutated. You can't mutate a tuple but the method could in theory mutate the list.
Also note that you usually test for a fixed number of prefixes or suffixes or classes (in the case of isinstance() and issubclass()); the implementation is not suited for a large number of elements. A tuple implies that you have a limited number of elements, while lists can be arbitrarily large.
Next, if any iterable or sequence type would be acceptable, then that would include strings; a single string is also a sequence. Should then a single string argument be treated as separate characters, or as a single prefix?
So in other words, it's a limitation to self-document that the sequence won't be mutated, is consistent with other APIs, it carries an implication of a limited number of items to test against, and removes ambiguity as to how a single string argument should be treated.
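The ambiguity with single strings can be seen by comparing a string prefix with the tuple of its characters. The sample values here are made up for illustration:

```python
text = "abc"

# A single string is treated as one prefix.
whole_prefix = text.startswith("ba")
print(whole_prefix)            # False

# If any iterable were accepted, "ba" could equally mean the prefixes "b" and "a".
per_character = text.startswith(tuple("ba"))
print(per_character)           # True, because "abc" starts with "a"
```

Requiring a tuple for multiple prefixes means the caller has to state which of the two meanings is intended.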
Note that this was brought up before on the Python Ideas list; see this thread; Guido van Rossum's main argument there is that you either special case for single strings or for only accepting a tuple. He picked the latter and doesn't see a need to change this.
This was already suggested on Python-ideas a couple of years back (see: str.startswith taking any iterator instead of just tuple), and GvR had this to say:
The current behavior is intentional, and the ambiguity of strings
themselves being iterables is the main reason. Since startswith() is
almost always called with a literal or tuple of literals anyway, I see
little need to extend the semantics.
In addition to that, there seemed to be no real motivation as to why to do this.
The current approach keeps things simple and fast: unicode_startswith (and endswith) check for a tuple argument and then for a string one. They then call tailmatch in the appropriate direction. This is, arguably, very easy to understand in its current state, even for strangers to C code.
Adding other cases will only lead to more bloated and complex code for little benefit while also requiring similar changes to any other parts of the unicode object.
On a similar note, here is an excerpt from a talk by core developer, Raymond Hettinger discussing API design choices regarding certain string methods, including recent changes to the str.startswith signature. While he briefly mentions this fact that str.startswith accepts a string or tuple of strings and does not expound, the talk is informative on the decisions and pain points both core developers and contributors have dealt with leading up to the present API.
I got curious about why we can do this:
l = [1,2,3,4]
But get an error when trying this:
l = list(1,2,3,4)
Also, most people suggest passing the arguments to the latter as a tuple, which solves the problem. But then why doesn't this work either?
t = tuple(1,2,3,4)
list() and tuple() are used to convert a different type of object into a list or tuple, respectively. Any iterable will do there. The two functions are not intended to create an object from a discrete number of inputs; that's what the literal notations are for.
So if you have a fixed number of elements, each producible with an expression, the right way to create a list is to use the [...] literal syntax. If you have a variable number of elements produced by a single iterable object, use list(). The two use cases differ.
If list() accepted multiple arguments, then you wouldn't need to have the [...] syntax anymore; there is no point in having two different syntaxes fill the same use case.
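A short sketch of the two use cases side by side. The variable names are made up for illustration:

```python
from_literal = [1, 2, 3, 4]          # fixed elements: use the literal syntax
from_tuple = list((1, 2, 3, 4))      # one iterable argument: converted to a list
from_range = list(range(1, 5))       # any iterable works
from_string = tuple("abc")           # strings are iterables too

print(from_literal == from_tuple == from_range)
print(from_string)

try:
    list(1, 2, 3, 4)                 # four arguments instead of one iterable
except TypeError as exc:
    print("TypeError:", exc)
```

Note the doubled parentheses in list((1, 2, 3, 4)): the inner pair builds the tuple, and that single tuple is the one argument list() receives.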
I have a method that takes a list of strings. Unfortunately, if the list is only one item long, Python treats this list as a string.
This post has a method for checking whether something is a string, and converting it to a list if it is:
Python: How to define a function that gets a list of strings OR a string
But this seems an incredibly redundant way of getting the item that I passed in as a list to in fact be a list. Is there a better way?
You are probably using tuples, not lists, and forgetting to add the comma to the one-item tuple literal(s). ('foo') is interpreted as simply 'foo' which would match what you are saying. However, adding a comma to one-item tuple literals will solve this. ('foo',) returns, well, ('foo',).
I'm not sure I believe you. Python shouldn't behave that way, and it doesn't appear to:
>>> def foo(lst):
...     print type(lst)
...
>>> foo(['bar'])
<type 'list'>
That post was about a different thing: they wanted the ability to pass a single string or a list of strings and handle both cases as if they were lists of strings. If you're only passing in a list, always treating it as a list should be fine.
Python shouldn't do that with a list. A singleton tuple, though, has syntax different from singleton lists:
(1) == 1 # (1) is 1
[1] != 1 # [1] is a singleton list
(1,) != 1 # (1,) is a singleton tuple
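A small sketch of how the missing comma shows up in practice. The function show_items is hypothetical and simply collects whatever it gets by iterating over its argument:

```python
def show_items(items):
    """Collect the elements produced by iterating over items."""
    return [item for item in items]

print(show_items(['foo']))    # ['foo']
print(show_items(('foo',)))   # ['foo']
print(show_items(('foo')))    # ['f', 'o', 'o'] because ('foo') is just the string 'foo'

print(type(('foo')).__name__, type(('foo',)).__name__)
```

The third call is the symptom described in the question: the parentheses alone do not make a tuple, so the function iterates over the characters of the string.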
You are mistaken. Python does no such transformation to lists of a single element.
Double- and triple-check that you are putting the [] around the item you are passing.
If you still can't get it working, show us the code!
Please observe the following behavior:
a = u"foo"
b = u"b\xe1r" # \xe1 is an 'a' with an accent
s = [a, b]
print a, b
print s
for x in s: print x,
The result is:
foo bár
[u'foo', u'b\xe1r']
foo bár
When I just print the two values sitting in variables a and b, I get what I expect; when I put the string values in a list and print it, I get the unwanted u"xyz" form; finally, when I print values from the list with a loop, I get the first form again. Can someone please explain this seemingly odd behavior? I know there's probably a good reason.
When you print a list, you get the repr() of each element. Lists aren't really meant to be printed, so Python tries to print something representative of its structure.
If you want to format it in any particular way, either be explicit about how you want it formatted, or override its __repr__ method.
Objects in Python have two ways to be turned into strings: roughly speaking, str() produces human readable output, and repr() produces computer-readable output. When you print something, it uses str().
But the str() of a list uses the repr() of its elements.
You get this because lists can contain any number of elements, of mixed types. In the second case, instead of printing unicode strings, you're printing the list itself - which is very different than printing the list contents.
Since the list can contain anything, you get the u'foo' syntax. If you were using non-unicode strings, you'd see the 'foo' instead of just foo, as well.
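The same rule can be checked directly. This sketch assumes Python 3, where the u prefix no longer appears and the quotes are the visible sign of repr(); the sample words are made up for illustration:

```python
words = ["foo", "bar"]

# str() of a list is built from the repr() of each element.
rebuilt = "[" + ", ".join(repr(word) for word in words) + "]"
print(str(words))
print(str(words) == rebuilt)

# Being explicit about the format uses str() of each element instead.
joined = ", ".join(words)
print(joined)
```

Joining the elements yourself is the "be explicit" option: it prints the strings as they would appear when printed one by one.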