Why do tuples in a list comprehension need parentheses?

It is well known that tuples are not defined by parentheses, but by commas. Quoting the documentation:
A tuple consists of a number of values separated by commas
Therefore:
myVar1 = 'a', 'b', 'c'
type(myVar1)
# Result:
<type 'tuple'>
Another striking example is this:
myVar2 = ('a')
type(myVar2)
# Result:
<type 'str'>
myVar3 = ('a',)
type(myVar3)
# Result:
<type 'tuple'>
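Here are the same checks in Python 3, where type() reports <class 'tuple'> instead of the Python 2 output above. This minimal sketch also covers the empty tuple, the one case where parentheses alone create a tuple.

```python
# The comma, not the parentheses, creates a tuple.
myVar1 = 'a', 'b', 'c'   # tuple without any parentheses
myVar2 = ('a')           # parentheses only group: this is just the str 'a'
myVar3 = ('a',)          # trailing comma makes a one-element tuple
myVar4 = 'a',            # same one-element tuple, parentheses omitted
empty = ()               # empty parentheses are the exception: an empty tuple

for value in (myVar1, myVar2, myVar3, myVar4, empty):
    print(repr(value), type(value).__name__)
```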
Even a single-element tuple needs a comma; the parentheses are only there to avoid confusion. My question is: why can't we omit the parentheses around tuples in a list comprehension? For example:
myList1 = ['a', 'b']
myList2 = ['c', 'd']
print([(v1,v2) for v1 in myList1 for v2 in myList2])
# Works, result:
[('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
print([v1,v2 for v1 in myList1 for v2 in myList2])
# Does not work, result:
SyntaxError: invalid syntax
Isn't the second list comprehension just syntactic sugar for the following loop, which does work?
myTuples = []
for v1 in myList1:
    for v2 in myList2:
        myTuple = v1,v2
        myTuples.append(myTuple)
print(myTuples)
# Result:
[('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
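The equivalence the question asks about can be checked with a Python 3 sketch like the following. itertools.product is a swapped-in standard-library alternative that the question does not use; it yields the same pairs in the same order.

```python
import itertools

myList1 = ['a', 'b']
myList2 = ['c', 'd']

# Parenthesized tuple target: the form the grammar accepts.
from_comprehension = [(v1, v2) for v1 in myList1 for v2 in myList2]

# The explicit loop from the question; the bare comma is fine here
# because the assignment is a statement, not an element of a list display.
from_loop = []
for v1 in myList1:
    for v2 in myList2:
        myTuple = v1, v2
        from_loop.append(myTuple)

# The standard library builds the same Cartesian product directly.
from_product = list(itertools.product(myList1, myList2))

print(from_comprehension == from_loop == from_product)  # True
```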

Python's grammar is LL(1) (at least up to Python 3.8; CPython switched to a PEG parser in 3.9), meaning that the parser only looks ahead one symbol.
[(v1, v2) for v1 in myList1 for v2 in myList2]
Here, the parser sees something like this.
[ # An opening bracket; must be some kind of list
[( # Okay, so a list containing some value in parentheses
[(v1
[(v1,
[(v1, v2
[(v1, v2)
[(v1, v2) for # Alright, list comprehension
However, without the parentheses, it has to make a decision earlier on.
[v1, v2 for v1 in myList1 for v2 in myList2]
[ # List-ish thing
[v1 # List containing a value; alright
[v1, # List containing at least two values
[v1, v2 # Here's the second value
[v1, v2 for # Wait, what?
Backtracking parsers can be notoriously slow, so LL(1) parsers do not backtrack. Thus, the ambiguous syntax is forbidden.
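Whatever the explanation, the behaviour itself is easy to confirm with the built-in compile(), which parses source without running it, so the undefined names don't matter. This is a minimal sketch; parses is a hypothetical helper. The unparenthesized form is still rejected after CPython's move to a PEG parser in 3.9, although newer versions may word the error message differently.

```python
def parses(source):
    """Hypothetical helper: True if source compiles as an expression, False on SyntaxError."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


with_parens = "[(v1, v2) for v1 in myList1 for v2 in myList2]"
without_parens = "[v1, v2 for v1 in myList1 for v2 in myList2]"

print(parses(with_parens))     # True
print(parses(without_parens))  # False
```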

As I felt "because the grammar forbids it" to be a little too snarky, I came up with a reason.
The parser begins parsing the expression as a list/set/tuple display, expects another , (or the closing bracket), and instead encounters a for token.
For example:
$ python3.6 test.py
File "test.py", line 1
[a, b for a, b in c]
^
SyntaxError: invalid syntax
tokenizes as follows:
$ python3.6 -m tokenize test.py
0,0-0,0: ENCODING 'utf-8'
1,0-1,1: OP '['
1,1-1,2: NAME 'a'
1,2-1,3: OP ','
1,4-1,5: NAME 'b'
1,6-1,9: NAME 'for'
1,10-1,11: NAME 'a'
1,11-1,12: OP ','
1,13-1,14: NAME 'b'
1,15-1,17: NAME 'in'
1,18-1,19: NAME 'c'
1,19-1,20: OP ']'
1,20-1,21: NEWLINE '\n'
2,0-2,0: ENDMARKER ''
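The same token stream can be produced programmatically with the standard tokenize module. This is a minimal sketch assuming Python 3 and an in-memory source string instead of test.py; it keeps only OP and NAME tokens. Note that for comes out as an ordinary NAME token: the tokenizer knows nothing about comprehensions, so the rejection happens later, in the parser.

```python
import io
import token
import tokenize

source = "[a, b for a, b in c]\n"

# generate_tokens() takes a readline callable over str source and yields TokenInfo tuples.
tokens = [
    (token.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in (token.OP, token.NAME)
]

for kind, text in tokens:
    print(kind, repr(text))
```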

There was no parser issue that motivated this restriction. Contrary to Silvio Mayolo's answer, an LL(1) parser could have parsed the no-parentheses syntax just fine. The parentheses were optional in early versions of the original list comprehension patch; they were only made mandatory to make the meaning clearer.
Quoting Guido van Rossum back in 2000, in a response to someone worried that [x, y for ...] would cause parser issues,
Don't worry. Greg Ewing had no problem expressing this in Python's
own grammar, which is about as restricted as parsers come. (It's
LL(1), which is equivalent to pure recursive descent with one
lookahead token, i.e. no backtracking.)
Here's Greg's grammar:
atom: ... | '[' [testlist [list_iter]] ']' | ...
list_iter: list_for | list_if
list_for: 'for' exprlist 'in' testlist [list_iter]
list_if: 'if' test [list_iter]
Note that before, the list syntax was '[' [testlist] ']'. Let me
explain it in different terms:
The parser parses a series comma-separated expressions. Previously,
it was expecting ']' as the sole possible token following this.
After the change, 'for' is another possible following token. This
is no problem at all for any parser that knows how to parse matching
parentheses!
If you'd rather not support [x, y for ...] because it's ambiguous
(to the human reader, not to the parser!), we can change the grammar
to something like:
'[' test [',' testlist | list_iter] ']'
(Note that | binds less than concatenation, and [...] means an
optional part.)
Also see the next response in the thread, where Greg Ewing runs
>>> seq = [1,2,3,4,5]
>>> [x, x*2 for x in seq]
[(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
on an early version of the list comprehension patch, and it works just fine.
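Guido's point can be illustrated with a hypothetical toy parser. This is a minimal sketch, not CPython's parser: it handles only a simplified form of Greg Ewing's rule, with bare names as elements and as for targets and no if clauses (Parser, parse, and the tuple-shaped results are all invented for illustration). After reading the comma-separated names, it looks at exactly one token and branches on for versus ], so no backtracking is needed.

```python
KEYWORDS = {"for", "in"}


class Parser:
    """Toy recursive-descent parser: every decision uses ONE lookahead token."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # The single token of lookahead (None at end of input).
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def name(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier() or tok in KEYWORDS:
            raise SyntaxError(f"expected a name, got {tok!r}")
        self.pos += 1
        return tok

    def testlist(self):
        # testlist: NAME (',' NAME)*
        items = [self.name()]
        while self.peek() == ",":
            self.pos += 1
            items.append(self.name())
        return items

    def list_for(self):
        # list_for: 'for' NAME 'in' NAME [list_for]   (written as a loop)
        clauses = []
        while self.peek() == "for":
            self.pos += 1
            target = self.name()
            self.expect("in")
            clauses.append((target, self.name()))
        return clauses

    def atom(self):
        # atom: '[' [testlist [list_for]] ']'
        self.expect("[")
        if self.peek() == "]":
            self.pos += 1
            return ("list", [])
        items = self.testlist()
        if self.peek() == "for":   # one token of lookahead decides: comprehension
            clauses = self.list_for()
            self.expect("]")
            return ("listcomp", items, clauses)
        self.expect("]")           # otherwise it was a plain list display
        return ("list", items)


def parse(src):
    # Crude tokenizer: pad brackets and commas with spaces, then split.
    for ch in "[],":
        src = src.replace(ch, f" {ch} ")
    parser = Parser(src.split())
    result = parser.atom()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing tokens")
    return result


print(parse("[v1, v2 for v1 in a for v2 in b]"))
print(parse("[v1, v2]"))
```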

Related

Create a list of shared components of another list and a string?

I have two variables:
myList = [(0, 't'), (1, 'r'), (2, '_')]
newList = []
I want to create a new list that includes only the tuples containing an alphabetic character. The output should be:
newList = [(0, 't'), (1, 'r')]
My initial thought is:
for thing in myList:
    if thing(1) in string.ascii_lowercase: #This line doesn't work.
        newList.append(thing)
I have two questions:
Please help me fix the broken code. Can you tell me the name of the error? As a beginner, it's hard even to know the right words to search for on Google.
Please give advice on naming things. In this example, how would you name thing?
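You need to change:
if thing(1) in string.ascii_lowercase:
to:
if thing[1] in string.ascii_lowercase:
Parentheses call an object while square brackets index it, so thing(1) raises TypeError: 'tuple' object is not callable.
Also make sure you have imported string.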
You can rename thing to list_tuple or my_list_object for example. You will get good at naming eventually.
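Here is a runnable sketch of the corrected loop that also shows the error the original line raises. It unpacks each pair into index and char; those names are only one possible answer to the naming question, not the answerer's.

```python
import string

myList = [(0, 't'), (1, 'r'), (2, '_')]

# thing(1) tries to *call* the tuple instead of indexing it.
error_message = ""
try:
    myList[0](1)
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)

newList = []
for index, char in myList:          # unpack each (number, character) pair
    if char in string.ascii_lowercase:
        newList.append((index, char))

print(newList)  # [(0, 't'), (1, 'r')]
```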
The "Pythonic" way to accomplish your goal is as follows:
import string
newList = list(filter(lambda x: type(x) is tuple and x[1] in string.ascii_lowercase, myList))
Explanation:
import string: imports the string module, which provides string.ascii_lowercase, a string of all lowercase letters
filter(condition, iterable): an extremely useful built-in function that filters out unwanted elements from a list (or any other iterable, for that matter); in Python 3 it returns a lazy iterator rather than a list
lambda x: an anonymous (usually simple) function defined inline, which takes a single argument x
type(x) is tuple and x[1] in string.ascii_lowercase: for each element x of the iterable passed to filter, the lambda first verifies that the element is indeed a tuple and, if so, checks whether its second element is in the lowercase alphabet
Hope this helps
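In Python 3, filter() returns a lazy iterator rather than a list, so a sketch that produces an actual list wraps it in list(). The list comprehension below is an equivalent alternative that is not part of the original answer. isinstance() replaces the type(x) is tuple check here, which also accepts tuple subclasses.

```python
import string

myList = [(0, 't'), (1, 'r'), (2, '_')]

# filter() + lambda, materialized into a list for Python 3.
filtered = list(filter(lambda x: isinstance(x, tuple) and x[1] in string.ascii_lowercase, myList))

# The same selection written as a list comprehension.
comprehension = [x for x in myList if isinstance(x, tuple) and x[1] in string.ascii_lowercase]

print(filtered)       # [(0, 't'), (1, 'r')]
print(comprehension)  # [(0, 't'), (1, 'r')]
```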

Python lambda function underscore-colon syntax explanation?

In the following Python script where "aDict" is a dictionary, what does "_: _[0]" do in the lambda function?
sorted(aDict.items(), key=lambda _: _[0])
Let's pick that apart.
1) Suppose you have a dict, di:
di={'one': 1, 'two': 2, 'three': 3}
2) Now suppose you want each of its key, value pairs:
>>> di.items()
[('three', 3), ('two', 2), ('one', 1)]
3) Now you want to sort them (since a dict does not keep its items in sorted order):
>>> sorted(di.items())
[('one', 1), ('three', 3), ('two', 2)]
Notice that the tuples are sorted lexicographically, which here means by the text in the first element of each tuple. This is equivalent to sorting by t[0] for each tuple t.
Suppose you wanted them sorted by the number instead. You would use a key function:
>>> sorted(di.items(), key=lambda t: t[1])
[('one', 1), ('two', 2), ('three', 3)]
The statement you have, sorted(aDict.items(), key=lambda _: _[0]), is just using _ as a variable name. It also accomplishes nothing: aDict.items() produces tuples, and without a key, sorted() already orders them by their first element, so the key function in your example is completely useless.
The same form can be useful for things other than tuples. If you had strings instead, you would be sorting by the first character and ignoring the rest:
>>> li=['car','auto','aardvark', 'arizona']
>>> sorted(li, key=lambda c:c[0])
['auto', 'aardvark', 'arizona', 'car']
Vs:
>>> sorted(li)
['aardvark', 'arizona', 'auto', 'car']
I still would not use _ in the lambda, however. The convention is to use _ for a throwaway variable that has minimal chance of side effects, and Python's namespaces mostly make that worry moot.
Consider:
>>> c=22
>>> sorted(li, key=lambda c:c[0])
['auto', 'aardvark', 'arizona', 'car']
>>> c
22
The value of c is preserved because of the local namespace inside the lambda.
However (under Python 2.x but not Python 3.x) this can be a problem:
>>> c=22
>>> [c for c in '123']
['1', '2', '3']
>>> c
'3'
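The Python 3 behaviour mentioned above can be checked with a small sketch: list comprehensions get their own scope, so the loop variable no longer overwrites an outer name of the same spelling.

```python
c = 22
digits = [c for c in '123']  # this c lives only inside the comprehension's scope

print(digits)  # ['1', '2', '3']
print(c)       # 22 in Python 3; Python 2 would print '3'
```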
So the (light) convention became using _ for a variable in a list comprehension, tuple unpacking, etc., where you worry less about trampling on one of your names. The message is: if it is named _, I don't really care about it except right here.
In Python _ (underscore) is a valid identifier and can be used as a variable name, e.g.
>>> _ = 10
>>> print(_)
10
It can therefore also be used as the name of an argument to a lambda expression - which is like an unnamed function.
In your example, sorted() passes the tuples produced by aDict.items() to its key function. The key function returns the first element of each tuple, which sorted() then uses as the key, i.e. the value compared with other values to determine the order.
Note that, in this case, the same result can be produced without a key function because tuples are naturally sorted according to the first element, then the second element, etc. So
sorted(aDict.items())
will produce the same result. Because dictionaries cannot contain duplicate keys, the first element of each tuple is unique, so the second element is never considered when sorting.
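To check that the key function is redundant here, the sketch below uses a hypothetical sample dictionary (the question doesn't show aDict's contents) and compares four spellings. operator.itemgetter(0) is an alternative not mentioned in the answers.

```python
import operator

aDict = {'three': 3, 'two': 2, 'one': 1}  # hypothetical sample data

by_underscore = sorted(aDict.items(), key=lambda _: _[0])   # _ is just a parameter name
by_named = sorted(aDict.items(), key=lambda x: x[0])        # identical function, clearer name
by_itemgetter = sorted(aDict.items(), key=operator.itemgetter(0))
by_default = sorted(aDict.items())                          # tuples compare first elements first

print(by_underscore)  # [('one', 1), ('three', 3), ('two', 2)]
```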
In Python, lambda creates an anonymous function. The first underscore in your example is simply the lambda's parameter. After the colon, which ends the function signature, _[0] retrieves the first element of the variable _.
Admittedly, this can be confusing; the lambda component of your example could be re-written as lambda x: x[0] with the same result. Conventionally, though, underscore variable names in Python are used for "throwaway variables". In this case, it implies that the only thing we care about in each dictionary item is the key. Nuanced to a fault, perhaps.

Why are tuples enclosed in parentheses?

a tuple is a comma-separated list of values
so the valid syntax to declare a tuple is:
tup = 'a', 'b', 'c', 'd'
But what I often see is a declaration like this:
tup = ('a', 'b', 'c', 'd')
What is the benefit of enclosing tuples in parentheses?
From the Python docs:
... so that nested tuples are interpreted correctly. Tuples may be
input with or without surrounding parentheses, although often
parentheses are necessary anyway (if the tuple is part of a larger
expression).
Example of nested tuples:
nested = ('a', ('b', 'c'), 'd')
The parentheses are just parentheses: they work by changing precedence. The only exception is when nothing is enclosed, i.e. (), which creates an empty tuple.
The reason one would use parentheses nevertheless is that it will result in a fairly consistent notation. You can write the empty tuple and any other tuple that way.
Another reason is that one normally wants a literal to have higher precedence than other operations. For example, adding two tuples would be written (1,2)+(3,4); if you omit the parentheses, you get 1,2+3,4, which adds 2 and 3 first and then forms the tuple, giving (1,5,4). A similar situation arises when passing a tuple to a function: f(1,2) sends the arguments 1 and 2, while f((1,2)) sends the single tuple (1,2). Yet another is including a tuple inside a tuple: ((1,2),(3,4)) and (1,2,3,4) are two different things.
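The precedence examples above can be checked directly. In this sketch, count_args is a hypothetical helper that just counts its positional arguments, standing in for f.

```python
def count_args(*args):
    """Hypothetical stand-in for f: report how many positional arguments it received."""
    return len(args)


concatenated = (1, 2) + (3, 4)   # concatenation of two tuples
unparenthesized = 1, 2 + 3, 4    # 2 + 3 binds first, then the commas form a tuple
two_args = count_args(1, 2)      # two separate arguments
one_arg = count_args((1, 2))     # one tuple argument
nested = ((1, 2), (3, 4))        # a tuple of two tuples
flat = (1, 2, 3, 4)              # a flat four-element tuple

print(concatenated)     # (1, 2, 3, 4)
print(unparenthesized)  # (1, 5, 4)
print(two_args, one_arg)  # 2 1
print(nested == flat)   # False
```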
Those are good answers! Here's just an additional example of tuples in action (packing/unpacking):
If you do this
x, y = y, x
what's happening is:
tuple_1 = (y, x)
(x, y) = tuple_1
which is the same as:
tuple_1 = (y, x)
x = tuple_1[0]
y = tuple_1[1]
In all these cases the parentheses don't do anything at all as far as Python is concerned. But they are helpful if you want to say to someone reading the script, "Hey! I am making a tuple here! In case you didn't see the comma, I'll add these parentheses to catch your eye!"
Of course the answers about nested tuples are correct. If you want to put a tuple inside something like a tuple or list...
A = x, (x, y) # same as (x, (x, y))
B = [x, (x, y)]
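Here is a runnable sketch of the packing and unpacking steps described above, with concrete values assumed for x and y (the answer leaves them unspecified). The names a and b repeat the swap using the explicit index-based spelling.

```python
x, y = 1, 2

# Swap via tuple packing and unpacking.
x, y = y, x

# The same swap spelled out step by step.
a, b = 1, 2
tuple_1 = (b, a)   # pack
a = tuple_1[0]     # unpack by index
b = tuple_1[1]

A = x, (x, y)      # nested tuple: same as (x, (x, y))
B = [x, (x, y)]    # list whose second element is a tuple

print(x, y, a, b)  # 2 1 2 1
print(A, B)        # (2, (2, 1)) [2, (2, 1)]
```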

Difference accessing element(s) of tuple and list

Why is there this difference accessing the element(s) of t when making it a tuple?
>>> t = [('ID','int')]
>>> for r in t:
...     print r
...
('ID', 'int')
>>> t = (('ID','int'))
>>> for r in t:
...     print r
...
ID
int
I'd expect this to behave exactly like the first example! Yet when the tuple holds more than one element, the behavior changes:
>>> t = (('ID','int'),('DEF','str'))
>>> for r in t:
...     print r
...
('ID', 'int')
('DEF', 'str')
>>> t = [('ID','int'),('DEF','str')]
>>> for r in t:
...     print r
...
('ID', 'int')
('DEF', 'str')
Can somebody give a short explanation? I'm running python 2.7
(('a', 'b')) is the same as ('a', 'b').
You actually want (('a', 'b'),)
This is documented here:
5.13. Expression lists
expression_list ::= expression ( "," expression )* [","]
An expression list containing at least one comma yields a tuple. The length of the tuple is the number of expressions in the list. The expressions are evaluated from left to right.
The trailing comma is required only to create a single tuple (a.k.a. a singleton); it is optional in all other cases. A single expression without a trailing comma doesn’t create a tuple, but rather yields the value of that expression. (To create an empty tuple, use an empty pair of parentheses: ().)
Remember that without this rule, it would be unclear whether (3) * (4) is the multiplication of two numbers or of two tuples. Most users would expect it to be the multiplication of numbers.
t = [('ID','int')]
is a tuple in a list.
t = (('ID','int'))
is a tuple with redundant parentheses around it.
t = ('ID','int'),
is a tuple in a tuple.
The , makes the tuple! The parentheses around a tuple are only needed to avoid ambiguity.
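A short Python 3 sketch of the fix suggested in these answers, showing that the trailing comma, not the extra parentheses, creates the outer tuple.

```python
grouped = (('ID', 'int'))    # extra parentheses only group: still ('ID', 'int')
singleton = (('ID', 'int'),) # trailing comma: a one-element tuple holding a tuple

print(len(grouped))    # 2
print(len(singleton))  # 1

for r in singleton:
    print(r)           # ('ID', 'int')
```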

Python sorting multiple attributes

I have a dictionary like the following, with username:name key/value pairs:
d = {"user2":"Tom Cruise", "user1": "Tom Cruise"}
My problem is that I need to sort these by name, but if multiple users have the same name, as above, I then need to sort those by their username. I looked up the sorted function, but I don't really understand the cmp parameter and the lambda. If someone could explain those and help me with this, that would be great! Thanks :)
cmp is obsolescent (Python 3's sorted() no longer accepts it). lambda just makes a function.
import operator
sorted(d.items(), key=operator.itemgetter(1, 0))
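Here is a runnable sketch of this answer in Python 3, where d.items() replaces the Python 2 d.iteritems(). The third user, user3, is a hypothetical addition so that the primary sort by name is visible alongside the username tiebreak.

```python
import operator

d = {"user2": "Tom Cruise", "user1": "Tom Cruise", "user3": "Aaron Sterling"}  # user3 is hypothetical

# Each item is (username, name); itemgetter(1, 0) builds the key (name, username),
# so items sort by name first and by username to break ties.
ordered = sorted(d.items(), key=operator.itemgetter(1, 0))

# The same key written as a lambda.
ordered_lambda = sorted(d.items(), key=lambda item: (item[1], item[0]))

print(ordered)
```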
I'm just going to elaborate on Ignacio Vazquez-Abrams's answer. cmp is deprecated, so don't use it. Use the key parameter instead.
lambda makes a function. It's an expression, so it can go places that a normal def statement can't, but its body is limited to a single expression.
my_func = lambda x: x + 1
This defines a function that takes a single argument, x, and returns x + 1. lambda x, y=1: x + y defines a function that takes an argument x and an optional argument y with a default value of 1, and returns x + y. As you can see, it's really just like a def statement, except that it's an expression and its body is limited to a single expression.
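To illustrate the lambda/def comparison above, here is a small sketch; add_one and add are names invented for this example.

```python
def add_one(x):
    return x + 1


my_func = lambda x: x + 1   # behaves the same as add_one
add = lambda x, y=1: x + y  # y is optional and defaults to 1

print(my_func(41), add_one(41))  # 42 42
print(add(5), add(5, 10))        # 6 15
```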
The purpose of the key parameter is that sorted will call it on each element of the sequence being sorted and use the value it returns for comparison.
list_ = ['a', 'b', 'c']
sorted(list_, key=lambda x: 1)  # every element gets the same key, so the stable sort keeps the original order
Read the rest for a hypothetical example. I didn't look at the problem closely enough before writing this, but it will still be educational, so I'll leave it up.
We can't really say much more because:
You can't sort dicts. Do you have a list of dicts? We could sort that.
You haven't shown a username key.
I'll assume that it's something like
users = [{'name': 'Tom Cruise', 'username': 'user234234234', 'reputation': 1},
{'name': 'Aaron Sterling', 'username': 'aaronasterling', 'reputation': 11725}]
If you wanted to confirm that I'm more awesome than Tom Cruise, you could do:
sorted(users, key=lambda x: x['reputation'])
This just passes a function that returns the 'reputation' value for each dictionary in the list. But lambdas can be slower; most of the time, operator.itemgetter is what you want.
operator.itemgetter takes one or more keys and returns a function that takes an object and looks those keys up in it, returning a tuple of the values when more than one key is given.
So f = operator.itemgetter('name', 'username') returns essentially the same function as
lambda d: (d['name'], d['username']). The difference is that it should, in principle, run much faster, and you don't have to look at ugly lambda expressions.
So to sort a list of dicts by name and then username, just do
sorted(list_of_dicts, key=operator.itemgetter('name', 'username'))
which is exactly what Ignacio Vazquez-Abrams suggested.
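Here is a runnable sketch of the list-of-dicts case, assuming records shaped like the users example above. The third record (tc_official) is hypothetical and exists only to show the username tiebreak. Note that key= must be passed by keyword: Python 3's sorted() accepts no positional argument besides the iterable, and in Python 2 the second positional argument is cmp.

```python
import operator

# Records modelled on the answer's example; the tc_official entry is hypothetical.
users = [
    {'name': 'Tom Cruise', 'username': 'user234234234', 'reputation': 1},
    {'name': 'Aaron Sterling', 'username': 'aaronasterling', 'reputation': 11725},
    {'name': 'Tom Cruise', 'username': 'tc_official', 'reputation': 5},
]

by_reputation = sorted(users, key=lambda u: u['reputation'])
by_name_then_username = sorted(users, key=operator.itemgetter('name', 'username'))

print([u['username'] for u in by_reputation])
print([u['username'] for u in by_name_then_username])
```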
You should know that a dict can't be sorted, but Python 2.7 and 3.1 added the collections.OrderedDict class.
So,
>>> from collections import OrderedDict
>>> d=OrderedDict({'D':'X','B':'Z','C':'X','A':'Y'})
>>> d
OrderedDict([('A', 'Y'), ('C', 'X'), ('B', 'Z'), ('D', 'X')])
>>> OrderedDict(sorted((d.items()), key=lambda t:(t[1],t[0])))
OrderedDict([('C', 'X'), ('D', 'X'), ('A', 'Y'), ('B', 'Z')])
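Since Python 3.7, plain dicts are guaranteed to preserve insertion order, so on modern versions the same sorted result can be kept in an ordinary dict as well as in an OrderedDict. This sketch starts from a plain dict literal rather than OrderedDict({...}), because passing a dict to OrderedDict only preserves whatever order that dict already had.

```python
from collections import OrderedDict

d = {'D': 'X', 'B': 'Z', 'C': 'X', 'A': 'Y'}

# Sort items by value, then by key to break ties.
items_sorted = sorted(d.items(), key=lambda t: (t[1], t[0]))

ordered = OrderedDict(items_sorted)  # works on Python 2.7 and 3.x
plain = dict(items_sorted)           # Python 3.7+: dicts keep insertion order

print(list(ordered))  # ['C', 'D', 'A', 'B']
print(list(plain))    # ['C', 'D', 'A', 'B']
```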
