Behaviour of Python when attempting to iterate over characters in a string


I am attempting to iterate over the following string, using a for loop:
>>> for a,b,c in "cat":
...     print(a,b,c)
What I intended was for this to print each character of the string individually on one physical line; instead, I receive an error. I am aware that this is easily resolved by enclosing the string in list brackets []:
>>> for a,b,c in ["cat"]:
...     print(a,b,c)
c a t
But could someone explain why this is the case?

You are telling for to expand each iteration value to assign to three separate variables:
for a,b,c in "cat":
# ^^^^^ the target for the loop variable, 3 different names
However, iterating over a string produces one-character strings, and you can't assign a single character to three variables:
>>> loopiterable = 'cat'
>>> loopiterable[0] # first element
'c'
>>> a, b, c = loopiterable[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 3, got 1)
The error message tells you why this didn't work; you can't take three values out of a string of length 1.
When you put the string into a list, you changed what you loop over. You now have a list with one element, so the loop iterates just once, and the value for that single iteration is the string 'cat'. That string happens to have 3 characters, so it can be assigned to three variables:
>>> loopiterable = ['cat']
>>> loopiterable[0] # first element
'cat'
>>> a, b, c = loopiterable[0]
>>> a
'c'
>>> b
'a'
>>> c
't'
This still would fail if the contained string has a different number of characters:
>>> for a, b, c in ['cat', 'hamster']:
...     print(a, b, c)
...
c a t
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 3)
'hamster' is 7 characters, not 3; that's 4 too many.
The correct solution is to just use one variable for the loop target, not 3:
for character in 'cat':
    print(character)
Now you are printing each character separately:
>>> for character in 'cat':
...     print(character)
...
c
a
t
Now, if you wanted to pass all characters of a string to print() as separate arguments, just use * to expand the string to separate arguments to the call:
>>> my_pet = 'cat'
>>> print(*my_pet)
c a t
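Since print(*my_pet) just passes the characters as separate arguments, the same one-line text can also be built as a string, for a word of any length. A minimal sketch (the helper name spaced is hypothetical) using str.join, which mirrors print's default separator of a single space:

```python
def spaced(word: str) -> str:
    """Return the characters of word separated by single spaces.

    This is the text print(*word) writes (minus the trailing newline),
    because print's default separator is ' '.
    """
    return " ".join(word)


for pet in ["cat", "hamster"]:
    print(spaced(pet))  # "c a t", then "h a m s t e r"
```

Unlike a, b, c unpacking, nothing here depends on the number of characters, so 'hamster' works as well as 'cat'.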

The for loop goes through each element in the iterable you supply and tries to perform an assignment.
In the first case, since your iterable is 'cat' you're essentially unpacking:
a, b, c = 'c'
during the first iteration and getting the appropriate error message:
ValueError: not enough values to unpack (expected 3, got 1)
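This happens because the loop goes over the string one character at a time (it would run 3 times, once for each character in 'cat'), so every pass tries to unpack a single character into three names.
In the second case, you're unpacking "cat" as expected, because the list has a single element (i.e. 'cat'), which is retrieved and unpacked into a, b and c.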
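The assignment step can be made explicit. The sketch below is a simplified model of what for a, b, c in iterable does on each pass (it ignores break and the loop's else clause; the function name unpack_each is hypothetical): get an iterator, fetch one item, then run an ordinary a, b, c = item assignment.

```python
def unpack_each(iterable):
    """Simplified model of `for a, b, c in iterable:`, collecting each (a, b, c)."""
    results = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)  # one item per pass: a character for a str
        except StopIteration:
            break
        a, b, c = item  # ValueError unless item has exactly 3 elements
        results.append((a, b, c))
    return results


print(unpack_each(["cat"]))  # [('c', 'a', 't')]
try:
    unpack_each("cat")
except ValueError as err:
    print(err)  # not enough values to unpack (expected 3, got 1)
```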

You are iterating over different objects in the two blocks of code.
In the former, you iterate over the string, which is a sequence of characters, so each item is a single character.
In the latter, you iterate over a list that contains only one object, 'cat'.
You could change the first one to:
for ch in 'cat':
    print(ch)
Feel free to ask more about this.

Related

dictionary conversion from reading files in python

What is the difference between line number 5 and line number 6?
Line 5 throws an error but line 6 works. I am unable to understand the difference.
with open("file1.txt") as f:
    lines = f.readlines()
    for line in lines:
        print(line)
        print(dict(line.split('=', 1)))  # Line 5 throwing ValueError: dictionary update sequence element #0 has length 1; 2 is required
    print(dict(line.split('=', 1) for line in lines))  # Line 6 works fine

Suppose that we have values like the following for lines:
lines = ["foo=bar", "baz=ola"]
In this code:
print(dict(line.split('=', 1) for line in lines))
the argument to dict is an iterable of lists of two elements, similar to:
>>> dict([["foo", "bar"], ["baz", "ola"]])
{'foo': 'bar', 'baz': 'ola'}
In this code:
for line in lines:
    print(dict(line.split('=', 1)))
we call dict() multiple times, each time with a list of two elements, similar to:
>>> dict(["foo", "bar"])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: dictionary update sequence element #0 has length 3; 2 is required
"Dictionary update sequence element #0" is "foo"; each "sequence element" is a key, value pair, so the error is happening because dict() is trying to interpret "foo" as if it were a pair. Since your error message indicates "length 1", I'm guessing your file has a one-character name in it that is similarly being interpreted as a pair.
If we used two-character strings, it would "work", but it would work by using the first character as the key and the second character as the value:
>>> line = "ab=xy"
>>> dict(line.split("="))
{'a': 'b', 'x': 'y'}
The key takeaway is that the argument to dict must be all the key, value pairs, not a single key, value pair -- i.e. it must be a generator, list, or other iterable that contains other iterables with exactly two elements each. Even if you only have a single pair, you must put that pair inside an iterable (e.g. a list), because dict() will always treat its argument as an iterable of pairs.
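Applied to the loop from the question, a minimal sketch (using the sample lines above in place of the real file) either wraps the single pair in a list before calling dict(), or unpacks the pair and assigns it into one dict built up across iterations:

```python
line = "foo=bar"

# dict() wants an iterable of pairs, so a single pair goes inside a list.
single = dict([line.split("=", 1)])
print(single)  # {'foo': 'bar'}

# Alternatively, unpack each pair and fill one dict as the loop runs,
# instead of building a separate dict on every iteration.
lines = ["foo=bar", "baz=ola"]
combined = {}
for line in lines:
    key, value = line.split("=", 1)
    combined[key] = value
print(combined)  # {'foo': 'bar', 'baz': 'ola'}
```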

The Python documentation for dict() explains this:
If no positional argument is given, an empty dictionary is created. If a positional argument is given and it is a mapping object, a dictionary is created with the same key-value pairs as the mapping object. Otherwise, the positional argument must be an iterable object. Each item in the iterable must itself be an iterable with exactly two objects. The first object of each item becomes a key in the new dictionary, and the second object the corresponding value.
In line 6 you pass a generator whose items are each an iterable with two objects, so each item becomes one key-value pair. In line 5 you pass a single list with two items (two strings), so dict() tries to treat each string as a key-value pair, which fails unless the string has exactly two characters.
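The three cases from the documentation, plus what line 5 actually passes, can be checked directly. In this sketch the file line "x=1\n" is hypothetical; readlines() keeps the trailing newline, which is why it appears in the value:

```python
# No positional argument: an empty dict.
empty = dict()

# A mapping: a dict with the same key-value pairs.
copied = dict({"x": "1"})

# Any other iterable: each item must itself hold exactly two objects.
# A tuple, a list and a two-character string all qualify.
from_pairs = dict([("x", "1"), ["y", "2"], "z3"])
print(empty, copied, from_pairs)  # {} {'x': '1'} {'x': '1', 'y': '2', 'z': '3'}

# Line 5 passes one pair as the whole iterable, so "x" is taken as a pair.
try:
    dict("x=1\n".split("=", 1))  # the same as dict(["x", "1\n"])
except ValueError as err:
    print(err)  # dictionary update sequence element #0 has length 1; 2 is required
```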

For a file like:
x=1
y=2
z=3
Use something like the following:
with open('f.txt') as f:
    lines = f.read().splitlines()  # strips newlines
new_dict = dict([line.split('=') for line in lines])
The list comprehension inside the call to dict produces a list where each element is the list of split components of one line. In this case:
[['x', '1'], ['y', '2'], ['z', '3']]
The dict function consumes iterables like this (with elements consisting of 2 elements) as key-value pairs.
If you inspect new_dict you'd see:
print(new_dict)
{'x': '1', 'y': '2', 'z': '3'}
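A variant of the same approach, sketched under the assumption that values may themselves contain '=' and that the file may contain blank lines (neither is stated in the question): split on the first '=' only and skip empty lines. The helper name parse_key_values and the sample contents are hypothetical, and io.StringIO stands in for the opened file.

```python
import io


def parse_key_values(f):
    """Build a dict from a file of key=value lines (hypothetical helper).

    Each line is split on the first '=' only (maxsplit=1), so a value that
    itself contains '=' stays intact. Blank lines are skipped; a non-blank
    line without '=' would still make dict() raise ValueError.
    """
    return dict(
        line.split("=", 1)
        for line in f.read().splitlines()
        if line.strip()
    )


# io.StringIO stands in for open('f.txt') so the sketch runs without a file.
sample = io.StringIO("x=1\ny=2\nz=3\nexpr=a=b\n\n")
print(parse_key_values(sample))  # {'x': '1', 'y': '2', 'z': '3', 'expr': 'a=b'}
```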

Shortcut to turn a ONE element dict into ONE tuple in Python

There are dozens of questions on turning a Python dict with some number of elements into a list of tuples. I am looking for a shortcut to turn a one element dict into a tuple.
Ideally, it would:
Be idiomatic and simple (i.e., not a subclass of dict, not a function, etc.).
Throw an error if there is more than one element in the dict.
Support multiple assignment by tuple unpacking.
Not destroy the dict.
I am looking for a shortcut to do this (that is not destructive to the dict):
k,v=unpack_this({'unknown_key':'some value'})
* does not work here.
I have come up with these that work:
k,v=next(iter(di.items()))  # have to call 'iter' since 'dict_items' is not an iterator
Or:
k,v=(next(((k,v) for k,v in di.items())))
Or even:
k,v=next(zip(di.keys(), di.values()))
Finally, the best I can come up with:
k,v=list(di.items())[0] # probably the best...
Which can be wrapped into a function if I want a length check:
def f(di):
    if len(di) == 1:
        return list(di.items())[0]
    raise ValueError(f'Too many items to unpack. Expected 2, got {len(di)*2}')
These methods seem super clumsy and none throw an error if there is more than one element.
Is there an idiomatic shortcut that I am missing?

Why not:
next(iter(d.items())) if len(d)==1 else (None,None)

>>> d = {'a': 1}
>>> d.popitem()
('a', 1)
This will return the pair associated with the last key added, with no error if the dict has multiple keys, but the same length check you've made in your function f can be used:
def f(di):
    if len(di) == 1:
        return di.popitem()
    raise ValueError('Dict must have exactly one key')

@chepner has the right approach with .popitem():
>>> d = {'a': 1}
>>> d.popitem()
('a', 1) # d is now {}
This will return the pair associated with the last key added, with no error if the dict has multiple keys. It will also destroy d.
You can keep d intact by using this idiom to create a new dict and pop off the last item from the new dict created with {**d}:
>>> {**d}.popitem()
('a', 1)
>>> d
{'a': 1}
As for a one-line unpackable tuple that throws an error if the dict has more than one item, you can use a Python conditional expression whose alternative makes the unpacking raise the desired error:
# success: k=='a' and v==1
>>> d={'a':1}
>>> k,v={**d}.popitem() if len(d)==1 else 'wrong number of values'
# len(d)!=1 - pass a string to k,v which is too long
>>> d={'a':1,'b':2}
>>> k,v={**d}.popitem() if len(d)==1 else 'wrong number of values'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
If you object to different types being passed, you could do:
>>> k,v={**d}.popitem() if len(d)==1 else {}.fromkeys('too many keys').items()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
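One more option, not taken from the answers above, reuses the trailing-comma unpacking: the target (k, v), expects d.items() to yield exactly one item, so it raises ValueError for an empty dict or for one with several keys, and it does not modify d. A short sketch:

```python
d = {'unknown_key': 'some value'}

# The outer target "(k, v)," takes exactly one item from d.items();
# that item, a (key, value) tuple, is then unpacked into k and v.
(k, v), = d.items()
print(k, v)  # unknown_key some value
print(d)     # {'unknown_key': 'some value'}  (unchanged)

for bad in ({}, {'a': 1, 'b': 2}):
    try:
        (k, v), = bad.items()  # wrong number of items, so this raises
    except ValueError as err:
        print('ValueError:', err)
```

Because the failing unpacking raises before anything is assigned, k and v keep their earlier values.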

What does a comma expression in a for loop mean in Python? [duplicate]

This question already has answers here:
What is the purpose of the single underscore "_" variable in Python?
Reading through Peter Norvig's Solving Every Sudoku Puzzle essay, I've encountered a few Python idioms that I've never seen before.
I'm aware that a function can return a tuple/list of values, in which case you can assign multiple variables to the results, such as
def f():
    return 1,2
a, b = f()
But what is the meaning of each of the following?
d2, = values[s] ## values[s] is a string and at this point len(values[s]) is 1
If len(values[s]) == 1, then how is this statement different than d2 = values[s]?
Another question about using an underscore in the assignment here:
_,s = min((len(values[s]), s) for s in squares if len(values[s]) > 1)
Does the underscore have the effect of basically discarding the first value returned in the list?

d2, = values[s] is just like a,b=f(), except that it unpacks a one-element sequence (a one-element tuple below, a one-character string in the question).
>>> T=(1,)
>>> a=T
>>> a
(1,)
>>> b,=T
>>> b
1
a is a tuple; b is an integer.
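For the question's case, where values[s] is a one-character string, both forms bind the same value; the difference is that d2, = ... also checks the length and raises ValueError otherwise. A small sketch with a hypothetical value:

```python
value = "7"  # hypothetical one-character string, like values[s] in the question

d2 = value   # plain binding: d2 is simply the whole string
d3, = value  # unpacking: binds the single element and checks there is exactly one
print(d2 == d3)  # True

try:
    d4, = "78"  # two characters, so the one-name unpacking fails
except ValueError as err:
    print("ValueError:", err)
```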

_ is like any other variable name, but by convention it means "I don't care about this variable".
The second question: this is tuple unpacking. When a function returns a tuple, you can unpack its elements into separate names.
>>> x=("v1", "v2")
>>> a,b = x
>>> print(a, b)
v1 v2

In the interactive Python shell, _ also holds the value of the last evaluated expression. Hence:
>>> 1
1
>>> _
1
The commas refer to tuple unpacking. What happens is that the return value is a tuple, and so it is unpacked into the variables separated by commas, in the order of the tuple's elements.

The trailing comma is what makes a one-element tuple; without it, the parentheses only group the expression:
>>> (2,)*2
(2, 2)
>>> (2)*2
4

Use split for specific elements of multidimensional list

I have the following list (the actual file is much larger and more complex):
a = [[['3x5'], ['ff']], [['4x10'], ['gg']]]
I would like to use split on the first element of each sublist and get the value that appears after "x". The final results should be 5 and 10 in this case. I tried to use split in this format:
for line in a:
    print(str(line[0]).split("x")[1])
but the output is
5']
10']
I know I can easily manipulate the output to get 5 and 10 but what is the correct way of using split in this case?
And I am interested in using split on specific elements of a list (the first elements in this case).

You need to dive one level deeper, and don't use str() on the list.
>>> a = [[['3x5'], ['ff']], [['4x10'], ['gg']]]
>>> for y in a:
...     if 'x' in y[0][0]:
...         print(y[0][0].split('x')[-1])
...
5
10

You shouldn't convert the list to a string object. Instead, you can use:
>>> [i[0][0].split('x')[1] for i in a]
['5', '10']
If you also want to convert the output to int objects, you can simply add int() like this:
>>> [int(i[0][0].split('x')[1]) for i in a]
[5, 10]
However, if you don't need to save the output into a list but just want to print it, you can use the same expression in a plain loop:
a = [[['3x5'], ['ff']], [['4x10'], ['gg']]]
for i in a:
    print(i[0][0].split('x')[1])
Output:
5
10
Note that this code will fail (raising IndexError: list index out of range) when the first element of one of the sublists isn't in a format like '3x5', for example [[['3x5'], ['ff']], [['kk'], ['gg']]].
However, a simple if can fix this:
>>> a = [[['3x5'], ['ff']], [['kk'], ['gg']]]
>>> [int(i[0][0].split('x')[1]) for i in a]
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "<input>", line 1, in <listcomp>
IndexError: list index out of range
>>> [int(i[0][0].split('x')[1]) for i in a if 'x' in i[0][0]]
[5]
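Or, even better, use a regular expression to check the format, which also handles input like a = [[['3x5'], ['ff']], [['xxxxxxx'], ['gg']]] (that string contains 'x', but 'xxxxxxx'.split('x')[1] is '' and int('') raises ValueError):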
>>> import re
>>> a = [[['3x5'], ['ff']], [['xxxxxxx'], ['gg']]]
>>> [int(i[0][0].split('x')[1]) for i in a if re.search(r'\d+x\d+', i[0][0])]
[5]
Another way, if you don't want to import re:
>>> [int(i[0][0].split('x')[1]) for i in a
... if all(j.isdigit() for j in i[0][0].split('x'))]
[5]
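A variation on the regex check, sketched here as an alternative rather than taken from the answer: compiling a pattern with capture groups and using fullmatch() validates the whole string and extracts the number in one step, so strings like 'a3x5' or '3x5x7' are skipped too. The function name numbers_after_x is hypothetical.

```python
import re

# Whole-string pattern: digits, an "x", digits, with both numbers captured.
DIMENSIONS = re.compile(r"(\d+)x(\d+)")


def numbers_after_x(rows):
    """Return the integer after 'x' in each row's first element.

    Rows whose first element is not exactly digits-x-digits are skipped.
    """
    result = []
    for row in rows:
        match = DIMENSIONS.fullmatch(row[0][0])
        if match:
            result.append(int(match.group(2)))
    return result


print(numbers_after_x([[['3x5'], ['ff']], [['4x10'], ['gg']]]))     # [5, 10]
print(numbers_after_x([[['3x5'], ['ff']], [['xxxxxxx'], ['gg']]]))  # [5]
```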

What does [u'abcd', u'bcde'] mean in Python?

I used a loop to add a bunch of elements to a list with
mylist = []
for x in otherlist:
    mylist.append(x[0:5])
But instead of the expected result ['x1','x2',...], I got: [u'x1', u'x2',...]. Where did the u's come from and why? Also is there a better way to loop through the other list, inserting the first six characters of each element into a new list?

The u means the string is a unicode object; you probably will not need to worry about it. A shorter way to build the list:
mylist.extend(x[:5] for x in otherlist)

The u means unicode. It's Python's internal string representation (from version ... ?).
Most times you don't need to worry about it. (Until you do.)
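In Python 2, repr() of a unicode object is written with a u prefix, which is where u'x1' comes from. Under Python 3 the prefix disappears from the output; a quick check (assumes Python 3.3 or later):

```python
# Python 3: every str is a Unicode string, and repr() never adds a u prefix.
# The u'' literal is still accepted (PEP 414, Python 3.3+) and yields a plain str.
legacy = u'x1'
print(type(legacy) is str)  # True
print(legacy == 'x1')       # True
print([legacy, 'x2'])       # ['x1', 'x2']
```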

The answers above already cover the "u" part: the string is a Unicode string. As for a better way to extract the leading characters from the items in a list (note that n[0:5] takes the first five characters, not six):
>>> a = ["abcdefgh", "012345678"]
>>> b = map(lambda n: n[0:5], a)
>>> for x in b:
...     print(x)
...
abcde
01234
So, map applies a function (lambda n: n[0:5]) to each element of a and gives you the result for every element. In Python 2 it returns a new list; in Python 3 it returns an iterator, so the function gets called only as many times as needed (i.e. if your list has 5000 items but you only pull 10 from the result b, lambda n: n[0:5] gets called only 10 times). In Python 2, use itertools.imap to get this lazy behaviour.
>>> a = [1, 2, 3]
>>> def plusone(x):
...     print("called with {}".format(x))
...     return x + 1
...
>>> b = map(plusone, a)
>>> print("first item: {}".format(next(b)))
called with 1
first item: 2
Of course, you can apply the function "eagerly" to every element by calling list(b), which will give you a normal list with the function applied to each element on creation.
>>> b = map(plusone, a)
>>> list(b)
called with 1
called with 2
called with 3
[2, 3, 4]
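To check the "only 10 calls" claim for a long list, a sketch that counts calls (the counter and the name first_five are hypothetical, added only to observe the behaviour) and uses itertools.islice to pull just ten results from the lazy map object:

```python
from itertools import islice

calls = 0  # hypothetical counter, only here to observe how often map() calls the function


def first_five(n):
    """Return the first five characters of n, counting each call."""
    global calls
    calls += 1
    return n[0:5]


items = ["abcdefgh"] * 5000
lazy = map(first_five, items)       # builds the iterator; no calls yet
print(calls)                        # 0
first_ten = list(islice(lazy, 10))  # pull only ten results
print(calls)                        # 10
print(first_ten[0])                 # abcde
```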
