python: list comprehension tactics - python

I'm looking to take a string and create a list of strings that build up the original string.
e.g.:
"asdf" => ["a", "as", "asd", "asdf"]
I'm sure there's a "pythonic" way to do it; I think I'm just losing my mind. What's the best way to get this done?

One possibility:
>>> st = 'asdf'
>>> [st[:n+1] for n in range(len(st))]
['a', 'as', 'asd', 'asdf']

If you're going to be looping over the elements of your "list", you may be better off using a generator rather than list comprehension:
>>> text = "I'm a little teapot."
>>> textgen = (text[:i + 1] for i in xrange(len(text)))
>>> textgen
<generator object <genexpr> at 0x0119BDA0>
>>> for item in textgen:
... if re.search("t$", item):
... print item
I'm a lit
I'm a litt
I'm a little t
I'm a little teapot
>>>
This code never creates a list object, nor does it ever (delta garbage collection) create more than one extra string (in addition to text).

Related

What is a better pythonic version of this conditional deleting?

i am refreshing my python (2.7) and i am discovering iterators and generators.
As i understood, they are an efficient way of navigating over values without consuming too much memory.
So the following code do some kind of logical indexing on a list:
removing the values of a list L that triggers a False conditional statement represented here by the function f.
I am not satisfied with my code because I feel this code is not optimal for three reasons:
I read somewhere that it is better to use a for loop than a while loop.
However, in the usual for i in range(10), i can't modify the value of 'i' because it seems that the iteration doesn't care.
Logical indexing is pretty strong in matrix-oriented languages, and there should be a way to do the same in python (by hand granted, but maybe better than my code).
Third reason is just that i want to use generator/iterator on this example to help me understand.
Third reason is just that i want to use generator/iterator on this example to help me understand.
TL;DR : Is this code a good pythonic way to do logical indexing ?
#f string -> bool
def f(s):
return 'c' in s
L=['','a','ab','abc','abcd','abcde','abde'] #example
length=len(L)
i=0
while i < length:
if not f(L[i]): #f is a conditional statement (input string output bool)
del L[i]
length-=1 #cut and push leftwise
else:
i+=1
print 'Updated list is :', L
print length
This code has a few problems, but the main one is that you must never modify a list you're iterating over. Rather, you create a new list from the elements that match your condition. This can be done simply in a for loop:
newlist = []
for item in L:
if f(item):
newlist.append(item)
which can be shortened to a simple list comprehension:
newlist = [item for item in L if f(item)]
It looks like filter() is what you're after:
newlist = filter(lambda x: not f(x), L)
filter() filters (...) an iterable and only keeps the items for which a predicate returns True. In your case f(..) is not quite the predicate but not f(...).
Simpler:
def f(s):
return 'c' not in s
newlist = filter(f, L)
See: https://docs.python.org/2/library/functions.html#filter
Never modify a list with del, pop or other methods that mutate the length of the list while iterating over it. Read this for more information.
The "pythonic" way to filter a list is to use reassignment and either a list comprehension or the built-in filter function:
List comprehension:
>>> [item for item in L if f(item)]
['abc', 'abcd', 'abcde']
i want to use generator/iterator on this example to help me understand
The for item in L part is implicitly making use of the iterator protocol. Python lists are iterable, and iter(somelist) returns an iterator .
>>> from collections import Iterable, Iterator
>>> isinstance([], Iterable)
True
>>> isinstance([], Iterator)
False
>>> isinstance(iter([]), Iterator)
True
__iter__ is not only being called when using a traditional for-loop, but also when you use a list comprehension:
>>> class mylist(list):
... def __iter__(self):
... print('iter has been called')
... return super(mylist, self).__iter__()
...
>>> m = mylist([1,2,3])
>>> [x for x in m]
iter has been called
[1, 2, 3]
Filtering:
>>> filter(f, L)
['abc', 'abcd', 'abcde']
In Python3, use list(filter(f, L)) to get a list.
Of course, to filter a list, Python needs to iterate over it, too:
>>> filter(None, mylist())
iter has been called
[]
"The python way" to do it would be to use a generator expression:
# list comprehension
L = [l for l in L if f(l)]
# alternative generator comprehension
L = (l for l in L if f(l))
It depends on your context if a list or a generator is "better" (see e.g. this so question). Because your source data is coming from a list, there is no real benefit of using a generator here.
For simply deleting elements, especially if the original list is no longer needed, just iterate backwards:
Python 2.x:
for i in xrange(len(L) - 1, -1, -1):
if not f(L[i]):
del L[i]
Python 3.x:
for i in range(len(L) - 1, -1, -1):
if not f(L[i]):
del L[i]
By iterating from the end, the "next" index does not change after deletion and a for loop is possible. Note that you should use the xrange generator in Python 2, or the range generator in Python 3, to save memory*.
In cases where you must iterate forward, use your given solution above.
*Note that Python 2's xrange will break if there are >= 2 ** 32 - 1 elements. Python 3's range, as well as the less efficient Python 2's range do not have this limitation.

What is the difference between list(a) and [a]?

I noticed a strange difference between two list constructors that I believed to be equivalent.
Here is a small example:
hello = 'Hello World'
first = list(hello)
second = [hello]
print(first)
print(second)
This code will produce the following output:
['H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd']
['Hello World']
So, the difference is quite clear between the two constructors... And, I guess that this could be generalized to other constructors as well, but I fail to understand the logic behind it.
Can somebody cast its lights upon my interrogations?
The list() constructor function takes exactly one argument, which must be an iterable. It returns a new list with each element being an element from the given iterable. Since strings are iterable (by character), a list with individual characters is returned.
[] takes as many "arguments" as you like, each being a single element in the list; the items are not "evaluated" or iterated, they are taken as is.
Everything as documented.
The first just transform the list "Hello world" (an character array) into a list
first = list(hello)
The second create a list with element inside brackets.
first = [hello]
In the second case for example you could also do:
first = [hello, 'hi', 'world']
and as output of the print you will get
['Hello World', 'hi', 'world']
your "first" uses the list method, which takes in hello and treats it as an iterable, converting it to a list. Which is why each chararcter is seperate.
your "second" creates a new list, using the string as its value
You are assuming that list(hello) should create a list containing one element, the object referred to by hello. That's not true; by that logic you would expect list(5) to return [5]. list takes a single iterable argument (a list, a tuple, a string, a dict, etc) and returns a list whose elements are taken from the given iterable.
The bracket notation, however, is not limited to containing a single item. Each comma-separated object is treated as a distinct element for the new list.
The most important distinction of these 2 behaviours comes when you work with generators. Given that Python 3 transformed things like map and zip into generators ...
If we assume map returns generators:
a = list(map(lambda x: str(x), [1, 2, 3]))
print(a)
The result is:
['1', '2', '3']
But if we do:
a = [map(lambda x: str(x), [1, 2, 3])]
print(a)
The result is:
[<map object at 0x00000209231CB2E8>]
It is obvious that the 2nd case is in most situations undesirable and not expected.
P.S.
If you are in Python 2, then do at the beginning: from itertools import imap as map
first = list(hello)
converts a string into a list.
second = [hello]
this places an item into a new list. it is not a constructor

When to drop list Comprehension and the Pythonic way?

I created a line that appends an object to a list in the following manner
>>> foo = list()
>>> def sum(a, b):
... c = a+b; return c
...
>>> bar_list = [9,8,7,6,5,4,3,2,1,0]
>>> [foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
[None, None, None, None, None, None, None, None, None, None]
>>> foo
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
>>>
The line
[foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
would give a pylint W1060 Expression is assigned to nothing, but since I am already using the foo list to append the values I don't need to assing the List Comprehension line to something.
My questions is more of a matter of programming correctness
Should I drop list comprehension and just use a simple for expression?
>>> for i, x in enumerate(bar_list):
... foo.append(sum(i,x))
or is there a correct way to use both list comprehension an assign to nothing?
Answer
Thank you #user2387370, #kindall and #Martijn Pieters. For the rest of the comments I use append because I'm not using a list(), I'm not using i+x because this is just a simplified example.
I left it as the following:
histogramsCtr = hist_impl.HistogramsContainer()
for index, tupl in enumerate(local_ranges_per_histogram_list):
histogramsCtr.append(doSubHistogramData(index, tupl))
return histogramsCtr
Yes, this is bad style. A list comprehension is to build a list. You're building a list full of Nones and then throwing it away. Your actual desired result is a side effect of this effort.
Why not define foo using the list comprehension in the first place?
foo = [sum(i,x) for i, x in enumerate(bar_list)]
If it is not to be a list but some other container class, as you mentioned in a comment on another answer, write that class to accept an iterable in its constructor (or, if it's not your code, subclass it to do so), then pass it a generator expression:
foo = MyContainer(sum(i, x) for i, x in enumerate(bar_list))
If foo already has some value and you wish to append new items:
foo.extend(sum(i,x) for i, x in enumerate(bar_list))
If you really want to use append() and don't want to use a for loop for some reason then you can use this construction; the generator expression will at least avoid wasting memory and CPU cycles on a list you don't want:
any(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
But this is a good deal less clear than a regular for loop, and there's still some extra work being done: any is testing the return value of foo.append() on each iteration. You can write a function to consume the iterator and eliminate that check; the fastest way uses a zero-length collections.deque:
from collections import deque
do = deque([], maxlen=0).extend
do(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
This is actually fairly readable, but I believe it's not actually any faster than any() and requires an extra import. However, either do() or any() is a little faster than a for loop, if that is a concern.
I think it's generally frowned upon to use list comprehensions just for side-effects, so I would say a for loop is better in this case.
But in any case, couldn't you just do foo = [sum(i,x) for i, x in enumerate(bar_list)]?
You should definitely drop the list comprehension. End of.
You are confusing anyone reading your code. You are building a list for the side-effects.
You are paying CPU cycles and memory for building a list you are discarding again.
In your simplified case, you are overlooking the fact you could have used a list comprehension directly:
[sum(i,x) for i, x in enumerate(bar_list)]

when a python list iteration is and is not a reference

Could someone please offer a concise explanation for the difference between these two Python operations in terms of modifying the list?
demo = ["a", "b", "c"]
for d in demo:
d = ""
print demo
#output: ['a', 'b', 'c']
for c in range(len(demo)):
demo[c] = ""
print demo
#output: ['', '', '']
In other words, why doesn't the first iteration modify the list? Thanks!
The loop variable d is always a reference to an element of the iterable object. The question is not really a matter of when or when isn't it a reference. It is about the assignment operation that you are performing with the loop.
In the first example, you are rebinding the original reference of an element in the object, with another reference to an empty string. This means you don't actually do anything to the value. You just assign a new reference to the symbol.
In the second example, you are performing an indexing operation and assigning a new reference to the value at that index. demo remains the same reference, and you are replacing a value in the container.
The assignment is really the equivalent of: demo.__setitem__(c, "")
a = 'foo'
id(a) # 4313267976
a = 'bar'
id(a) # 4313268016
l = ['foo']
id(l) # 4328132552
l[0] = 'bar'
id(l) # 4328132552
Notice how in the first example, the object id has changed. It is a reference to a new object. In the second one, we index into the list and replace a value in the container, yet the list remains the same object.
In the first example, the variable d can be thought of a copy of the elements inside the list. When doing d = "", you're essentially modifying a copy of whatever's inside the list, which naturally won't change the list.
In the second example, by doing range(len(demo)) and indexing the elements inside the list, you're able to directly access and change the elements inside the list. Therefore, doing demo[c] would modify the list.
If you do want to directly modify a Python list from inside a loop, you could either make a copy out the list and operate on that, or, preferably, use a list comprehension.
So:
>>> demo = ["a", "b", "c"]
>>> test = ["" for item in demo]
>>> print test
["", "", ""]
>>> demo2 = [1, 5, 2, 4]
>>> test = [item for item in demo if item > 3]
>>> print test
[5, 4]
When you do d = <something> you are making the variable d refer to <something>. This way you can use d as if it was <something>. However, if you do d = <something else>, d now points to <something else> and no longer <something> (the = sign is used as the assignment operator). In the case of demo[c] = <something else>, you are assigning <something else> to the (c+1)th item in the list.
One thing to note, however, is that if the item d has self-modifying methods which you want to call, you can do
for d in demo:
d.<some method>()
since the list demo contains those objects (or references to the objects, I don't remember), and thus if those objects are modified, the list is modified too.

Add number to set

What am I doing wrong here?
a = set().add(1)
print a # Prints `None`
I'm trying to add the number 1 to the empty set.
It is a convention in Python that methods that mutate sequences return None.
Consider:
>>> a_list = [3, 2, 1]
>>> print a_list.sort()
None
>>> a_list
[1, 2, 3]
>>> a_dict = {}
>>> print a_dict.__setitem__('a', 1)
None
>>> a_dict
{'a': 1}
>>> a_set = set()
>>> print a_set.add(1)
None
>>> a_set
set([1])
Some may consider this convention "a horrible misdesign in Python", but the Design and History FAQ gives the reasoning behind this design decision (with respect to lists):
Why doesn’t list.sort() return the sorted list?
In situations where performance matters, making a copy of the list
just to sort it would be wasteful. Therefore, list.sort() sorts the
list in place. In order to remind you of that fact, it does not return
the sorted list. This way, you won’t be fooled into accidentally
overwriting a list when you need a sorted copy but also need to keep
the unsorted version around.
In Python 2.4 a new built-in function – sorted() – has been added.
This function creates a new list from a provided iterable, sorts it
and returns it.
Your particular problems with this feature come from a misunderstanding of good ways to create a set rather than a language misdesign. As Lattyware points out, in Python versions 2.7 and later you can use a set literal a = {1} or do a = set([1]) as per Sven Marnach's answer.
Parenthetically, I like Ruby's convention of placing an exclamation point after methods that mutate objects, but I find Python's approach acceptable.
The add() method adds an element to the set, but it does not return the set again -- it returns None.
a = set()
a.add(1)
or better
a = set([1])
would work.
Because add() is modifing your set in place returning None:
>>> empty = set()
>>> print(empty.add(1))
None
>>> empty
set([1])
Another way to do it that is relatively simple would be:
a = set()
a = set() | {1}
this creates a union between your set a and a set with 1 as the element
print(a) yields {1} then because a would now have all elements of both a and {1}
You should do this:
a = set()
a.add(1)
print a
Notice that you're assigning to a the result of adding 1, and the add operation, as defined in Python, returns None - and that's what is getting assigned to a in your code.
Alternatively, you can do this for initializing a set:
a = set([1, 2, 3])
The add method updates the set, but returns None.
a = set()
a.add(1)
print a
You are assigning the value returned by set().add(1) to a. This value is None, as add() does not return any value, it instead acts in-place on the list.
What you wanted to do was this:
a = set()
a.add(1)
print(a)
Of course, this example is trivial, but Python does support set literals, so if you really wanted to do this, it's better to do:
a = {1}
print(a)
The curly brackets denote a set (although be warned, {} denotes an empty dict, not an empty set due to the fact that curly brackets are used for both dicts and sets (dicts are separated by the use of the colon to separate keys and values.)
Alternatively to a = set() | {1} consider "in-place" operator:
a = set()
a |= {1}

Categories