Consider:
fooList = [1, 2, 3, 4]  # Ints for example only, in real application using objects
for foo in fooList:
    if fooChecker(foo):
        remove_this_foo_from_list
How is the specific foo to be removed from the list? Note that I'm using ints for example only, in the real application there is a list of arbitrary objects.
Thanks.
Generally, you just don't want to do this. Instead, construct a new list. Most of the time, this is done with a list comprehension:
fooListFiltered = [foo for foo in fooList if not fooChecker(foo)]
Alternatively, a generator expression or filter() might be more appropriate (note that in 2.x, filter() is not lazy - use a generator expression or itertools.ifilter() instead). For example, a file too big to be read into memory wouldn't work with a list comprehension, but would with a generator expression.
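For instance, a minimal sketch of the lazy approach, reusing the question's fooChecker as the predicate (the print is just a stand-in for whatever the real application does with each surviving item):
kept = (foo for foo in fooList if not fooChecker(foo))  # nothing is materialized up front
for foo in kept:
    print(foo)  # items are produced one at a time as the loop asks for them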
If you need to actually modify the list (rare, but can be the case on occasion), then you can assign back:
fooList[:] = fooListFiltered
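The difference matters when other names refer to the same list; a quick sketch:
fooList = [1, 2, 3, 4]
alias = fooList                # another reference to the same list object
fooList[:] = [1, 3]            # slice assignment mutates the existing list in place
print(alias)                   # [1, 3] -- the alias sees the change
fooList = [2, 4]               # plain rebinding points fooList at a brand-new list
print(alias)                   # [1, 3] -- the alias still refers to the old one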
Iterate over a shallow copy of the list.
You can't safely modify a list while iterating over it, so iterate over a shallow copy instead.
fooList = [1, 2, 3, 4]
for foo in fooList[:]:  # equivalent to list(fooList), but much faster
    if fooChecker(foo):
        fooList.remove(foo)
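To see why the copy matters, here is what happens if you remove from the very list you are iterating over (indices shift and elements get skipped):
nums = [1, 2, 3, 4]
for n in nums:
    if n < 3:
        nums.remove(n)
print(nums)  # [2, 3, 4] -- the 2 was never checked, because removing 1 shifted the indices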
Use filter. Note that filter keeps the items for which the predicate returns True, so to remove the items flagged by fooChecker you pass a negated check:
newList = list(filter(lambda foo: not fooChecker(foo), fooList))
or
newItems = filter(lambda foo: not fooChecker(foo), fooList)
for item in newItems:
    print item  # or print(item) for Python 3.x
http://docs.python.org/2/library/functions.html#filter
I'm trying to figure out what is the pythonic way to unpack an iterator inside of a list.
For example:
my_iterator = zip([1, 2, 3, 4], [1, 2, 3, 4])
I have come up with the following ways to unpack my iterator inside of a list:
1)
my_list = [*my_iterator]
2)
my_list = [e for e in my_iterator]
3)
my_list = list(my_iterator)
Option 1) is my favorite way to do it since it's less code, but I'm wondering if it is also the pythonic way. Or maybe there is another way to achieve this, besides those three, which is the pythonic one?
This might be a repeat of Fastest way to convert an iterator to a list, but your question is a bit different since you ask which is the most Pythonic. The accepted answer there is list(my_iterator) over [e for e in my_iterator], because the former runs in C under the hood. One commenter suggests [*my_iterator] is faster than list(my_iterator), so you might want to test that. My general vote is that they are all equally Pythonic, so I'd go with whichever is fastest for your use case. It's also possible that the older answer is out of date.
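If you do want to test it, a quick timeit sketch along these lines would settle it for your interpreter (the numbers vary by Python version and input size):
import timeit

# Each statement rebuilds the iterator so every run consumes fresh data.
print(timeit.timeit("list(iter(range(10000)))", number=1000))
print(timeit.timeit("[*iter(range(10000))]", number=1000))
print(timeit.timeit("[e for e in iter(range(10000))]", number=1000))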
After exploring the subject some more, I've come to some conclusions.
There should be one-- and preferably only one --obvious way to do it
(zen of python)
Deciding which option is the "pythonic" one should take into consideration some criteria:
how explicit,
simple,
and readable it is.
And the obvious "pythonic" option winning in all criteria is option number 3):
my_list = list(my_iterator)
Here is why it is "obvious" that option 3) is the pythonic one:
Option 3) is close to natural language, making you 'instantly' think about what the output is.
Option 2) (using a list comprehension), if you see that line of code for the first time, will take you a little longer to read and demands a bit more attention. For example, I use a list comprehension when I want to add some extra steps (calling a function on the iterated elements or doing some checking with an if statement), so when I see a list comprehension I look for a possible function call inside or for an if statement.
Option 1) (unpacking using *): the asterisk operator can be a bit confusing if you don't use it regularly; there are four cases for using the asterisk in Python (sketched in the small example after this list):
For multiplication and power operations.
For repeatedly extending the list-type containers.
For using the variadic arguments. (so-called “packing”)
For unpacking the containers.
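A tiny illustration of those four cases (nothing here is specific to the question; it just shows each use of *):
print(2 * 3, 2 ** 3)    # multiplication and power
print([0] * 4)          # repeating (extending) a list-type container
def pack(*args):        # packing variadic arguments
    return args
print(pack(1, 2, 3))
print([*range(3)])      # unpacking a container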
Another good argument is the Python docs themselves. I did some statistics to check which options the docs choose; for this I picked four built-in iterators and everything from the itertools module (used as itertools.<name>) to see how they are unpacked into a list:
map
range
filter
enumerate
itertools.
After exploring the docs I found 0 iterators unpacked into a list using options 1) or 2), and 35 using option 3).
Conclusion:
The pythonic way to unpack an iterator inside of a list is: my_list = list(my_iterator)
While the unpacking operator * is not often used for unpacking a single iterable into a list (therefore [*it] is a bit less readable than list(it)), it is handy and more Pythonic in several other cases:
1. Unpacking an iterable into a single list / tuple / set, adding other values:
mixed_list = [a, *it, b]
This is more concise and efficient than
mixed_list = [a]
mixed_list.extend(it)
mixed_list.append(b)
2. Unpacking multiple iterables + values into a list / tuple / set
mixed_list = [*it1, *it2, a, b, ... ]
This is similar to the first case.
3. Unpacking an iterable into a list, excluding elements
first, *rest = it
This extracts the first element of it into first and unpacks the rest into a list. One can even do
_, *mid, last = it
This dumps the first element of it into a don't-care variable _, saves last element into last, and unpacks the rest into a list mid.
4. Nested unpacking of multiple levels of an iterable in one statement
it = (0, range(5), 3)
a1, (*a2,), a3 = it # Unpack the second element of it into a list a2
e1, (first, *rest), e3 = it # Separate the first element from the rest while unpacking it[1]
This can also be used in for statements:
from itertools import groupby
s = "Axyz123Bcba345D"
for k, (first, *rest) in groupby(s, key=str.isalpha):
    ...
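For illustration, filling that loop body with a print shows what each group unpacks to (the body is just an example, not part of the original snippet):
from itertools import groupby
s = "Axyz123Bcba345D"
for k, (first, *rest) in groupby(s, key=str.isalpha):
    print(k, first, rest)
# True A ['x', 'y', 'z']
# False 1 ['2', '3']
# True B ['c', 'b', 'a']
# False 3 ['4', '5']
# True D []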
If you're interested in the least amount of typing possible, you can actually do one character better than my_list = [*my_iterator] with iterable unpacking:
*my_list, = my_iterator
or (although this only equals my_list = [*my_iterator] in the number of characters):
[*my_list] = my_iterator
(Funny how it has the same effect as my_list = [*my_iterator].)
For the most Pythonic solution, however, my_list = list(my_iterator) is clearly the clearest and the most readable of all, and should therefore be considered the most Pythonic.
I tend to use zip if I need to convert a list to a dictionary or use it as a key-value pair in a loop or list comprehension.
However, if this is only an illustration of creating an iterator, I will definitely vote for #3 for clarity.
So I am running into something I can't explain and was hoping someone could shed light on it... Here is my code:
fd = open(inFile, 'r')
contents = fd.readlines()
fd.close()
contentsOrig = contents
contents[3] = re.sub(replaceRegex, thingToReplaceWith, contentsOrig[3])
Now when I print out contents and contentsOrig they are exactly the same. I was trying to preserve what I originally read in but from this little code it doesn't seem to be working. Can anyone enlighten me?
I am running Python 2.7.7
Yes, when you assign a list to another variable, it's not a copy of that list; it is a reference. There is still only one list, and now both variables point to it.
contentsOrig = contents
contentsOrig is contents
# Result: True
When you change one of the values or modify the list in any way, you are changing that same list. So what you need to do is make a copy of the list. This can be done in either of these ways:
contentsOrig = contents[:]
or
contentsOrig = list(contents)
The first way uses list slicing to produce a new list from the beginning to the end. The second takes a list and returns a new copy of it.
Note that neither way makes new copies of the items inside the list, so it's the same items, but in different containers. Since these items are strings, they are immutable; replacing one in either list just rebinds that slot and leaves the string in the other list untouched.
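Where the shared items do matter is when they are themselves mutable, for example nested lists (a quick sketch, unrelated to the file-reading code above):
orig = [[1], [2]]
shallow = orig[:]        # new outer list, same inner lists
shallow[0].append(99)    # mutating a shared inner list...
print(orig)              # [[1, 99], [2]] -- ...shows up in the original too
shallow[1] = [0]         # rebinding an element only changes the copy
print(orig)              # [[1, 99], [2]]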
In Python, mutable objects cannot be copied the way you are copying them. In effect, doing x = y only makes x and y refer to the same address in memory - they reference the same object (in this case a list). If you use the id() function, id(x) and id(y) will actually be the same!
Edit: note that the other answer showed x is y; the is keyword checks the same thing (object identity) behind the scenes.
A simpler example is here:
list_one = [1, 2, 3]
list_two = list_one
list_one.append(4)
print list_one #will show [1, 2, 3, 4], as expected
print list_two #will ALSO show [1, 2, 3, 4]!
In this case, to get around this problem you can use list slicing to make a new list:
new_list = original_list[:]
If you have nested lists (or other mutable objects) and want all of them to be copied as well, you can use copy.deepcopy() from the copy module.
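A minimal sketch of that with copy.deepcopy (the nested list here is just an illustration):
import copy

nested = [[1, 2], [3, 4]]
deep = copy.deepcopy(nested)
deep[0].append(99)
print(nested)  # [[1, 2], [3, 4]] -- the inner lists were copied as well
print(deep)    # [[1, 2, 99], [3, 4]]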
I was wondering if some kind soul could explain why I have to convert a list that is already of type list before calling enumerate on it.
>>> l = ['russell', 'bird', 'thomas']
>>> print(type(l))
<class 'list'>
If I enumerate over this list without explicitly declaring it as such, this is the output:
>>> print(enumerate(l))
<enumerate object at 0x00B1A0F8>
When explicitly declaring this list as a list, the output is as expected:
>>> print(list(enumerate(l)))
[(0, 'russell'), (1, 'bird'), (2, 'thomas')]
If someone could help explain why this is the case, it'd be greatly appreciated.
You have two different objects here. Let's say you define a function:
def my_func(my_list):
    return 4
If you did my_func(l), of course it wouldn't return a list. Well, enumerate() is the same way. It returns an enumerate object, not a list. That object is iterable, however, so it can be converted to a list. For example:
def my_enumerate(my_list):
    index = 0
    for item in my_list:
        yield index, item
        index += 1
That is pretty much how enumerate works. Our function is a generator function and returns a generator object. Since generators are iterable, you can convert it to a list:
print(list(my_enumerate(l)))
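A quick check that the home-grown version matches the builtin on the list from the question:
l = ['russell', 'bird', 'thomas']
print(list(my_enumerate(l)))  # [(0, 'russell'), (1, 'bird'), (2, 'thomas')]
print(list(enumerate(l)))     # [(0, 'russell'), (1, 'bird'), (2, 'thomas')]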
The Python docs tend to be pretty careful about saying what a function returns, and the docs for enumerate() say that it returns an enumerate object, not a list. (Note that this was already the case in Python 2.)
While it may seem like kind of a pain to have to create a fully-formed list explicitly, like
print(list(enumerate(mylist)))
the advantage of not returning a list is that it can be quicker and use less memory. It is expected that in real-world use, you will generally not need the whole list at once, and will instead be looping over the elements one at a time, such as
for i, elem in enumerate(mylist):
    print(i, '->', elem)
You might even not use all of the elements:
mylist = ['russell', 'bird', 'thomas']
for i, elem in enumerate(mylist):
    if elem == 'bird':
        break
print((i, elem))
So enumerate() gives you the efficiency of generating items only as needed instead of all at once. (Imagine if you were working with a list of every NBA player in history, not just the three in your example.) And on the rare occasions that you really do need all the items at once, it is not that annoying to have to type list(enumerate(mylist)).
I checked on this link https://docs.python.org/3/library/stdtypes.html#frozenset that set is mutable, while frozenset is immutable and hence hashable. So how is set implemented in Python, and what is the element lookup time? Actually, I had a list of tuples [(1,2),(3,4),(2,1)], where each entry in a tuple is an id, and I wanted to create a set/frozenset out of this list. In this case the set should contain 1, 2, 3, 4 as elements. Can I use a frozenset and insert elements into it one by one from the list of tuples, or can I only use a set?
You can instantiate a frozenset from a generator expression or other iterable. It's not immutable until it's finished being instantiated.
>>> L = [(1,2),(3,4),(2,1)]
>>> from itertools import chain
>>> frozenset(chain.from_iterable(L))
frozenset([1, 2, 3, 4])
Python 3.3 also has an optimisation that turns set literals such as {1, 2, 3, 4} into precomputed frozensets when used as the right-hand side of an in operator.
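You can see that with the dis module (the exact bytecode and offsets vary by CPython version; this is just a sketch of what to look for):
import dis
dis.dis("x in {1, 2, 3, 4}")
# Look for a line like: LOAD_CONST ... (frozenset({1, 2, 3, 4}))
# showing the set literal was folded into a frozenset constant.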
Sets and frozensets are implemented the same way, as hashtables. (Why else would they require their elements to implement __hash__?) In fact, if you look at Objects/setobject.c, they share almost all their code. This means that as long as hash collisions don't get out of hand, lookup and deletion are O(1) and insertion is amortized O(1).
The usual way to create a frozenset is to initialize it with some other iterable. As gnibbler suggested, the best fit here would probably be itertools.chain.from_iterable:
>>> L = [(1,2),(3,4),(2,1)]
>>> from itertools import chain
>>> frozenset(chain.from_iterable(L))
frozenset([1, 2, 3, 4])
As for your first question, I haven't actually checked the source, but it seems safe to assume from the fact that sets need to contain objects of hashable types, that it is implemented using a hash table, and that its lookup time is, therefore, O(1).
As for your second question, you cannot insert the elements into a frozenset one by one (obviously, since it's immutable), but there's no reason to use a set instead; just construct it from a list (or other iterable) of the constituent values, e.g. like this:
data = [(1, 2), (3, 4), (2, 1)]
result = frozenset(reduce(list.__add__, [list(x) for x in data], []))  # in Python 3, reduce lives in functools
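A simpler spelling of the same idea, without reduce (just an alternative sketch):
data = [(1, 2), (3, 4), (2, 1)]
result = frozenset(x for pair in data for x in pair)
print(result)  # frozenset({1, 2, 3, 4})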
I am attempting to sort a Python list of ints and then use the .pop() function to return the highest one. I have tried writing the method in different ways:
def LongestPath(T):
    paths = [Ancestors(T,x) for x in OrdLeaves(T)]
    #^ Creating a list of lists of ints, this part works
    result = [len(y) for y in paths]
    #^ Creating a list of ints where each int is the length of a list in paths
    result = result.sort()
    #^ meant to sort the result
    return result.pop()
    #^ meant to return the largest int in the list (the last one)
I have also tried
def LongestPath(T):
    return [len(y) for y in [Ancestors(T,x) for x in OrdLeaves(T)]].sort().pop()
In both cases .sort() causes the list to be None (which has no .pop() function and returns an error). When I remove the .sort() it works fine but does not return the largest int since the list is not sorted.
Simply remove the assignment from
result = result.sort()
leaving just
result.sort()
The sort method works in-place (it modifies the existing list), so it returns None. When you assign its result to the name of the list, you're assigning None. So no assignment is necessary.
But in any case, what you're trying to accomplish can easily (and more efficiently) be written as a one-liner:
max(len(Ancestors(T,x)) for x in OrdLeaves(T))
max operates in linear time, O(n), while sorting is O(n log n). You also don't need nested list comprehensions; a single generator expression will do.
This
result = result.sort()
should be this
result.sort()
It is a convention in Python that methods that mutate sequences return None.
Consider:
>>> a_list = [3, 2, 1]
>>> print a_list.sort()
None
>>> a_list
[1, 2, 3]
>>> a_dict = {}
>>> print a_dict.__setitem__('a', 1)
None
>>> a_dict
{'a': 1}
>>> a_set = set()
>>> print a_set.add(1)
None
>>> a_set
set([1])
Python's Design and History FAQ gives the reasoning behind this design decision (with respect to lists):
Why doesn’t list.sort() return the sorted list?
In situations where performance matters, making a copy of the list
just to sort it would be wasteful. Therefore, list.sort() sorts the
list in place. In order to remind you of that fact, it does not return
the sorted list. This way, you won’t be fooled into accidentally
overwriting a list when you need a sorted copy but also need to keep
the unsorted version around.
In Python 2.4 a new built-in function – sorted() – has been added.
This function creates a new list from a provided iterable, sorts it
and returns it.
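For example, sorted() returns a new list and leaves the original alone:
>>> a_list = [3, 2, 1]
>>> sorted(a_list)
[1, 2, 3]
>>> a_list
[3, 2, 1]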
.sort() returns None and sorts the list in place.
This has already been correctly answered: list.sort() returns None. The reason why is "Command-Query Separation":
http://en.wikipedia.org/wiki/Command-query_separation
Python returns None because every function must return something, and the convention is that a function that doesn't produce any useful value should return None.
I have never before seen your convention of putting a comment after the line it references, starting the comment with a caret to point at the line. Please put comments before the lines they reference.
While you can use the .pop() method, you can also just index the list. The last value in the list can always be indexed with -1, because in Python negative indices "wrap around" and index backward from the end.
But we can simplify even further. The only reason you are sorting the list is so you can find its max value. There is a built-in function in Python for this: max()
Using list.sort() requires building a whole list. You will then pull one value from the list and discard it. max() will consume an iterator without needing to allocate a potentially-large amount of memory to store the list.
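For instance (with a made-up list of lengths just to show the idea):
lengths = [3, 1, 4, 1, 5]
lengths.sort()
print(lengths[-1])   # 5 -- index -1 is the last element, the largest after sorting
print(max(lengths))  # 5 -- same answer without having to sort at all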
Also, in Python, the community prefers the use of a coding standard called PEP 8. In PEP 8, you should use lower-case for function names, and an underscore to separate words, rather than CamelCase.
http://www.python.org/dev/peps/pep-0008/
With the above comments in mind, here is my rewrite of your function:
def longest_path(T):
    paths = [Ancestors(T,x) for x in OrdLeaves(T)]
    return max(len(path) for path in paths)
Inside the call to max() we have a "generator expression" that computes a length for each value in the list paths. max() will pull values out of this, keeping the biggest, until all values are exhausted.
But now it's clear that we don't even really need the paths list. Here's the final version:
def longest_path(T):
    return max(len(Ancestors(T, x)) for x in OrdLeaves(T))
I actually think the version with the explicit paths variable is a bit more readable, but this isn't horrible, and if there might be a large number of paths, you might notice a performance improvement due to not building and destroying the paths list.
list.sort() does not return a list - it destructively modifies the list you are sorting:
In [177]: range(10)
Out[177]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [178]: range(10).sort()
In [179]:
That said, max finds the largest element in a list, and will be more efficient than your method.
In Python, sort() is an in-place operation. So result.sort() returns None, but changes result to be sorted. To avoid your issue, don't overwrite result when you call sort().
Is there any reason not to use the sorted function? sort() is only defined on lists, but sorted() works with any iterable, and functions the way you are expecting. See this article for sorting details.
Also, because it uses Timsort internally, which is stable, it is very efficient if you need to sort on key 1 and then sort on key 2.
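A small sketch of that two-pass pattern (stability means ties keep their order from the previous pass):
records = [("b", 2), ("a", 1), ("a", 2), ("b", 1)]
by_second = sorted(records, key=lambda r: r[1])  # sort on key 2 first
by_both = sorted(by_second, key=lambda r: r[0])  # then on key 1; ties keep their key-2 order
print(by_both)  # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]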
You don't need a custom function for what you want to achieve, you first need to understand the methods you are using!
sort()ing a list in Python happens in place; that is, the return value of sort() is None. The list itself is modified; a new list is not returned.
>>> results = ['list', 'of', 'items']
>>> results
['list', 'of', 'items']
>>> results.sort()
>>> type(results)
<type 'list'>
>>> results
['items', 'list', 'of']
>>> results = results.sort()
>>> results
>>>
>>> type(results)
<type 'NoneType'>
As you can see, once you assign the result of sort() back to the name, you no longer have the list type.