I need to turn a list of various entities into strings. So far I use:
all_ents_dead = []  # converted to strings
for i in all_ents:
    all_ents_dead.append(str(i))
Is there an optimized way of doing that?
EDIT: I then need to find which of these contain a certain string. So far I have:
matching = [s for s in all_ents_dead if "GROUPS" in s]
Whenever you have a name = [], then name.append() in a loop pattern, consider using a list comprehension. A list comprehension builds a list from a loop, without having to use list.append() lookups and calls, making it faster:
all_ents_dead = [str(i) for i in all_ents]
This directly echoes the code you had, but with the expression inside all_ents_dead.append(...) moved to the front of the for loop.
If you don't actually need a list, but only need to iterate over the str() conversions, you should consider lazy conversion options. You can turn the list comprehension into a generator expression:
all_ents_dead = (str(i) for i in all_ents)
or, when you are only applying a single function, the faster alternative is the map() function:
all_ents_dead = map(str, all_ents) # assuming Python 3
both of which lazily apply str() as you iterate over the resulting object. This helps avoid creating a new list object where you don't actually need one, saving on memory. Do note that a generator expression can be slower however; if performance is at stake consider all options based on input sizes, memory constraints and time trials.
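If you do want to run those time trials, a minimal timeit sketch along these lines can help (the numbers depend entirely on your data and Python version; all_ents is simulated here with a list of integers):

import timeit

setup = "all_ents = list(range(10000))"
print(timeit.timeit("[str(i) for i in all_ents]", setup=setup, number=100))
print(timeit.timeit("list(map(str, all_ents))", setup=setup, number=100))
print(timeit.timeit("list(str(i) for i in all_ents)", setup=setup, number=100))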
For your specific search example, you could just embed the map() call:
matching = [s for s in map(str, all_ents) if "GROUPS" in s]
which would produce a list of matching strings, without creating an intermediary list of string objects that you then don't use anywhere else.
Use the map() function. This will take your existing list, run a function on each item, and return a new list/iterator (see below) with the result of the function applied on each element.
all_ents_dead = map(str, all_ents)
In Python 3+, map() will return an iterator, while in Python 2 it will return a list. An iterator can give you the optimisations you want, since it generates values on demand rather than all at once (as a list does).
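So if you actually need a list in Python 3, wrap the map() call in list() to materialise it:

all_ents_dead = list(map(str, all_ents))  # Python 3: force the iterator into a list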
I have values in a list of lists.
I would like to send the whole block to a conversion function which then returns all the converted values in the same structure.
my_list = [sensor1...sensor4] = [hum1...hum3] = [value1, value2, value3, value4]
So several nested lists
def conversion(my_list):
    for sensor in my_list:
        for hum in sensor:
            for value in hum:
                map(function, value)
Is there a way to do this as a list comprehension in one line? I'm not sure how to use the map function in comprehensions, especially when you have several nested iterations.
map(function, value)
Since you are just mapping a function on each value, without collecting the return value in a list, using a list comprehension is not a good idea. You could do it, but you would be collecting list items that have no value, for the sole purpose of throwing them away later—just so you can save a few lines that actually serve a much better purpose: Clearly telling what’s going on, without being in a single, long, and complicated line.
So my advice would be to keep it as it is. It makes more sense like that and clearly shows what’s going on.
I am however collecting the values. They all need to be converted and saved in the same structure as they were.
In that case, you still don't want a list comprehension, as that would mean creating a new list for no real reason. Instead, just update the innermost list. To do that, you need to change the way you're iterating, though:
for sensor in my_list:
    for hum in sensor:
        for i, value in enumerate(hum):
            hum[i] = map(function, value)
This will update the inner list.
Alternatively, since value is actually a list of values, you can also replace the value list’s contents using the slicing syntax:
for sensor in my_list:
    for hum in sensor:
        for value in hum:
            value[:] = map(function, value)
Also one final note: If you are using Python 3, remember that map returns an iterator, so you need to convert it to a list first using list(map(function, value)); or use a list comprehension for that part with [function(v) for v in value].
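For example, the enumerate-based version above would become the following under Python 3 (a sketch; function stands in for whatever conversion you are applying):

for sensor in my_list:
    for hum in sensor:
        for i, value in enumerate(hum):
            hum[i] = [function(v) for v in value]  # or list(map(function, value))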
This is the right way to do it. You could use a list comprehension, but you shouldn't, both for readability and because it's probably not faster.
I have a large list of strings and I want to iterate over it. I want to figure out the best way to iterate over a list. I have tried the following approaches:
Generator Expression: g = (x for x in list)
Itertools.chain: ch = itertools.chain(list)
Is there another approach, better than these two, for list iteration?
The fastest way is just to iterate over the list. If you already have a list, layering more iterators/generators isn't going to speed anything up.
A good old for item in a_list: is going to be just as fast as any other option, and definitely more readable.
Iterators and generators are for when you don't already have a list sitting around in memory. itertools.count() for instance just generates a single number at a time; it's not working off of an existing list of numbers.
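For instance, just to illustrate values being produced on demand rather than from an existing list:

import itertools

counter = itertools.count(start=1)  # yields 1, 2, 3, ... one value at a time
print(next(counter))  # 1
print(next(counter))  # 2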
Another possible use is when you're chaining a number of operations - your intermediate steps can create iterators/generators rather than creating intermediate lists. For instance, if you're wanting to chain a lookup for each item in the list with a sum() call, you could use a generator expression for the output of the lookups, which sum() would then consume:
total_inches_of_snow = sum(inches_of_snow(date) for date in list_of_dates)
This allows you to avoid creating an intermediate list with all of the individual inches of snow and instead just generate them as sum() consumes them, thus saving memory.
This statement is running quite slowly, and I have run out of ideas to optimize it. Could someone help me out?
[dict(zip(small_list1, small_list2)) for small_list2 in really_huge_list_of_list]
The small_lists contain only about 6 elements.
A really_huge_list_of_list of size 209,510 took approximately 16.5 seconds to finish executing.
Thank you!
Edit:
really_huge_list_of_list is a generator. Apologies for any confusion.
The size is obtained from the result list.
Possible minor improvement:
[dict(itertools.izip(small_list1, small_list2)) for small_list2 in really_huge_list_of_list]
Also, you may consider using a generator instead of a list comprehension.
To expand on what the comments are trying to say, you should use a generator instead of that list comprehension. Your code currently looks like this:
[dict(zip(small_list1, small_list2)) for small_list2 in really_huge_list_of_list]
and you should change it to this instead:
def my_generator(input_list_of_lists):
    small_list1 = ["wherever", "small_list1", "comes", "from"]
    for small_list2 in input_list_of_lists:
        yield dict(zip(small_list1, small_list2))
What you're doing right now is taking ALL the results of iterating over your really huge list, and building up a huge list of the results, before doing whatever you do with that list of results. Instead, you should turn that list comprehension into a generator so that you never have to build up a list of 200,000 results. It's building that result list that's taking up so much memory and time.
... Or better yet, just turn that list comprehension into a generator comprehension by changing its outer brackets into parentheses:
(dict(zip(small_list1, small_list2)) for small_list2 in really_huge_list_of_list)
That's really all you need to do. The syntax for list comprehensions and generator comprehensions is almost identical, on purpose: if you understand a list comprehension, you'll understand the corresponding generator comprehension. (In this case, I wrote out the generator in "long form" first so that you'd see what that comprehension expands to).
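You would then consume it one dictionary at a time, for example in a plain for loop (process here is just a placeholder for whatever you actually do with each result):

results = (dict(zip(small_list1, small_list2)) for small_list2 in really_huge_list_of_list)
for d in results:
    process(d)  # hypothetical: replace with whatever you do with each dict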
Hope this helps you add another useful tool to your Python toolbox!
When should you use generator expressions and when should you use list comprehensions in Python?
# Generator expression
(x*2 for x in range(256))
# List comprehension
[x*2 for x in range(256)]
John's answer is good (that list comprehensions are better when you want to iterate over something multiple times). However, it's also worth noting that you should use a list if you want to use any of the list methods. For example, the following code won't work:
def gen():
    return (something for something in get_some_stuff())

print gen()[:2] # generators don't support indexing or slicing
print [5,6] + gen() # generators can't be added to lists
Basically, use a generator expression if all you're doing is iterating once. If you want to store and use the generated results, then you're probably better off with a list comprehension.
Since performance is the most common reason to choose one over the other, my advice is to not worry about it and just pick one; if you find that your program is running too slowly, then and only then should you go back and worry about tuning your code.
Iterating over the generator expression or the list comprehension will do the same thing. However, the list comprehension will create the entire list in memory first while the generator expression will create the items on the fly, so you are able to use it for very large (and also infinite!) sequences.
Use list comprehensions when the result needs to be iterated over multiple times, or where speed is paramount. Use generator expressions where the range is large or infinite.
See Generator expressions and list comprehensions for more info.
The important point is that the list comprehension creates a new list. The generator creates an iterable object that will "filter" the source material on the fly as you consume the bits.
Imagine you have a 2TB log file called "hugefile.txt", and you want the content and length for all the lines that start with the word "ENTRY".
So you try starting out by writing a list comprehension:
logfile = open("hugefile.txt","r")
entry_lines = [(line,len(line)) for line in logfile if line.startswith("ENTRY")]
This slurps up the whole file, processes each line, and stores the matching lines in your array. This array could therefore contain up to 2TB of content. That's a lot of RAM, and probably not practical for your purposes.
So instead we can use a generator to apply a "filter" to our content. No data is actually read until we start iterating over the result.
logfile = open("hugefile.txt","r")
entry_lines = ((line,len(line)) for line in logfile if line.startswith("ENTRY"))
Not even a single line has been read from our file yet. In fact, say we want to filter our result even further:
long_entries = ((line,length) for (line,length) in entry_lines if length > 80)
Still nothing has been read, but we've specified now two generators that will act on our data as we wish.
Let's write out our filtered lines to another file:
outfile = open("filtered.txt","a")
for entry,length in long_entries:
    outfile.write(entry)
Now we read the input file. As our for loop continues to request additional lines, the long_entries generator demands lines from the entry_lines generator, returning only those whose length is greater than 80 characters. And in turn, the entry_lines generator requests lines (filtered as indicated) from the logfile iterator, which in turn reads the file.
So instead of "pushing" data to your output function in the form of a fully-populated list, you're giving the output function a way to "pull" data only when it's needed. In our case this is much more efficient, but not quite as flexible. Generators are one way, one pass; the data from the log file we've read gets immediately discarded, so we can't go back to a previous line. On the other hand, we don't have to worry about keeping data around once we're done with it.
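A tiny illustration of that one-pass behaviour, separate from the log file example:

gen = (n * n for n in range(3))
print(list(gen))  # [0, 1, 4]
print(list(gen))  # [] -- the generator is already exhausted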
The benefit of a generator expression is that it uses less memory since it doesn't build the whole list at once. Generator expressions are best used when the list is an intermediary, such as summing the results, or creating a dict out of the results.
For example:
sum(x*2 for x in xrange(256))
dict( (k, some_func(k)) for k in some_list_of_keys )
The advantage there is that the list isn't completely generated, and thus little memory is used (and it should also be faster).
You should, though, use list comprehensions when the desired final product is a list. You are not going to save any memory using generator expressions, since you want the generated list. You also get the benefit of being able to use any of the list functions like sorted or reversed.
For example:
reversed( [x*2 for x in xrange(256)] )
When creating a generator from a mutable object (like a list), be aware that the generator will be evaluated against the state of the list at the time the generator is consumed, not at the time it is created:
>>> mylist = ["a", "b", "c"]
>>> gen = (elem + "1" for elem in mylist)
>>> mylist.clear()
>>> for x in gen: print (x)
# nothing
If there is any chance of your list being modified (or of a mutable object inside that list changing), but you need the state at the time the generator is created, you need to use a list comprehension instead.
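For comparison, the list comprehension captures the values at creation time, so the same sequence of operations keeps the results:

>>> mylist = ["a", "b", "c"]
>>> lst = [elem + "1" for elem in mylist]
>>> mylist.clear()
>>> lst
['a1', 'b1', 'c1']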
Python 3.7:
List comprehensions are faster.
Generators are more memory efficient.
As all others have said, if you're looking to scale infinite data, you'll need a generator eventually. For relatively static small and medium-sized jobs where speed is necessary, a list comprehension is best.
Sometimes you can get away with the tee function from itertools; it returns multiple iterators for the same generator that can be used independently.
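A minimal sketch of tee (note that the returned iterators buffer items internally, so this does not avoid memory use if one copy runs far ahead of the other):

from itertools import tee

gen = (x * 2 for x in range(5))
first, second = tee(gen, 2)  # two independent iterators over the same values
print(sum(first))   # 20
print(max(second))  # 8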
I'm using the Hadoop Mincemeat module. I think this is a great example to take note of:
import mincemeat

def mapfn(k,v):
    for w in v:
        yield 'sum',w
        #yield 'count',1

def reducefn(k,v):
    r1=sum(v)
    r2=len(v)
    print r2
    m=r1/r2
    std=0
    for i in range(r2):
        std+=pow(abs(v[i]-m),2)
    res=pow((std/r2),0.5)
    return r1,r2,res
Here the generator gets numbers out of a text file (as big as 15 GB) and applies simple math to those numbers using Hadoop's map-reduce. If I had not used yield, but instead a list comprehension, it would have taken much longer to calculate the sums and averages (not to mention the space complexity).
Hadoop is a great example for using all the advantages of Generators.
Some notes for built-in Python functions:
Use a generator expression if you need to exploit the short-circuiting behaviour of any or all. These functions are designed to stop iterating when the answer is known, but a list comprehension must evaluate every element before the function can be called.
For example, if we have
from time import sleep

def long_calculation(value):
    sleep(1) # for simulation purposes
    return value == 1
then any([long_calculation(x) for x in range(10)]) takes about ten seconds, as long_calculation will be called for every x. any(long_calculation(x) for x in range(10)) takes only about two seconds, since long_calculation will only be called with 0 and 1 inputs.
When any and all iterate over the list comprehension, they will still stop checking elements for truthiness once an answer is known (as soon as any finds a true result, or all finds a false one); however, this is usually trivial compared to the actual work done by the comprehension.
Generator expressions are of course more memory efficient, when it's possible to use them. List comprehensions will be slightly faster with the non-short-circuiting min, max and sum (timings for max shown here):
$ python -m timeit "max(_ for _ in range(1))"
500000 loops, best of 5: 476 nsec per loop
$ python -m timeit "max([_ for _ in range(1)])"
500000 loops, best of 5: 425 nsec per loop
$ python -m timeit "max(_ for _ in range(100))"
50000 loops, best of 5: 4.42 usec per loop
$ python -m timeit "max([_ for _ in range(100)])"
100000 loops, best of 5: 3.79 usec per loop
$ python -m timeit "max(_ for _ in range(10000))"
500 loops, best of 5: 468 usec per loop
$ python -m timeit "max([_ for _ in range(10000)])"
500 loops, best of 5: 442 usec per loop
List comprehensions are eager but generators are lazy.
In list comprehensions, all objects are created right away, so it takes longer to create and return the list. In generator expressions, object creation is delayed until requested by next(); upon next(), the next object is created and returned immediately.
Iteration is faster in list comprehensions because objects are already created.
If you iterate over all the elements, the time performance of a list comprehension and a generator expression is about the same. Even though the generator expression returns a generator object right away, it does not create all the elements up front. Every time you iterate to a new element, it will create and return it.
But if you do not iterate through all the elements, generators are more efficient. Let's say you create a list comprehension that contains millions of items but you only use 10 of them. You still have to create all the millions of items, wasting time on millions of calculations just to use 10 results. The same applies if you make millions of API requests but end up using only 10 of them. Since generator expressions are lazy, they do not make any of the calculations or API calls unless requested. In this case using a generator expression will be more efficient.
With a list comprehension, the entire collection is loaded into memory. With a generator expression, once it returns a value to you upon your next() call, it is done with that value and does not need to keep it in memory any more; only a single item is loaded at a time. If you are iterating over a huge file on disk and the file is too big, you might run into memory issues. In this case using a generator expression is more efficient.
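A small sketch of that on-demand behaviour (expensive here is just a stand-in for a slow calculation or API call):

def expensive(i):
    # placeholder for a slow calculation or API call
    return i * i

gen = (expensive(i) for i in range(1_000_000))  # nothing is computed yet
print(next(gen))  # 0 -- only the first item is computed
print(next(gen))  # 1 -- only the second item is computed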
There is something that I think most of the answers have missed. A list comprehension creates the entire list in memory up front. In cases where the list object is extremely large, your script process could be killed. A generator would be preferable in this case, as its values are not all stored in memory but rather produced on demand by a stateful function. There is also the speed of creation: a list comprehension is slower to create than a generator comprehension.
In short:
use a list comprehension when the size of the object is not excessively large; otherwise, use a generator comprehension.
For functional programming, we want to use as little indexing as possible. For this reason, if we want to continue using the elements after taking the first slice of elements, islice() is a better choice, since the iterator state is saved.
from itertools import islice

def slice_and_continue(sequence):
    seq_i = iter(sequence) # create an iterator from the list
    seq_slice = islice(seq_i, 3) # take the first 3 elements
    for x in seq_slice: print(x, end=" ")
    for x in seq_i: print(x**2, end=" ") # square the rest of the numbers

slice_and_continue([1,2,3,4,5])
output: 1 2 3 16 25