Here's an example of initializing an array of ten million random numbers, using a list (a), and using a generator (b). The result is exactly the same, and since the list or generator is never used afterwards, there's no practical advantage to one or the other:
from random import randint
from array import array
a = array('H', [randint(1, 100) for _ in range(0, 10000000)])
b = array('H', (randint(1, 100) for _ in range(0, 10000000)))
So the question is which one to use. In principle, my understanding is that a generator should be able to get away with using fewer resources than a list, and since the list or generator is not kept, it should be possible for the code to run without ever materializing the intermediate data structure. My tests indicate that the list is slightly faster in this case. I can only imagine that this is because the Python implementation has more optimization around list comprehensions than around generator expressions. Can I expect this to be consistent?
More generally, should I use one or the other, and why? (Or should I do this kind of initialization some other way completely.)
Update: Answers and comments made me realize that the b example is not actually a tuple but a generator, so I edited a bit in the headline and the text above to reflect that. Also I tried splitting the list version into two lines like this, which should force the list to actually be instantiated:
g = [randint(1, 100) for _ in range(0, 10000000)]
a = array('H', g)
It appears to make no difference. The list version takes about 8.5 seconds, and the generator version takes about 9 seconds.
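For anyone who wants to reproduce the comparison, a measurement along these lines can be sketched with timeit (scaled down to one million elements here so it runs quickly; the exact numbers will of course vary by machine):
from timeit import timeit

setup = "from random import randint; from array import array"
n = 1000000  # reduced from ten million so the comparison finishes quickly

list_stmt = "array('H', [randint(1, 100) for _ in range(%d)])" % n
gen_stmt = "array('H', (randint(1, 100) for _ in range(%d)))" % n

print("list comprehension:", timeit(list_stmt, setup=setup, number=1))
print("generator expression:", timeit(gen_stmt, setup=setup, number=1))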
Although it looks like it, (randint(1, 100) for _ in range(0, 1000000)) is not a tuple, it's a generator:
>>> type((randint(1, 100) for _ in range(0, 1000000)))
<class 'generator'>
>>>
If you really want a tuple, use:
b = array('H', tuple(randint(1, 100) for _ in range(0, 1000000)))
The list being a bit faster than the generator makes sense, since the generator generates the next value when asked, one at a time, while the list comprehension allocates all the memory needed and then proceeds to fill it with values all in one go. That optimisation for speed is paid for in memory space.
I'd favour the generator, since it will work regardless of most reasonable memory restrictions and would work for any number of random numbers, while the speedup of the list is minimal. Unless you need to generate this list again and again, at which time the speedup would start to count - but then you'd probably use the same copy of the list each time to begin with.
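To get a rough feel for the memory side of that tradeoff, you can compare the size of the intermediate objects themselves. This is only a sketch: sys.getsizeof reports the container's own footprint, not the integers it references, but the difference is still striking.
import sys
from random import randint

lst = [randint(1, 100) for _ in range(1000000)]
gen = (randint(1, 100) for _ in range(1000000))

print(sys.getsizeof(lst))  # several megabytes for the list object alone
print(sys.getsizeof(gen))  # on the order of a hundred bytes for the generator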
[randint(1, 100) for _ in range(0, 10000000)]
This is a list comprehension. Every element is evaluated in a tight loop and put together into a list, so it is generally faster but takes more RAM (everything comes out at once).
(randint(1, 100) for _ in range(0, 10000000))
This is a generator expression. No element is evaluated at this point, and one of them comes out at a time when you call next() on the resulting generator. It's slower but takes a consistent (small) amount of memory.
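The laziness is easy to see interactively; nothing is computed until you ask the generator for a value (the numbers shown are random, of course):
>>> from random import randint
>>> gen = (randint(1, 100) for _ in range(0, 10000000))  # returns immediately
>>> next(gen)
42
>>> next(gen)
7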
As given in the other answer, if you want a tuple, you should convert either into one:
tuple([randint(1, 100) for _ in range(0, 10000000)])
tuple(randint(1, 100) for _ in range(0, 10000000))
Let's come back to your question:
When to use which?
In general, if you use a list comprehension or generator expression as an initializer of another sequential data structure (list, array, etc.), it makes no difference except for the memory-time tradeoff mentioned above. The things you need to consider come down to performance and memory budget: prefer the list comprehension if you need more speed (or write a C program to be absolutely fast), or the generator expression if you need to keep the memory consumption low.
If you plan to reuse the resulting sequence, things start to get interesting.
A list is strictly a list, and can for all purposes be used as a list:
a = [i for i in range(5)]
a[3] # 3
a.append(5) # a = [0, 1, 2, 3, 4, 5]
for _ in a:
    print("Hello")
# Prints 6 lines in total
for _ in a:
    print("Bye")
# Prints another 6 lines
b = list(reversed(a)) # b = [5, 4, 3, 2, 1, 0]
A generator can only be used once:
a = (i for i in range(5))
a[3] # TypeError: generator object isn't subscriptable
a.append(5) # AttributeError: generator has no attribute 'append'
for _ in a:
    print("Hello")
# Prints 5 lines in total
for _ in a:
    print("Bye")
# Nothing this time, because
# the generator has already been consumed
b = list(reversed(a)) # TypeError: generator isn't reversible
The final answer is: Know what you want to do, and find the appropriate data structure for it.
Related
I am trying to learn functional programming and algorithms at the same time, and I've implemented a merge sort in Haskell. I then converted the style into Python and ran a test on a learning platform, but the feedback is that it takes too long to sort a list of 1000 integers.
Is there a way I can optimize my Python code and still keep my functional style, or do I have to solve the problem iteratively?
Thanks in advance.
So here is the code I made in Haskell first.
merge :: Ord a => [a] -> [a] -> [a]
merge [] xs = xs
merge ys [] = ys
merge (x:xs) (y:ys)
    | (x <= y)  = x : (merge xs (y:ys))
    | otherwise = y : (merge (x:xs) ys)

halve :: [a] -> ([a], [a])
halve [x] = ([x], [])
halve xs = (take n xs, drop n xs)
    where n = length xs `div` 2

msort :: Ord a => [a] -> [a]
msort [x] = [x]
msort [] = []
msort xs = merge (msort n) (msort m)
    where (n, m) = halve xs
Then I made this code in python based on the Haskell style.
import sys
sys.setrecursionlimit(1002)  # This is because the recursion will go 1002 levels deep when I have a list of 1000 numbers.

def merge(xs, ys):
    if len(xs) == 0:
        return ys
    elif len(ys) == 0:
        return xs
    else:
        if xs[0] <= ys[0]:
            return [xs[0]] + merge(xs[1:], ys)
        else:
            return [ys[0]] + merge(xs, ys[1:])
def halve(xs):
    return (xs[:len(xs)//2], xs[len(xs)//2:])

def msort(xss):
    if len(xss) <= 1:
        return xss
    else:
        xs, ys = halve(xss)
        return merge(msort(xs), msort(ys))
Is there a smarter way I can optimize the python version and still have a functional style?
Haskell lists are lazy. [x] ++ xs first produces the x, and then it produces all the elements in xs.
In e.g. Lisp the lists are singly-linked lists and appending them copies the first list, so prepending a singleton is an O(1) operation.
In Python though the appending copies the second list (as confirmed by #chepner in the comments), i.e. [x] + xs will copy the whole list xs and thus is an O(n) operation (where n is the length of xs).
This means that both your [xs[0]] + merge(xs[1:], ys) and [ys[0]] + merge(xs, ys[1:]) lead to quadratic behavior which you observe as the dramatic slowdown you describe.
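As a small sketch (not part of the original post), you can see the difference by timing repeated prepend-by-concatenation against plain appending:
from timeit import timeit

def by_concat(n):
    xs = []
    for i in range(n):
        xs = [i] + xs  # copies the whole existing list every time: O(n) per step
    return xs

def by_append(n):
    xs = []
    for i in range(n):
        xs.append(i)  # amortized O(1) per step
    return xs

print(timeit(lambda: by_concat(10000), number=1))
print(timeit(lambda: by_append(10000), number=1))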
Python's equivalent to Haskell's lazy lists is not lists, it's generators, which produce their elements one by one on each yield. Thus the rewrite could look something like
def merge(xs, ys):
    if len(xs) == 0:
        return ys
    elif len(ys) == 0:
        return xs
    else:
        a = (x for x in xs)  # or maybe iter(xs)
        b = (y for y in ys)  # or maybe iter(ys)
        return list(merge_gen(a, b))
Now what's left is to re-implement your merge logic as merge_gen which expects two generators (or should that be iterators? do find out) as its input and generates the ordered stream of elements which it gets by pulling them one by one from the two sources as needed. The resulting stream of elements is converted back to list, as expected by the function's caller. No redundant copying will be performed.
If I've made some obvious Python errors, please treat the above as a pseudocode.
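For what it's worth, here is one possible sketch of such a merge_gen; the exact shape is my assumption of how the pseudocode above could be completed, not the only way to do it:
def merge_gen(a, b):
    # a and b are iterators over already-sorted values;
    # pull one element at a time from each and yield the smaller one
    sentinel = object()
    x = next(a, sentinel)
    y = next(b, sentinel)
    while x is not sentinel and y is not sentinel:
        if x <= y:
            yield x
            x = next(a, sentinel)
        else:
            yield y
            y = next(b, sentinel)
    # one source is exhausted; drain whatever is left in the other
    while x is not sentinel:
        yield x
        x = next(a, sentinel)
    while y is not sentinel:
        yield y
        y = next(b, sentinel)
The standard library's heapq.merge does essentially the same lazy merging of sorted inputs, so it could also be used instead of a hand-written merge_gen.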
Your other option is to pre-allocate a second list of the same length and copy the elements between the two lists back and forth while merging, using indices to reference the elements of the arrays and mutating the contents to store the results.
I am no Haskell expert, so I might be missing something. Here's my best gamble:
Haskell lists are not state-aware. One implication of that is that lists can be shared. That makes the act of halving leaner on memory allocations: to produce drop n xs you only have to allocate one list node (or whatever they are called in Haskell) and point it at the element where the second half of the pre-halved list begins.
Note that take is not able to do this little trick - it is not allowed to change the state of any node in the list, and hence it has to allocate new node objects holding the same values as the first half of the pre-halved list.
Now look at the python equivalent of that function - to halve the list, you use the list slicing:
def halve(xs):
return (xs[:len(xs)//2],xs[len(xs)//2:])
Here you allocate two lists instead of one - in every level of the recursion tree! (I am also pretty sure that a list is a much more complex thing in python than the Haskell list, so probably allocation is slower, too)
What I would do:
Check my gamble - use the time module to see if your code spends too long allocating those lists, compared to the overall running time.
In case my gamble proves correct - avoid those allocations. A (not very elegant, but probably fast) way to work around it: pass the list along with indices that indicate where each half begins and where it ends, and work with offsets instead of allocating a new list each time. (EDIT:) You can avoid similar allocations as well - whenever you want to slice, pass indices for the beginning and end of the new sublist instead.
And a last word - one of the requirements you've mentioned is keeping the functional approach. One can interpret that as keeping your code side-effect free.
To do so, I'd define an output list, and store the elements you merge in it. Combined with the index approach, that will not change the state of the input list, and will produce a new sorted output list.
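A minimal sketch of that idea (my own reading of the suggestion: lo/hi indices instead of slices, and the merge results stored in a fresh output list so the input is never mutated):
def msort(xs, lo=0, hi=None):
    # sort xs[lo:hi] into a new list; xs itself is never modified
    if hi is None:
        hi = len(xs)
    if hi - lo <= 1:
        return xs[lo:hi]
    mid = (lo + hi) // 2
    left = msort(xs, lo, mid)
    right = msort(xs, mid, hi)
    out = []  # the output list the merged elements are stored in
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])  # one half is exhausted; copy over the rest
    out.extend(right[j:])
    return out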
EDIT:
Another thing worth mentioning here: Python lists are not singly-linked lists like Haskell lists. They are a data structure more commonly called a dynamic array. This means that operations like slicing or deleting an object from the middle of the list are expensive, since they have implications for ALL objects in the array. On the other hand, you can access the object at the i-th index in O(1). You should keep that in mind; it is closely related to the problem you came up with.
I am coming from a Matlab background and I am finding it difficult to get around the concept of generators in Python.
Can someone please answer me the following:
The difference between a generator function and a loop
When each should be implemented
A generator provides a way to create elements "on the fly" without holding them all in memory before we start going over them. A loop is simply a way to make the generator, or another iterable, give us one element at a time.
For example:
for i in range(10):
    print(i)
The for block is a loop, and range is basically a generator. range doesn't create a list of all ten numbers before the loop starts, it just creates the generator, the creator of these elements. You can also imagine range(1000000000000000000), which again wouldn't take any time to create (and won't take up memory) because none of the elements are created until they are needed.
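You can see this laziness directly: constructing an enormous range is instantaneous and tiny, because no elements exist until you ask for them (the exact byte count may differ between Python builds):
>>> import sys
>>> r = range(1000000000000000000)  # created immediately, nothing is stored
>>> sys.getsizeof(r)                # only a few dozen bytes
48
>>> r[123456789]                    # elements are computed on demand
123456789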
On the other hand, our loop can also take one element from objects that already exist, like a list:
for i in [0,1,2,3,4,5,6,7,8,9]:
    print(i)
The same result would be printed, but the list is created and stored in its entirety before the loop starts. This means that while the loop is running, the list takes up memory space, and it took time to create.
Both the examples are loops, but only the first one uses a generator.
This is just the basics, but there are more differences, like exceptions that can be raised and re-usability, iterating in parts, and more.
For more on the difference
EDIT: #Vicrobot is correct in stating that range isn't really a generator, but for the purposes of explaining the "laziness" of generators that's what I used for simplicity
Have a read of the following article: How to Use Generators and yield in Python. Perhaps the following examples help a bit to understand the concept.
def my_range(n):
    for i in range(n):
        yield i

range_of_ten = my_range(10)
for i in range_of_ten:
    print(i)
result:
0
1
2
3
4
5
6
7
8
9
or
>>> range_of_ten = my_range(10)
>>> next(range_of_ten)
0
>>> next(range_of_ten)
1
etc.
>>> next(range_of_ten)
9
>>> next(range_of_ten)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
I like the following example where you can replace a double loop in one loop as follows:
def double_loop(n, m):
    for i in range(n):
        for j in range(m):
            yield i, j

n = double_loop(2, 4)
for i in n:
    print(i)
result
(0, 0)
(0, 1)
(0, 2)
(0, 3)
(1, 0)
(1, 1)
(1, 2)
(1, 3)
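The same flattening of a double loop is also available ready-made in the standard library as itertools.product, which produces the identical pairs:
from itertools import product

for i in product(range(2), range(4)):
    print(i)  # prints the same eight (i, j) tuples as double_loop(2, 4)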
I need to iterate a list of integers:
2, 3, 4, 6, 9, 13, 19, 28, 42, ...
So the general rule is list[i+1] = list[i]*3/2.
The list should end at or right before 10^34.
This means that the last element is a number between 10^34*2/3 and 10^34.
I obviously cannot have this list preallocated in memory, so something like:
list = [2]
while True:
    next = list[-1]*3/2
    if next > 10**34:
        break
    list.append(next)
Is out of the question.
I can of course simply use the above in order to iterate these integers without having them stored in a list or generated by an iterator of some sort.
But the problem is that I have nested loops, like so:
for i in xrange(...):
    for j in xrange(...):
        for m in xrange(...):
            for n in xrange(...):
So breaking this into several while loops would make the code pretty horrible.
Ideally, I would like to have some sort of xrange which generates this list of numbers "on the fly" (as xranges normally do).
As I mentioned in my comment, the list won't actually be very long. You could therefore initialise it as:
import math
[int(2 * (1.5 ** n)) for n in range(int(math.log(10 ** 34, 1.5) - 1))]
However, this is actually slightly different to the example you gave, wherein you round to integer before generating the next number. In this case, you would have to do something iterative instead (as far as I can tell):
i = 2
lst = []
while i < 10 ** 34:
    lst.append(i)
    i = int(i * 1.5)
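If you never actually need the numbers stored in a list, the same idea can be wrapped in a generator function, which behaves like the xrange-style object the question asks for (a sketch; value * 3 // 2 keeps everything in exact integer arithmetic, matching the question's rule):
def geometric(limit=10**34):
    # yields 2, 3, 4, 6, 9, 13, 19, 28, 42, ... ending at or right before limit
    value = 2
    while value <= limit:
        yield value
        value = value * 3 // 2

for n in geometric():
    pass  # use n here, e.g. as one level of the nested loops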
This might be what you're looking for:
lst = [2*1.5**i for i in range(0,192)]
The last term would be 2*1.5^191.
If you want them all to be integers you could say use the int() cast.
But you should note that the time/memory to do this will probably be similar to what you were doing in your example code. They are both doing similar things in the end.
If you want them all to be integers throughout the process:
i = 2
lst = [2]
while i < 10**34:
    i = int(i * 1.5)
    lst.append(i)
Again, this will only save a trace amount of memory/time.
Just some Python code for an example:
from timeit import default_timer as timer  # some timer is needed; timeit's default_timer is assumed here

nums = [1,2,3]
start = timer()
for i in range(len(nums)):
    print(nums[i])
end = timer()
print((end-start)) #computed to 0.0697546862831
start = timer()
print(nums[0])
print(nums[1])
print(nums[2])
end = timer()
print((end-start)) #computed to 0.0167170338524
I can grasp that some extra time will be taken in the loop because the value of i must be incremented a few times, but the difference between the running times of these two different methods seems a lot bigger than I expected. Is there something else happening underneath the hood that I'm not considering?
Short answer: it isn't, unless the loop is very small. The for loop has a small overhead, but the way you're doing it is inefficient. By using range(len(nums)) you're effectively creating another list and iterating through that, then doing the same index lookups anyway. Try this:
for i in nums:
    print(i)
Results for me were as expected:
>>> import timeit
>>> timeit.timeit('nums[0];nums[1];nums[2]', setup='nums = [1,2,3]')
0.10711812973022461
>>> timeit.timeit('for i in nums:pass', setup='nums = [1,2,3]')
0.13474011421203613
>>> timeit.timeit('for i in range(len(nums)):pass', setup='nums = [1,2,3]')
0.42371487617492676
With a bigger list the advantage of the loop becomes apparent, because the incremental cost of accessing an element by index outweighs the one-off cost of the loop:
>>> timeit.timeit('for i in nums:pass', setup='nums = range(0,100)')
1.541944980621338
>>> timeit.timeit(';'.join('nums[%s]' % i for i in range(0,100)), setup='nums = range(0,100)')
2.5244338512420654
In python 3, which puts a greater emphasis on iterators over indexable lists, the difference is even greater:
>>> timeit.timeit('for i in nums:pass', setup='nums = range(0,100)')
1.6542046590038808
>>> timeit.timeit(';'.join('nums[%s]' % i for i in range(0,100)), setup='nums = range(0,100)')
10.331634456000756
With such a small array you're probably measuring noise first, and then the overhead of calling range(). Note that range not only has to increment a variable a few times, it also has to create an object that holds the iteration state (the current value). The function call and object creation are two things you don't pay for in the second example, and for very short iterations they will probably dwarf three array accesses.
Essentially your second snippet does loop unrolling, which is a viable and frequent technique of speeding up performance-critical code.
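For illustration (my own sketch, assuming nums is just some list of numbers), manual loop unrolling trades loop-control overhead for repeated straight-line code:
nums = list(range(100))

total = 0
for x in nums:          # the straightforward loop
    total += x

total_unrolled = 0
i = 0
limit = len(nums) - len(nums) % 2
while i < limit:        # the same work, two elements per iteration
    total_unrolled += nums[i]
    total_unrolled += nums[i + 1]
    i += 2
if len(nums) % 2:       # pick up the leftover element for odd lengths
    total_unrolled += nums[-1]

assert total == total_unrolled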
The for loop has a cost in any case, and the one you wrote is especially costly. Here are four versions, using timeit to measure the time:
from timeit import timeit
NUMS = [1, 2, 3]
def one():
    for i in range(len(NUMS)):
        NUMS[i]

def one_no_access():
    for i in range(len(NUMS)):
        i

def two():
    NUMS[0]
    NUMS[1]
    NUMS[2]

def three():
    for i in NUMS:
        i

for func in (one, one_no_access, two, three):
    print(func.__name__ + ':', timeit(func))
Here are the resulting times:
one: 1.0467438200000743
one_no_access: 0.8853238560000136
two: 0.3143197629999577
three: 0.3478466749998006
The one_no_access version shows the cost of the expression range(len(NUMS)).
Since lists in Python are stored contiguously in memory, random access to an element is O(1), which explains why two is the quickest.
x = [1,2,3,4,5,6,7,8,9,10]
#Random list elements
for i in range(int(len(x)/2)):
    value = x[i]
    x[i] = x[len(x)-i-1]
    x[len(x)-i-1] = value
#Confusion on efficiency
print(x)
This is a first-year uni course, so no Python shortcuts are allowed.
Not sure what counts as "a shortcut" (reversed and the "Martian Smiley" [::-1] being obvious candidates -- but does either count as "a shortcut"?!), but at least a couple small improvements are easy:
L = len(x)
for i in range(L//2):
    mirror = L - i - 1
    x[i], x[mirror] = x[mirror], x[i]
This gets len(x) only once -- it's a fast operation but there's no reason to keep repeating it over and over -- also computes mirror but once, does the swap more directly, and halves L (for the range argument) directly with the truncating-division operator rather than using the non-truncating division and then truncating with int. Nanoseconds for each case, but it may be considered slightly clearer as well as microscopically faster.
x = [1,2,3,4,5,6,7,8,9,10]
x = x.__getitem__(slice(None,None,-1))
slice is a Python builtin object (like range and len that you used in your example)
__getitem__ is a method belonging to sequence types (of which x is one)
there are absolutely no shortcuts here :) and it's effectively one line.