Enormous Input - for loop faster than list comprehension - python

I'm trying to solve a codechef beginner problem - Enormous Input Test. My code
a,b = [ int(i) for i in raw_input().split()]
print [input()%b==0 for i in range(a)].count(True)
gets timed out. Another solution, which uses basic for-loops, seems to be working fine.
I believe that list comprehensions are quicker than basic for-loops. Then why is the former slower? Also, will using generators in this case reduce the memory used and make the computation faster? If so, how can I do it?

Why do you believe that list comprehension is quicker than basic for loops? (Hint: they are both implemented using the same underlying instructions.)
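You can check this yourself with the dis module; here is a quick sketch (the function is made up for illustration, and the exact bytecode varies by Python version; in Python 2 both forms disassemble to a FOR_ITER loop, the comprehension merely using a dedicated LIST_APPEND opcode instead of a method call):
import dis

def squares_loop():
    out = []
    for i in range(10):
        out.append(i * i)
    return out

# Disassemble the explicit loop...
dis.dis(squares_loop)
# ...and the equivalent list comprehension
dis.dis(compile("[i * i for i in range(10)]", "<string>", "eval"))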
Your code will be executed roughly like this:
a, b = ...
temp = []
for i in range(a):
    temp.append(int(raw_input()) % b == 0)
print temp.count(True)
As you can see, it creates a large list in memory (range(a)), iterates over it to build a second list (the comprehension), and then iterates over that second list to produce the count. Neither list ever needs to be created.
a, b = ...
count = 0
for i in xrange(a):
    if int(raw_input()) % b == 0:
        count += 1
print count
Some compilers are capable of optimizing hylomorphisms to remove the intermediate list, but I know of no Python implementation capable of this. So you are stuck optimizing by hand.
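To address the generator part of the question: feeding a generator expression to sum avoids building any list at all, while keeping the one-liner style. A minimal sketch in the same Python 2 idiom as the fixed code above:
a, b = [int(i) for i in raw_input().split()]
# sum consumes the generator one value at a time: no intermediate list,
# and True counts as 1, so the result is the number of divisible inputs
print sum(int(raw_input()) % b == 0 for _ in xrange(a))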
Note: Do not use input in Python 2.x, unless you know what you are doing. I have changed the code to use int(raw_input()) because that is safe, whereas input() is dangerous.

Related

Merge sorting algorithm in Python for two sorted lists - trouble constructing for-loop

I'm trying to create an algorithm in Python to merge two ordered lists into a larger ordered list. Essentially I began by trying to isolate the minimum element in each list, and then I compared them to see which was smaller, because that number would be the smallest in the larger list as well. I then appended that element to the empty larger list, and deleted it from the original list it came from. I then tried to loop through the original two lists doing the same thing. Inside the "if" statements, I've essentially tried to make the function append the remainder of one list to the larger list if the other is or becomes empty, because at that point there would be no point in asking which elements of the two lists are smaller.
def merge_cabs(cab1, cab2):
    for (i <= all(j) for j in cab1):
        for (k <= all(l) for l in cab2):
            if cab1 == []:
                newcab.append(cab2)
            if cab2 == []:
                newcab.append(cab1)
            else:
                k = min(min(cab1), min(cab2))
                newcab.append(k)
                if min(cab1) < min(cab2):
                    cab1.remove(min(cab1))
                if min(cab2) < min(cab1):
                    cab2.remove(min(cab2))
    print(newcab)

cab1 = [1,2,5,6,8,9]
cab2 = [3,4,7,10,11]
newcab = []
merge_cabs(cab1, cab2)
I've had a bit of trouble constructing the for-loop unfortunately. One way I've tried to isolate the minimum values was as I wrote in the two "for" lines. Right now, Python is returning "SyntaxError: invalid syntax," pointing to the colon in the first "for" line. Another way I've tried to construct the for-loop was like this:
def merge_cabs(cabs1, cabs2):
    for min(i) in cab1:
        for min(j) in cab2:
I've also tried to write the expression all in one line like this:
def merge_cabs(cab1, cab2):
    for min(i) in cabs1 and min(j) in cabs2:
and to loop through copies of the original lists rather than through the lists themselves, because, from searching through the site, I've found that it can sometimes be difficult to remove elements from a list you're looping through. I've also tried to protect the expressions after the "for" statements inside various configurations of parentheses. If someone sees where the problem(s) lie, it would be great if you could point them out; any other observations that could help me better construct this function would be much appreciated too.
Here's a very simple-minded solution to this that uses only very basic Python operations:
def merge_cabs(cab1, cab2):
    len1 = len(cab1)
    len2 = len(cab2)
    i = 0
    j = 0
    newcab = []
    while i < len1 and j < len2:
        v1 = cab1[i]
        v2 = cab2[j]
        if v1 <= v2:
            newcab.append(v1)
            i += 1
        else:
            newcab.append(v2)
            j += 1
    while i < len1:
        newcab.append(cab1[i])
        i += 1
    while j < len2:
        newcab.append(cab2[j])
        j += 1
    return newcab
Things to keep in mind:
You should not have any nested loops. Merging two sorted lists is typically used to implement a merge sort, and the merge step should be linear. I.e., the algorithm should be O(n).
You need to walk both lists together, choosing the smallest value at each step, and advancing only the list that contains the smallest value. When one of the lists is consumed, the remaining elements from the unconsumed list are simply appended in order.
You should not be calling min or max etc. in your loop, since that will effectively introduce a nested loop, turning the merge into an O(n**2) algorithm, which ignores the fact that the lists are known to be sorted.
Similarly, you should not be calling any external sort function to do the merge, since that will result in an O(n*log(n)) merge (or worse, depending on the sort algorithm), and again ignores the fact that the lists are known to be sorted.
Firstly, there's a function in the (standard library) heapq module for doing exactly this, heapq.merge; if this is a real problem (rather than an exercise), you want to use that one instead.
If this is an exercise, there are a couple of points:
You'll need to use a while loop rather than a for loop:
while cab1 or cab2:
This will keep repeating the body while there are any items in either of your source lists.
You probably shouldn't delete items from the source lists; that's a relatively expensive operation. In addition, on the balance having a merge_lists function destroy its arguments would be unexpected.
Within the loop you'll refer to cab1[i1] and cab2[i2] (and, in the condition, to i1 < len(cab1)).
(By the time I typed out the explanation, Tom Karzes typed out the corresponding code in another answer...)
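For completeness, the standard-library route mentioned above is a one-liner; heapq.merge lazily merges already-sorted inputs:
import heapq

cab1 = [1, 2, 5, 6, 8, 9]
cab2 = [3, 4, 7, 10, 11]

# heapq.merge assumes each input is already sorted and yields lazily
print(list(heapq.merge(cab1, cab2)))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]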

Python: weird list index out of range error [duplicate]

This question already has answers here: Strange result when removing item from a list while iterating over it (8 answers). Closed 7 years ago.
l = range(100)
for i in l:
    print i,
    print l.pop(0),
    print l.pop(0)
The above Python code gives output quite different from what I expected. I want to loop over the items so that I can skip an item while looping. Please explain.
Never alter the container you're looping on, because iterators on that container are not going to be informed of your alterations and, as you've noticed, that's quite likely to produce a very different loop and/or an incorrect one. In normal cases, looping on a copy of the container helps, but in your case it's clear that you don't want that, as the container will be empty after 50 legs of the loop and if you then try popping again you'll get an exception.
What's anything BUT clear is, what behavior are you trying to achieve, if any?! Maybe you can express your desires with a while...?
i = 0
while i < len(some_list):
    print i,
    print some_list.pop(0),
    print some_list.pop(0)
I've been bitten before by (someone else's) "clever" code that tries to modify a list while iterating over it. I resolved that I would never do it under any circumstance.
You can use the slice operator mylist[::3] to skip across to every third item in your list.
mylist = [i for i in range(100)]
for i in mylist[::3]:
    print(i)
Other points about my example relate to new syntax in python 3.0.
I use a list comprehension to define mylist because it works in Python 3.0 (see below)
print is a function in python 3.0
Python 3.0 range() now behaves like xrange() used to behave, except it works with values of arbitrary size. The latter no longer exists.
The general rule of thumb is that you don't modify a collection/array/list while iterating over it.
Use a secondary list to store the items you want to act upon and execute that logic in a loop after your initial loop.
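A minimal sketch of that two-pass pattern (the condition x % 3 == 0 is just a stand-in for whatever marks an item for removal):
items = list(range(100))

# First pass: record what to act on, without touching the list
to_remove = [x for x in items if x % 3 == 0]

# Second pass: apply the changes after the iteration is done
for x in to_remove:
    items.remove(x)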
Use a while loop that checks for the truthfulness of the array:
while array:
    value = array.pop(0)
    # do some calculation here
And it should do it without any errors or funny behaviour.
Try this. It avoids mutating a thing you're iterating across, which is generally a code smell.
for i in xrange(0, 100, 3):
    print i
See xrange.
I guess this is what you want:
l = range(100)
index = 0
for i in l:
    print i,
    try:
        print l.pop(index+1),
        print l.pop(index+1)
    except IndexError:
        pass
    index += 1
It is quite handy when the number of items to be popped is a run-time decision. But it runs very inefficiently and the code is hard to maintain.
This slice syntax makes a copy of the list and does what you want:
l = range(100)
for i in l[:]:
    print i,
    print l.pop(0),
    print l.pop(0)

python 2.7 for loop to generate a list

I have tested in Python 2.7 that the two styles below are equivalent. My confusion is that, when reading the first method of generating a list, I am never sure whether i%2 == 0 controls whether we execute the whole loop for i in range(100), or whether i%2 == 0 sits inside that loop. Perhaps the confusion comes from having written Java and C++ in the past and carrying those reading habits over.
Looking for advice on how to read list-generation code: normally the pattern is [<something before loop> <the loop> <something after the loop>]; in this case "something before loop" is 1, "the loop" is for i in range(100), and "something after the loop" is i%2 == 0.
Also asking for advice on whether writing code as in method 1 is good coding style in Python 2.7. Thanks.
a = [1 for i in range(100) if i%2 == 0]
print a
a=[]
for i in range(100):
    if i%2==0:
        a.append(1)
print a
Edit 1,
I also want to compare using xrange in an explicit loop (against the first, list-comprehension method, for pros and cons), for example,
a=[]
for i in xrange(100):
    if i%2==0:
        a.append(1)
print a
Edit 2,
a = [1 for i in xrange(100) if i%2 == 0]
1) As already mentioned, in Python 2.7 it is usually suggested to use xrange, since it (as in C) only keeps a counter that gets incremented. range, by contrast, really creates a whole list in memory, from 0 to 99. Think here about whether you need 100 included; if so, use 101.
2) You read it correctly: the condition is indeed evaluated "under" the loop, once per iteration.
Bear in mind that list comprehensions are quite powerful for creating what you need. Be careful, though: in some cases they are not so easy to read, especially when multiple variables like x, y and so on are used inside. I would choose your first line; just take care with the minimum and maximum of your range. As said, you may have to incorporate the 100th element, and you can speed things up by using the xrange function instead of range.
a = [1 for i in xrange(100) if i%2 == 0]
3) A good suggestion is also to read up on xrange and while loops; on Stack Overflow you can find plenty of discussions of the relative speed of the two. (This is only a suggestion.)
Hope this clarifies your query! Have a nice day!

replacing while loop with list comprehension

It is common to express for loops as list comprehensions:
mylist=[]
for i in range(30):
    mylist.append(i**2)
This is equivalent to:
mylist = [i**2 for i in range(30)]
Is there any sort of mechanism by which this sort of iteration could be done with a while loop?
mylist=[]
i=0
while i<30:
    mylist.append(i**2)
    i+=1
Of course with this simple example it's easy to translate to a for loop and then to a list comprehension, but what if it isn't quite so easy?
e.g.
mylist = [i**2 while i=0;i<30;i++ ]
(Of course the above pseudo-code isn't legitimate python) (itertools comes to mind for this sort of thing, but I don't know that module terribly well.)
EDIT
An (very simple) example where I think a while comprehension would be useful would be:
dt=0.05
t=0
mytimes=[]
while t<maxtime:
    mytimes.append(t)
    t+=dt
This could translate to:
dt=0.05
t=0
nsteps=int(maxtime/dt)
mytimes=[]
for t in (i*dt for i in xrange(nsteps)):
    mytimes.append(t)
which can be written as a (compound) list comprehension:
nsteps=int(maxtime/dt)
mytimes=[t for t in (i*dt for i in xrange(nsteps))]
But I would argue that the while loop is MUCH easier to read (and less prone to index errors). Also, what if your object (dt) supports '+' but not '*'? More complicated examples can arise if maxtime somehow changes on each iteration of the loop...
If your while loop just checks a local variable that is being incremented, you should convert it to a for loop or the equivalent list comprehension.
You should use a while loop only if you cannot express the loop as iterating over something. An example of a typical use case is checking the state of an Event, or a low-level loop that calls into native code. It follows that (correctly used) while loops are rare, and best just written out. A while comprehension would just make them harder to read.
If you just want to return multiple values, you should consider writing a generator.
For example, your edited algorithm should be written as (using numpy.arange):
mytimes = numpy.arange(0, maxtime, 0.05)
Alternatively, with a generator:
def calcTimes(maxtime):
    dt = 0.05
    t = 0
    while t < maxtime:
        yield t
        t += dt
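The itertools route the question hints at also works: takewhile applied to an unbounded generator behaves like the "while comprehension" being asked for. A sketch, with maxtime assumed to be 1.0 for illustration:
from itertools import count, takewhile

dt = 0.05
maxtime = 1.0  # hypothetical value

# count() yields 0, 1, 2, ... forever; takewhile cuts the stream off
# at the first value failing t < maxtime
mytimes = list(takewhile(lambda t: t < maxtime, (i * dt for i in count())))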

Is there any built-in way to get the length of an iterable in python?

For example, files, in Python, are iterable - they iterate over the lines in the file. I want to count the number of lines.
One quick way is to do this:
lines = len(list(open(fname)))
However, this loads the whole file into memory (at once). This rather defeats the purpose of an iterator (which only needs to keep the current line in memory).
This doesn't work:
lines = len(line for line in open(fname))
as generators don't have a length.
Is there any way to do this short of defining a count function?
def count(i):
    c = 0
    for el in i: c += 1
    return c
To clarify: I understand that the whole file will have to be read! I just don't want it all in memory at once.
Short of iterating through the iterable and counting the number of iterations, no. That's what makes it an iterable and not a list. This isn't really even a python-specific problem. Look at the classic linked-list data structure. Finding the length is an O(n) operation that involves iterating the whole list to find the number of elements.
As mcrute mentioned above, you can probably reduce your function to:
def count_iterable(i):
    return sum(1 for e in i)
Of course, if you're defining your own iterable object you can always implement __len__ yourself and keep an element count somewhere.
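A minimal sketch of that idea (the class is invented for illustration): an iterable that knows its element count up front can report it from __len__ without ever being iterated:
class Squares(object):
    """Iterable over the first n squares; the length is known up front."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return (i * i for i in range(self.n))
    def __len__(self):
        return self.n

print(len(Squares(10)))  # 10, without generating a single square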
If you need a count of lines, you can do this; I don't know of any better way to do it:
line_count = sum(1 for line in open("yourfile.txt"))
The cardinality package provides an efficient count() function and some related functions to count and check the size of any iterable: http://cardinality.readthedocs.org/
import cardinality
it = some_iterable(...)
print(cardinality.count(it))
Internally it uses enumerate() and collections.deque() to move all the actual looping and counting logic to the C level, resulting in a considerable speedup over for loops in Python.
I've used this redefinition for some time now:
def len(thingy):
    try:
        return thingy.__len__()
    except AttributeError:
        return sum(1 for item in iter(thingy))
It turns out there is an implemented solution for this common problem. Consider using the ilen() function from more_itertools.
more_itertools.ilen(iterable)
An example of printing a number of lines in a file (we use the with statement to safely handle closing files):
# Example
import more_itertools
with open("foo.py", "r+") as f:
    print(more_itertools.ilen(f))
# Output: 433
This example returns the same result as solutions presented earlier for totaling lines in a file:
# Equivalent code
with open("foo.py", "r+") as f:
    print(sum(1 for line in f))
# Output: 433
Absolutely not, for the simple reason that iterables are not guaranteed to be finite.
Consider this perfectly legal generator function:
def forever():
    while True:
        yield "I will run forever"
Attempting to calculate the length of this function with len([x for x in forever()]) will clearly not work.
As you noted, much of the purpose of iterators/generators is to be able to work on a large dataset without loading it all into memory. The fact that you can't get an immediate length should be considered a tradeoff.
Because apparently the duplication wasn't noticed at the time, I'll post an extract from my answer to the duplicate here as well:
There is a way to perform meaningfully faster than sum(1 for i in it) when the iterable may be long (and not meaningfully slower when the iterable is short), while maintaining fixed memory overhead behavior (unlike len(list(it))) to avoid swap thrashing and reallocation overhead for larger inputs.
# On Python 2 only, get zip that lazily generates results instead of returning list
from future_builtins import zip
from collections import deque
from itertools import count
def ilen(it):
    # Make a stateful counting iterator
    cnt = count()
    # zip it with the input iterator, then drain until input exhausted at C level
    deque(zip(it, cnt), 0)  # cnt must be second zip arg to avoid advancing too far
    # Since count is 0-based, the next value is the count
    return next(cnt)
Like len(list(it)), ilen(it) performs the loop in C code on CPython (deque, count and zip are all implemented in C); avoiding byte code execution per loop is usually the key to performance in CPython.
Rather than repeat all the performance numbers here, I'll just point you to my answer with the full perf details.
For filtering, this variation can be used:
sum(is_good(item) for item in iterable)
which can be naturally read as "count good items" and is shorter and simpler (although perhaps less idiomatic) than:
sum(1 for item in iterable if is_good(item))
Note: The fact that True evaluates to 1 in numeric contexts is specified in the docs
(https://docs.python.org/3.6/library/stdtypes.html#boolean-values), so this coercion is not a hack (as opposed to some other languages like C/C++).
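A quick interpreter session illustrating that coercion (bool is a subclass of int):
>>> sum([True, False, True])
2
>>> isinstance(True, int)
True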
Well, if you think about it, how do you propose to find the number of lines in a file without reading the whole file for newlines? Sure, you can find the size of the file, and if you can guarantee that the length of a line is x, you can get the number of lines in a file. But unless you have some kind of constraint, I fail to see how this can work at all. Also, since iterables can be infinitely long...
I did a test between the two common procedures in some code of mine, which finds how many graphs on n vertices there are, to see which method of counting elements of a generated list goes faster. Sage has a generator graphs(n) which generates all graphs on n vertices. I created two functions which obtain the length of a list obtained by an iterator in two different ways and timed each of them (averaging over 100 test runs) using the time.time() function. The functions were as follows:
def test_code_list(n):
    l = graphs(n)
    return len(list(l))
and
def test_code_sum(n):
    S = sum(1 for _ in graphs(n))
    return S
Now I time each method:
import time

t0 = time.time()
for i in range(100):
    test_code_list(5)
t1 = time.time()
avg_time = (t1-t0)/100
print 'average list method time = %s' % avg_time

t0 = time.time()
for i in range(100):
    test_code_sum(5)
t1 = time.time()
avg_time = (t1-t0)/100
print "average sum method time = %s" % avg_time
average list method time = 0.0391882109642
average sum method time = 0.0418473792076
So computing the number of graphs on n=5 vertices this way, the list method is slightly faster (although 100 test runs isn't a great sample size). But when I increased the length of the list being computed by trying graphs on n=7 vertices (i.e. changing graphs(5) to graphs(7)), the result was this:
average list method time = 4.14753051996
average sum method time = 3.96504004002
In this case the sum method was slightly faster. All in all, the two methods are approximately the same speed but the difference MIGHT depend on the length of your list (it might also just be that I only averaged over 100 test runs, which isn't very high -- would have taken forever otherwise).
