Python simple iteration

I would like to ask what is the best way to do simple iteration. Suppose I want to repeat a certain task 1000 times; which one of the following is best, or is there a better way?
for i in range(1000):
    # do something with no reference to i

i = 0
while i < 1000:
    # do something with no reference to i
    i += 1
Thanks very much.

The first is considered idiomatic. In Python 2.x, use xrange instead of range.

The for loop is more concise and more readable. while loops are rarely used in Python (with the exception of while True).
A bit of idiomatic Python: if you're trying to do something a set number of times with a range (with no need to use the counter), it's good practice to name the counter _. Example:
for _ in range(1000):
    # do something 1000 times

In Python 2, use
for i in xrange(1000):
    pass
In Python 3, use
for i in range(1000):
    pass
Performance figures for Python 2.6:
$ python -s -m timeit '' 'i = 0
> while i < 1000:
> i += 1'
10000 loops, best of 3: 71.1 usec per loop
$ python -s -m timeit '' 'for i in range(1000): pass'
10000 loops, best of 3: 28.8 usec per loop
$ python -s -m timeit '' 'for i in xrange(1000): pass'
10000 loops, best of 3: 21.9 usec per loop
xrange is preferable to range in this case because it produces a generator rather than the whole list [0, 1, 2, ..., 998, 999]. It'll use less memory, too. If you needed the actual list to work with all at once, that's when you use range. Normally you want xrange: that's why in Python 3, xrange(...) becomes range(...) and range(...) becomes list(range(...)).
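A quick way to see the memory difference (a minimal sketch for Python 2; exact sizes vary by build, and sys.getsizeof counts only the list object itself, not the int objects it holds):
import sys

full = range(10 ** 6)    # builds the whole million-element list up front
lazy = xrange(10 ** 6)   # small object that produces numbers on demand

print(sys.getsizeof(full))  # several megabytes for the pointer array alone
print(sys.getsizeof(lazy))  # a few dozen bytes, regardless of the range size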

The first, because the integer increment is handled in the internal (C) layer rather than by the interpreter. There is also one less global variable.


Which is the most efficient way to convert a float into an int in Python?

I've been using n = int(n) to convert a float into an int.
Recently, I came across another way to do the same thing:
n = n // 1
Which is the most efficient way, and why?
Test it with timeit:
$ bin/python -mtimeit -n10000000 -s 'n = 1.345' 'int(n)'
10000000 loops, best of 3: 0.234 usec per loop
$ bin/python -mtimeit -n10000000 -s 'n = 1.345' 'n // 1'
10000000 loops, best of 3: 0.218 usec per loop
So floor division is only faster by a small margin. Note that these values are very close, and I had to crank up the loop repeat count to iron out random influences on my machine. Even with such a high count, you need to repeat the experiments a few times to see how much the numbers still vary and which comes out faster most of the time.
This is logical, as int() requires a global lookup and a function call (so state is pushed and popped):
>>> import dis
>>> def use_int(n):
...     return int(n)
...
>>> def use_floordiv(n):
...     return n // 1
...
>>> dis.dis(use_int)
  2           0 LOAD_GLOBAL              0 (int)
              3 LOAD_FAST                0 (n)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE
>>> dis.dis(use_floordiv)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (1)
              6 BINARY_FLOOR_DIVIDE
              7 RETURN_VALUE
It is the LOAD_GLOBAL and CALL_FUNCTION opcodes that are slower than the LOAD_CONST and BINARY_FLOOR_DIVIDE opcodes; LOAD_CONST is a simple array lookup, LOAD_GLOBAL needs to do a dictionary lookup instead.
Binding int() to a local name can make a small difference, giving it the edge again (as it has to do less work than // 1 floor division):
$ bin/python -mtimeit -n10000000 -s 'n = 1.345' 'int(n)'
10000000 loops, best of 3: 0.233 usec per loop
$ bin/python -mtimeit -n10000000 -s 'n = 1.345; int_=int' 'int_(n)'
10000000 loops, best of 3: 0.195 usec per loop
$ bin/python -mtimeit -n10000000 -s 'n = 1.345' 'n // 1'
10000000 loops, best of 3: 0.225 usec per loop
Again, you need to run this with 10 million loops to see the differences consistently.
That said, int(n) is a lot more explicit and unless you are doing this in a time-critical loop, int(n) wins it in readability over n // 1. The timing differences are too small to make the cognitive cost of having to work out what // 1 does here worthwhile.
Although Martijn Pieters answered your question of what is faster and how to test it, I feel that speed isn't that important for such a small operation. I would use int() for readability, as Inbar Rose said. Typically, when dealing with something this small, readability is far more important; a common equation can be an exception to this, though.
Actually, int seems to be faster than the division. The slow part is looking the function up in the global scope.
Here are my numbers if we avoid it:
$ python -mtimeit -s 'i=int; a=123.456' 'i(a)'
10000000 loops, best of 3: 0.122 usec per loop
$ python -mtimeit -s 'i=int; a=123.456' 'a//1'
10000000 loops, best of 3: 0.145 usec per loop
Notice that you are not converting from float to int when you use the floor division operator: the result of this operation is still a float. In Python 2.7.5 (CPython), n = n // 1 is exactly the same thing as:
n.__floordiv__(1)
which is basically the same thing as:
n.__divmod__(1)[0]
Both functions return a float instead of an int. Inside the CPython __divmod__ function, the denominator and the numerator must be converted from PyObject to double. So, in this case, it is faster to use the floor function instead of the // operator, because only one conversion is needed:
from math import floor
n = floor(n)
In the case you really want to convert a float to an integer, I don't think there is a way to beat the performance of int(n).
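As a quick sanity check of the types involved (Python 2 REPL; in Python 3, math.floor returns an int instead):
>>> n = 5.3
>>> n // 1
5.0
>>> int(n)
5
>>> from math import floor
>>> floor(n)
5.0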
Too long; didn't read:
Using float.__trunc__() is 30% faster than builtins.int()
I like long explanations:
@MartijnPieters' trick to bind builtins.int is indeed interesting, and it reminds me of An Optimization Anecdote. However, calling builtins.int is not the most efficient approach.
Let's take a look at this:
python -m timeit -n10000000 -s "n = 1.345" "int(n)"
10000000 loops, best of 5: 48.5 nsec per loop
python -m timeit -n10000000 -s "n = 1.345" "n.__trunc__()"
10000000 loops, best of 5: 33.1 nsec per loop
That's a 30% gain! What's happening here?
It turns out all builtins.int does is invoke the following method chain:
If 1.345.__int__ is defined, return 1.345.__int__(); else:
If 1.345.__index__ is defined, return 1.345.__index__(); else:
If 1.345.__trunc__ is defined, return 1.345.__trunc__()
1.345.__int__ is not defined [1], and neither is 1.345.__index__. Therefore, directly calling 1.345.__trunc__() allows us to skip all the unnecessary method calls, which are relatively expensive.
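As a rough illustration of that chain, a hypothetical pure-Python equivalent could look like this (int_like is an invented name; the real CPython logic lives in C and handles many more cases):
def int_like(x):
    # Walk the same fallback chain int() uses for non-int arguments.
    tp = type(x)
    if getattr(tp, '__int__', None) is not None:
        return tp.__int__(x)
    if getattr(tp, '__index__', None) is not None:
        return tp.__index__(x)
    if getattr(tp, '__trunc__', None) is not None:
        return tp.__trunc__(x)
    raise TypeError('cannot convert %r to an integer' % (x,))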
What about the binding trick? Well, float.__trunc__ is essentially just an instance method, and we can pass 1.345 as the self argument.
python -m timeit -n10000000 -s "n = 1.345; f=int" "f(n)"
10000000 loops, best of 5: 43 nsec per loop
python -m timeit -n10000000 -s "n = 1.345; f=float.__trunc__" "f(n)"
10000000 loops, best of 5: 27.4 nsec per loop
Both methods improved as expected [2], and they maintain roughly the same ratio!
[1] I'm not entirely certain about this; correct me if somebody knows otherwise.
[2] This surprised me, because I was under the impression that float.__trunc__ is bound to 1.345 during instance creation. It'd be great if anyone would be kind enough to explain this to me.
There is also the method builtins.float.__floor__, which is not mentioned in the documentation and is faster than builtins.int but slower than builtins.float.__trunc__.
python -m timeit -n10000000 -s "n = 1.345; f=float.__floor__" "f(n)"
10000000 loops, best of 5: 32.4 nsec per loop
Note that, unlike __trunc__, __floor__ rounds toward negative infinity, so the two agree on positive floats but differ on negative non-integer ones. It would be awesome if someone could explain how this fits among the other methods.
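A quick demonstration of that difference (Python 3 REPL):
>>> (1.345).__trunc__(), (1.345).__floor__()
(1, 1)
>>> (-1.345).__trunc__()   # truncates toward zero
-1
>>> (-1.345).__floor__()   # rounds toward negative infinity
-2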
Just a statistical test to have a little fun - change the timeit tests to whatever you prefer:
import timeit
from scipy import mean, std, stats, sqrt

# Parameters:
reps = 100000
dups = 50
signif = 0.01
timeit_setup1 = 'i=int; a=123.456'
timeit_test1 = 'i(a)'
timeit_setup2 = 'i=int; a=123.456'
timeit_test2 = 'a//1'

# Some vars
t1_data = []
t2_data = []
frmt = '{:.3f}'
testformat = '{:<' + str(max([len(timeit_test1), len(timeit_test2)])) + '}'

def reportdata(mylist):
    string = 'mean = ' + frmt.format(mean(mylist)) + ' seconds, st.dev. = ' + \
             frmt.format(std(mylist))
    return string

for i in range(dups):
    t1_data.append(timeit.timeit(timeit_test1, setup=timeit_setup1,
                                 number=reps))
    t2_data.append(timeit.timeit(timeit_test2, setup=timeit_setup2,
                                 number=reps))

print testformat.format(timeit_test1) + ':', reportdata(t1_data)
print testformat.format(timeit_test2) + ':', reportdata(t2_data)
ttest = stats.ttest_ind(t1_data, t2_data)
print 't-test: the t value is ' + frmt.format(float(ttest[0])) + \
      ' and the p-value is ' + frmt.format(float(ttest[1]))
isit = ''
if float(ttest[1]) > signif:
    isit = "not "
print 'The difference of ' + \
      '{:.2%}'.format(abs((mean(t1_data) - mean(t2_data)) / mean(t1_data))) + \
      ' +/- ' + \
      '{:.2%}'.format(3 * sqrt((std(t1_data)**2 + std(t2_data)**2) / dups)) + \
      ' is ' + isit + 'significant.'

What is the fastest way to remove all instances of a particular entry from a list in Python?

Suppose you have a list that is n entries long. The list does not contain uniform data (some entries may be strings, others integers, or even other lists). Assuming the list contains at least one instance of a given value, what is the fastest way to remove all instances of that value?
I can think of two: a list comprehension, or repeated .remove():
[item for item in lst if item != itemToExclude]
for i in range(lst.count(itemToExclude)): lst.remove(itemToExclude)
But I have no sense for which of these will be fastest for an arbitrarily large list, or if there are any other ways. As a side note, if someone could provide some guidelines for determining the speed of methods at a glance, I would greatly appreciate it!
Your method 1 will be faster in general because it iterates over the list just once, in C code. The second method iterates through the whole list once for the lst.count call, and then iterates from the start again every time lst.remove gets called!
To measure these things, use timeit.
It is also worth mentioning that the two methods you propose are doing slightly different things:
[item for item in lst if item != itemToExclude]
This creates a new list.
for i in range(lst.count(itemToExclude)): lst.remove(itemToExclude)
This modifies the existing list.
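If you want the speed of the comprehension but the in-place behaviour of remove, a common middle ground (not from the original answers) is slice assignment, which replaces the contents of the existing list object so that other references to it see the change:
lst = [1, 'a', 1, [2], 1, 3]
itemToExclude = 1

# Build the filtered list once, then overwrite the original list in place.
lst[:] = [item for item in lst if item != itemToExclude]
print(lst)  # ['a', [2], 3]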
Your second solution is much less efficient than your first. count and remove both traverse the list, so to remove N copies of an item, you have to traverse the list N+1 times. Whereas the list comprehension only traverses the list once no matter how many copies there are.
Try this one:
filter(lambda x: x != itemToExclude, lst)
There are no Python-level loops here - the loop, going once over the data, is done "at C speed" (well, in CPython, "the usual" implementation).
test.py:
lst = range(100) * 100
itemToExclude = 1

def do_nothing(lst):
    return lst

def listcomp(lst):
    return [item for item in lst if item != itemToExclude]

def listgenerator(lst):
    return list(item for item in lst if item != itemToExclude)

def remove(lst):
    for i in range(lst.count(itemToExclude)):
        lst.remove(itemToExclude)

def filter_lambda(lst):
    return filter(lambda x: x != itemToExclude, lst)

import operator
import functools

def filter_functools(lst):
    return filter(functools.partial(operator.ne, itemToExclude), lst)

lstcopy = list(lst)
remove(lstcopy)
assert lstcopy == listcomp(list(lst))
assert lstcopy == listgenerator(list(lst))
assert lstcopy == filter_lambda(list(lst))
assert lstcopy == filter_functools(list(lst))
Results:
$ python -mtimeit "import test; test.do_nothing(list(test.lst))"
10000 loops, best of 3: 26.9 usec per loop
$ python -mtimeit "import test; test.listcomp(list(test.lst))"
1000 loops, best of 3: 686 usec per loop
$ python -mtimeit "import test; test.listgenerator(list(test.lst))"
1000 loops, best of 3: 737 usec per loop
$ python -mtimeit "import test; test.remove(list(test.lst))"
100 loops, best of 3: 8.94 msec per loop
$ python -mtimeit "import test; test.filter_lambda(list(test.lst))"
1000 loops, best of 3: 994 usec per loop
$ python -mtimeit "import test; test.filter_functools(list(test.lst))"
1000 loops, best of 3: 815 usec per loop
So remove loses, but the rest are pretty similar: the list comprehension may have the edge over filter. Obviously you can rerun the same comparison with an input size, number of removed items, and type of item to remove that are more representative of your real intended use.

Possible to return two lists from a list comprehension?

Is it possible to return two lists from a list comprehension? Well, this obviously doesn't work, but something like:
rr, tt = [i*10, i*12 for i in xrange(4)]
So rr and tt both are lists with the results from i*10 and i*12 respectively.
Many thanks
>>> rr,tt = zip(*[(i*10, i*12) for i in xrange(4)])
>>> rr
(0, 10, 20, 30)
>>> tt
(0, 12, 24, 36)
Creating two list comprehensions is better (at least for long lists). Be aware that the best-voted answer can be even slower than a traditional for loop. List comprehensions are faster and clearer.
python -m timeit -n 100 -s 'rr=[];tt = [];' 'for i in range(500000): rr.append(i*10);tt.append(i*12)'
10 loops, best of 3: 123 msec per loop
> python -m timeit -n 100 'rr,tt = zip(*[(i*10, i*12) for i in range(500000)])'
10 loops, best of 3: 170 msec per loop
> python -m timeit -n 100 'rr = [i*10 for i in range(500000)]; tt = [i*12 for i in range(500000)]'
10 loops, best of 3: 68.5 msec per loop
It would be nice to see list comprehensions support the creation of multiple lists at a time.
However,
if you can take advantage of a traditional loop (to be precise, of intermediate calculations), then it is possible that you will be better off with a loop (or an iterator/generator using yield). Here is an example:
$ python3 -m timeit -n 100 -s 'rr=[];tt=[];' "for i in (range(1000) for x in range(10000)): tmp = list(i); rr.append(min(tmp));tt.append(max(tmp))"
100 loops, best of 3: 314 msec per loop
$ python3 -m timeit -n 100 "rr=[min(list(i)) for i in (range(1000) for x in range(10000))];tt=[max(list(i)) for i in (range(1000) for x in range(10000))]"
100 loops, best of 3: 413 msec per loop
Of course, the comparison in these cases is unfair; in the example, the code and the calculations are not equivalent, because in the traditional loop a temporary result is stored (see the tmp variable). So, the list comprehension is doing many more internal operations (it calculates the tmp variable twice!), yet it is only 25% slower.
It is possible for a list comprehension to return multiple lists if the elements are lists.
So for example:
>>> x, y = [[] for x in range(2)]
>>> x
[]
>>> y
[]
>>>
The trick with the zip function does the job, but it is actually much simpler and more readable if you just collect the results in lists with a loop.
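For instance, a minimal loop version of the example above:
rr, tt = [], []
for i in range(4):
    rr.append(i * 10)   # rr becomes [0, 10, 20, 30]
    tt.append(i * 12)   # tt becomes [0, 12, 24, 36]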

Binary negation in Python

I can't seem to find logical negation of integers as an operator anywhere in Python.
Currently I'm using this:
def not_(x):
    assert x in (0, 1)
    return abs(1 - x)
But I feel a little stupid. Isn't there a built-in operator for this? The logical negation (not) returns a Boolean -- that's not really what I want. Is there a different operator, or a way to make not return an integer, or am I stuck with this dodgy workaround?
You can use:
int(not x)
to convert the boolean to 0 or 1.
Did you mean:
int(not(x))
? Assuming that any non-zero integer value is true and 0 is false, you'll always get integer 0 or 1 as a result.
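For example:
>>> int(not 0)
1
>>> int(not 1)
0
>>> int(not 5)   # any non-zero value counts as true
0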
If what you expect is to get 1 when the input is 0, and 0 when the input is 1, then XOR is your friend. You need to XOR your value with 1:
negate = lambda x: x ^ True
negate(0)
Out: 1
negate(1)
Out: 0
negate(False)
Out: True
negate(True)
Out: False
If you are looking for Bitwise Not, then ~ is what you are looking for. However, it works in the two's complement form.
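For example, ~x flips every bit, which on Python's integers computes -(x + 1) rather than a 0/1 toggle:
>>> ~0
-1
>>> ~1
-2
>>> ~5 & 0xFF   # mask to 8 bits to see the flipped bit pattern of 5
250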
This will raise a KeyError if x is not in (0, 1):
def not_(x):
    return {1: 0, 0: 1}[x]
The tuple version would also accept -1 if you don't add a check for it, but it is probably faster:
def not_(x):
    return (1, 0)[x]
$ python -m timeit "(1,0)[0]"
10000000 loops, best of 3: 0.0629 usec per loop
$ python -m timeit "(1,0)[1]"
10000000 loops, best of 3: 0.0646 usec per loop
$ python -m timeit "1^1"
10000000 loops, best of 3: 0.063 usec per loop
$ python -m timeit "1^0"
10000000 loops, best of 3: 0.0638 usec per loop
$ python -m timeit "int(not(0))"
1000000 loops, best of 3: 0.354 usec per loop
$ python -m timeit "int(not(1))"
1000000 loops, best of 3: 0.354 usec per loop
$ python -m timeit "{1:0,0:1}[0]"
1000000 loops, best of 3: 0.446 usec per loop
$ python -m timeit "{1:0,0:1}[1]"
1000000 loops, best of 3: 0.443 usec per loop
You can use not, but then convert the result to an integer:
>>> int(False)
0
>>> int(True)
1
I think your approach is very good, for two reasons:
It is fast, clear, and understandable.
It does error checking.
I assume there cannot be such an operator defined on the integers because of the following problem: what should it return if the given value is not 0 or 1? Throw an exception? Treat positive integers as 1? But what about negative integers?
Your approach defines concrete behaviour: accept only 0 or 1.
This can easily be done using some basic binary and string manipulation features in Python.
If x is an integer for which we want the bitwise negation (called x_bar, as learned in digital class :)), XOR it with a mask of '1' bits of the same length:
>>> x_bar = x ^ int('1' * len(bin(x).split('b')[1]), 2)
>>> bin(x_bar)  # the binary string representation of x_bar
The bin(int_value) function returns the binary string representation of any integer, e.g. '0b11011011'. The XOR is done against a run of '1' bits of the same length as x.

What is the most efficient way to concatenate two strings and remove everything before the first ',' in Python?

In Python, I have a string which is a comma separated list of values. e.g. '5,2,7,8,3,4'
I need to add a new value onto the end and remove the first value,
e.g.
'5,22,7,814,3,4' -> '22,7,814,3,4,1'
Currently, I do this as follows:
mystr = '5,22,7,814,3,4'
latestValue = '1'

mylist = mystr.split(',')
mystr = ''
for i in range(len(mylist) - 1):
    if i == 0:
        mystr += mylist[i + 1]
    if i > 0:
        mystr += ',' + mylist[i + 1]
mystr += ',' + latestValue
This runs millions of times in my code and I've identified it as a bottleneck, so I'm keen to optimize it to make it run faster.
What is the most efficient to do this (in terms of runtime)?
Use this:
if mystr == '':
    mystr = latestValue
else:
    mystr = mystr[mystr.find(",") + 1:] + "," + latestValue
This should be much faster than any solution which splits the list. It only finds the first occurrence of , and "removes" the beginning of the string. Also, if the list is empty, then mystr will be just latestValue (insignificant overhead added by this) -- thanks Paulo Scardine for pointing that out.
_, sep, rest = mystr.partition(",")
mystr = rest + sep + latestValue
It also works without any changes if mystr is empty or a single item (with no comma after it), because str.partition returns an empty sep if there is no separator in mystr.
You could use mystr.rstrip(",") before calling partition() if there might be a trailing comma in the mystr.
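For reference, str.partition splits on the first occurrence of the separator and always returns a 3-tuple; with no separator present, the sep and tail come back empty:
>>> '5,22,7'.partition(',')
('5', ',', '22,7')
>>> '5'.partition(',')
('5', '', '')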
mystr = mystr.partition(",")[2]+","+latestValue
An improvement suggested by Paulo makes it work if mystr has fewer than 2 elements. In the case of 0 elements, it extends mystr to hold one element.
_,_,mystr = (mystr+','+latestValue).partition(',')
$ python -m timeit -s "mystr = '5,22,7,814,3,4';latestValue='1'" "mystr[mystr.find(',')+1:]+','+latestValue"
1000000 loops, best of 3: 0.847 usec per loop
$ python -m timeit -s "mystr = '5,22,7,814,3,4';latestValue='1'" "mystr = mystr.partition(',')[2]+','+latestValue"
1000000 loops, best of 3: 0.703 usec per loop
best version: gnibbler's answer
Since you need speed (millions of times is a lot), I profiled. This one is about twice as fast as splitting the list:
i = 0
while 1:
    if mystr[i] == ',':
        break
    i += 1
mystr = mystr[i+1:] + ', ' + latest_value
It assumes that there is one space after each comma. If that's a problem, you can use:
i = 0
while 1:
    if mystr[i] == ',':
        break
    i += 1
mystr = mystr[i+1:].strip() + ', ' + latest_value
which is only slightly slower than the original but much more robust. It's really up to you to decide how much speed you need to squeeze out of it. They both assume that there will be a comma in the string and will raise an IndexError if one fails to appear. The safe version is:
i = 0
while 1:
    try:
        if mystr[i] == ',':
            break
    except IndexError:
        i = -1
        break
    i += 1
mystr = mystr[i+1:].strip() + ', ' + latest_value
Again, this is still significantly faster than splitting the string, but it adds robustness at the cost of speed.
Here are the timeit results. You can see that the fourth method is noticeably faster than the third (most robust) method, but slightly slower than the first two methods. It's the faster of the two robust solutions, though, so unless you are sure that your strings will have commas in them (i.e., it would already be considered an error if they didn't), I would use it anyway.
$ python -mtimeit -s'from strings import tests, method1' 'method1(tests[0], "10")'
1000000 loops, best of 3: 1.34 usec per loop
$ python -mtimeit -s'from strings import tests, method2' 'method2(tests[0], "10")'
1000000 loops, best of 3: 1.34 usec per loop
$ python -mtimeit -s'from strings import tests, method3' 'method3(tests[0], "10")'
1000000 loops, best of 3: 1.5 usec per loop
$ python -mtimeit -s'from strings import tests, method4' 'method4(tests[0], "10")'
1000000 loops, best of 3: 1.38 usec per loop
$ python -mtimeit -s'from strings import tests, method5' 'method5(tests[0], "10")'
100000 loops, best of 3: 1.18 usec per loop
This is gnibbler's answer:
mylist = mystr.split(',')
mylist.append(latestValue)
mystr = ",".join(mylist[1:])
String concatenation in python isn't very efficient (since strings are immutable). It's easier to work with them as lists (and more efficient). Basically in your code you are copying your string over and over again each time you concatenate to it.
Edited:
Not the best, but I love one-liners. :-)
mystr = ','.join(mystr.split(',')[1:]+[latestValue])
Before testing, I would have bet it performed better.
> python -m timeit "mystr = '5,22,7,814,3,4'" "latestValue='1'" \
"mylist = mystr.split(',')" "mylist.append(latestValue);" \
"mystr = ','.join(mylist[1:])"
1000000 loops, best of 3: 1.37 usec per loop
> python -m timeit "mystr = '5,22,7,814,3,4'" "latestValue='1'"\
"','.join(mystr.split(',')[1:]+[latestValue])"
1000000 loops, best of 3: 1.5 usec per loop
> python -m timeit "mystr = '5,22,7,814,3,4'" "latestValue='1'"\
'mystr=mystr[mystr.find(",")+1:]+","+latestValue'
1000000 loops, best of 3: 0.625 usec per loop
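Since the question says this runs millions of times, one more alternative worth sketching (not from the original answers, and assuming the string form is only needed occasionally): keep the values in a collections.deque and join only when the string is actually required, avoiding repeated string surgery on every update.
from collections import deque

values = deque('5,22,7,814,3,4'.split(','))

def push(latest):
    # Drop the oldest value and append the newest; both ends are O(1).
    values.popleft()
    values.append(latest)

push('1')
print(','.join(values))  # 22,7,814,3,4,1 - build the string only when needed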
