"Non-equal" or "greater than" is quicker?

"Non-equal" or "greater than" is quicker? - python

I wonder which of the following is done quicker for a tuple (also for a list or an int) :
a_tuple = ('a', 'b',)
if (len(a_tuple) != 0): pass
if (len(a_tuple) > 0): pass
I did some timeit experiment and the result is rather similar (vary each time I run timeit for 100000 iterations). I just wonder if there is a time benefit.

Use not a_tuple (True if empty) or tuple (True if not empty) instead of testing for the length:
if a_tuple:
pass
Or, as a demonstration speaks louder than words:
>>> if not ():
... print('empty!')
...
empty!
>>> if (1, 0):
... print('not empty!')
...
not empty!
Apart from the fact that this is a micro optimization, testing for the falsy-ness of the empty tuple is faster too. When in doubt about speed, use the timeit module:
>>> import timeit
>>> a_tuple = (1,0)
>>> def ft_bool():
... if a_tuple:
... pass
...
>>> def ft_len_gt():
... if len(a_tuple) > 0:
... pass
...
>>> def ft_len_ne():
... if len(a_tuple) != 0:
... pass
...
>>> timeit.timeit('ft()', 'from __main__ import ft_bool as ft')
0.17232918739318848
>>> timeit.timeit('ft()', 'from __main__ import ft_len_gt as ft')
0.2506139278411865
>>> timeit.timeit('ft()', 'from __main__ import ft_len_ne as ft')
0.23904109001159668

Related

Improving index search speed in a large (million entry) list

I have a list which is a million items long of random, repeatable integers. I need to sort that list, and then find the index of the first iteration of every unique element in the list. When I do this, I am running into run time >5 minutes long. Can anyone give me any suggestions to speed up my code? An example of my process is shown below.
import random
a = []
for x in range(1000000):
a.append(random.randint(1,10000))
unique_a = set(a)
inds=[0]
inds = [a.index(i) for i in sorted(unique_a) if i not in inds]

inds = [a.index(i) for i in sorted(unique_a) if i not in inds] is implicitly quadratic is a.index(i) is linear. Use a dictionary to grab the indices in one pass over the sorted list:
a =sorted([0,4,3,5,21,5,6,3,1,23,4,6,1,93,34,10])
unique_a = set(a)
first_inds = {}
for i,x in enumerate(a):
if not x in first_inds:
first_inds[x] = i
my_inds = [first_inds[x] for x in sorted(unique_a)]

Just store the first position for every unique element:
first_position = {}
for i, value in enumerate(a):
if value not in first_position:
first_position[value] = i
And then replace a.index(i) for first_position[i]
Or just use:
_, indices = zip(*sorted(first_position.items()))

You can use the bisect_left function from the standard library's bisect module to do this. On a sorted list, a bisection search is faster than searching through the list as index does.
>>> L = [random.randint(0, 10) for _ in range(100)]
>>> L.sort()
>>> L.index(9)
83
>>> bisect.bisect_left(L, 9)
83
>>> timeit.timeit(setup="from __main__ import L", stmt="L.index(9)")
2.1408978551626205
>>> timeit.timeit(setup="from __main__ import L;from bisect import bisect_left", stmt="bisect_left(L, 9)")
0.5187544231303036
On my machine, using bisect.bisect_left is faster than iterating over the list and accumulating indexes on the way:
>>> L = [random.randint(0, 100) for _ in range(10000)]
>>> L.sort()
>>> def iterative_approach(list_):
... unique = set(list_)
... first_inds = {}
... for i, x in enumerate(list_):
... if x not in first_inds:
... first_inds[x] = i
... return [first_inds[x] for x in sorted(unique)]
...
>>> ia = iterative_approach(L)
>>> bisect_left = bisect.bisect_left
>>> def bisect_approach(list_):
... unique = set(list_)
... out = {}
... for x in unique:
... out[x] = bisect_left(list_, x)
... return [out[x] for x in sorted(unique)]
...
>>> ba = bisect_approach(L)
>>> ia == ba
True
>>> timeit.timeit(setup="from __main__ import L, iterative_approach", stmt="iterative_approach(L)")
1488.956467495067
>>> timeit.timeit(setup="from __main__ import L, bisect_approach", stmt="bisect_approach(L)")
407.6803469741717

Max index of repeating elements divisible by 'n' in a List in Python

I have a sorted list of numbers like:
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
I need to find the max index of each values which is divisible by 100.
Output should be like: 4,10,15
My Code:
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
idx = 1
for i in (a):
if i%100 == 0:
print idx
idx = idx+1
Output of above code:
4
9
10
13
14
15

In case people are curious, I benchmarked the dict comprehension technique against the backward iteration technique. Dict comprehension is about twice the speed. Changing to OrderedDict resulted in MASSIVE slowdown. About 15x slower than the dict comprehension.
def test1():
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
max_index = {}
for i, item in enumerate(a[::-1]):
if item not in max_index:
max_index[item] = len(a) - (i + 1)
return max_index
def test2():
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
return {item: index for index, item in enumerate(a, 1)}
def test3():
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
OrderedDict((item, index) for index, item in enumerate(a, 1))
if __name__ == "__main__":
import timeit
print(timeit.timeit("test1()", setup="from __main__ import test1"))
print(timeit.timeit("test2()", setup="from __main__ import test2"))
print(timeit.timeit("test3()", setup="from __main__ import test3; from collections import OrderedDict"))
3.40622282028
1.97545695305
26.347012043

Use a simple dict-comprehension or OrderedDict with divisible items as the keys, old values will be replaced by newest values automatically.
>>> {item: index for index, item in enumerate(lst, 1) if not item % 100}.values()
dict_values([4, 10, 15])
# if order matters
>>> from collections import OrderedDict
>>> OrderedDict((item, index) for index, item in enumerate(lst, 1) if not item % 100).values()
odict_values([4, 10, 15])
Another way will be to loop over reversed list and use a set to keep track of items seen so far(lst[::-1] may be slightly faster than reversed(lst) for tiny lists).
>>> seen = set()
>>> [len(lst) - index for index, item in enumerate(reversed(lst))
if not item % 100 and item not in seen and not seen.add(item)][::-1]
[4, 10, 15]
You can see the sort-of equivalent code of the above here.

You could use itertools.groupby since your data is sorted:
>>> a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
>>> from itertools import groupby
>>> [list(g)[-1][0] for k,g in groupby(enumerate(a), lambda t: (t[1] % 100, t[1])) if k[0] == 0]
[3, 9, 14]
Although this is a little cryptic.
Here's a complicated approach using only a list-iterator and accumulating into a list:
>>> run, prev, idx = False, None, []
>>> for i, e in enumerate(a):
... if not (e % 100 == 0):
... if not run:
... prev = e
... continue
... idx.append(i - 1)
... run = False
... else:
... if prev != e and run:
... idx.append(i - 1)
... run = True
... prev = e
...
>>> if run:
... idx.append(i)
...
>>> idx
[3, 9, 14]
I think this is best dealt with a dictionary approach like #AshwiniChaudhary It is more straightforward, and much faster:
>>> timeit.timeit("{item: index for index, item in enumerate(a, 1)}", "from __main__ import a")
1.842843743012054
>>> timeit.timeit("[list(g)[-1][0] for k,g in groupby(enumerate(a), lambda t: (t[1] % 100, t[1])) if k[0] == 0]", "from __main__ import a, groupby")
8.479677081981208
The groupby approach is pretty slow, note, the complicated approach is faster, and not far-off form the dict-comprehension approach:
>>> def complicated(a):
... run, prev, idx = False, None, []
... for i, e in enumerate(a):
... if not (e % 100 == 0):
... if not run:
... prev = e
... continue
... idx.append(i - 1)
... run = False
... else:
... if prev != e and run:
... idx.append(i - 1)
... run = True
... prev = e
... if run:
... idx.append(i)
... return idx
...
>>> timeit.timeit("complicated(a)", "from __main__ import a, complicated")
2.6667005629860796
Edit Note, the performance difference narrows if we call list on the dict-comprehension .values():
>>> timeit.timeit("list({item: index for index, item in enumerate(a, 1)}.values())", "from __main__ import a")
2.3839886570058297
>>> timeit.timeit("complicated(a)", "from __main__ import a, complicated")
2.708565960987471

it seemed like a good idea at the start, got a bit twisty, had to patch a couple of cases...
a = [0,77,98,99,100,101,102,198,199,200,200,278,299,300,300,300, 459, 700,700]
bz = [*zip(*((i, d//100) for i, d in enumerate(a) if d%100 == 0 and d != 0))]
[a for a, b, c in zip(*bz, bz[1][1:]) if c-b != 0] + [bz[0][-1]]
Out[78]: [4, 10, 15, 18]
enumerate, zip to create bz which mates 100's numerator(s) with indices
bz = [*zip(*((i, d//100) for i, d in enumerate(a) if d%100 == 0 and d != 0))]
print(*bz, sep='\n')
(4, 9, 10, 13, 14, 15, 17, 18)
(1, 2, 2, 3, 3, 3, 7, 7)
then zip again, zip(*bz, bz[1][1:]) lagging the numerator tuple to allow the lagged difference to give a selection logic if c-b != 0for the last index of each run but the last
add the last 100's match because its always the end of the last run + [bz[0][-1]]

Efficiency of dict.get() method

I have just started learning python and worried that if I use dict.get(key,default_value) or I define my own method for it....so do they have any differences:
[1st method]:
dict={}
for c in string:
if c in dict:
dict[c]+=1
else:
dict[c]=1
and the other dict.get() method that python provides
for c in string:
dict[c]=dict.get(c,0)+1
do they have any differences on efficiency or speed...or they are just the same and 2nd one only saves writing few more lines of code...

For this specific case, use either a collections.Counter() or a collections.defaultdict() object instead:
import collections
dct = collections.defaultdict(int)
for c in string:
dict[c] += 1
or
dct = collections.Counter(string)
Both are subclasses of the standard dict type. The Counter type adds some more helpful functionality like summing two counters or listing the most common entities that have been counted. The defaultdict class can also be given other default types; use defaultdict(list) for example to collect things into lists per key.
When you want to compare performance of two different approaches, you want to use the timeit module:
>>> import timeit
>>> def intest(dct, values):
... for c in values:
... if c in dct:
... dct[c]+=1
... else:
... dct[c]=1
...
>>> def get(dct, values):
... for c in values:
... dct[c] = dct.get(c, 0) + 1
...
>>> values = range(10) * 10
>>> timeit.timeit('test(dct, values)', 'from __main__ import values, intest as test; dct={}')
22.210275888442993
>>> timeit.timeit('test(dct, values)', 'from __main__ import values, get as test; dct={}')
27.442166090011597
This shows that using in is a little faster.
There is, however, a third option to consider; catching the KeyError exception:
>>> def tryexcept(dct, values):
... for c in values:
... try:
... dct[c] += 1
... except KeyError:
... dct[c] = 1
...
>>> timeit.timeit('test(dct, values)', 'from __main__ import values, tryexcept as test; dct={}')
18.023509979248047
which happens to be the fastest, because only 1 in 10 cases are for a new key.
Last but not least, the two alternatives I proposed:
>>> def default(dct, values):
... for c in values:
... dct[c] += 1
...
>>> timeit.timeit('test(dct, values)', 'from __main__ import values, default as test; from collections import defaultdict; dct=defaultdict(int)')
15.277361154556274
>>> timeit.timeit('Counter(values)', 'from __main__ import values; from collections import Counter')
38.657804012298584
So the Counter() type is slowest, but defaultdict is very fast indeed. Counter()s do a lot more work though, and the extra functionality can bring ease of development and execution speed benefits elsewhere.

Fastest way to check if a string contains specific characters in any of the items in a list

What is the fastest way to check if a string contains some characters from any items of a list?
Currently, I'm using this method:
lestring = "Text123"
lelist = ["Text", "foo", "bar"]
for x in lelist:
if lestring.count(x):
print 'Yep. "%s" contains characters from "%s" item.' % (lestring, x)
Is there any way to do it without iteration (which will make it faster I suppose.)?

You can try list comprehension with membership check
>>> lestring = "Text123"
>>> lelist = ["Text", "foo", "bar"]
>>> [e for e in lelist if e in lestring]
['Text']
Compared to your implementation, though LC has an implicit loop but its faster as there is no explicit function call as in your case with count
Compared to Joe's implementation, yours is way faster, as the filter function would require to call two functions in a loop, lambda and count
>>> def joe(lelist, lestring):
return ''.join(random.sample(x + 'b'*len(x), len(x)))
>>> def uz(lelist, lestring):
for x in lelist:
if lestring.count(x):
return 'Yep. "%s" contains characters from "%s" item.' % (lestring, x)
>>> def ab(lelist, lestring):
return [e for e in lelist if e in lestring]
>>> t_ab = timeit.Timer("ab(lelist, lestring)", setup="from __main__ import lelist, lestring, ab")
>>> t_uz = timeit.Timer("uz(lelist, lestring)", setup="from __main__ import lelist, lestring, uz")
>>> t_joe = timeit.Timer("joe(lelist, lestring)", setup="from __main__ import lelist, lestring, joe")
>>> t_ab.timeit(100000)
0.09391469893125759
>>> t_uz.timeit(100000)
0.1528471407273173
>>> t_joe.timeit(100000)
1.4272649857800843
Jamie's commented solution is slower for shorter string's. Here is the test result
>>> def jamie(lelist, lestring):
return next(itertools.chain((e for e in lelist if e in lestring), (None,))) is not None
>>> t_jamie = timeit.Timer("jamie(lelist, lestring)", setup="from __main__ import lelist, lestring, jamie")
>>> t_jamie.timeit(100000)
0.22237164127909637
If you need Boolean values, for shorter strings, just modify the above LC expression
[e in lestring for e in lelist if e in lestring]
Or for longer strings, you can do the following
>>> next(e in lestring for e in lelist if e in lestring)
True
or
>>> any(e in lestring for e in lelist)

filter(lambda x: lestring.count(x), lelist)
That will return all the strings that you're trying to find as a list.

if the test is to see if there are any characters in common (not words or segments), create a set out of the letters in the list and then check the letters agains the string:
char_list = set(''.join(list_of_words))
test_set = set(string_to_teat)
common_chars = char_list.intersection(test_set)
However I'm assuming you're looking for as little as one character in common...

The esmre library does the trick. In your case, the simpler, esm (part of esmre) is what you want.
https://pypi.python.org/pypi/esmre/
https://code.google.com/p/esmre/
They have good documentation and examples:
Taken from their examples:
>>> import esm
>>> index = esm.Index()
>>> index.enter("he")
>>> index.enter("she")
>>> index.enter("his")
>>> index.enter("hers")
>>> index.fix()
>>> index.query("this here is history")
[((1, 4), 'his'), ((5, 7), 'he'), ((13, 16), 'his')]
>>> index.query("Those are his sheep!")
[((10, 13), 'his'), ((14, 17), 'she'), ((15, 17), 'he')]
>>>
I ran some performance tests:
import random, timeit, string, esm
def uz(lelist, lestring):
for x in lelist:
if lestring.count(x):
return 'Yep. "%s" contains characters from "%s" item.' % (lestring, x)
def ab(lelist, lestring):
return [e for e in lelist if e in lestring]
def use_esm(index, lestring):
return index.query(lestring)
for TEXT_LEN in [5, 50, 1000]:
for SEARCH_LEN in [5, 20]:
for N in [5, 50, 1000, 10000]:
if TEXT_LEN < SEARCH_LEN:
continue
print 'TEXT_LEN:', TEXT_LEN, 'SEARCH_LEN:', SEARCH_LEN, 'N:', N
lestring = ''.join((random.choice(string.ascii_uppercase + string.digits) for _ in range(TEXT_LEN)))
lelist = [''.join((random.choice(string.ascii_uppercase + string.digits) for _ in range(SEARCH_LEN))) for _
in range(N)]
index = esm.Index()
for i in lelist:
index.enter(i)
index.fix()
t_ab = timeit.Timer("ab(lelist, lestring)", setup="from __main__ import lelist, lestring, ab")
t_uz = timeit.Timer("uz(lelist, lestring)", setup="from __main__ import lelist, lestring, uz")
t_esm = timeit.Timer("use_esm(index, lestring)", setup="from __main__ import index, lestring, use_esm")
ab_time = t_ab.timeit(1000)
uz_time = t_uz.timeit(1000)
esm_time = t_esm.timeit(1000)
min_time = min(ab_time, uz_time, esm_time)
print ' ab%s: %f' % ('*' if ab_time == min_time else '', ab_time)
print ' uz%s: %f' % ('*' if uz_time == min_time else '', uz_time)
print ' esm%s %f:' % ('*' if esm_time == min_time else '', esm_time)
And got that results depends mostly on the number of items that one is looking for (in my case, 'N'):
TEXT_LEN: 1000 SEARCH_LEN: 20 N: 5
ab*: 0.001733
uz: 0.002512
esm 0.126853:
TEXT_LEN: 1000 SEARCH_LEN: 20 N: 50
ab*: 0.017564
uz: 0.023701
esm 0.079925:
TEXT_LEN: 1000 SEARCH_LEN: 20 N: 1000
ab: 0.370371
uz: 0.489523
esm* 0.133783:
TEXT_LEN: 1000 SEARCH_LEN: 20 N: 10000
ab: 3.678790
uz: 4.883575
esm* 0.259605:

Use frozenset as a pair in python

I would like to make a pair of two elements. I don't care about the order of the elements, so I use frozenset.
I can think of the following two methods to iterate the elements back from the frozenset. Isn't there any fancier method? Thanks in advance.
pair = frozenset([element1, element2])
pair2 = list(pair)
elem1 = pair2[0]
elem2 = pair2[1]
pair = frozenset([element1, element2])
elems = []
for elem in pair:
elems.append(elem)
elem1 = elems[0]
elem2 = elems[1]

pair = frozenset([element1, element2])
elem1, elem2 = pair

If you have a lot of those pair things, using frozenset() is NOT a good idea. Use tuples instead.
>>> import sys
>>> fs1 = frozenset([42, 666])
>>> fs2 = frozenset([666, 42])
>>> fs1 == fs2
True
>>> t1 = tuple(sorted([42, 666]))
>>> t2 = tuple(sorted([666, 42]))
>>> t1 == t2
True
>>> sys.getsizeof(fs1)
116
>>> sys.getsizeof(t1)
36
>>>
Update Bonus: sorted tuples have a predictable iteration sequence:
>>> for thing in fs1, fs2, t1, t2: print [x for x in thing]
...
[42, 666]
[666, 42]
[42, 666]
[42, 666]
>>>
Update 2 ... and their repr() is the same:
>>> repr(fs1)
'frozenset([42, 666])'
>>> repr(fs2)
'frozenset([666, 42])' # possible source of confusion
>>> repr(t1)
'(42, 666)'
>>> repr(t2)
'(42, 666)'
>>>

If it is just two elements you are de-sequence them. But I am not sure, what you are trying to do here with the frozenset
>>> s = frozenset([1,2])
>>> s
frozenset({1, 2})
>>> x,y = s
>>> x
1
>>> y
2

Just to elaborate on an above comment, and assuming your elements are easily sortable, you could make an unordered pair class from tuple using:
class Pair(tuple):
def __new__(cls, seq=()):
assert len(seq) == 2
return tuple.__new__(tuple, sorted(seq))
Then you get:
>>> Pair((0, 1))
(0, 1)
>>> Pair((1, 0))
(0, 1)
>>> Pair((0, 1)) == Pair((1, 0))
True

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

"Non-equal" or "greater than" is quicker? - python

Related

Improving index search speed in a large (million entry) list

Max index of repeating elements divisible by 'n' in a List in Python

Efficiency of dict.get() method

Fastest way to check if a string contains specific characters in any of the items in a list

Use frozenset as a pair in python

Categories

Resources