Map string sequence with a condition - Python

I am trying to use map to avoid a loop in Python in order to get better performance. My code is:
def fun(s):
    result = []
    for i in range(len(s)-1):
        if s[i:i+2] == "ab":
            result.append(s[:i] + "cd" + s[i+2:])
    return result
My guess for the function is:
def fun(s):
    return map(lambda s: s[:i] + "cd" + s[i+2:] if s[i:i+2] == "ab", s)
However, I do not know how to associate i with s in this case, and the function above has invalid syntax.
Could anyone help?
Edit: added explanation
A lot of people are confused about why I am doing this. The idea simply comes from the Python performance documentation (see the Loops section) and Guido's article. I am just learning.
Big thanks to @gboffi for a perfect and neat answer!

A Possible Solution
I've written the function using two auxiliary definitions, but if you want you can write it as a one-liner (a sketch of that follows below).
def fun(s):
    substitute = lambda i: s[:i] + 'cd' + s[i+2:]
    match = lambda i: s[i:i+2] == 'ab'
    return map(substitute, filter(match, range(len(s)-1)))
It works by using filter to build the indices i for which s[i:i+2] matches 'ab', and then mapping the string-substitution function over only those indices.
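For illustration only, here is what that one-liner could look like (the name fun_oneliner is mine, not from the original answer); it simply inlines the two lambdas:

def fun_oneliner(s):
    # Same logic with the two helpers inlined; in Python 3, wrap the map()
    # call in list() if an actual list is needed instead of a lazy iterator.
    return map(lambda i: s[:i] + 'cd' + s[i+2:],
               filter(lambda i: s[i:i+2] == 'ab', range(len(s) - 1)))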
Timings
It is apparent that there is a large overhead due to the creation of the lambdas at each invocation, but fortunately it is easy to test this hypothesis:
In [41]: def fun(s):
   ....:     result = []
   ....:     for i in range(len(s)-1):
   ....:         if s[i:i+2] == "ab":
   ....:             result.append(s[:i] + "cd" + s[i+2:])
   ....:     return result
   ....:

In [42]: def fun2(s):
   ....:     substitute = lambda i: s[:i] + 'cd' + s[i+2:]
   ....:     match = lambda i: s[i:i+2] == 'ab'
   ....:     return map(substitute, filter(match, range(len(s)-1)))
   ....:

In [43]: %timeit fun('aaaaaaabaaaabaaabaaab')
100000 loops, best of 3: 2.38 µs per loop

In [44]: %timeit fun2('aaaaaaabaaaabaaabaaab')
100000 loops, best of 3: 3.74 µs per loop

In [45]: %timeit fun('aaaaaaabaaaabaaabaaab'*1000)
10 loops, best of 3: 33.7 ms per loop

In [46]: %timeit fun2('aaaaaaabaaaabaaabaaab'*1000)
10 loops, best of 3: 33.8 ms per loop
For a short string the map version is about 50% slower, while for a very long string the timings are asymptotically equal.

First, I don't think that map has a performance advantage over a for loop.
If s is large, you may use xrange instead of range: https://docs.python.org/2/library/functions.html#xrange
Second, map cannot filter elements; it can only map them to a new value.
You may use a comprehension instead of a for loop (sketched below), but I don't think you'd get a performance advantage either.
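For what it's worth, a sketch of that comprehension version (the name fun_comprehension is mine); it filters and maps in a single expression and returns the same result as the original loop:

def fun_comprehension(s):
    # Build the substituted strings only for the indices where "ab" matches.
    return [s[:i] + "cd" + s[i+2:] for i in range(len(s) - 1) if s[i:i+2] == "ab"]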

Related

Python Best Practice for Looping Backwards in For Loop without reversing the array

In doing problems on LeetCode etc. it's often required to iterate from the end of the array to the front. I'm used to more traditional programming languages where the for loop is less awkward, i.e. for (int i = n; i >= 0; i--) where n is the last index of the array. In Python I find myself writing something like for i in range(n, -1, -1), which looks a bit awkward, so I wanted to know if there is something more elegant. I know I can reverse the array with array[::-1] and then loop as usual with for/range, but that's not really what I want to do, since it adds computational complexity to the problem.
Use reversed, which doesn't create a new list but instead creates a reverse iterator, allowing you to iterate in reverse:
a = [1, 2, 3, 4, 5]
for n in reversed(a):
    print(n)
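As a small side note (mine, not part of the original answer): if the loop body also needs the index, as LeetCode-style problems often do, one option is to reverse the range of indices rather than the list itself:

a = [1, 2, 3, 4, 5]
for i in reversed(range(len(a))):   # yields 4, 3, 2, 1, 0 without copying the list
    print(i, a[i])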
Just a comparison of three methods.
array = list(range(100000))

def go_by_index():
    for i in range(len(array)-1, -1, -1):
        array[i]

def revert_array_directly():
    for n in array[::-1]:
        n

def reversed_fn():
    for n in reversed(array):
        n

%timeit go_by_index()
%timeit revert_array_directly()
%timeit reversed_fn()
Outputs
100 loops, best of 3: 4.84 ms per loop
100 loops, best of 3: 2.01 ms per loop
1000 loops, best of 3: 1.49 ms per loop
The time difference is visible, but as you can see, the second and third options are not that different, especially if the array of interest is of small or medium size.

What is the most efficient way to find the position of the first np.nan value?

Consider the array a:
a = np.array([3, 3, np.nan, 3, 3, np.nan])
I could do
np.isnan(a).argmax()
But this requires finding all np.nan just to find the first.
Is there a more efficient way?
I've been trying to figure out if I can pass a parameter to np.argpartition such that np.nan gets sorted first as opposed to last.
EDIT regarding [dup].
There are several reasons this question is different.
That question and answers addressed equality of values. This is in regards to isnan.
Those answers all suffer from the same issue my answer faces. Note, I provided a perfectly valid answer but highlighted its inefficiency. I'm looking to fix the inefficiency.
EDIT regarding second [dup].
Still addressing equality and question/answers are old and very possibly outdated.
It might also be worth looking into numba.jit; without it, the vectorized version will likely beat a straightforward pure-Python search in most scenarios, but after compiling the code, the ordinary search takes the lead, at least in my testing:
In [63]: a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])

In [70]: %paste
import numba
import numpy as np   # numpy was already imported in the session; repeated for completeness

def naive(a):
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

def short(a):
    return np.isnan(a).argmax()

@numba.jit
def naive_jit(a):
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

@numba.jit
def short_jit(a):
    return np.isnan(a).argmax()
## -- End pasted text --
In [71]: %timeit naive(a)
100 loops, best of 3: 7.22 ms per loop
In [72]: %timeit short(a)
The slowest run took 4.59 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 37.7 µs per loop
In [73]: %timeit naive_jit(a)
The slowest run took 6821.16 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.79 µs per loop
In [74]: %timeit short_jit(a)
The slowest run took 395.51 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 144 µs per loop
Edit: As pointed out by @hpaulj in their answer, numpy actually ships with an optimized short-circuited search whose performance is comparable with the JITted search above:
In [26]: %paste
def plain(a):
    return a.argmax()

@numba.jit
def plain_jit(a):
    return a.argmax()
## -- End pasted text --
In [35]: %timeit naive(a)
100 loops, best of 3: 7.13 ms per loop
In [36]: %timeit plain(a)
The slowest run took 4.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.04 µs per loop
In [37]: %timeit naive_jit(a)
100000 loops, best of 3: 6.91 µs per loop
In [38]: %timeit plain_jit(a)
10000 loops, best of 3: 125 µs per loop
I'll nominate
a.argmax()
With @fuglede's test array:
In [1]: a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])
In [2]: np.isnan(a).argmax()
Out[2]: 9999
In [3]: np.argmax(a)
Out[3]: 9999
In [4]: a.argmax()
Out[4]: 9999
In [5]: timeit a.argmax()
The slowest run took 29.94 ....
10000 loops, best of 3: 20.3 µs per loop
In [6]: timeit np.isnan(a).argmax()
The slowest run took 7.82 ...
1000 loops, best of 3: 462 µs per loop
I don't have numba installed, so I can't compare that. But my speedup relative to short is greater than @fuglede's 6x.
I'm testing in Py3, which accepts < comparisons against np.nan, while Py2 raises a runtime warning. But the code search suggests this isn't dependent on that comparison.
In /numpy/core/src/multiarray/calculation.c, PyArray_ArgMax plays with axes (moving the one of interest to the end) and delegates the action to arg_func = PyArray_DESCR(ap)->f->argmax, a function that depends on the dtype.
In numpy/core/src/multiarray/arraytypes.c.src it looks like BOOL_argmax short-circuits, returning as soon as it encounters a True:
for (; i < n; i++) {
    if (ip[i]) {
        *max_ind = i;
        return 0;
    }
}
And @fname@_argmax also short-circuits on a maximal nan. np.nan is 'maximal' in argmin as well:
#if @isfloat@
    if (@isnan@(mp)) {
        /* nan encountered; it's maximal */
        return 0;
    }
#endif
Comments from experienced C coders are welcome, but it appears to me that, at least for np.nan, a plain argmax will be as fast as we can get.
Playing with the 9999 used to generate a shows that the a.argmax time depends on that value, consistent with short-circuiting.
Here is a pythonic approach using itertools.takewhile():
from itertools import takewhile
sum(1 for _ in takewhile(np.isfinite, a))
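A caveat worth adding (my note, not the original answer's): if the array contains no NaN at all, the expression simply returns len(a), so the caller has to treat that value specially. With the question's example array:

from itertools import takewhile
import numpy as np

a = np.array([3, 3, np.nan, 3, 3, np.nan])
print(sum(1 for _ in takewhile(np.isfinite, a)))   # 2 -> index of the first NaN

b = np.array([3.0, 3.0, 3.0])
print(sum(1 for _ in takewhile(np.isfinite, b)))   # 3 == len(b), i.e. no NaN present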
Benchmark against the generator-expression-within-next approach [1]:
In [118]: a = np.repeat(a, 10000)
In [120]: %timeit next(i for i, j in enumerate(a) if np.isnan(j))
100 loops, best of 3: 12.4 ms per loop
In [121]: %timeit sum(1 for _ in takewhile(np.isfinite, a))
100 loops, best of 3: 11.5 ms per loop
But still (by far) slower than numpy approach:
In [119]: %timeit np.isnan(a).argmax()
100000 loops, best of 3: 16.8 µs per loop
[1] The problem with that approach is the use of enumerate: it first builds an enumerate object (an iterator-like object) over the NumPy array, and driving that generator and repeatedly calling the iterator's next takes time.
When looking for the first match in various scenarios, we could iterate through the array, look for the first match, and exit as soon as it is found rather than processing the entire array. So, we would have an approach using Python's next function, like so:
next((i for i, val in enumerate(a) if np.isnan(val)))
Sample runs -
In [192]: a = np.array([3, 3, np.nan, 3, 3, np.nan])
In [193]: next((i for i, val in enumerate(a) if np.isnan(val)))
Out[193]: 2
In [194]: a[2] = 10
In [195]: next((i for i, val in enumerate(a) if np.isnan(val)))
Out[195]: 5
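A small addition of mine: the inner parentheses also make it easy to pass a default to next, which avoids a StopIteration when the array contains no NaN at all:

import numpy as np

a = np.array([3.0, 3.0, 3.0])
idx = next((i for i, val in enumerate(a) if np.isnan(val)), None)
print(idx)   # None -- no NaN found, so the default is returned instead of raising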

Is it faster to make a variable for the length of a string?

I am implementing a reverse(s) function in Python 2.7 and I wrote code like this:
# iterative version 1
def reverse(s):
    r = ""
    for c in range(len(s)-1, -1, -1):
        r += s[c]
    return r

print reverse("Be sure to drink your Ovaltine")
But it seemed to me that it gets the length of the string on each iteration, even though that value never changes.
So I made another version:
# iterative version 2
def reverse(s):
    r = ""
    l = len(s)-1
    for c in range(l, -1, -1):
        r += s[c]
    return r

print reverse("Be sure to drink your Ovaltine")
This version remembers the length of the string and doesn't ask for it on every iteration. Is this faster for longer strings (say, a string of length 1024) than the first version, or does it have no effect at all?
In [12]: %timeit reverse("Be sure to drink your Ovaltine")
100000 loops, best of 3: 2.53 µs per loop
In [13]: %timeit reverse1("Be sure to drink your Ovaltine")
100000 loops, best of 3: 2.55 µs per loop
reverse is your first method, reverse1 is the second.
As you can see from the timings, there is very little difference in performance.
You can use IPython to time your code with the syntax above: just define your functions and run %timeit followed by the function call with whatever parameters you like.
In the line
for c in range(len(s)-1, -1, -1):
len(s) is evaluated only once, and the result (minus one) is passed as an argument to range. Therefore the two versions are almost identical; if anything, the latter may be very slightly slower, as it creates a new name to bind the result of the subtraction.
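A quick throwaway check (mine, assuming that routing the length lookup through a counting wrapper doesn't change the semantics) confirms that the length is fetched exactly once:

calls = []

def counted_len(s):
    calls.append(1)          # record every call to the length lookup
    return len(s)

def reverse(s):
    r = ""
    for c in range(counted_len(s) - 1, -1, -1):
        r += s[c]
    return r

reverse("Be sure to drink your Ovaltine")
print(len(calls))            # 1 -- the length is computed once, before the loop starts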

Limit loop with condition

I have a list of, say, 500,000 entries, each being a tuple such as (val1, val2).
Currently, I am looping through the list and inside the loop, I have a condition such as:
if val2 == someval:
    do_something()
    break
However, I was wondering if there was a faster way to loop through elements on a certain condition, such as only looping through items where val2 == someval, rather than the entire list THEN doing the check.
What about taking it from the other side:
if someval in lst:
    my_action(someval)
The test of someval's membership in lst also requires a loop, but it runs in more optimized C code, so it might be faster.
In [49]: x = 3
In [50]: %timeit x in [1, 2, 3]
10000000 loops, best of 3: 53.8 ns per loop
In [51]: %timeit x == 1 or x == 2 or x == 3
10000000 loops, best of 3: 85.5 ns per loop
In [52]: x = 1
In [53]: %timeit x in [1, 2, 3]
10000000 loops, best of 3: 38.5 ns per loop
In [54]: %timeit x == 1 or x == 2 or x == 3
10000000 loops, best of 3: 38.4 ns per loop
Here you can see that for numbers which appear early in the list the time difference is negligible, but for ones appearing later, testing membership is faster.
A more realistic measurement: a range of 500,000 numbers, testing for the presence of a number in the middle:
In [64]: lst = range(500000)
In [65]: %%timeit
250000 in lst
   ....:
100 loops, best of 3: 2.66 ms per loop

In [66]: %%timeit
for i in lst:
    if i == 250000:
        break
   ....:
100 loops, best of 3: 6.6 ms per loop
The time needed drops to about 40% with the membership test x in lst.
I'm not so sure there is a faster way. As I see it, you have to first find "val2" in the list, which in my experience requires a loop.
This code, for example, will iterate over the list and print val1 only if val2 == someval:
for val1, val2 in some_list:
    if val2 != someval:
        continue
    print val1
You seem to be asking two different questions. One in which the first time you see something equal to someval you break, the other where you only look through those that are equal to someval. For the latter, i.e.:
"However, I was wondering if there was a faster way to loop through elements on a certain condition, such as only looping through items where val2 == someval, rather than the entire list THEN doing the check."
You can do:
for i in filter(lambda t: t[1] == someval, val_list):
    stuff
Or via list comprehension:
for i in [x for x in val_list if x[1] == someval]:
    stuff
My guess is that one of these is faster.
There is no way to avoid the loop or the if; the answers suggesting there is a faster way are mistaken. filter and list comprehensions will not improve matters one bit; in fact, unless you use generator expressions (which are lazily evaluated), comprehensions (as well as filter) will make this potentially much slower and more memory-hungry. And generator expressions will not improve the performance either.
There is no way to make it faster other than rewriting in a language such as C or Java, or using PyPy or Cython. for x in ...: if x ...: do_smth() is already the fastest possible way. Of course, depending on your data, you could build the data structure (the one with 500,000 items) so that it is always sorted, so you would potentially only have to loop over the beginning of the list. Or you could collect the items satisfying a certain condition into a separate list/set/whatnot, which will yield very good results later by completely avoiding the filtering and a full loop iteration (a minimal sketch of that follows below).
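A minimal sketch of that "collect the matching items into a separate structure" idea (the names pairs, by_val2 and someval are illustrative, not from the question); it only pays off when the same list is queried many times:

from collections import defaultdict

pairs = [(i, i % 1000) for i in range(500000)]   # stand-in for the real data

by_val2 = defaultdict(list)
for val1, val2 in pairs:                         # one full pass, done once up front
    by_val2[val2].append(val1)

someval = 42
if someval in by_val2:                           # average O(1) lookup afterwards
    matches = by_val2[someval]                   # every val1 whose val2 == someval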
You have to "see" all the elements in the list to able to decide if at any point val2 == someval and your list isn't sorted on the second value in the tuple, so looping over all the elements can't be avoided.
However, you can make sure the method you use to loop over the list is as efficient as possible. For instance, instead of using a for statement, you may use a list comprehension to filter out the values that satisfy val2 == someval and then do something if the returned list isn't empty. I say "may" because it really depends on the distribution of your data and on whether it's useful to you to have all values for which val2 == someval holds true before performing some action, etc.
If you're using Python 3.x then "list comprehensions and generator expressions in Python 3 are actually faster than they were in Python 2".
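Since generator expressions come up in both of the answers above, here is one lazily evaluated variant (my sketch, with illustrative data): feeding a generator expression to any() stops at the first match, mirroring the original loop-plus-break:

some_list = [(i, i % 1000) for i in range(500000)]   # stand-in data
someval = 42

def do_something():
    print("found a matching val2")

# any() consumes the generator lazily and short-circuits on the first match.
if any(val2 == someval for _, val2 in some_list):
    do_something()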

Fast check for NaN in NumPy

I'm looking for the fastest way to check for the occurrence of NaN (np.nan) in a NumPy array X. np.isnan(X) is out of the question, since it builds a boolean array of shape X.shape, which is potentially gigantic.
I tried np.nan in X, but that seems not to work because np.nan != np.nan. Is there a fast and memory-efficient way to do this at all?
(To those who would ask "how gigantic": I can't tell. This is input validation for library code.)
Ray's solution is good. However, on my machine it is about 2.5x faster to use numpy.sum in place of numpy.min:
In [13]: %timeit np.isnan(np.min(x))
1000 loops, best of 3: 244 us per loop
In [14]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 97.3 us per loop
Unlike min, sum doesn't require branching, which on modern hardware tends to be pretty expensive. This is probably the reason why sum is faster.
Edit: The above test was performed with a single NaN right in the middle of the array.
It is interesting to note that min is slower in the presence of NaNs than in their absence. It also seems to get slower as NaNs get closer to the start of the array. On the other hand, sum's throughput seems constant regardless of whether there are NaNs and where they're located:
In [40]: x = np.random.rand(100000)
In [41]: %timeit np.isnan(np.min(x))
10000 loops, best of 3: 153 us per loop
In [42]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 95.9 us per loop
In [43]: x[50000] = np.nan
In [44]: %timeit np.isnan(np.min(x))
1000 loops, best of 3: 239 us per loop
In [45]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 95.8 us per loop
In [46]: x[0] = np.nan
In [47]: %timeit np.isnan(np.min(x))
1000 loops, best of 3: 326 us per loop
In [48]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 95.9 us per loop
I think np.isnan(np.min(X)) should do what you want.
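A minimal illustration of why this works (the array below is just an example of mine): np.min propagates NaN, so the scalar result is NaN exactly when the array contains at least one:

import numpy as np

X = np.random.rand(1000)
print(np.isnan(np.min(X)))   # False -- no NaN present

X[123] = np.nan
print(np.isnan(np.min(X)))   # True -- min propagates the NaN into its scalar result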
There are two general approaches here:
Check each array item for nan and take any.
Apply some cumulative operation that preserves nans (like sum) and check its result.
While the first approach is certainly the cleanest, the heavy optimization of some of the cumulative operations (particularly those executed in BLAS, like dot) can make them quite fast. Note that dot, like some other BLAS operations, is multithreaded under certain conditions. This explains the difference in speed between different machines.
import numpy as np
import perfplot

def min(a):
    return np.isnan(np.min(a))

def sum(a):
    return np.isnan(np.sum(a))

def dot(a):
    return np.isnan(np.dot(a, a))

def any(a):
    return np.any(np.isnan(a))

def einsum(a):
    return np.isnan(np.einsum("i->", a))

b = perfplot.bench(
    setup=np.random.rand,
    kernels=[min, sum, dot, any, einsum],
    n_range=[2 ** k for k in range(25)],
    xlabel="len(a)",
)
b.save("out.png")
b.show()
Even though there is an accepted answer, I'd like to demonstrate the following (with Python 2.7.2 and NumPy 1.6.0 on Vista):
In []: x= rand(1e5)
In []: %timeit isnan(x.min())
10000 loops, best of 3: 200 us per loop
In []: %timeit isnan(x.sum())
10000 loops, best of 3: 169 us per loop
In []: %timeit isnan(dot(x, x))
10000 loops, best of 3: 134 us per loop
In []: x[5e4]= NaN
In []: %timeit isnan(x.min())
100 loops, best of 3: 4.47 ms per loop
In []: %timeit isnan(x.sum())
100 loops, best of 3: 6.44 ms per loop
In []: %timeit isnan(dot(x, x))
10000 loops, best of 3: 138 us per loop
Thus, the really efficient way might be heavily dependent on the operating system. Anyway dot(.) based seems to be the most stable one.
If you're comfortable with numba, it allows you to create a fast short-circuiting function (one that stops as soon as a NaN is found):
import numba as nb
import math

@nb.njit
def anynan(array):
    array = array.ravel()
    for i in range(array.size):
        if math.isnan(array[i]):
            return True
    return False
If there is no NaN, the function might actually be slower than np.min; I think that's because np.min uses multiprocessing for large arrays:
import numpy as np
array = np.random.random(2000000)
%timeit anynan(array) # 100 loops, best of 3: 2.21 ms per loop
%timeit np.isnan(array.sum()) # 100 loops, best of 3: 4.45 ms per loop
%timeit np.isnan(array.min()) # 1000 loops, best of 3: 1.64 ms per loop
But in case there is a NaN in the array, especially if its position is at a low index, then it's much faster:
array = np.random.random(2000000)
array[100] = np.nan
%timeit anynan(array) # 1000000 loops, best of 3: 1.93 µs per loop
%timeit np.isnan(array.sum()) # 100 loops, best of 3: 4.57 ms per loop
%timeit np.isnan(array.min()) # 1000 loops, best of 3: 1.65 ms per loop
Similar results can be achieved with Cython or a C extension; these are a bit more complicated (or easily available as bottleneck.anynan), but ultimately they do the same as my anynan function.
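For reference, a usage sketch of the bottleneck alternative mentioned above, assuming the bottleneck package is installed (pip install bottleneck):

import numpy as np
import bottleneck as bn

array = np.random.random(2000000)
array[100] = np.nan
print(bn.anynan(array))   # True -- short-circuits internally, much like anynan above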
Use .any():
if numpy.isnan(myarray).any()
numpy.isfinite may be better than isnan for checking:
if not np.isfinite(prop).all()
Related to this is the question of how to find the first occurrence of NaN. This is the fastest way to handle that that I know of:
index = next((i for (i,n) in enumerate(iterable) if n!=n), None)
Adding to @nico-schlömer's and @mseifert's answers, I measured the performance of a numba test has_nan with early stopping, compared to some of the functions that scan the full array.
On my machine, for an array without NaNs, the break-even happens at ~10^4 elements.
import perfplot
import numpy as np
import numba
import math

def min(a):
    return np.isnan(np.min(a))

def dot(a):
    return np.isnan(np.dot(a, a))

def einsum(a):
    return np.isnan(np.einsum("i->", a))

@numba.njit
def has_nan(a):
    # scan element by element and stop at the first NaN
    for i in range(a.size):
        if math.isnan(a[i]):
            return True
    return False

def array_with_missing_values(n, p):
    """Return an array of size n with p% of its entries set to NaN.
    Ex: n=1e6, p=1 -> 1e4 NaNs assigned at random positions."""
    a = np.random.rand(n)
    idx = np.random.randint(0, len(a), int(p * len(a) / 100))
    a[idx] = np.nan
    return a

#%%
perfplot.show(
    setup=lambda n: array_with_missing_values(n, 0),
    kernels=[min, dot, has_nan],
    n_range=[2 ** k for k in range(20)],
    logx=True,
    logy=True,
    xlabel="len(a)",
)
What happens if the array has NaNs? I investigated the impact of the NaN coverage of the array.
For arrays of length 1,000,000, has_nan becomes the better option if there are ~10^-3 % NaNs (so ~10 NaNs) in the array.
#%%
N = 1000000  # 100000
perfplot.show(
    setup=lambda p: array_with_missing_values(N, p),
    kernels=[min, dot, has_nan],
    n_range=np.array([2 ** k for k in range(20)]) / 2**20 * 0.01,
    logy=True,
    xlabel=f"% of nan in array (N = {N})",
)
If in your application most arrays have NaNs and you're looking for the ones without, then has_nan is the best approach.
Otherwise, dot seems to be the best option.
