Given an integer power as input, what is the fastest way to create the corresponding power of ten? Here are four alternatives I could come up with, and the fastest seems to be using an f-string:
from functools import partial
from time import time
import numpy as np
def fstring(power):
return float(f'1e{power}')
def asterisk(power):
return 10**power
methods = {
'fstring': fstring,
'asterisk': asterisk,
'pow': partial(pow, 10),
'np.pow': partial(np.power, 10, dtype=float)
}
# "dtype=float" is necessary because otherwise it will raise:
# ValueError: Integers to negative integer powers are not allowed.
# see https://stackoverflow.com/a/43287598/5472354
powers = [int(i) for i in np.arange(-10000, 10000)]
for name, method in methods.items():
start = time()
for i in powers:
method(i)
print(f'{name}: {time() - start}')
Results:
fstring: 0.008975982666015625
asterisk: 0.5190775394439697
pow: 0.4863283634185791
np.pow: 0.046906232833862305
I guess the f-string approach is the fastest because nothing is actually calculated, though it only works for integer powers of ten, whereas the other methods are more complicated operations that also work with any real number as the base and power. So is the f-string actually the best way to go about it?
You're comparing apples to oranges here. 10 ** n computes an integer (when n is non-negative), whereas float(f'1e{n}') computes a floating-point number. Those won't take the same amount of time, but they solve different problems so it doesn't matter which one is faster.
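For example, with a small exponent the type difference is easy to see:
>>> 10 ** 3
1000
>>> float(f'1e{3}')
1000.0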
But it's worse than that, because there is the overhead of calling a function, which is included in your timing for all of your alternatives, but only some of them actually involve calling a function. If you write 10 ** n then you aren't calling a function, but if you use partial(pow, 10) then you have to call it as a function to get a result. So you're not actually comparing the speed of 10 ** n fairly.
Instead of rolling your own timing code, use the timeit library, which is designed for doing this properly. The results are in seconds for 1,000,000 repetitions (by default), or equivalently they are the average time in microseconds for one repetition.
Here's a comparison for computing integer powers of 10:
>>> from timeit import timeit
>>> timeit('10 ** n', setup='n = 500')
1.09881673199925
>>> timeit('pow(10, n)', setup='n = 500')
1.1821871869997267
>>> timeit('f(n)', setup='n = 500; from functools import partial; f = partial(pow, 10)')
1.1401332350014854
And here's a comparison for computing floating-point powers of 10: note that computing 10.0 ** 500 or 1e500 is pointless because the result is simply an OverflowError or inf.
>>> timeit('10.0 ** n', setup='n = 200')
0.12391662099980749
>>> timeit('pow(10.0, n)', setup='n = 200')
0.17336435099969094
>>> timeit('f(n)', setup='n = 200; from functools import partial; f = partial(pow, 10.0)')
0.18887039500077663
>>> timeit('float(f"1e{n}")', setup='n = 200')
0.44305286100097874
>>> timeit('np.power(10.0, n, dtype=float)', setup='n = 200; import numpy as np')
1.491982370000187
>>> timeit('f(n)', setup='n = 200; from functools import partial; import numpy as np; f = partial(np.power, 10.0, dtype=float)')
1.6273324920002779
So the fastest of these options in both cases is the obvious one: 10 ** n for integers and 10.0 ** n for floats.
Another contender for the float case: precompute all possible nonzero finite results and look them up:
0.0 if n < -323 else f[n] if n < 309 else inf
The preparation:
f = [10.0 ** i for i in [*range(309), *range(-323, 0)]]
inf = float('inf')
Benchmark with kaya3's exponent n = 200, with n = -200 as a negative exponent with a nonzero result, and with n = -5000 / n = 5000 as medium-size negative/positive exponents from your original range:
n = 200
487 ns 487 ns 488 ns float(f'1e{n}')
108 ns 108 ns 108 ns 10.0 ** n
128 ns 129 ns 130 ns 10.0 ** n if n < 309 else inf
72 ns 73 ns 73 ns 0.0 if n < -323 else f[n] if n < 309 else inf
n = -200
542 ns 544 ns 545 ns float(f'1e{n}')
109 ns 109 ns 110 ns 10.0 ** n
130 ns 130 ns 131 ns 10.0 ** n if n < 309 else inf
76 ns 76 ns 76 ns 0.0 if n < -323 else f[n] if n < 309 else inf
n = -5000
291 ns 291 ns 291 ns float(f'1e{n}')
99 ns 99 ns 100 ns 10.0 ** n
119 ns 120 ns 120 ns 10.0 ** n if n < 309 else inf
34 ns 34 ns 34 ns 0.0 if n < -323 else f[n] if n < 309 else inf
n = 5000
292 ns 293 ns 293 ns float(f'1e{n}')
error error error 10.0 ** n
33 ns 33 ns 33 ns 10.0 ** n if n < 309 else inf
53 ns 53 ns 53 ns 0.0 if n < -323 else f[n] if n < 309 else inf
Benchmark code (Try it online!):
from timeit import repeat
solutions = [
"float(f'1e{n}')",
'10.0 ** n',
'10.0 ** n if n < 309 else inf',
'0.0 if n < -323 else f[n] if n < 309 else inf',
]
for n in 200, -200, -5000, 5000:
print(f'{n = }')
setup = f'''
n = {n}
f = [10.0 ** i for i in [*range(309), *range(-323, 0)]]
inf = float('inf')
'''
for solution in solutions:
try:
ts = sorted(repeat(solution, setup))[:3]
except OverflowError:
ts = [None] * 3
print(*('%3d ns ' % (t * 1e3) if t else ' error ' for t in ts), solution)
print()
You could try a logarithmic approach using math.log and math.exp, but the range of values will be limited (which you can handle with try/except).
This seems to be just as fast as fstring if not a bit faster.
import math
ln10 = math.log(10)
def mPow(power):
    try:
        return math.exp(ln10 * power)
    except OverflowError:
        return 0.0 if power < 0 else math.inf
[EDIT] Given that we are constrained by the capabilities of floats, we might as well just prepare a list with the 617 possible powers of 10 (that can be held in a float) and get the answer by index:
import math
minP10,maxP10 = -308,308
powersOf10 = [10**i for i in range(maxP10+1)]+[10**i for i in range(minP10,0)]
def tenPower(power):
if power < minP10: return 0
if power > maxP10: return math.inf
return powersOf10[power] # negative indexes for powers -308...-1
This one is definitely faster than fstring
Related
I have one pretty large np.array a (10,000-50,000 elements, each coordinates (x,y)) and another larger np.array b (100,000-200,000 coordinates). I need to remove as quickly as possible the elements of a that are not present in b and leave only the elements of a that are present in b. All coordinates are integers. For example:
a = np.array([[2,5],[6,3],[4,2],[1,4]])
b = np.array([[2,7],[4,2],[1,5],[6,3]])
Desired output:
a
>> [6,3],[4,2]
What is the fastest way of doing this for arrays of the size I mentioned?
I am OK with solutions that use any other packages or imports too (e.g., converting to a base Python list or set, using Pandas, etc.) besides those within Numpy.
This appears to depend a lot on the array size and "sparseness" (likely due to hash table magic).
The answer from "Get intersecting rows across two 2D numpy arrays" is the so_8317022 function below.
The takeaways seem to be (on my machine) that:
the Pandas approach has an edge with large sparse sets
set intersection is very, very fast with small array sizes (though admittedly it returns a set, not a numpy array; see the conversion sketch after this list)
the other Numpy answer can be faster than set intersection with larger array sizes.
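If you do need the set result back as a Numpy array, a conversion along these lines works (a small sketch, not part of the benchmark code below; the helper name is mine):
import numpy as np

def set_intersection_as_array(a, b):
    # Same set intersection as above, then turn the tuples back into an (n, 2) array.
    common = set(map(tuple, a.tolist())) & set(map(tuple, b.tolist()))
    return np.array(sorted(common), dtype=a.dtype).reshape(-1, a.shape[1])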
from collections import defaultdict
import numpy as np
import pandas as pd
import timeit
import matplotlib.pyplot as plt
def pandas_merge(a, b):
return pd.DataFrame(a).merge(pd.DataFrame(b)).to_numpy()
def set_intersection(a, b):
return set(map(tuple, a.tolist())) & set(map(tuple, b.tolist()))
def so_8317022(a, b):
nrows, ncols = a.shape
dtype = {
"names": ["f{}".format(i) for i in range(ncols)],
"formats": ncols * [a.dtype],
}
C = np.intersect1d(a.view(dtype), b.view(dtype))
return C.view(a.dtype).reshape(-1, ncols)
def test_fn(f, a, b):
number, time_taken = timeit.Timer(lambda: f(a, b)).autorange()
return number / time_taken
def test(size, max_coord):
a = np.random.default_rng().integers(0, max_coord, size=(size, 2))
b = np.random.default_rng().integers(0, max_coord, size=(size, 2))
return {fn.__name__: test_fn(fn, a, b) for fn in (pandas_merge, set_intersection, so_8317022)}
series = []
datas = defaultdict(list)
for size in (100, 1000, 10000, 100000):
for max_coord in (50, 500, 5000):
print(size, max_coord)
series.append((size, max_coord))
for fn, result in test(size, max_coord).items():
datas[fn].append(result)
print("size", "sparseness", "func", "ops/sec")
for fn, values in datas.items():
for (size, max_coord), value in zip(series, values):
print(size, max_coord, fn, int(value))
The results on my machine are:
size    sparseness  func              ops/sec
100     50          pandas_merge          895
100     500         pandas_merge          777
100     5000        pandas_merge          708
1000    50          pandas_merge          740
1000    500         pandas_merge          751
1000    5000        pandas_merge          660
10000   50          pandas_merge          513
10000   500         pandas_merge          460
10000   5000        pandas_merge          436
100000  50          pandas_merge           11
100000  500         pandas_merge           61
100000  5000        pandas_merge           49
100     50          set_intersection    42281
100     500         set_intersection    44050
100     5000        set_intersection    43584
1000    50          set_intersection     3693
1000    500         set_intersection     3234
1000    5000        set_intersection     3900
10000   50          set_intersection      453
10000   500         set_intersection      287
10000   5000        set_intersection      300
100000  50          set_intersection       47
100000  500         set_intersection       13
100000  5000        set_intersection       13
100     50          so_8317022           8927
100     500         so_8317022           9736
100     5000        so_8317022           7843
1000    50          so_8317022            698
1000    500         so_8317022            746
1000    5000        so_8317022            765
10000   50          so_8317022             89
10000   500         so_8317022             48
10000   5000        so_8317022             57
100000  50          so_8317022             10
100000  500         so_8317022              3
100000  5000        so_8317022              3
Not sure if this is the fastest way to do it, but if you turn it into a pandas index you can use its intersection method. Since that uses low-level C code under the hood, the intersection step is probably pretty fast, but converting to a pandas index may take some time.
import numpy as np
import pandas as pd
a = np.array([[2, 5], [6, 3], [4, 2], [1, 4]])
b = np.array([[2, 7], [4, 2], [1, 5], [6, 3]])
df_a = pd.DataFrame(a).set_index([0, 1])
df_b = pd.DataFrame(b).set_index([0, 1])
intersection = df_a.index.intersection(df_b.index)
The result looks like this:
print(intersection.values)
[(6, 3) (4, 2)]
EDIT: Out of curiosity I made a comparison between the methods, now with a larger list of indices. I compared my first index method with a slightly improved method that does not require creating a DataFrame first but builds the index directly, and also with the DataFrame merge method that was proposed as well.
This is the code
from random import randint, seed
import time
import numpy as np
import pandas as pd
seed(0)
n_tuple = 100000
i_min = 0
i_max = 10
a = [[randint(i_min, i_max), randint(i_min, i_max)] for _ in range(n_tuple)]
b = [[randint(i_min, i_max), randint(i_min, i_max)] for _ in range(n_tuple)]
np_a = np.array(a)
np_b = np.array(b)
def method0(a_array, b_array):
index_a = pd.DataFrame(a_array).set_index([0, 1]).index
index_b = pd.DataFrame(b_array).set_index([0, 1]).index
return index_a.intersection(index_b).to_numpy()
def method1(a_array, b_array):
index_a = pd.MultiIndex.from_arrays(a_array.T)
index_b = pd.MultiIndex.from_arrays(b_array.T)
return index_a.intersection(index_b).to_numpy()
def method2(a_array, b_array):
df_a = pd.DataFrame(a_array)
df_b = pd.DataFrame(b_array)
return df_a.merge(df_b).to_numpy()
def method3(a_array, b_array):
set_a = {(_[0], _[1]) for _ in a_array}
set_b = {(_[0], _[1]) for _ in b_array}
return set_a.intersection(set_b)
for cnt, intersect in enumerate([method0, method1, method2, method3]):
t0 = time.time()
if cnt < 3:
intersection = intersect(np_a, np_b)
else:
intersection = intersect(a, b)
print(f"method{cnt}: {time.time() - t0}")
The output looks like:
method0: 0.1439347267150879
method1: 0.14012742042541504
method2: 4.740894317626953
method3: 0.05933070182800293
Conclusion: the DataFrame merge method (method2) is roughly 30 times slower than using intersections on the index. The MultiIndex-based version (method1) is only slightly faster than method0 (my first proposal).
EDIT2: As proposed in the comment by @AKX: if you use plain lists and sets instead of numpy, you can gain another speed-up of about a factor of 3. But it is clear that you should not use the merge method.
I am trying to compute Pollard's rho factorization in Python for very long integers, such as the one below:
65779646778470582047547160396995720887221575959770627441205850493179860146690755880473849736210807898494458426111244201404810495587574110361900128405354081638434164434968839614760264675889940272767106444249
I have tried to run it on my Intel Core i9-10980HK CPU, which results in several minutes of high-load work without any success. I am trying to use Numba with the @njit decorator to offload the work to my RTX 2070 Super (laptop) GPU, but it gives the error below:
- argument 0: Int value is too large:
Here is the code:
import numpy as np
import datetime
def pgcd(a,b):
if b==0:
return a
else:
r=a%b
return pgcd(b,r)
def pollardrho(n):
f = lambda z: z*z+1
x, y, d = 1, 1, 1
c = 0
while d==1:
c += 1
x = f(x) % n
y = f(f(y)) % n
d = pgcd(y-x, n)
return d, c
def test_time(n):
t = datetime.datetime.now()
d, c = pollardrho(int(n))
tps = datetime.datetime.now() - t
print(tps, c, d)
file = open("some_powersmooths_large.txt", "r")
for line in file.readlines():
if not line.startswith("#"):
print(line)
print("\n")
test_time(line)
print("\n")
How can I handle this kind of big-number calculation?
Part 1 (of 2, see Part 2 below).
Numba only works with integers of at most 64 bits; it has no big-integer arithmetic, only Python itself has it. Big integers will be supported in future versions, as the Numba developers promise. You need big-integer arithmetic because you have very large integers in your inputs and calculations.
One optimization suggestion is to use the GMPY2 Python library. It is a highly optimized library for arbitrary-precision arithmetic, considerably faster than the regular Python implementation. For very large integers, for example, it implements multiplication using the Fast Fourier Transform, which is the fastest available multiplication algorithm.
But GMPY2 can be a bit challenging to install. The most recent precompiled versions for Windows are available at this link. Download the .whl file for your version of Python and install it through pip; for my Windows 64-bit Python 3.7, for example, I downloaded the wheel and ran pip install gmpy2-2.0.8-cp37-cp37m-win_amd64.whl. On Linux it is easiest to install through sudo apt install -y python3-gmpy2.
With GMPY2 your code will be about as fast as possible, because this library is close to the fastest in the world at this. Even Numba (if it had big-integer arithmetic) would not improve on it much. Only better formulas, a better algorithm, or smaller input integers can help further.
But your example integer is far too large for your algorithm, even with GMPY2. You have to choose a smaller integer or a faster algorithm. I ran your algorithm on that number for 5+ minutes and did not get a result. Still, if the result would have taken an hour with regular Python, then with GMPY2 it might be done in 10 minutes or less.
Also, I am not completely sure, but in your algorithm f(f(y)) % n should be equivalent to f(f(y) % n) % n, which should be faster to compute because it multiplies numbers that are half as long. This needs extra checking, though.
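A quick sanity check of that equivalence (my own sketch, not part of the original code):
import random

f = lambda z: z * z + 1
for _ in range(1000):
    n = random.randrange(3, 1 << 64)
    y = random.randrange(n)
    # reducing modulo n between the two applications of f changes nothing
    assert f(f(y)) % n == f(f(y) % n) % n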
Also, your large integer turns out to be prime, as proven by Primo, an elliptic-curve-based primality-proving program; it proved the primality of this integer in 3 seconds on my PC. Primo only proves primality (with a 100% guarantee) but does not factor the number (split it into divisors). Factoring can be done by the programs on this list, which implement the fastest known factoring algorithms; if some links are dead, search for those program names.
Just wrap all integers n in gmpy2.mpz(n). For example, I improved your code a bit, wrapped the values in gmpy2.mpz(), and added a loop so that all divisors are printed. As an example I took not your large prime but something much smaller, the first 25 digits of Pi, which is composite; all of its divisors are printed in 7 seconds on my PC:
Try it online!
import datetime, numpy as np, gmpy2
def num(n):
return gmpy2.mpz(n)
zero, one = num(0), num(1)
def pgcd(a, b):
if b == zero:
return a
else:
r = a % b
return pgcd(b, r)
def pollardrho(n):
f = lambda z: z * z + one
x, y, d = one, one, one
c = 0
while d == one:
c += 1
x = f(x) % n
y = f(f(y)) % n
d = pgcd(y - x, n)
return d, c
def test_time(n):
n = num(int(n))
divs = []
while n > 1:
t = datetime.datetime.now()
d, c = pollardrho(num(int(n)))
tps = datetime.datetime.now() - t
print(tps, c, d, flush = True)
divs.append(d)
assert n % d == 0, (n, d)
n = n // d
print('All divisors:\n', ' '.join(map(str, divs)), sep = '')
test_time(1415926535897932384626433)
#test_time(65779646778470582047547160396995720887221575959770627441205850493179860146690755880473849736210807898494458426111244201404810495587574110361900128405354081638434164434968839614760264675889940272767106444249)
Output:
0:00:00 2 7
0:00:00 10 223
0:00:00.000994 65 10739
0:00:00.001999 132 180473
0:00:07.278999 579682 468017117899
All divisors:
7 223 10739 180473 468017117899
Part 2
Reading the Wikipedia articles (here and here), I decided to implement a faster version of the Pollard rho algorithm.
My version, implemented below, looks more complex but does half as many divisions and multiplications and, on average, fewer loop iterations in total.
These improvements result in a running time of 3 minutes for my test case, compared to 7 minutes for the OP's original algorithm, on my laptop.
My algorithm is also randomized: instead of the witness 1 or 2, it takes a random witness in the range [1, N - 2]. In rare cases it may fail, as mentioned on Wikipedia, in which case I rerun the algorithm with a different witness. It also uses a Fermat primality test to check whether the remaining number is prime, and if so it does not search for any more divisors.
For testing I used an input number p generated by p = 1; for i in range(256): p *= random.randrange(2, 1 << 32); basically it is composed of 256 factors, each at most 32 bits.
I also improved both algorithms to output more statistics. One of the statistics is pow, which shows the complexity of each step: a pow of 0.25 means that if the divisor found is d, the current factoring step spent c = d^0.25 loop iterations finding it. According to Wikipedia, the Pollard rho algorithm should have pow = 0.25 on average, meaning the complexity (number of iterations) of finding any divisor d is around d^0.25.
The code below contains other improvements as well, such as reporting a lot of statistics along the way and finding all factors in a loop.
For my test case, my version of the algorithm has an average pow of 0.24, while the original version has 0.3. A smaller pow means fewer loop iterations on average.
I also tested my version with and without GMPY2. Apparently GMPY2 does not give much improvement over regular Python big-integer arithmetic, mainly because GMPY2 is optimized for really big numbers (tens of thousands of bits, where it uses Fast Fourier Transform multiplication, etc.), while the numbers in my test are not that big. Still, GMPY2 gives a speedup of around 1.35x, yielding a running time of 3 minutes compared to almost 4 minutes without GMPY2 for the same algorithm. To test with or without gmpy2, just change the body of the def num(n) function to either return gmpy2.mpz(n) or return n.
Try it online!
import datetime, numpy as np, gmpy2, random, math
random.seed(0)
def num(n):
return gmpy2.mpz(n)
zero, one = num(0), num(1)
def gcd(a, b):
while b != zero:
a, b = b, a % b
return a
def pollard_rho_v0(n):
f = lambda z: z * z + one
n, x, y, d, c, t = num(n), one, one, one, 0, datetime.datetime.now()
while d == one:
c += 1
x = f(x) % n
y = f(f(y)) % n
d = gcd(y - x, n)
return d, {'i': c, 'n_bits': n.bit_length(), 'd_bits': round(math.log(d) / math.log(2), 2),
'pow': round(math.log(max(c, 1)) / math.log(d), 4), 'time': str(datetime.datetime.now() - t)}
def is_fermat_prp(n, trials = 32):
n = num(n)
for i in range(trials):
a = num((3, 5, 7)[i] if i < 3 else random.randint(2, n - 2))
if pow(a, n - 1, n) != 1:
return False
return True
# https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
# https://ru.wikipedia.org/wiki/%D0%A0%D0%BE-%D0%B0%D0%BB%D0%B3%D0%BE%D1%80%D0%B8%D1%82%D0%BC_%D0%9F%D0%BE%D0%BB%D0%BB%D0%B0%D1%80%D0%B4%D0%B0
def pollard_rho_v1(N):
AbsD = lambda a, b: a - b if a >= b else b - a
N, fermat_prp, t = num(N), None, datetime.datetime.now()
SecsPassed = lambda: (datetime.datetime.now() - t).total_seconds()
for j in range(8):
i, stage, y, x = 0, 2, num(1), num(random.randint(1, N - 2))
while True:
if (i & 0x3FF) == 0 and fermat_prp is None and (SecsPassed() >= 15 or j > 0):
fermat_prp = is_fermat_prp(N)
if fermat_prp:
r = N
break
r = gcd(N, AbsD(x, y))
if r != one:
break
if i == stage:
y = x
stage <<= one
x = (x * x + one) % N
i += 1
if r != N or fermat_prp:
return r, {'i': i, 'j': j, 'n_bits': N.bit_length(), 'd_bits': round(math.log(r) / math.log(2), 2),
'pow': round(math.log(max(i, 1)) / math.log(r), 4), 'fermat_prp': fermat_prp, 'time': str(datetime.datetime.now() - t)}
assert False, f'Pollard-Rho failed after {j + 1} trials! N = {N}'
def factor(n, *, ver = 1):
assert n > 0, n
n, divs, pows, tt = int(n), [], 0., datetime.datetime.now()
while n != 1:
d, stats = (pollard_rho_v0, pollard_rho_v1)[ver](n)
print(d, stats)
assert d > 1, (d, n)
divs.append(d)
assert n % d == 0, (d, n)
n = n // d
pows += min(1, stats['pow'])
print('All divisors:\n', ' '.join(map(str, divs)), sep = '')
print('Avg pow', round(pows / len(divs), 3), ', total time', datetime.datetime.now() - tt)
return divs
p = 1
for i in range(256):
p *= random.randrange(2, 1 << 32)
factor(p, ver = 1)
Output:
................
267890969 {'i': 25551, 'j': 0, 'n_bits': 245, 'd_bits': 28.0, 'pow': 0.523,
'fermat_prp': None, 'time': '0:00:02.363004'}
548977049 {'i': 62089, 'j': 0, 'n_bits': 217, 'd_bits': 29.03, 'pow': 0.5484,
'fermat_prp': None, 'time': '0:00:04.912002'}
3565192801 {'i': 26637, 'j': 0, 'n_bits': 188, 'd_bits': 31.73, 'pow': 0.4633,
'fermat_prp': None, 'time': '0:00:02.011999'}
1044630971 {'i': 114866, 'j': 0, 'n_bits': 156, 'd_bits': 29.96, 'pow': 0.5611,
'fermat_prp': None, 'time': '0:00:06.666996'}
3943786421 {'i': 60186, 'j': 0, 'n_bits': 126, 'd_bits': 31.88, 'pow': 0.4981,
'fermat_prp': None, 'time': '0:00:01.594000'}
3485918759 {'i': 101494, 'j': 0, 'n_bits': 94, 'd_bits': 31.7, 'pow': 0.5247,
'fermat_prp': None, 'time': '0:00:02.161004'}
1772239433 {'i': 102262, 'j': 0, 'n_bits': 63, 'd_bits': 30.72, 'pow': 0.5417,
'fermat_prp': None, 'time': '0:00:01.802996'}
2706462217 {'i': 0, 'j': 1, 'n_bits': 32, 'd_bits': 31.33, 'pow': 0.0,
'fermat_prp': True, 'time': '0:00:00.925801'}
All divisors:
258498 4 99792 121 245864 25 81 2 238008 70 39767 23358624 79 153 27 65 1566 2 31 13 57 1776 446 20 2 3409311 814 37 595384977 2 24 5 147 3738 4514 8372 7 38 237996 430 43 240 1183 10404 11 10234 30 2615625 1263 44590 240 3 101 231 2 79488 799236 2 88059 1578 432500 134 20956 101 3589 155 2471 91 7 6 100608 1995 33 9 181 48 5033 20 16 15 305 44 927 49 76 13 1577 46 144 292 65 2 111890 300 368 430705 6 368 381 1578812 4290 10 48 565 2 2 23606 23020 267 4186 5835 33 4 899 6288 3534 129064 34 315 36190 16900 6 60291 2 12 111631 463 2500 1405 1959 22 112 2 228 3 2192 2 28 321618 4 44 125924200164 9 17956 4224 2848 16 7 162 4 573 843 48 101 224460324 4 768 3 2 8 154 256 2 3 51 784 34 48 14 369 218 9 12 27 152 2 256 2 51 9 9411903 2 131 9 71 6 3 13307904 85608 35982 121669 93 3 3 121 7967 11 20851 19 289 4237 3481 289 89 11 11 121 841 5839 2071 59 29 17293 9367 110801 196219 2136917 631 101 3481 323 101 19 32129 29 19321 19 19 29 19 6113 509 193 1801 347 71 83 1373 191 239 109 1039 2389 1867 349 353 1566871 349 561971 199 1429 373 1231 103 1048871 83 1681 1481 3673 491 691 1709 103 49043 911 673 1427 4147027 569 292681 2153 6709 821 641 569 461 239 2111 2539 6163 3643 5881 2143 7229 593 4391 1531 937 1721 1873 3761 1229 919 178207 54637831 8317 17903 3631 6841 2131 4157 3467 2393 7151 56737 1307 10663 701 2522350423 4253 1303 13009 7457 271549 12391 36131 4943 6899 27077 4943 7723 4567 26959 9029 2063 6607 4721 14563 8783 38803 1889 1613 20479 16231 1847 41131 52201 37507 224351 13757 36299 3457 21739 107713 51169 17981 29173 2287 16253 386611 132137 9181 29123 740533 114769 2287 61553 21121 10501 47269 59077 224951 377809 499729 6257 5903 59999 126823 85199 29501 34589 518113 39409 411667 146603 1044091 312979 291569 158303 41777 115133 508033 154799 13184621 167521 3037 317711 206827 1254059 455381 152639 95531 1231201 494381 237689 163327 651331 351053 152311 103669 245683 1702901 46337 151339 6762257 57787 38959 366343 609179 219749 2058253 634031 263597 540517 1049051 710527 2343527 280967 485647 1107497 822763 862031 583139 482837 1586621 782107 371143 763549 10740361 1372963 62589077 1531627 31991 1206173 678901 4759373 5877959 178439 1736369 687083 53508439 99523 10456609 942943 2196619 376081 802453 10254457 2791597 3231757 2464793 66598351 1535867 16338167 1138639 882953 1483693 12624373 35717041 6427979 5653181 6421873 1434131 1258889 108462803 859667 64298779 261810191 5743483 32314969 5080721 8961767 68011043 7528799 2086957 41618389 19999663 118428929 45556487 40462109 22478363 29039737 17366957 77805557 12775951 50890837 22666991 14892133 691979 133920733 115526921 29092501 2332124099 16835209 101301479 29987047 160734341 35904857 12376361 17774983 2397907 525367681 245240591 48159641 45590383 87274531 69160309 256092673 7430783 588029137 91286513 75817271 393556847 1183839551 71513537 593809903 200299807 161799857 537099259 21510427 335791301 382965337 156133297 180373937 75136921 364790017 174932509 117559207 601612421 54539711 2107325149 566372699 102467207 321156893 1024847609 1250224901 1038888437 3029169139 345512147 4127597891 1043830063 267890969 548977049 3565192801 1044630971 3943786421 3485918759 1772239433 2706462217
Avg pow 0.238 , total time 0:03:48.193658
PS. I also decided to implement a minimalistic but fast version of the Pollard rho factorization algorithm in pure Python, ready to copy-paste into any project (here, as an example, factoring the first 25 digits of Pi):
Try it online!
def factor(n):
import itertools, math
if n <= 1:
return []
x = 2
for cycle in itertools.count(1):
y = x
for i in range(1 << cycle):
x = (x * x + 1) % n
d = math.gcd(x - y, n)
if d > 1:
return [d] + factor(n // d)
print(factor(1415926535897932384626433))
# [7, 223, 180473, 10739, 468017117899]
Given that the computation in pollardrho is quite inefficient, I am not surprised that it takes a while. However, I don't know that particular function, so I don't know whether it could be made more efficient.
In Python, integers have arbitrary precision. What this means is that they can be of any length, and Python itself handles the storage by spreading the value over multiple fixed-size machine integers.
(You can test this for yourself by, for example, creating an integer that cannot be stored in a 64-bit unsigned integer, such as a = 2**64, and then checking the output of the a.bit_length() method, which should say 65.)
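For example, in an interactive session:
>>> a = 2**64
>>> a.bit_length()
65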
So, theoretically speaking, you should be able to calculate any integer.
However, because you are using Numba, you are limited to integers that can actually be stored within a 64-bit unsigned integer due to the way Numba works.
The error you are getting is simply the number becoming too large to store in a 64-bit unsigned integer.
Bottom line: Without Numba, you can calculate that number just fine. With Numba, you cannot.
Of course, if you only want to know roughly what the number is, and not precisely, you can instead just use floats.
I have data like this:
ID 8-Jan 15-Jan 22-Jan 29-Jan 5-Feb 12-Feb LowerBound UpperBound
001 618 720 645 573 503 447 - -
002 62 80 67 94 81 65 - -
003 32 10 23 26 26 31 - -
004 22 13 1 28 19 25 - -
005 9 7 9 6 8 4 - -
I want to create two columns with lower and upper bounds for each product using 95% confidence intervals. I know the manual way of writing a function that loops through each product ID:
import numpy as np
import scipy as sp
import scipy.stats
# Method copied from http://stackoverflow.com/questions/15033511/compute-a-confidence-interval-from-sample-data
def mean_confidence_interval(data, confidence=0.95):
a = 1.0*np.array(data)
n = len(a)
m, se = np.mean(a), scipy.stats.sem(a)
h = se * sp.stats.t._ppf((1+confidence)/2., n-1)
return m-h, m+h
Is there an efficient way to do this in Pandas (a one-liner kind of thing)?
Of course, you want df.apply. Note you need to modify mean_confidence_interval to return pd.Series([m-h, m+h]).
df[['LowerBound','UpperBound']] = df.apply(mean_confidence_interval, axis=1)
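The modified function could look like this (a sketch based on the function in the question; I use the public scipy.stats.t.ppf instead of the private _ppf, and it assumes each row passed to apply contains only the numeric week columns):
import numpy as np
import pandas as pd
import scipy.stats

def mean_confidence_interval(data, confidence=0.95):
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
    # Returning a Series makes df.apply(..., axis=1) produce two columns.
    return pd.Series([m - h, m + h])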
Standard error of the mean is pretty straightforward to calculate so you can easily vectorize this:
import scipy.stats as ss
df.mean(axis=1) + ss.t.ppf(0.975, df.shape[1]-1) * df.std(axis=1)/np.sqrt(df.shape[1])
will give you the upper bound. Use - ss.t.ppf for the lower bound.
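Putting both bounds together (a sketch; like the line above, it assumes df holds only the weekly sales columns, faked here with random data):
import numpy as np
import pandas as pd
import scipy.stats as ss

df = pd.DataFrame(np.random.randn(5, 6))     # stand-in for the weekly columns only
n = df.shape[1]
mean = df.mean(axis=1)
half_width = ss.t.ppf(0.975, n - 1) * df.std(axis=1) / np.sqrt(n)
bounds = pd.DataFrame({'LowerBound': mean - half_width,
                       'UpperBound': mean + half_width})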
Also, pandas seems to have a sem method. If you have a large dataset, I don't suggest using apply over rows. It is pretty slow. Here are some timings:
df = pd.DataFrame(np.random.randn(100, 10))
%timeit df.apply(mean_confidence_interval, axis=1)
100 loops, best of 3: 18.2 ms per loop
%%timeit
dist = ss.t.ppf(0.975, df.shape[1]-1) * df.sem(axis=1)
mean = df.mean(axis=1)
mean - dist, mean + dist
1000 loops, best of 3: 598 µs per loop
Since you already created a function for calculating the confidence interval, simply apply it to each row of your data:
def mean_confidence_interval(data):
confidence = 0.95
m = data.mean()
se = scipy.stats.sem(data)
h = se * sp.stats.t._ppf((1 + confidence) / 2, data.shape[0] - 1)
return pd.Series((m - h, m + h))
interval = df.apply(mean_confidence_interval, axis=1)
interval.columns = ("LowerBound", "UpperBound")
pd.concat([df, interval],axis=1)
Suppose house sale figures are presented for a town in ranges:
< $100,000 204
$100,000 - $199,999 1651
$200,000 - $299,999 2405
$300,000 - $399,999 1972
$400,000 - $500,000 872
> $500,000 1455
I want to know which house-price bin a given percentile falls into. Is there a way to use numpy's percentile function to do this? I can do it by hand:
import numpy as np
a = np.array([204., 1651., 2405., 1972., 872., 1455.])
b = np.cumsum(a)/np.sum(a) * 100
q = 75
len(b[b <= q])
4 # ie bin $300,000 - $399,999
But is there a way to use np.percentile instead?
You were almost there:
cs = np.cumsum(a)
bin_idx = np.searchsorted(cs, np.percentile(cs, 75))
At least for this case (and a couple others with larger a arrays), it's not any faster, though:
In [9]: %%timeit
...: b = np.cumsum(a)/np.sum(a) * 100
...: len(b[b <= 75])
...:
10000 loops, best of 3: 38.6 µs per loop
In [10]: %%timeit
....: cs = np.cumsum(a)
....: np.searchsorted(cs, np.percentile(cs, 75))
....:
10000 loops, best of 3: 125 µs per loop
So unless you want to check for multiple percentiles, I'd stick with what you have.
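That said, if you ever do want several percentiles at once, both np.percentile and np.searchsorted accept arrays, so the same idea vectorizes (a small sketch using the data from the question):
import numpy as np

a = np.array([204., 1651., 2405., 1972., 872., 1455.])
cs = np.cumsum(a)
bin_idx = np.searchsorted(cs, np.percentile(cs, [25, 50, 75]))
# bin_idx holds one bin index per requested percentile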
I'm trying to work out how to speed up a Python function which uses numpy. The output I have received from line_profiler is below, and it shows that the vast majority of the time is spent on the line ind_y, ind_x = np.where(seg_image == i).
seg_image is an integer array which is the result of segmenting an image, thus finding the pixels where seg_image == i extracts a specific segmented object. I am looping through lots of these objects (in the code below I'm just looping through 5 for testing, but I'll actually be looping through over 20,000), and it takes a long time to run!
Is there any way in which the np.where call can be sped up? Or, alternatively, can the penultimate line (which also takes a good proportion of the time) be sped up?
The ideal solution would be to run the code on the whole array at once, rather than looping, but I don't think this is possible as there are side-effects to some of the functions I need to run (for example, dilating a segmented object can make it 'collide' with the next region and thus give incorrect results later on).
Does anyone have any ideas?
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 def correct_hot(hot_image, seg_image):
6 1 239810 239810.0 2.3 new_hot = hot_image.copy()
7 1 572966 572966.0 5.5 sign = np.zeros_like(hot_image) + 1
8 1 67565 67565.0 0.6 sign[:,:] = 1
9 1 1257867 1257867.0 12.1 sign[hot_image > 0] = -1
10
11 1 150 150.0 0.0 s_elem = np.ones((3, 3))
12
13 #for i in xrange(1,seg_image.max()+1):
14 6 57 9.5 0.0 for i in range(1,6):
15 5 6092775 1218555.0 58.5 ind_y, ind_x = np.where(seg_image == i)
16
17 # Get the average HOT value of the object (really simple!)
18 5 2408 481.6 0.0 obj_avg = hot_image[ind_y, ind_x].mean()
19
20 5 333 66.6 0.0 miny = np.min(ind_y)
21
22 5 162 32.4 0.0 minx = np.min(ind_x)
23
24
25 5 369 73.8 0.0 new_ind_x = ind_x - minx + 3
26 5 113 22.6 0.0 new_ind_y = ind_y - miny + 3
27
28 5 211 42.2 0.0 maxy = np.max(new_ind_y)
29 5 143 28.6 0.0 maxx = np.max(new_ind_x)
30
31 # 7 is + 1 to deal with the zero-based indexing, + 2 * 3 to deal with the 3 cell padding above
32 5 217 43.4 0.0 obj = np.zeros( (maxy+7, maxx+7) )
33
34 5 158 31.6 0.0 obj[new_ind_y, new_ind_x] = 1
35
36 5 2482 496.4 0.0 dilated = ndimage.binary_dilation(obj, s_elem)
37 5 1370 274.0 0.0 border = mahotas.borders(dilated)
38
39 5 122 24.4 0.0 border = np.logical_and(border, dilated)
40
41 5 355 71.0 0.0 border_ind_y, border_ind_x = np.where(border == 1)
42 5 136 27.2 0.0 border_ind_y = border_ind_y + miny - 3
43 5 123 24.6 0.0 border_ind_x = border_ind_x + minx - 3
44
45 5 645 129.0 0.0 border_avg = hot_image[border_ind_y, border_ind_x].mean()
46
47 5 2167729 433545.8 20.8 new_hot[seg_image == i] = (new_hot[ind_y, ind_x] + (sign[ind_y, ind_x] * np.abs(obj_avg - border_avg)))
48 5 10179 2035.8 0.1 print obj_avg, border_avg
49
50 1 4 4.0 0.0 return new_hot
EDIT I have left my original answer at the bottom for the record, but I have actually looked into your code in more detail over lunch, and I think that using np.where is a big mistake:
In [63]: a = np.random.randint(100, size=(1000, 1000))
In [64]: %timeit a == 42
1000 loops, best of 3: 950 us per loop
In [65]: %timeit np.where(a == 42)
100 loops, best of 3: 7.55 ms per loop
You could get a boolean array (that you can use for indexing) in 1/8 of the time you need to get the actual coordinates of the points!!!
There is of course the cropping of the features that you do, but ndimage has a find_objects function that returns enclosing slices, and appears to be very fast:
In [66]: %timeit ndimage.find_objects(a)
100 loops, best of 3: 11.5 ms per loop
This returns a list of tuples of slices enclosing all of your objects, in 50% more time than it takes to find the indices of one single object.
It may not work out of the box as I cannot test it right now, but I would restructure your code into something like the following:
def correct_hot_bis(hot_image, seg_image):
# Need this to not index out of bounds when computing border_avg
hot_image_padded = np.pad(hot_image, 3, mode='constant',
constant_values=0)
new_hot = hot_image.copy()
sign = np.ones_like(hot_image, dtype=np.int8)
sign[hot_image > 0] = -1
s_elem = np.ones((3, 3))
for j, slice_ in enumerate(ndimage.find_objects(seg_image)):
hot_image_view = hot_image[slice_]
seg_image_view = seg_image[slice_]
new_shape = tuple(dim+6 for dim in hot_image_view.shape)
new_slice = tuple(slice(dim.start,
dim.stop+6,
None) for dim in slice_)
indices = seg_image_view == j+1
obj_avg = hot_image_view[indices].mean()
obj = np.zeros(new_shape)
obj[3:-3, 3:-3][indices] = True
dilated = ndimage.binary_dilation(obj, s_elem)
border = mahotas.borders(dilated)
border &= dilated
border_avg = hot_image_padded[new_slice][border == 1].mean()
new_hot[slice_][indices] += (sign[slice_][indices] *
np.abs(obj_avg - border_avg))
return new_hot
You would still need to figure out the collisions, but you could get about a 2x speed-up by computing all the indices simultaneously using a np.unique based approach:
a = np.random.randint(100, size=(1000, 1000))
def get_pos(arr):
pos = []
for j in xrange(100):
pos.append(np.where(arr == j))
return pos
def get_pos_bis(arr):
unq, flat_idx = np.unique(arr, return_inverse=True)
pos = np.argsort(flat_idx)
counts = np.bincount(flat_idx)
cum_counts = np.cumsum(counts)
multi_dim_idx = np.unravel_index(pos, arr.shape)
return zip(*(np.split(coords, cum_counts) for coords in multi_dim_idx))
In [33]: %timeit get_pos(a)
1 loops, best of 3: 766 ms per loop
In [34]: %timeit get_pos_bis(a)
1 loops, best of 3: 388 ms per loop
Note that the pixels for each object are returned in a different order, so you can't simply compare the return values of the two functions to assess equality, but they should both return the same positions.
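One way to verify that despite the ordering difference is to sort the coordinate pairs per label before comparing (a sketch; it assumes every label 0-99 actually occurs in arr, as it essentially always will for this random example):
def same_positions(pos_a, pos_b):
    # zip stops at the shorter sequence, which skips a possible trailing empty split
    for (ya, xa), (yb, xb) in zip(pos_a, pos_b):
        if sorted(zip(ya.tolist(), xa.tolist())) != sorted(zip(yb.tolist(), xb.tolist())):
            return False
    return True

print(same_positions(get_pos(a), get_pos_bis(a)))  # expect True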
One thing you could do to save a little bit of time is to store the result of seg_image == i so that you don't need to compute it twice. You're computing it on lines 15 and 47; you could add seg_mask = seg_image == i and then reuse that result, as sketched below. (It might also be good to separate out that piece for profiling purposes.)
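A tiny, self-contained illustration of the idea (my own example data, not your image):
import numpy as np

seg_image = np.array([[1, 1, 2],
                      [1, 2, 2],
                      [3, 3, 2]])
i = 2
seg_mask = seg_image == i           # compute the comparison once...
ind_y, ind_x = np.where(seg_mask)   # ...use it where the coordinates are needed (line 15)
new_hot = np.zeros_like(seg_image)
new_hot[seg_mask] = 7               # ...and reuse it for the boolean assignment (line 47)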
While there are some other minor things you could do to eke out a little more performance, the root issue is that you're using an O(M * N) algorithm, where M is the number of segments and N is the size of your image. It's not obvious to me from your code whether there is a faster algorithm that accomplishes the same thing, but that's the first place I'd look for a speedup.