I am trying to run Pollard's rho factorization in Python on very long integers such as the one below.
I tried it on my Intel Core i9-10980HK CPU, which ran under high load for a few minutes without producing a result. I am trying to use Numba with the @njit decorator to run it on the RTX 2070 Super (on a laptop), but it gives the error below.
- argument 0: Int value is too large:
Here is the code:
import numpy as np
import datetime

def pgcd(a,b):
    if b==0:
        return a
    r = a % b
    return pgcd(b,r)

def pollardrho(n):
    f = lambda z: z*z+1
    x, y, d = 1, 1, 1
    c = 0
    while d==1:
        c += 1
        x = f(x) % n
        y = f(f(y)) % n
        d = pgcd(y-x, n)
    return d, c

def test_time(n):
    t = datetime.datetime.now()
    d, c = pollardrho(int(n))
    tps = datetime.datetime.now() - t
    print(tps, c, d)

file = open("some_powersmooths_large.txt", "r")
for line in file.readlines():
    if not line.startswith("#"):
        test_time(line)
How can I handle this type of big-number calculation?
Part 1 (of 2, see Part 2 below).
Numba works with integers of at most 64 bits; it has no big-integer arithmetic, whereas plain Python does. The Numba developers have said that big integers may be supported in a future version. You need big-integer arithmetic because your inputs and intermediate values are very large.
One optimization I can suggest is the GMPY2 Python library. It is a highly optimized library for long arithmetic, considerably faster than Python's built-in implementation. For very large integers, for example, it implements multiplication using the Fast Fourier Transform, which is among the fastest known multiplication algorithms.
But GMPY2 can be a bit challenging to install. The most recent precompiled versions for Windows are available by this link. Download the .whl file for your version of Python and install it through pip; e.g. for my Windows 64-bit Python 3.7 I downloaded the wheel and ran pip install gmpy2-2.0.8-cp37-cp37m-win_amd64.whl. On Linux it is easiest to install through sudo apt install -y python3-gmpy2.
With GMPY2 the arithmetic in your code will be about as fast as it can get, because this library is among the fastest available. Even Numba (if it had long arithmetic) would not improve on it much. Beyond that, only faster formulas, a better algorithm, or smaller input integers can help.
But your example integer is way too large for your algorithm, even with GMPY2. You have to choose a smaller integer or a faster algorithm. I ran your algorithm on your number for more than 5 minutes and did not get a result. Still, if a result previously took 1 hour with regular Python, then with GMPY2 it may take 10 minutes or less.
Also, I am not entirely sure, but in your algorithm f(f(y)) % n should be equivalent to f(f(y) % n) % n, which should probably compute faster because the second multiplication then works on a number about half as long. But this needs extra checking.
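The equivalence can be checked numerically. The sketch below is only an illustration: the modulus and the sample values are arbitrary choices, not the OP's number. It relies on the fact that reducing modulo n before squaring does not change the result modulo n.

```python
import random

def f(z):
    # The polynomial used by the Pollard rho iteration in the question.
    return z * z + 1

def step_unreduced(y, n):
    # Original form: reduce only once, after applying f twice.
    return f(f(y)) % n

def step_reduced(y, n):
    # Reduce after each application, so the second squaring
    # works on a number smaller than n.
    return f(f(y) % n) % n

n = (1 << 127) - 1          # arbitrary large modulus chosen for the demo
rng = random.Random(0)      # fixed seed so the run is repeatable
samples = [rng.randrange(n) for _ in range(1000)]
all_equal = all(step_unreduced(y, n) == step_reduced(y, n) for y in samples)
print(all_equal)  # True
```

Both forms give the same residue, so the reduced form is a safe drop-in; whether it is measurably faster depends on the size of n.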
Also, your large integer turned out to be prime, as proven by Primo, an elliptic-curve-based primality proving program; it proved the primality of this integer in 3 seconds on my PC. Primo only proves primality (with a 100% guarantee) and does not factor the number (split it into divisors). Factoring can be done by the programs from this list, which implement the fastest known factoring algorithms; if some links are dead, search for the program names.
To use GMPY2, just wrap all integers n as gmpy2.mpz(n). As an example, I improved your code a bit, wrapped the integers in gmpy2.mpz(), and added a loop so that all divisors are printed. Also, instead of your large prime I took a much smaller number, the first 25 digits of Pi, which is composite; all of its divisors are printed in 7 seconds on my PC:
import datetime, gmpy2

def num(n):
    return gmpy2.mpz(n)

zero, one = num(0), num(1)

def pgcd(a, b):
    if b == zero:
        return a
    r = a % b
    return pgcd(b, r)

def pollardrho(n):
    f = lambda z: z * z + one
    x, y, d = one, one, one
    c = 0
    while d == one:
        c += 1
        x = f(x) % n
        y = f(f(y)) % n
        d = pgcd(y - x, n)
    return d, c

def test_time(n):
    n = num(int(n))
    divs = []
    while n > 1:
        t = datetime.datetime.now()
        d, c = pollardrho(num(int(n)))
        tps = datetime.datetime.now() - t
        print(tps, c, d, flush = True)
        divs.append(d)  # collect each divisor so the summary line can print them
        assert n % d == 0, (n, d)
        n = n // d
    print('All divisors:\n', ' '.join(map(str, divs)), sep = '')

# 25 digits of Pi (the digits after the leading 3); equals the product of the divisors printed below
test_time(1415926535897932384626433)
0:00:00 2 7
0:00:00 10 223
0:00:00.000994 65 10739
0:00:00.001999 132 180473
0:00:07.278999 579682 468017117899
All divisors:
7 223 10739 180473 468017117899
Part 2
After reading the Wikipedia articles (here and here), I decided to implement a faster version of the Pollard-Rho algorithm.
My version, implemented below, looks more complex but does half as many divisions and multiplications, and on average it also does fewer loop iterations in total.
These improvements result in a running time of 3 minutes for my test case, compared to 7 minutes for the OP's original algorithm, on my laptop.
My algorithm is also randomized: instead of a witness (starting value) of 1 or 2 it takes a random witness in the range [1, N - 2]. In rare cases it may fail, as Wikipedia notes; then I rerun the algorithm with a different witness. It also uses the Fermat primality test to check whether the input number is prime, and if so it does not search for any more divisors.
For tests I used an input number p generated by the code p = 1; for i in range(256): p *= random.randrange(2, 1 << 32); basically it is composed of 256 factors of at most 32 bits each.
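I also extended both algorithms to output more statistics. One of the statistics is pow, which shows the complexity of each step: a pow of 0.25 means that, if the divisor is d, the current factoring step spent c = d^0.25 iterations finding it. As I read Wikipedia, the Pollard-Rho algorithm should have pow = 0.25 on average, meaning that the complexity (number of iterations) of finding a divisor d is around d^0.25. (Strictly, the usual heuristic is about sqrt(p) iterations for a prime factor p, which is at most N^0.25 for the smallest prime factor of a composite N, so a pow near 0.5 is expected when d itself is a large prime.)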
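To make the pow statistic concrete, here is a small sketch that recomputes it for a few (iterations, divisor) pairs copied from the Part 1 output. The helper name pow_stat is made up for this illustration; the formula is the same log ratio used in the code below.

```python
import math

def pow_stat(iterations, divisor):
    # Exponent e such that iterations == divisor ** e.
    return math.log(max(iterations, 1)) / math.log(divisor)

# (iterations, divisor) pairs copied from the Part 1 output
runs = [(65, 10739), (132, 180473), (579682, 468017117899)]
for c, d in runs:
    print(d, round(pow_stat(c, d), 4))
```

For the last pair the exponent comes out a little under 0.5, in line with the values printed for large prime divisors in the statistics further down.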
The code below also has other improvements, such as printing a lot of statistics along the way and finding all factors in a loop.
My version of the algorithm has an average pow of 0.24 on my test case; the original version has 0.3. A smaller pow means fewer loop iterations on average.
I also tested my version with and without GMPY2. Apparently GMPY2 does not give much improvement over regular Python big-integer arithmetic here, mainly because GMPY2 is optimized for really big numbers (tens of thousands of bits, using Fast Fourier Transform multiplication, etc.), while the numbers in my test are not that big. Still, GMPY2 gives a speedup of around 1.35x, with a running time of 3 minutes compared to almost 4 minutes without GMPY2 for the same algorithm. To test with or without gmpy2 you only need to change the body of def num(n) to either return gmpy2.mpz(n) or return n.
import datetime, gmpy2, random, math

def num(n):
    return gmpy2.mpz(n)

zero, one = num(0), num(1)

def gcd(a, b):
    while b != zero:
        a, b = b, a % b
    return a

def pollard_rho_v0(n):
    f = lambda z: z * z + one
    n, x, y, d, c, t = num(n), one, one, one, 0, datetime.datetime.now()
    while d == one:
        c += 1
        x = f(x) % n
        y = f(f(y)) % n
        d = gcd(y - x, n)
    return d, {'i': c, 'n_bits': n.bit_length(), 'd_bits': round(math.log(d) / math.log(2), 2),
        'pow': round(math.log(max(c, 1)) / math.log(d), 4), 'time': str(datetime.datetime.now() - t)}

def is_fermat_prp(n, trials = 32):
    n = num(n)
    for i in range(trials):
        a = num((3, 5, 7)[i] if i < 3 else random.randint(2, n - 2))
        if pow(a, n - 1, n) != 1:
            return False
    return True

# https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
# https://ru.wikipedia.org/wiki/%D0%A0%D0%BE-%D0%B0%D0%BB%D0%B3%D0%BE%D1%80%D0%B8%D1%82%D0%BC_%D0%9F%D0%BE%D0%BB%D0%BB%D0%B0%D1%80%D0%B4%D0%B0
def pollard_rho_v1(N):
    AbsD = lambda a, b: a - b if a >= b else b - a
    N, fermat_prp, t = num(N), None, datetime.datetime.now()
    SecsPassed = lambda: (datetime.datetime.now() - t).total_seconds()
    for j in range(8):
        i, stage, y, x = 0, 2, num(1), num(random.randint(1, N - 2))
        while True:
            # Run the Fermat test once, after 15 seconds or on any retry
            if (i & 0x3FF) == 0 and fermat_prp is None and (SecsPassed() >= 15 or j > 0):
                fermat_prp = is_fermat_prp(N)
                if fermat_prp:
                    r = N
                    break
            r = gcd(N, AbsD(x, y))
            if r != one:
                break
            if i == stage:
                y = x
                stage <<= one
            x = (x * x + one) % N
            i += 1
        if r != N or fermat_prp:
            return r, {'i': i, 'j': j, 'n_bits': N.bit_length(), 'd_bits': round(math.log(r) / math.log(2), 2),
                'pow': round(math.log(max(i, 1)) / math.log(r), 4), 'fermat_prp': fermat_prp, 'time': str(datetime.datetime.now() - t)}
    assert False, f'Pollard-Rho failed after {j + 1} trials! N = {N}'

def factor(n, *, ver = 1):
    assert n > 0, n
    n, divs, pows, tt = int(n), [], 0., datetime.datetime.now()
    while n != 1:
        d, stats = (pollard_rho_v0, pollard_rho_v1)[ver](n)
        print(d, stats)
        assert d > 1, (d, n)
        assert n % d == 0, (d, n)
        divs.append(d)  # collect each divisor for the summary and the return value
        n = n // d
        pows += min(1, stats['pow'])
    print('All divisors:\n', ' '.join(map(str, divs)), sep = '')
    print('Avg pow', round(pows / len(divs), 3), ', total time', datetime.datetime.now() - tt)
    return divs

p = 1
for i in range(256):
    p *= random.randrange(2, 1 << 32)
factor(p, ver = 1)
267890969 {'i': 25551, 'j': 0, 'n_bits': 245, 'd_bits': 28.0, 'pow': 0.523,
'fermat_prp': None, 'time': '0:00:02.363004'}
548977049 {'i': 62089, 'j': 0, 'n_bits': 217, 'd_bits': 29.03, 'pow': 0.5484,
'fermat_prp': None, 'time': '0:00:04.912002'}
3565192801 {'i': 26637, 'j': 0, 'n_bits': 188, 'd_bits': 31.73, 'pow': 0.4633,
'fermat_prp': None, 'time': '0:00:02.011999'}
1044630971 {'i': 114866, 'j': 0, 'n_bits': 156, 'd_bits': 29.96, 'pow': 0.5611,
'fermat_prp': None, 'time': '0:00:06.666996'}
3943786421 {'i': 60186, 'j': 0, 'n_bits': 126, 'd_bits': 31.88, 'pow': 0.4981,
'fermat_prp': None, 'time': '0:00:01.594000'}
3485918759 {'i': 101494, 'j': 0, 'n_bits': 94, 'd_bits': 31.7, 'pow': 0.5247,
'fermat_prp': None, 'time': '0:00:02.161004'}
1772239433 {'i': 102262, 'j': 0, 'n_bits': 63, 'd_bits': 30.72, 'pow': 0.5417,
'fermat_prp': None, 'time': '0:00:01.802996'}
2706462217 {'i': 0, 'j': 1, 'n_bits': 32, 'd_bits': 31.33, 'pow': 0.0,
'fermat_prp': True, 'time': '0:00:00.925801'}
All divisors:
258498 4 99792 121 245864 25 81 2 238008 70 39767 23358624 79 153 27 65 1566 2 31 13 57 1776 446 20 2 3409311 814 37 595384977 2 24 5 147 3738 4514 8372 7 38 237996 430 43 240 1183 10404 11 10234 30 2615625 1263 44590 240 3 101 231 2 79488 799236 2 88059 1578 432500 134 20956 101 3589 155 2471 91 7 6 100608 1995 33 9 181 48 5033 20 16 15 305 44 927 49 76 13 1577 46 144 292 65 2 111890 300 368 430705 6 368 381 1578812 4290 10 48 565 2 2 23606 23020 267 4186 5835 33 4 899 6288 3534 129064 34 315 36190 16900 6 60291 2 12 111631 463 2500 1405 1959 22 112 2 228 3 2192 2 28 321618 4 44 125924200164 9 17956 4224 2848 16 7 162 4 573 843 48 101 224460324 4 768 3 2 8 154 256 2 3 51 784 34 48 14 369 218 9 12 27 152 2 256 2 51 9 9411903 2 131 9 71 6 3 13307904 85608 35982 121669 93 3 3 121 7967 11 20851 19 289 4237 3481 289 89 11 11 121 841 5839 2071 59 29 17293 9367 110801 196219 2136917 631 101 3481 323 101 19 32129 29 19321 19 19 29 19 6113 509 193 1801 347 71 83 1373 191 239 109 1039 2389 1867 349 353 1566871 349 561971 199 1429 373 1231 103 1048871 83 1681 1481 3673 491 691 1709 103 49043 911 673 1427 4147027 569 292681 2153 6709 821 641 569 461 239 2111 2539 6163 3643 5881 2143 7229 593 4391 1531 937 1721 1873 3761 1229 919 178207 54637831 8317 17903 3631 6841 2131 4157 3467 2393 7151 56737 1307 10663 701 2522350423 4253 1303 13009 7457 271549 12391 36131 4943 6899 27077 4943 7723 4567 26959 9029 2063 6607 4721 14563 8783 38803 1889 1613 20479 16231 1847 41131 52201 37507 224351 13757 36299 3457 21739 107713 51169 17981 29173 2287 16253 386611 132137 9181 29123 740533 114769 2287 61553 21121 10501 47269 59077 224951 377809 499729 6257 5903 59999 126823 85199 29501 34589 518113 39409 411667 146603 1044091 312979 291569 158303 41777 115133 508033 154799 13184621 167521 3037 317711 206827 1254059 455381 152639 95531 1231201 494381 237689 163327 651331 351053 152311 103669 245683 1702901 46337 151339 6762257 57787 38959 366343 609179 219749 2058253 634031 263597 540517 
1049051 710527 2343527 280967 485647 1107497 822763 862031 583139 482837 1586621 782107 371143 763549 10740361 1372963 62589077 1531627 31991 1206173 678901 4759373 5877959 178439 1736369 687083 53508439 99523 10456609 942943 2196619 376081 802453 10254457 2791597 3231757 2464793 66598351 1535867 16338167 1138639 882953 1483693 12624373 35717041 6427979 5653181 6421873 1434131 1258889 108462803 859667 64298779 261810191 5743483 32314969 5080721 8961767 68011043 7528799 2086957 41618389 19999663 118428929 45556487 40462109 22478363 29039737 17366957 77805557 12775951 50890837 22666991 14892133 691979 133920733 115526921 29092501 2332124099 16835209 101301479 29987047 160734341 35904857 12376361 17774983 2397907 525367681 245240591 48159641 45590383 87274531 69160309 256092673 7430783 588029137 91286513 75817271 393556847 1183839551 71513537 593809903 200299807 161799857 537099259 21510427 335791301 382965337 156133297 180373937 75136921 364790017 174932509 117559207 601612421 54539711 2107325149 566372699 102467207 321156893 1024847609 1250224901 1038888437 3029169139 345512147 4127597891 1043830063 267890969 548977049 3565192801 1044630971 3943786421 3485918759 1772239433 2706462217
Avg pow 0.238 , total time 0:03:48.193658
PS. I also decided to implement a minimalistic but fast version of the Pollard-Rho factorization algorithm in pure Python, ready for copy-pasting into any project (the example factors the first 25 digits of Pi):
def factor(n):
    import itertools, math
    if n <= 1:
        return []
    x = 2
    for cycle in itertools.count(1):
        y = x
        for i in range(1 << cycle):
            x = (x * x + 1) % n
            d = math.gcd(x - y, n)
            if d > 1:
                return [d] + factor(n // d)

print(factor(1415926535897932384626433))
# [7, 223, 180473, 10739, 468017117899]
Given that the operation in pollardrho is very inefficient, I am not surprised that the operation takes a while.
However, I don't know that particular function, so I don't know if it could be made more efficient.
In Python, integers have an arbitrary length.
What this means is that they can be of any length, and Python itself will handle storing them properly by spreading the value over as many fixed-size machine integers as it needs.
(You can test this for yourself by for example creating an integer that cannot be stored in a 64-bit unsigned integer, like a = 2**64, and then checking the output of the a.bit_length() method, which should say 65)
So, theoretically speaking, you should be able to calculate any integer.
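A quick way to see this in plain Python (no Numba involved) is the sketch below; the variable names are only for illustration.

```python
a = 2**64                  # one more than the largest unsigned 64-bit value
print(a.bit_length())      # 65

b = a * a + 1              # still exact; Python grows the integer as needed
print(b.bit_length())      # 129

fits_in_uint64 = a <= 2**64 - 1
print(fits_in_uint64)      # False: this value needs more than 64 bits
```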
However, because you are using Numba, you are limited to integers that can actually be stored within a 64-bit unsigned integer due to the way Numba works.
The error you are getting is simply the number becoming too large to store in a 64-bit unsigned integer.
Bottom line: Without Numba, you can calculate that number just fine. With Numba, you cannot.
Of course, if you only want to know roughly what the number is, and not precisely, you can instead just use floats.
I'm trying to implement the RSA algorithm in Python, but sometimes when I run the code the results of the math are too large to be valid character codes, and when I try to decrypt the cipher text the result is not the original string. I have no idea what is going wrong. Please tell me whether the code is wrong or whether there is a way to transform the large decimal codes into smaller ones.
This is the code:
import random

def primes(min, max):
    for i in range(min, max):
        if i % 2 == 0 or i % 3 == 0 or i % 5 == 0 or i % 7 == 0:
            continue
        yield i
        # print(i)

def gdc(n1, n2):
    while n2:
        n2, n1 = n1 % n2, n2
    return n1

def private_keys(n, phi):
    for i in range(2, phi):
        if gdc(n, i) == 1 and gdc(phi, i) == 1:
            yield i

def public_keys(phi, private_key, maximum):
    for i in range(phi + 1, phi + maximum):
        if i * private_key % phi == 1:
            yield i

def encrypt_string(key, n, string):
    cipher_context = ''
    for word in string:
        word_ascii = ord(word)
        encrypted_word_ascii = word_ascii ** key % n
        print(f'{word_ascii} {encrypted_word_ascii}')
        cipher_context += chr(encrypted_word_ascii)
    return cipher_context

def decrypt_string(key, n, cipher_context):
    plain_text = ''
    for encrypted_word in cipher_context:
        encrypted_ascii_word = ord(encrypted_word)
        ascii_word = encrypted_ascii_word ** key % n
        print(f'{encrypted_ascii_word} {ascii_word}')
        plain_text += chr(ascii_word)
    return plain_text

minimum = 10
maximum = 200
generated_primes = [i for i in primes(minimum, maximum)]
prime1 = random.choice(generated_primes)
prime2 = random.choice(generated_primes)
print(f'Primo 1 (p): {prime1}')
print(f'Primo 2 (q): {prime2}')
n = prime1 * prime2
print(f'n: {n}')
phi = (prime1 - 1) * (prime2 - 1)
print(f'phi: {phi}')
generated_private_keys = [i for i in private_keys(n, phi)]
# print(generated_private_keys)
private_key = random.choice(generated_private_keys)
print(f'Chave privada: {private_key}')
maximum_pub = 100000
generated_public_keys = []
while generated_public_keys == []:
    generated_public_keys = [i for i in public_keys(phi, private_key, maximum_pub)]
    maximum_pub *= 10
public_key = random.choice(generated_public_keys)
print(f'Chave publica: {public_key}')
text = 'Testando exemplo de criptografia RSA'
cipher_context = encrypt_string(public_key, n, text)
print(cipher_context.encode('utf-16', 'surrogatepass'))
plain_text = decrypt_string(private_key, n, cipher_context)
Example of strange output:
Primo 1 (p): 169
Primo 2 (q): 89
n: 15041
phi: 14784
Chave privada: 2557
Chave publica: 71509
Testando exemplo de criptografia RSA
84 2294
101 12893
115 10294
116 8826
97 6974
110 2528
100 8836
111 3010
32 981
101 12893
120 12756
101 12893
109 10392
112 6807
108 5035
111 3010
32 981
100 8836
101 12893
32 981
99 8549
114 10813
105 6683
112 6807
116 8826
111 3010
103 13610
114 10813
97 6974
102 9670
105 6683
97 6974
32 981
82 9195
83 9482
65 9971
2294 9340
12893 8200
10294 12842
8826 4744
6974 8196
2528 7052
8836 8199
3010 3582
981 13916
12893 8200
12756 2434
12893 8200
10392 11679
6807 13996
5035 3579
3010 3582
981 13916
8836 8199
12893 8200
981 13916
8549 99
10813 7056
6683 5890
6807 13996
8826 4744
3010 3582
13610 5888
10813 7056
6974 8196
9670 13986
6683 5890
6974 8196
981 13916
9195 5867
9482 13967
9971 10478
⑼ ㈪ኈ ᮌ 㙜 ং 㚬㙜 㙜cᮐᜂ㚬ኈᜀᮐ 㚢ᜂ 㙜᛫㚏⣮
There are several problems here.
Your prime number generator is broken. In the example you gave, 169 is equal to 13², so it is not prime and this value of p is of no use.
Conventionally, the public exponent (e) is a fixed small number like 257 or 65537. There's no need to choose a number that's larger than phi, and having a value of e (71509) that's greater than n (15041) makes no sense whatsoever.
You should calculate the private exponent (d) by using the extended Euclidean algorithm. I'm not convinced that the method you're using will give reliable results. (Strictly speaking, you should be using Carmichael's totient function in any case.)
Obviously encrypting individual characters provides no security. To work with larger values of p and q, you'll need to use modular exponentiation. Fortunately, this is built in to Python: (c ** d) % n can be replaced with pow(c, d, n), which is much faster and won't crash for large values of d. The sympy Python library has a lot of functions that you might find useful, including randprime() and gcdex() (extended Euclidean algorithm). Just type in pip install sympy at the command line to install it, if it isn't already available.
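To illustrate the points above, here is a minimal sketch with toy parameters. The primes 61 and 53 and the exponent 17 are arbitrary small values chosen for the demo, not a recommendation. It uses the built-in three-argument pow() for modular exponentiation and pow(e, -1, phi) (available from Python 3.8) as a stand-in for the extended Euclidean algorithm. The cipher text is kept as a list of integers rather than being squeezed back into characters.

```python
import math

# Toy parameters for illustration only (far too small for real use).
p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # small public exponent, coprime to phi
assert math.gcd(e, phi) == 1

# Modular inverse of e modulo phi (Python 3.8+ accepts exponent -1).
d = pow(e, -1, phi)

def encrypt(m, e, n):
    # Modular exponentiation without building the huge power m ** e.
    return pow(m, e, n)

def decrypt(c, d, n):
    return pow(c, d, n)

message = "Testando exemplo de criptografia RSA"
cipher = [encrypt(ord(ch), e, n) for ch in message]
plain = "".join(chr(decrypt(c, d, n)) for c in cipher)
print(d, plain == message)
```

Because every character code here is smaller than n, decryption recovers the original string exactly.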
I have been working on a programming challenge, problem here, which basically states:
Given integer array, you are to iterate through all pairs of neighbor
elements, starting from beginning - and swap members of each pair
where first element is greater than second.
And then return the number of swaps made and the checksum of the final array. My program seemingly does both the single sorting pass and the checksum as the problem specifies, but my final answer is off for everything except the test input they gave.
So: 1 4 3 2 6 5 -1
Results in the correct output: 3 5242536 with my program.
But something like:
2 96 7439 92999 240 70748 3 842 74 706 4 86 7 463 1871 7963 904 327 6268 20955 92662 278 57 8 5912 724 70916 13 388 1 697 99666 6924 2 100 186 37504 1 27631 59556 33041 87 9 45276 -1
Results in: 39 1291223 when the correct answer is 39 3485793.
Here's what I have at the moment:
# Python 2.7
def check_sum(data):
    data = [str(x) for x in str(data)[::]]
    numbers = len(data)
    result = 0
    for number in range(numbers):
        result += int(data[number])
        result *= 113
        result %= 10000007
    return result

def bubble_in_array(data):
    numbers = data[:-1]
    numbers = [int(x) for x in numbers]
    swap_count = 0
    for x in range(len(numbers)-1):
        if numbers[x] > numbers[x+1]:
            temp = numbers[x+1]
            numbers[x+1] = numbers[x]
            numbers[x] = temp
            swap_count += 1
    raw_number = int(''.join([str(x) for x in numbers]))
    print('%s %s') % (str(swap_count), check_sum(raw_number))
Does anyone have any idea where I am going wrong?
The issue is with your way of calculating Checksum. It fails when the array has numbers with more than one digit. For example:
2 96 7439 92999 240 70748 3 842 74 706 4 86 7 463 1871 7963 904 327 6268 20955 92662 278 57 8 5912 724 70916 13 388 1 697 99666 6924 2 100 186 37504 1 27631 59556 33041 87 9 45276 -1
You are calculating Checksum for 2967439240707483842747064867463187179639043276268209559266227857859127247091613388169792999692421001863750412763159556330418794527699666
digit by digit while you should calculate the Checksum of [2, 96, 7439, 240, 70748, 3, 842, 74, 706, 4, 86, 7, 463, 1871, 7963, 904, 327, 6268, 20955, 92662, 278, 57, 8, 5912, 724, 70916, 13, 388, 1, 697, 92999, 6924, 2, 100, 186, 37504, 1, 27631, 59556, 33041, 87, 9, 45276, 99666]
The fix:
# Python 2.7
def check_sum(data):
    result = 0
    for number in data:
        result += number
        result *= 113
        result %= 10000007
    return result

def bubble_in_array(data):
    numbers = [int(x) for x in data[:-1]]
    swap_count = 0
    for x in xrange(len(numbers)-1):
        if numbers[x] > numbers[x+1]:
            numbers[x+1], numbers[x] = numbers[x], numbers[x+1]
            swap_count += 1
    print('%d %d') % (swap_count, check_sum(numbers))
More notes:
To swap two variables in Python you don't need a temp variable; just use a, b = b, a.
In Python 2.x, use xrange instead of range.
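For readers on Python 3, the same fix might look like the sketch below. The function name bubble_pass and the choice to return the pair instead of printing it are my own for this example. It is checked against the sample input from the question.

```python
def check_sum(values):
    # Fold each array element (not each digit) into the running checksum.
    result = 0
    for value in values:
        result = (result + value) * 113 % 10000007
    return result

def bubble_pass(tokens):
    # Drop the trailing -1 terminator, then do a single left-to-right pass.
    numbers = [int(x) for x in tokens[:-1]]
    swap_count = 0
    for i in range(len(numbers) - 1):
        if numbers[i] > numbers[i + 1]:
            numbers[i], numbers[i + 1] = numbers[i + 1], numbers[i]
            swap_count += 1
    return swap_count, check_sum(numbers)

print(bubble_pass("1 4 3 2 6 5 -1".split()))  # (3, 5242536)
```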
The situation is as follows:
I have a 2D numpy array. Its shape is (1002, 1004). Each element contains a value between 0 and Inf. What I now want to do is determine the 1000 largest values and store the corresponding indices into a list named x and a list named y. This is because I want to plot the maximum values, and the indices actually correspond to the real-time x and y position of each value.
What I have so far is:
x = numpy.zeros(500)
y = numpy.zeros(500)
for idx in range(500):
    x[idx] = numpy.unravel_index(full.argmax(), full.shape)[0]
    y[idx] = numpy.unravel_index(full.argmax(), full.shape)[1]
    full[full == full.max()] = 0.
print os.times()
Here full is my 2D numpy array. As can be seen from the for loop, I only determine the first 500 maximum values at the moment. This however already takes about 5 s. For the first 1000 maximum values, the user time should actually be around 0.5 s. I've noticed that a very time consuming part is setting the previous maximum value to 0 each time. How can I speed things up?
Thank you so much!
If you have NumPy 1.8 or later, you can use the argpartition function or method.
Here's a script that calculates x and y:
import numpy as np
# Create an array to work with.
full = np.random.randint(1, 99, size=(8, 8))
# Get the indices for the largest `num_largest` values.
num_largest = 8
indices = (-full).argpartition(num_largest, axis=None)[:num_largest]
# OR, if you want to avoid the temporary array created by `-full`:
# indices = full.argpartition(full.size - num_largest, axis=None)[-num_largest:]
x, y = np.unravel_index(indices, full.shape)
print("x =", x)
print("y =", y)
print("Largest values:", full[x, y])
print("Compare to: ", np.sort(full, axis=None)[-num_largest:])
[[67 93 18 84 58 87 98 97]
[48 74 33 47 97 26 84 79]
[37 97 81 69 50 56 68 3]
[85 40 67 85 48 62 49 8]
[93 53 98 86 95 28 35 98]
[77 41 4 70 65 76 35 59]
[11 23 78 19 16 28 31 53]
[71 27 81 7 15 76 55 72]]
x = [0 2 4 4 0 1 4 0]
y = [6 1 7 2 7 4 4 1]
Largest values: [98 97 98 98 97 97 95 93]
Compare to: [93 95 97 97 97 98 98 98]
You could loop through the array as @Inspired suggests, but looping through NumPy arrays item by item tends to be slower than using NumPy functions, since the NumPy functions are written in C/Fortran while the item-by-item loop runs in Python.
So, although sorting is O(n log n), it may be quicker than a Python-based one-pass O(n) solution. Below, np.unique performs the sort:
import numpy as np
def nlargest_indices(arr, n):
    uniques = np.unique(arr)
    threshold = uniques[-n]
    return np.where(arr >= threshold)
full = np.random.random((1002,1004))
x, y = nlargest_indices(full, 10)
print(full[x, y])
# [ 2 7 217 267 299 683 775 825 853]
# [645 621 132 242 556 439 621 884 367]
Here is a timeit benchmark comparing nlargest_indices (above) to
def nlargest_indices_orig(full, n):
    full = full.copy()
    x = np.zeros(n)
    y = np.zeros(n)
    for idx in range(n):
        x[idx] = np.unravel_index(full.argmax(), full.shape)[0]
        y[idx] = np.unravel_index(full.argmax(), full.shape)[1]
        full[full == full.max()] = 0.
    return x, y
In [97]: %timeit nlargest_indices_orig(full, 500)
1 loops, best of 3: 5 s per loop
In [98]: %timeit nlargest_indices(full, 500)
10 loops, best of 3: 133 ms per loop
For timeit purposes I needed to copy the array inside nlargest_indices_orig, lest full get mutated by the timing loop.
Benchmarking the copying operation:
def base(full, n):
    full = full.copy()
In [102]: %timeit base(full, 500)
100 loops, best of 3: 4.11 ms per loop
shows this added about 4ms to the 5s benchmark for nlargest_indices_orig.
Warning: nlargest_indices and nlargest_indices_orig may return different results if arr contains repeated values.
nlargest_indices finds the n largest values in arr and then returns the x and y indices corresponding to the locations of those values.
nlargest_indices_orig finds the n largest values in arr and then returns one x and y index for each large value. If there is more than one x and y corresponding to the same large value, then some locations where large values occur may be missed.
They also return indices in a different order, but I suppose that does not matter for your purpose of plotting.
If you want to know the indices of the n max/min values in the 2D array, my solution (for the largest) is
indx = divmod((-full).argpartition(num_largest, axis=None)[:num_largest], full.shape[1])
This finds the indices of the largest values in the flattened array and then converts each one to a (row, column) pair from the quotient and remainder of division by the number of columns.
Never mind. Benchmarking shows the unravel_index method is twice as fast, at least for num_largest = 3.
I'm afraid that the most time-consuming part is recalculating the maximum. In fact, you have to calculate the maximum of 1002*1004 numbers 500 times, which gives you about 500 million comparisons.
You should probably write your own algorithm that finds the solution in one pass: keep only the 1000 greatest numbers (or their indices) somewhere while scanning your 2D array, without modifying the source array. I think some sort of binary heap (have a look at heapq) would suit the storage.
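A minimal sketch of that one-pass idea, using heapq.nlargest on a plain nested list as a stand-in for the 2D array (the function name and the tiny grid are made up for the example). On a real NumPy array a vectorized approach is likely to be faster; this only shows the heap-based bookkeeping.

```python
import heapq

def nlargest_positions(grid, n):
    # Stream (value, row, col) triples through heapq.nlargest, which keeps
    # only n candidates at a time and never modifies the grid.
    triples = ((value, r, c)
               for r, row in enumerate(grid)
               for c, value in enumerate(row))
    top = heapq.nlargest(n, triples)
    rows = [r for _, r, _ in top]
    cols = [c for _, _, c in top]
    return rows, cols

grid = [[1.0, 9.0, 3.0],
        [7.0, 2.0, 8.0]]
print(nlargest_positions(grid, 3))  # ([0, 1, 1], [1, 2, 0])
```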
I'm trying to work out how to speed up a Python function which uses numpy. The output I have received from line_profiler is below, and it shows that the vast majority of the time is spent on the line ind_y, ind_x = np.where(seg_image == i).
seg_image is an integer array which is the result of segmenting an image, so finding the pixels where seg_image == i extracts a specific segmented object. I am looping through lots of these objects (in the code below I'm just looping through 5 for testing, but I'll actually be looping through over 20,000), and it takes a long time to run!
Is there any way in which the np.where call can be sped up? Or, alternatively, can the penultimate line (which also takes a good proportion of the time) be sped up?
The ideal solution would be to run the code on the whole array at once, rather than looping, but I don't think this is possible as there are side-effects to some of the functions I need to run (for example, dilating a segmented object can make it 'collide' with the next region and thus give incorrect results later on).
Does anyone have any ideas?
Line # Hits Time Per Hit % Time Line Contents
5 def correct_hot(hot_image, seg_image):
6 1 239810 239810.0 2.3 new_hot = hot_image.copy()
7 1 572966 572966.0 5.5 sign = np.zeros_like(hot_image) + 1
8 1 67565 67565.0 0.6 sign[:,:] = 1
9 1 1257867 1257867.0 12.1 sign[hot_image > 0] = -1
11 1 150 150.0 0.0 s_elem = np.ones((3, 3))
13 #for i in xrange(1,seg_image.max()+1):
14 6 57 9.5 0.0 for i in range(1,6):
15 5 6092775 1218555.0 58.5 ind_y, ind_x = np.where(seg_image == i)
17 # Get the average HOT value of the object (really simple!)
18 5 2408 481.6 0.0 obj_avg = hot_image[ind_y, ind_x].mean()
20 5 333 66.6 0.0 miny = np.min(ind_y)
22 5 162 32.4 0.0 minx = np.min(ind_x)
25 5 369 73.8 0.0 new_ind_x = ind_x - minx + 3
26 5 113 22.6 0.0 new_ind_y = ind_y - miny + 3
28 5 211 42.2 0.0 maxy = np.max(new_ind_y)
29 5 143 28.6 0.0 maxx = np.max(new_ind_x)
31 # 7 is + 1 to deal with the zero-based indexing, + 2 * 3 to deal with the 3 cell padding above
32 5 217 43.4 0.0 obj = np.zeros( (maxy+7, maxx+7) )
34 5 158 31.6 0.0 obj[new_ind_y, new_ind_x] = 1
36 5 2482 496.4 0.0 dilated = ndimage.binary_dilation(obj, s_elem)
37 5 1370 274.0 0.0 border = mahotas.borders(dilated)
39 5 122 24.4 0.0 border = np.logical_and(border, dilated)
41 5 355 71.0 0.0 border_ind_y, border_ind_x = np.where(border == 1)
42 5 136 27.2 0.0 border_ind_y = border_ind_y + miny - 3
43 5 123 24.6 0.0 border_ind_x = border_ind_x + minx - 3
45 5 645 129.0 0.0 border_avg = hot_image[border_ind_y, border_ind_x].mean()
47 5 2167729 433545.8 20.8 new_hot[seg_image == i] = (new_hot[ind_y, ind_x] + (sign[ind_y, ind_x] * np.abs(obj_avg - border_avg)))
48 5 10179 2035.8 0.1 print obj_avg, border_avg
50 1 4 4.0 0.0 return new_hot
EDIT I have left my original answer at the bottom for the record, but I have actually looked into your code in more detail over lunch, and I think that using np.where is a big mistake:
In [63]: a = np.random.randint(100, size=(1000, 1000))
In [64]: %timeit a == 42
1000 loops, best of 3: 950 us per loop
In [65]: %timeit np.where(a == 42)
100 loops, best of 3: 7.55 ms per loop
You could get a boolean array (that you can use for indexing) in 1/8 of the time you need to get the actual coordinates of the points!!!
There is of course the cropping of the features that you do, but ndimage has a find_objects function that returns enclosing slices, and appears to be very fast:
In [66]: %timeit ndimage.find_objects(a)
100 loops, best of 3: 11.5 ms per loop
This returns a list of tuples of slices enclosing all of your objects, in 50% more time than it takes to find the indices of one single object.
It may not work out of the box as I cannot test it right now, but I would restructure your code into something like the following:
def correct_hot_bis(hot_image, seg_image):
    # Need this to not index out of bounds when computing border_avg
    hot_image_padded = np.pad(hot_image, 3, mode='constant',
                              constant_values=0)
    new_hot = hot_image.copy()
    sign = np.ones_like(hot_image, dtype=np.int8)
    sign[hot_image > 0] = -1
    s_elem = np.ones((3, 3))

    for j, slice_ in enumerate(ndimage.find_objects(seg_image)):
        hot_image_view = hot_image[slice_]
        seg_image_view = seg_image[slice_]
        new_shape = tuple(dim+6 for dim in hot_image_view.shape)
        # Same window in the padded image, grown by the 3-cell padding on each side
        new_slice = tuple(slice(dim.start,
                                dim.stop+6,
                                None) for dim in slice_)
        indices = seg_image_view == j+1

        obj_avg = hot_image_view[indices].mean()

        obj = np.zeros(new_shape)
        obj[3:-3, 3:-3][indices] = True

        dilated = ndimage.binary_dilation(obj, s_elem)
        border = mahotas.borders(dilated)
        border &= dilated

        border_avg = hot_image_padded[new_slice][border == 1].mean()

        new_hot[slice_][indices] += (sign[slice_][indices] *
                                     np.abs(obj_avg - border_avg))

    return new_hot
You would still need to figure out the collisions, but you could get about a 2x speed-up by computing all the indices simultaneously using a np.unique based approach:
a = np.random.randint(100, size=(1000, 1000))

def get_pos(arr):
    pos = []
    for j in xrange(100):
        pos.append(np.where(arr == j))
    return pos

def get_pos_bis(arr):
    unq, flat_idx = np.unique(arr, return_inverse=True)
    pos = np.argsort(flat_idx)
    counts = np.bincount(flat_idx)
    cum_counts = np.cumsum(counts)
    multi_dim_idx = np.unravel_index(pos, arr.shape)
    return zip(*(np.split(coords, cum_counts) for coords in multi_dim_idx))
In [33]: %timeit get_pos(a)
1 loops, best of 3: 766 ms per loop
In [34]: %timeit get_pos_bis(a)
1 loops, best of 3: 388 ms per loop
Note that the pixels for each object are returned in a different order, so you can't simply compare the returns of both functions to assess equality. But they should both return the same.
One thing you could do to save a little bit of time is to store the result of seg_image == i so that you don't need to compute it twice. You're computing it on lines 15 & 47; you could add seg_mask = seg_image == i and then reuse that result (it might also be good to separate out that piece for profiling purposes).
While there are some other minor things that you could do to eke out a little bit of performance, the root issue is that you're using an O(M * N) algorithm, where M is the number of segments and N is the size of your image. It's not obvious to me from your code whether there is a faster algorithm to accomplish the same thing, but that's the first place I'd look for a speedup.
I'm using the mpmath library to compute the following sequence.
Let us consider the sequence u_0, u_1, u_2, ... defined by:
u_0 = 3/2 = 1.5
u_1 = 5/3 = 1.6666666…
u_{n+1} = 2003 - 6002/u_n + 4000/(u_n u_{n-1})
The sequence converges to 2, but because of rounding errors it seems to converge to 2000.
n    Calculated value    Rounded exact value
2    1.800001            1.800000000
3    1.890000            1.888888889
4    3.116924            1.941176471
5    756.3870306         1.969696970
6    1996.761549         1.984615385
7    1999.996781         1.992248062
8    1999.999997         1.996108949
9    2000.000000         1.998050682
10   2000.000000         1.999024390
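My code:
from mpmath import *
mp.dps = 50
# Assumed from the answers below: the initial values are rounded to double precision first
u = [mpf(3/2.0), mpf(5/3.0)]
for i in range(2, 11):
    u.append(2003 - 6002/u[i-1] + 4000/(u[i-1]*u[i-2]))
print(u)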
My bad results:
I tried some other functions (fdiv…) and tried changing the precision: same bad result.
What's wrong with this code?
How can I change my code to find the value 2.0 with the formula:
u_{n+1} = 2003 - 6002/u_n + 4000/(u_n u_{n-1})
Using the decimal module, you can see the series also has a solution converging at 2000:
from decimal import Decimal, getcontext
getcontext().prec = 100
u0=Decimal(3) / Decimal(2)
u1=Decimal(5) / Decimal(3)
u=[u0, u1]
for i in range(100):
    un1 = 2003 - 6002/u[-1] + 4000/(u[-1]*u[-2])
    u.append(un1)  # keep the new term so the next iteration uses it
    print(un1)
The recurrence relation has multiple fixed points (one at 2 and the other at 2000):
>>> u = [Decimal(2), Decimal(2)]
>>> 2003 - 6002/u[-1] + 4000/(u[-1]*u[-2])
>>> u = [Decimal(2000), Decimal(2000)]
>>> 2003 - 6002/u[-1] + 4000/(u[-1]*u[-2])
The solution at 2 is an unstable fixed-point. The attractive fixed-point is at 2000.
The convergence gets very close to two and when the round-off causes the value to slightly exceed two, that difference gets amplified again and again until hitting 2000.
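To make the amplification concrete, here is a minimal sketch using the same decimal module. The helper name `step`, the 50-digit precision and the perturbation size 1e-30 are arbitrary choices of mine, not part of the original code. It first checks that 2 and 2000 are both left unchanged by the recurrence, then starts a hair above 2 and iterates:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50

def step(un, un_1):
    """One application of u_{n+1} = 2003 - 6002/u_n + 4000/(u_n * u_{n-1})."""
    return 2003 - 6002 / un + 4000 / (un * un_1)

# Both 2 and 2000 are fixed points: feeding them in returns them unchanged.
at_2 = step(Decimal(2), Decimal(2))
at_2000 = step(Decimal(2000), Decimal(2000))

# Start slightly above 2 (the offset is an arbitrary illustrative value).
eps = Decimal("1e-30")
prev, cur = Decimal(2) + eps, Decimal(2) + eps
trajectory = [cur]
for _ in range(40):
    prev, cur = cur, step(cur, prev)
    trajectory.append(cur)

# Print a few terms to watch the tiny offset grow and then settle.
for n in (0, 5, 10, 12, 14, 40):
    print(n, trajectory[n])
```

The offset grows by roughly a factor of 1000 per step while it is small, so a perturbation far below the printed precision becomes visible after only about ten iterations.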
Your (non-linear) recurrence sequence has three fixed points: 1, 2 and 2000. The values 1 and 2 are close to each other compared to 2000, which is usually an indication of unstable fixed points because they are "almost" double roots.
You need to do some maths in order to diverge less early. Let v(n) be a side sequence:
v(n) = (1+2^n)u(n)
The following holds true:
v(n+1) = (1+2^(n+1)) * (2003v(n)v(n-1) - 6002(1+2^n)v(n-1) + 4000(1+2^n)(1+2^(n-1))) / (v(n)v(n-1))
You can then simply compute v(n) and deduce u(n) from u(n) = v(n)/(1+2^n):
#!/usr/bin/env python
from mpmath import *
mp.dps = 50
v0 = mpf(3)
v1 = mpf(5)
v = [v0, v1]
# u(n) = v(n) / (1 + 2^n)
u = [v0/2, v1/3]
for i in range(2, 25):
    vn1 = (1+2**i) * (2003*v[i-1]*v[i-2] \
                      - 6002*(1+2**(i-1))*v[i-2] \
                      + 4000*(1+2**(i-1))*(1+2**(i-2))) \
          / (v[i-1]*v[i-2])
    v.append(vn1)
    u.append(vn1 / (1+2**i))
print(u)
And the result:
Note that this will still diverge eventually. In order to really converge, you need to compute v(n) with arbitrary precision. But this is now a lot easier since all the values are integers.
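A minimal sketch of that integer version, using plain Python integers for v(n) and fractions.Fraction only for the final division. The function name `v_sequence` and the term count 200 are my own choices, and the check against 2^(n+1) + 1 relies on the closed form u_n = (2^(n+1) + 1)/(2^n + 1) for this sequence:

```python
from fractions import Fraction

def v_sequence(count):
    """Return v(0), ..., v(count-1) computed exactly with Python integers."""
    v = [3, 5]                      # v(0) = 2*u(0), v(1) = 3*u(1)
    for n in range(1, count - 1):
        num = (1 + 2**(n + 1)) * (2003 * v[n] * v[n - 1]
                                  - 6002 * (1 + 2**n) * v[n - 1]
                                  + 4000 * (1 + 2**n) * (1 + 2**(n - 1)))
        q, r = divmod(num, v[n] * v[n - 1])
        assert r == 0               # the division is exact for these starting values
        v.append(q)
    return v[:count]

v = v_sequence(200)
u = [Fraction(vn, 1 + 2**n) for n, vn in enumerate(v)]

print(v[:6])
print(float(u[-1]))
```

Because nothing is ever rounded, there is no error for the recurrence to amplify, and the terms keep approaching 2 however many of them are computed.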
You calculate your initial values to 53 bits of precision and then assign that rounded value to the high-precision mpf variable. You should use u0=mpf(3)/mpf(2) and u1=mpf(5)/mpf(3). You'll stay close to 2 for a few more iterations, but you'll still end up converging to 2000. This is due to rounding error. One alternative is to compute with fractions. I used gmpy and the following code converges to 2.
from __future__ import print_function
import gmpy
u = [gmpy.mpq(3,2), gmpy.mpq(5,3)]
for i in range(2,300):
    temp = (2003 - 6002/u[-1] + 4000/(u[-1]*u[-2]))
    u.append(temp)  # store the exact rational term for the next step
for i in u: print(gmpy.mpf(i,300))
If you compute with infinite precision then you get 2 otherwise you get 2000:
import itertools
from fractions import Fraction

def series(u0=Fraction(3, 2), u1=Fraction(5, 3)):
    yield u0
    yield u1
    while u0 != u1:
        un = 2003 - 6002/u1 + 4000/(u1*u0)
        yield un
        u1, u0 = un, u1

for i, u in enumerate(itertools.islice(series(), 100)):
    err = (2-u)/2 # relative error
    print("%d\t%.2g" % (i, err))
0 0.25
1 0.17
2 0.1
3 0.056
4 0.029
5 0.015
6 0.0077
7 0.0039
8 0.0019
9 0.00097
10 0.00049
11 0.00024
12 0.00012
13 6.1e-05
14 3.1e-05
15 1.5e-05
16 7.6e-06
17 3.8e-06
18 1.9e-06
19 9.5e-07
20 4.8e-07
21 2.4e-07
22 1.2e-07
23 6e-08
24 3e-08
25 1.5e-08
26 7.5e-09
27 3.7e-09
28 1.9e-09
29 9.3e-10
30 4.7e-10
31 2.3e-10
32 1.2e-10
33 5.8e-11
34 2.9e-11
35 1.5e-11
36 7.3e-12
37 3.6e-12
38 1.8e-12
39 9.1e-13
40 4.5e-13
41 2.3e-13
42 1.1e-13
43 5.7e-14
44 2.8e-14
45 1.4e-14
46 7.1e-15
47 3.6e-15
48 1.8e-15
49 8.9e-16
50 4.4e-16
51 2.2e-16
52 1.1e-16
53 5.6e-17
54 2.8e-17
55 1.4e-17
56 6.9e-18
57 3.5e-18
58 1.7e-18
59 8.7e-19
60 4.3e-19
61 2.2e-19
62 1.1e-19
63 5.4e-20
64 2.7e-20
65 1.4e-20
66 6.8e-21
67 3.4e-21
68 1.7e-21
69 8.5e-22
70 4.2e-22
71 2.1e-22
72 1.1e-22
73 5.3e-23
74 2.6e-23
75 1.3e-23
76 6.6e-24
77 3.3e-24
78 1.7e-24
79 8.3e-25
80 4.1e-25
81 2.1e-25
82 1e-25
83 5.2e-26
84 2.6e-26
85 1.3e-26
86 6.5e-27
87 3.2e-27
88 1.6e-27
89 8.1e-28
90 4e-28
91 2e-28
92 1e-28
93 5e-29
94 2.5e-29
95 1.3e-29
96 6.3e-30
97 3.2e-30
98 1.6e-30
99 7.9e-31
Well, as casevh said, I just applied the mpf function to the initial terms in my code:
and the values converge toward the correct value 2.0 for 16 steps before diverging again (see below).
So, even with a good Python library for arbitrary-precision floating-point arithmetic and a few basic operations, the result can become totally wrong, and it is not an algorithmic, mathematical or recurrence problem, as I sometimes read.
So it is necessary to remain watchful and critical!!! (I'm very worried about the mpmath.lerchphi(z, s, a) function ;-)
2 1.8000000000000000000000000000000000000000000000022
3 1.8888888888888888888888888888888888888888888913205
4 1.9411764705882352941176470588235294117647084569125
5 1.9696969696969696969696969696969696969723495083846
6 1.9846153846153846153846153846153846180779422496889
7 1.992248062015503875968992248062018218070968279944
8 1.9961089494163424124513618677070049064461141667961
9 1.998050682261208576998050684991268132991329645551
10 1.9990243902439024390243929766241359876402781522945
11 1.9995119570522205954151303455889283862002420414092
12 1.9997559189650964147435086295745928366095548127257
13 1.9998779445868451615169464386495752584786229236677
14 1.9999389685715481608370784691478769380770569091713
15 1.9999694860884747554701272066241108169217231319376
16 1.9999874767910784720428384947047783821702386000249
17 2.0027277350948824117795762659330557916802871427763
18 4.7316350177463946015607576536159982430500337286276
19 1156.6278675611076227796014310764287933259776352198
20 1998.5416721291457644804673979070312813731252347786
21 1999.998540608689366669273522363692463645090555294
22 1999.9999985406079725746311606572627439743947878652
The exact solution to your recurrence relation (with initial values u_0 = 3/2, u_1 = 5/3) is easily verified to be
u_n = (2^(n+1) + 1) / (2^n + 1). (*)
The problem you're seeing is that although the solution is such that
lim_{n -> oo} u_n = 2,
this limit is a repelling fixed point of your recurrence relation. That is, any departure from the correct values of u_{n-1}, u_{n-2}, for some n, will result in further values diverging from the correct limit. Consequently, unless your implementation of the recurrence relation represents every u_n value exactly, it can be expected to exhibit eventual divergence from the correct limit, converging instead to the incorrect value of 2000, which just happens to be the only attracting fixed point of your recurrence relation.
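One way to check the repelling/attracting claim numerically is to linearize the recurrence around each fixed point p and look at the eigenvalues of the resulting two-term linear recurrence. This is my own sanity check rather than something from the original derivation, and the helper name `eigenvalue_moduli` is hypothetical. Writing f(x, y) = 2003 - 6002/x + 4000/(x y), a fixed point attracts only if both eigenvalues have modulus below 1:

```python
import cmath

def eigenvalue_moduli(p):
    """Moduli of the eigenvalues of the linearization of
    u_{n+1} = 2003 - 6002/u_n + 4000/(u_n * u_{n-1}) at the fixed point p."""
    fx = 6002 / p**2 - 4000 / p**3   # partial derivative w.r.t. u_n at (p, p)
    fy = -4000 / p**3                # partial derivative w.r.t. u_{n-1} at (p, p)
    # Small deviations e_n obey e_{n+1} = fx*e_n + fy*e_{n-1},
    # whose characteristic equation is lambda^2 - fx*lambda - fy = 0.
    root = cmath.sqrt(fx * fx + 4 * fy)
    return sorted(abs((fx + sign * root) / 2) for sign in (1, -1))

for p in (1.0, 2.0, 2000.0):
    moduli = eigenvalue_moduli(p)
    kind = "attracting" if max(moduli) < 1 else "not attracting"
    print(p, moduli, kind)
```

At p = 2 one eigenvalue is far above 1, which matches the factor of about 1000 per step by which small errors grow in the outputs above.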
(*) In fact, u_n = (2^(n+1) + 1) / (2^n + 1) is the solution to any recurrence relation of the form
u_n = C + (7 - 3C)/u_{n-1} + (2C - 6)/(u_{n-1} u_{n-2})
with the same initial values as given above, where C is an arbitrary constant. If I haven't made a mistake finding the roots of the characteristic polynomial, this will have the set of fixed points {1, 2, C - 3}\{0}. The limit 2 can be either a repelling fixed point or an attracting fixed point, depending on the value of C. E.g., for C = 2003 the set of fixed points is {1, 2, 2000} with 2 being a repellor, whereas for C = 3 the fixed points are {1, 2} with 2 being an attractor.
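The claims in this footnote can be checked exactly with rational arithmetic. The sketch below is only a spot check for a handful of C values, not a proof, and the helper names `closed_form`, `recurrence_step`, `satisfies_recurrence` and `is_fixed_point` are mine:

```python
from fractions import Fraction

def closed_form(n):
    """u_n = (2^(n+1) + 1) / (2^n + 1) as an exact fraction."""
    return Fraction(2**(n + 1) + 1, 2**n + 1)

def recurrence_step(C, u1, u0):
    """C + (7 - 3C)/u_{n-1} + (2C - 6)/(u_{n-1} u_{n-2}), with u1 = u_{n-1}, u0 = u_{n-2}."""
    return C + (7 - 3 * C) / u1 + (2 * C - 6) / (u1 * u0)

def satisfies_recurrence(C, terms=50):
    """True if the closed form reproduces itself under the recurrence for n < terms."""
    return all(
        closed_form(n) == recurrence_step(C, closed_form(n - 1), closed_form(n - 2))
        for n in range(2, terms)
    )

def is_fixed_point(C, p):
    p = Fraction(p)
    return recurrence_step(C, p, p) == p

for C in (2003, 3, Fraction(5, 2), -7):
    print(C, satisfies_recurrence(C), is_fixed_point(C, 1), is_fixed_point(C, 2))
print(is_fixed_point(2003, 2000))
```

Since every operation is on Fraction values, equality here is exact, so a True result is not affected by the rounding issues discussed in this thread.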