I'm working on a microcontroller that does not support floating-point math; integer math only. As such, there is no sqrt() function and I can't import any math modules. The MCU is running a subset of Python that supports eight Python data types: None, integer, Boolean, string, function, tuple, byte list, and iterator. Also, the MCU can't do floor division (//).
My problem is that I need to calculate the magnitude of 3 signed integers.
mag = sqrt(x**2+y**2+z**2)
FWIW, the values can only be in the range of +/-1024 and I just need a close approximation. Does anyone have a pattern for solving this problem?
Note that the largest possible sum is 3*1024**2, so the largest possible square root is 1773 (floor - or 1774 rounded).
So you could simply take 0 as a starting guess, and repeatedly add 1 until the square exceeds the sum. That can't take more than about 1770 iterations.
Of course that's probably too slow. A straightforward binary search can cut that to 11 iterations, and doesn't require division (I'm assuming the MCU can shift right by 1 bit, which is the same as floor-division by 2).
EDIT
Here's some code, for a binary search returning the floor of the true square root:
def isqrt(n):
    if n <= 1:
        return n
    lo = 0
    hi = n >> 1
    while lo <= hi:
        mid = (lo + hi) >> 1
        sq = mid * mid
        if sq == n:
            return mid
        elif sq < n:
            lo = mid + 1
            result = mid
        else:
            hi = mid - 1
    return result
To check, run:
from math import sqrt
assert all(isqrt(i) == int(sqrt(i)) for i in range(3*1024**2 + 1))
That checks all possible inputs given what you said - and since binary search is notoriously tricky to get right in all cases, it's good to check every case! It doesn't take long on a "real" machine ;-)
PROBABLY IMPORTANT
To guard against possible overflow, and speed it significantly, change the initialization of lo and hi to this:
hi = 1
while hi * hi <= n:
    hi <<= 1
lo = hi >> 1
Then the runtime becomes proportional to the number of bits in the result, greatly speeding smaller results. Indeed, for sloppy enough definitions of "close", you could stop right there.
FOR POSTERITY ;-)
Looks like the OP doesn't actually need square roots at all. But for someone who may, and can't afford division, here's a simplified version of the code, also removing multiplications from the initialization. Note: I'm not using .bit_length() because lots of deployed Python versions don't support that.
def isqrt(n):
    if n <= 1:
        return n
    hi, hisq = 2, 4
    while hisq <= n:
        hi <<= 1
        hisq <<= 2
    lo = hi >> 1
    while hi - lo > 1:
        mid = (lo + hi) >> 1
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    assert lo + 1 == hi
    assert lo**2 <= n < hi**2
    return lo
from math import sqrt
assert all(isqrt(i) == int(sqrt(i)) for i in range(3*1024**2 + 1))
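To tie this back to the original question, the magnitude is then just isqrt applied to the sum of squares. A minimal sketch (magnitude is an illustrative name, not from the question):

def magnitude(x, y, z):
    # close integer approximation of sqrt(x**2 + y**2 + z**2),
    # using the division-free isqrt above
    return isqrt(x*x + y*y + z*z)

print(magnitude(3, 4, 0))           # 5
print(magnitude(1024, 1024, 1024))  # 1773 (floor of 1773.62...)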
There is an algorithm to calculate it, but it uses floor division. Without that, this is what comes to mind:
def isqrt_linel(n):
    x = 0
    while (x+1)**2 <= n:
        x += 1
    return x
By the way, the algorithm that I know uses Newton's method:
def isqrt(n):
    # https://en.wikipedia.org/wiki/Integer_square_root
    # https://gist.github.com/bnlucas/5879594
    if n >= 0:
        if n == 0:
            return 0
        a, b = divmod(n.bit_length(), 2)
        x = 2 ** (a + b)
        while True:
            y = (x + n // x) >> 1
            if y >= x:
                return x
            x = y
    else:
        raise ValueError("negative number")
I tried to write my own power() function in Python, but when I compared it with Python's built-in pow() function for output and speed, I found that my code is 6-7 times slower and that the last 3-4 digits of its output differ from the built-in pow() for floating-point numbers. I am totally new to Python and unable to find an explanation. Please help.
Note: I have used binomial expansion for calculating fractional powers and binary exponentiation for integral powers.
Here is my code:
from math import sin, cos  # needed by fpower below

def power(x, n):
    if not (isinstance(x, complex) or isinstance(n, complex)):
        res = 1
        if n == 0 and x != 0:
            return 1
        if n > 0 and x == 0:
            return 0
        if n < 0 and x == 0:
            return "Zero Division Error"
        if n == 0 and x == 0:
            return "Indeterminate Form"
        if n == 1:
            return x
        if n > 0 and n < 1:
            return fpower(x, n)
        if n > 1 and n < 2:
            return x*fpower(x, n-1)
        if n == -1:
            return 1/x
        if n < 0 and n > -1:
            return fpower(x, n)
        if n < -1 and n > -2:
            return 1/x*fpower(x, n+1)
        if n >= 2:
            f_p = n - n // 1
            t_x = x
            n //= 1
            if n % 2:
                res = x
                n //= 2
            else:
                res = 1
                n //= 2
            while n != 1 and n > 1:
                if n % 2:
                    res *= x*x
                x *= x
                n //= 2
            res *= x*x
            if f_p == 0:
                return res
            elif f_p < 1:
                return res*fpower(t_x, f_p)
        if n <= -2:
            f_p = n + (-n // 1)
            t_x = x
            n = -n // 1
            if n % 2:
                res = 1/x
                n = n // 2
            else:
                res = 1
                n = n // 2
            while n != 1 and n > 1:
                if n % 2:
                    res *= 1/x*1/x
                x *= x
                n //= 2
            res *= 1/x*1/x
            if f_p == 0:
                return res
            elif f_p > -1:
                return res*fpower(t_x, f_p)

# function to calculate fractional power
def fpower(x, n):
    pwr = 0
    sign = 1
    t_n = n
    if x < 0:
        x *= -1
        sign = -1
    while x > 2:
        x = x/2
        pwr += 1
    if x > 1:
        pwr *= n
        n *= -1
        x = 1/x - 1
    elif x < 0.5 and x != 0:
        x = sign*1/x
        return fpower(x, -n)
    elif x != 0:
        x -= 1
    res = 0
    step = 1
    coeff = 1
    i = 0
    while step > 1e-20:
        step = coeff*power(x, i)
        res += step
        coeff *= (n-i)/(i+1)
        i += 1
        if step < 0:
            step *= -1
    mp = res*power(2, pwr)
    if sign < 0:
        pi = 3.141592653589793
        real = mp*cos(t_n*pi)
        img = mp*sin(t_n*pi)
        if img != 0:
            return complex(real, img)
        else:
            return real
    else:
        return mp
Output Comparison:
Inbuilt fn -
pow(89,99.354)
4.7829376579139805e+193
Own fn -
power(89,99.354)
4.7829376579139765e+193
Speed Comparison:
Inbuilt fn time -
pow(89,99.354)
0.00026869773864746094
Own fn time-
power(89,99.354)
0.0023398399353027344
There are a few things that can be said here. First of all, many Python standard library functions will be implemented in C, not in Python, so they will be faster. This makes sense, as the standard library will be called by a lot of code lots of times, so it pays off to implement such often-used code in a faster language.
Along with that goes additional optimization: this code is called so often that, besides implementing it in C, it makes sense to optimize this code. Compared to baseline Python, this includes writing "faster" C, but also what you get out of the box from your C compiler.
Some cursory research doesn't suggest this is the case, but exponentiation, as a common math operation, could also have some hardware acceleration that requires you to use specific instructions of the architecture you're running on - which you won't easily (if at all) be able to utilize in pure Python.
Then there's the matter of algorithms; once we leave the language difference aside, there are different ways to compute exponentiation, and without contrary evidence you should not assume that you have chosen the most efficient one. From your description of the algorithms it seems you have chosen carefully, but still - when you're posing a question like this, don't simply assume you made an equally good choice as the standard library of a *looks at Wikipedia* 30 year old language.
Finally, don't forget that benchmarking methodology plays a role in questions like this. A 6-7 times difference won't go away by changing your methodology, but it's worth remembering that testing your code against a very small number of inputs may not be representative of your code's performance in general, whereas the built-in pow has to perform well across all inputs.
Those are my thoughts on performance; as for accuracy, floating-point errors simply add up. As soon as you perform more than one operation on a decimal number, this will happen. (And remember: in general, the first operation is representing the number in binary (IEEE 754) floating-point format.) Wikipedia gives the example of squaring 0.1: 0.1 cannot be represented precisely in binary, so you get your first rounding error there. As a result, squaring it will give you a less precise result than the best approximation of 0.01. As far as I can tell, you're not trying to compensate for rounding errors, and you're using a loop, so noticeable rounding errors are inevitable.
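A minimal demonstration of that first rounding error, runnable in any Python REPL:

>>> 0.1 * 0.1 == 0.01   # both sides get rounded, to different values
False
>>> 0.1 * 0.1
0.010000000000000002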
I was curious if any of you could come up with a more streamlined version of code to calculate Brown numbers. At the moment, this code can do ~650! before it slows to a crawl. Brown numbers are calculated through the equation n! + 1 = m**2, where m is an integer.
brownNum = 8
import math

def squareNum(n):
    x = n // 2
    seen = set([x])
    while x * x != n:
        x = (x + (n // x)) // 2
        if x in seen: return False
        seen.add(x)
    return True

while True:
    for i in range(math.factorial(brownNum)+1, math.factorial(brownNum)+2):
        if squareNum(i) is True:
            print("pass")
            print(brownNum)
            print(math.factorial(brownNum)+1)
            break
    else:
        print(brownNum)
        print(math.factorial(brownNum)+1)
        brownNum = brownNum + 1
        continue
    break

print(input(" "))
Sorry, I don't understand the logic behind your code.
I don't understand why you calculate math.factorial(brownNum) 4 times with the same value of brownNum each time through the while True loop. And in the for loop:
for i in range(math.factorial(brownNum)+1,math.factorial(brownNum)+2):
i will only take on the value of math.factorial(brownNum)+1
Anyway, here's my Python 3 code for a brute force search of Brown numbers. It quickly finds the only 3 known pairs, and then proceeds to test all the other numbers under 1000 in around 1.8 seconds on this 2GHz 32 bit machine. After that point you can see it slowing down (it hits 2000 around the 20 second mark) but it will chug along happily until the factorials get too large for your machine to hold.
I print progress information to stderr so that it can be separated from the Brown-number pair output. Also, stderr doesn't require flushing when you don't print a newline, unlike stdout (at least, it doesn't on Linux).
import sys

# Calculate the integer square root of `m` using Newton's method.
# Returns r: r**2 <= m < (r+1)**2
def int_sqrt(m):
    if m <= 0:
        return 0
    n = m << 2
    r = n >> (n.bit_length() // 2)
    while True:
        d = (n // r - r) >> 1
        r += d
        if -1 <= d <= 1:
            break
    return r >> 1

# Search for Brown numbers
fac = i = 1
while True:
    if i % 100 == 0:
        print('\r', i, file=sys.stderr, end='')
    fac *= i
    n = fac + 1
    r = int_sqrt(n)
    if r*r == n:
        print('\nFound', i, r)
    i += 1
You might want to:
- pre-calculate your square numbers, instead of testing for them on the fly
- pre-calculate the factorial once per loop iteration, num_fac = math.factorial(brownNum), instead of making multiple calls
- implement your own, memoized, factorial
That should let you run to the hard limits of your machine.
One optimization I would make would be to implement a 'wrapper' function around math.factorial that caches previous values of factorial, so that as your brownNum increases, factorial doesn't have as much work to do. This is known as 'memoization' in computer science.
Edit: found another SO answer with similar intention: Python: Is math.factorial memoized?
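For illustration, a minimal memoized factorial might look like this sketch (names are mine):

_fact_cache = [1]   # _fact_cache[k] holds k!

def memo_factorial(n):
    # extend the cache one value at a time up to n
    while len(_fact_cache) <= n:
        _fact_cache.append(_fact_cache[-1] * len(_fact_cache))
    return _fact_cache[n]

Since the search visits brownNum = 8, 9, 10, ... in order, each call then costs a single extra multiplication. (The answer above achieves the same effect by keeping a running product: fac *= i.)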
You should also initialize the square root estimate closer to the actual root:
e = int(math.log(n,4))
x = n//2**e
Because 4**e <= n <= 4**(e+1), the square root will be between x/2 and x, which should yield quadratic convergence of the Heron formula from the first iteration on.
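As a sketch of how that initialization might slot into a Heron/Newton integer square root (the guard on e is my addition, protecting against float rounding in math.log on very large n):

import math

def isqrt_heron(n):
    if n < 2:
        return n
    e = int(math.log(n, 4))
    while 4**e > n:        # guard: float log can overshoot by one
        e -= 1
    x = n // 2**e          # sqrt(n) lies between x/2 and x
    while True:
        y = (x + n // x) // 2
        if y >= x:         # iterates strictly decrease until here
            return x
        x = y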
I am using binary search on Python to solve the following problem: you have a list of n positive integers: a0, a1, a2, ... an-1, in increasing order.
Now, your friend is going to ask you m questions, each of the form, "Here is a positive integer B. Is B a part of the list?"
If B is in the list a, you will say "Yes".
Your task is to output the number of times you say yes for any given inputs.
1 ≤ n ≤ 10^5, 1 ≤ m ≤ 10^5 and 1 ≤ A, B ≤ 10^9
I wrote up the following code:
n = int(raw_input())
a = [int(x) for x in raw_input().split()]
m = int(raw_input())
answer = 0
lo = 0
hi = len(a) - 1
end = False
for i in range(0, m):
    B = int(raw_input())
    while (lo <= hi):
        mid = int((lo + hi) / 2)
        if B == a[mid]:
            answer = answer + 1
            break
        elif B < a[mid]:
            hi == mid - 1
        elif B > a[mid]:
            lo == mid + 1
print answer
I tested it out in the terminal, and it just never outputs an answer; instead, I keep writing numbers (even letters) into the terminal endlessly. Input for n, a, m, and the first value of B was successful, since the terminal gives me an error message if I type a letter, but after the first 4 lines it just doesn't respond to whatever I type, until I used Ctrl-Z to break out of Python.
Would anyone please explain why this is the case? I have tested this program by hand as well, and it should have worked.
Thank you.
One problem with the code is, as commented by @JohnnyMopp, that you should use = for assignment, not the equality operator ==.
Another problem is that the values of hi and lo are not reset before each binary search. You should move the lines that initialise those variables inside the for loop:
answer = 0
for i in range(0, m):
    B = int(raw_input())
    lo = 0
    hi = len(a) - 1
    while (lo <= hi):
        mid = int((lo + hi) / 2)
        if B == a[mid]:
            answer = answer + 1
            break
        elif B < a[mid]:
            hi = mid - 1
        elif B > a[mid]:
            lo = mid + 1
print answer
A better idea would be to write the binary search as a function.
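Something like this sketch, reusing the loop above:

def binary_search(a, B):
    # return True if B occurs in the sorted list a
    lo = 0
    hi = len(a) - 1
    while lo <= hi:
        mid = int((lo + hi) / 2)
        if B == a[mid]:
            return True
        elif B < a[mid]:
            hi = mid - 1
        else:
            lo = mid + 1
    return False

answer = 0
for i in range(0, m):
    if binary_search(a, int(raw_input())):
        answer = answer + 1
print answer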
Another way of achieving this without writing your own binary search is to use bisect.bisect():
import bisect

def bisect_in(l, v):
    return v == l[bisect.bisect(l, v)-1]

count = 0
for i in range(m):
    B = int(raw_input())
    count += bisect_in(a, B)
print count
For a given x < 10^15, quickly and accurately determine the maximum integer p such that 2^p <= x.
Here are some things I've tried:
First I tried this but it's not accurate for large numbers:
>>> from math import log
>>> x = 2**3
>>> x
8
>>> p = int(log(x, 2))
>>> 2**p == x
True
>>> x = 2**50
>>> p = int(log(x, 2))
>>> 2**p == x #not accurate for large numbers?
False
I could try something like:
p = 1
i = 1
while True:
    if i * 2 > n:
        break
    i *= 2
    p += 1
not_p = n - p
That would take up to 50 operations if p were 50.
I could pre-compute all the powers of 2 up to 2^50 and use binary search to find p. That would take around log(50) operations, but it seems a bit excessive and ugly?
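For what it's worth, that table-plus-binary-search idea is only a few lines with the standard bisect module (a sketch):

import bisect

POWERS = [2**i for i in range(51)]    # 2**0 .. 2**50

def max_p(x):
    # index of the rightmost power of two <= x
    return bisect.bisect_right(POWERS, x) - 1

print(max_p(8))          # 3
print(max_p(2**50 + 1))  # 50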
I found this thread for C based solutions: Compute fast log base 2 ceiling
However, it seems a bit ugly and I wasn't exactly sure how to convert it to Python.
In Python >= 2.7, you can use the .bit_length() method of integers:
def brute(x):
    # determine max p such that 2^p <= x
    p = 0
    while 2**p <= x:
        p += 1
    return p - 1

def easy(x):
    return x.bit_length() - 1
which gives
>>> brute(0), brute(2**3-1), brute(2**3)
(-1, 2, 3)
>>> easy(0), easy(2**3-1), easy(2**3)
(-1, 2, 3)
>>> brute(2**50-1), brute(2**50), brute(2**50+1)
(49, 50, 50)
>>> easy(2**50-1), easy(2**50), easy(2**50+1)
(49, 50, 50)
>>>
>>> all(brute(n) == easy(n) for n in range(10**6))
True
>>> nums = (max(2**x+d, 0) for x in range(200) for d in range(-50, 50))
>>> all(brute(n) == easy(n) for n in nums)
True
You specify in comments that your x is an integer, but for anyone coming here whose x is already a float, math.frexp() would be pretty fast at extracting log base 2:
from math import floor, log, frexp

log2_slow = int(floor(log(x, 2)))
log2_fast = frexp(x)[1] - 1
The C function that frexp() calls just grabs and tweaks the exponent. Some more 'splainin:
The subscript[1] is because frexp() returns a tuple (significand, exponent).
The subtract-1 accounts for the significand being in the range [0.5, 1.0). For example, 2**50 is stored as 0.5 x 2**51.
The floor() is because you specified 2^p <= x, so p == floor(log(x,2)).
(Derived from another answer.)
Be careful! The accepted answer returns floor(log(n, 2)), NOT ceil(log(n, 2)) like the title of the question implies!
If you came here for a clog2 implementation, do this:
def clog2(x):
    """Ceiling of log2"""
    if x <= 0:
        raise ValueError("domain error")
    return (x-1).bit_length()

And for completeness:

def flog2(x):
    """Floor of log2"""
    if x <= 0:
        raise ValueError("domain error")
    return x.bit_length() - 1
You could try the log2 function from numpy, which appears to work for powers up to 2^62:
>>> import numpy as np
>>> 2**np.log2(2**50) == 2**50
True
>>> 2**np.log2(2**62) == 2**62
True
Above that (at least for me) it fails due to the limitations of numpy's internal number types, but it will handle data in the range you say you're dealing with.
Works for me, Python 2.6.5 (CPython) on OSX 10.7:
>>> x = 2**50
>>> x
1125899906842624L
>>> p = int(log(x,2))
>>> p
50
>>> 2**p == x
True
It continues to work at least for exponents up to 1e9, by which time it starts to take quite a while to do the math. What are you actually getting for x and p in your test? What version of Python, on what OS, are you running?
With respect to "not accurate for large numbers" your challenge here is that the floating point representation is indeed not as precise as you need it to be (49.999999999993 != 50.0). A great reference is "What Every Computer Scientist Should Know About Floating-Point Arithmetic."
The good news is that the transformation of the C routine is very straightforward:
def getpos(value):
    if (value == 0):
        return -1
    pos = 0
    if (value & (value - 1)):
        pos = 1
    if (value & 0xFFFFFFFF00000000):
        pos += 32
        value = value >> 32
    if (value & 0x00000000FFFF0000):
        pos += 16
        value = value >> 16
    if (value & 0x000000000000FF00):
        pos += 8
        value = value >> 8
    if (value & 0x00000000000000F0):
        pos += 4
        value = value >> 4
    if (value & 0x000000000000000C):
        pos += 2
        value = value >> 2
    if (value & 0x0000000000000002):
        pos += 1
        value = value >> 1
    return pos
Another alternative is that you could round to the nearest integer, instead of truncating:
log(x,2)
=> 49.999999999999993
round(log(x,2),1)
=> 50.0
I needed to calculate the upper-bound power of two (to figure out how many bytes of entropy were needed to generate a random number in a given range using the modulus operator).
From a rough experiment I think the calculation below gives the minimum integer p such that val < 2^p
It's probably about as fast as you can get, and uses exclusively bitwise integer arithmetic.
def log2_approx(val):
    from math import floor
    val = floor(val)
    approx = 0
    while val != 0:
        val &= ~(1 << approx)
        approx += 1
    return approx
Your slightly different value would be calculated for a given n by
log2_approx(n) - 1
...maybe. But in any case, the bitwise arithmetic could give you a clue how to do this fast.
I want to generate the digits of the square root of two to 3 million digits.
I am aware of Newton-Raphson but I don't have much clue how to implement it in C or C++ due to lack of biginteger support. Can somebody point me in the right direction?
Also, if anybody knows how to do it in python (I'm a beginner), I would also appreciate it.
You could try using the mapping:
a/b -> (a+2b)/(a+b), starting with a = 1, b = 1. This converges to sqrt(2) (in fact, it gives the continued fraction representation of it).
Now the key point: This can be represented as a matrix multiplication (similar to fibonacci)
If a_n and b_n are the nth numbers in the sequence, then
[1 2]   [a_n]   [a_(n+1)]
[1 1] x [b_n] = [b_(n+1)]
which now gives us
[1 2]^n   [a_1]   [a_(n+1)]
[1 1]   x [b_1] = [b_(n+1)]
Thus if the 2x2 matrix is A, we need to compute A^n, which can be done by repeated squaring and uses only integer arithmetic (so you don't have to worry about precision issues).
Also note that the a/b you get will always be in reduced form (as gcd(a,b) = gcd(a+2b, a+b)), so if you are thinking of using a fraction class to represent the intermediate results, don't!
Since the nth denominator is like (1+sqrt(2))^n, to get 3 million digits you would likely need to compute up to the 3671656th term.
Note, even though you are looking for the ~3.6 millionth term, repeated squaring will allow you to compute the nth term in O(Log n) multiplications and additions.
Also, this can easily be made parallel, unlike the iterative ones like Newton-Raphson etc.
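For concreteness, here is a sketch of that repeated squaring with plain integer tuples (all names are illustrative):

def mat_mult(A, B):
    # 2x2 integer matrix product
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a*e + b*g, a*f + b*h),
            (c*e + d*g, c*f + d*h))

def mat_pow(A, n):
    # A**n by repeated squaring: O(log n) matrix multiplications
    result = ((1, 0), (0, 1))    # identity
    while n:
        if n & 1:
            result = mat_mult(result, A)
        A = mat_mult(A, A)
        n >>= 1
    return result

M = mat_pow(((1, 2), (1, 1)), 5)
a = M[0][0] + M[0][1]    # apply M to (a_1, b_1) = (1, 1)
b = M[1][0] + M[1][1]
print(a, b)              # 99 70, and 99/70 = 1.4142857...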
EDIT: I like this version better than the previous. It's a general solution that accepts both integers and decimal fractions; with n = 2 and precision = 100000, it takes about two minutes. Thanks to Paul McGuire for his suggestions & other suggestions welcome!
def sqrt_list(n, precision):
    ndigits = []          # break n into list of digits
    n_int = int(n)
    n_fraction = n - n_int

    while n_int:          # generate list of digits of integral part
        ndigits.append(n_int % 10)
        n_int //= 10
    if len(ndigits) % 2: ndigits.append(0)    # ndigits will be processed in groups of 2
    decimal_point_index = len(ndigits) // 2   # remember decimal point position

    while n_fraction:     # insert digits from fractional part
        n_fraction *= 10
        ndigits.insert(0, int(n_fraction))
        n_fraction -= int(n_fraction)
    if len(ndigits) % 2: ndigits.insert(0, 0) # ndigits will be processed in groups of 2

    rootlist = []
    root = carry = 0      # the algorithm
    while root == 0 or (len(rootlist) < precision and (ndigits or carry != 0)):
        carry = carry * 100
        if ndigits: carry += ndigits.pop() * 10 + ndigits.pop()
        x = 9
        while (20 * root + x) * x > carry:
            x -= 1
        carry -= (20 * root + x) * x
        root = root * 10 + x
        rootlist.append(x)
    return rootlist, decimal_point_index
As for arbitrary big numbers you could have a look at The GNU Multiple Precision Arithmetic Library (for C/C++).
For work? Use a library!
For fun? Good for you :)
Write a program to imitate what you would do with pencil and paper. Start with 1 digit, then 2 digits, then 3, ..., ...
Don't worry about Newton or anybody else. Just do it your way.
Here is a short version for calculating the square root of an integer a to digits of precision. It works by finding the integer square root of a after multiplying by 10 raised to the 2 x digits.
def sqroot(a, digits):
    a = a * (10**(2*digits))
    x_prev = 0
    x_next = 1 * (10**digits)
    while x_prev != x_next:
        x_prev = x_next
        x_next = (x_prev + (a // x_prev)) >> 1
    return x_next
Just a few caveats.
You'll need to convert the result to a string and add the decimal point at the correct location (if you want the decimal point printed).
Converting a very large integer to a string isn't very fast.
Dividing very large integers isn't very fast (in Python) either.
Depending on the performance of your system, it may take an hour or longer to calculate the square root of 2 to 3 million decimal places.
I haven't proven the loop will always terminate. It may oscillate between two values differing in the last digit. Or it may not.
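If the possible oscillation worries you, the stopping rule from the Newton's-method answers elsewhere on this page provably terminates. A sketch of sqroot rewritten that way:

def sqroot_safe(a, digits):
    if a == 0:
        return 0
    n = a * (10**(2*digits))
    x = 10**digits
    while x * x < n:       # make the starting guess >= sqrt(n)
        x <<= 1
    while True:
        y = (x + n // x) >> 1
        if y >= x:         # iterates strictly decrease until here
            return x
        x = y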
The nicest way is probably using the continued fraction expansion [1; 2, 2, ...] of the square root of two.
def root_two_cf_expansion():
    yield 1
    while True:
        yield 2

def z(a, b, c, d, contfrac):
    for x in contfrac:
        while a > 0 and b > 0 and c > 0 and d > 0:
            t = a // c
            t2 = b // d
            if not t == t2:
                break
            yield t
            a = (10 * (a - c*t))
            b = (10 * (b - d*t))
            # continue with same fraction, don't pull new x
        a, b = x*a+b, a
        c, d = x*c+d, c
    for digit in rdigits(a, c):
        yield digit

def rdigits(p, q):
    while p > 0:
        if p > q:
            d = p // q
            p = p - q * d
        else:
            d = (10 * p) // q
            p = 10 * p - q * d
        yield d

def decimal(contfrac):
    return z(1, 0, 0, 1, contfrac)
decimal(root_two_cf_expansion()) returns an iterator of all the decimal digits. t and t2 in the algorithm are the minimum and maximum values of the next digit; when they are equal, we output that digit.
Note that this does not handle certain exceptional cases such as negative numbers in the continued fraction.
(This code is an adaptation of Haskell code for handling continued fractions that has been floating around.)
Well, the following is the code that I wrote. It generated a million digits after the decimal point for the square root of 2 in about 60800 seconds for me, but my laptop was sleeping while the program ran, so it should be faster than that. You can try to generate 3 million digits, but it might take a couple of days.
def sqrt(number, digits_after_decimal=20):
    import time
    start = time.time()
    original_number = number
    number = str(number)
    list = []
    for a in range(len(number)):
        if number[a] == '.':
            decimal_point_locaiton = a
            break
        if a == len(number)-1:
            number += '.'
            decimal_point_locaiton = a+1
    if decimal_point_locaiton/2 != round(decimal_point_locaiton/2):
        number = '0'+number
        decimal_point_locaiton += 1
    if len(number)/2 != round(len(number)/2):
        number += '0'
    number = number[:decimal_point_locaiton]+number[decimal_point_locaiton+1:]
    decimal_point_ans = int((decimal_point_locaiton-2)/2)+1
    for a in range(0, len(number), 2):
        if number[a] != '0':
            list.append(eval(number[a:a+2]))
        else:
            try:
                list.append(eval(number[a+1]))
            except IndexError:
                pass
    p = 0
    c = list[0]
    x = 0
    ans = ''
    for a in range(len(list)):
        while c >= (20*p+x)*(x):
            x += 1
        y = (20*p+x-1)*(x-1)
        p = p*10+x-1
        ans += str(x-1)
        c -= y
        try:
            c = c*100+list[a+1]
        except IndexError:
            c = c*100
    while c != 0:
        x = 0
        while c >= (20*p+x)*(x):
            x += 1
        y = (20*p+x-1)*(x-1)
        p = p*10+x-1
        ans += str(x-1)
        c -= y
        c = c*100
        if len(ans)-decimal_point_ans >= digits_after_decimal:
            break
    ans = ans[:decimal_point_ans]+'.'+ans[decimal_point_ans:]
    total = time.time()-start
    return ans, total
Python already supports big integers out of the box, and if that's the only thing holding you back in C/C++ you can always write a quick container class yourself.
The only problem you've mentioned is a lack of big integers. If you don't want to use a library for that, then are you looking for help writing such a class?
Here's a more efficient integer square root function (in Python 3.x) that should terminate in all cases. It starts with a number much closer to the square root, so it takes fewer steps. Note that int.bit_length requires Python 3.1+. Error checking left out for brevity.
def isqrt(n):
    x = (n >> n.bit_length() // 2) + 1
    result = (x + n // x) // 2
    while abs(result - x) > 1:
        x = result
        result = (x + n // x) // 2
    while result * result > n:
        result -= 1
    return result