Finding digits in powers of 2 fast


The task is to search every power of two below 2^10000, returning the index of the first power in which a string is contained. For example if the given string to search for is "7" the program will output 15, as 2^15 is the first power to contain 7 in it.
I have approached this with a brute force attempt which times out on ~70% of test cases.
for i in range(1,9999):
    if search in str(2**i):
        print i
        break
How would one approach this with a time limit of 5 seconds?

Try not to compute 2^i at each step.
power = 1  # 2**0; named "power" to avoid shadowing the built-in pow
for i in xrange(10000):
    # at this point power == 2**i
    if search in str(power):
        print i
        break
    power *= 2
You can compute it as you go along. This should save a lot of computation time.
Using xrange will prevent a list from being built, but that will probably not make much of a difference here.
in is probably implemented as a quadratic string search algorithm. It may (or may not, you'd have to test) be more efficient to use something like KMP for string searching.

A faster approach could be computing the numbers directly in decimal
def double(x):
    carry = 0
    for i, v in enumerate(x):
        d = v*2 + carry
        if d > 99999999:
            x[i] = d - 100000000
            carry = 1
        else:
            x[i] = d
            carry = 0
    if carry:
        x.append(carry)
Then the search function can become
def p2find(s):
    x = [1]
    for y in xrange(10000):
        if s in str(x[-1])+"".join(("00000000"+str(limb))[-8:]
                                   for limb in x[::-1][1:]):
            return y
        double(x)
    return None
Note also that the digits of all powers of two up to 2^10000 add up to only about 15 million characters, and searching the static data is much faster. If the program does not have to be restarted each time, then
def p2find(s, digits = []):
    if len(digits) == 0:
        # This precomputation happens only ONCE
        p = 1
        for k in xrange(10000):
            digits.append(str(p))
            p *= 2
    for i, v in enumerate(digits):
        if s in v: return i
    return None
With this approach the first check will take some time, next ones will be very very fast.

Compute every power of two and build a suffix tree using each string. This is linear time in the size of all the strings. Now, the lookups are basically linear time in the length of each lookup string.
I don't think you can beat this for computational complexity.

There are only 10000 numbers. You don't need any complex algorithms. Simply calculate them in advance and search. This should take merely 1 or 2 seconds.
powers_of_2 = [str(1<<i) for i in range(10000)]
def search(s):
    for i in range(len(powers_of_2)):
        if s in powers_of_2[i]:
            return i

Try this
twos = []
twoslen = []
two = 1
for i in xrange(10000):
    twos.append(two)
    twoslen.append(len(str(two)))
    two *= 2
tens = []
ten = 1
for i in xrange(len(str(two))):
    tens.append(ten)
    ten *= 10
s = raw_input()
l = len(s)
n = int(s)
for i in xrange(len(twos)):
    for j in xrange(twoslen[i]):
        k = twos[i] / tens[j]
        if k < n: continue
        if (k - n) % tens[l] == 0:
            print i
            exit()
The idea is to precompute every power of 2 and of 10, and also the number of digits of every power of 2. The problem then reduces to finding the minimum i for which there exists a j such that, after removing the last j digits from 2 ** i, you obtain a number which ends with n. Expressed as a formula: (2 ** i / 10 ** j - n) % 10 ** len(str(n)) == 0, where / is integer division.
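As a rough illustration of that formula, here is a minimal Python 3 sketch (so integer division is written `//`). The helper names `contains_digits` and `first_power_containing` are made up for this example, and the sketch assumes the search string has no leading zeros, since `int(s)` would silently drop them. It is meant to show the arithmetic, not to be fast.

```python
def contains_digits(value, s):
    """Return True if the decimal digits of s occur inside value.

    Hypothetical helper for illustration; assumes s has no leading zeros.
    """
    n = int(s)
    modulus = 10 ** len(s)
    k = value
    # k is value with its last j digits removed (j = 0, 1, 2, ...)
    while k > 0 and k >= n:
        if (k - n) % modulus == 0:   # k ends with the digits of n
            return True
        k //= 10                     # drop one more trailing digit
    return False


def first_power_containing(s, limit=10000):
    """Return the smallest i < limit such that 2**i contains s, else None."""
    power = 1
    for i in range(limit):
        if contains_digits(power, s):
            return i
        power *= 2
    return None
```

For example, `first_power_containing("7")` walks the powers up to 32768 and returns 15, matching the example in the question.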

A big problem here is that converting a binary integer to decimal notation takes time quadratic in the number of bits (at least in the straightforward way Python does it). It's actually faster to fake your own decimal arithmetic, as #6502 did in his answer.
But it's very much faster to let Python's decimal module do it - at least under Python 3.3.2 (I don't know how much C acceleration is built in to Python decimal versions before that). Here's code:
class S:
    def __init__(self):
        import decimal
        decimal.getcontext().prec = 4000 # way more than enough for 2**10000
        p2 = decimal.Decimal(1)
        full = []
        for i in range(10000):
            s = "%s<%s>" % (p2, i)
            ##assert s == "%s<%s>" % (str(2**i), i)
            full.append(s)
            p2 *= 2
        self.full = "".join(full)

    def find(self, s):
        import re
        pat = s + r"[^<>]*<(\d+)>"
        m = re.search(pat, self.full)
        if m:
            return int(m.group(1))
        else:
            print(s, "not found!")
and sample usage:
>>> s = S()
>>> s.find("1")
0
>>> s.find("2")
1
>>> s.find("3")
5
>>> s.find("65")
16
>>> s.find("7")
15
>>> s.find("00000")
1491
>>> s.find("666")
157
>>> s.find("666666")
2269
>>> s.find("66666666")
66666666 not found!
s.full is a string with a bit over 15 million characters. It looks like this:
>>> print(s.full[:20], "...", s.full[-20:])
1<0>2<1>4<2>8<3>16<4 ... 52396298354688<9999>
So the string contains each power of 2, followed by its exponent enclosed in angle brackets. The find() method constructs a regular expression that searches for the desired substring, then looks ahead to find the exponent.
Playing around with this, I'm convinced that just about any way of searching is "fast enough". It's getting the decimal representations of the large powers that sucks up the vast bulk of the time. And the decimal module solves that one.

Related

generate unique string of length n without prefilled dictionary

I have an application that is kind of like a URL shortener and need to generate unique URL whenever a user requests.
For this I need a function to map an index/number to a unique string of length n with two requirements:
Two different numbers can not generate same string.
In other words as long as i,j<K: f(i) != f(j). K is the number of possible strings = 26^n. (26 is number of characters in English)
Two strings generated by number i and i+1 don't look similar most of the times. For example they are not abcdef1 and abcdef2. (So that users can not predict the pattern and the next IDs)
This is my current code in Python:
import itertools
n = 7
chars = "abcdefghijklmnopqrstuvwxyz"
for item in itertools.product(chars, repeat=n):
    print("".join(item))
# For n = 7 generates:
# aaaaaaa
# aaaaaab
# aaaaaac
# ...
The problem with this code is there is no index that I can use to generate unique strings on demand by tracking that index. For example generate 1 million unique strings today and 2 million tomorrow without looping through or collision with the first 1 million.
The other problem with this code is that the strings that are created after each other look very similar and I need them to look random.
One option is to populate a table/dictionary with millions of strings, shuffle them and keep track of index to that table but it takes a lot of memory.
An option is also to check the database of existing IDs after generating a random string to make sure it doesn't exist but the problem is as I get closer to the K (26^n) the chance of collision increases and it wouldn't be efficient to make a lot of check_if_exist queries against the database.
Also if n was long enough I could use UUID with small chance of collision but in my case n is 7.

I'm going to outline a solution for you that is going to resist casual inspection even by a knowledgeable person, though it probably IS NOT cryptographically secure.
First, your strings and numbers are in a one-to-one map. Here is some simple code for that.
alphabet = 'abcdefghijklmnopqrstuvwxyz'
len_of_codes = 7
char_to_pos = {}
for i in range(len(alphabet)):
    char_to_pos[alphabet[i]] = i
def number_to_string(n):
    chars = []
    for _ in range(len_of_codes):
        chars.append(alphabet[n % len(alphabet)])
        n = n // len(alphabet)
    return "".join(reversed(chars))
def string_to_number(s):
    n = 0
    for c in s:
        n = len(alphabet) * n + char_to_pos[c]
    return n
So now your problem is how to take an ascending stream of numbers and get an apparently random stream of numbers out of it instead. (Because you know how to turn those into strings.) Well, there are lots of tricks for primes, so let's find a decent sized prime that fits in the range that you want.
def is_prime (n):
    for i in range(2, n):
        if 0 == n%i:
            return False
        elif n < i*i:
            return True
    if n == 2:
        return True
    else:
        return False
def last_prime_before (n):
    for m in range(n-1, 1, -1):
        if is_prime(m):
            return m
print(last_prime_before(len(alphabet)**len_of_codes))
With this we find that we can use the prime 8031810103. That's how many numbers we'll be able to handle.
Now there is an easy way to scramble them. Which is to use the fact that multiplication modulo a prime scrambles the numbers in the range 1..(p-1).
def scramble1 (p, k, n):
    return (n*k) % p
Picking a random number to scramble by, int(random.random() * 26**7) happened to give me 3661807866, we get a sequence we can calculate with:
for i in range(1, 5):
    print(number_to_string(scramble1(8031810103, 3661807866, i)))
Which gives us
lwfdjoc
xskgtce
jopkctb
vkunmhd
This looks random to casual inspection. But will be reversible for any knowledgeable someone who puts modest effort in. They just have to guess the prime and algorithm that we used, look at 2 consecutive values to get the hidden parameter, then look at a couple of more to verify it.
Before addressing that, let's figure out how to take a string and get the number back. Thanks to Fermat's little theorem we know for p prime and 1 <= k < p that (k * k^(p-2)) % p == 1.
def n_pow_m_mod_k (n, m, k):
    answer = 1
    while 0 < m:
        if 1 == m % 2:
            answer = (answer * n) % k
        m = m // 2
        n = (n * n) % k
    return answer
print(n_pow_m_mod_k(3661807866, 8031810103-2, 8031810103))
This gives us 3319920713. Armed with that we can calculate scramble1(8031810103, 3319920713, string_to_number("vkunmhd")) to find out that vkunmhd came from 4.
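To make the scramble/unscramble round trip easier to check by hand, here is a small sketch with a deliberately tiny prime. The values `p = 101` and `k = 37` are arbitrary choices for illustration only, not the parameters used above. It also uses Python's built-in three-argument `pow`, which performs the same modular exponentiation as `n_pow_m_mod_k`.

```python
def scramble1(p, k, n):
    return (n * k) % p

p = 101                    # small prime, chosen only for illustration
k = 37                     # arbitrary multiplier with 1 <= k < p
k_inv = pow(k, p - 2, p)   # Fermat: k**(p-2) is the inverse of k modulo p

# Multiplying by k and then by k_inv gives back the original number.
scrambled = [scramble1(p, k, n) for n in range(1, 6)]
restored = [scramble1(p, k_inv, m) for m in scrambled]

# Multiplication by k permutes 1..p-1, so no two inputs collide.
all_outputs = sorted(scramble1(p, k, n) for n in range(1, p))
```

Because every value in 1..p-1 appears exactly once in `all_outputs`, distinct indices always map to distinct codes, which is the uniqueness property the question asks for.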
Now let's make it harder. Let's generate several keys to be scrambling with:
import random
p = 26**7
for i in range(5):
    p = last_prime_before(p)
    print((p, int(random.random() * p)))
When I ran this I happened to get:
(8031810103, 3661807866)
(8031810097, 3163265427)
(8031810091, 7069619503)
(8031809963, 6528177934)
(8031809917, 991731572)
Now let's scramble through several layers, working from smallest prime to largest (this requires reversing the sequence):
def encode (n):
    for p, k in [
            (8031809917, 991731572)
            , (8031809963, 6528177934)
            , (8031810091, 7069619503)
            , (8031810097, 3163265427)
            , (8031810103, 3661807866)
            ]:
        n = scramble1(p, k, n)
    return number_to_string(n)
This will give a sequence:
ehidzxf
shsifyl
gicmmcm
ofaroeg
And to reverse it just use the same trick that reversed the first scramble (reversing the primes so I am unscrambling in the order that I started with):
def decode (s):
    n = string_to_number(s)
    for p, k in [
            (8031810103, 3319920713)
            , (8031810097, 4707272543)
            , (8031810091, 5077139687)
            , (8031809963, 192273749)
            , (8031809917, 5986071506)
            ]:
        n = scramble1(p, k, n)
    return n
TO BE CLEAR I do NOT promise that this is cryptographically secure. I'm not a cryptographer, and I'm aware enough of my limitations that I know not to trust it.
But I do promise that you'll have a sequence of over 8 billion strings that you are able to encode/decode with no obvious patterns.
Now take this code, scramble the alphabet, regenerate the magic numbers that I used, and choose a different number of layers to go through. I promise you that I personally have absolutely no idea how someone would even approach the problem of figuring out the algorithm. (But then again I'm not a cryptographer. Maybe they have some techniques to try. I sure don't.)

How about :
from random import Random
n = 7
def f(i):
    myrandom = Random()
    myrandom.seed(i)
    alphabet = "123456789"
    return "".join([myrandom.choice(alphabet) for _ in range(7)])
# same entry, same output
assert f(0) == "7715987"
assert f(0) == "7715987"
assert f(0) == "7715987"
# different entry, different output
assert f(1) == "3252888"
(change the alphabet to match your need)
This "emulates" a UUID, since you said you could accept a small chance of collision. If you want to avoid collisions, what you really need is a perfect hash function (https://en.wikipedia.org/wiki/Perfect_hash_function).

You can try something based on the sha1 hash
#!/usr/bin/python3
import hashlib
def generate_link(i):
    n = 7
    # 26 letters + 10 digits = 36 symbols, matching the x%36 below
    a = "abcdefghijklmnopqrstuvwxyz0123456789"
    return "".join(a[x%36] for x in hashlib.sha1(str(i).encode('ascii')).digest()[-n:])

This is a really simple example of what I outlined in this comment. It just offsets the number based on i. If you want "different" strings, don't use this, because if num is 0, then you will get abcdefg (with n = 7).
alphabet = "abcdefghijklmnopqrstuvwxyz"
# num is the num to convert, i is the "offset"
def num_to_char(num, i):
    return alphabet[(num + i) % 26]
# generate the link
def generate_link(num, n):
    return "".join([num_to_char(num, i) for i in range(n)])
generate_link(0, 7) # "abcdefg"
generate_link(0, 7) # still "abcdefg"
generate_link(0, 7) # again, "abcdefg"!
generate_link(1, 7) # now it's "bcdefgh"!
You would just need to change the num + i to some complicated and obscure math equation.

How to make the length of a list "len(list)" a variable to be used for further calculations? Python3

Is it possible to make the result from len(factors) be assigned as a variable? What I have so far is h = int(len(factors)), however i'm not sure if this actually does anything. My code below is attempting to take an integer 'r' and represent 'r' in the form (2^k)*t+1. This part of the code below is dealing with finding this product of powers of two and some other odd integer (2^k)*t.
It could be that I am going about this the wrong way, but from my research and trial and error, I have finally got this to work so far. But now more issues arise when extracting certain values.
from math import *
def executeproth():
    r = input("Number to test:")
    n = int(r)-1
    d = 2
    factors = []
    while n % 2 == 0:
        factors.append(d)
        n = int(n/d)
        h = int(len(factors))
        print(n, factors, h)
        # k = eval(2**h)
    return factors
executeproth()
For example an input of 29 yields the following:
Number to test:29
14 [2] 1
7 [2, 2] 2
So in this instance, t=7, k=2, so we would have 29=(2^2)*7+1.
What I want to do is now take the third lines values, namely the '2', and use this for further calculations. But the commented out line # k = eval(2**h) throws the error as follows:
TypeError: eval() arg 1 must be a string, bytes or code object
So from what I can understand, the thing I am trying to evaluate is not in the correct form. I also wonder if the problem arises due to the nature of the while loop that keeps feeding values back in and creating multiples lists, as shown, and hence multiple values of h len(factors).
How would one print only the results of the 'final' iteration in the while loop? i.e. 7 [2,2] 2

This should fulfil your requirement; I don't think you really need to evaluate k.
It also addresses the second part of your question, printing only the final result of the loop.
As Gregory pointed out, convert explicitly to int only when needed. eval is for strings, and your expression was already in integer terms.
def executeproth():
    r = input("Number to test:")
    n = int(r) - 1
    d = 2
    factors = []
    h = 0  # stays 0 if the loop body never runs
    while n % 2 == 0:
        factors.append(d)
        n = n // d
        h = len(factors)
        #print(n, factors, h)
    else:
        # runs once, after the loop has finished
        print("{} = ( 2 ^ {} ) * {} + 1".format(r, h, n))
    return factors
executeproth()

First of all, you don't need to explicitly convert a value to an int just to use it in an expression in general. You do need it when processing the input since input() returns a string.
It is more idiomatic to use integer division a // b instead of int(a/b) in python 3.
Finally, eval is for evaluating strings, not expressions. Expressions are always evaluated.
from math import *
def executeproth():
    r = input("Number to test:")
    n = int(r)-1
    d = 2
    factors = []
    while n % 2 == 0:
        factors.append(d)
        n = n // d
        h = len(factors)
        print(n, factors, h)
        k = 2**h
        # does the same thing but is less efficient
        # k = eval("2**h")
    return factors
executeproth()

As others have said you don't need eval here. In fact, you should generally avoid using eval since it can be dangerous. And in most situations where you do need to evaluate an expression in string form you can generally get by with the much safer ast.literal_eval. However, at this stage of your learning it's unlikely that you will encounter many situations where you need to work with such advanced features of the language.
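As a small illustration of the difference, the sketch below shows that an ordinary expression needs no evaluation step, and that `ast.literal_eval` accepts plain literals in string form while rejecting anything that has to be computed. The variable names are made up for this example.

```python
import ast

h = 3
k = 2 ** h  # an ordinary expression: Python evaluates it directly, no eval needed

# literal_eval safely parses literals such as numbers, strings, lists and dicts
factors = ast.literal_eval("[2, 2, 2]")

# ...but it refuses input that is not a plain literal, such as "2**3"
try:
    ast.literal_eval("2**3")
    rejected = False
except ValueError:
    rejected = True
```

That refusal is what makes `literal_eval` safer than `eval`: a string coming from user input cannot trigger arbitrary computation.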
Anyway, here are a few more improvements to your code.
You don't need to import the math module since you aren't using any of the functions or constants defined in it. But when you do need to import a module it's best to avoid the from module_name import * form since that pollutes your namespace with all of the names defined in the module.
You don't need to store those 2s in a list - just count them.
It's better to do your input (and input validation) in the outer layers of your program rather than doing it deep in the functions that perform your calculations.
Python provides various augmented assignment operators that you can use when you want to perform a simple operation on a value and store the result back under the original name. Eg count += 1 adds 1 to count, saving the result in count.
Python allows you to return multiple objects as a tuple, so you can return the final value of n and the count of the number of factors of 2 that you've found.
def executeproth(r):
    n = r - 1
    count = 0
    if n != 0:
        while n % 2 == 0:
            count += 1
            n //= 2
    return n, count
r = int(input("Number to test: "))
n, count = executeproth(r)
k = 2 ** count
print("{0} = {1} * 2 ** {2} + 1".format(r, n, count))
#v = n*k + 1
#if v != r:
#    print("Error!")
The if n != 0: prevents infinite looping when r is 1: in that case n is 0, and since 0 % 2 == 0 and 0 // 2 == 0 the loop would never terminate.
I've also added a (commented-out) test at the end. It's a good idea to do simple tests like that to make sure we're getting what we expect. Writing useful tests is an important part of program development.
Typical output:
Number to test: 0
0 = -1 * 2 ** 0 + 1
Number to test: 29
29 = 7 * 2 ** 2 + 1
Number to test: 57
57 = 7 * 2 ** 3 + 1

Fastest way to generate number like 66666 when the number of digits is given

I have an interesting problem where I want to generate a big number (~30000 digits) but it has to be all identical digits, like 66666666666666.......
So far I have done this by:
def fillWithSixes(digits):
    result = 0
    for i in range(digits):
        result *= 10
        result += 6
    return result
However, this is very inefficient, and was wondering if there is any better way? Answer in cpp or java is okay too.
Edit:
Let's not just solve for 666666..... I want it to be generic for any number. How about 7777777777.... or 44444........ or 55555...?
String operations are worse: they increase the complexity from the current O(n) to O(n^2).

You may use the formula 666...666 = 6/9*(10**n-1), where n is the number of digits.
So, in Python, you would write that as
n = int(input())
a = 6 * (10**n - 1) // 9
print(a)
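Since the question's edit asks for any repeated digit, the same formula generalizes directly: 111...1 (n ones) equals (10**n - 1) // 9, and multiplying by the digit gives the repdigit. A minimal sketch, with `repdigit` as a hypothetical helper name:

```python
def repdigit(digit, n):
    """Return the integer consisting of n copies of digit (1 to 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("digit must be between 1 and 9")
    if n < 0:
        raise ValueError("n must be non-negative")
    # (10**n - 1) is n nines; it is divisible by 9, so // is exact here
    return digit * (10 ** n - 1) // 9
```

For instance, `repdigit(6, 5)` gives 66666 and `repdigit(7, 3)` gives 777.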

You can use ljust or rjust:
number = 6
amount_of_times_to_repeat = 30000
big_number = int("".ljust(amount_of_times_to_repeat, str(number)))
print big_number
In one single line:
print int("".ljust(30000, str(6)))
Or:
new_number = int("".ljust(30000, str(6)))

The fastest method to generate such numbers with 100000+ digits is decimal.Decimal():
from decimal import Decimal as D
d = D('6' * n)
Measurements show that 6 * (10**n - 1) // 9 is O(n*log n) while D('6' * n) is O(n). Though for small n (less than ~10000), the former can be faster.
Decimal's internal representation stores decimal digits directly. If you need to print the numbers later, str(Decimal) is much faster than str(int).

Memory management for python scripts

So I'm trying to solve some problems from Project Euler in Python. I'm currently working on Problem 92, square digit chains. Basically the idea is that if you take any integer and repeatedly replace it with the sum of the squares of its digits (e.g. 42 -> 4^2 + 2^2 = 20, then 2^2 + 0^2 = 4, etc.), you always end up either at 1 or 89.
I am trying to write a program that can compute how many numbers, in a range 1 to 10^K, will end up at 89 and how many will end up at 1. I am not trying to store which integers end up where, only how many. The goal is to be able to do that for the largest K possible. (This is a challenge from Hackerrank for those curious).
In order to do this for large numbers within my lifetime, I need to use caching. But then that's a balancing act between caching (which eventually takes up lots of RAM) and computing time.
My problem is that I eventually run out of memory. So I have tried to cap the length of the cache that I am using. However, I still run out of memory. I cannot seem to be able to find what is causing me to run out of memory.
I am running it on pycharm on ubuntu 14.04 LTS.
My question:
Is there a way to check what is taking up my RAM? Is there some tool (or script) that can allow me to basically monitor memory use by variables within my program? Or am I wrong in assuming that if I run out of RAM, it is necessarily because some variable in my program is too large? I have to admit I am not all that clear on the fine details of memory use within a program....
EDIT: I run out of memory when K = 8, so for integers up to 10^8, which is not so large. Also, I did testing below 10^8 (so 10^7, which terminates but takes some time and uses more memory than smaller computations). And it doesn't seem that capping my cache size variables makes a difference.....

I would suggest testing various cache sizes to see if it is actually beneficial to have as large a cache as possible.
If you take any 10-digit number and compute the sum of squares of its digits, the sum will be at most 10*9*9 = 810. Thus, if you cache the result for numbers 1 to 810, then you should be able to process all numbers with between 4 and 10 digits without recursion.
In this way, I have processed the first 10^8 numbers in around 6 minutes with memory usage staying constant at roughly 10 MB.
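The approach described above can be sketched roughly as follows. This is only an illustration under the stated idea (a small cache covering every possible digit-square sum, then one lookup per number); the function names are made up here and this is not the answerer's actual code.

```python
def digit_square_sum(n):
    """Sum of the squares of the decimal digits of n."""
    total = 0
    while n:
        n, d = divmod(n, 10)
        total += d * d
    return total


def build_cache(max_sum):
    """Map every value 1..max_sum to True if its chain arrives at 89."""
    cache = {}
    for start in range(1, max_sum + 1):
        n = start
        while n != 1 and n != 89:
            n = digit_square_sum(n)
        cache[start] = (n == 89)
    return cache


def count_arriving_at_89(limit):
    """Count the numbers in 1..limit-1 whose chain arrives at 89."""
    # A number below limit has at most len(str(limit - 1)) digits,
    # so its digit-square sum is at most 81 per digit.
    max_sum = 81 * len(str(limit - 1))
    cache = build_cache(max_sum)
    return sum(1 for i in range(1, limit) if cache[digit_square_sum(i)])
```

The cache never grows beyond a few hundred entries, which is why memory use stays flat regardless of the limit; only the running time scales with it.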

This is a variation of Mathias Rav's excellent idea but keeps your idea of using a recursive function with memoization. The idea is to use a helper function to do the heavy lifting and have the main function just do the first step of the iteration. The very first step reduces the problem size to one for which caching is useful. The cache remains small. I was able to do all numbers up to 10**8 in about 10 minutes (the overhead due to the recursion makes this solution less efficient than Mathias' solution):
cache = {}
def helper(n):
    if n == 1 or n == 89:
        return n
    elif n in cache:
        return cache[n]
    else:
        ss = sum(int(d)**2 for d in str(n))
        v = helper(ss)
        cache[n] = v
        return v
def f(n):
    ss = sum(int(d)**2 for d in str(n))
    return helper(ss)
def freq89(n):
    total = 0
    for i in range(1,n+1):
        if f(i) == 89: total += 1
    return total/n

This is an extended comment on the answers by Mathias Rav and John Coleman. I was going to make this a community wiki answer. John Coleman said not to do so, so I'm not.
I'll start with John Coleman's answer.
cache = {}
def helper(n):
    if n == 1 or n == 89:
        return n
    elif n in cache:
        return cache[n]
    else:
        ss = sum(int(d)**2 for d in str(n))
        v = helper(ss)
        cache[n] = v
        return v
def f(n):
    ss = sum(int(d)**2 for d in str(n))
    return helper(ss)
A small thing that will speed things up a bit is to avoid that first if in helper(n) by initializing cache to {1:some_value, 89:some_other_value}. The obvious initialization is {1:1, 89:89}. A less obvious, but ultimately faster initialization is {1:False, 89:True}. This enables changing if f(i) == 89: total += 1 to if f(i): total += 1.
Another small thing that might help is to get rid of the recursion. That's not the case here. To get rid of the recursion, we'd have to do something along the lines of
def helper(n):
    l = []
    while n not in cache :
        l.append(n)
        n = sum(int(d)**2 for d in str(n))
    v = cache[n]
    for k in l :
        cache[k] = v
    return v
The problem is that almost all of the numbers encountered by f(n) will already be in the cache thanks to how helper is called from f(n). Getting rid of the recursion needlessly creates an empty list that needs to be garbage collected.
The big issue with John Coleman's answer is the calculation of the sum of the square of the digits via sum(int(d)**2 for d in str(n)). While very pythonic, this is extremely expensive. I'll start by changing the variable ss in helper and in f into a function:
def ss(n):
    return sum(int(d)**2 for d in str(n))
This alone does nothing for performance. In fact, it hurts performance. Function calls are expensive in python. By making this a function, we can do some non-pythonic things by replacing the string operations with integer arithmetic:
def ss(n):
    s = 0
    while n != 0:
        d = n % 10
        n = n // 10
        s += d**2
    return s
The speedup here is quite significant; I get a 30% reduction in computation time. That's not great. There's another problem, the use of the exponentiation operator. In almost any language but Fortran and Matlab, using d*d is much faster than is d**2. That's certainly the case in python. That simple change almost halves the execution time from that already significant 30% reduction.
Putting this all together yields
cache = {1:False, 89:True}
def ss (n):
    s = 0
    while n != 0:
        d = n % 10
        n = n // 10
        s += d*d
    return s
def helper(n):
    if n in cache:
        return cache[n]
    else:
        v = helper(ss(n))
        cache[n] = v
        return v
def f(n):
    return helper(ss(n))
def freq89(n):
    total = 0
    for i in range(1,n+1):
        if f(i): total += 1
    return total/n
print (freq89(int(1e7)))
I have yet to take advantage of Mathias Rav's answer. In this case, it will make sense to get rid of the recursion. It will also help to embed the loop over the initial range inside of the function that initializes the cache (function calls are expensive in python).
N = int(1e7)
cache = {1:False, 89:True}
def ss(n):
    s = 0
    while n != 0:
        d = n % 10
        n //= 10
        s += d*d
    return s
def initialize_cache(maxsum):
    for n in range(1,maxsum+1):
        l = []
        while n not in cache:
            l.append(n)
            n = ss(n)
        v = cache[n]
        for k in l:
            cache[k] = v
def freq89(n):
    total = 0
    for i in range(1,n):
        if cache[ss(i)]:
            total += 1
    return total/n
maxsum = 81*len(str(N-1))
initialize_cache(maxsum)
print (freq89(N))
The above takes about 16.5 seconds (on my computer) to calculate the ratio for numbers between 1 (inclusive) and 10000000 (exclusive). This is almost three times faster than the initial version (44.7 seconds). It takes a bit over three minutes for the above to calculate the ratio for numbers between 1 (inclusive) and 1e8 (exclusive).
It turns out I'm not done. There's no need to calculate the sum of the squares of the digits of (for example) 12345679 digit by digit when the program just did that for 12345678. A shortcut that reduces the calculation time for nine out of ten use cases pays off. The function ss(n) becomes a bit more complex:
prevn = 0
prevd = 0
prevs = 0
def ss(n):
    global prevn, prevd, prevs
    d = n % 10
    if (n == prevn+1) and (d == prevd+1):
        s = prevs + 2*prevd + 1
        prevs = s
        prevn = n
        prevd = d
        return s
    s = 0
    prevn = n
    prevd = d
    while n != 0:
        d = n % 10
        n //= 10
        s += d*d
    prevs = s
    return s
With this, calculating the ratio for numbers up to (but not including) 1e7 takes 6.6 seconds, 68 seconds for numbers up to but not including 1e8.

Project Euler #25: Keep getting Overflow error (result to large) - is it to do with calculating fibonacci number?

I'm working on solving the Project Euler problem 25:
What is the first term in the Fibonacci sequence to contain 1000 digits?
My piece of code works for smaller digits, but when I try a 1000 digits, i get the error:
OverflowError: (34, 'Result too large')
I'm thinking it may be on how I compute the fibonacci numbers, but i've tried several different methods, yet i get the same error.
Here's my code:
'''
What is the first term in the Fibonacci sequence to contain 1000 digits
'''
def fibonacci(n):
    phi = (1 + pow(5, 0.5))/2 #Golden Ratio
    return int((pow(phi, n) - pow(-phi, -n))/pow(5, 0.5)) #Formula: http://bit.ly/qDumIg
n = 0
while len(str(fibonacci(n))) < 1000:
    n += 1
print n
Do you know what may be the cause of this problem and how I could alter my code to avoid it?
Thanks in advance.

The problem here is that only integers in Python have unlimited length; floating point values are still calculated using normal IEEE types, which have a limited range and precision.
As such, since you're using an approximation, using floating point calculations, you will get that problem eventually.
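To see where the closed-form approach breaks down, here is a minimal sketch. The exponents 1000 and 1500 are arbitrary values picked for illustration: the first still fits in an IEEE 754 double, the second does not, and both are far short of what a 1000-digit result would need.

```python
import sys

phi = (1 + 5 ** 0.5) / 2

largest_float = sys.float_info.max   # about 1.8e308 for IEEE 754 doubles
fits = phi ** 1000                   # on the order of 1e209, still representable

try:
    phi ** 1500                      # on the order of 1e313, too large for a double
    overflowed = False
except OverflowError:
    overflowed = True
```

A float tops out at roughly 308 decimal digits of magnitude, so any computation that passes through a float fails long before the result has 1000 digits.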
Instead, try calculating the Fibonacci sequence the normal way, one number (of the sequence) at a time, until you get to 1000 digits.
ie. calculate 1, 1, 2, 3, 5, 8, 13, 21, 34, etc.
By "normal way" I mean this:
         / 1                   , n < 3
Fib(n) = |
         \ Fib(n-2) + Fib(n-1) , n >= 3
Note that the "obvious" approach given the above formulas is wrong for this particular problem, so I'll post the code for the wrong approach just to make sure you don't waste time on that:
def fib(n):
    if n <= 3:
        return 1
    else:
        return fib(n-2) + fib(n-1)
n = 1
while True:
    f = fib(n)
    if len(str(f)) >= 1000:
        print("#%d: %d" % (n, f))
        exit()
    n += 1
On my machine, the above code starts going really slow at around the 30th fibonacci number, which is still only 6 digits long.
I modified the above recursive approach to output the number of calls to the fib function for each number, and here are some values:
#1: 1
#10: 67
#20: 8361
#30: 1028457
#40: 126491971
I can reveal that the first Fibonacci number with 1000 digits or more is the 4782nd number in the sequence (unless I miscalculated), and so the number of calls to the fib function in a recursive approach will be this number:
1322674645678488041058897524122997677251644370815418243017081997189365809170617080397240798694660940801306561333081985620826547131665853835988797427277436460008943552826302292637818371178869541946923675172160637882073812751617637975578859252434733232523159781720738111111789465039097802080315208597093485915332193691618926042255999185137115272769380924184682248184802491822233335279409301171526953109189313629293841597087510083986945111011402314286581478579689377521790151499066261906574161869200410684653808796432685809284286820053164879192557959922333112075826828349513158137604336674826721837135875890203904247933489561158950800113876836884059588285713810502973052057892127879455668391150708346800909439629659013173202984026200937561704281672042219641720514989818775239313026728787980474579564685426847905299010548673623281580547481750413205269166454195584292461766536845931986460985315260676689935535552432994592033224633385680958613360375475217820675316245314150525244440638913595353267694721961
And that is just for the 4782nd number. The actual value is the sum of all those values for all the Fibonacci numbers from 1 up to 4782. There is no way this will ever complete.
In fact, if we would give the code 1 year of running time (simplified as 365 days), and assuming that the machine could make 10.000.000.000 calls every second, the algorithm would get as far as to the 83rd number, which is still only 18 digits long.

Actually, although the advice given above to avoid floating-point numbers is generally good advice for Project Euler problems, in this case it is incorrect. Fibonacci numbers are closely approximated by the formula F_n = phi^n / sqrt(5) (the exact value is the nearest integer), so the first Fibonacci number with a thousand digits is the first one satisfying 10^999 < phi^n / sqrt(5). Taking the logarithm to base ten of both sides -- recall that sqrt(5) is the same as 5^(1/2) -- gives 999 < n log_10(phi) - 1/2 log_10(5), and solving for n gives (999 + 1/2 log_10(5)) / log_10(phi) < n. The left-hand side of that inequality evaluates to 4781.85927, so the smallest n that gives a thousand digits is 4782.
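The arithmetic in that derivation is small enough to check directly with ordinary floats, since only logarithms are involved and nothing overflows. A minimal sketch, with variable names chosen here for illustration:

```python
import math

phi = (1 + math.sqrt(5)) / 2
digits = 1000

# (999 + 1/2 * log10(5)) / log10(phi) < n
bound = (digits - 1 + 0.5 * math.log10(5)) / math.log10(phi)

# the smallest integer n strictly above the bound
first_index = math.ceil(bound)
```

Here `bound` comes out at about 4781.859 and `first_index` is 4782, in agreement with the value stated above.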

You can use the sliding window trick to compute the terms of the Fibonacci sequence iteratively, rather than using the closed form (or doing it recursively as it's normally defined).
The Python version for finding fib(n) is as follows:
def fib(n):
    a = 1
    b = 1
    for i in range(2, n):
        b = a + b
        a = b - a
    return b
This works when F(1) is defined as 1, as it is in Project Euler 25.
I won't give the exact solution to the problem here, but the code above can be reworked so it keeps track of n until a sentry value (10**999) is reached.
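One possible shape for that reworking is sketched below. The function name `first_fib_with_digits` and the tuple-swap form of the sliding window are choices made for this illustration rather than part of the answer above; the sentry is 10**(digits - 1), the smallest number with the requested number of digits.

```python
def first_fib_with_digits(digits):
    """Return the index n of the first Fibonacci number with at least `digits` digits.

    Uses F(1) = F(2) = 1, as in Project Euler 25.
    """
    threshold = 10 ** (digits - 1)   # smallest number with that many digits
    a, b = 1, 1                      # a = F(n), b = F(n + 1)
    n = 1
    while a < threshold:
        a, b = b, a + b              # slide the window one term forward
        n += 1
    return n
```

Comparing integers against the threshold avoids converting each term to a string, and every value stays an exact Python integer throughout.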

An iterative solution such as this one has no trouble executing. I get the answer in less than a second.
def fibonacci():
    current = 0
    previous = 1
    while True:
        temp = current
        current = current + previous
        previous = temp
        yield current
def main():
    for index, element in enumerate(fibonacci()):
        if len(str(element)) >= 1000:
            answer = index + 1 #starts from 0
            break
    print(answer)

import math as m
import time
start = time.time()
fib0 = 0
fib1 = 1
n = 0
k = 0
count = 1
while k<1000 :
    n = fib0 + fib1
    k = int(m.log10(n))+1
    fib0 = fib1
    fib1 = n
    count += 1
print n
print count
print time.time()-start
This takes 0.005388 s on my PC. I did nothing fancy, just followed simple code.
Iteration will always be better. Recursion was taking too long for me as well.
I also used a math function to calculate the number of digits in a number, instead of putting the digits in a list and iterating through it. That saves a lot of time.

Here is my very simple solution
list = [1,1,2]
for i in range(2,5000):
    if len(str(list[i]+list[i-1])) == 1000:
        print (i + 2)
        break
    else:
        list.append(list[i]+list[i-1])
This is sort of a "rogue" way of doing it, but if you change the 1000 to any number except one, it gets it right.

You can use the datatype Decimal. This is a little bit slower but you will be able to have arbitrary precision.
So your code:
'''
What is the first term in the Fibonacci sequence to contain 1000 digits
'''
from decimal import *
# the default precision of 28 digits is far too small for a 1000-digit result
getcontext().prec = 1100
def fibonacci(n):
    phi = (Decimal(1) + pow(Decimal(5), Decimal(0.5))) / 2 #Golden Ratio
    return int((pow(phi, Decimal(n)) - pow(-phi, Decimal(-n)))/pow(Decimal(5), Decimal(0.5)))
n = 0
while len(str(fibonacci(n))) < 1000:
    n += 1
print(n)
