How to further optimize calculating all the cross sums? - python

I had some spare time yesterday and somehow thought about calculating cross sums.
My goal is to calculate all the sums up to a given number n. Don't ask why - it's just for fun and to learn stuff.
So for n = 11 I want my result to look something like this: [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2]
This is my code:
def dynamicCheckSumList(upperLimit):
    dynamicChecksumList = []
    for i in range(0, 10):
        dynamicChecksumList.append(i)
    for i in range(10, upperLimit+1):
        length = getIntegerPlaces(i)
        size = 10**(length-1)
        firstNumber = i // size
        ancestor = i-(firstNumber*size)
        newChecksum = firstNumber + dynamicChecksumList[ancestor]
        dynamicChecksumList.append(newChecksum)
    return dynamicChecksumList
At first I create my empty list and then populate the numbers 0-9 with their respective trivial sums.
Then I look at all numbers above 9 until the upper limit. Get their length. I then continue with finding out the first digit of the number. After that I calculate the number without that leading digit. For example: If my i is 5432 I will get 432. Since I already saved the cross sum for 432 I can just add that cross sum to my leading digit and I'm basically done.
import math

def getIntegerPlaces(theNumber):
    if theNumber <= 999999999999997:
        return int(math.log10(theNumber)) + 1
    else:
        counter = 15
        while theNumber >= 10**counter:
            counter += 1
        return counter
The second function is something I found here, in a question from someone asking how to calculate the number of digits in a given number.
Is there any way in here (I guess there will be) to speed up things?
Also appreciated would be tips on how to save on memory. Just for fun I tried to set n to 1 billion. And my memory (16GB) kind of exploded ;)

def digitSums2(n):
    n = (n + 9) // 10 * 10 # round up to a multiple of 10
    result = bytearray(range(10))
    for decade in range(1, n//10):
        r_decade = result[decade]
        for digit in range(10):
            result.append(r_decade + digit)
    return result
There are two primary differences:
bytearray uses a single byte per calculated value, which saves a lot of memory. It only allows values up to 255, but that is sufficient for numbers with up to 28 digits, whose digit sum is at most 9 * 28 = 252.
Peeling off the last digit is much easier than peeling off the first one.
This should be about as fast as possible in Python. Be careful with printing results, since it can take more time than the calculation itself (especially if you do in-memory copies).
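As a rough sketch of how the idea above could be used when you want exactly the sums for 0..n: the name digit_sums and the in-place trimming are my own assumptions for illustration, not part of the function above.

```python
def digit_sums(n):
    """Return a bytearray whose entry i is the digit sum of i, for 0 <= i <= n.

    Hypothetical helper; assumes n has at most 28 digits so every sum fits in a byte.
    """
    result = bytearray(range(10))
    # Extend one block of ten at a time: the digit sum of 10*prefix + digit
    # is the already stored digit sum of prefix, plus digit.
    for prefix in range(1, n // 10 + 1):
        base = result[prefix]
        for digit in range(10):
            result.append(base + digit)
    # Trim in place; slicing with result[:n + 1] would copy the whole array.
    del result[n + 1:]
    return result


print(list(digit_sums(11)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2]
```

Entry 0 is included so that the index equals the number; drop it if you want the list to start at 1, as in the question.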

Related

calculating the sum of digits in a list in python

I am new to math problems in python but basically i have the following code:
a = list(range(1,10000))
str(a)
sum_of_digits = sum(int(digit) for digit in str(a[9998]))
print(sum_of_digits)
This allows me to calculate the sum of the digits of a given number in the list a. But instead of feeding numbers into this one by one, I want an efficient way to calculate the sum of the digits of all the numbers in a and print them all out at once. I can't seem to figure out a solution, but I know the answer is probably simple. Any help is appreciated!
Edit: I didn't know this post would get this much attention. For those wanting more clarification, I basically want to know which numbers in the range 1 to 9999 have a digit sum of 34 or more. I think everyone thought I simply wanted to take the sum of digits of each list element and then compile a total sum. In any case, that method helped me solve the actual problem.
A good, straightforward way to do this is to use the modulo operator %, along with floor division //:
total_sum = 0
for num in a:
    sum_of_digits = 0
    while (num != 0):
        sum_of_digits = sum_of_digits + (num % 10)
        num = num//10
    total_sum = total_sum + sum_of_digits
print(total_sum)
Here, the expression num % 10 returns the remainder of dividing num by 10, or in other words, the digit in the units place of that number. The while loop repeatedly adds the digit in the units place to the running sum, then divides the number by 10 to drop that digit.
Note that // (floor division) is important here, as it discards the fractional part of the quotient, so num stays an integer and the loop ends once it reaches 0.
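Note: This solution works on the integers directly and avoids creating a string for every number, unlike any algorithm which relies on str(). If speed matters, time both approaches on your interpreter.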
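For the clarified goal in the edit (which numbers from 1 to 9999 have a digit sum of 34 or more), the same modulo idea can be wrapped in a function and used as a filter. This is only a sketch; the names digit_sum and matches are made up for illustration.

```python
def digit_sum(num):
    """Sum the decimal digits of a non-negative integer."""
    total = 0
    while num:
        # divmod splits off the units digit in one step.
        num, digit = divmod(num, 10)
        total += digit
    return total


# Keep only the numbers whose digit sum reaches the threshold from the question.
matches = [n for n in range(1, 10000) if digit_sum(n) >= 34]
print(matches)
```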
i want an efficient way to calculate the sum of the digits of all the numbers in a
If you truly want an efficient way, do not calculate the digit sum of each individual number. Instead, calculate the total digit sum of the entire range at once (see note 1 below).
For example, in the range up to and including 123, we do not have to write out all the individual numbers to see that the last digit will cycle through the numbers 1-9 a total of 12 times, plus the numbers 1-3 once. The middle digit cycles through 1-9 once, showing each 10 times, and then another 10 times 1 and 4 times 2. And for the first digit, only the 1 appears 24 times. Thus, the total is 45*12 + 1+2+3 + 45*10 + 10 + 8 + 24 = 1038.
You can put this into a recursive formula using "a bit" of modulo magic.
def dsum(n, f=1, p=1):
    if n:
        d, r = divmod(n, 10)
        k = (45*d + sum(range(r)))*f + r*p
        return dsum(d, f*10, p + f*r) + k
    return 0
This yields the same results as the "naive" approach, but with a running time of O(log n) instead of O(n) it can be used to calculate the digit sum of ridiculously large ranges of numbers.
>>> n = 1234567
>>> sum(int(c) for i in range(1, n+1) for c in str(i))
32556016
>>> dsum(n)
32556016
>>> dsum(12345678901234567890)
1047782339654778234045
1) This is assuming your list is always a range of numbers starting at 1 up to some upper bound, although this would also work for a range not starting at 1 by calculating the digit sum for the upper bound and then subtracting the digit sum for the lower bound. If the list is not a range, then there's no way around calculating the digit sum for all the individual numbers, though.
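The subtraction mentioned in the note could look roughly like this. range_digit_sum is a hypothetical helper name, and it assumes 1 <= low <= high; dsum is repeated so the snippet runs on its own.

```python
def dsum(n, f=1, p=1):
    """Total of all digits of the numbers 1..n (same recursion as above)."""
    if n:
        d, r = divmod(n, 10)
        k = (45 * d + sum(range(r))) * f + r * p
        return dsum(d, f * 10, p + f * r) + k
    return 0


def range_digit_sum(low, high):
    """Total of all digits of the numbers low..high inclusive (assumes 1 <= low <= high)."""
    # Everything up to high, minus everything strictly below low.
    return dsum(high) - dsum(low - 1)


print(dsum(123))                 # 1038, matching the worked example
print(range_digit_sum(57, 1234))
```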
Try this:
sum(int(i) for j in range(1,10000) for i in str(j))
It gives the same result, but runs more slowly:
lst = []
for j in range(10000):
    for i in str(j):
        lst.append(int(i))
print(sum(lst))

Euler Project problem #12 Python code gives weird results

I was trying to solve problem number 12 of Project Euler. This is the problem:
The sequence of triangle numbers is generated by adding the natural
numbers. So the 7th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 + 7
= 28. The first ten terms would be:
1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...
Let us list the factors of the first seven triangle numbers:
1: 1
3: 1,3
6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28
We can see that 28 is the first triangle number to have over five
divisors.
What is the value of the first triangle number to have over five
hundred divisors?
I defined two functions to do the job:
1) allfacof(x): This gives us all the factors of a given number in list form. Example: allfacof(10) gives us [1, 2, 5, 10]
2) TriangularNo(x): This gives us the nth triangular number. Example: TriangularNo(5) gives us 15
Here is the complete code which I wrote:
facs=[]
def allfacof(x):
    for i in range(1,int(x/2)+1):
        if x%i==0:
            facs.append(i)
        else:
            pass
    facs.append(x)
    return(facs)
def TriangularNo(x):
    no=0
    for i in range(1,x+1):
        no=no+i
    return(no)
a=0 # a will tell us the number of iterations
while True:
    a+=1
    N=TriangularNo(a)
    length=(len(allfacof(N)))
    if int(length)>=500:
        print(N)
        break
    else:
        pass
When I run this code I get 1378 as the output which is clearly wrong because len(allfacof(1378)) turns out to be 8 and not 500 as demanded in the question.
Notice that in the while loop I use if int(length)>=500:, so this means that when my code runs, length somehow gets a value of at least 500, but when I run the function separately it says that its length is 8.
I am just not able to find out the error. Please help me
The problem is that you are using facs as a global variable and only ever appending to it. You should make it a local variable inside allfacof() so that it starts out empty for each value.
If you look into facs then you will find it equals
1, 1, 3, 1, 2, 3, 6, 1, 2, 5, 10 ...
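The fix described above can be sketched like this. It keeps the question's function name and only moves the list inside the function; using x // 2 instead of int(x/2) is a small liberty of mine.

```python
def allfacof(x):
    # A fresh list on every call, so factors of earlier numbers do not pile up.
    facs = []
    for i in range(1, x // 2 + 1):
        if x % i == 0:
            facs.append(i)
    facs.append(x)
    return facs


print(allfacof(10))         # [1, 2, 5, 10]
print(len(allfacof(1378)))  # 8
```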
Although moving facs into allfacof() solves your immediate problem, the next problem with this code is performance. Let's consider triangle number generation first. The optimization that @Voo suggests:
def TriangularNo(n):
    return n * (n + 1) // 2
is fine if we're looking for arbitrary triangle numbers -- but we're not. We're looking for sequential triangle numbers, so in this case the formula slows down our code! When going sequentially, you only need do a couple of additions to get the next triangle number -- but using the formula, you need to do an addition, a multiplication and a division! More expensive if you're going sequentially. Since we are going sequentially, this seems a perfect use of a Python generator:
def triangular_number_generator():
    triangle = number = 1
    while True:
        yield triangle
        number += 1
        triangle += number
Which makes clear the two additions needed to get to the next triangle number. Now let's consider your factorization function:
Your factorization function loses performance in the way that it produces factors in order. But we're only concerned with the number of factors, order doesn't matter. So when we factor 28, we can add 1 and 28 to the factors list at the same time. Ditto 2 and 14 -- making 14 our new upper limit. Similarly 4 and 7 where 7 becomes the new upper limit. So we collect factors faster and quickly reduce the upper limit that we need to check. Here's the rest of the code:
def factors_of(number):
    divisor = 1
    limit = number
    factors = []
    while divisor <= limit:
        if number % divisor == 0:
            factors.append(divisor)
            remainder = number // divisor
            if remainder != divisor:
                factors.append(remainder)
            limit = remainder - 1
        divisor += 1
    return factors
triangular = triangular_number_generator()
number = next(triangular)
factors = factors_of(number)
while len(factors) <= 200:
    number = next(triangular)
    factors = factors_of(number)
print(number)
How does it compare? If we run your fixed code with a lower limit of > 200 factors, it takes about a minute to come up with the answer (2031120). The above code takes about 1/3 of a second. Now consider how long it'll take both to reach > 500 factors. Finally, to meet the stated goal:
What is the value of the first triangle number to have over five
hundred divisors?
this comparison in your original code:
if int(length)>=500:
would instead need to be:
if length > 500:
Though the way the count of factors jumps, it makes no difference for 500. But for smaller limits, for testing, it can make a difference.
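Since only the number of factors matters, one possible variation is to count divisor pairs up to the square root without building a list at all. This is a sketch of that alternative, not the code timed above; count_divisors and first_triangle_with_more_than are hypothetical names, and math.isqrt needs Python 3.8 or later.

```python
from itertools import accumulate, count
from math import isqrt


def count_divisors(number):
    """Count the divisors of a positive integer by pairing d with number // d."""
    total = 0
    for d in range(1, isqrt(number) + 1):
        if number % d == 0:
            # A perfect-square root pairs with itself and counts once.
            total += 1 if d * d == number else 2
    return total


def first_triangle_with_more_than(limit):
    """Return the first triangle number with strictly more than `limit` divisors."""
    # accumulate(count(1)) lazily yields the running sums 1, 3, 6, 10, ...
    for triangle in accumulate(count(1)):
        if count_divisors(triangle) > limit:
            return triangle


print(first_triangle_with_more_than(5))  # 28, as in the problem statement
```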

Counting how many times I used one equation until it reaches a specific result in python

import math
# input
x=int(input("Put a number here:"))
# processing
num1=int(math.sqrt(x))
num2=round(num1,0)
num3=num2**2
remaining=x-num3
# output
print("The remaining is:",remaining)
I made this code to get what remains after subtracting the largest perfect square from any "int" number. Now I want to improve it so that it keeps applying the same equation to the last answer (stored in "remaining") until that answer is "0", and stops calculating after that.
Then, after it finishes calculating, I want to count how many times I used the equation until it reached "0".
I know this is hard to understand, but I'll try with an example:
For the number 87, the remaining will be 6 in this code because 87-(9²)=6. I want to take this result (6) and apply the same equation, so the result will be (2) because 6-(2²)=2, then do it again to get (1), then stop once it returns (0).
After that, I want to count how many times the equation was used to reach (0). In this example it would be 4: (87 6)(6 2)(2 1)(1 0). And I want to print that count, in this case (4).
I know it is a lot to ask for help with this (it's a big request), but I only started programming 10 days ago and I couldn't find what I wanted anywhere else on the internet. Thanks for the help. Also, if there is any way to make my original code better, please tell me.
I think you need something like this:
def count_squares(x):
    count = 0
    remaining = 1
    while remaining:
        min_square = (int(x**0.5) // 1) **2
        remaining = x - min_square
        count +=1
        print('X = {}, remaining = {}, count = {}'.format(x, remaining, count))
        x = remaining
    return count
print(count_squares(87))
Explaining:
** operator: exponentiation.
// operator: floor division. Here it plays the same role as the "int" and "round" calls that you used for calculating num2, because "//1" throws away all digits after the decimal point. Since int() already truncates, using both is redundant.
We exit the while loop as soon as remaining equals zero, because an integer value of zero is interpreted as false.
format is a string method used for formatting. Each {} is filled with the corresponding argument passed to the "format" method. There are other ways to do formatting in Python (the % operator and formatted string literals).
Output is:
X = 87, remaining = 6, count = 1
X = 6, remaining = 2, count = 2
X = 2, remaining = 1, count = 3
X = 1, remaining = 0, count = 4
4
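A slightly more compact variant, as a sketch: it swaps x**0.5 for math.isqrt (Python 3.8+), which stays exact for very large integers where floating-point square roots can be off by one. Unlike the function above, it returns 0 rather than 1 when x is already 0.

```python
import math


def count_squares(x):
    """Count how often the largest square <= x can be subtracted until x reaches 0."""
    count = 0
    while x:
        # math.isqrt returns the integer part of the square root exactly.
        x -= math.isqrt(x) ** 2
        count += 1
    return count


print(count_squares(87))  # 4
```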

Project Euler 104: Need help in understanding the solution

Project Euler Q104 (https://projecteuler.net/problem=104) is as such:
The Fibonacci sequence is defined by the recurrence relation:
Fn = Fn−1 + Fn−2, where F1 = 1 and F2 = 1. It turns out that F541,
which contains 113 digits, is the first Fibonacci number for which the
last nine digits are 1-9 pandigital (contain all the digits 1 to 9,
but not necessarily in order). And F2749, which contains 575 digits,
is the first Fibonacci number for which the first nine digits are 1-9
pandigital.
Given that Fk is the first Fibonacci number for which the first nine
digits AND the last nine digits are 1-9 pandigital, find k.
And I wrote this simple code in Python:
def fibGen():
    a,b = 1,1
    while True:
        a,b = b,a+b
        yield a
k = 0
fibG = fibGen()
while True:
    k += 1
    x = str(next(fibG))
    if (set(x[-9:]) == set("123456789")):
        print(x) #debugging print statement
        if(set(x[:9]) == set("123456789")):
            break
print(k)
However, it was taking well.. forever.
After leaving it running for 30 mins, puzzled, I gave up and checked the solution.
I came across this code in C#:
long fn2 = 1;
long fn1 = 1;
long fn;
long tailcut = 1000000000;
int n = 2;
bool found = false;
while (!found) {
    n++;
    fn = (fn1 + fn2) % tailcut;
    fn2 = fn1;
    fn1 = fn;
    if (IsPandigital(fn)) {
        double t = (n * 0.20898764024997873 - 0.3494850021680094);
        if (IsPandigital((long)Math.Pow(10, t - (long)t + 8)))
            found = true;
    }
}
Which.. I could barely understand. I tried it out in VS, got the correct answer and checked the thread for help.
I found these two, similar looking answers in Python then.
One here, http://blog.dreamshire.com/project-euler-104-solution/
And one from the thread:
from math import sqrt
def isPandigital(s):
    return set(s) == set('123456789')
rt5=sqrt(5)
def check_first_digits(n):
    def mypow( x, n ):
        res=1.0
        for i in range(n):
            res *= x
            # truncation to avoid overflow:
            if res>1E20: res*=1E-10
        return res
    # this is an approximation for large n:
    F = mypow( (1+rt5)/2, n )/rt5
    s = '%f' % F
    if isPandigital(s[:9]):
        print(n)
        return True
a, b, n = 1, 1, 1
while True:
    if isPandigital( str(a)[-9:] ):
        print(a)
        # Only when last digits are
        # pandigital check the first digits:
        if check_first_digits(n):
            break
    a, b = b, a+b
    b=b%1000000000
    n += 1
print(n)
These worked pretty fast, under 1 minute!
I really need help understanding these solutions. I don't really know the meaning or the reason behind using stuff like log. And though I could easily do the first 30 questions, I cannot understand these tougher ones.
What is the best way to solve this question, and how do these solutions implement it?
These two solutions work on the basis that, as Fibonacci numbers get bigger, the ratio between two consecutive terms gets closer to a number known as the golden ratio, (1+sqrt(5))/2, roughly 1.618. If you have one (large) Fibonacci number, you can closely approximate the next just by multiplying it by that number.
We know from the question that only large Fibonacci numbers are going to satisfy the conditions, so we can use that to quickly calculate the parts of the sequence we're interested in.
In a naive recursive implementation, to calculate fib(n), you need to calculate fib(n-1), which needs to calculate fib(n-2), which needs to calculate fib(n-3) etc., and it also needs to calculate fib(n-2) again, which calculates fib(n-3) etc. That's a huge number of function calls when n is big. A computer scientist would call that method exponential*: the number of sub-calculations grows roughly like phi^n. Your generator avoids the repeated calls, but it still has to add, and convert to strings, integers that keep getting longer. Having a single calculation to know what number comes next is a huge speed increase. Using the golden ratio, the Fibonacci sequence becomes (approximately, but close enough for what we need):
(using phi = (1+sqrt(5))/2)
1
1*phi
1*phi*phi = pow(phi, 2)
1*phi*phi*phi = pow(phi, 3)
...
1*phi*...*phi = pow(phi, n)
\ n times /
So, you can do an O(1) calculation: fib(n): return round(pow(golden_ratio, n)/(5**0.5))
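To see how close that formula gets, here is a small check against plain iteration. It is only a sketch with made-up names (fib_closed_form, fib_iterative); with ordinary double-precision floats the rounding is only reliable for fairly small n, which is why the solutions above work with the leading digits rather than the full value.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio


def fib_closed_form(n):
    """Approximate F(n) as round(phi**n / sqrt(5)); floats limit this to small n."""
    return round(PHI ** n / math.sqrt(5))


def fib_iterative(n):
    """Exact F(n) with F(1) = F(2) = 1, by repeated addition."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


# The two agree for every n in this small range.
print(all(fib_closed_form(n) == fib_iterative(n) for n in range(1, 51)))  # True
```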
Next, there's a couple of simplifications that let you use smaller numbers.
If I'm concerned about the last nine digits of a number, what happens further up isn't all that important, so I can throw anything after the 9th digit from the right away. That's what b=b%1000000000 or fn = (fn1 + fn2) % tailcut; are doing. % is the modulus operator, which says, if I divide the left number by the right, what's the remainder?
It's easiest to explain with equivalent code:
def mod(a,b):
    while a >= b:
        a -= b
    return a
So, there's a quick addition loop that adds together the last nine digits of Fibonacci numbers, waiting for them to be pandigital. When they are, it calculates an approximation of the leading digits of that Fibonacci number and checks whether the first nine are pandigital too.
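As for where the logs come in: the two constants in the C# code are log10 of the golden ratio and log10 of sqrt(5). Taking log10 of phi^n / sqrt(5) gives n*log10(phi) - log10(sqrt(5)); the fractional part of that value determines the leading digits, and the integer part only says how many digits there are. A rough Python sketch of the idea follows (leading_digits is a hypothetical name, and the result is a floating-point approximation, not an exact computation):

```python
import math

LOG10_PHI = math.log10((1 + math.sqrt(5)) / 2)  # about 0.20898764
LOG10_SQRT5 = math.log10(math.sqrt(5))          # about 0.34948500


def leading_digits(n, count=9):
    """Approximate the first `count` digits of F(n) for large n using logarithms."""
    t = n * LOG10_PHI - LOG10_SQRT5   # log10 of the approximate F(n)
    frac = t - math.floor(t)          # the fractional part fixes the digits
    return int(10 ** (frac + count - 1))


print(leading_digits(100))  # 354224848; F(100) is 354224848179261915075
```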
Let me know if I need to cover anything in more detail.
* https://en.wikipedia.org/wiki/Big_O_notation

Project Euler #2 in Python

Background
I am stuck on this problem:
Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.
I tried to discover if the issue was my Fibonacci number generator, the code which gets the even numbers, or even the way that I add the numbers to no avail.
Code
I decided to store the numbers in lists. Here, I create them.
list_of_numbers = [] #Holds all the fibs
even_fibs = [] #Holds only even fibs
Then, I created my generator. This is a potential area of issues.
x,y = 0,1 #sets x to 0, y to 1
while x+y <= 4000000: #Gets numbers till 4 million
    list_of_numbers.append(y)
    x, y = y, x+y #updates the fib sequence
Then, I created some code to check if a number is even, and then add it to the even_fibs list. This is another weak point in the code.
coord = 0
for number in range(len(list_of_numbers)):
    test_number = list_of_numbers [coord]
    if (test_number % 2) == 0:
        even_fibs.append(test_number)
    coord+=1
Lastly, I display the information.
print("Normal: ", list_of_numbers) #outputs full sequence
print("\nEven Numbers: ", even_fibs) #outputs even numbers
print("\nSum of Even Numbers: ", sum(even_fibs)) #outputs the sum of even numbers
Question
I know that this is a terrible way to ask a question, but what is wrong? Please don't give me the answer - just point out the problematic section.
You're stopping when the sum of the next two values in the sequence is greater than 4,000,000. You're meant to consider all values in the sequence up to 4,000,000.
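To see the effect without spoiling the real answer, here is a small sketch that runs both stopping conditions with a limit of 10. The function fib_terms and its stop_on_next_sum flag are made up for this illustration.

```python
def fib_terms(limit, stop_on_next_sum):
    """Collect Fibonacci terms using one of the two stopping conditions."""
    terms = []
    x, y = 0, 1
    # stop_on_next_sum=True mimics the question's `while x+y <= limit`;
    # False checks the term that is about to be appended instead.
    while (x + y if stop_on_next_sum else y) <= limit:
        terms.append(y)
        x, y = y, x + y
    return terms


print(fib_terms(10, stop_on_next_sum=True))   # [1, 1, 2, 3, 5]
print(fib_terms(10, stop_on_next_sum=False))  # [1, 1, 2, 3, 5, 8]
```

With the question's condition the last term that still fits under the limit (8 here) is never appended.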
