sum of cubes using formula - python

Given
\sum_{k=0}^{n}k^3 = \frac{n^2(n+1)^2}{4}
I need to compute the left hand side. I should make a list of the first 1000 integer cubes and sum them.
Then I should compute the right hand side and to the same.
Also, I am supposed to compare the computational time for the above methods.
What I've done so far is:
import time
start = time.clock()
list = []
for n in range(0,1001):
list.append(n**3)
print(list)
print("List of the first 1000 integer cubes is:",list, "and their sum is:", sum(list))
stop = time.clock()
print("Computation time: ",stop-start, "seconds.")
a=0
for n in range (0,1001):
a=(int)(n*(n+1)/2)
print ("Sum of the first 1000 integer cubes is:",a*a)
First part for the left hand side works fine, but the problem is the right hand side.
When I type n=4, I will get the same result for the both sides, but problem occurs when n is big, because I get that one side is bigger than the other, i.e. they are not same.
Also, can you help me create a list for the right hand side, I've tried doing something like this:
a=[]
for n in range (0,10):
a.append(int)(n**2(n+1)**2/4)
But it doesn't work.
For the computational time, I think I am supposed to set one more timer for the right hand side, but then again I am not sure how to compare them.
Any advice is welcome,

Ok, I think what you tried in a for loop is completely wrong. If you want to prove the formula "the hard way" (non-inductive way it is) - you need to simply sum cubes of n first integers and compare it against your formula - calculated one time (not summed n times).
So evaluate:
n=100
s=0
for i in range(1,n+1):
s+=i**3
###range evaluates from 1 till n if you specify it till n+1
s_ind=(n**2)*((n+1)**2)/4
print(s==s_ind)
Otherwise if you are supposed to use induction - it's all on paper i.e. base condition a_1=1^3=((1)*(2)^2)/4=1 and step condition i.e. a_n-a_(n-1)=sum(i^3)[i=1,...,n]-sum(j^3)[i=1,...,n-1]=n^3 (and that it also holds assuming a_n=your formula here (which spoiler alert - it does ;) )
Hope this helps.

Related

Optimize permutations search loop (can't use itertools) that is extremely slow. Any suggestions?

This is a game where you have 12 cards and you pick you until you choose 3 from the same group. I am attempting to find the probability of choosing each group. The script that I have created works, but it is extremely slow. My coworker created a similar script in R without the functions and his script takes 1/100th the time that mine takes. I am just trying to figure out why. Any ideas would be greatly appreciated.
from collections import Counter
import pandas as pd
from datetime import datetime
weight = pd.read_excel('V01Weights.xlsx')
Weight looks like the following:
Symb Weight
Grand 170000
Grand 170000
Grand 105
Major 170000
Major 170000
Major 215
Minor 150000
Minor 150000
Minor 12000
Bonus 105000
Bonus 105000
Bonus 105000
Max Picks represents the total number of different "cards". Total Picks represents the max number of user choices. This is because after 8 choices, you are guaranteed to have 2 of each type so on the 9th pick, you are guaranteed to have 3 matching.
TotalPicks = 9
MaxPicks = 12
This should have been named PickedProbabilities.
Picks = {0:0,1:0,2:0,3:0}
This is my simple version of the timeit class because I don't like the timeit class
def Time_It(function):
start =datetime.now()
x = function()
finish = datetime.now()
TotalTime = finish - start
Minutes = int(TotalTime.seconds/60)
Seconds = TotalTime.seconds % 60
print('It took ' + str(Minutes) + ' minutes and ' + str(Seconds) + ' seconds')
return(x)
Given x(my picks in order) I find the probability. These picks are done without replacement
def Get_Prob(x,weight):
prob = 1
weights = weight.iloc[:,1]
for index in x:
num = weights[index]
denom = sum(weights)
prob *= num/denom
weights.drop(index, inplace = True)
# print(weights)
return(prob)
This is used to determine if there are duplicates in my loop because that is not allowed
def Is_Allowed(x):
return(len(x) == len(set(x)))
This determines if a win is present in all of the cards present thus far.
def Is_Win(x):
global Picks
WinTypes = [[0,1,2],[3,4,5],[6,7,8],[9,10,11]]
IsWin = False
for index,item in enumerate(WinTypes):
# print(index)
if set(item).issubset(set(x)):
IsWin = True
Picks[index] += Get_Prob(x,weight)
# print(Picks[index])
print(sum(Picks.values()))
break
return(IsWin)
This is my main function that cycles through all of the cards. I attempted to do this using recursion but I eventually gave up. I can't use itertools to create all of the permutations because for example [0,1,2,3,4] will be created by itertools but this is not possible because once you get 3 matching, the game ends.
def Cycle():
for a in range(MaxPicks):
x = [a]
for b in range(MaxPicks):
x = [a,b]
if Is_Allowed(x):
for c in range(MaxPicks):
x = [a,b,c]
if Is_Allowed(x):
if Is_Win(x):
# print(x)
continue
for d in range(MaxPicks):
x = [a,b,c,d]
if Is_Allowed(x):
if Is_Win(x):
# print(x)
continue
for e in range(MaxPicks):
x = [a,b,c,d,e]
if Is_Allowed(x):
if Is_Win(x):
continue
for f in range(MaxPicks):
x = [a,b,c,d,e,f]
if Is_Allowed(x):
if Is_Win(x):
continue
for g in range(MaxPicks):
x = [a,b,c,d,e,f,g]
if Is_Allowed(x):
if Is_Win(x):
continue
for h in range(MaxPicks):
x = [a,b,c,d,e,f,g,h]
if Is_Allowed(x):
if Is_Win(x):
continue
for i in range(MaxPicks):
if Is_Allowed(x):
if Is_Win(x):
continue
Calls the main function
x = Time_It(Cycle)
print(x)
writes the probabilities to a text file
with open('result.txt','w') as file:
# file.write(pickle.dumps(x))
for item in x:
file.write(str(item) + ',' + str(x[item]) + '\n')
My coworker created a similar script in R without the functions and his script takes 1/100th the time that mine takes.
Two easy optimizations:
1) In-line the function calls like Is_Allowed() because Python have a lot of function call overhead (such as creating a new stackframe and argument tuples).
2) Run the code in using pypy which is really good at optimizing functions like this one.
Ok, this time I hope I got your problem right:)
There are two insights (I guess you have them, just for the sake of the completeness) needed in order to speed up your program algorithmically:
The probabilities for the sequence (card_1, card_2) and (card_2, card_1) are not equal, so we cannot use the results from the urn problem, and it looks like we need to try out all permutations.
However, given a set of cards we picked so far, we don't really need the information in which sequence they where picked - it is all the same for the future course of the game. So it is enough to use dynamic programming and calculate the probabilities for every subset to be traversed during the game (thus we need to check 2^N instead of N! states).
For a set of picked cards set the probability to pick a card i in the next turn is:
norm:=sum Wi for i in set
P(i|set)=Wi/norm if i not in set else 0.0
The recursion for calculating P(set) - the probability that a set of picked card occured during the game is:
set_without_i:=set/{i}
P(set)=sum P(set_without_i)*P(i|set_without_i) for i in set
However this should be done only for set_without_i for which the game not ended yet, i.e. no group has 3 cards picked.
This can be done by means of recursion+memoization or, as my version does, by using bottom-up dynamic programming. It also uses binary representation of integers for representations of sets and (most important part!) returns the result almost instantly [('Grand', 0.0014104762718021384), ('Major', 0.0028878988709489244), ('Minor', 0.15321793072867956), ('Bonus', 0.84248369412856905)]:
#calculates probability to end the game with 3 cards of a type
N=12
#set representation int->list
def decode_set(encoded):
decoded=[False]*N
for i in xrange(N):
if encoded&(1<<i):
decoded[i]=True
return decoded
weights = [170000, 170000, 105, 170000, 170000, 215, 150000, 150000, 12000, 105000, 105000, 105000]
def get_probs(decoded_set):
denom=float(sum((w for w,is_taken in zip(weights, decoded_set) if not is_taken)))
return [w/denom if not is_taken else 0.0 for w,is_taken in zip(weights, decoded_set)]
def end_group(encoded_set):
for i in xrange(4):
whole_group = 7<<(3*i) #7=..000111, 56=00111000 and so on
if (encoded_set & whole_group)==whole_group:
return i
return None
#MAIN: dynamic program:
MAX=(1<<N)#max possible set is 1<<N-1
probs=[0.0]*MAX
#we always start with the empty set:
probs[0]=1.0
#building bottom-up
for current_set in xrange(MAX):
if end_group(current_set) is None: #game not ended yet!
decoded_set=decode_set(current_set)
trans_probs=get_probs(decoded_set)
for i, is_set in enumerate(decoded_set):
if not is_set:
new_set=current_set | (1<<i)
probs[new_set]+=probs[current_set]*trans_probs[i]
#filtering wins:
group_probs=[0.0]*4
for current_set in xrange(MAX):
group_won=end_group(current_set)
if group_won is not None:
group_probs[group_won]+=probs[current_set]
print zip(["Grand", "Major", "Minor", "Bonus"], group_probs)
Some explanation of the "tricks" used in code:
A pretty standard trick is to use integer's binary representation to encode a set. Let's say we have objects [a,b,c], so we could represent the set {b,c} as 110, which would mean a (first in the list corresponds to 0- the lowest digit) - not in the set, b(1) in the set, c(1) in the set. However, 110 read as integer it is 6.
The current_set - for loop simulates the game and best understood while playing. Let's play with two cards [a,b] with weights [2,1].
We start the game with an empty set, 0 as integer, so the probability vector (given set, its binary representation and as integer mapped onto probability):
probs=[{}=00=0->1.0, 01={a}=1->0.0, {b}=10=2->0.0, {a,b}=11=3->0.0]
We process the current_set=0, there are two possibilities 66% to take card a and 33% to take cardb, so the probabilities become after the processing:
probs=[{}=00=0->1.0, 01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.0]
Now we process the current_set=1={a} the only possibility is to take b so we will end with set {a,b}. So we need to update its (3={a,b}) probability via our formula and we get:
probs=[{}=00=0->1.0, 01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.66]
In the next step we process 2, and given set {b} the only possibility is to pick card a, so probability of set {a,b} needs to be updated again
probs=[{}=00=0->1.0, 01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->1.0]
We can get to {a,b} on two different paths - this could be seen in our algorithm. The probability to go through set {a,b} at some point in our game is obviously 1.0.
Another important thing: all paths that leads to {a,b} are taken care of before we process this set (it would be the next step).
Edit: I misunderstood the original problem, the here presented solution is for the following problem:
Given 4 groups with 3 different cards with a different score for every card, we pick up cards as long as we don't have picked 3 cards from the same group. What is the expected score(sum of scores of picked cards) in the end of the game.
I leave the solution as it is, because it was such a joy to work it out after so many probability-theory-less years and I just cannot delete it:)
See my other answer for handling of the original problem
There are two possibilities to improve the performance: making the code faster (and before starting this, one should profile in order to know which part of the program should be optimized, otherwise the time is spent optimizing things that don't count) or improving the algorithm. I propose to do the second.
Ok, this problem seems to be more complex as at the first site. Let's start with some observations.
All you need to know is the expected number of the picked cards at the end of the game:
If Pi is the probability that the card i is picked somewhere during the game, then we are looking for the expected value of the score E(Score)=P1*W1+P2*W2+...Pn*Wn. However, if we look at the cards of a group, we can state that because of the symmetry the probabilities for the cards of this group are the same, e.g. P1=P2=P3=:Pgrand in your case. Thus our expectation can be calculated:
E(Score)=3*Pgrand*(W1+W2+W3)/3+...+3*Pbonus*(W10+W11+W12)/3
We call averageWgrand:=(W1+W2+W3)/3 and note that E(#grand)=3*Pgrand - the expected number of picked grand card at the end of the game. With this our formula becomes:
E(Score)=E(#grand)*averageWgrand+...+E(#bonus)*averageWbonus
In your example we can go even further: The number of cards in every group is equal, so because of the symmetry we can claim: E(#grand)=E(#minor)=E(#major)=E(#grand)=:(E#group). For the sake of simplicity, in the following we consider only this special case (but the outlined solution could be extended also to the general case). This lead to the following simplification:
E(Score)=4*E(#group)(averageWgrand+...+averageWbonus)/4
We call averageW:=(averageWgrand+...+averageWbonus)/4 and note that E(#cards)=4*E(#grand) is the expected number of picked card at the end of the game.
Thus, E(Score)=E(#cards)*averageW, so our task is reduced to calculating the expected value of the number of cards at the end of the game:
E(#cards)=P(1)*1+P(2)*2+...P(n)*n
where P(i) denotes the probability, that the game ends with exact i cards. The probabilities P(1),P(2) and P(k), k>9 are easy to see - they are 0.
Calculation of the probability of ending the game with i picked cards -P(i):
Let's play a slightly different game: we pick exactly i cards and win if and only if:
There is exactly one group with 3 cards picked. We call this group full_group.
The last picked (i-th) card was from the full_group.
It is easy to see, that the probability to win this game P(win) is exactly the probability we are looking for - P(i). Once again we can use the symmetry, because all groups are equal (P(win, full=grand) means the probability that we what and that the full_group=grand):
P(win)=P(win, grand)+P(win, minor)+P(win, major)+P(win, bonus)
=4*P(win, grand)
P(win, grand) is the probability that:
after picking i-1 cards the number of picked grand cards is 2, i.e. `#grand=2' and
after picking i-1 cards, for every group the number of picked cards is less than 3 and
we pick a grand-card in the last round. Given the first two constraints hold, this (conditional) probability is 1/(n-i+1) (there are n-i+1 cards left and only one of them is "right").
From the urn problem we know the probability for
P(#grand=u, #minor=x, #major=y, #bonus=z) = binom(3,u)*binom(3,x)*binom(3,y)*binom(3,z)/binom(12, u+x+y+z)
with binom(n,k)=n!/k!/(n-k)!. Thus P(win, grand) can be calculated as:
P(win, grand) = 1/(n-i+1)*sum P(#grand=2, #minor=x, #major=y, #bonus=z)
where x<=2, y<=2, z<=2 and 2+x+y+z=i-1
And now the code:
import math
def binom(n,k):
return math.factorial(n)//math.factorial(k)//math.factorial(n-k)
#expected number of cards:
n=12 #there are 12 cards
probs=[0]*n
for minor in xrange(3):
for major in xrange(3):
for bonus in xrange(3):
i = 3 + minor +major +bonus
P_urn = binom(3,2)*binom(3,minor)*binom(3,major)*binom(3,bonus)/float(binom(n, n-i+1))
P_right_last_card = 1.0/(n-i+1)
probs[i]+=4*P_urn*P_right_last_card #factor 4 from symmetry
print "Expected number of cards:", sum((prob*card_cnt for card_cnt, prob in enumerate(probs)))
As result I get 6.94285714286 as the expected number of cards in the end of the game. And very fast - almost instantly. Not sure whether it is right though...
Conclusion:
Obviously, if you like to handle a more general case (more groups, number cards in a group different) you have to extend the code (recursion, memoization of binom) and the theory.
But the most crucial part: with this approach you (almost) don't care in which order the cards were picked - and thus the number of states you have to inspect is down by factor of (k-1)! where k is the maximal possible number of cards in the end of the game. In your example k=9 and thus the approach is faster by factor 40000 (I don't even consider the speed-up from the exploited symmetry, because it might not be possible in general case).

Code finding the first triangular number with more than 500 divisors will not finish running

Okay, so I'm working on Euler Problem 12 (find the first triangular number with a number of factors over 500) and my code (in Python 3) is as follows:
factors = 0
y=1
def factornum(n):
x = 1
f = []
while x <= n:
if n%x == 0:
f.append(x)
x+=1
return len(f)
def triangle(n):
t = sum(list(range(1,n)))
return t
while factors<=500:
factors = factornum(triangle(y))
y+=1
print(y-1)
Basically, a function goes through all the numbers below the input number n, checks if they divide into n evenly, and if so add them to a list, then return the length in that list. Another generates a triangular number by summing all the numbers in a list from 1 to the input number and returning the sum. Then a while loop continues to generate a triangular number using an iterating variable y as the input for the triangle function, and then runs the factornum function on that and puts the result in the factors variable. The loop continues to run and the y variable continues to increment until the number of factors is over 500. The result is then printed.
However, when I run it, nothing happens - no errors, no output, it just keeps running and running. Now, I know my code isn't the most efficient, but I left it running for quite a bit and it still didn't produce a result, so it seems more likely to me that there's an error somewhere. I've been over it and over it and cannot seem to find an error.
I'd merely request that a full solution or a drastically improved one isn't given outright but pointers towards my error(s) or spots for improvement, as the reason I'm doing the Euler problems is to improve my coding. Thanks!
You have very inefficient algorithm.
If you ask for pointers rather than full solution, main pointers are:
There is a more efficient way to calculate next triangular number. There is an explicit formula in the wiki. Also if you generate sequence of all numbers it is just more efficient to add next n to the previous number. (Sidenote list in sum(list(range(1,n))) makes no sense to me at all. If you want to use this approach anyway, sum(xrange(1,n) will probably be much more efficient as it doesn't require materialization of the range)
There are much more efficient ways to factorize numbers
There is a more efficient way to calculate number of factors. And it is actually called after Euler: see Euler's totient function
Generally Euler project problems (as in many other programming competitions) are not supposed to be solvable by sheer brute force. You should come up with some formula and/or more efficient algorithm first.
As far as I can tell your code will work, but it will take a very long time to calculate the number of factors. For 150 factors, it takes on the order of 20 seconds to run, and that time will grow dramatically as you look for higher and higher number of factors.
One way to reduce the processing time is to reduce the number of calculations that you're performing. If you analyze your code, you're calculating n%1 every single time, which is an unnecessary calculation because you know every single integer will be divisible by itself and one. Are there any other ways you can reduce the number of calculations? Perhaps by remembering that if a number is divisible by 20, it is also divisible by 2, 4, 5, and 10?
I can be more specific, but you wanted a pointer in the right direction.
From the looks of it the code works fine, it`s just not the best approach. A simple way of optimizing is doing until the half the number, for example. Also, try thinking about how you could do this using prime factors, it might be another solution. Best of luck!
First you have to def a factor function:
from functools import reduce
def factors(n):
step = 2 if n % 2 else 1
return set(reduce(list.__add__,
([i, n//i] for i in range(1, int(pow(n,0.5) + 1)) if n % i
== 0)))
This will create a set and put all of factors of number n into it.
Second, use while loop until you get 500 factors:
a = 1
x = 1
while len(factors(a)) < 501:
x += 1
a += x
This loop will stop at len(factors(a)) = 500.
Simple print(a) and you will get your answer.

Sum of primes below 2,000,000 in python

I am attempting problem 10 of Project Euler, which is the summation of all primes below 2,000,000. I have tried implementing the Sieve of Erasthotenes using Python, and the code I wrote works perfectly for numbers below 10,000.
However, when I attempt to find the summation of primes for bigger numbers, the code takes too long to run (finding the sum of primes up to 100,000 took 315 seconds). The algorithm clearly needs optimization.
Yes, I have looked at other posts on this website, like Fastest way to list all primes below N, but the solutions there had very little explanation as to how the code worked (I am still a beginner programmer) so I was not able to actually learn from them.
Can someone please help me optimize my code, and clearly explain how it works along the way?
Here is my code:
primes_below_number = 2000000 # number to find summation of all primes below number
numbers = (range(1, primes_below_number + 1, 2)) # creates a list excluding even numbers
pos = 0 # index position
sum_of_primes = 0 # total sum
number = numbers[pos]
while number < primes_below_number and pos < len(numbers) - 1:
pos += 1
number = numbers[pos] # moves to next prime in list numbers
sum_of_primes += number # adds prime to total sum
num = number
while num < primes_below_number:
num += number
if num in numbers[:]:
numbers.remove(num) # removes multiples of prime found
print sum_of_primes + 2
As I said before, I am new to programming, therefore a thorough explanation of any complicated concepts would be deeply appreciated. Thank you.
As you've seen, there are various ways to implement the Sieve of Erasthotenes in Python that are more efficient than your code. I don't want to confuse you with fancy code, but I can show how to speed up your code a fair bit.
Firstly, searching a list isn't fast, and removing elements from a list is even slower. However, Python provides a set type which is quite efficient at performing both of those operations (although it does chew up a bit more RAM than a simple list). Happily, it's easy to modify your code to use a set instead of a list.
Another optimization is that we don't have to check for prime factors all the way up to primes_below_number, which I've renamed to hi in the code below. It's sufficient to just go to the square root of hi, since if a number is composite it must have a factor less than or equal to its square root.
We don't need to keep a running total of the sum of the primes. It's better to do that at the end using Python's built-in sum() function, which operates at C speed, so it's much faster than doing the additions one by one at Python speed.
# number to find summation of all primes below number
hi = 2000000
# create a set excluding even numbers
numbers = set(xrange(3, hi + 1, 2))
for number in xrange(3, int(hi ** 0.5) + 1):
if number not in numbers:
#number must have been removed because it has a prime factor
continue
num = number
while num < hi:
num += number
if num in numbers:
# Remove multiples of prime found
numbers.remove(num)
print 2 + sum(numbers)
You should find that this code runs in a a few seconds; it takes around 5 seconds on my 2GHz single-core machine.
You'll notice that I've moved the comments so that they're above the line they're commenting on. That's the preferred style in Python since we prefer short lines, and also inline comments tend to make the code look cluttered.
There's another small optimization that can be made to the inner while loop, but I let you figure that out for yourself. :)
First, removing numbers from the list will be very slow. Instead of this, make a list
primes = primes_below_number * True
primes[0] = False
primes[1] = False
Now in your loop, when you find a prime p, change primes[k*p] to False for all suitable k. (You wouldn't actually do multiply, you'd continually add p, of course.)
At the end,
primes = [n for n i range(primes_below_number) if primes[n]]
This should be a great deal faster.
Second, you can stop looking once your find a prime greater than the square root of primes_below_number, since a composite number must have a prime factor that doesn't exceed its square root.
Try using numpy, should make it faster. Replace range by xrange, it may help you.
Here's an optimization for your code:
import itertools
primes_below_number = 2000000
numbers = list(range(3, primes_below_number, 2))
pos = 0
while pos < len(numbers) - 1:
number = numbers[pos]
numbers = list(
itertools.chain(
itertools.islice(numbers, 0, pos + 1),
itertools.ifilter(
lambda n: n % number != 0,
itertools.islice(numbers, pos + 1, len(numbers))
)
)
)
pos += 1
sum_of_primes = sum(numbers) + 2
print sum_of_primes
The optimization here is because:
Removed the sum to outside the loop.
Instead of removing elements from a list we can just create another one, memory is not an issue here (I hope).
When creating the new list we create it by chaining two parts, the first part is everything before the current number (we already checked those), and the second part is everything after the current number but only if they are not divisible by the current number.
Using itertools can make things faster since we'd be using iterators instead of looping through the whole list more than once.
Another solution would be to not remove parts of the list but disable them like #saulspatz said.
And here's the fastest way I was able to find: http://www.wolframalpha.com/input/?i=sum+of+all+primes+below+2+million 😁
Update
Here is the boolean method:
import itertools
primes_below_number = 2000000
numbers = [v % 2 != 0 for v in xrange(primes_below_number)]
numbers[0] = False
numbers[1] = False
numbers[2] = True
number = 3
while number < primes_below_number:
n = number * 3 # We already excluded even numbers
while n < primes_below_number:
numbers[n] = False
n += number
number += 1
while number < primes_below_number and not numbers[number]:
number += 1
sum_of_numbers = sum(itertools.imap(lambda index_n: index_n[1] and index_n[0] or 0, enumerate(numbers)))
print(sum_of_numbers)
This executes in seconds (took 3 seconds on my 2.4GHz machine).
Instead of storing a list of numbers, you can instead store an array of boolean values. This use of a bitmap can be thought of as a way to implement a set, which works well for dense sets (there aren't big gaps between the values of members).
An answer on a recent python sieve question uses this implementation python-style. It turns out a lot of people have implemented a sieve, or something they thought was a sieve, and then come on SO to ask why it was slow. :P Look at the related-questions sidebar from some of them if you want more reading material.
Finding the element that holds the boolean that says whether a number is in the set or not is easy and extremely fast. array[i] is a boolean value that's true if i is in the set, false if not. The memory address can be computed directly from i with a single addition.
(I'm glossing over the fact that an array of boolean might be stored with a whole byte for each element, rather than the more efficient implementation of using every single bit for a different element. Any decent sieve will use a bitmap.)
Removing a number from the set is as simple as setting array[i] = false, regardless of the previous value. No searching, not comparison, no tracking of what happened, just one memory operation. (Well, two for a bitmap: load the old byte, clear the correct bit, store it. Memory is byte-addressable, but not bit-addressable.)
An easy optimization of the bitmap-based sieve is to not even store the even-numbered bytes, because there is only one even prime, and we can special-case it to double our memory density. Then the membership-status of i is held in array[i/2]. (Dividing by powers of two is easy for computers. Other values are much slower.)
An SO question:
Why is Sieve of Eratosthenes more efficient than the simple "dumb" algorithm? has many links to good stuff about the sieve. This one in particular has some good discussion about it, in words rather than just code. (Nevermind the fact that it's talking about a common Haskell implementation that looks like a sieve, but actually isn't. They call this the "unfaithful" sieve in their graphs, and so on.)
discussion on that question brought up the point that trial division may be fast than big sieves, for some uses, because clearing the bits for all multiples of every prime touches a lot of memory in a cache-unfriendly pattern. CPUs are much faster than memory these days.

Singpath Python Error. "Your code took too long to return."

I was playing around with the Singpath Python practice questions. And came across a simple question which asks the following:
Given an input of a list of numbers and a high number,
return the number of multiples
of each of those numbers that are less than the maximum number.
For this case the list will contain a maximum of 3 numbers
that are all relatively prime to each other.
I wrote this simple program, it ran perfectly fine:
"""
Given an input of a list of numbers and a high number,
return the number of multiples
of each of those numbers that are less than the maximum number.
For this case the list will contain a maximum of 3 numbers
that are all relatively prime to each other.
>>> countMultiples([3],30)
9
>>> countMultiples([3,5],100)
46
>>> countMultiples([3,5,7],30)
16
"""
def countMultiples(l, max):
j = []
for num in l:
i = 1
count = 0
while num * i < max:
if num * i not in j:
j.append(num * i)
i += 1
return len(j)
print countMultiples([3],30)
print countMultiples([3,5],100)
print countMultiples([3, 5, 7],30)
But when I try to run the same on SingPath, it gave me this error
Your code took too long to return.
Your solution may be stuck in an infinite loop. Please try again.
Has anyone experienced the same issues with Singpath?
I suspect the error you're getting means exactly what it says. For some input that the test program gives your function, it takes too long to return. I don't know anything about singpath myself, so I don't know exactly how long that might be. But I'd guess that they give you enough time to solve the problem if you use the best algorithm.
You can see for yourself that your code is slow if you pass in a very large max value. Try passing 10000 as max and you may end up waiting for a minute or two to get a result.
There are a couple of reasons your code is slow in these situations. The first is that you have a list of every multiple that you've found so far, and you are searching the list to see if the latest value has already been seen. Each search takes time proportional to the length of the list, so for the whole run of the function, it takes quadratic time (relative to the result value).
You could improve on this quite a lot by using a set instead of a list. You can test if an object is in a set in (amortized) constant time. But if j is a set, you don't actually need to test if a value is already in it before adding, since sets ignore duplicated values anyway. This means you can just add a value to the set without any care about whether it was there already.
def countMultiples(l, max):
j = set() # use a set object, rather than a list
for num in l:
i = 1
count = 0
while num * i < max:
j.add(num*i) # add items to the set unconditionally
i += 1
return len(j) # duplicate values are ignored, and won't be counted
This runs a fair amount faster than the original code, and max values of a million or more will return in a not too unreasonable time. But if you try values larger still (say, 100 million or a billion), you'll eventually still run into trouble. That's because your code uses a loop to find all the multiples, which takes linear time (relative to the result value). Fortunately, there is a better algorithm.
(If you want to figure out the better approach on your own, you might want to stop reading here.)
The better way is to use division to find how many times you can multiply each value to get a value less than max. The number of multiples of num that are strictly less than max is (max-1) // num (the -1 is because we don't want to count max itself). Integer division is much faster than doing a loop!
There is an added complexity though. If you divide to find the number of multiples, you don't actually have the multiples themselves to put in a set like we were doing above. This means that any integer that is a multiple of more than than one of our input numbers will be counted more than once.
Fortunately, there's a good way to fix this. We just need to count how many integers were over counted, and subtract that from our total. When we have two input values, we'll have double counted every integer that is a multiple of their least common multiple (which, since we're guaranteed that they're relatively prime, means their product).
If we have three values, We can do the same subtraction for each pair of numbers. But that won't be exactly right either. The integers that are multiples of all three of our input numbers will be counted three times, then subtracted back out three times as well (since they're multiples of the LCM of each pair of values). So we need to add a final value to make sure those multiples of all three values are included in the final sum exactly once.
import itertools
def countMultiples(numbers, max):
count = 0
for i, num in enumerate(numbers):
count += (max-1) // num # count multiples of num that are less than max
for a, b in itertools.combinations(numbers, 2):
count -= (max-1) // (a*b) # remove double counted numbers
if len(numbers) == 3:
a, b, c = numbers
count += (max-1) // (a*b*c) # add the vals that were removed too many times
return count
This should run in something like constant time for any value of max.
Now, that's probably as efficient as you need to be for the problem you're given (which will always have no more than three values). But if you wanted a solution that would work for more input values, you can write a general version. It uses the same algorithm as the previous version, and uses itertools.combinations a lot more to get different numbers of input values at a time. The number of products of the LCM of odd numbers of values get added to the count, while the number of products of the LCM of even numbers of values are subtracted.
import itertools
from functools import reduce
from operator import mul
def lcm(nums):
return reduce(mul, nums) # this is only correct if nums are all relatively prime
def countMultiples(numbers, max):
count = 0
for n in range(len(numbers)):
for nums in itertools.combinations(numbers, n+1):
count += (-1)**n * (max-1) // lcm(nums)
return count
Here's an example output of this version, which is was computed very quickly:
>>> countMultiples([2,3,5,7,11,13,17], 100000000000000)
81947464300342

Basic program generating random numbers Python

I want to create a program that will take a positive integer n, then create a list consisting of n random numbers between 0 and 9. Once I find that, I want to return the average of those numbers. Am I on the right track?
import random
def randomNumbers(n):
n > 0
myList = random.randrange(0,9)
I know I haven't gotten to the second part yet.. but is this how I start out?
This is my final code:
import random
def randomNumbers(n):
myList = []
for i in range(n):
myList.append(random.randrange(0, 9))
return sum(myList)/len(myList)
Could someone direct me to a page where it talks about a for and while loops and when to use them?
The equals sign (=) is only used to set the value of the "entire variable" (like myList). You want a list, which will then have values added to it, so start with this:
myList = []
to make it an empty list.
Now you had the right idea about repeating something while n > 0- but the ideal way is to use a for loop:
for i in range(n): # repeats the following line(s) of code n times
myList.append(random.randrange(0,9))
To find the average, take the sum and divide by the number of items (n).
If you are using Python 2.x, then note that an integer divided by another integer will round to an integer, so you should convert into a float (decimal) before division:
average = sum(myList)*1.0/n
For Python 3.x, int/int gives a float like you might expect:
average = sum(myList)/n
To return a value, use the return statement:
return average
It helps to break down what you're trying to do, if it's too complicated in a single explanation.
Take a positive integer, n
create a list of n length
each element should be a number 0-9
return the average of that list.
Let's tackle this one at a time:
take a positive integer, n
def randomNumbers(n):
if type(n)==int and n>0:
when you say def randomNumbers(n): you're saying
Hey, computer, this is a function called "randomNumbers",
and it's going to be called with a single thing in parentheses
and we're going to call that `n` from now on.
Here's what that function does:
But you haven't ensured that n is an integer greater than 0. It's just a thing. So we need to ensure that it is. if type(n)==int and n>0: does that by checking that the type of n is int and that n's value is positive (greater than 0).
That's the first bullet point done. Next is creating a list of n length with each element being an integer numbered 0-9. These should be random[1].
result=list()
for i in range(n):
result.append(random.randint(0,9))
This is less complicated than it seems. You start by creating a list that you'll call result (or whatever you want, I don't care). The for loop is just saying:
Hey, computer, you're gonna do the same thing a number of times,
and that number is "until you've counted from 0 to n times".
And when you do that, "i" is going to represent where you are
in that "0 to n" range. Here's what you're gonna do all those times:
So what are you going to do n times? You're going to append a random integer between 0 and 9 to that list we just called result. That's what result.append(random.randint(0,9)) does.
So next you want to return the average of that list. Well, the average of a list is the sum of the numbers in the list divided by the length.
return sum(result)/float(len(result))
sum and len can take any iterable, like a list, or a set, or whatever you please, and give you the sum, length, or whatever else you ask for. These are great tools to learn about. So do it. Elsewhere, preferably, frankly.
Now, if you've been really attentive, you'll see that so far we have this:
import random # I added this secretly, sorry!
def randomNumbers(n):
if type(n)==int and n>0:
result=list()
for i in range(n):
result.append(random.randint(0,9))
return sum(result)/float(len(result))
Which is great! That's what you need; try print(randomNumbers(5)) and you'll get exactly what you're looking for: the average of a bunch of random numbers.
But what if you try print(randomNumbers('foo'))? You probably get None, right? That's because it didn't fit in with the if statement we wrote, and so it never went down that path! You need to write an else statement so things that aren't positive integers still get some love.
else:
return "That's no good. Try a positive integer instead"
Stupid, but it works. Someday you'll learn about raising exceptions and all that smooth Jazz, but until then, it's fine just to say
Hey, computer, if n wasn't up to my stratospheric expectations,
then just spit back this message. I'll understand.
So at the end of the day you've got
import random
def randomNumbers(n):
if type(n)==int and n>0:
result=list()
for i in range(n):
result.append(random.randint(0,9))
return sum(result)/float(len(result))
else:
return "That's no good. Try a positive integer instead"
There's some stuff you can do with this code to make it more efficient. To make you curious, I'll give an example:
from random import randint
def randy(n:int):
return sum({randint(0,9) for i in range(n)})/n if n>0 else "Try again."
That gets into list comprehensions, sets, function annotations, and other stuff that you don't need to worry about. And frankly, that line of code might be too convoluted for good "Pythonic" code, but it's possible, which is cool.
Keep at it, and don't get discouraged. Python makes an effort to explain what you did wrong, and the community is generally pretty good if you've done your due diligence. Just don't abuse them(/us) by asking for help without seeking answers on your own first.
[1]technically they're pseudorandom but true randomness is hard to produce.
You are not on the right track. See this variate for the track.
import random
def randomNumbers(n):
l = random.sample(xrange(10), n)
return sum(l)/len(l)
Could be done more concisely with a list comprehension:
myList = [randrange(0, 9) for x in xrange(n)]
average = sum(myList) / len(myList)
You should think about making use of Numpy. This would make your task easy.
import numpy as np
def myfunc(n):
x = np.random.randint(low=0, high=10, size=n)
return np.mean(x)

Categories