Finding an interval of 100 consecutive composite numbers with Python

Finding an interval of 100 consecutive composite numbers with Python - python

I'm trying to write a program that can determine an integer A as such as there are no prime numbers between A and A+100 ...
Unfortunately, with my mediocre Python skills, this is all I managed to write:
for A in range (1,1000000):
if is_prime(n)==False in range (A,A+3):
print(A)
As you can see, I first tried to get it working with an interval of only 2 consecutive composite numbers. I also used a (working) function "is_prime" that determines if an integer is prime or not.
You can start yelling at me for my incompetence !

I recommend a sieve-style operation for performance reasons. Create a list of X numbers, mark all primes, then look for an unbroken sequence of composite numbers.

You're in the right ballpark. Just gotta finish the rest of the list comprehension.
for A in range (1,1000000):
if all(is_prime(n)==False for n in range (A,A+3)):
print(A)
style nitpick: not is_prime(n) would be preferable to is_prime(n) == False.

Related

Algorithm-finding-dedicated-sum-from-the-population-of-variables

I need a way of finding an exact value made of the sum of variables chosen from the population. The algorithm can find just the first solution or all. So we can have 10, 20, or 30 different numbers and we will sum some of them to get a desirable number. As an example we have a population of the below numbers: -2,-1,1,2,3,5,8,10 and we try to get 6 - this can be made of 8 and -2, 1 + 5 etc. I need at least 2 decimal places to consider as well for accuracy and ideally, the sum of variables will be exact to the asking value.
Thanks for any advice and help on this:)
I build a model Using the simplex method in Excel but I need the solution in Python.

This is the subset sum problem, which is an NP Complete problem.
There is a known pseudo-polynomial solution for it, if the numbers are integers. In your case, you need to consider numbers only to 2nd decimal point, so you could convert the problem into integers by multiplying by 1001, and then run the pseudo-polynomial algorithm.
It will works quite nicely and efficiently - if the range of numbers you have is quite small (Complexity is O(n*W), where W is the sum of numbers in absolute value).
Appendix:
Pseudo polynomial time solution is Dynamic Programming adaptation of the following recursive formula:
k is the desired number
n is the total number of elements in list.
// stop clause: Found a sum
D(k, i) = true | for all 0 <= i < n
// Stop clause: failing attempt, cannot find sum in this branch.
D(x, n) = false | x != k
// Recursive step, either take the current element or skip it.
D(x, i) = D(x + arr[i], i+1) OR D(x, i+1)
Start from D(0,0)
If this is not the case, and the range of numbers is quite high, you might have to go with brute force solution, of checking all possible subsets. This solution is of course exponential, and processing it is in O(2^n) .
(1) Consider rounding if needed, but that's a simple preprocessing that doesn't affect the answer.

String divisibility without search tables?

Given two strings s & t, determine if s is divisible by t. For
example: "abab" is divisible by "ab" But "ababab" is not divisible by
"abab". If it isn't divisible, return -1. If it is, return the length
of the smallest common divisor: So, for "abababab" and "abab", return
2 as s is divisible by t and the smallest common divisor is "ab" with
length 2.
The way I thought it through was: I define the lengths of these two strings, find the greatest common divisor of these two. If t divides s, then the smallest common divisor is just the smallest divisor of t. And then one can use this algorithm: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.
Is there any simpler solution?

To test for divisibility:
test that the length of s is a multiple of the length of t (otherwise not divisible);
divide s into chunks of length t; and
check that all the chunks are the same.
To find the smallest common divisor, you need to find the shortest repeating substring of t that makes up the whole of t. One approach is:
Find the factors of the length of t (the crude approach of searching from 1 up to sqrt(len(t)) should be fine for strings of any reasonable length);
For each factor (start with the smallest):
i. divide t into chunks of length factor;
ii. check if all the chunks are the same, and return factor if they are.
Using a Python set is a neat way to check if all the chunks in a list are equal. len(set(chunks)) == 1 tells you they are.

The GCD of the string lengths doesn't help you.
A string t "divides" a string s if len(s) is a multiple of len(t) and s == t * (len(s)//len(t)).
As for finding the length of the smallest divisor of t, the classic trick for that is (t+t).index(t, 1): find the first nonzero index of t in t+t. I've used the built-in index method here, but depending on why you're doing this and what runtime properties you want, KMP string search may be a better choice. Manually implementing KMP in Python is going to have a massive constant-factor overhead, but it'll have a much better worst-case runtime than index.
KMP isn't particularly hard to implement, as far as string searches go. You're not going to get anything significantly "simpler" without delegating to a library routine or sacrificing asymptotic complexity properties (or both).
If you want to avoid a search table and keep asymptotic complexity guarantees, you can use something like two-way string matching, but it's not going to be any easier to implement than KMP.

Code finding the first triangular number with more than 500 divisors will not finish running

Okay, so I'm working on Euler Problem 12 (find the first triangular number with a number of factors over 500) and my code (in Python 3) is as follows:
factors = 0
y=1
def factornum(n):
x = 1
f = []
while x <= n:
if n%x == 0:
f.append(x)
x+=1
return len(f)
def triangle(n):
t = sum(list(range(1,n)))
return t
while factors<=500:
factors = factornum(triangle(y))
y+=1
print(y-1)
Basically, a function goes through all the numbers below the input number n, checks if they divide into n evenly, and if so add them to a list, then return the length in that list. Another generates a triangular number by summing all the numbers in a list from 1 to the input number and returning the sum. Then a while loop continues to generate a triangular number using an iterating variable y as the input for the triangle function, and then runs the factornum function on that and puts the result in the factors variable. The loop continues to run and the y variable continues to increment until the number of factors is over 500. The result is then printed.
However, when I run it, nothing happens - no errors, no output, it just keeps running and running. Now, I know my code isn't the most efficient, but I left it running for quite a bit and it still didn't produce a result, so it seems more likely to me that there's an error somewhere. I've been over it and over it and cannot seem to find an error.
I'd merely request that a full solution or a drastically improved one isn't given outright but pointers towards my error(s) or spots for improvement, as the reason I'm doing the Euler problems is to improve my coding. Thanks!

You have very inefficient algorithm.
If you ask for pointers rather than full solution, main pointers are:
There is a more efficient way to calculate next triangular number. There is an explicit formula in the wiki. Also if you generate sequence of all numbers it is just more efficient to add next n to the previous number. (Sidenote list in sum(list(range(1,n))) makes no sense to me at all. If you want to use this approach anyway, sum(xrange(1,n) will probably be much more efficient as it doesn't require materialization of the range)
There are much more efficient ways to factorize numbers
There is a more efficient way to calculate number of factors. And it is actually called after Euler: see Euler's totient function
Generally Euler project problems (as in many other programming competitions) are not supposed to be solvable by sheer brute force. You should come up with some formula and/or more efficient algorithm first.

As far as I can tell your code will work, but it will take a very long time to calculate the number of factors. For 150 factors, it takes on the order of 20 seconds to run, and that time will grow dramatically as you look for higher and higher number of factors.
One way to reduce the processing time is to reduce the number of calculations that you're performing. If you analyze your code, you're calculating n%1 every single time, which is an unnecessary calculation because you know every single integer will be divisible by itself and one. Are there any other ways you can reduce the number of calculations? Perhaps by remembering that if a number is divisible by 20, it is also divisible by 2, 4, 5, and 10?
I can be more specific, but you wanted a pointer in the right direction.

From the looks of it the code works fine, it`s just not the best approach. A simple way of optimizing is doing until the half the number, for example. Also, try thinking about how you could do this using prime factors, it might be another solution. Best of luck!

First you have to def a factor function:
from functools import reduce
def factors(n):
step = 2 if n % 2 else 1
return set(reduce(list.__add__,
([i, n//i] for i in range(1, int(pow(n,0.5) + 1)) if n % i
== 0)))
This will create a set and put all of factors of number n into it.
Second, use while loop until you get 500 factors:
a = 1
x = 1
while len(factors(a)) < 501:
x += 1
a += x
This loop will stop at len(factors(a)) = 500.
Simple print(a) and you will get your answer.

Sum of primes below 2,000,000 in python

I am attempting problem 10 of Project Euler, which is the summation of all primes below 2,000,000. I have tried implementing the Sieve of Erasthotenes using Python, and the code I wrote works perfectly for numbers below 10,000.
However, when I attempt to find the summation of primes for bigger numbers, the code takes too long to run (finding the sum of primes up to 100,000 took 315 seconds). The algorithm clearly needs optimization.
Yes, I have looked at other posts on this website, like Fastest way to list all primes below N, but the solutions there had very little explanation as to how the code worked (I am still a beginner programmer) so I was not able to actually learn from them.
Can someone please help me optimize my code, and clearly explain how it works along the way?
Here is my code:
primes_below_number = 2000000 # number to find summation of all primes below number
numbers = (range(1, primes_below_number + 1, 2)) # creates a list excluding even numbers
pos = 0 # index position
sum_of_primes = 0 # total sum
number = numbers[pos]
while number < primes_below_number and pos < len(numbers) - 1:
pos += 1
number = numbers[pos] # moves to next prime in list numbers
sum_of_primes += number # adds prime to total sum
num = number
while num < primes_below_number:
num += number
if num in numbers[:]:
numbers.remove(num) # removes multiples of prime found
print sum_of_primes + 2
As I said before, I am new to programming, therefore a thorough explanation of any complicated concepts would be deeply appreciated. Thank you.

As you've seen, there are various ways to implement the Sieve of Erasthotenes in Python that are more efficient than your code. I don't want to confuse you with fancy code, but I can show how to speed up your code a fair bit.
Firstly, searching a list isn't fast, and removing elements from a list is even slower. However, Python provides a set type which is quite efficient at performing both of those operations (although it does chew up a bit more RAM than a simple list). Happily, it's easy to modify your code to use a set instead of a list.
Another optimization is that we don't have to check for prime factors all the way up to primes_below_number, which I've renamed to hi in the code below. It's sufficient to just go to the square root of hi, since if a number is composite it must have a factor less than or equal to its square root.
We don't need to keep a running total of the sum of the primes. It's better to do that at the end using Python's built-in sum() function, which operates at C speed, so it's much faster than doing the additions one by one at Python speed.
# number to find summation of all primes below number
hi = 2000000
# create a set excluding even numbers
numbers = set(xrange(3, hi + 1, 2))
for number in xrange(3, int(hi ** 0.5) + 1):
if number not in numbers:
#number must have been removed because it has a prime factor
continue
num = number
while num < hi:
num += number
if num in numbers:
# Remove multiples of prime found
numbers.remove(num)
print 2 + sum(numbers)
You should find that this code runs in a a few seconds; it takes around 5 seconds on my 2GHz single-core machine.
You'll notice that I've moved the comments so that they're above the line they're commenting on. That's the preferred style in Python since we prefer short lines, and also inline comments tend to make the code look cluttered.
There's another small optimization that can be made to the inner while loop, but I let you figure that out for yourself. :)

First, removing numbers from the list will be very slow. Instead of this, make a list
primes = primes_below_number * True
primes[0] = False
primes[1] = False
Now in your loop, when you find a prime p, change primes[k*p] to False for all suitable k. (You wouldn't actually do multiply, you'd continually add p, of course.)
At the end,
primes = [n for n i range(primes_below_number) if primes[n]]
This should be a great deal faster.
Second, you can stop looking once your find a prime greater than the square root of primes_below_number, since a composite number must have a prime factor that doesn't exceed its square root.

Try using numpy, should make it faster. Replace range by xrange, it may help you.

Here's an optimization for your code:
import itertools
primes_below_number = 2000000
numbers = list(range(3, primes_below_number, 2))
pos = 0
while pos < len(numbers) - 1:
number = numbers[pos]
numbers = list(
itertools.chain(
itertools.islice(numbers, 0, pos + 1),
itertools.ifilter(
lambda n: n % number != 0,
itertools.islice(numbers, pos + 1, len(numbers))
)
)
)
pos += 1
sum_of_primes = sum(numbers) + 2
print sum_of_primes
The optimization here is because:
Removed the sum to outside the loop.
Instead of removing elements from a list we can just create another one, memory is not an issue here (I hope).
When creating the new list we create it by chaining two parts, the first part is everything before the current number (we already checked those), and the second part is everything after the current number but only if they are not divisible by the current number.
Using itertools can make things faster since we'd be using iterators instead of looping through the whole list more than once.
Another solution would be to not remove parts of the list but disable them like #saulspatz said.
And here's the fastest way I was able to find: http://www.wolframalpha.com/input/?i=sum+of+all+primes+below+2+million 😁
Update
Here is the boolean method:
import itertools
primes_below_number = 2000000
numbers = [v % 2 != 0 for v in xrange(primes_below_number)]
numbers[0] = False
numbers[1] = False
numbers[2] = True
number = 3
while number < primes_below_number:
n = number * 3 # We already excluded even numbers
while n < primes_below_number:
numbers[n] = False
n += number
number += 1
while number < primes_below_number and not numbers[number]:
number += 1
sum_of_numbers = sum(itertools.imap(lambda index_n: index_n[1] and index_n[0] or 0, enumerate(numbers)))
print(sum_of_numbers)
This executes in seconds (took 3 seconds on my 2.4GHz machine).

Instead of storing a list of numbers, you can instead store an array of boolean values. This use of a bitmap can be thought of as a way to implement a set, which works well for dense sets (there aren't big gaps between the values of members).
An answer on a recent python sieve question uses this implementation python-style. It turns out a lot of people have implemented a sieve, or something they thought was a sieve, and then come on SO to ask why it was slow. :P Look at the related-questions sidebar from some of them if you want more reading material.
Finding the element that holds the boolean that says whether a number is in the set or not is easy and extremely fast. array[i] is a boolean value that's true if i is in the set, false if not. The memory address can be computed directly from i with a single addition.
(I'm glossing over the fact that an array of boolean might be stored with a whole byte for each element, rather than the more efficient implementation of using every single bit for a different element. Any decent sieve will use a bitmap.)
Removing a number from the set is as simple as setting array[i] = false, regardless of the previous value. No searching, not comparison, no tracking of what happened, just one memory operation. (Well, two for a bitmap: load the old byte, clear the correct bit, store it. Memory is byte-addressable, but not bit-addressable.)
An easy optimization of the bitmap-based sieve is to not even store the even-numbered bytes, because there is only one even prime, and we can special-case it to double our memory density. Then the membership-status of i is held in array[i/2]. (Dividing by powers of two is easy for computers. Other values are much slower.)
An SO question:
Why is Sieve of Eratosthenes more efficient than the simple "dumb" algorithm? has many links to good stuff about the sieve. This one in particular has some good discussion about it, in words rather than just code. (Nevermind the fact that it's talking about a common Haskell implementation that looks like a sieve, but actually isn't. They call this the "unfaithful" sieve in their graphs, and so on.)
discussion on that question brought up the point that trial division may be fast than big sieves, for some uses, because clearing the bits for all multiples of every prime touches a lot of memory in a cache-unfriendly pattern. CPUs are much faster than memory these days.

Determine whether N is fibonacci or not, if not find the largest fibonacci number smaller than N

How can I determine whether a given number N is a fibonacci number or not, if that number is not a fibonacci number how can I*determine the largest fibonacci number smaller than N?
I found the solution via generating the series of fibonacci number with limit N.
Is there any better way to do this in Python?
guys consider while DOWN VOTES, I've accepted the solution provided here. I do not think it as worthwhile since I have posted what I need and got the solution from you guys.
Thank You.

A simple test for whether some integer N is a Fibonacci number is as follows:
N is a Fibonacci number iff either (5 * n^2 + 4) or (5 * n^2 - 4) is a square number.
See here for the ingenious proof (page 417): http://www.fq.math.ca/Scanned/10-4/advanced10-4.pdf
If it turns out that N is not a Fibonacci number, then the simplest method is just to keep trying with smaller numbers until you find one, although this could take a very long time for large N.

Here's a general algorithm:
The naive way is to solve it with recursion - but in terms of run time complexity it's not useful at all.
Create a new array, let's call it FibArr .
Insert 1,1, to the the array.
Then , the value of the i'th index in the array is fibArr[i-1]+fibArr[i-2] (i>=3)
In every iteration check whether the new value inserted into fibArr==N.
If true , return.
Else, check whether the inserted value is bigger then N.
If true , assuming now fibArr has k values, return the (k-1) value.
Else , keep iterating :)
*With python it's even easier to do - but notice that in python there are no arrays , but lists.
It's easier with python becuase you don't have to set the list length, like in java.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.