Python Set Comprehension - python

So I have these two problems for a homework assignment and I'm stuck on the second one.
1. Use a Python Set Comprehension (Python's equivalent of Set Builder notation) to generate a set of all of the prime numbers that are less than 100. Recall that a prime number is an integer that is greater than 1 and not divisible by any integer other than itself and 1. Store your set of primes in a variable (you will need it for additional parts). Output your set of primes (e.g., with the print function).
2. Use a Python Set Comprehension to generate a set of ordered pairs (tuples of length 2) consisting of all of the prime pairs consisting of primes less than 100. A Prime Pair is a pair of consecutive odd numbers that are both prime. Store your set of Prime Pairs in a variable. Your set of number 1 will be very helpful. Output your Set of Prime Pairs.
For the first one, this works perfectly:
r = {x for x in range(2, 101)
     if not any(x % y == 0 for y in range(2, x))}
However, I'm pretty stumped on the second one. I think I may have to take the Cartesian product of the set r with something but I'm just not sure.
This gets me somewhat close but I just want the consecutive pairs.
cart = {(x, y) for x in r for y in r
        if x < y}

primes = {x for x in range(2, 101) if all(x%y for y in range(2, min(x, 11)))}
I simplified the test a bit - if all(x%y instead of if not any(not x%y
I also limited y's range; there is no point in testing for divisors > sqrt(x). So max(x) == 100 implies max(y) == 10. For x <= 10, y must also be < x.
pairs = {(x, x+2) for x in primes if x+2 in primes}
Instead of generating pairs of primes and testing them, get one and see if the corresponding higher prime exists.

You can get clean and clear solutions by building the appropriate predicates as helper functions. In other words, use the Python set-builder notation the same way you would write the answer with regular mathematics set-notation.
The whole idea behind set comprehensions is to let us write and reason in code the same way we do mathematics by hand.
With an appropriate predicate in hand, problem 1 simplifies to:
low_primes = {x for x in range(1, 100) if is_prime(x)}
And problem 2 simplifies to:
low_prime_pairs = {(x, x+2) for x in range(1,100,2) if is_prime(x) and is_prime(x+2)}
Note how this code is a direct translation of the problem specification, "A Prime Pair is a pair of consecutive odd numbers that are both prime."
P.S. I'm trying to give you the correct problem solving technique without actually giving away the answer to the homework problem.
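The is_prime predicate is left undefined in this answer. One possible sketch, assuming plain trial division is acceptable at this scale, follows; the helper's body is an assumption for illustration, not the answerer's own code.

```python
def is_prime(n):
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# The two comprehensions from the answer, now runnable.
low_primes = {x for x in range(1, 100) if is_prime(x)}
low_prime_pairs = {(x, x + 2) for x in range(1, 100, 2)
                   if is_prime(x) and is_prime(x + 2)}

print(sorted(low_primes))
print(sorted(low_prime_pairs))
```

Keeping the predicate separate means the comprehensions still read like the set-builder notation in the assignment.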

You can generate pairs like this:
{(x, x + 2) for x in r if x + 2 in r}
Then all that is left to do is to get a condition to make them prime, which you have already done in the first example.
A different way of doing it: (Although slower for large sets of primes)
{(x, y) for x in r for y in r if x + 2 == y}

Related

Sum of two squares in Python

I have written a code based on the two pointer algorithm to find the sum of two squares. My problem is that I run into a memory error when running this code for an input n=55555**2 + 66666**2. I am wondering how to correct this memory error.
def sum_of_two_squares(n):
    look=tuple(range(n))
    i=0
    j = len(look)-1
    while i < j:
        x = (look[i])**2 + (look[j])**2
        if x == n:
            return (j,i)
        elif x < n:
            i += 1
        else:
            j -= 1
    return None

n=55555**2 + 66666**2
print(sum_of_two_squares(n))
The problem Im trying to solve using two pointer algorithm is:
return a tuple of two positive integers whose squares add up to n, or return None if the integer n cannot be so expressed as a sum of two squares. The returned tuple must present the larger of its two numbers first. Furthermore, if some integer can be expressed as a sum of two squares in several ways, return the breakdown that maximizes the larger number. For example, the integer 85 allows two such representations 7*7 + 6*6 and 9*9 + 2*2, of which this function must therefore return (9, 2).
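You're creating a tuple with 55555^2 + 66666^2 = 7530713581 elements.
Even if each element took only one byte, the tuple would occupy 7.01 GiB. In 64-bit CPython each tuple slot is an 8-byte pointer, so the real footprint is several times larger.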
You'll need to either reduce the size of the tuple, or possibly make each element take up less space by specifying the type of each element: I would suggest looking into Numpy for the latter.
Specifically for this problem:
Why use a tuple at all?
You create the variable look which is just a list of integers:
look=tuple(range(n)) # = (0, 1, 2, ..., n-1)
Then you reference it, but never modify it. So: look[i] == i and look[j] == j.
So you're looking up numbers in a list of numbers. Why look them up? Why not just use i in place of look[i] and remove look altogether?
As others have pointed out, there's no need to use tuples at all.
One reasonably efficient way of solving this problem is to generate a series of integer square values (0, 1, 4, 9, etc...) and test whether or not subtracting these values from n leaves you with a value that is a perfect square.
You can generate a series of perfect squares efficiently by adding successive odd numbers together: 0 (+1) → 1 (+3) → 4 (+5) → 9 (etc.)
There are also various tricks you can use to test whether or not a number is a perfect square (for example, see the answers to this question), but — in Python, at least — it seems that simply testing the value of int(n**0.5) is faster than iterative methods such as a binary search.
def integer_sqrt(n):
    # If n is a perfect square, return its (integer) square
    # root. Otherwise return -1
    r = int(n**0.5)
    if r * r == n:
        return r
    return -1

def sum_of_two_squares(n):
    # If n can be expressed as the sum of two squared integers,
    # return these integers as a tuple. Otherwise return <None>
    # i: iterator variable
    # x: value of i**2
    # y: value we need to add to x to obtain (i+1)**2
    i, x, y = 0, 0, 1
    # If i**2 > n / 2, then we can stop searching
    max_x = n >> 1
    while x <= max_x:
        r = integer_sqrt(n-x)
        if r >= 0:
            return (i, r)
        i, x, y = i+1, x+y, y+2
    return None
This returns a solution to sum_of_two_squares(55555**2 + 66666**2) in a fraction of a second.
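Because n**0.5 goes through floating point, int(n**0.5) can be off by one for very large n. A hedged variant using math.isqrt (available from Python 3.8) is sketched below. The name sum_of_two_squares_isqrt is hypothetical, and unlike the answer above it starts at 1 and returns the larger number first, as the problem statement asks.

```python
from math import isqrt


def sum_of_two_squares_isqrt(n):
    """Return (a, b) with a >= b > 0 and a*a + b*b == n, or None.

    The smaller number b is tried in increasing order, so the first
    hit is the breakdown with the largest possible a.
    """
    b = 1
    while 2 * b * b <= n:          # b*b <= n/2, so a >= b is guaranteed
        rest = n - b * b
        a = isqrt(rest)            # exact integer square root, no floats
        if a * a == rest:
            return (a, b)
        b += 1
    return None


print(sum_of_two_squares_isqrt(85))  # (9, 2)
```

The loop runs at most about sqrt(n / 2) times, so the input from the question needs only tens of thousands of iterations.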
You do not need the ranges at all, and certainly do not need to convert them into tuples. They take a ridiculous amount of space, but you only need their current elements, numbers i and j. Also, as the friendly commenter suggested, you can start with sqrt(n) to improve the performance further.
def sum_of_two_squares(n):
    i = 1
    j = int(n ** (1/2))
    while i < j:
        x = i * i + j * j
        if x == n:
            return j, i
        if x < n:
            i += 1
        else:
            j -= 1
Bear in mind that the problem takes a very long time to be solved. Be patient. And no, NumPy won't help. There is nothing here to vectorize.

List comprehension has an excruciatingly slow load time compared to regular code that does the same thing

I have a list comprehension that prints all the prime numbers from 1 to 1000. For some strange reason, my list comprehension takes 1:46 to load in the terminal. I find this very weird because when I write the same code out normally, it loads instantaneously.
Here is my comprehension:
print([x for x in range(2, 1000) if x not in [y * z for y in range(2, 1001) for z in range(2, 1001) if y * z < 1000]])
As you can see, it makes a list of numbers from 2 to 1000 and prints the (prime) ones that are not in the list of composite numbers under 1000. When I run this, the output is correct, but it takes ages on every computer I try. I thought maybe my code was just erroneous. However, when I isolate the [y * z for y in range(2, 1001) for z in range(2, 1001) if y * z < 1000] line, there is no delay in displaying the composites. And when I generate the regular list of numbers for comparison, there is also no lag. It's only when I use the "not in" operator that the comprehension takes ridiculously long to print them.
I thought that perhaps the not in comparison was being extra slow. But to my frustration, I noticed that when I wrote out the comparison part of the code normally and not in comprehension, there was absolutely no delay. See this:
x = [y * z for y in range(2, 1001) for z in range(2, 1001) if y * z < 1000]
newlist = []
for z in range(2, 1000):
    if z not in x:
        newlist.append(z)
print(newlist)
As you can see, I slapped the composite list into a variable and did the if statement and loop regularly. If a number wasn't in the list, it was added to a new list, achieving the same goal as my list comprehension. I was wondering if there is a way to stop my list comprehension from being so slow. The logic matches my original comprehension, so why does it take longer in comprehension format if it is essentially the same?
Please try not to add any additional features to my code; I'm trying to use list comprehension and only list comprehension.
The inner list is recreated on every iteration of x. Simply separate it out:
composites = [y*z for y in range(2, 1001) for z in range(2, 1001) if y*z < 1000]
[x for x in range(2, 1000) if x not in composites]
By the way, if you make composites a set, lookups (in and not in) are much faster (O(1) instead of O(n), where n=len(composites)).
composites = {y*z for y ...}
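Putting both suggestions together, a minimal runnable sketch might look like this. It keeps the asker's ranges and assumes only that building the composites once, as a set, is acceptable.

```python
# Build the composites once, as a set, so each membership test is O(1)
# on average instead of a linear scan of a list.
composites = {y * z
              for y in range(2, 1001)
              for z in range(2, 1001)
              if y * z < 1000}

# The outer comprehension now only performs lookups; it no longer
# rebuilds the inner collection for every x.
primes = [x for x in range(2, 1000) if x not in composites]

print(primes[:10])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The original one-liner re-ran the million-step inner comprehension for each of the 998 values of x, which is where the minutes went.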

Having trouble with the variant of the "Two Sum" coding challenge?

The two sum problem seeks to find two elements x and y such that x+y=target. This can be implemented using a brute force approach.
for x in arr:
    for y in arr:
        if x+y==target:
            return [x,y]
We are doing some redundant computation in the for loop: we only want to consider combinations of two elements. We can do an N-choose-2 double loop as follows.
for i, x in enumerate(arr):
    for y in arr[i+1:]:
        if x+y==target:
            return [x,y]
This saves a constant factor of running time (roughly half the pairs). Now note that the innermost loop is a search, so we can replace it with either a hash lookup or a binary search.
seen = set()
for i, x in enumerate(arr):
    if target-x in seen:
        y = target-x
        return [x,y]
    seen.add(x)
Note that seen holds only i elements at step i. The check only triggers when we reach the second number of a pair (because its complement must already be in the set).
A variant of this problem is: to find elements that satisfy the following x-y = target. It's a simple variant but it adds a bit of logical complexity to this problem.
My question is: why does the following not work? That is, we're just modifying the previous code?
seen = set()
for i, x in enumerate(arr):
    for x-target in seen:
        y = x-target
        return [x,y]
    seen.add(x)
I've asked a friend, however I didn't understand him. He said that subtraction isn't associative. We're exploiting the associative property of addition in the two sum problem to achieve the constant time improvement. But that's all he told me. I don't get it to be honest. I still think my code should work. Can someone tell me why my code doesn't work?
Your algorithm (once the if/for mixup is fixed) still doesn't work because subtraction is not commutative. The algorithm only effectively checks x,y pairs where x comes later in the array than y. That's OK when it's testing x+y = target, since it doesn't matter which order the two values are in. But for x-y = target, the order does matter, since x - y is not the same thing as y - x.
A fix for this would be to check each number in the array to see if it could be either x or y with the other value being one of the earlier values from arr. There needs to be a different check for each, so you probably need two if statements inside the loop:
seen = set()
for n in arr:
    if n-target in seen:
        x = n
        y = n-target
        return [x,y]
    if n+target in seen:
        x = n+target
        y = n
        return [x,y]
    seen.add(n)
Note that I renamed the loop variable to n, since it could be either x or y depending on how the math worked out. It's not strictly necessary to use x and y variables in the bodies of the if statements, you could do those computations directly in the return statement. I also dropped the unneeded enumerate call, since the single-loop versions of the code don't use i at all.
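For reference, the same idea wrapped in a function so it can be run directly. This is only a sketch: the name find_difference_pair and the choice to return None when no pair exists are assumptions for illustration.

```python
def find_difference_pair(arr, target):
    """Return [x, y] with x - y == target using values from arr, else None."""
    seen = set()
    for n in arr:
        # Case 1: n plays the role of x and an earlier value is y.
        if n - target in seen:
            return [n, n - target]
        # Case 2: n plays the role of y and an earlier value is x.
        if n + target in seen:
            return [n + target, n]
        seen.add(n)
    return None


print(find_difference_pair([1, 5, 3], 2))  # [3, 1]
print(find_difference_pair([5, 1], 4))     # [5, 1]
```

The second call shows why both checks are needed: 5 appears before 1, so only the n + target test can find the pair.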

Two Sorted Arrays, sum of 2 elements equal a certain number

I was wondering if I could get some help. I want to find an algorithm that is THETA(n), i.e. linear time, for determining whether two numbers, one from each of two sorted arrays, add up to a certain number.
For instance, let's say we have 2 sorted arrays: X and Y
I want to determine if there's an element of X and an element of Y that add up to exactly a certain number, let's say 50.
I have been able to come up with these algorithms in Python so far, but I am pretty sure they are order of THETA(n^2) rather than THETA(n).
def arrayTestOne(X,Y):
    S =[1 for x in X for y in Y if x+y == 50]

def arrayTestTwo(X,Y):
    for x in X:
        for y in Y:
            if x + y == 50:
                print("1")
I'm thinking it's the double for loops that break the linear time, but how else would you iterate through 2 lists? Any thoughts would be appreciated.
What you can do is start with the highest value in one list and the lowest value in the other, and check the sum.
If the sum is your target, you're done.
If it's too high, move down to the next-lower value in the first list.
If it's too low, move up to the next-higher value in the second.
If you run off the end of either list without reaching the target, you return false.
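The steps above can be sketched as follows, assuming both lists are sorted in ascending order; the function and parameter names are hypothetical.

```python
def has_pair_with_sum(xs, ys, target):
    """Return True if some x in xs and y in ys satisfy x + y == target.

    Both lists must be sorted in ascending order.
    """
    i = len(xs) - 1   # start at the highest value of the first list
    j = 0             # and the lowest value of the second list
    while i >= 0 and j < len(ys):
        total = xs[i] + ys[j]
        if total == target:
            return True
        if total > target:
            i -= 1    # too high: step down in the first list
        else:
            j += 1    # too low: step up in the second list
    return False


print(has_pair_with_sum([1, 4, 20, 30], [5, 20, 46, 60], 50))  # True
```

Each iteration moves one of the two indices and neither ever moves back, so the loop runs at most len(xs) + len(ys) times.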
Here is an approach that takes about 2n steps (so still linear) and doesn't even need the input to be sorted:
def check_array(x, y, needed_sum):
    y = set(y)
    return next(((i, needed_sum-i) for i in x if (needed_sum-i) in y), None)

Debugging IndexError: Trivial sum generation problem

I can't wrap my head around what's causing the index error here, not exactly looking for a quick fix. Let me know however if my code repulses you/is incredibly ineffectual. The goal is to generate palindromes produced by the product of two four digit numbers.
Code:
for x in range(10000):
    for y in range(10000):
        product = str(x*y)
        lengthprod = len(str(product))
        for digit in range(lengthprod+1):
            if (product[digit]==product[lengthprod-digit]):
                print x,"x",y,"=",product
Traceback:
Traceback (most recent call last):
  File "<pyshell#31>", line 6, in <module>
    if (product[digit]==product[lengthprod-digit]):
IndexError: string index out of range
Converting a number to a string is generally a slow operation, since there are many possibilities in general (integers, floating point, scientific notation, maybe something exotic like fractions or imaginary numbers, not to mention things like handling leading zero or overwidth numbers or rounding to two decimal places). Thus, it is often a better approach for checking if a positive integer is a palindrome to reverse the digits numerically by repeatedly taking the input modulo 10 to extract the last digit, adding the digit to an accumulator that is multiplied by 10 at each step, then dividing the input number by 10 before looping. I don't speak Python, so here is my Scheme program to reverse a number:
(define (rev n)
  (let loop ((n n) (r 0))
    (if (zero? n) r
        (loop (quotient n 10)
              (+ (* r 10) (modulo n 10))))))
Then you can check if the number is a palindrome by checking if the input number equals its reversal.
Edit: Here it is in Python:
def rev(n):
    r = 0
    while n > 0:
        r = r * 10 + n % 10
        n = n // 10
    return r
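The palindrome check described above could then be written as below. This is a small sketch that repeats rev so the block runs on its own; is_palindrome is a hypothetical name and it assumes non-negative integers.

```python
def rev(n):
    """Reverse the decimal digits of a non-negative integer."""
    r = 0
    while n > 0:
        r = r * 10 + n % 10   # append the last digit of n to r
        n //= 10              # drop the last digit of n
    return r


def is_palindrome(n):
    """A number is a palindrome if it equals its own digit reversal."""
    return n == rev(n)


print(is_palindrome(9009))  # True
print(is_palindrome(1234))  # False
```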
You iterate over values 0...lengthprod, but the legal subscripts for product are 0...lengthprod-1.
The last index is out of bounds: it references a position one past the end of the string.
Two changes:
1: range(0, lengthprod+1) should be range(0, lengthprod)
See documentation on range()
2: product[lengthprod-digit] should be product[lengthprod-digit-1]
Off by one error since lengthprod is a length (1 based) and digit is an index (0 based).
Note, this will only give you valid "single digit" palindromes, but gets you past the index out of range error.
Your code repulses me!
(Sorry, I wouldn't normally be so rude, but since you asked for it... ;)
Use xrange rather than range for long loops like this.
Start the range from 1 rather than 0 unless you don't mind all the duplicated trivial results.
Since multiplication commutes, you might want to loop over a "triangle" rather than a "square" to avoid duplicates.
Your variable name product shadows a function from numeric core.
The question says you're interested in the "product of two four digit numbers", but your code has no such restrictions on the number of digits in the numbers. If you want the four digit numbers as input, just start your xrange(start, stop) from 1000.
Since your stated "goal is to generate palindromes", how about trying it with the correct tool for the job: generators!
def pairs(n):
    for x in xrange(n):
        for y in xrange(n):
            yield (x,y)

pairs_generator = pairs(100)
filter(None, ['{x}*{y}={xy}'.format(x=x,y=y,xy=x*y) if str(x*y) == str(x*y)[::-1] else None for x,y in pairs_generator])
I kept my generator simple for clarity's purpose. I will leave it as an exercise for you to simply make a generator to spit out the palindromes. This will involve moving the logic which I have put in my list comprehension into the generator (or you could make a new palindrome_generator which uses a pairs_generator).
Have fun!
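One way the exercise might turn out, written for Python 3 (range instead of xrange), is sketched below. The name palindrome_products and its start/stop parameters are assumptions for illustration, and it loops over a "triangle" (y >= x) to avoid duplicate pairs, as suggested earlier in this answer.

```python
def palindrome_products(start, stop):
    """Yield (x, y, x*y) for start <= x <= y < stop where x*y is a palindrome."""
    for x in range(start, stop):
        for y in range(x, stop):      # y >= x skips mirrored duplicates
            value = x * y
            text = str(value)
            if text == text[::-1]:    # same forwards and backwards
                yield x, y, value


# Two-digit factors keep the demo fast; the largest result is 91 x 99 = 9009.
largest = max(palindrome_products(10, 100), key=lambda item: item[2])
print(largest)  # (91, 99, 9009)
```

For the four-digit case in the question, the call would be palindrome_products(1000, 10000), which covers tens of millions of pairs and is correspondingly slower.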
