Having trouble with the variant of the "Two Sum" coding challenge? - python

The two problems seeks to find two elements x and y such that x+y=target. This can be implemented using a brute force approach.
for x in arr:
for y in arr:
if x+y==target:
return [x,y]
We are doing some redundant computation in the for loop -- that is we only want to consider combinations of two elements. We can do a N C 2 dual-loop as follows.
for i, x in enumerate(arr):
if y in arr[i+1:]:
if x+y==target:
return [x,y]
And we save a large constant factor of time complexity. Now let's note that inner most loop is a search. We can either use a hash search or a binary search for.
seen = set()
for i, x in enumerate(arr):
if target-x in seen:
y = target-x
return [x,y]
seen.add(x)
Not that seen is only of length of i. And it will only trigger when hit the second number (because it's complement must be in the set).
A variant of this problem is: to find elements that satisfy the following x-y = target. It's a simple variant but it adds a bit of logical complexity to this problem.
My question is: why does the following not work? That is, we're just modifying the previous code?
seen = set()
for i, x in enumerate(arr):
for x-target in seen:
y = x-target
return [x,y]
seen.add(x)
I've asked a friend, however I didn't understand him. He said that subtraction isn't associative. We're exploiting the associative property of addition in the two sum problem to achieve the constant time improvement. But that's all he told me. I don't get it to be honest. I still think my code should work. Can someone tell me why my code doesn't work?

Your algorithm (once the if/for mixup is fixed) still doesn't work because subtraction is not commutative. The algorithm only effectively checks x,y pairs where x comes later in the array than y. That's OK when it's testing x+y = target, since it doesn't matter which order the two values are in. But for x-y = target, the order does matter, since x - y is not the same thing as y - x.
A fix for this would be to check each number in the array to see if it could be either x or y with the other value being one of the earlier values from arr. There needs to be a different check for each, so you probably need two if statements inside the loop:
seen = set()
for n in arr:
if n-target in seen:
x = n
y = n-target
return [x,y]
if n+target in seen:
x = n+target
y = n
return [x,y]
seen.add(x)
Note that I renamed the loop variable to n, since it could be either x or y depending on how the math worked out. It's not strictly necessary to use x and y variables in the bodies of the if statements, you could do those computations directly in the return statement. I also dropped the unneeded enumerate call, since the single-loop versions of the code don't use i at all.

Related

Analyzing the complexity matrix path-finding

Recently in my homework, I was assinged to solve the following problem:
Given a matrix of order nxn of zeros and ones, find the number of paths from [0,0] to [n-1,n-1] that go only through zeros (they are not necessarily disjoint) where you could only walk down or to the right, never up or left. Return a matrix of the same order where the [i,j] entry is the number of paths in the original matrix that go through [i,j], the solution has to be recursive.
My solution in python:
def find_zero_paths(M):
n,m = len(M),len(M[0])
dict = {}
for i in range(n):
for j in range(m):
M_top,M_bot = blocks(M,i,j)
X,Y = find_num_paths(M_top),find_num_paths(M_bot)
dict[(i,j)] = X*Y
L = [[dict[(i,j)] for j in range(m)] for i in range(n)]
return L[0][0],L
def blocks(M,k,l):
n,m = len(M),len(M[0])
assert k<n and l<m
M_top = [[M[i][j] for i in range(k+1)] for j in range(l+1)]
M_bot = [[M[i][j] for i in range(k,n)] for j in range(l,m)]
return [M_top,M_bot]
def find_num_paths(M):
dict = {(1, 1): 1}
X = find_num_mem(M, dict)
return X
def find_num_mem(M,dict):
n, m = len(M), len(M[0])
if M[n-1][m-1] != 0:
return 0
elif (n,m) in dict:
return dict[(n,m)]
elif n == 1 and m > 1:
new_M = [M[0][:m-1]]
X = find_num_mem(new_M,dict)
dict[(n,m-1)] = X
return X
elif m == 1 and n>1:
new_M = M[:n-1]
X = find_num_mem(new_M, dict)
dict[(n-1,m)] = X
return X
new_M1 = M[:n-1]
new_M2 = [M[i][:m-1] for i in range(n)]
X,Y = find_num_mem(new_M1, dict),find_num_mem(new_M2, dict)
dict[(n-1,m)],dict[(n,m-1)] = X,Y
return X+Y
My code is based on the idea that the number of paths that go through [i,j] in the original matrix is equal to the product of the number of paths from [0,0] to [i,j] and the number of paths from [i,j] to [n-1,n-1]. Another idea is that the number of paths from [0,0] to [i,j] is the sum of the number of paths from [0,0] to [i-1,j] and from [0,0] to [i,j-1]. Hence I decided to use a dictionary whose keys are matricies of the form [[M[i][j] for j in range(k)] for i in range(l)] or [[M[i][j] for j in range(k+1,n)] for i in range(l+1,n)] for some 0<=k,l<=n-1 where M is the original matrix and whose values are the number of paths from the top of the matrix to the bottom. After analizing the complexity of my code I arrived at the conclusion that it is O(n^6).
Now, my instructor said this code is exponential (for find_zero_paths), however, I disagree.
The recursion tree (for find_num_paths) size is bounded by the number of submatrices of the form above which is O(n^2). Also, each time we add a new matrix to the dictionary we do it in polynomial time (only slicing lists), SO... the total complexity is polynomial (poly*poly = poly). Also, the function 'blocks' runs in polynomial time, and hence 'find_zero_paths' runs in polynomial time (2 lists of polynomial-size times a function which runs in polynomial time) so all in all the code runs in polynomial time.
My question: Is the code polynomial and my O(n^6) bound is wrong or is it exponential and I am missing something?
Unfortunately, your instructor is right.
There is a lot to unpack here:
Before we start, as quick note. Please don't use dict as a variable name. It hurts ^^. Dict is a reserved keyword for a dictionary constructor in python. It is a bad practice to overwrite it with your variable.
First, your approach of counting M_top * M_bottom is good, if you were to compute only one cell in the matrix. In the way you go about it, you are unnecessarily computing some blocks over and over again - that is why I pondered about the recursion, I would use dynamic programming for this one. Once from the start to end, once from end to start, then I would go and compute the products and be done with it. No need for O(n^6) of separate computations. Sine you have to use recursion, I would recommend caching the partial results and reusing them wherever possible.
Second, the root of the issue and the cause of your invisible-ish exponent. It is hidden in the find_num_mem function. Say you compute the last element in the matrix - the result[N][N] field and let us consider the simplest case, where the matrix is full of zeroes so every possible path exists.
In the first step, your recursion creates branches [N][N-1] and [N-1][N].
In the second step, [N-1][N-1], [N][N-2], [N-2][N], [N-1][N-1]
In the third step, you once again create two branches from every previous step - a beautiful example of an exponential explosion.
Now how to go about it: You will quickly notice that some of the branches are being duplicated over and over. Cache the results.

defining a new variable equal to the length of two other arrays

This is purely out of curiosity, and I'm definitely over thinking it, but it's something I occasionally bump into and I never like my solution.
Given two lists:
x = [1, 2, 3, 4]
y = [128, 244, 132, 161]
I need to compute a new variable n, equal to the length of these lists. I could use n = len(x) or n = len(y), but this is explicitly setting n equal to one and specifically not the other. I feel like the following does what I want:
def common_length(x, y):
assert len(x) == len(y)
return len(list(zip(x, y)))
But that's clearly overkill. I don't know why this bugs me but I'd like to know what alternatives there are, if any, or if I should just get on with my life by using n=len(x).
If you know for certain that x and y have the same length, but want to make it explicit in the code (e.g. for sake of future maintainance) that you could have chosen either of them, then you could of course add a comment:
n = len(x) # also equals len(y)
If you suspect that they might not be equal, and you want to raise an exception if they are not, then you should consider what exception is most appropriate. The use of assert is intended as a debugging aid, so your code should only raise an AssertionError in the event that it actually contains a bug. So if your code ought to be creating x and y with equal lengths but you want to check that it is actually doing so, then by all means you could use:
n = len(x)
assert n == len(y)
However, if x and y derive from user input and the user might have wrongly provided inputs of unequal length, then it would be more appropriate to do e.g.:
n = len(x)
if n != len(y):
raise ValueError('x and y should have equal lengths')
None of the above are symmetrical in x and y to look at, but in all cases it is obvious to the reader that the two lengths are supposed to be equal -- and in the last two it is enforced. There is little justification in using len(list(zip(x, y))) merely for sake of aesthetic symmetry, when it adds unnecessary expense in iterating over both inputs and creating a temporary list. A cheaper (but still unnecessary) equivalent would be min((len(x), len(y))).
The other thing to consider, of course, is whether the fact that x and y must always have the same length is indicative of the fact that your data should be organised differently. For example, depending how you intend to use the data, it might be better to store your data as a list of tuples -- i.e. what your list(zip(x, y)) would produce:
data = [(1, 128), (2, 244), (3, 132), (4, 161)]
or maybe as a numpy array:
data = np.array([[1, 128], [2, 244], [3, 132], [4, 161]])
whereupon your statement just becomes:
n = len(data)
and the aesthetic gain comes as a natural consequence of actually organising your data in a way that ensures that the constraint of equal length cannot be violated, rather than as a result of contriving to write a symmetrical expression merely for aesthetic purposes.
In the numpy array case, you can also still refer to x and y separately by creating the relevant slices (recall that in numpy, slices are views of the data rather than copies):
x = data[:,0]
y = data[:,1]
though this approach does of course depend on them having the same data type.
It looks pretty simple case:
You have n1=len(x) and n2=len(y)
If n1 and n2 will always be the same, you can use any of these 2 for declaring n
n=n1 or n=n2
If they can be different, it depends on your needs to get the min of max of these 2 with
n=max(n1, n2) or n=min(n1,n2)
There can be no other case under these parameters

Python: Is this the most efficient way to reverse order without using shortcuts?

x = [1,2,3,4,5,6,7,8,9,10]
#Random list elements
for i in range(int(len(x)/2)):
value = x[i]
x[i] = x[len(x)-i-1]
x[len(x)-i-1] = value
#Confusion on efficiency
print(x)
This is a uni course for first year. So no python shortcuts are allowed
Not sure what counts as "a shortcut" (reversed and the "Martian Smiley" [::-1] being obvious candidates -- but does either count as "a shortcut"?!), but at least a couple small improvements are easy:
L = len(x)
for i in range(L//2):
mirror = L - i - 1
x[i], x[mirror] = x[mirror], x[i]
This gets len(x) only once -- it's a fast operation but there's no reason to keep repeating it over and over -- also computes mirror but once, does the swap more directly, and halves L (for the range argument) directly with the truncating-division operator rather than using the non-truncating division and then truncating with int. Nanoseconds for each case, but it may be considered slightly clearer as well as microscopically faster.
x = [1,2,3,4,5,6,7,8,9,10]
x = x.__getitem__(slice(None,None,-1))
slice is a python builtin object (like range and len that you used in your example)
__getitem__ is a method belonging to iterable types ( of which x is)
there are absolutely no shortcuts here :) and its effectively one line.

Python Set Comprehension

So I have these two problems for a homework assignment and I'm stuck on the second one.
Use a Python Set Comprehension (Python's equivalent of Set Builder notation) to generate a set of all of the prime numbers that are less than 100. Recall that a prime number is an integer that is greater than 1 and not divisible by any integer other than itself and 1. Store your set of primes in a variable (you will need it for additional parts). Output your set of primes (e.g., with the print function).
Use a Python Set Comprehension to generate a set of ordered pairs (tuples of length 2) consisting of all of the prime pairs consisting of primes less than 100. A Prime Pair is a pair of consecutive odd numbers that are both prime. Store your set of Prime Pairs in a variable. Your set of number 1 will be very helpful. Output your Set of Prime Pairs.
For the first one, this works perfectly:
r= {x for x in range(2, 101)
if not any(x % y == 0 for y in range(2, x))}
However, I'm pretty stumped on the second one. I think I may have to take the Cartesian product of the set r with something but I'm just not sure.
This gets me somewhat close but I just want the consecutive pairs.
cart = { (x, y) for x in r for y in r
if x < y }
primes = {x for x in range(2, 101) if all(x%y for y in range(2, min(x, 11)))}
I simplified the test a bit - if all(x%y instead of if not any(not x%y
I also limited y's range; there is no point in testing for divisors > sqrt(x). So max(x) == 100 implies max(y) == 10. For x <= 10, y must also be < x.
pairs = {(x, x+2) for x in primes if x+2 in primes}
Instead of generating pairs of primes and testing them, get one and see if the corresponding higher prime exists.
You can get clean and clear solutions by building the appropriate predicates as helper functions. In other words, use the Python set-builder notation the same way you would write the answer with regular mathematics set-notation.
The whole idea behind set comprehensions is to let us write and reason in code the same way we do mathematics by hand.
With an appropriate predicate in hand, problem 1 simplifies to:
low_primes = {x for x in range(1, 100) if is_prime(x)}
And problem 2 simplifies to:
low_prime_pairs = {(x, x+2) for x in range(1,100,2) if is_prime(x) and is_prime(x+2)}
Note how this code is a direct translation of the problem specification, "A Prime Pair is a pair of consecutive odd numbers that are both prime."
P.S. I'm trying to give you the correct problem solving technique without actually giving away the answer to the homework problem.
You can generate pairs like this:
{(x, x + 2) for x in r if x + 2 in r}
Then all that is left to do is to get a condition to make them prime, which you have already done in the first example.
A different way of doing it: (Although slower for large sets of primes)
{(x, y) for x in r for y in r if x + 2 == y}

"'generator' object is not subscriptable" error

Why am I getting this error, from line 5 of my code, when attempting to solve Project Euler Problem 11?
for x in matrix:
p = 0
for y in x:
if p < 17:
currentProduct = int(y) * int(x[p + 1]) * int(x[p + 2]) * int(x[p + 3])
if currentProduct > highestProduct:
print(currentProduct)
highestProduct = currentProduct
else:
break
p += 1
'generator' object is not subscriptable
Your x value is is a generator object, which is an Iterator: it generates values in order, as they are requested by a for loop or by calling next(x).
You are trying to access it as though it were a list or other Sequence type, which let you access arbitrary elements by index as x[p + 1].
If you want to look up values from your generator's output by index, you may want to convert it to a list:
x = list(x)
This solves your problem, and is suitable in most cases. However, this requires generating and saving all of the values at once, so it can fail if you're dealing with an extremely long or infinite list of values, or the values are extremely large.
If you just needed a single value from the generator, you could instead use itertools.islice(x, p) to discard the first p values, then next(...) to take the one you need. This eliminate the need to hold multiple items in memory or compute values beyond the one you're looking for.
import itertools
result = next(itertools.islice(x, p))
As an extension to Jeremy's answer some thoughts about the design of your code:
Looking at your algorithm, it appears that you do not actually need truly random access to the values produced by the generator: At any point in time you only need to keep four consecutive values (three, with an extra bit of optimization). This is a bit obscured in your code because you mix indexing and iteration: If indexing would work (which it doesn't), your y could be written as x[p + 0].
For such algorithms, you can apply kind of a "sliding window" technique, demonstrated below in a stripped-down version of your code:
import itertools, functools, operator
vs = [int(v) for v in itertools.islice(x, 3)]
for v in x:
vs.append(int(v))
currentProduct = functools.reduce(operator.mul, vs, 1)
print(currentProduct)
vs = vs[1:]

Categories