defining a new variable equal to the length of two other arrays - python

This is purely out of curiosity, and I'm definitely overthinking it, but it's something I occasionally bump into and I never like my solution.
Given two lists:
x = [1, 2, 3, 4]
y = [128, 244, 132, 161]
I need to compute a new variable n, equal to the length of these lists. I could use n = len(x) or n = len(y), but this is explicitly setting n equal to one and specifically not the other. I feel like the following does what I want:
def common_length(x, y):
    assert len(x) == len(y)
    return len(list(zip(x, y)))
But that's clearly overkill. I don't know why this bugs me but I'd like to know what alternatives there are, if any, or if I should just get on with my life by using n=len(x).

If you know for certain that x and y have the same length, but want to make it explicit in the code (e.g. for the sake of future maintenance) that you could have chosen either of them, then you could of course add a comment:
n = len(x) # also equals len(y)
If you suspect that they might not be equal, and you want to raise an exception if they are not, then you should consider what exception is most appropriate. The use of assert is intended as a debugging aid, so your code should only raise an AssertionError in the event that it actually contains a bug. So if your code ought to be creating x and y with equal lengths but you want to check that it is actually doing so, then by all means you could use:
n = len(x)
assert n == len(y)
However, if x and y derive from user input and the user might have wrongly provided inputs of unequal length, then it would be more appropriate to do e.g.:
n = len(x)
if n != len(y):
    raise ValueError('x and y should have equal lengths')
None of the above are symmetrical in x and y to look at, but in all cases it is obvious to the reader that the two lengths are supposed to be equal -- and in the last two it is enforced. There is little justification in using len(list(zip(x, y))) merely for sake of aesthetic symmetry, when it adds unnecessary expense in iterating over both inputs and creating a temporary list. A cheaper (but still unnecessary) equivalent would be min((len(x), len(y))).
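If the symmetry still matters to you, the user-input check can be wrapped in a small helper so that neither list is singled out at the call site. This is only a sketch: the name common_length is borrowed from the question, and the variadic signature and error message are my own assumptions.

```python
def common_length(*sequences):
    """Return the shared length of all sequences, or raise ValueError."""
    # Collect the distinct lengths; equal-length inputs give exactly one.
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError(
            f"expected sequences of equal length, got lengths {sorted(lengths)}"
        )
    return lengths.pop()


x = [1, 2, 3, 4]
y = [128, 244, 132, 161]
n = common_length(x, y)
print(n)
```

This only calls len on each input, so it avoids the temporary list that len(list(zip(x, y))) builds.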
The other thing to consider, of course, is whether the fact that x and y must always have the same length is indicative of the fact that your data should be organised differently. For example, depending how you intend to use the data, it might be better to store your data as a list of tuples -- i.e. what your list(zip(x, y)) would produce:
data = [(1, 128), (2, 244), (3, 132), (4, 161)]
or maybe as a numpy array:
data = np.array([[1, 128], [2, 244], [3, 132], [4, 161]])
whereupon your statement just becomes:
n = len(data)
and the aesthetic gain comes as a natural consequence of actually organising your data in a way that ensures that the constraint of equal length cannot be violated, rather than as a result of contriving to write a symmetrical expression merely for aesthetic purposes.
In the numpy array case, you can also still refer to x and y separately by creating the relevant slices (recall that in numpy, slices are views of the data rather than copies):
x = data[:,0]
y = data[:,1]
though this approach does of course depend on them having the same data type.

This looks like a pretty simple case:
You have n1=len(x) and n2=len(y)
If n1 and n2 will always be the same, you can use either of the two to define n
n=n1 or n=n2
If they can be different, it depends on whether you need the min or the max of the two, with
n=max(n1, n2) or n=min(n1,n2)
There can be no other case under these parameters

Related

Is there a more efficient way to find the missing integer?

I'm currently studying a module called data structures and algorithms at a university. We've been tasked with writing an algorithm that finds the smallest positive integer which does not occur in a given sequence. I was able to find a solution, but is there a more efficient way?
x = [5, 6, 3, 1, 2]
def missing_integer():
    for i in range(1, 100):
        if i not in x:
            return i
print(missing_integer())
The instructions include some examples:
given x = [1, 3, 6, 4, 1, 2], the function should return 5,
given x = [1, 2, 3], the function should return 4 and
given x = [−1, −3], the function should return 1.
You did not ask for the most efficient way to solve the problem, just if there is a more efficient way than yours. The answer to that is yes.
If the missing integer is near the top of the range of the integers and the list is long, your algorithm has a run-time efficiency of O(N**2)--your loop goes through all possible values, and the not in operator searches through the entire list if a match is not found. (Your code searches only up to the value 100--I assume that is just a mistake on your part and you want to handle sequences of any length.)
Here is a simple algorithm that is merely order O(N*log(N)). (Note that quicker algorithms exist--I show this one since it is simple and thus answers your question easily.) Sort the sequence (which has the order I stated) then run through it starting at the smallest value. This linear search will easily find the missing positive integer. This algorithm also has the advantage that the sequence could involve negative numbers, non-integer numbers, and repeated numbers, and the code could easily handle those. This also handles sequences of any size and with numbers of any size, though of course it runs longer for longer sequences. If a good sort routine is used, the memory usage is quite small.
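A minimal sketch of the sort-then-scan idea described above. The function name smallest_missing_sorted is hypothetical, and the code assumes the values can be compared with each other and with integers.

```python
def smallest_missing_sorted(values):
    """Smallest positive integer not in values, via sorting: O(N*log(N))."""
    candidate = 1
    for v in sorted(values):
        if v == candidate:
            # The candidate is present, so try the next integer.
            candidate += 1
        elif v > candidate:
            # Everything from here on is larger, so the candidate is missing.
            break
        # Values below the candidate (negatives, repeats, fractions) are skipped.
    return candidate


print(smallest_missing_sorted([1, 3, 6, 4, 1, 2]))
print(smallest_missing_sorted([-1, -3]))
```

Using sorted() leaves the caller's sequence untouched at the cost of a copy; sorting in place would avoid that copy if modifying the input is acceptable.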
I think the O(n) algorithm goes like this: initialise an array record of length n + 2 (list in Python) to None, and iterate over the input. If the element is one of the array indexes, set the element in the record to True. Now iterate over the new list record starting from index 1. Return the first None encountered.
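The description above could be sketched roughly as follows. The function name smallest_missing_record is hypothetical, and the code assumes the input contains only integers.

```python
def smallest_missing_record(values):
    """Smallest positive integer not in values, in O(n) time and space."""
    n = len(values)
    record = [None] * (n + 2)          # valid indexes are 0 .. n + 1
    for v in values:
        if 0 < v < len(record):        # only values that are usable indexes
            record[v] = True
    # n values cannot mark all of 1 .. n + 1, so this loop always returns.
    for i in range(1, n + 2):
        if record[i] is None:
            return i


print(smallest_missing_record([1, 3, 6, 4, 1, 2]))
print(smallest_missing_record([1, 2, 3]))
```

Values outside the index range are simply ignored, which is what lets negative numbers and very large numbers pass through without affecting the result.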
The slow step in your algorithm is that line:
if i not in x:
That step takes linear time, which makes the entire algorithm O(N*N). If you first turn the list into a set, the lookup is much faster:
def missing_integer():
    sx = set(x)
    # the answer always lies in 1..len(sx)+1, so this range cannot run out
    for i in range(1, len(sx) + 2):
        if i not in sx:
            return i
Lookup in a set is fast, in fact it takes constant time, and the algorithm now runs in linear time O(N).
Another solution is to create an array whose size is the Max value, then traverse the input and mark each location of the array when that value is seen. Then iterate from the start of the array and report the first unmarked location as the smallest missing value. This is done in O(n) (filling the array and finding the smallest unmarked location).
Also, for negative values you can shift all values by the Min value so that they all become positive. Then apply the above method.
The space complexity of this method is Θ(n).
To know more, see this post about the implementation and scrutinize more about this method.
This can be done in O(n) time with a bit of maths, provided exactly one integer is missing from an otherwise complete run of consecutive integers with no repeats. Initialise minimum_value, maximum_value and sum_value names, then loop once through the numbers to find the minimum, the maximum and the sum of all the numbers (mn, mx, sm).
Now the sum of integers 0..n = n*(n+1)/2 = s(n)
Therefore: missing_number = (s(mx) - s(mn-1)) - sm
All done with traversing the numbers only once!
My answer using list comprehension:
def solution(A):
    max_val = max(A)
    # no positive values at all: the answer is 1
    if max_val <= 0:
        return 1
    # positive integers below max_val that do not occur in A
    missing = [v for v in range(1, max_val) if v not in A]
    # nothing missing below max_val: the answer is the next integer
    return min(missing) if missing else max_val + 1
L = [-1, -3]
res = solution(L)
print(res)

Having trouble with the variant of the "Two Sum" coding challenge?

The Two Sum problem seeks to find two elements x and y such that x+y=target. This can be implemented using a brute-force approach.
for x in arr:
    for y in arr:
        if x+y==target:
            return [x,y]
We are doing some redundant computation in the for loop: we only want to consider each combination of two elements once. We can do an N-choose-2 double loop as follows.
for i, x in enumerate(arr):
    for y in arr[i+1:]:
        if x+y==target:
            return [x,y]
This saves a large constant factor in running time. Now note that the innermost loop is a search. We can use either a hash lookup or a binary search for it.
seen = set()
for i, x in enumerate(arr):
    if target-x in seen:
        y = target-x
        return [x,y]
    seen.add(x)
Note that seen only has length i. The check only triggers when we hit the second number of a pair (because its complement must already be in the set).
A variant of this problem is: to find elements that satisfy the following x-y = target. It's a simple variant but it adds a bit of logical complexity to this problem.
My question is: why does the following not work? That is, we're just modifying the previous code?
seen = set()
for i, x in enumerate(arr):
    for x-target in seen:
        y = x-target
        return [x,y]
    seen.add(x)
I've asked a friend, however I didn't understand him. He said that subtraction isn't associative. We're exploiting the associative property of addition in the two sum problem to achieve the constant time improvement. But that's all he told me. I don't get it to be honest. I still think my code should work. Can someone tell me why my code doesn't work?
Your algorithm (once the if/for mixup is fixed) still doesn't work because subtraction is not commutative. The algorithm only effectively checks x,y pairs where x comes later in the array than y. That's OK when it's testing x+y = target, since it doesn't matter which order the two values are in. But for x-y = target, the order does matter, since x - y is not the same thing as y - x.
A fix for this would be to check each number in the array to see if it could be either x or y with the other value being one of the earlier values from arr. There needs to be a different check for each, so you probably need two if statements inside the loop:
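seen = set()
for n in arr:
    # n plays the role of x; the matching y was seen earlier
    if n-target in seen:
        x = n
        y = n-target
        return [x,y]
    # n plays the role of y; the matching x was seen earlier
    if n+target in seen:
        x = n+target
        y = n
        return [x,y]
    seen.add(n)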
Note that I renamed the loop variable to n, since it could be either x or y depending on how the math worked out. It's not strictly necessary to use x and y variables in the bodies of the if statements, you could do those computations directly in the return statement. I also dropped the unneeded enumerate call, since the single-loop versions of the code don't use i at all.
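For reference, here is the same idea wrapped in a function so that it can be run on its own. This is a sketch: the name find_difference_pair and the choice to return None when no pair exists are my assumptions, not part of the question.

```python
def find_difference_pair(arr, target):
    """Return [x, y] taken from arr with x - y == target, or None."""
    seen = set()
    for n in arr:
        # n as x: the needed y = n - target appeared earlier.
        if n - target in seen:
            return [n, n - target]
        # n as y: the needed x = n + target appeared earlier.
        if n + target in seen:
            return [n + target, n]
        seen.add(n)
    return None


print(find_difference_pair([1, 5, 3], 2))   # x comes after y in the array
print(find_difference_pair([5, 1], 4))      # x comes before y in the array
```

The second call is the case a single check would miss: 5 is already in seen when 1 arrives, so only the n + target test can find the pair.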

Most efficient if statement?

I would like to write a function that takes integer numbers x, y, L and R as parameters and returns True if x**y lies in the interval (L, R] and False otherwise.
I am considering several ways to write a conditional statement inside this function:
if L < x ** y <= R:
if x ** y > L and x ** y <= R:
if x ** y in range(L + 1, R + 1):
Why is option 1 the most efficient in terms of execution time?
Both #1 and #3 avoid recalculating x ** y, where #2 must calculate it twice.
On Python 2, #3 will be terrible, because it must compute the whole contents of the range. On Python 3.2+, it doesn't have to (range is smart, and can properly determine mathematically whether an int appears in the range without actually iterating, in constant time), but it's at best equivalent to #1, since creating the range object at all has some overhead.
As tobias_k mentions in the comments, if x ** y produces a float, #3 will be slower (breaks the Python 3.2+ O(1) membership testing optimization, requiring an implicit loop over all values), and will get different results than #1 and #2 if the value is not equal to any int value in the range. That is, testing 3.5 in range(1, 5) returns False, and has to check 3.5 against 1, 2, 3, and 4 individually before it can even tell you that much.
Basically, stick to #1, it's going to be the only one that avoids redundant computations and avoids creating a ton of values for comparison on both Py 2 and Py3. #3 is not going to be much (if at all) slower on Python 3.2+, but it does involve creating a range object that isn't needed here, and won't be quite logically equivalent.
The first one has to evaluate x**y only once, so it should be faster than the second (also, more readable). The third one would have to loop over the iterator (in python 2, so it should be slower than both) or make two comparisons (in python 3, so it is no better than the first one). Keep the first one.
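A short sketch of option 1 as a complete function, together with the float case mentioned above. The function name in_interval is hypothetical.

```python
def in_interval(x, y, L, R):
    """Return True if x**y lies in the half-open interval (L, R]."""
    # The chained comparison evaluates x ** y only once.
    return L < x ** y <= R


print(in_interval(2, 3, 7, 8))    # 8 lies in (7, 8]
print(in_interval(2, 3, 8, 9))    # 8 is excluded by the open lower bound
print(in_interval(2, -1, 0, 1))   # 2 ** -1 is the float 0.5, still handled

# The range-based test is not equivalent once the value is not an integer:
print(3.5 in range(1, 5))         # False, although 1 < 3.5 <= 4 holds
```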

Python: Is this the most efficient way to reverse order without using shortcuts?

x = [1,2,3,4,5,6,7,8,9,10]
#Random list elements
for i in range(int(len(x)/2)):
    value = x[i]
    x[i] = x[len(x)-i-1]
    x[len(x)-i-1] = value
#Confusion on efficiency
print(x)
This is for a first-year uni course, so no Python shortcuts are allowed.
Not sure what counts as "a shortcut" (reversed and the "Martian Smiley" [::-1] being obvious candidates -- but does either count as "a shortcut"?!), but at least a couple small improvements are easy:
L = len(x)
for i in range(L//2):
    mirror = L - i - 1
    x[i], x[mirror] = x[mirror], x[i]
This gets len(x) only once -- it's a fast operation but there's no reason to keep repeating it over and over -- also computes mirror but once, does the swap more directly, and halves L (for the range argument) directly with the truncating-division operator rather than using the non-truncating division and then truncating with int. Nanoseconds for each case, but it may be considered slightly clearer as well as microscopically faster.
x = [1,2,3,4,5,6,7,8,9,10]
x = x.__getitem__(slice(None,None,-1))
slice is a Python builtin object (like range and len, which you used in your example)
__getitem__ is the method that implements indexing and slicing on sequence types such as list (of which x is one)
there are absolutely no shortcuts here :) and it's effectively one line.

"'generator' object is not subscriptable" error

Why am I getting this error, from line 5 of my code, when attempting to solve Project Euler Problem 11?
for x in matrix:
    p = 0
    for y in x:
        if p < 17:
            currentProduct = int(y) * int(x[p + 1]) * int(x[p + 2]) * int(x[p + 3])
            if currentProduct > highestProduct:
                print(currentProduct)
                highestProduct = currentProduct
        else:
            break
        p += 1
'generator' object is not subscriptable
Your x value is a generator object, which is an Iterator: it generates values in order, as they are requested by a for loop or by calling next(x).
You are trying to access it as though it were a list or other Sequence type, which lets you access arbitrary elements by index as x[p + 1].
If you want to look up values from your generator's output by index, you may want to convert it to a list:
x = list(x)
This solves your problem, and is suitable in most cases. However, this requires generating and saving all of the values at once, so it can fail if you're dealing with an extremely long or infinite list of values, or the values are extremely large.
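A small self-contained reproduction of the error and of the list conversion. The generator function digits and the sample row are hypothetical stand-ins for however your matrix rows are actually produced.

```python
def digits(line):
    """Hypothetical generator: yield the integers in one whitespace-separated row."""
    for token in line.split():
        yield int(token)


row = digits("08 02 22 97")
try:
    row[1]                      # generators do not support indexing
except TypeError as exc:
    print(exc)                  # 'generator' object is not subscriptable

row = list(row)                 # materialise the values once
print(row[1])                   # indexing works on the list
```

The failed indexing attempt does not consume anything from the generator, so the list built afterwards still contains every value.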
If you just needed a single value from the generator, you could instead use itertools.islice(x, p, None) to skip the first p values, then next(...) to take the one you need. This eliminates the need to hold multiple items in memory or compute values beyond the one you're looking for.
import itertools
result = next(itertools.islice(x, p, None))
As an extension to Jeremy's answer some thoughts about the design of your code:
Looking at your algorithm, it appears that you do not actually need truly random access to the values produced by the generator: at any point in time you only need to keep four consecutive values (three, with an extra bit of optimization). This is a bit obscured in your code because you mix indexing and iteration: if indexing worked (which it doesn't), your y could be written as x[p + 0].
For such algorithms, you can apply kind of a "sliding window" technique, demonstrated below in a stripped-down version of your code:
import itertools, functools, operator
vs = [int(v) for v in itertools.islice(x, 3)]
for v in x:
    vs.append(int(v))
    currentProduct = functools.reduce(operator.mul, vs, 1)
    print(currentProduct)
    vs = vs[1:]
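One possible variation on the same sliding window uses collections.deque with maxlen, which drops the oldest value automatically. This is a sketch under assumptions: the function name max_window_product and the size parameter are hypothetical, and math.prod requires Python 3.8 or later.

```python
import math
from collections import deque


def max_window_product(values, size=4):
    """Largest product of `size` consecutive values from any iterable."""
    window = deque(maxlen=size)   # appending to a full deque evicts the oldest item
    best = None
    for v in values:
        window.append(int(v))
        if len(window) == size:
            product = math.prod(window)
            if best is None or product > best:
                best = product
    return best                   # None if fewer than `size` values were seen


row = (token for token in "1 2 3 4 5".split())   # a generator, as in the question
print(max_window_product(row))
```

Because the function only iterates, it works on a generator directly and never needs to index into it.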
