"'generator' object is not subscriptable" error - python

Why am I getting this error, from line 5 of my code, when attempting to solve Project Euler Problem 11?
for x in matrix:
    p = 0
    for y in x:
        if p < 17:
            currentProduct = int(y) * int(x[p + 1]) * int(x[p + 2]) * int(x[p + 3])
            if currentProduct > highestProduct:
                print(currentProduct)
                highestProduct = currentProduct
        else:
            break
        p += 1
'generator' object is not subscriptable

Your x value is a generator object, which is an Iterator: it generates values in order, as they are requested by a for loop or by calling next(x).
You are trying to access it as though it were a list or other Sequence type, which lets you access arbitrary elements by index as x[p + 1].
If you want to look up values from your generator's output by index, you may want to convert it to a list:
x = list(x)
This solves your problem, and is suitable in most cases. However, this requires generating and saving all of the values at once, so it can fail if you're dealing with an extremely long or infinite list of values, or the values are extremely large.
If you just need a single value from the generator, you can instead use itertools.islice(x, p, None) to skip the first p values, then next(...) to take the one you need. This eliminates the need to hold multiple items in memory or compute values beyond the one you're looking for.
import itertools
result = next(itertools.islice(x, p, None))
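As a rough illustration of the two approaches, the sketch below uses a made-up generator, squares(), which is not part of the question. It relies on the islice(iterable, start, stop) form with stop=None, which skips the first start items lazily.

```python
import itertools

def squares():
    """Hypothetical generator standing in for one row of the matrix."""
    n = 0
    while True:  # infinite, so list(squares()) would never finish
        yield n * n
        n += 1

p = 3

# Take only the value at position p: islice skips p items lazily,
# and next() pulls the single item that follows.
result = next(itertools.islice(squares(), p, None))
print(result)  # 9

# For a finite run of values, converting to a list allows ordinary indexing.
row = list(itertools.islice(squares(), 5))
print(row[p])  # 9
```

Note that each call to squares() starts a fresh generator. Reusing one generator object after islice has consumed part of it would continue from where it stopped.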

As an extension to Jeremy's answer, some thoughts about the design of your code:
Looking at your algorithm, it appears that you do not actually need truly random access to the values produced by the generator: at any point in time you only need to keep four consecutive values (three, with an extra bit of optimization). This is a bit obscured in your code because you mix indexing and iteration: if indexing worked (which it doesn't), your y could be written as x[p + 0].
For such algorithms, you can apply kind of a "sliding window" technique, demonstrated below in a stripped-down version of your code:
import itertools, functools, operator
vs = [int(v) for v in itertools.islice(x, 3)]
for v in x:
    vs.append(int(v))
    currentProduct = functools.reduce(operator.mul, vs, 1)
    print(currentProduct)
    vs = vs[1:]
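One possible variation on the sliding window, sketched here under the assumption that a row is any iterable of digit strings: a collections.deque with maxlen=4 drops the oldest value automatically, so no slicing is needed. The function name max_window_product and the sample row are made up for illustration.

```python
import collections
import functools
import operator

def max_window_product(values, size=4):
    """Return the largest product of `size` consecutive values, or None if
    the input is shorter than `size`. Works on any iterable, including
    generators, because it never indexes into the input."""
    window = collections.deque(maxlen=size)  # oldest item falls out automatically
    best = None
    for v in values:
        window.append(int(v))
        if len(window) == size:
            product = functools.reduce(operator.mul, window, 1)
            if best is None or product > best:
                best = product
    return best

# A generator, like x in the question (hypothetical sample data).
row = (s for s in "08 02 22 97 38 15".split())
print(max_window_product(row))  # 1216380
```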

Related

Is there a more efficient way to find the missing integer?

I'm currently studying a module called data structures and algorithms at a university. We've been tasked with writing an algorithm that finds the smallest positive integer which does not occur in a given sequence. I was able to find a solution, but is there a more efficient way?
x = [5, 6, 3, 1, 2]
def missing_integer():
    for i in range(1, 100):
        if i not in x:
            return i
print(missing_integer())
The instructions include some examples:
given x = [1, 3, 6, 4, 1, 2], the function should return 5,
given x = [1, 2, 3], the function should return 4 and
given x = [−1, −3], the function should return 1.
You did not ask for the most efficient way to solve the problem, just if there is a more efficient way than yours. The answer to that is yes.
If the missing integer is near the top of the range of the integers and the list is long, your algorithm has a run-time complexity of O(N**2)--your loop goes through all possible values, and the not in operator searches through the entire list if a match is not found. (Your code searches only up to the value 100--I assume that is just a mistake on your part and you want to handle sequences of any length.)
Here is a simple algorithm that is merely order O(N*log(N)). (Note that quicker algorithms exist--I show this one since it is simple and thus answers your question easily.) Sort the sequence (which has the order I stated) then run through it starting at the smallest value. This linear search will easily find the missing positive integer. This algorithm also has the advantage that the sequence could involve negative numbers, non-integer numbers, and repeated numbers, and the code could easily handle those. This also handles sequences of any size and with numbers of any size, though of course it runs longer for longer sequences. If a good sort routine is used, the memory usage is quite small.
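The sort-then-scan idea described above could be sketched as follows. The function name smallest_missing_sorted is made up, and this version sorts a copy with sorted() rather than sorting in place, so it uses O(N) extra memory.

```python
def smallest_missing_sorted(seq):
    """Smallest positive integer not in seq, via sorting: O(N log N)."""
    candidate = 1
    for value in sorted(seq):
        if value == candidate:
            candidate += 1  # found the current candidate, look for the next one
        elif value > candidate:
            break           # gap found; no later value can equal candidate
        # values below candidate (negatives, zero, duplicates) are skipped
    return candidate

print(smallest_missing_sorted([1, 3, 6, 4, 1, 2]))  # 5
print(smallest_missing_sorted([1, 2, 3]))           # 4
print(smallest_missing_sorted([-1, -3]))            # 1
```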
I think the O(n) algorithm goes like this: initialise an array record of length n + 2 (list in Python) to None, and iterate over the input. If the element is one of the array indexes, set the element in the record to True. Now iterate over the new list record starting from index 1. Return the first None encountered.
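A minimal sketch of that description, assuming the input holds integers only. The function name smallest_missing_linear is made up. With n inputs the answer always lies in 1..n+1, which is why a record of length n + 2 is enough.

```python
def smallest_missing_linear(seq):
    """Smallest positive integer not in seq, in O(n) time and O(n) space."""
    n = len(seq)
    record = [None] * (n + 2)  # valid indexes are 0 .. n+1
    for value in seq:
        # Only values that are usable positive indexes can affect the answer.
        if 0 < value < len(record):
            record[value] = True
    for i in range(1, len(record)):
        if record[i] is None:
            return i  # first positive index never marked

print(smallest_missing_linear([1, 3, 6, 4, 1, 2]))  # 5
print(smallest_missing_linear([1, 2, 3]))           # 4
print(smallest_missing_linear([-1, -3]))            # 1
```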
The slow step in your algorithm is this line:
if i not in x:
That step takes linear time, which makes the entire algorithm O(N*N). If you first turn the list into a set, the lookup is much faster:
def missing_integer():
    sx = set(x)
    for i in range(1, 100):
        if i not in sx:
            return i
Lookup in a set is fast, in fact it takes constant time, and the algorithm now runs in linear time O(N).
Another solution is to create an array whose size is the maximum value, then traverse the input and mark each location of the array when that value is seen. Then iterate from the start of the array and report the first unmarked location as the smallest missing value. This is done in O(n) (filling the array and finding the smallest unmarked location).
Also, to handle negative values you can shift all values by the minimum value so that they are all positive, then apply the above method.
The space complexity of this method is Θ(n).
To know more, see this post about the implementation and scrutinize more about this method.
This can be done in O(n) time with a bit of maths, provided the input is a run of consecutive integers with exactly one value missing and no duplicates. Initialise minimum, maximum and sum names, then loop once through the numbers to find the minimum, the maximum and the sum of all the numbers (mn, mx, sm).
Now the sum of the integers 1..n = n*(n+1)/2 = s(n)
Therefore: missing_number = (s(mx) - s(mn - 1)) - sm
All done with traversing the numbers only once!
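A small sketch of the summation idea, using s(n) = n*(n+1)/2 for the sum of 1..n. It assumes the input is a run of consecutive positive integers with exactly one interior value missing and no duplicates. It cannot detect a value missing from either end, so it does not solve the general problem from the question. The function names are made up.

```python
def s(n):
    """Sum of the integers 1..n."""
    return n * (n + 1) // 2

def missing_from_run(numbers):
    """Find the single missing value in a run of consecutive integers."""
    it = iter(numbers)
    mn = mx = sm = next(it)
    for v in it:  # one pass: track minimum, maximum and sum together
        if v < mn:
            mn = v
        if v > mx:
            mx = v
        sm += v
    # Sum of the complete run mn..mx, minus the sum actually seen.
    return (s(mx) - s(mn - 1)) - sm

print(missing_from_run([5, 6, 3, 1, 2]))  # 4
```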
My answer using list comprehension:
def solution(A):
    max_val = max(A)
    # No positive values at all: the answer is 1.
    if max_val <= 0:
        return 1
    # Positive integers below max_val that do not occur in A.
    missing = [X for X in range(1, max_val) if X not in A]
    if missing:
        return min(missing)
    # Every value from 1 to max_val is present.
    return max_val + 1

L = [-1, -3]
res = solution(L)
print(res)

How to find last "K" indexes of vector satisfying condition (Python) ? (Analogue of Matlab's "find" )

Consider some vector:
import numpy as np
v = np.arange(10)
Assume we need to find last 2 indexes satisfying some condition.
For example in Matlab it would be written e.g.
find(v <5 , 2,'last')
answer = [ 3 , 4 ] (Note: Matlab indexing from 1)
Question: What would be the clearest way to do that in Python ?
"Nice" solution should STOP search when it finds 2 desired results, it should NOT search over all elements of vector.
So np.where does not seem to be "nice" in that sense.
We can easily write that using "for", but is there any alternative way?
I am afraid using "for" since it might be slow (at least it is very much so in Matlab).
This attempt doesn't use numpy, and it is probably not very idiomatic.
Nevertheless, if I understand it correctly, zip, filter and reversed are all lazy iterators that take only the elements that they really need. Therefore, you could try this:
x = list(range(10))
from itertools import islice
res = reversed(list(map(
    lambda xi: xi[1],
    islice(
        filter(
            lambda xi: xi[0] < 5,
            zip(reversed(x), reversed(range(len(x))))
        ),
        2
    )
)))
print(list(res))
Output:
[3, 4]
What it does (from inside to outside):
create index range
reverse both array and indices
zip the reversed array with indices
filter the two (value, index)-pairs that you need, extract them by islice
Throw away the values, retain only indices with map
reverse again
Even though it looks somewhat monstrous, it should all be lazy, and stop after it finds the first two elements that you are looking for. I haven't compared it with a simple loop, maybe just using a loop would be both simpler and faster.
Any solution you'd find will iterate over the list even if the loop is 'hidden' inside a function.
The solution to your problem depends on the assumptions you can make e.g. is the list sorted?
For the general case I'd iterate over the list starting at the end:
def find(condition, k, v):
    indices = []
    for i, var in enumerate(reversed(v)):
        if condition(var):
            indices.append(len(v) - i - 1)
        if len(indices) >= k:
            break
    return indices
The condition should then be passed as a function, so you can use a lambda:
v = range(10)
find(lambda x: x < 5, 3, v)
will output
[4, 3, 2]
I'm not aware of a "good" numpy solution to short-circuiting.
The most principled way to go would be using something like Cython which, to brutally oversimplify, adds fast loops to Python. Once you have set that up, it would be easy.
If you do not want to do that you'd have to employ some gymnastics like:
import numpy as np
def find_last_k(vector, condition, k, minchunk=32):
    if k > minchunk:
        minchunk = k
    l, r = vector.size - minchunk, vector.size
    found = []
    n_found = 0
    while r > 0:
        if l <= 0:
            l = 0
        found.append(l + np.where(condition(vector[l:r]))[0])
        n_found += len(found[-1])
        if n_found >= k:
            break
        l, r = 3 * l - 2 * r, l
    return np.concatenate(found[::-1])[-k:]
This tries balancing loop overhead and numpy "inflexibility" by searching in chunks, which we grow exponentially until enough hits are found.
Not exactly pretty, though.
This is what I've found that seems to do this job for the example described (using argwhere which returns all indices that meet the criteria and then we find the last two of these as a numpy array):
ind = np.argwhere(v<5)
ind[-2:]
This searches through the entire array so is not optimal but is easy to code.

Having trouble with the variant of the "Two Sum" coding challenge?

The Two Sum problem seeks to find two elements x and y such that x+y=target. This can be implemented using a brute force approach.
for x in arr:
    for y in arr:
        if x+y==target:
            return [x,y]
We are doing some redundant computation in the for loop -- that is, we only want to consider combinations of two elements. We can do an N-choose-2 double loop as follows.
for i, x in enumerate(arr):
    for y in arr[i+1:]:
        if x+y==target:
            return [x,y]
And we save a large constant factor of time complexity. Now let's note that the innermost loop is a search. We can use either a hash search or a binary search for it.
seen = set()
for i, x in enumerate(arr):
    if target-x in seen:
        y = target-x
        return [x,y]
    seen.add(x)
Note that seen only has length i, and the check will only trigger when we hit the second number of a pair (because its complement must already be in the set).
A variant of this problem is: to find elements that satisfy the following x-y = target. It's a simple variant but it adds a bit of logical complexity to this problem.
My question is: why does the following not work? That is, we're just modifying the previous code?
seen = set()
for i, x in enumerate(arr):
    for x-target in seen:
        y = x-target
        return [x,y]
    seen.add(x)
I've asked a friend, however I didn't understand him. He said that subtraction isn't associative. We're exploiting the associative property of addition in the two sum problem to achieve the constant time improvement. But that's all he told me. I don't get it to be honest. I still think my code should work. Can someone tell me why my code doesn't work?
Your algorithm (once the if/for mixup is fixed) still doesn't work because subtraction is not commutative. The algorithm only effectively checks x,y pairs where x comes later in the array than y. That's OK when it's testing x+y = target, since it doesn't matter which order the two values are in. But for x-y = target, the order does matter, since x - y is not the same thing as y - x.
A fix for this would be to check each number in the array to see if it could be either x or y with the other value being one of the earlier values from arr. There needs to be a different check for each, so you probably need two if statements inside the loop:
seen = set()
for n in arr:
    if n-target in seen:
        x = n
        y = n-target
        return [x,y]
    if n+target in seen:
        x = n+target
        y = n
        return [x,y]
    seen.add(n)
Note that I renamed the loop variable to n, since it could be either x or y depending on how the math worked out. It's not strictly necessary to use x and y variables in the bodies of the if statements, you could do those computations directly in the return statement. I also dropped the unneeded enumerate call, since the single-loop versions of the code don't use i at all.
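Wrapped in a function, the two-check approach might look like the sketch below. The function name find_pair_with_difference and the None return for "no pair found" are assumptions added for illustration, not part of the original code.

```python
def find_pair_with_difference(arr, target):
    """Return [x, y] with x - y == target, or None if no such pair exists."""
    seen = set()
    for n in arr:
        if n - target in seen:   # n plays the role of x
            return [n, n - target]
        if n + target in seen:   # n plays the role of y
            return [n + target, n]
        seen.add(n)
    return None

print(find_pair_with_difference([1, 5, 3], 2))  # [3, 1]
print(find_pair_with_difference([5, 1], 4))     # [5, 1]
print(find_pair_with_difference([1, 2], 5))     # None
```

Both checks are needed because the matching partner may appear before the current element in either role.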

Creating two concatenated arrays from a generator

Consider the following example in Python 2.7. We have an arbitrary function f() that returns two 1-dimensional numpy arrays. Note that in general f() may return arrays of different size and that the size may depend on the input.
Now we would like to call map on f() and concatenate the results into two separate new arrays.
import numpy as np
def f(x):
    return np.arange(x),np.ones(x,dtype=int)
inputs = np.arange(1,10)
result = map(f,inputs)
x = np.concatenate([i[0] for i in result])
y = np.concatenate([i[1] for i in result])
This gives the intended result. However, since result may take up much memory, it may be preferable to use a generator by calling imap instead of map.
from itertools import imap
result = imap(f,inputs)
x = np.concatenate([i[0] for i in result])
y = np.concatenate([i[1] for i in result])
However, this gives an error because the generator is empty at the point where we calculate y.
Is there a way to use the generator only once and still create these two concatenated arrays? I'm looking for a solution without a for loop, since it is rather inefficient to repeatedly concatenate/append arrays.
Thanks in advance.
Is there a way to use the generator only once and still create these two concatenated arrays?
Yes, a generator can be cloned with tee:
import itertools
a, b = itertools.tee(result)
x = np.concatenate([i[0] for i in a])
y = np.concatenate([i[1] for i in b])
However, using tee does not help with the memory usage in your case. The above solution would require 5 N memory to run:
N for caching the generator inside tee,
2 N for the list comprehensions inside np.concatenate calls,
2 N for the concatenated arrays.
Clearly, we could do better by dropping the tee:
x_acc = []
y_acc = []
for x_i, y_i in result:
    x_acc.append(x_i)
    y_acc.append(y_i)
x = np.concatenate(x_acc)
y = np.concatenate(y_acc)
This shaved off one more N, leaving 4 N. Going further means dropping the intermediate lists and preallocating x and y. Note that you needn't know the exact sizes of the arrays, only the upper bounds:
x = np.empty(capacity)
y = np.empty(capacity)
right = 0
for x_i, y_i in result:
    left = right
    right += len(x_i) # == len(y_i)
    x[left:right] = x_i
    y[left:right] = y_i
x = x[:right].copy()
y = y[:right].copy()
In fact, you don't even need an upper bound. Just ensure that x and y are big enough to accommodate the new item:
for x_i, y_i in result:
    # ...
    if right >= len(x):
        # It would be slightly trickier for >1D, but the idea
        # remains the same: alter the 0-th dimension to fit
        # the new item.
        new_capacity = int(max(right, len(x)) * 1.5)
        # np.resize returns a new, larger array
        # (ndarray.resize works in place and returns None).
        x = np.resize(x, new_capacity)
        y = np.resize(y, new_capacity)
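Putting the pieces together, a complete version of the growing-buffer idea might look like the sketch below. It is written for Python 3, where map is already lazy, and it assumes each pair of arrays has equal length and an integer dtype. The function name concat_pairs and the initial_capacity parameter are made up for illustration.

```python
import numpy as np

def f(x):
    return np.arange(x), np.ones(x, dtype=int)

def concat_pairs(pairs, initial_capacity=4):
    """Consume an iterable of (x_i, y_i) array pairs once and return the
    two concatenated arrays, growing the buffers as needed."""
    x = np.empty(initial_capacity, dtype=int)
    y = np.empty(initial_capacity, dtype=int)
    right = 0
    for x_i, y_i in pairs:
        left, right = right, right + len(x_i)  # assumes len(x_i) == len(y_i)
        if right > len(x):
            # Grow by about 1.5x, or more if a single item needs it.
            new_capacity = max(right, int(len(x) * 1.5) + 1)
            # np.resize returns a new array whose leading entries are the
            # old contents; the rest is overwritten before it is read.
            x = np.resize(x, new_capacity)
            y = np.resize(y, new_capacity)
        x[left:right] = x_i
        y[left:right] = y_i
    # Copy so the oversized buffers can be freed.
    return x[:right].copy(), y[:right].copy()

x, y = concat_pairs(map(f, range(1, 10)))
print(len(x), len(y))  # 45 45
```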

Python: Is this the most efficient way to reverse order without using shortcuts?

x = [1,2,3,4,5,6,7,8,9,10]
#Random list elements
for i in range(int(len(x)/2)):
    value = x[i]
    x[i] = x[len(x)-i-1]
    x[len(x)-i-1] = value
#Confusion on efficiency
print(x)
This is a uni course for first year. So no python shortcuts are allowed
Not sure what counts as "a shortcut" (reversed and the "Martian Smiley" [::-1] being obvious candidates -- but does either count as "a shortcut"?!), but at least a couple small improvements are easy:
L = len(x)
for i in range(L//2):
    mirror = L - i - 1
    x[i], x[mirror] = x[mirror], x[i]
This gets len(x) only once -- it's a fast operation but there's no reason to keep repeating it over and over -- also computes mirror but once, does the swap more directly, and halves L (for the range argument) directly with the truncating-division operator rather than using the non-truncating division and then truncating with int. Nanoseconds for each case, but it may be considered slightly clearer as well as microscopically faster.
x = [1,2,3,4,5,6,7,8,9,10]
x = x.__getitem__(slice(None,None,-1))
slice is a Python builtin (like range and len, which you used in your example).
__getitem__ is a method belonging to subscriptable types such as list (of which x is one).
There are absolutely no shortcuts here :) and it's effectively one line.
