Set comprehension and different comparable relations


I have a set of objects that are comparable in some way, and I want to remove objects from the set. I thought about how this problem changes for different comparable relations between the elements. I was interested in how the search space develops, how much memory is used, and how the problem scales.
1st scenario: In the simplest scenario the relation would be bidirectional, so we could remove both elements, as long as we can ensure that removing them does not remove other 'partners'.
2nd scenario: The comparable relation is not bidirectional. Remove only the element in question, not the one it is comparable to. A simplified scenario would be a set of integers where the comparable operation is 'is divisible without remainder'.
I can do the following, not removing elements:
a_set = set([4,2,3,7,9,16])
def compare(x, y):
    if (x % y == 0) and not (x is y):
        return True
    return False
def list_solution():
    out = set()
    for x in a_set:
        for y in a_set:
            if compare(x,y):
                out.add(x)
                break
    result = a_set-out
    print(result)
Of course, my first question as a junior Python programmer would be:
What is the appropriate set comprehension for this?
Also: I cannot change the size of a Python set during iteration without copying it, right?
And now for the algorithm people out there: how does this problem change as the number of elements each element can be comparable to increases? How does it change if the comparable relation represents a partial order?

I will start by confirming your claim: changing a set while iterating over it raises a RuntimeError with the message "Set changed size during iteration".
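A minimal sketch of both behaviours, using a hypothetical "remove even numbers" predicate as a stand-in for the real comparison: mutating the set inside a loop over that same set raises the error, while looping over a snapshot (list(numbers)) and mutating the original is safe.

```python
def remove_evens_unsafe(numbers):
    """Remove even numbers while iterating over the set itself (raises RuntimeError)."""
    for n in numbers:
        if n % 2 == 0:
            numbers.remove(n)  # changes the set's size mid-iteration
    return numbers


def remove_evens_safe(numbers):
    """Remove even numbers in place by iterating over a snapshot copy."""
    for n in list(numbers):  # the list is a separate object, so mutating `numbers` is fine
        if n % 2 == 0:
            numbers.discard(n)
    return numbers


try:
    remove_evens_unsafe({4, 2, 3, 7, 9, 16})
except RuntimeError as exc:
    print("RuntimeError:", exc)  # CPython: "Set changed size during iteration"

print(sorted(remove_evens_safe({4, 2, 3, 7, 9, 16})))  # [3, 7, 9]
```

The copy costs O(n) extra memory; building a new set with a comprehension, as the rest of this answer does, has the same cost and avoids mutation altogether.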
Now, let's start with the compare function: since the elements come from a set, x is y behaves much like x == y here, and == is the better choice whenever possible.
Furthermore, there is no need for the if statement; the expression already evaluates to a boolean:
def compare(x, y):
    return x != y and x % y == 0
Now to the set comprehension, which is a messy one. After passing the set as an argument (which is better than using a global variable), the equivalent loop code would be something like
temp = []
for x in my_set:
    for y in my_set:
        if compare(x, y):
            for a in (x, y):
                temp.append(a)
Notice the last two lines, which loop over (x, y) instead of unpacking it, because unpacking would be impossible in the comprehension. Now all that's left is to move the a to the front and make all the colons disappear, and the magic happens:
def list_solution(my_set):
    return my_set - {a for x in my_set for y in my_set if compare(x, y) for a in (x, y)}
You can test it with something like
my_set = set([4, 2, 3, 7, 9, 16])
print(list_solution(my_set))  # {7}
The condition and the iteration over (x, y) can switch places, but I believe that iterating after the condition is confirmed is faster than starting to iterate when there's a chance you wouldn't perform any action.
The change for the second scenario is minor: add only x instead of iterating over (x, y):
def list_solution_2(my_set):
    return my_set - {x for x in my_set for y in my_set if compare(x, y)}
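On the partial-order part of the question: when the relation is transitive, as "y divides x" is on positive integers, the second scenario amounts to keeping the minimal elements, and transitivity lets each candidate be compared only against elements already kept. A sketch, assuming distinct positive integers; the helper name minimal_elements is hypothetical:

```python
def minimal_elements(values):
    """Return the elements of `values` not divisible by any other element.

    Assumes distinct positive integers, so a proper divisor is always smaller.
    """
    kept = []
    for x in sorted(values):  # every possible divisor of x is visited before x
        # Transitivity: if some removed y divides x, then a kept divisor of y
        # also divides x, so checking the kept elements is enough.
        if not any(x % m == 0 for m in kept):
            kept.append(x)
    return set(kept)


print(sorted(minimal_elements({4, 2, 3, 7, 9, 16})))  # [2, 3, 7]
```

This agrees with list_solution_2 on the example. The worst case (no element divides another, e.g. only primes) is still quadratic; the saving comes from never comparing against elements that were already removed.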

Related

Nested generator vs nested list comprehension inside generator

I have 2 generators. One has a nested generator, and the other has a nested list comprehension.
# list of variables
variables = []
nestedGen = (x for x in (y for y in variables))
nestedList = (x for x in [y for y in variables])
Both generators can be simplified to remove the nesting, but as written, are they functionally identical?

There's a difference if variables gets modified.
Reassigning variables won't do anything, because both versions retrieve variables up front. The iterable in the first for clause of a genexp is evaluated immediately, unlike the rest of the genexp. For nestedList, that means evaluating the list comprehension immediately. For nestedGen, that means evaluating (but not iterating over) the nested genexp (y for y in variables), and evaluating that genexp evaluates variables immediately.
Mutating the variables list, however, will affect nestedGen but not nestedList. nestedList immediately iterates over variables and builds a new list with the same contents, so adding or removing elements from variables won't affect nestedList. nestedGen doesn't do that, so adding or removing elements from variables will affect nestedGen.
For example, the following test:
variables = [1]
nestedGen = (x for x in (y for y in variables))
nestedList = (x for x in [y for y in variables])
variables.append(2)
variables = []
print(list(nestedGen))
print(list(nestedList))
prints
[1, 2]
[1]
The append affected only nestedGen. The variables = [] reassignment didn't affect either genexp.
Aside from that, there's also a difference in memory consumption, since nestedList builds a copy of variables with the list comprehension. That doesn't affect output, but it's still a practical consideration to keep in mind.
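A small sketch makes the "first iterable is evaluated immediately" rule visible. make_source is a hypothetical helper that records when it is called:

```python
def make_source(log):
    """Record that the iterable was evaluated, then return it."""
    log.append("evaluated")
    return [1, 2, 3]


outer_log = []
outer = (n * 10 for n in make_source(outer_log))  # outermost iterable: evaluated right away
print(outer_log)  # ['evaluated'], before any iteration

inner_log = []
inner = (n for _ in [0] for n in make_source(inner_log))  # inner iterable: deferred
print(inner_log)  # [], nothing has been iterated yet
print(list(inner), inner_log)  # [1, 2, 3] ['evaluated']
```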

Yes, both expressions will produce a similar result. When the outer generator is enumerated, the inner sequence will be iterated and then disposed.
The difference is that, in the second case, a list object will have to be instantiated to hold the sequence of elements and then discarded. This is not a huge penalty, unless you are doing this a lot. The first option is preferred, in the general case.
Important note:
This would NOT be equivalent if you decide to store the inner sequence in a variable!
variables = (y for y in "abdce")
nestedGenList = list(x for x in variables)
nestedGenList = list(x for x in variables)  # will be empty!
variables = [y for y in "abdce"]
nestedGenList = list(x for x in variables)
nestedGenList = list(x for x in variables)  # ok!
In the first case, the generator is already exhausted after the first use, so when you try to iterate over the variable again, it yields nothing.

Write a generator/iterator expression for this sequence

I have this exercise that I fail to understand
Suppose we are given a list X of integers. We need to construct a sequence of indices (positions) of the elements in this list that are equal to the maximal element. The indices in the sequence are in ascending order.
Hint: use the enumerate function
from typing import Iterator
X = [1,10,3,4,10,5]
S : Iterator[int] = YOUR_EXPRESSION
assert list(S)==[1,4]
This is the only thing I could come up with, but it certainly does not return [1,4].
If you're wondering what I don't understand: it is not clear to me from the description how it could return [1,4].
Maybe you could try to explain that to me first...
This is my (wrong) solution
my_enumerate=enumerate (X)
my_enumerate=(list(my_enumerate))
my_enumerate.sort(reverse=True)

So you have the list X containing [1,10,3,4,10,5]. The maximal, or largest, element is 10, which means we should return all the indices where we find 10. There are two 10s, at indices 1 and 4.
Using enumerate, you get the index and the element at each iteration. You can use that to filter out the elements you don't need. List comprehensions are useful in this case, since they allow filtering with the if syntax, i.e. [val for val in items if some_condition]

You can use a generator expression like this:
max_val=max(X)
s = (i for i, v in enumerate(X) if v==max_val)

This is my solution
( x[0] for x in enumerate (X) if x[1] == max(X) )
This is the book solution:
(i for (i, n) in enumerate(X) if n == max(X))

This requires two steps:
Determine the maximum value with max
Iterate the indices of your list and retain those that have this maximum value
To avoid quadratic time complexity, the first step must not be repeated for every element:
S : Iterator[int] = (lambda mx:
    (i for i, x in enumerate(X) if x == mx)
)(max(X))
The reason for presenting the code as such an ugly expression is that the question seems to require following the template and only altering the part marked "YOUR_EXPRESSION".
This is not how you would write it without such artificial constraints. You would just do mx = max(X) and then assign the iterator to S in the next statement, without the need for this inline lambda.
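Another way to stay within the single-expression template, offered as an alternative rather than the book's answer, is to bind the maximum in an extra outer for clause over a one-element list. Because the outermost iterable of a generator expression is evaluated immediately, max(X) runs exactly once, when S is created:

```python
from typing import Iterator

X = [1, 10, 3, 4, 10, 5]
# `for mx in [max(X)]` binds mx once; the inner clause then scans X a single time.
S: Iterator[int] = (i for mx in [max(X)] for i, x in enumerate(X) if x == mx)
assert list(S) == [1, 4]
```

Like the lambda version, this raises ValueError immediately if X is empty, because max(X) is evaluated up front.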

List Comprehension Python Prime numbers

I came across a solution on Stack Overflow that generates prime numbers using a list comprehension, but I was unable to understand what the inner for loop does.
I have tried something like
[x for x in range(5,20) for y in range(2,int(x/2)+1) if any(x%y == 0)]
It throws an error: 'bool' object is not iterable
I know that my syntax is wrong, but logically, for prime numbers we have a for loop followed by another for loop and then an if condition to calculate the remainder (x%y).
But the answer on Stack Overflow is
[x for x in range(2, 20) if all(x % y != 0 for y in range(2, x))]
I understood why all is used, but I cannot see how the condition inside all() works: I would expect the for to come after the if, so that range(2,x) is iterated and y gets the values that are then used to compute x%y. How can y be used before it has been assigned a value?

That is just the wonderful thing about list comprehensions: if they worked exactly like a normal for loop, nobody would have created them, because the for loop is more readable and understandable.
Notice that the result of a list comprehension is always a list, whereas a for loop just produces many single values, one at a time, each taken from the iterable:
[x + 1 for x in range(1, 5)]
[2, 3, 4, 5]
for x in range(1, 5): print(x + 1)
2
3
4
5
You can think of the comprehension as already having the list of values and simply feeding them, in order, to the expression one by one, like this:
[1+1 , 2+1 , 3+1 , 4+1]
Your code is wrong because it borrows too much from the ordinary for loop. Written as a for loop, your code would look like this:
for x in range(5,20):
    for y in range(2,int(x/2)+1):
        if any(x%y == 0):
            print(x)
And the result would obviously be:
TypeError: 'bool' object is not iterable
because any requires an iterable, such as a generator expression or a list, as mentioned above by @meowgoesthedog. Conveniently, producing a list is exactly what a list comprehension does, but you need to understand it in order to use it well. It sometimes happens to me too. In your case, the for y in range(2,int(x/2)+1) works as a normal for loop.
The syntax of a list comprehension is [output_expression for variable in input_sequence if optional_predicate]. Inside the if condition, which is the optional predicate, we can create another comprehension following the same rules, with x%y==0 as the output expression and a variable y representing members of the input sequence range(2,int(x/2)+1).

all() and any() work on iterable objects. For example, all([True, True, False, True]) returns False. You can't use any(True) (as in your example: any(x%y == 0)).
This statement [x for x in range(2, 20) if all(x % y != 0 for y in range(2, x))] can be translated to this code:
res = []
for x in range(2, 20):
    temporary_list = (x%y != 0 for y in range(2,x))  # a generator expression, not a list
    if all(temporary_list):
        res.append(x)
P.S. I saw in the comments that you are not sure how y is declared. Besides list comprehensions, Python has other comprehension-style constructs. One of them is the generator expression, which is what is used in this case: the for y in range(2, x) clause inside all(...) is what assigns y.
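To see where y comes from, the generator expression passed to all() can be written out as an equivalent generator function. The name remainders_nonzero is made up for illustration:

```python
def remainders_nonzero(x):
    """Same values as the generator expression (x % y != 0 for y in range(2, x))."""
    for y in range(2, x):  # the for clause assigns y first...
        yield x % y != 0   # ...then the expression on the left is evaluated with it


primes = [x for x in range(2, 20) if all(remainders_nonzero(x))]
print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19]
```

In the comprehension syntax the expression is written first, but it is evaluated inside the loop, after y has been bound. all() also stops at the first False, so the generator is not necessarily run to the end.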

The built-ins all and any work on iterable objects (lists, sets, etc.). Therefore you get an error when you apply them to a boolean such as x%y==0.
You can use any in the following manner:
[x for x in range(5,20) if not any([x % y == 0 for y in range(2, int(x/2)+1)])]
or:
[x for x in range(2, 20) if not any(x % y == 0 for y in range(2, int(x/2)+1))]
This works because any and all complement each other: not any(x % y == 0 ...) gives the same result as all(x % y != 0 ...).
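A quick check of that equivalence, comparing the all() version from the question with the not any() version (here the int(x/2)+1 bound is written as x // 2 + 1, which is the same for positive integers). The function names are illustrative:

```python
def primes_with_all(limit):
    # Keep x if every remainder is non-zero.
    return [x for x in range(2, limit) if all(x % y != 0 for y in range(2, x))]


def primes_with_any(limit):
    # Keep x if no remainder is zero: not any(P) is the same as all(not P).
    return [x for x in range(2, limit) if not any(x % y == 0 for y in range(2, x // 2 + 1))]


print(primes_with_all(20) == primes_with_any(20))  # True
```

The smaller upper bound is safe because any composite x has a divisor no larger than x // 2.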

python intersection of lists while not having the same index

I have a curious case, and after some time I have not come up with an adequate solution.
Say you have two lists and you need to find items that have the same index.
x = [1,4,5,7,8]
y = [1,3,8,7,9]
I am able to get a correct intersection of those which appear in both lists with the same index by using the following:
matches = [i for i, (a,b) in enumerate(zip(x,y)) if a==b]
This would return:
[0,3]
I am able to get a simple intersection of both lists with the following (this is just one of many ways):
intersected = set(x) & set(y)
This would return this set:
{1, 7, 8}
Here's the question. I'm looking for ideas on how to get a list of items (as in the second list) that excludes the matches above, i.e. items that are in both lists but not at the same position.
In other words, I'm looking for items in x that are also in y but do not share the same index in y.
The desired result would be the index position of "8" in y, or [2].
Thanks in advance

You're so close: iterate through y; look for a value that is in x, but not at the same position:
offset = [i for i, a in enumerate(y) if a in x and a != x[i] ]
Result:
[2]
Including the suggested upgrade from pault, with respect to Martijn's comment: pre-computing the common values as a set reduces the complexity for large lists, since each a in both test is a constant-time set lookup:
>>> both = set(x) & set(y)
>>> offset = [i for i, a in enumerate(y) if a in both and a != x[i] ]
As PaulT pointed out, this is still quite readable at OP's posted level.

I'd create a dictionary of indices for the first list, then use that to test if the second value is a) in that dictionary, and b) the current index is not present:
def non_matching_indices(x, y):
x_indices = {}
for i, v in enumerate(x):
x_indices.setdefault(v, set()).add(i)
return [i for i, v in enumerate(y) if i not in x_indices.get(v, {i})]
The above takes O(len(x) + len(y)) time; a single full scan through the one list, then another full scan through the other, where each test to include i is done in constant time.
You really don't want to use a value in x containment test here, because that requires a scan (a loop) over x to see whether the value is in the list. That takes O(len(x)) time, and you do that for each value in y, which means that the function takes O(len(x) * len(y)) time.
You can see the speed differences when you run a time trial with a larger list filled with random data:
>>> import random, timeit
>>> def using_in_x(x, y):
...     return [i for i, a in enumerate(y) if a in x and a != x[i]]
...
>>> x = random.sample(range(10**6), 1000)
>>> y = random.sample(range(10**6), 1000)
>>> for f in (using_in_x, non_matching_indices):
...     timer = timeit.Timer("f(x, y)", "from __main__ import f, x, y")
...     count, total = timer.autorange()
...     print(f"{f.__name__:>20}: {total / count * 1000:6.3f}ms")
...
          using_in_x: 10.468ms
non_matching_indices:  0.630ms
So with two lists of 1000 numbers each, value in x testing easily takes more than 15 times as long to complete the task.
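A quick correctness check of non_matching_indices on the question's data, plus a hypothetical input with repeated values, which is why the dictionary stores a set of indices per value:

```python
def non_matching_indices(x, y):
    # Map each value in x to every index at which it occurs.
    x_indices = {}
    for i, v in enumerate(x):
        x_indices.setdefault(v, set()).add(i)
    # Keep i when y[i] occurs in x, but never at index i. A value missing from x
    # falls back to {i}, which contains i, so it is excluded.
    return [i for i, v in enumerate(y) if i not in x_indices.get(v, {i})]


print(non_matching_indices([1, 4, 5, 7, 8], [1, 3, 8, 7, 9]))  # [2]
print(non_matching_indices([1, 2, 2], [2, 1, 2]))  # [0, 1]
```

In the second call, the 2 at index 2 of y is excluded because x also has a 2 at index 2, even though x has another 2 elsewhere.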

x = [1,4,5,7,8]
y = [1,3,8,7,9]
result = []
for e in x:
    if e in y and x.index(e) != y.index(e):
        result.append((x.index(e), y.index(e), e))
print(result)  # list of (x_position, y_position, value) tuples
This version goes through the first list item by item and checks whether the item is also in the second list. If it is, it compares the indices of the found item in both lists, and if they differ, it stores both indices and the item value as a three-element tuple in the result list. Note that list.index returns the first occurrence, so repeated values are only compared at their first positions.

python iterate over the two lists while comparing items

I have two lists, e.g. x = [1,2,3,4,4,5,6,7,7] and y = [3,4,5,6,7,8,9,10]. I want to iterate over the two lists while comparing items. For those that match, I would like to call some function and remove them from the lists; in this example I should end up with x = [1,2] and y = [8,9,10]. Sets will not work for this problem because of my type of data and the comparison operator.
for i in x:
    for j in y:
        if i == j:
            callsomefunction(i, j)
            # remove i from x and j from y

Edit: After discovering that the person asking the question simply didn't know about __hash__, I provided this information in a comment:
To use sets, implement __hash__. So if obj1 == obj2 when obj1.a == obj2.a and obj1.b == obj2.b, __hash__ should return hash((self.a, self.b)) and your sets will work as expected.
That solved their problem, and they switched to using sets.
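A sketch of what that comment describes, using a hypothetical Item class whose equality is defined by its a and b attributes. __eq__ and __hash__ must agree so that equal objects hash alike; in Python 3, defining __eq__ alone sets __hash__ to None and makes instances unhashable:

```python
class Item:
    """Hypothetical record that is equal to another Item with the same (a, b)."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return (self.a, self.b) == (other.a, other.b)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self.a, self.b))

    def __repr__(self):
        return f"Item({self.a!r}, {self.b!r})"


x = {Item(1, "p"), Item(2, "q"), Item(3, "r")}
y = {Item(3, "r"), Item(4, "s")}
common = x & y  # found via hash lookups rather than comparing every pair
print(common)  # {Item(3, 'r')}
x -= common  # remove the matches from both sets
y -= common
```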
The rest of this answer is now obsolete, but it's still correct (but horribly inefficient) so I'll leave it here.
This code does what you want. At the end, newx and newy are the non-overlapping items of x and y specifically.
x = [1,2,3,4,4,5,6,7,7]
y = [3,4,5,6,7,8,9,10]

def callsomefunction(i):  # placeholder for the real per-match action
    print("match:", i)

# you can leave out bad and just compare against
# x at the end if memory is more important than speed
newx, bad, newy = [], [], []
for i in x:
    if i in y:
        callsomefunction(i)
        bad.append(i)
    else:
        newx.append(i)
for i in y:
    if i not in bad:
        newy.append(i)
print(newx)  # [1, 2]
print(newy)  # [8, 9, 10]
However, I know without even seeing your code that this is the wrong way to do this. You can certainly do it with sets, but if you don't want to, that's up to you.

OK, discard my post; I hadn't seen the point where you mentioned that sets wouldn't work.
Nevertheless, if you're OK with a little work, you might want to use classes so that operators work as they are expected to.
I think the most "Pythonic" way of doing this is to use sets.
You could then do:
x = set([1,2,3,4,4,5,6,7,7])
y = set([3,4,5,6,7,8,9,10])
for item in x.intersection(y):  # y.intersection(x) is fine too; it is a new set, so removing from x and y is safe
    my_function(item)  # you could use my_function(item, item) if that's what your function requires
    x.remove(item)
    y.remove(item)
I think that sets are also more efficient than lists for this kind of work when it comes down to performance (though this might not be your top priority).
On a sidenote, you could also use:
x,y = x.difference(y), y.difference(x)
This effectively removes items that are in x and y from x and y.

Try this:
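for i in list(x):  # iterate over a copy; removing from the list being looped over skips items
    if i in y:
        callsomefunction(i)
        while i in x:  # drop every copy, so repeated values such as 4 and 7 all go
            x.remove(i)
        while i in y:
            y.remove(i)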
EDIT: updated answer

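How about this:
import itertools
x = [1,2,3,4,4,5,6,7,7]
y = [3,4,5,6,7,8,9,10]
# call somefunction on every matching (a, b) pair; nothing is removed from x or y
output = [somefunction(a, b) for a, b in itertools.product(x, y) if a == b]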
