Removing an element from a list based on a condition - python

I need to remove an element (in this case a tuple) from one list based on a condition (if satisfied) in another list.
I have 2 lists (list of tuples).
List1 = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
List2 = [(1, 2), (1, 3), (1, 2), (2, 3), (2, 2), (3, 2)]
List1 is basically computed from the following code.
import pandas as pd
mapping = {'name': ['a', 'b', 'c', 'd'],'ID': [1,2,3,2]}
df = pd.DataFrame(mapping)
comb = df['name'].to_list()
List1 = list(combinations(comb,2))
# mapping the elements of the list to an 'ID' from the dataframe and creating a list based on the following code
List2 = [(df['ID'].loc[df.name == x].item(), df['ID'].loc[df.name == y].item()) for (x, y) in List1]
Now I need to apply a condition here; looking at List2, I need to look at all tuples in List2 and see if there is any tuple with same 'ID's in it. For example, in List2 I see there is (2,2). So, I want to go back to List1 based on this remove the corresponding tuple which yielded this (2,2) pair.
Essentially my final revised list should be this:
RevisedList = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('c', 'd')]
('b','d') should be removed because they yield (2,2) same IDs in a set

List1 = [('a','b'), ('a','c'), ('a','d'), ('b','c'), ('b','d')]
List2 = [(1,2), (1,3), (1,2), (2,3), (2,2)]
new_List1 = [elem for index,elem in enumerate(List1) if List2[index][0]!=List2[index][1]]
// Result: [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c')]
It is not entirely clear but is this what you are looking for? new_List1 only contains those indexes where at that index List2 has two different numbers in the tuple

Related

delete tuple if item in tuple not in list in other dataframe

If i have a list of tuples, and another list on which the tuples should be conditional:
my_list=['a','e']
my_list1= ['a','b','a','c','b','d','e','f','a','h']
v=[data for data in zip(my_list1,my_list1[1::1])]
del v[1::2]
now to retain only the tuples that contain elemtns from my_list i use
v = [tup for tup in v if any(c in my_list for c in tup)]
which returns [('a', 'b'), ('a', 'c'), ('e', 'f'), ('a', 'h')]
now i want to perform the same operation iterating over dataframes
df15=pd.DataFrame({'names':[['a','c'],['k','f']]})
df14=pd.DataFrame({'tup':[[('a','g'), ('b','h'), ('c', 'i')],[('d', 'j'), ('e', 'k'), ('f', 'l')]]})
but calling
for index,row in df14.iterrows():
df15['tup']=[tup for tup in df14['tup'] if any(c in df15.names for c in tup)]
returns this error:
Length of values does not match length of index
expected output would be
df15
names tup
0 ['a','c'] [('a','g'),('c', 'i')] #not necessary to keep the tuples
1 ['k','f'] [('e', 'k'),('f','l')]

Check list of tuples where first element of tuple is specified by defined string

This question is similar to Check that list of tuples has tuple with 1st element as defined string but no one has properly answered the "wildcard" question.
Say I have [('A', 2), ('A', 1), ('B', 0.2)]
And I want to identify the tuples where the FIRST element is A. How do I return just the following?
[('A', 2), ('A', 1)]
Using a list comprehension:
>>> l = [('A', 2), ('A', 1), ('B', 0.2)]
>>> print([el for el in l if el[0] == 'A'])
[('A', 2), ('A', 1)]
You could use Python's filter function for this as follows:
l = [('A', 2), ('A', 1), ('B', 0.2)]
print filter(lambda x: x[0] == 'A', l)
Giving:
[('A', 2), ('A', 1)]
Simple enough list comprehension:
>>> L = [('A', 2), ('A', 1), ('B', 0.2)]
>>> [(x,y) for (x,y) in L if x == 'A']
[('A', 2), ('A', 1)]

Unique Combinations in a list of k,v tuples in Python

I have a list of various combos of items in tuples
example = [(1,2), (2,1), (1,1), (1,1), (2,1), (2,3,1), (1,2,3)]
I wish to group and count by unique combinations
yielding the result
result = [((1,2), 3), ((1,1), 2), ((2,3,1), 2)]
It is not important that the order is maintained or which permutation of the combination is preserved but it is very important that operation be done with a lambda function and the output format be still a list of tuples as above because I will be working with a spark RDD object
My code currently counts patterns taken from a data set using
RDD = sc.parallelize(example)
result = RDD.map(lambda(y):(y, 1))\
.reduceByKey(add)\
.collect()
print result
I need another .map command that will add account for different permutations as explained above
How about this: maintain a set that contains the sorted form of each item you've already seen. Only add an item to the result list if you haven't seen its sorted form already.
example = [ ('a','b'), ('a','a','a'), ('a','a'), ('b','a'), ('c', 'd'), ('b','c','a'), ('a','b','c') ]
result = []
seen = set()
for item in example:
sorted_form = tuple(sorted(item))
if sorted_form not in seen:
result.append(item)
seen.add(sorted_form)
print result
Result:
[('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('c', 'd'), ('b', 'c', 'a')]
You can use an OrderedDict to crate an ordered dictionary based on sorted case of its items :
>>> from collections import OrderedDict
>>> d=OrderedDict()
>>> for i in example:
... d.setdefault(tuple(sorted(i)),i)
...
('a', 'b')
('a', 'a', 'a')
('a', 'a')
('a', 'b')
('c', 'd')
('b', 'c', 'a')
('b', 'c', 'a')
>>> d
OrderedDict([(('a', 'b'), ('a', 'b')), (('a', 'a', 'a'), ('a', 'a', 'a')), (('a', 'a'), ('a', 'a')), (('c', 'd'), ('c', 'd')), (('a', 'b', 'c'), ('b', 'c', 'a'))])
>>> d.values()
[('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('c', 'd'), ('b', 'c', 'a')]
This is similar as the sorted dict.
from itertools import groupby
ex = [(1,2,3), (3,2,1), (1,1), (2,1), (1,2), (3,2), (2,3,1)]
f = lambda x: tuple(sorted(x)) as key
[tuple(k) for k, _ in groupby(sorted(ex, key=f), key=f)]
The nice thing is that you can get which are tuples are of the same combination:
In [16]: example = [ ('a','b'), ('a','a','a'), ('a','a'), ('a', 'a', 'a', 'a'), ('b','a'), ('c', 'd'), ('b','c','a'), ('a','b','c') ]
In [17]: for k, grpr in groupby(sorted(example, key=lambda x: tuple(sorted(x))), key=lambda x: tuple(sorted(x))):
print k, list(grpr)
....:
('a', 'a') [('a', 'a')]
('a', 'a', 'a') [('a', 'a', 'a')]
('a', 'a', 'a', 'a') [('a', 'a', 'a', 'a')]
('a', 'b') [('a', 'b'), ('b', 'a')]
('a', 'b', 'c') [('b', 'c', 'a'), ('a', 'b', 'c')]
('c', 'd') [('c', 'd')]
What you actually seem to need based on the comments, is map-reduce. I don't have Spark installed, but according to the docs (see transformations) this must be like this:
data.map(lambda i: (frozenset(i), i)).reduceByKey(lambda _, i : i)
This however will return (b, a) if your dataset has (a, b), (b, a) in that order.
I solved my own problem, but it was difficult to understand what I was really looking for I used
example = [(1,2), (1,1,1), (1,1), (1,1), (2,1), (3,4), (2,3,1), (1,2,3)]
RDD = sc.parallelize(example)
result = RDD.map(lambda x: list(set(x)))\
.filter(lambda x: len(x)>1)\
.map(lambda(x):(tuple(x), 1))\
.reduceByKey(add)\
.collect()
print result
which also eliminated simply repeated values such as (1,1) and (1,1,1) which was of added benefit to me
Since you are looking for a lambda function, try the following:
lambda x, y=OrderedDict(): [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or y.values()
You can use this lambda function like so:
uniquify = lambda x, y=OrderedDict(): [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or y.values()
result = uniquify(example)
Obviously, this sacrifices readability over the other answers. It is basically doing the same thing as Kasramvd's answer, in a single ugly line.

List comprehension behavior in python

I am working with Codeskulptor on a rock collision problem. I want to check collisions between rocks and my rocks are in a list. I came up with the solution to build a list of combinations and then check for collision.
I do not have itertools available.
My combination list was created like this:
def combinations(items):
n_items = [(n,item) for n,item in enumerate(items)]
return [(item,item2) for n,item in n_items for m,item2 in n_items[n:] if n != m]
letters = ['A','B','C','D']
print combinations(letters)
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
The result is ok.
I tried to do this in a one liner before with functions:
def combinations2(items):
return [(item,item2) for n,item in enumerate(items) for m,item2 in enumerate(items[n:]) if n != m]
letters = ['A','B','C','D']
print combinations2(letters)
But the outcome is completely different and wrong:
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'B'), ('B', 'D'), ('C', 'C'), ('C', 'D'), ('D', 'D')]
List comprehension is still a little black magic to me. I cannot explain this behavior, would like to understand the wrong out though.
I know that my two line solution is much faster, since enumerate is only done once and than used. But the wrong output is unexplainable to me, especially as BC is missing and BB CC DD doubles are there while AA is missing.
Can someone help me?
First thing to do when understanding a list comprehension is to expand it to a regular set of for loops. Read the loops from left to right and nest accordingly.
Working code:
def combinations(items):
n_items = []
for n,item in enumerate(items):
n_items.append((n,item))
result = []
for n, item in n_items:
for m, item2 in n_items[n:]:
if n != m:
result.append((item, item2))
return result
and your attempt that doesn't work:
def combinations2(items):
result = []
for n, item in enumerate(items):
for m, item2 in enumerate(items[n:]):
if n != m:
result.append((item, item2))
return result
Perhaps this way it is easier to see what goes wrong between the two versions.
Your version slices just items, not the indices produced by enumerate(). The original version slices [(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D')] down to [(1, 'B'), (2, 'C'), (3, 'D')], etc. while your version re-numbers that slice to [(0, 'B'), (1, 'C'), (2, 'D')]. This in turn leads to your erroneous output.
Start the inner loop at the higher index by adding a second argument to the enumerate() function, the index at which to start numbering:
def combinations2(items):
result = []
for n, item in enumerate(items):
for m, item2 in enumerate(items[n:], n):
if n != m:
result.append((item, item2))
return result
Back to a one-liner:
def combinations2(items):
return [(item, item2) for n, item in enumerate(items) for m, item2 in enumerate(items[n:], n) if n != m]
This then works correctly:
>>> def combinations2(items):
... return [(item, item2) for n, item in enumerate(items) for m, item2 in enumerate(items[n:], n) if n != m]
...
>>> letters = ['A','B','C','D']
>>> combinations2(letters)
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
Note that you can simplify it further; the only time when n == m is True is for the first iteration of each inner loop. Just slice the items list for the inner list one value further; start the outer enumerate() at 1, drop the inner enumerate() and drop the n != m test:
def combinations3(items):
result = []
for n, item in enumerate(items, 1):
for item2 in items[n:]:
result.append((item, item2))
return result
or as a list comprehension:
def combinations3(items):
return [(item, item2) for n, item in enumerate(items, 1) for item2 in items[n:]]
Just skip the clashes in the iterator.
>>> letter = ['A', 'B', 'C', 'D']
>>> list ( (x,y) for n, x in enumerate(letter) for y in letter[n+1:])
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
Suppose you just want to get the list of combinations.
def combinations2(items):
return filter(lambda (i,j): i <> j, [(i,j) for i in items for j in items])
letters = ['A','B','C','D']
print combinations2(letters)
The output I got is:
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'A'), ('B', 'C'), ('B', 'D'), ('C', 'A'), ('C', 'B'), ('C', 'D'), ('D', 'A'), ('D', 'B'), ('D', 'C')]

combine list elements

How can I merge/combine two or three elements of a list. For instance, if there are two elements, the list 'l'
l = [(a,b,c,d,e),(1,2,3,4,5)]
is merged into
[(a,1),(b,2),(c,3),(d,4),(e,5)]
however if there are three elements
l = [(a,b,c,d,e),(1,2,3,4,5),(I,II,II,IV,V)]
the list is converted into
[(a,1,I),(b,2,II),(c,3,III),(d,4,Iv),(e,5,V)]
Many thanks in advance.
Use zip:
l = [('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, 5)]
print zip(*l)
Result:
[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]

Categories