Split set in python - python

I have a set of dictionaries with some key-value pairs. I would like to know the most efficient way to split them in halves and then apply some processing on each set. I suppose there exists some one liner out there...
i.e. if I have the dictionaries A,B,C,D, I would like to have the resulting sets: (A,B), (A,C), (A,D) and NOT the remaining sets (C,D),(B,D),(B,C)

itertools and one-liners usually belong in the same sentence:
>>> import itertools
>>> s = ['A', 'B', 'C', 'D']
>>> i = itertools.product(s[:1], s[1:])
>>> list(i)
[('A', 'B'), ('A', 'C'), ('A', 'D')]

Maybe something like this:
In [17]: from itertools import combinations, islice
In [18]: lis = ('a', 'b', 'c', 'd')
In [19]: for x in islice(combinations(lis, 2), len(lis) - 1):
   ....:     print(x, end=' ')
   ....:
('a', 'b') ('a', 'c') ('a', 'd')

Try this:
l = ['a', 'b', 'c', 'd']
def foo(l):
    # Remember the first element, then pair it with every later one.
    s0 = None
    for i in l:
        if s0 is None:
            s0 = i
            continue
        yield (s0, i)
for k in foo(l):
    print(k)
Output:
('a', 'b')
('a', 'c')
('a', 'd')

With all due respect, itertools is clearly overkill here:
>>> s = 'ABCDE'
>>> [(s[0], x) for x in s[1:]]
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('A', 'E')]
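The question mentions dictionaries rather than strings, and the same comprehension carries over. A minimal sketch, assuming the dictionaries are kept in a list (dicts are unhashable, so they cannot live in a real set); the names `records` and `pair_first_with_rest` are made up for illustration:

```python
def pair_first_with_rest(items):
    """Pair the first element with each of the remaining elements."""
    items = list(items)
    if not items:
        return []
    first = items[0]
    return [(first, other) for other in items[1:]]

# Hypothetical sample data standing in for the dictionaries A, B, C, D.
records = [{"id": "A"}, {"id": "B"}, {"id": "C"}, {"id": "D"}]
pairs = pair_first_with_rest(records)
for left, right in pairs:
    print(left["id"], right["id"])
```

Each pair holds references to the original dictionaries, so nothing is copied.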

Related

I need to create two for loops that run to combine tuples [duplicate]

I have two tuples:
t1 = ('A', 'B')
t2 = ('C', 'D', 'E')
I wonder how to create combinations between tuples, so the result should be:
AC, AD, AE, BC, BD, BE
EDIT
Using
list(itertools.combinations('abcd',2))
I could generate list of combinations for a given string:
[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
If I insert tuple instead of string the following error occurs:
TypeError: sequence item 0: expected string, tuple found
Any suggestion how to proceed?
itertools.product does exactly what you are looking for:
>>> import itertools
>>> t1 = ('A', 'B')
>>> t2 = ('C', 'D', 'E')
>>> list(itertools.product(t1, t2))
[('A', 'C'), ('A', 'D'), ('A', 'E'), ('B', 'C'), ('B', 'D'), ('B', 'E')]
>>> [''.join(x) for x in itertools.product(t1, t2)]
['AC', 'AD', 'AE', 'BC', 'BD', 'BE']
for value_one in t1:
    for value_two in t2:
        result = str(value_one) + str(value_two)
        print(result)
This uses no external libraries. Literally just two for loops and string concatenation. Format the output however you'd like.
This seems like you either did not put any effort into finding this answer, or I am misinterpreting the question.
Edit: I see that you are coming from an R background, so you may not understand Python syntax. Please refer to guides for Python basics; I believe they will help greatly. To save the values from the loop, as @Naman mentioned, you will want to make an empty list and [list_name].append([value]) the desired values, then print the values in the list using other constructs.
Here is my simple method:
for i in t1:
    for j in t2:
        print(i + j, end=" ")
These three lines print the combinations above.
t = []
for x in t1:
    for y in t2:
        t.append(x + y)
t = tuple(t)
So iterate over both tuples, append every combination to a list, and then convert the list to a tuple.
You can print what you want by iterating over both tuples (or make an empty list and store the output in it):
l = []
for c1 in t1:
    for c2 in t2:
        print(c1 + c2, end=', ')
        l.append(c1 + c2)
Finally, the list l will contain the output elements. You can process its elements or make a tuple of it with
t = tuple(l)
All possible combinations:
import itertools
t1 = ('A', 'B')
t2 = ('C', 'D', 'E')
print(tuple(itertools.combinations(t1 + t2, 2)))
Output: (('A', 'B'), ('A', 'C'), ('A', 'D'), ('A', 'E'), ('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E'))

Finding match from a list of tuples

I have a list of tuples as below.
x = [('b', 'c'),
('c',),
('a', 'c', 'b'),
('b', 'c', 'a', 'd'),
('b', 'c', 'a'),
('a', 'b'),
('a', 'b', 'c', 'd'),
('a', 'c', 'b', 'd'),
('b',),
('c', 'a'),
('a', 'b', 'c'),
('a',)]
I want to give input like ('a') then it should give output like,
[('a', 'c', 'b'), ('a', 'b'),('a', 'b', 'c', 'd'),('a', 'c', 'b', 'd'),('a', 'b', 'c')]
#everything starts with a. But not "a".
or for input of ('a','b') it should give an output of
[('a', 'b', 'c', 'd'),('a', 'b', 'c')]
#everything start with ('a','b') but not ('a','b') itself.
I tried to use filter, but with no success:
print(filter(lambda x: ("a","b") in x, x))
>>> <filter object at 0x00000214B3A545F8>
def f(lst, target):
    # Keep tuples strictly longer than target whose leading items equal target.
    return [t for t in lst if len(t) > len(target) and all(a == b for a, b in zip(t, target))]
so that:
f(x, ('a', 'b'))
returns:
[('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
Tuples are compared lexicographically in Python, meaning that their elements are compared pair by pair, regardless of their type.
You can extract the portion of each tuple of the same length as your prefix and compare with ==:
def find_prefixes(prefix, sequence):
    n = len(prefix)
    return [x for x in sequence if x[:n] == prefix and len(x) > n]
List comprehensions of this type are indeed equivalent to filter calls, so you can do
def find_prefixes(prefix, sequence):
    n = len(prefix)
    return list(filter(lambda x: x[:n] == prefix and len(x) > n, sequence))
Doing a linear search is not a very efficient way to solve this problem. The data structure known as a Trie is made specifically for finding prefixes. It arranges all your data into a single tree. Here is a popular Python implementation you can use with the appropriate attribution: https://stackoverflow.com/a/11016430/2988730
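To make the trie idea concrete, here is a rough sketch built from nested dicts. It is not the implementation from the linked answer; the class name `PrefixTrie` and the method `strict_extensions` are hypothetical names chosen for this example, and the results come back in traversal order rather than in the order of the original list.

```python
class PrefixTrie:
    """Hypothetical minimal trie over tuples, built from nested dicts."""

    _END = object()  # sentinel key marking "a stored tuple ends at this node"

    def __init__(self, sequences=()):
        self._root = {}
        for seq in sequences:
            self.insert(seq)

    def insert(self, seq):
        node = self._root
        for item in seq:
            node = node.setdefault(item, {})
        node[self._END] = tuple(seq)

    def strict_extensions(self, prefix):
        """Return stored tuples that start with prefix and are longer than it."""
        node = self._root
        for item in prefix:
            if item not in node:
                return []
            node = node[item]
        found = []
        # Walk everything below the prefix node, skipping the prefix itself.
        stack = [child for key, child in node.items() if key is not self._END]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key is self._END:
                    found.append(child)
                else:
                    stack.append(child)
        return found

x = [('b', 'c'), ('c',), ('a', 'c', 'b'), ('b', 'c', 'a', 'd'),
     ('b', 'c', 'a'), ('a', 'b'), ('a', 'b', 'c', 'd'),
     ('a', 'c', 'b', 'd'), ('b',), ('c', 'a'), ('a', 'b', 'c'), ('a',)]
trie = PrefixTrie(x)
matches = trie.strict_extensions(('a', 'b'))
print(sorted(matches))
```

Building the trie costs one pass over the data, so it only pays off when many prefix queries are run against the same list.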
Firstly, use list(filter(...)) to convert a filter object to a list, but your filter doesn't do what you want - it checks membership, not subsequence. You can check subsequence by using a slice.
Then you just need to add a check that the match is longer than the subsequence.
Also, a filter of a lambda is better written as a comprehension.
for sub in ('a',), ('a', 'b'):
    n = len(sub)
    out = [t for t in x if t[:n] == sub and len(t) > n]
    print(out)
Output:
[('a', 'c', 'b'), ('a', 'b'), ('a', 'b', 'c', 'd'), ('a', 'c', 'b', 'd'), ('a', 'b', 'c')]
[('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
list(filter(lambda y: len(y) > len(inp) and all(y[i] == z for i, z in enumerate(inp)), x))
For
inp = ('a', 'b')
the output will be
[('a', 'b', 'c', 'd'), ('a', 'b', 'c')]

How to get combination in python?

I have a list like the one below, and I want a simple permutation with a small modification.
For example:
l=['a', 'b']
Output:
[('a', 'a'), ('a', 'b'), ('b', 'b')]
I tried the following.
Try-1
list(itertools.product(l, repeat=2))
returns,
[('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
Try-2
print(list(itertools.permutations(['a', 'b'])))
returns,
[('a', 'b'), ('b', 'a')]
Try-3
I can do it like below,
temp = [tuple(sorted((i, j))) for i in ['a', 'b'] for j in ['a', 'b']]
print(list(set(temp)))
But this seems an inefficient way to solve it.
Use combinations_with_replacement:
from itertools import combinations_with_replacement
l=['a', 'b']
for c in combinations_with_replacement(l, 2):
    print(c)
Output
('a', 'a')
('a', 'b')
('b', 'b')
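As a quick sanity check, the sort-and-deduplicate approach from Try-3 and combinations_with_replacement can be compared directly. This is only a sketch; the three-element list and the variable names are chosen for illustration, and the two results line up this neatly only when the input is already sorted and has no repeated elements.

```python
from itertools import combinations_with_replacement, product

l = ['a', 'b', 'c']

# Direct generation: each unordered pair appears exactly once.
direct = list(combinations_with_replacement(l, 2))

# Try-3 style: build every ordered pair, normalise by sorting, deduplicate.
via_set = sorted({tuple(sorted(p)) for p in product(l, repeat=2)})

print(direct == via_set)
```

The direct version never builds the full product, which is where the saving comes from.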

Unique Combinations in a list of k,v tuples in Python

I have a list of various combos of items in tuples
example = [(1,2), (2,1), (1,1), (1,1), (2,1), (2,3,1), (1,2,3)]
I wish to group and count by unique combinations
yielding the result
result = [((1,2), 3), ((1,1), 2), ((2,3,1), 2)]
It is not important that the order is maintained or which permutation of each combination is preserved. It is very important, however, that the operation be done with a lambda function and that the output still be a list of tuples as above, because I will be working with a Spark RDD object.
My code currently counts patterns taken from a data set using
from operator import add
RDD = sc.parallelize(example)
result = RDD.map(lambda y: (y, 1))\
    .reduceByKey(add)\
    .collect()
print(result)
I need another .map command that will account for the different permutations, as explained above.
How about this: maintain a set that contains the sorted form of each item you've already seen. Only add an item to the result list if you haven't seen its sorted form already.
example = [ ('a','b'), ('a','a','a'), ('a','a'), ('b','a'), ('c', 'd'), ('b','c','a'), ('a','b','c') ]
result = []
seen = set()
for item in example:
    sorted_form = tuple(sorted(item))
    if sorted_form not in seen:
        result.append(item)
        seen.add(sorted_form)
print(result)
Result:
[('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('c', 'd'), ('b', 'c', 'a')]
You can use an OrderedDict to create an ordered dictionary keyed on the sorted form of each item:
>>> from collections import OrderedDict
>>> d=OrderedDict()
>>> for i in example:
...     d.setdefault(tuple(sorted(i)), i)
...
('a', 'b')
('a', 'a', 'a')
('a', 'a')
('a', 'b')
('c', 'd')
('b', 'c', 'a')
('b', 'c', 'a')
>>> d
OrderedDict([(('a', 'b'), ('a', 'b')), (('a', 'a', 'a'), ('a', 'a', 'a')), (('a', 'a'), ('a', 'a')), (('c', 'd'), ('c', 'd')), (('a', 'b', 'c'), ('b', 'c', 'a'))])
>>> list(d.values())
[('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('c', 'd'), ('b', 'c', 'a')]
This is similar to the sorted dict approach.
from itertools import groupby
ex = [(1,2,3), (3,2,1), (1,1), (2,1), (1,2), (3,2), (2,3,1)]
f = lambda x: tuple(sorted(x))  # used as the sort and grouping key
[tuple(k) for k, _ in groupby(sorted(ex, key=f), key=f)]
The nice thing is that you can see which tuples belong to the same combination:
In [16]: example = [ ('a','b'), ('a','a','a'), ('a','a'), ('a', 'a', 'a', 'a'), ('b','a'), ('c', 'd'), ('b','c','a'), ('a','b','c') ]
In [17]: for k, grpr in groupby(sorted(example, key=lambda x: tuple(sorted(x))), key=lambda x: tuple(sorted(x))):
   ....:     print(k, list(grpr))
   ....:
('a', 'a') [('a', 'a')]
('a', 'a', 'a') [('a', 'a', 'a')]
('a', 'a', 'a', 'a') [('a', 'a', 'a', 'a')]
('a', 'b') [('a', 'b'), ('b', 'a')]
('a', 'b', 'c') [('b', 'c', 'a'), ('a', 'b', 'c')]
('c', 'd') [('c', 'd')]
What you actually seem to need, based on the comments, is map-reduce. I don't have Spark installed, but according to the docs (see transformations) it should look something like this:
data.map(lambda i: (frozenset(i), i)).reduceByKey(lambda _, i : i)
This however will return (b, a) if your dataset has (a, b), (b, a) in that order.
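One thing to watch with the choice of key: a frozenset drops repeated values, whereas a sorted tuple keeps them. The following is a plain-Python stand-in for the map plus reduceByKey step (no Spark involved, with collections.Counter playing the role of the reduction), using the example list from the question; the variable names are illustrative only.

```python
from collections import Counter

example = [(1, 2), (2, 1), (1, 1), (1, 1), (2, 1), (2, 3, 1), (1, 2, 3)]

# Key on the sorted tuple: permutations merge, repeated values are kept.
by_sorted_tuple = Counter(tuple(sorted(item)) for item in example)

# Key on a frozenset: permutations merge, but (1, 1) collapses to {1}.
by_frozenset = Counter(frozenset(item) for item in example)

print(sorted(by_sorted_tuple.items()))
print(frozenset((1, 1)) == frozenset((1, 1, 1)))
```

In Spark, the same key function would presumably go into the first map, for example `lambda t: (tuple(sorted(t)), 1)`, followed by a reduceByKey that adds the counts.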
I solved my own problem, although it was difficult to work out what I was really looking for. I used
from operator import add
example = [(1,2), (1,1,1), (1,1), (1,1), (2,1), (3,4), (2,3,1), (1,2,3)]
RDD = sc.parallelize(example)
result = RDD.map(lambda x: sorted(set(x)))\
    .filter(lambda x: len(x) > 1)\
    .map(lambda x: (tuple(x), 1))\
    .reduceByKey(add)\
    .collect()
print(result)
which also eliminated tuples that simply repeat one value, such as (1,1) and (1,1,1), which was an added benefit for me.
Since you are looking for a lambda function, try the following:
lambda x, y=OrderedDict(): [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or y.values()
You can use this lambda function like so:
uniquify = lambda x, y=OrderedDict(): [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or y.values()
result = uniquify(example)
Obviously, this sacrifices readability over the other answers. It is basically doing the same thing as Kasramvd's answer, in a single ugly line.

Contracting elements from two different lists

I have two different lists list1 = ['A','B'] and list2 = ['C','D','E']. I would like to be able to find all possible contractions between the elements of the two lists. For the present case I would like to have a code (preferably Python, Mathematica or MATLAB) that takes the lists above and returns:
AC,BD , AC,BE , AD,BC , AD,BE , AE,BC , AE,BD
which are all the possible contractions. I would like to be able to do this for lists of variable size (but always 2 of them). I've played a lot with Python's itertools but I can't get the hang of how it works with two lists. Any help would be much appreciated.
Here is my version:
import itertools
l1 = 'AB'
l2 = 'CDE'
n = min(len(l1),len(l2))
print('; '.join(
    ','.join(a+b for a,b in zip(s1,s2))
    for s1,s2 in itertools.product(
        itertools.permutations(l1,n),
        itertools.combinations(l2,n),
    )
))
This will output:
AC,BD; AC,BE; AD,BE; BC,AD; BC,AE; BD,AE
Note that for brevity I did not build lists of the items but iterated over the strings directly. It does not matter which of the two lists gets permutations and which gets combinations; that only changes the order of the output. permutations yields every possible ordering, while combinations yields each selection once, in input order. This way, you get each contraction exactly once.
For each contraction, you will get two sequences s1 and s2, the contraction is between elements of like index in each sequence. ','.join(a+b for a,b in zip(s1,s2)) makes a nice string for such a contraction.
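If you would rather have the contractions as data than as one printed string, the same idea can be wrapped in a function. This is a sketch along the lines of the code above; the function name `contractions` is made up here, and it assumes the elements are strings so that `a + b` concatenates them.

```python
from itertools import combinations, permutations

def contractions(first, second):
    """Return every contraction as a tuple of joined pairs."""
    n = min(len(first), len(second))
    return [
        tuple(a + b for a, b in zip(p, c))
        for p in permutations(first, n)   # every ordering of n items
        for c in combinations(second, n)  # every selection of n items, once
    ]

result = contractions(['A', 'B'], ['C', 'D', 'E'])
for pairs in result:
    print(','.join(pairs))
```

For lists of length 2 and 3 this gives 3!/(3-2)! = 6 contractions, in the same order as the printed output above.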
listA = {"A", "B"};
listB = {"C", "D", "E"};
f[x_, y_] := If[StringMatchQ[StringTake[x, {2}], StringTake[y, {2}]],
  Sequence @@ {}, List[x, y]];
z = Outer[StringJoin, listA, listB];
Flatten[Outer[f, First@z, Last@z], 1]
In [2]: list1 = ['A','B']
In [3]: list2 = ['C','D','E']
In [4]: list(itertools.product(list1, list2))
Out[4]: [('A', 'C'), ('A', 'D'), ('A', 'E'), ('B', 'C'), ('B', 'D'), ('B', 'E')]
In [5]: [''.join(p) for p in itertools.product(list1, list2)]
Out[5]: ['AC', 'AD', 'AE', 'BC', 'BD', 'BE']
If you're asking how to build all permutations of the items contained in both lists, with no repetitions and with each result of length two, you could use itertools.permutations:
import itertools
combined_list = []
for i in list1 + list2:
    if i not in combined_list:
        combined_list.append(i)
for perm in itertools.permutations(combined_list, 2):
    print(perm)
For the inputs list1 = ['a', 'b'] and list2 = ['c', 'd', 'e'], this outputs:
('a', 'b') ('a', 'c') ('a', 'd') ('a', 'e') ('b', 'a') ('b', 'c') ('b', 'd') ('b', 'e') ('c', 'a') ('c', 'b') ('c', 'd') ('c', 'e') ('d', 'a') ('d', 'b') ('d', 'c') ('d', 'e') ('e', 'a') ('e', 'b') ('e', 'c') ('e', 'd')
