Apriori create 3 set of word from 2 set

I'm implementing the Apriori algorithm and am stuck on generating the 3-item candidate sets from the 2-item sets.
Suppose I have a list of 2-item sets like this:
FI2 = [('a','b'),('a','c'),('a','d'),('b','d'),('b','e'),('e','f')];
My first approach was to flatten everything into distinct single items and take itertools.combinations of size 3. That is computationally expensive and also not the right approach, since the result should be derived from C2.
The result should look like this:
C3 = [('a','b','c'),('a','b','d'),('a','c','d'),('b','d','e')]
I don't know how to approach this problem. I would appreciate some guidance on how to do it.

Is there any chance C3 is missing some values, such as ('b','e','f') and ('a','b','e')?
I'm sure it's not the best way, but it's a start:
from itertools import combinations
FI2 = [('a','b'),('a','c'),('a','d'),('b','d'),('b','e'),('e','f')]
# check whether two tuples have at least one item in common
check_intersection = (lambda c: len(set(c[0]).intersection(set(c[1]))) > 0)
# run over all pairs of tuples in FI2;
# if two tuples have at least one item in common, add the merged, sorted tuple;
# the set removes duplicate tuples from the new list
C3 = list(set([tuple(sorted(set(c[0] + c[1]))) for c in combinations(FI2, 2) if check_intersection(c)]))
print(C3)
#=> [('b', 'd', 'e'), ('a', 'b', 'e'), ('b', 'e', 'f'), ('a', 'b', 'd'), ('a','c','d'), ('a', 'b', 'c')]
# (the order may differ between runs, because a set is unordered)
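
For comparison, the C3 expected in the question is what the classic Apriori join step produces: two sorted k-itemsets are merged only when they agree on their first k-1 items. The following is a minimal sketch under that assumption. The names join_step and prune_step are made up for illustration, and the prune step (dropping candidates that have an infrequent subset) is the usual follow-up in Apriori rather than something the question asked for.

```python
from itertools import combinations

FI2 = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'd'), ('b', 'e'), ('e', 'f')]

def join_step(frequent_k):
    """Merge sorted k-itemsets that share their first k-1 items (hypothetical helper)."""
    itemsets = sorted(tuple(sorted(s)) for s in frequent_k)
    candidates = []
    for left, right in combinations(itemsets, 2):
        # left sorts before right, so appending right's last item keeps the tuple sorted
        if left[:-1] == right[:-1]:
            candidates.append(left + (right[-1],))
    return candidates

def prune_step(candidates, frequent_k):
    """Keep only candidates whose every k-item subset is frequent (hypothetical helper)."""
    frequent = set(tuple(sorted(s)) for s in frequent_k)
    return [c for c in candidates
            if all(sub in frequent for sub in combinations(c, len(c) - 1))]

C3 = join_step(FI2)
print(C3)
# [('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('b', 'd', 'e')]
print(prune_step(C3, FI2))
# [('a', 'b', 'd')]
```

The join compares only prefixes, so it never builds triples such as ('b', 'e', 'f') whose pairs merely overlap in one item.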

Related

Combinations of a list of items in efficient way

I am trying to find out whether there is a more efficient way of generating these combinations using a Python scientific library.
I would like to avoid native for loops and list appends, preferring NumPy or similar functionality that in theory should be more efficient because it runs C code under the hood. I am struggling to find one, yet this seems like a common problem: doing these operations efficiently rather than with slow native Python structures.
I am wondering if I am looking in the wrong places. For example, this does not seem to help here: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.binomial.html
Here I take combinations from a list of length 5, starting from a lower bound of 2, and collect all of them. Each batch is appended to a global list, so I end up with one list of "taken items" from the original input list.
import itertools
input_list = ['a', 'b', 'c', 'd', 'e']
minimum_amount = 2
comb_list = []
for i in range(minimum_amount, len(input_list)):
    curr_list = input_list[:i+1]
    print(f"the current index is: {i}, the lists are: {curr_list}")
    curr_comb_list = list(itertools.combinations(curr_list, i))
    comb_list = comb_list + curr_comb_list
print(f"found {len(comb_list)} combinations (check on set length: {len(set(comb_list))})")
print(comb_list)
Gives:
found 12 combinations (check on set length: 12)
[('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'b', 'c'), ('a', 'b', 'd'),
('a', 'c', 'd'), ('b', 'c', 'd'), ('a', 'b', 'c', 'd'), ('a', 'b', 'c', 'e'),
('a', 'b', 'd', 'e'), ('a', 'c', 'd', 'e'), ('b', 'c', 'd', 'e')]
Is it possible to do this avoiding the for loop and using some scientific libraries to do this quicker?
How can I do this in a quicker way?

The final list contains all combinations of any length from 1 to len(input_list), which is actually the Power Set.
Look at How to get all possible combinations of a list’s elements?.

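You want all combinations from input_list of length 2 or more.
To get them, you can run:
comb_lst = list(itertools.chain.from_iterable(
    [itertools.combinations(input_list, i)
     for i in range(2, len(input_list))]))
As written, range(2, len(input_list)) stops at length len(input_list) - 1, like the loop in the question; use len(input_list) + 1 to include the full-length combination as well.
This is similar to the powerset recipe in the examples on the itertools documentation page,
but not exactly the same (the length starts from 2, not from 1).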
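Note also that this draws every combination from the full input_list, whereas the loop in your code draws from the growing prefix curr_list, so the two do not produce the same list.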
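
If the goal is to reproduce the question's output exactly, the combinations have to come from the prefix input_list[:i + 1] rather than from the whole list. A minimal sketch of that, reusing the question's variable names: it still iterates in Python over the lengths, but itertools does the per-length work in C (in CPython) and the repeated list concatenation goes away.

```python
from itertools import chain, combinations

input_list = ['a', 'b', 'c', 'd', 'e']
minimum_amount = 2

# For each length i, take i-item combinations of the first i + 1 items,
# exactly as the loop in the question does, and flatten the results.
comb_list = list(chain.from_iterable(
    combinations(input_list[:i + 1], i)
    for i in range(minimum_amount, len(input_list))
))

print(len(comb_list))  # 12
```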

How to efficiently group pairs based on shared item?

I have a list of pairs (tuples), for simplification something like this:
L = [("A","B"), ("B","C"), ("C","D"), ("E","F"), ("G","H"), ("H","I"), ("G","I"), ("G","J")]
Using Python, I want to efficiently split this list into:
L1 = [("A","B"), ("B","C"), ("C","D")]
L2 = [("E","F")]
L3 = [("G","H"), ("G","I"), ("G","J"), ("H","I")]
How can I efficiently split the list into groups of pairs, such that every pair in a group shares an item with at least one other pair in the same group? As stated in one of the answers, this is actually a network problem: the goal is to efficiently split a network into disconnected (isolated) parts.
The container types (lists, tuples, sets) may be changed to achieve higher efficiency.

This is more like a network problem, so we can use networkx:
import itertools
import networkx as nx
G = nx.from_edgelist(L)
l = list(nx.connected_components(G))
# build a dict that maps each node to the id of its connected component
mapdict = {z: x for x, y in enumerate(l) for z in y}
# append the component id to each original pair, for use with groupby
newlist = [x + (mapdict[x[0]],) for x in L]
# sort by component id, then group pairs with the same id into one sublist
newlist = sorted(newlist, key=lambda x: x[2])
yourlist = [list(y) for x, y in itertools.groupby(newlist, key=lambda x: x[2])]
yourlist
[[('A', 'B', 0), ('B', 'C', 0), ('C', 'D', 0)], [('E', 'F', 1)], [('G', 'H', 2), ('H', 'I', 2), ('G', 'I', 2), ('G', 'J', 2)]]
Then to match your output format:
L1,L2,L3=[[y[:2]for y in x] for x in yourlist]
L1
[('A', 'B'), ('B', 'C'), ('C', 'D')]
L2
[('E', 'F')]
L3
[('G', 'H'), ('H', 'I'), ('G', 'I'), ('G', 'J')]

Initialise a list of groups as empty
Let (a, b) be the next pair
Collect all groups that contain any elements with a or b
Remove them all, join them, add (a, b), and insert as a new group
Repeat till done
That'd be something like this:
import itertools, functools
def partition(pred, iterable):
    t1, t2 = itertools.tee(iterable)
    return itertools.filterfalse(pred, t1), filter(pred, t2)
groups = []
for a, b in L:
    unrelated, related = partition(lambda group: any(aa == a or bb == b or aa == b or bb == a for aa, bb in group), groups)
    groups = [*unrelated, sum(related, [(a, b)])]
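
The same merge-the-related-groups idea can also be written with a disjoint-set (union-find) structure, which avoids rescanning every existing group for each pair. This is only a sketch using the standard library; group_pairs and find are hypothetical names, and it assumes the items are hashable.

```python
from collections import defaultdict

def group_pairs(pairs):
    """Group pairs by connected component using union-find (hypothetical helper)."""
    parent = {}

    def find(x):
        # Return the representative of x's set, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union the two endpoints of every pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the pairs under the representative of their first item.
    groups = defaultdict(list)
    for pair in pairs:
        groups[find(pair[0])].append(pair)
    return list(groups.values())

L = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"),
     ("G", "H"), ("H", "I"), ("G", "I"), ("G", "J")]
print(group_pairs(L))
```

Within each group the pairs keep their input order, and the groups appear in the order in which their first pair occurs in the input.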

An efficient and Pythonic approach is to convert the list of tuples to a set of frozensets that serves as a pool of candidates. The outer while loop starts a new group; the nested while loop keeps expanding that group by adding the first candidate and then taking the union with every other candidate that intersects the group. When no intersecting candidate is left, control goes back to the outer loop to form a new group:
pool = set(map(frozenset, L))
groups = []
while pool:
    group = set()
    groups.append([])
    while True:
        for candidate in pool:
            if not group or group & candidate:
                group |= candidate
                groups[-1].append(tuple(candidate))
                pool.remove(candidate)
                break
        else:
            break
Given your sample input, groups will become:
[[('A', 'B'), ('C', 'B'), ('C', 'D')],
[('G', 'H'), ('H', 'I'), ('G', 'J'), ('G', 'I')],
[('E', 'F')]]
Keep in mind that sets are unordered in Python, which is why the order of the above output doesn't match your expected output, but for your purpose the order should not matter.

You can use the following code:
l = [("A","B"), ("B","C"), ("C","D"), ("E","F"), ("G","H"), ("H","I"), ("G","I"), ("G","J")]
result = []
if len(l) > 1:
    tmp = [l[0]]
    for i in range(1,len(l)):
        if l[i][0] == l[i-1][1] or l[i][1] == l[i-1][0] or l[i][1] == l[i-1][1] or l[i][0] == l[i-1][0]:
            tmp.append(l[i])
        else:
            result.append(tmp)
            tmp = [l[i]]
    result.append(tmp)
else:
    # zero or one pair: wrap it so result is still a list of groups
    result = [l] if l else []
for elem in result:
    print(elem)
output:
[('A', 'B'), ('B', 'C'), ('C', 'D')]
[('E', 'F')]
[('G', 'H'), ('H', 'I'), ('G', 'I'), ('G', 'J')]
Note: this code assumes that your initial list is sorted so that related pairs are adjacent. If this is not the case, it will not work, because it makes only one pass over the whole list to create the groups (complexity O(n)).
Explanations:
result stores your groups.
if len(l) > 1: if the list is empty or has only one element, no processing is needed; you already have the answer.
You make one pass over the list and check the 4 possible equalities between the tuple at position i and the one at position i-1.
tmp is used to build the current group: as long as the condition is met, tuples are added to tmp.
When the condition is not met, tmp (the group built so far) is added to result, tmp is reinitialised with the current tuple, and the loop continues.

You can use a while loop that runs as long as L is non-empty, with a for loop inside it. Starting from the first member of L, check the whole list for pairs that share either of its two items. Append each such pair to a list L1 and pop it from the original list L. The while loop then runs again, and the inner for loop collects the next group into a new list L2, and so on. You can try this. (I will provide code if it doesn't help.)

Get unique products between lists and maintain order of input

There are quite a lot of questions about the unique (Cartesian) product of lists, but I am looking for something peculiar that I haven't found in any of the other questions.
My input will always consist of two lists. When the lists are identical, I want to get all combinations but when they are different I need the unique product (i.e. order does not matter). However, in addition I also need the order to be preserved, in the sense that the order of the input lists matters. In fact, what I need is that the items in the first list should always be the first item of the product tuple.
I have the following working code, which does what I want with the exception I haven't managed to find a good, efficient way to keep the items ordered as described above.
import itertools
xs = ['w']
ys = ['a', 'b', 'c']
def get_up(x_in, y_in):
    if x_in == y_in:
        return itertools.combinations(x_in, 2)
    else:
        ups = []
        for x in x_in:
            for y in y_in:
                if x == y:
                    continue
                # sort so that cases such as (a,b) (b,a) get filtered by set later on
                ups.append(sorted((x, y)))
        ups = set(tuple(up) for up in ups)
        return ups
print(list(get_up(xs, ys)))
# [('c', 'w'), ('b', 'w'), ('a', 'w')]
As you can see, the result is a list of unique tuples that are ordered alphabetically. I used the sorting so I could filter duplicate entries by using a set. But because the first list (xs) contains the w, I want the tuples to have that w as a first item.
[('w', 'c'), ('w', 'b'), ('w', 'a')]
If the two lists overlap, the order of the items that occur in both lists doesn't matter. So for xs = ['w', 'a', 'b'] and ys = ['a', 'b', 'c'] the order for a doesn't matter: the result may contain ('a', 'b'), as in
[('w', 'c'), ('w', 'b'), ('w', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'c')]
or ('b', 'a'), as in
[('w', 'c'), ('w', 'b'), ('w', 'a'), ('a', 'c'), ('b', 'a'), ('b', 'c')]
Preferably I'd end up with a generator (as combinations returns). I'm also only interested in Python >= 3.6.

Collect the tuples in an order-preserving way (as when the lists are identical), then filter by removing tuples whose inverse is also in the list.
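import itertools
def get_up(x_in, y_in):
    if x_in == y_in:
        # yield from, not return: the yield below makes this function a generator
        yield from itertools.combinations(x_in, 2)
    else:
        seen = set()
        for a, b in itertools.product(x_in, y_in):
            if a == b or (b, a) in seen:
                continue
            yield (a, b)
            seen.add((a, b))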
This will give you the tuples in (x, y) order; when both (a,b) and (b,a) occur, you get only the order seen first.

I'll give an answer to my own question, though I bet there is a better solution using itertools or others.
xs = ['c', 'b']
ys = ['a', 'b', 'c']
def get_unique_combinations(x_in, y_in):
    """ get unique combinations that maintain order, i.e. x is before y """
    yielded = set()
    for x in x_in:
        for y in y_in:
            if x == y or (x, y) in yielded or (y, x) in yielded:
                continue
            yield x, y
            yielded.add((x, y))
print(list(get_unique_combinations(xs, ys)))

Can't figure out this simple recursion with Python

Maybe it's not so simple, but I am trying to essentially find all the permutations of a list of letters.
[[a,b],[c,d],[e,f]] for simplicity as it can be longer than just 3 lists of 2 (ie 6 lists of 3 letters, etc.).
I want my program to find all 8 combinations for above example while maintaining order of the main list (is that permutation?).
ace
acf
ade
adf
bce
bcf
bde
bdf
Currently I think the solution below will iterate recursively through the combinations I want; however, I cannot figure out how to store them in order because when it reaches the base condition for the first row it will simply go to the next letter in the last index of the list.
I don't believe I was able to find something that would work for me in itertools
def find_comb(mylist):
    for curr_index in range(0,len(mylist)):
        for letter in mylist[curr_index]:
            if (curr_index+1<=len(mylist)):
                next_letter=find_comb(mylist[curr_index+1:])
    return 1 #wrote 1 for now because I am stumped

I think what you want is itertools.product:
from itertools import product
x = [['a','b'], ['c','d'], ['e','f']]
for combo in product(*x):
    print(combo)
Prints
('a', 'c', 'e')
('a', 'c', 'f')
('a', 'd', 'e')
('a', 'd', 'f')
('b', 'c', 'e')
('b', 'c', 'f')
('b', 'd', 'e')
('b', 'd', 'f')
Regarding your comment:
product takes a bunch of iterables and generates their product. In your case, however, you were sending it a single iterable (that consisted of more iterables). So instead of passing in l1, l2, l3 you were passing in [l1, l2, l3].
To actually pass in the three iterables, we have to unpack the list using the asterisk, which will turn that single list into three arguments. For more on that, see What does ** (double star) and * (star) do for parameters?
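
Since the question asked for recursion specifically, here is a minimal recursive sketch that produces the same eight strings. It reuses the question's function name find_comb, but the body is an illustration rather than a repair of the original loop structure: each call combines every letter of the first sublist with every result for the remaining sublists.

```python
def find_comb(groups):
    """Return all strings that take one letter from each sublist, in order."""
    if not groups:
        return ['']  # one empty string, so callers have something to extend
    tails = find_comb(groups[1:])
    return [letter + tail for letter in groups[0] for tail in tails]

print(find_comb([['a', 'b'], ['c', 'd'], ['e', 'f']]))
# ['ace', 'acf', 'ade', 'adf', 'bce', 'bcf', 'bde', 'bdf']
```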

Python backtracking strings length n from alphabet {a,b,c} with #a=#b

I want to write an algorithm that, for a given n, finds the strings over the alphabet {a,b,c} in which 'a' appears the same number of times as 'b'.
I came up with this:
n=3 #length String
h=-1 #length prefix
L=['a','b','c'] #alphabet
S=['','','',''] #solution
par=0 # it's zero if a and b have same occurence
def P(n,h,par,L,S):
    if h==n:
        if par==0:
            print(S)
    else:
        for i in L:
            if i=='a':
                par+=1
            if i=='b':
                par-=1
            S[h+1]=i
            P(n,h+1,par,L,S)
            #Update the stack after recursion
            if S[h+1]=='a':
                par-=1
            if S[h+1]=='b':
                par+=1
P(n,h,par,L,S)
I apologize for the poor string implementation, but it works and is only for study purposes. The question is: are there ways to spare the algorithm some work? At the moment it only checks #a and #b at the end, after generating all n-length strings over this alphabet.
My goal is to achieve O(n*(number of strings to print)).

Is this what you're trying to do:
from itertools import combinations_with_replacement
alphabet = "abc"
def combs(alphabet, r):
    for comb in combinations_with_replacement(alphabet, r):
        if comb.count('a') == comb.count('b'):
            yield comb
For this,
list(combs(alphabet, 3)) == [('a', 'b', 'c'), ('c', 'c', 'c')]
and
list(combs(alphabet, 4)) == [('a', 'a', 'b', 'b'),
('a', 'b', 'c', 'c'),
('c', 'c', 'c', 'c')]
This will produce all combinations and reject some; according to the docs for combinations_with_replacement:
The number of items returned is (n+r-1)! / r! / (n-1)! when n > 0.
where n == len(alphabet).

You can cut out the wasted work by changing the following:
if i=='a':
    par+=1
if i=='b':
    par-=1
to
oldpar = par
if i=='a':
    par+=1
if i=='b':
    par-=1
# there are n-h-1 characters left to place
# and we need to place at least abs(par) characters to reach par=0
if abs(par)>n-h-1:
    par = oldpar
    continue
