Finding match from a list of tuples - python

I have a list of tuples as below.
x = [('b', 'c'),
('c',),
('a', 'c', 'b'),
('b', 'c', 'a', 'd'),
('b', 'c', 'a'),
('a', 'b'),
('a', 'b', 'c', 'd'),
('a', 'c', 'b', 'd'),
('b',),
('c', 'a'),
('a', 'b', 'c'),
('a',)]
I want to give input like ('a') then it should give output like,
[('a', 'c', 'b'), ('a', 'b'),('a', 'b', 'c', 'd'),('a', 'c', 'b', 'd'),('a', 'b', 'c')]
#everything starts with a. But not "a".
or for input of ('a','b') it should give an output of
[('a', 'b', 'c', 'd'),('a', 'b', 'c')]
#everything start with ('a','b') but not ('a','b') itself.
I tried to use filter, but with no success.
print(filter(lambda x: ("a","b") in x, x))
>>> <filter object at 0x00000214B3A545F8>

def f(lst, target):
    return [t for t in lst if len(t) > len(target) and all(a == b for a, b in zip(t, target))]
so that:
f(x, ('a', 'b'))
returns:
[('a', 'b', 'c', 'd'), ('a', 'b', 'c')]

Tuples are compared lexicographically in Python, meaning that their elements are compared pair by pair, regardless of their type.
You can extract the portion of each tuple of the same length as your prefix and compare with ==:
def find_prefixes(prefix, sequence):
    n = len(prefix)
    return [x for x in sequence if x[:n] == prefix and len(x) > n]
List comprehensions of this type are indeed equivalent to filter calls, so you can do
def find_prefixes(prefix, sequence):
    n = len(prefix)
    return list(filter(lambda x: x[:n] == prefix and len(x) > n, sequence))
Doing a linear search is not a very efficient way to solve this problem. The data structure known as a Trie is made specifically for finding prefixes. It arranges all your data into a single tree. Here is a popular Python implementation you can use with the appropriate attribution: https://stackoverflow.com/a/11016430/2988730
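As a rough illustration of the trie idea, here is a minimal sketch built from nested dicts. It is not the linked implementation; the names build_trie, strict_extensions and the _END sentinel are made up for this example, and results come back in traversal order rather than input order.

```python
_END = object()  # hypothetical sentinel key marking "a stored tuple ends here"

def build_trie(sequences):
    """Store each tuple as a path of nested dicts, one level per element."""
    root = {}
    for seq in sequences:
        node = root
        for item in seq:
            node = node.setdefault(item, {})
        node[_END] = seq
    return root

def strict_extensions(trie, prefix):
    """Yield stored tuples that start with prefix and are longer than it."""
    node = trie
    for item in prefix:
        node = node.get(item)
        if node is None:
            return  # no stored tuple starts with this prefix
    # Walk the subtree below the prefix node; the prefix's own _END entry is skipped.
    stack = [child for key, child in node.items() if key is not _END]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key is _END:
                yield child
            else:
                stack.append(child)

x = [('b', 'c'), ('c',), ('a', 'c', 'b'), ('b', 'c', 'a', 'd'),
     ('b', 'c', 'a'), ('a', 'b'), ('a', 'b', 'c', 'd'), ('a', 'c', 'b', 'd'),
     ('b',), ('c', 'a'), ('a', 'b', 'c'), ('a',)]
trie = build_trie(x)
print(sorted(strict_extensions(trie, ('a', 'b'))))
```

Building the trie costs one pass over the data; after that, each lookup only touches the tuples that actually share the prefix, which pays off when many prefixes are queried against the same list.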

Firstly, use list(filter(...)) to convert a filter object to a list, but your filter doesn't do what you want - it checks membership, not subsequence. You can check subsequence by using a slice.
Then you just need to add a check that the match is longer than the subsequence.
Also, a filter of a lambda is better written as a comprehension.
for sub in ('a',), ('a', 'b'):
    n = len(sub)
    out = [t for t in x if t[:n] == sub and len(t) > n]
    print(out)
Output:
[('a', 'c', 'b'), ('a', 'b'), ('a', 'b', 'c', 'd'), ('a', 'c', 'b', 'd'), ('a', 'b', 'c')]
[('a', 'b', 'c', 'd'), ('a', 'b', 'c')]

list(filter(lambda y: all([y[i] == z for i,z in enumerate(inp)]) if len(y)>=len(inp) else False, x))
for
inp = ('a', 'b')
output will be
[('a', 'b'), ('a', 'b', 'c', 'd'), ('a', 'b', 'c')]

Related

How can I rearrange a set of values into new pattern on python and print results

I'll do my best to explain what I'm looking for.
At the moment I have a 100-item list that I want to shuffle repeatedly using a fixed pattern, first to check whether the pattern eventually brings me back to where I began,
and second to print the result of each loop to a text file.
Using a 3-item list as my example,
[a,b,c]
and the shuffle pattern [3 1 2],
where the 3rd item becomes the first,
the first item becomes the second,
and the second item becomes the 3rd,
running this in a loop would generate the following patterns:
[a,b,c]
[3,1,2]
[c,a,b]
[b,c,a]
[a,b,c]
However, my list currently has 100 items, and I need to find every arrangement for a few different patterns I would like to test.
Does anyone know of a way to do this in Python, please?
You can define a function and call it multiple times, like below:
>>> def func(lst, ptr):
...     return [lst[idx-1] for idx in ptr]
>>> lst = ['a','b','c']
>>> ptr = [3,1,2]
>>> for _ in range(5):
...     lst = func(lst, ptr)
...     print(lst)
['c', 'a', 'b']
['b', 'c', 'a']
['a', 'b', 'c']
['c', 'a', 'b']
['b', 'c', 'a']
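To answer the "does it come back to the start" part directly, the loop can stop as soon as the original order reappears. The sketch below assumes the pattern is a valid 1-based permutation of 1..n (otherwise the loop may never terminate); the helper names are invented for this example, and math.lcm needs Python 3.9 or later.

```python
from math import lcm  # Python 3.9+

def apply_pattern(items, pattern):
    # pattern is 1-based: position k of the result takes items[pattern[k] - 1]
    return [items[idx - 1] for idx in pattern]

def arrangements_until_repeat(items, pattern):
    """Return every arrangement visited before the starting order reappears."""
    start = list(items)
    seen = [start]
    current = apply_pattern(start, pattern)
    while current != start:
        seen.append(current)
        current = apply_pattern(current, pattern)
    return seen

def period(pattern):
    """Rounds needed to return to the start, assuming all items are distinct.

    Computed as the least common multiple of the pattern's cycle lengths,
    so nothing has to be shuffled to find it.
    """
    visited = set()
    lengths = []
    for start in range(1, len(pattern) + 1):
        length, pos = 0, start
        while pos not in visited:
            visited.add(pos)
            pos = pattern[pos - 1]
            length += 1
        if length:
            lengths.append(length)
    return lcm(*lengths)

def write_arrangements(path, arrangements):
    # one arrangement per line, items separated by spaces
    with open(path, "w") as fh:
        for arrangement in arrangements:
            fh.write(" ".join(map(str, arrangement)) + "\n")

rounds = arrangements_until_repeat(["a", "b", "c"], [3, 1, 2])
print(rounds)
print(period([3, 1, 2]))
```

For a 100-item pattern the period can be very large, so it is probably worth calling period first and only writing every arrangement to a file when that number is manageable.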
You could use numpy advanced integer indexing if your list contains a numeric type:
import numpy as np
original_list=[1,2,3]
numpy_array = np.array(original_list)
pattern = [2,1,0]
print(numpy_array[pattern])
>>> [3 2 1]
def rearrange(pattern: list, L: list):
    new_list = []
    for i in pattern:
        new_list.append(L[i-1])
    return new_list
print(rearrange([3,1,2],['a','b','c']))
output :
['c', 'a', 'b']
Itertools could be what you need.
import itertools
p = itertools.permutations(['a','b','c', 'd'])
list(p)
Output:
[('a', 'b', 'c', 'd'),
('a', 'b', 'd', 'c'),
('a', 'c', 'b', 'd'),
('a', 'c', 'd', 'b'),
('a', 'd', 'b', 'c'),
('a', 'd', 'c', 'b'),
('b', 'a', 'c', 'd'),
('b', 'a', 'd', 'c'),
('b', 'c', 'a', 'd'),
('b', 'c', 'd', 'a'),
('b', 'd', 'a', 'c'),
('b', 'd', 'c', 'a'),
('c', 'a', 'b', 'd'),
('c', 'a', 'd', 'b'),
('c', 'b', 'a', 'd'),
('c', 'b', 'd', 'a'),
('c', 'd', 'a', 'b'),
('c', 'd', 'b', 'a'),
('d', 'a', 'b', 'c'),
('d', 'a', 'c', 'b'),
('d', 'b', 'a', 'c'),
('d', 'b', 'c', 'a'),
('d', 'c', 'a', 'b'),
('d', 'c', 'b', 'a')]

Elegant Python code for list decomposition

I tried to write code to solve the list decomposition with All possibilities.
The code I wrote was a mess. I need an elegant solution to solve the problem, because I want to improve my coding style.
I tried to write an initial version as follows, But the memory requirements are too large and the execution speed is too slow.
import itertools
powerset = lambda iterable: itertools.chain.from_iterable(
itertools.combinations(list(iterable), r)
for r in range(1, len(list(iterable)) + 1))
flatten = lambda list2d: [item for sublist in list2d for item in sublist]
x = list("abcd")
xxx = [val for val in powerset([val1 for val1 in powerset(x)] )]
xxxx = [val for val in xxx if x == list(sorted(flatten(val)))]
xxxx is :
[(('a', 'b', 'c', 'd'),),
(('a',), ('b', 'c', 'd')),
(('b',), ('a', 'c', 'd')),
(('c',), ('a', 'b', 'd')),
(('d',), ('a', 'b', 'c')),
(('a', 'b'), ('c', 'd')),
(('a', 'c'), ('b', 'd')),
(('a', 'd'), ('b', 'c')),
(('a',), ('b',), ('c', 'd')),
(('a',), ('c',), ('b', 'd')),
(('a',), ('d',), ('b', 'c')),
(('b',), ('c',), ('a', 'd')),
(('b',), ('d',), ('a', 'c')),
(('c',), ('d',), ('a', 'b')),
(('a',), ('b',), ('c',), ('d',))]
version 2:
import itertools
powerset = lambda iterable: itertools.chain.from_iterable(
itertools.combinations(list(iterable), r)
for r in range(1, len(list(iterable)) + 1))
flatten = lambda list2d: [item for sublist in list2d for item in sublist]
def makelist(list_1D):
    for val in powerset(list(powerset(list_1D))):
        if list_1D == list(sorted(flatten(val))):
            yield val
        if val == tuple(itertools.combinations(list_1D, 1)):
            break
for d in makelist(list("abcd")):
    print(d)
output:
(('a', 'b', 'c', 'd'),)
(('a',), ('b', 'c', 'd'))
(('b',), ('a', 'c', 'd'))
(('c',), ('a', 'b', 'd'))
(('d',), ('a', 'b', 'c'))
(('a', 'b'), ('c', 'd'))
(('a', 'c'), ('b', 'd'))
(('a', 'd'), ('b', 'c'))
(('a',), ('b',), ('c', 'd'))
(('a',), ('c',), ('b', 'd'))
(('a',), ('d',), ('b', 'c'))
(('b',), ('c',), ('a', 'd'))
(('b',), ('d',), ('a', 'c'))
(('c',), ('d',), ('a', 'b'))
(('a',), ('b',), ('c',), ('d',))
version 3 from Time Complexity of finding all partitions of a set
def partition(collection):
    if len(collection) == 1:
        yield [collection]
        return
    first = collection[0]
    for smaller in partition(collection[1:]):
        # put `first` into each existing subset in turn
        for n, subset in enumerate(smaller):
            yield smaller[:n] + [[first] + subset] + smaller[n + 1:]
        # or put `first` in a subset of its own
        yield [[first]] + smaller
In order to avoid memory issues, we need to maximize the use of generators/iterators and never create a list of combinations.
Here is a way to do it by breaking down the problem in layers.
First, a generator to obtain partition sizes for a given number of elements. This will then be used to fill combinations of elements corresponding to each size except for the single element parts. The single element parts are done last in order to avoid duplicates. By doing them last, we always have exactly the right number of unused elements for the single element parts.
Partition generation
# Generator for all partition sizes forming N
def partSize(N):
    if N<2: yield [1]*N;return
    for s in range(1,N+1):
        yield from ([s]+rest for rest in partSize(N-s))
print(*partSize(3))
# [1, 1, 1] [1, 2] [2, 1] [3]
print(*partSize(4))
# [1, 1, 1, 1] [1, 1, 2] [1, 2, 1] [1, 3] [2, 1, 1] [2, 2] [3, 1] [4]
Partition filling
# A generator that fills partitions
# with combinations of indexes contained in A
from itertools import combinations
def fillCombo(A,parts,R=None):
    if R is None: R = [tuple()]*len(parts)
    size = max(parts) # fill largest partitions first
    if size < 2: # when only single element partitions are left
        iA = iter(A) # fill them with the remaining indexes
        yield [r if p!=1 else (next(iA),) for r,p in zip(R,parts)]
        return
    i,parts[i]= parts.index(size),0 # index of largest partition
    for combo in combinations(A,size): # all combinations of that size
        R[i] = combo # fill part and recurse
        yield from fillCombo(A.difference(combo),[*parts],[*R])
Mapping partition to indexed values
# for each partition pattern, fill with combinations
# using set of indexes in fillCombo so that repeated values
# are processed as distinct
def partCombo(A):
    for parts in partSize(len(A)):
        for iParts in fillCombo({*range(len(A))},parts): # combine indexes
            yield [tuple(A[i] for i in t) for t in iParts] # get actual values
output:
for pp in partCombo("abc"): print(pp)
[('a',), ('b',), ('c',)]
[('c',), ('a', 'b')]
[('b',), ('a', 'c')]
[('a',), ('b', 'c')]
[('a', 'b'), ('c',)]
[('a', 'c'), ('b',)]
[('b', 'c'), ('a',)]
[('a', 'b', 'c')]
This uses very little memory but still has exponential progression in time. For example:
sum(1 for _ in partCombo("abcdefghi")) # 768,500 combinations
takes 3.8 seconds on my laptop
Adding just one more letter, increases the execution time to 43 seconds for the 8,070,046 combinations.

How to generate random binary tree

I am given a list of labels L and I wish to recursively generate a random binary tree from L.
The desired behavior is like this:
generate(['A', 'B', 'C', 'D', 'E', 'F'])
could give:
((('A', ('B', 'C')), ('D', 'E')), 'F')
Note that the list of leaf labels from left to right should equal L.
I am unsure how to construct the tree randomly. This is what I have so far (I split the list of labels at a random index):
from random import randint
def generate_tree(L):
    split = randint(1, len(L)-1)
    left = L[:split]
    right = L[split:]
    # call generate(left) and generate(right) based on some conditions
I am stuck. I would be grateful for a couple of hints or help.
You weren't too far off. All you needed was a base case and building the resulting tuple from the results of the recursive calls:
from random import randint
def generate_tree(L):
    # base case
    if len(L) == 1:
        return L[0]
    split = randint(1, len(L)-1)
    left = L[:split]
    right = L[split:]
    # recursion
    return (generate_tree(left), generate_tree(right))
>>> generate_tree(['A', 'B', 'C', 'D', 'E', 'F'])
(('A', 'B'), (('C', 'D'), ('E', 'F')))
>>> generate_tree(['A', 'B', 'C', 'D', 'E', 'F'])
((('A', 'B'), 'C'), (('D', 'E'), 'F'))
>>> generate_tree(['A', 'B', 'C', 'D', 'E', 'F'])
('A', (('B', 'C'), (('D', 'E'), 'F')))
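One way to check the requirement that the leaves read left to right as L is to flatten the result and compare. The sketch below restates generate_tree with an optional rng parameter (an addition for reproducibility, not part of the answer above) and adds a hypothetical leaves helper; it assumes the labels themselves are never tuples.

```python
import random

def generate_tree(labels, rng=random):
    # base case: a single label is a leaf
    if len(labels) == 1:
        return labels[0]
    # split at a random index so that both halves are non-empty
    split = rng.randint(1, len(labels) - 1)
    return (generate_tree(labels[:split], rng), generate_tree(labels[split:], rng))

def leaves(tree):
    """Flatten a nested-tuple tree into its leaf labels, left to right."""
    if isinstance(tree, tuple):
        return [leaf for child in tree for leaf in leaves(child)]
    return [tree]

labels = ['A', 'B', 'C', 'D', 'E', 'F']
tree = generate_tree(labels, random.Random(0))  # seeded for a repeatable result
print(tree)
print(leaves(tree) == labels)
```

Note that choosing the split index uniformly does not make every tree shape equally likely; with four labels, for instance, the balanced shape comes up more often than each of the other four.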
And if you are code golfing and looking for a fancy (>=3.8 only) one-liner:
def gt(L):
    return (gt(L[:(s:=randint(1, len(L)-1))]), gt(L[s:])) if len(L) > 1 else L[0]
You can randomly slice the list and then recursively apply the tree construction:
import random
def r_tree(d):
    _l, _r = tuple(d[:(_n:=random.randint(0, len(d)-1))]), tuple(d[_n+1:])
    l, r = _l if len(_l) < 3 else r_tree(_l), _r if len(_r) < 3 else r_tree(_r)
    return (d[_n], n[0] if len(n:=tuple(filter(None, [l, r]))) == 1 else n)
print(r_tree(['A', 'B', 'C', 'D', 'E', 'F']))
Output:
('D', (('A', ('B', 'C')), ('E', 'F')))

Set of HUGE permutation object (in Python or R)

Aim: I'd like to get (or be able to work with) a set of all possible permutations obtained from a list of strings.
Example in Python:
import pandas as pd
import itertools
list1 = ['A', 'A', 'B', 'B']
# Get all permutations
list1_perm = list(itertools.permutations(list1))
len(list1_perm)
24
list1_perm
[('A', 'A', 'B', 'B'),
('A', 'A', 'B', 'B'),
('A', 'B', 'A', 'B'),
('A', 'B', 'B', 'A'),
('A', 'B', 'A', 'B'),
('A', 'B', 'B', 'A'),
('A', 'A', 'B', 'B'),
('A', 'A', 'B', 'B'),
('A', 'B', 'A', 'B'),
('A', 'B', 'B', 'A'),
('A', 'B', 'A', 'B'),
('A', 'B', 'B', 'A'),
('B', 'A', 'A', 'B'),
('B', 'A', 'B', 'A'),
('B', 'A', 'A', 'B'),
('B', 'A', 'B', 'A'),
('B', 'B', 'A', 'A'),
('B', 'B', 'A', 'A'),
('B', 'A', 'A', 'B'),
('B', 'A', 'B', 'A'),
('B', 'A', 'A', 'B'),
('B', 'A', 'B', 'A'),
('B', 'B', 'A', 'A'),
('B', 'B', 'A', 'A')]
Since for my analysis ('A', 'A', 'B', 'B') is the same as ('A', 'A', 'B', 'B'), (although the 'A' may have changed the position), I do:
# Get set of permutations
set1_perm = set(itertools.permutations(list1))
len(set1_perm)
6
set1_perm
{('A', 'A', 'B', 'B'),
('A', 'B', 'A', 'B'),
('A', 'B', 'B', 'A'),
('B', 'A', 'A', 'B'),
('B', 'A', 'B', 'A'),
('B', 'B', 'A', 'A')}
Now this is great, but the list I want to work with has 481 strings, with 5 unique strings with different frequencies:
len(real_list)
481
len(set(real_list))
5
# Count number of times each unique value appears
pd.Series(real_list).value_counts()
A 141
B 116
C 80
D 78
E 66
This is not a problem for itertools.permutations(real_list), but when I want to get the set, it takes ages. This is because the number of permutations is 9.044272819E+1082.
What I want to do is:
First I want to know the number of unique elements in that permutation space, i.e. the length of the set. It might be possible to get this number analytically; however, since the frequency of each unique element is different, I don't know how to do that.
Second I would like to be able to get a sample of those unique elements in the set of permutations.
I'd appreciate any help provided.
Best,
Alejandro
Calculating the number of unique permutations is simply a matter of applying a formula - we know that were we to have n distinct elements, we would have n! permutations. Then to account for repeated permutations we must divide by each count of permutations of repeated letters. This is a multinomial coefficient.
So a simple implementation to generate the unique count may look something like
from math import factorial
from functools import reduce
from collections import Counter
def perm_cnt(l):
    # start the product at 1 so the first count also goes through factorial()
    denom = reduce(lambda x,y: x*factorial(y), Counter(l).values(), 1)
    return factorial(len(l)) // denom
Then sampling from the unique permutations is probably most simply achieved by just ensuring your sample values remain unique, as opposed to trying to generate all of the unique values and then sampling. There is a recipe in the itertools module, random_permutation, which could be useful for this.
import random
def random_permutation(iterable, r=None):
    "Random selection from itertools.permutations(iterable, r)"
    pool = tuple(iterable)
    r = len(pool) if r is None else r
    return tuple(random.sample(pool, r))
So creating a unique sample might look something like
def uniq_sample(l, size):
    s = set()
    perm_size = perm_cnt(l)
    cnt = 0
    while cnt < min(perm_size, size):
        samp = random_permutation(l)
        if samp not in s:
            s.add(samp)
            cnt += 1
    return s
Demo
>>> perm_cnt(list1)
6
>>> perm_cnt(['a']*3 + ['b']*5 + ['d']*2)
2520
>>> perm_cnt(np.random.randint(10, size=20))
105594705216000
>>> uniq_sample(list1, 4)
{('A', 'A', 'B', 'B'),
('B', 'A', 'A', 'B'),
('B', 'A', 'B', 'A'),
('B', 'B', 'A', 'A')}
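For the frequencies given in the question, the same multinomial formula can be applied to the counts directly, without building the 481-element list first. This is a sketch with made-up helper names; it relies on math.prod (Python 3.8+) and on the fact that a uniform shuffle of the full list is also uniform over the distinct arrangements, because every distinct arrangement corresponds to the same number of underlying orderings.

```python
import random
from math import factorial, prod

def unique_permutation_count(counts):
    """Multinomial coefficient n! / (c1! * c2! * ...) for the given frequencies."""
    counts = list(counts)
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)

def sample_arrangement(counts, rng=random):
    """Draw one arrangement uniformly at random from the distinct permutations."""
    pool = [label for label, c in counts.items() for _ in range(c)]
    rng.shuffle(pool)
    return tuple(pool)

counts = {'A': 141, 'B': 116, 'C': 80, 'D': 78, 'E': 66}
total = unique_permutation_count(counts.values())
print(len(str(total)))             # number of decimal digits in the count
print(sample_arrangement(counts)[:10])
```

With a space this large, two independent samples are extremely unlikely to collide, so the uniqueness check in uniq_sample above costs little and will rarely trigger.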

All strings with list of characters?

I'm making a specialized utility similar to John the Ripper, and I'd like to use a loop that returns all strings up to x characters that can be formed from the string. For example, if the "seed" string is abcd, it should return:
a
b
c
d
aa
ab
ac
and so on. If the character limit is 10, it would generate aaaaaaaaaa, abcddcbaaa, and so on. Is there a simple for loop to do this, or is it more complicated than that?
I'll self-plagiarize from this answer and add a maximum length:
from itertools import product
def multiletters(seq, max_length):
    for n in range(1, max_length+1):
        for s in product(seq, repeat=n):
            yield ''.join(s)
which gives
>>> list(multiletters("abc", 2))
['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
>>> list(multiletters("abcd", 4))[:8]
['a', 'b', 'c', 'd', 'aa', 'ab', 'ac', 'ad']
and so on.
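Because multiletters is a generator, it can be consumed lazily, which matters at the sizes mentioned in the question. The sketch below repeats the generator so it runs on its own and adds a hypothetical total_strings helper that counts the output as k + k^2 + ... + k^max_length, assuming the k seed characters are distinct.

```python
from itertools import islice, product

def multiletters(seq, max_length):
    for n in range(1, max_length + 1):
        for s in product(seq, repeat=n):
            yield ''.join(s)

def total_strings(alphabet_size, max_length):
    """How many strings multiletters yields for an alphabet of distinct characters."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))

print(total_strings(4, 10))                        # 1398100
print(list(islice(multiletters("abcd", 10), 6)))   # ['a', 'b', 'c', 'd', 'aa', 'ab']
```

Calling list() on the full generator would materialize all 1,398,100 strings at once; iterating or slicing with islice keeps only one string in memory at a time.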
def all_strings(alphabet, length_limit=None):
    n_letters = len(alphabet)
    length = 0
    n_strings = 1
    buf = []
    while True:
        for i in range(n_strings):
            k = i
            # write i in base n_letters, filling buf from the last position
            for j in range(length - 1, -1, -1):
                buf[j] = alphabet[k % n_letters]
                k //= n_letters
            yield ''.join(buf)
        length += 1
        if length == length_limit:
            break
        n_strings *= n_letters
        buf.append(alphabet[0])
for s in all_strings('abcd', length_limit=4):
    print(s)
As pointed out in the comments, use itertools.permutations, or even better take a look at @DSM's answer, as this one misses the doubles:
In [194]: from itertools import chain, permutations
In [195]: s = 'abcd'
In [196]: map(''.join,chain.from_iterable(permutations(s,x)
for x in range(1,len(s)+1)))
Out[196]:
['a',
'b',
'c',
'd',
'ab',
'ac',
'ad',
'ba',
'bc',
'bd',
...
'dbca',
'dcab',
'dcba']
Anyway, here's a version of @DSM's answer that returns a list:
from itertools import product
def ms(seq, max_length):
    return [''.join(s) for n in range(1, max_length+1)
            for s in product(seq,repeat=n)]
Use itertools.permutations.
import itertools
for i in range(2,4):
    tuples = itertools.permutations('abca', i)
    print(list(tuples))
The example code sequence generates:
[('a', 'b'), ('a', 'c'), ('a', 'a'), ('b', 'a'), ('b', 'c'), ('b', 'a'), ('c', 'a'), ('c', 'b'), ('c', 'a'), ('a', 'a'), ('a', 'b'), ('a', 'c')]
[('a', 'b', 'c'), ('a', 'b', 'a'), ('a', 'c', 'b'), ('a', 'c', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('b', 'a', 'c'), ('b', 'a', 'a'), ('b', 'c', 'a'), ('b', 'c', 'a'), ('b', 'a', 'a'), ('b', 'a', 'c'), ('c', 'a', 'b'), ('c', 'a', 'a'), ('c', 'b', 'a'), ('c', 'b', 'a'), ('c', 'a', 'a'), ('c', 'a', 'b'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'b', 'a'), ('a', 'b', 'c'), ('a', 'c', 'a'), ('a', 'c', 'b')]
