Split tuple / collection into instances of pattern

I have a tuple [but it can be any collection] that contains elements:
tple = (1.02, 'a', 'b', 1.02, 'a', 'b')
I'm trying to find a way to count the number of times the pattern (1.02, 'a', 'b') occurs within the tuple. The pattern may not exist at all, in which case I would want to return 0.
Is there such a way?

One approach using itertools:
from itertools import tee
def wise(iterable):
    """Iterate over contiguous overlapping chunks"""
    a, b, c = tee(iterable, 3)
    next(b, None)
    next(c, None)
    next(c, None)
    return zip(a, b, c)
tple = (1.02, 'a', 'b', 1.02, 'a', 'b')
pattern = (1.02, 'a', 'b')
res = sum(i == pattern for i in wise(tple))
print(res)
Output
2
The function wise is a generalization of itertools.pairwise to windows of three elements. For the above example it returns something similar to:
[(1.02, 'a', 'b'), ('a', 'b', 1.02), ('b', 1.02, 'a'), (1.02, 'a', 'b')]
Note that because wise relies only on itertools, tple can be any iterable, not just a tuple. The expression:
res = sum(i == pattern for i in wise(tple))
is equivalent to the following for-loop:
res = 0
for i in wise(tple):
    if pattern == i:
        res += 1
print(res)
If you want to iterate in chunks of different lengths, use the following general wise function:
def wise(iterable, n=3):
    """Iterate over contiguous overlapping chunks"""
    its = tee(iterable, n)
    for i, it in enumerate(its):
        for _ in range(i):
            next(it, None)  # default avoids StopIteration on short inputs
    return zip(*its)
UPDATE
The general function can be made linear in n (one next call per iterator instead of i calls for the i-th one), as suggested by @KellyBundy:
def wise(iterable, n=3):
    """Iterate over contiguous overlapping chunks"""
    its = []
    for _ in range(n):
        iterable, it = tee(iterable)
        its.append(it)
        next(iterable, None)
    return zip(*its)
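Putting the pieces together, a counting helper could be sketched as follows. The name count_pattern is hypothetical (it is not part of the answer above), and the sketch assumes that the window length should equal len(pattern) and that an empty pattern should count as 0. Overlapping occurrences are each counted.

```python
from itertools import tee

def wise(iterable, n=3):
    """Iterate over contiguous overlapping chunks of length n."""
    its = []
    for _ in range(n):
        iterable, it = tee(iterable)
        its.append(it)
        next(iterable, None)
    return zip(*its)

def count_pattern(iterable, pattern):
    """Hypothetical helper: count windows of the iterable equal to pattern."""
    pattern = tuple(pattern)  # zip yields tuples, so compare against a tuple
    if not pattern:           # assumption: an empty pattern counts as 0
        return 0
    return sum(window == pattern for window in wise(iterable, len(pattern)))

tple = (1.02, 'a', 'b', 1.02, 'a', 'b')
print(count_pattern(tple, (1.02, 'a', 'b')))  # 2
print(count_pattern(iter(tple), (1.02,)))     # 2, works on a one-shot iterator too
print(count_pattern(tple, ('x', 'y')))        # 0
```

Because the window is built with tee and zip, the input never needs to support len() or slicing.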

You can also do something like this if you're trying out for the One-Line Olympics:
>>> tple = (1.02, 'a', 'b', 1.02, 'a', 'b')
>>> pattern = (1.02, 'a', 'b')
>>> sum(tple[i:i+len(pattern)] == pattern for i in range(len(tple)-len(pattern)+1))
2
It should also work for patterns of any length:
>>> pattern = (1.02,)
>>> sum(tple[i:i+len(pattern)] == pattern for i in range(len(tple)-len(pattern)+1))
2
It will return 0 if the pattern is longer than the tuple, or if the pattern is not found:
>>> pattern = (1.02, 'a', 'b', 1.02, 'a', 'b', 'c')
>>> sum(tple[i:i+len(pattern)] == pattern for i in range(len(tple)-len(pattern)+1))
0
>>> pattern = (1.03,)
>>> sum(tple[i:i+len(pattern)] == pattern for i in range(len(tple)-len(pattern)+1))
0
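One caveat with the slicing one-liner: a slice has the same type as the sequence it came from, and a list never compares equal to a tuple, so a list input with a tuple pattern would silently give 0. A minimal sketch that normalizes both sides to tuples is below; the function name count_pattern_sliced is made up for illustration, and treating an empty pattern as 0 is an assumption.

```python
def count_pattern_sliced(seq, pattern):
    """Hypothetical helper: slicing-based count that tolerates mixed types."""
    pattern = tuple(pattern)
    n = len(pattern)
    if n == 0:  # assumption: an empty pattern counts as 0
        return 0
    # range() is empty when the pattern is longer than the sequence
    return sum(tuple(seq[i:i + n]) == pattern for i in range(len(seq) - n + 1))

as_list = [1.02, 'a', 'b', 1.02, 'a', 'b']
print(as_list[0:3] == (1.02, 'a', 'b'))                  # False: list vs tuple
print(count_pattern_sliced(as_list, (1.02, 'a', 'b')))   # 2
```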

Yes, there's a simple and straightforward way to do so:
# collection = ...
count = 0
for ind in range(2, len(collection)):
    if collection[ind-2] == 1.02 and collection[ind-1] == 'a' and collection[ind] == 'b':
        count += 1
print(count)
Note that this code works with any type that is subscriptable and can be passed to len(). If the collection has fewer than 3 elements, the loop body never runs and the count stays 0.

Related

How to reduce a List of element following a Logic in python?

I have two lists:
L = ['A','B','C']
L2 = ['A', 'B', ('B', 'A'), 'C']
I would like to create a single list with the following elements:
L3 = ['A', ('B', 'A'), 'C']
Whenever an element of L matches more than one element of L2, I want to pick the longest one. Importantly, a tuple only counts as a match if the element (e.g. 'B') is in its first position.
I tried the following code:
temp_length = 0
for d in L:
    for d2 in L2:
        if d in d2 and temp_length<len(d2):
            temp_op = d
    L3.append(temp_op)
But it does not add ('B','A') in place of 'B'.

Well this is one way to do it, but this question may have cleaner solutions as well:
L = ['A','B','C']
L2 = ['A', 'B', ('B', 'A'), 'C']
answer = []
for l2 in L2:
    if len(l2) == 1:
        if l2 in L:
            answer.append(l2)
    elif l2[0] in L:
        answer.append(l2)
        if l2[0] in answer:
            answer.remove(l2[0])
print(answer)
['A', ('B', 'A'), 'C']

First, find the longest elements in L2 (matching on the first sub-element) and keep them in a dictionary, using the matching sub-element as the key.
This is much faster than searching the lists repeatedly and needlessly re-checking the shorter elements.
from typing import Dict, Iterable
l = ['A','B','C']
l2 = ['A', 'B', ('B', 'A'), 'C']
def keep_longest_elements(seq: Iterable) -> Dict:
    res = {}
    for el in seq:
        # Look up by the first sub-element, which is the dictionary key.
        if (exist_el := res.get(el[0])) is not None:
            if exist_el[0] == el[0] and len(el) > len(exist_el):
                res[exist_el[0]] = el
        else:
            res[el[0]] = el
    return res
longest = keep_longest_elements(l2)
print(longest)
l3 = [longest[el] for el in l]
print(l3)
produces
{'A': 'A', 'B': ('B', 'A'), 'C': 'C'}
['A', ('B', 'A'), 'C']
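For comparison, the same rule can be sketched with max and key=len. This is only an illustration under stated assumptions: the function name longest_matches is hypothetical, the keys are assumed to be strings, and a candidate is assumed to match a key when it equals the key or is a non-empty tuple whose first item is the key. It rescans the candidates for every key, so it is simpler but slower than the dictionary approach.

```python
def longest_matches(keys, candidates):
    """Hypothetical helper: for each key, keep its longest matching candidate."""
    result = []
    for key in keys:
        matching = [
            c for c in candidates
            if c == key or (isinstance(c, tuple) and c and c[0] == key)
        ]
        if matching:  # keys without any match are skipped
            result.append(max(matching, key=len))
    return result

L = ['A', 'B', 'C']
L2 = ['A', 'B', ('B', 'A'), 'C']
print(longest_matches(L, L2))  # ['A', ('B', 'A'), 'C']
```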

Basic Sorting / Order Algorithm

Trying to implement and form a very simple algorithm. This algorithm takes in a sequence of letters or numbers. It first creates an array (list) out of each character or digit. Then it checks each individual character compared with the following character in the sequence. If the two are equal, it removes the character from the array.
For example the input: 12223344112233 or AAAABBBCCCDDAAABB
And the output should be: 1234123 or ABCDAB
I believe the issue stems from the fact I created a counter and increment each loop. I use this counter for my comparison using the counter as an index marker in the array. Although, each time I remove an item from the array it changes the index while the counter increases.
Here is the code I have:
def sort(i):
    iter = list(i)
    counter = 0
    for item in iter:
        if item == iter[counter + 1]:
            del iter[counter]
        counter = counter + 1
    return iter

You're iterating over the same list that you are deleting from. That usually causes behaviour that you would not expect. Make a copy of the list & iterate over that.
However, there is a simpler solution: Use itertools.groupby
import itertools
def sort(i):
    return [x for x, _ in itertools.groupby(list(i))]
print(sort('12223344112233'))
Output:
['1', '2', '3', '4', '1', '2', '3']
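If you would rather keep an explicit loop, one way to sidestep the delete-while-iterating problem is to build a new list instead of removing items from the one being iterated. A minimal sketch (the name dedupe_adjacent is made up for illustration):

```python
def dedupe_adjacent(seq):
    """Hypothetical helper: drop items equal to their immediate predecessor."""
    result = []
    for item in seq:
        # Append only when the item differs from the last one kept.
        if not result or result[-1] != item:
            result.append(item)
    return result

print(''.join(dedupe_adjacent('12223344112233')))     # 1234123
print(''.join(dedupe_adjacent('AAAABBBCCCDDAAABB')))  # ABCDAB
```

Since nothing is deleted during iteration, no index bookkeeping is needed.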

A few alternatives, all using s = 'AAAABBBCCCDDAAABB' as setup:
>>> import re
>>> re.sub(r'(.)\1+', r'\1', s)
'ABCDAB'
>>> p = None
>>> [c for c in s if p != (p := c)]
['A', 'B', 'C', 'D', 'A', 'B']
>>> [c for c, p in zip(s, [None] + list(s)) if c != p]
['A', 'B', 'C', 'D', 'A', 'B']
>>> [c for i, c in enumerate(s) if not s.endswith(c, None, i)]
['A', 'B', 'C', 'D', 'A', 'B']

The other answers are good. This one iterates over the list in reverse to avoid skipping items, and uses the look-ahead approach the OP described. Quick note to the OP: this really isn't a sorting algorithm.
def sort(input_str: str) -> str:
    as_list = list(input_str)
    # Walk backwards so deletions never shift the indices still to be visited.
    for idx in range(len(as_list) - 1, 0, -1):
        if as_list[idx] == as_list[idx - 1]:
            del as_list[idx]
    return ''.join(as_list)

Count elements in a nested list in an elegant way

I have nested tuples in a list like
l = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'e', 'a')]
I want to know how many 'a' and 'b' values there are in the list in total. I currently use the following code to get the result:
amount_a_and_b = len([None for _, elem2, elem3 in l if elem2 == 'a' or elem3 == 'b'])
But I got amount_a_and_b = 1, so how do I get the right answer?
Also, is there a more elegant way (less code or higher performance or using builtins) to do this?

I'd flatten the list with itertools.chain.from_iterable() and pass it to a collections.Counter() object:
from collections import Counter
from itertools import chain
counts = Counter(chain.from_iterable(l))
amount_a_and_b = counts['a'] + counts['b']
Or use sum() to count how many times a value appears in the flattened sequence:
from itertools import chain
amount_a_and_b = sum(1 for v in chain.from_iterable(l) if v in {'a', 'b'})
The two approaches are pretty much comparable in speed on Python 3.5.1 on my Macbook Pro (OS X 10.11):
>>> from timeit import timeit
>>> from collections import Counter
>>> from itertools import chain
>>> l = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'e', 'a')] * 1000 # make it interesting
>>> def counter():
...     counts = Counter(chain.from_iterable(l))
...     counts['a'] + counts['b']
...
>>> def summing():
...     sum(1 for v in chain.from_iterable(l) if v in {'a', 'b'})
...
>>> timeit(counter, number=1000)
0.5640139860006457
>>> timeit(summing, number=1000)
0.6066895100011607

You want to avoid putting data in a data structure that you never use. The [...] syntax constructs a new list and fills it with the content you put in ..., after which the length of the list is taken and the list is discarded. If the list is very large, this uses a lot of memory, and it is inelegant in general. Instead, you can use generator expressions to loop over the existing data structure, e.g., like so:
sum(sum(c in ('a', 'b') for c in t) for t in l)
The c in ('a', 'b') predicate is a bool which evaluates to a 0 or 1 when cast to an int, causing the sum() to only count the tuple entry if the predicate evaluates to True.
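To make the bool-to-int behaviour concrete, here is a small sketch that spells out the intermediate values; the variable name flags is introduced only for illustration.

```python
l = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'e', 'a')]

# bool is a subclass of int, so True adds 1 and False adds 0.
print(int(True), int(False))     # 1 0
print(sum([True, False, True]))  # 2

# The membership test for every entry, tuple by tuple.
flags = [[c in ('a', 'b') for c in t] for t in l]
print(flags)  # [[False, True, True], [False, True, False], [False, False, True]]

total = sum(sum(row) for row in flags)
print(total)  # 4
```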

Just for fun, functional method using reduce:
>>> l = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'e', 'a')]
>>> from functools import reduce
>>> reduce(lambda x, y: (1 if 'a' in y else 0) + (1 if 'b' in y else 0) + x, l, 0)
4

You can iterate over both the list and the sub-lists in one list comprehension:
len([i for sub_list in l for i in sub_list if i in ("a", "b")])
I think that's fairly concise.
To avoid creating a temporary list, you could use a generator expression to create a sequence of 1s and pass that to sum:
sum(1 for sub_list in l for i in sub_list if i in ("a", "b"))

Although this question already has an accepted answer, I'm just wondering why all of them are so complex. I would think that this would suffice:
>>> l = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'e', 'a')]
>>> total = sum(tup.count('a') + tup.count('b') for tup in l)
>>> total
4
Or
>>> total = sum(1 for tup in l for v in tup if v in {'a', 'b'})

Findings letters that match in every sublist

candidates = ['A', 'B', 'C', 'D']
If a candidate appears in every sublist at least once they must be returned
listOfData = [['B','C','B','A'], #D is no longer a candidate
              ['B', 'C', 'B', 'D'], #A is no longer a candidate
              ['A','D','C','B'], # B and C are still candidates
              ['D', 'C', 'B', 'A']] # B and C are solid matches!
In this case the matches are [B,C]
I'm having trouble keeping track of the candidate that appears in every sublist at least once.
matches = []
def lettersThatMatchInEverySublist():
    i=0
    for candidate in candidates:
        for sublist in listOfData:
            for char in sublist:
                pass
            if char == candidate:
                matches.append(candidate)
    return matches

Easiest way - with sets
>>> valid_vals = tuple(set(row) for row in listOfData)
>>> candidates = set(['A', 'B', 'C', 'D'])
>>> for validator in valid_vals:
...     candidates &= validator
...
>>> candidates
set(['C', 'B'])
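Sets are unordered, so the result above can come back in any order. If the original candidate order matters, one possible sketch is to intersect all rows at once and then filter the candidate list. The names common and matches are illustrative, and the sketch assumes listOfData is non-empty (set.intersection needs at least one set here).

```python
candidates = ['A', 'B', 'C', 'D']
listOfData = [['B', 'C', 'B', 'A'],
              ['B', 'C', 'B', 'D'],
              ['A', 'D', 'C', 'B'],
              ['D', 'C', 'B', 'A']]

# Letters present in every sublist (assumes at least one sublist).
common = set.intersection(*map(set, listOfData))

# Filter the original list so the candidate order is preserved.
matches = [c for c in candidates if c in common]
print(matches)  # ['B', 'C']
```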

Here are a few guiding measures that can get you started, but beyond this, you will need to restate your problem more clearly.
Try using itertools for your listOfOptions:
import itertools
options = itertools.product('ACTG', repeat=3) # All length-3 sequences over A, C, T and G (Cartesian product, repeats allowed).
listOfOptions = [''.join(option) for option in options] # Join each tuple of letters into a string.
Clean up the findKmersSet function:
def findKmersSet(k, dataset):
    dataset = dataset.splitlines()
    kmers = []
    for line in dataset:
        line_list = []
        for i in range(len(line)-k+1):
            line_list.append(line[i:i+k])
        kmers.append(line_list)
    return kmers
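As a quick check of what these two pieces produce, here is a small usage sketch. The function is restated as a comprehension under the hypothetical name find_kmers, and the two-line input string is made up for illustration.

```python
import itertools

def find_kmers(k, dataset):
    """Hypothetical restatement: all length-k substrings of each line."""
    return [
        [line[i:i + k] for i in range(len(line) - k + 1)]
        for line in dataset.splitlines()
    ]

# Every length-3 string over the alphabet ACTG: 4 ** 3 = 64 options.
list_of_options = [''.join(option) for option in itertools.product('ACTG', repeat=3)]
print(len(list_of_options))  # 64

print(find_kmers(3, "ACTGA\nTTG"))  # [['ACT', 'CTG', 'TGA'], ['TTG']]
```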

1d list indexing python: enhance MaskableList

A common problem of mine is the following:
As input I have (n is some int >1)
W = numpy.array(...)
L = list(...)
where
len(W) == n
>> true
shape(L)[0] == n
>> true
And I want to sort the list L regarding the values of W and a comparator. My idea was to do the following:
def my_zip_sort(W,L):
    srt = argsort(W)
    return zip(L[srt],W[srt])
This should work like this:
a = ['a', 'b', 'c', 'd']
b = zeros(4)
b[0]=3;b[1]=2;b[2]=[1];b[3]=4
my_zip_sort(a,b)
>> [(c,1)(b,2)(a,3)(d,4)]
But this does not, because
TypeError: only integer arrays with one element can be converted to an index
thus, I need to do another loop:
def my_zip_sort(W,L):
    srt = argsort(W)
    res = list()
    for i in L:
        res.append((L[srt[i]],W[srt[i]]))
    return res
I found a thread about a MaskableList, but this does not work for me (as you can read in the comments), because I would not only need to hold or discard particular values of my list, but also need to re-order them:
a.__class__
>> msk.MaskableList
srt = argsort(b)
a[srt]
>> ['a', 'b', 'd']
Concluding:
I want to find a way to sort a list of objects by the values in an array. I found a way myself, which is kind of nice except for the list indexing. Can you help me write a class that works like MaskableList for this task and performs well?

You don't need to extend list to avoid the for-loop. A list comprehension is sufficient and probably the best you can do here, if you expect a new list of tuples:
def my_zip_sort(W, L):
    srt = np.argsort(W)
    return [(L[i], W[i]) for i in srt]
Example:
import numpy as np
n = 5
W = np.random.randint(10, size=n)
L = [chr(ord('A') + i) for i in W]
L # => ['A', 'C', 'H', 'G', 'C']
srt = np.argsort(W)
result = [(L[i], W[i]) for i in srt]
print(result)
[('A', 0), ('C', 2), ('C', 2), ('G', 6), ('H', 7)]
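If NumPy is not otherwise needed, the same pairing can be sketched with only the built-in sorted and zip. This swaps np.argsort for a plain Python sort, and the function name zip_sort is hypothetical. It assumes the weights and the items have the same length.

```python
def zip_sort(weights, items):
    """Hypothetical helper: pair each item with its weight, sorted by weight."""
    pairs = zip(items, weights)
    # sorted() is stable, so items with equal weights keep their input order.
    return sorted(pairs, key=lambda pair: pair[1])

print(zip_sort([3, 2, 1, 4], ['a', 'b', 'c', 'd']))
# [('c', 1), ('b', 2), ('a', 3), ('d', 4)]
```

Sorting on pair[1] only means the items themselves never have to be comparable.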
