Finding letters that match in every sublist - python

candidates = ['A', 'B', 'C', 'D']
If a candidate appears in every sublist at least once, it must be returned.
listOfData = [['B','C','B','A'], # D is no longer a candidate
              ['B', 'C', 'B', 'D'], # A is no longer a candidate
              ['A','D','C','B'], # B and C are still candidates
              ['D', 'C', 'B', 'A']] # B and C are solid matches!
In this case the matches are ['B', 'C'].
I'm having trouble keeping track of the candidate that appears in every sublist at least once.
matches = []
def lettersThatMatchInEverySublist():
    i = 0
    for candidate in candidates:
        for sublist in listOfData:
            for char in sublist:
                pass
            if char == candidate:
                matches.append(candidate)
    return matches

Easiest way: with sets
>>> valid_vals = tuple(set(row) for row in listOfData)
>>> candidates = set(['A', 'B', 'C', 'D'])
>>> for validator in valid_vals:
...     candidates &= validator
...
>>> candidates
set(['C', 'B'])
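The same idea can be written as a small Python 3 function. This is only a sketch: the function name letters_in_every_sublist and the snake_case variable names are made up for illustration, and it assumes the result should follow the order of the candidates list, which the question does not state.

```python
# Data copied from the question.
candidates = ['A', 'B', 'C', 'D']
list_of_data = [['B', 'C', 'B', 'A'],
                ['B', 'C', 'B', 'D'],
                ['A', 'D', 'C', 'B'],
                ['D', 'C', 'B', 'A']]


def letters_in_every_sublist(candidates, sublists):
    """Return the candidates that occur at least once in every sublist."""
    # set.intersection accepts any number of iterables.
    common = set(candidates).intersection(*sublists)
    # Sets are unordered, so rebuild the result in candidate order (an assumption).
    return [c for c in candidates if c in common]


matches = letters_in_every_sublist(candidates, list_of_data)
print(matches)  # ['B', 'C']
```

Returning a new list instead of appending to a module-level matches list also means the function can be called more than once without accumulating stale results.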

Here are a few guiding measures that can get you started, but beyond this, you will need to restate your problem more clearly.
Try using itertools for your listOfOptions:
import itertools
options = itertools.product('ACTG', repeat=3) # Yields every length-3 sequence of A, C, T, and G (the Cartesian product, 64 tuples).
listOfOptions = [''.join(option) for option in options] # A list comprehension joins each tuple into a 3-letter string.
Clean up the findKmersSet function:
def findKmersSet(k, dataset):
    dataset = dataset.splitlines()
    kmers = []
    for line in dataset:
        line_list = []
        for i in range(len(line)-k+1):
            line_list.append(line[i:i+k])
        kmers.append(line_list)
    return kmers
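To see what this function returns, here is a compact equivalent with a made-up input. Both the name find_kmers and the sample string are assumptions for illustration; the original question's data is not shown here.

```python
def find_kmers(k, dataset):
    """Return, for each line of dataset, the list of its length-k substrings."""
    return [[line[i:i + k] for i in range(len(line) - k + 1)]
            for line in dataset.splitlines()]


sample = "ACGT\nTTGA"  # invented two-line input, not from the question
kmers = find_kmers(3, sample)
print(kmers)  # [['ACG', 'CGT'], ['TTG', 'TGA']]
```

Each inner list holds the overlapping k-mers of one line, so a line shorter than k yields an empty list.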

Related

How to remove elements from a list that appear less than k = 2?

I am trying to keep elements of a list that appear at least twice, and remove the elements that appear less than twice.
For example, my list can look like:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
I want to get a list with the letters that appear at least twice, to get the following list:
letters_appear_twice = ['a', 'b'].
But since this is part of a bigger code, I don't know exactly what my lists look like, only that I want to keep the letters that are repeated at least twice. But for the sake of understanding, we can assume I know what the list looks like!
I have tried the following:
'''
letters = ['a', 'a', 'b', 'b', 'b', 'c']
for x in set(letters):
    if letters.count(x) > 2:
        while x in letters:
            letters.remove(x)
print(letters)
'''
But this doesn't quite work like I want it to...
Thank you in advance for any help!
letters = ['a', 'a', 'b', 'b', 'b', 'c']
res = []
for x in set(letters):
    if letters.count(x) >= 2:
        res.append(x)
print(res)
Prints (the order may vary, because sets are unordered):
['b', 'a']
Using your code above, you can make a new list and append to it.
new_list = []
for x in set(letters):
    if letters.count(x) >= 2:
        new_list.append(x)
print(new_list)
Output
['b', 'a']
It is easier to create a new list instead of manipulating the source list:
def letters_more_or_equal_to_k(letters, k):
    result = []
    for x in set(letters):
        if letters.count(x) >= k:
            result.append(x)
    result.sort()
    return result
def main():
    letters = ['a', 'a', 'b', 'b', 'b', 'c']
    k = 2
    result = letters_more_or_equal_to_k(letters, k)
    print(result) # prints ['a', 'b']
if __name__ == "__main__":
    main()
If you don't mind shuffling the values, here's one possible solution:
from collections import Counter
letters = ['a', 'a', 'b', 'b', 'b', 'c']
c = Counter(letters)
to_remove = {x for x, i in c.items() if i < 2}
result = list(set(letters) - to_remove)
print(result)
Output:
['a', 'b']
You can always sort later.
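Counter tallies the whole list in a single pass, whereas letters.count(x) rescans the list for every unique value, so this solution is efficient for lists with more than ~10 unique elements.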
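If the original order matters, the Counter itself can be iterated instead of going through a set. This is a sketch, assuming Python 3.7+ (where Counter, like dict, keeps insertion order); the function name keep_at_least_k is made up for illustration.

```python
from collections import Counter


def keep_at_least_k(items, k=2):
    """Return each value that occurs at least k times, in first-seen order."""
    counts = Counter(items)  # one pass over the list
    # Counter is a dict subclass, so it iterates in insertion order (Python 3.7+).
    return [value for value, n in counts.items() if n >= k]


letters = ['a', 'a', 'b', 'b', 'b', 'c']
print(keep_at_least_k(letters))     # ['a', 'b']
print(keep_at_least_k(letters, 3))  # ['b']
```

This avoids both the sort and the repeated letters.count(x) calls.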

How to find common elements from a list of lists such that order of occurrence is maintained?

I have a list of lists where all the lists have the same length. I need to find the common elements from them with the order of occurrence maintained.
For example:
Suppose the list of lists is [['a','e','d','c','f'], ['e','g','a','d','c'], ['c','a','h','e','j']]
The output list should contain ['a','e','c']. Priority should be given to elements which occur earlier in most of the lists. In this example 'a' occurs earliest, then 'e', and so on.
How to proceed with this?
You could find the common items first, then sort them:
from collections import defaultdict
data = [['a','e','d','c','f'],['e','g','a','d','c'],['c','a','h','e','j']]
common = set(data[0])
for line in data:
    common = common.intersection(set(line))
# sum each common item's positions across all lists; a smaller total means it tends to occur earlier
res = defaultdict(int)
for line in data:
    for idx, item in enumerate(line):
        if item in common:
            res[item] += idx
[item[0] for item in sorted(res.items(), key=lambda x: x[1])]
output:
['a', 'e', 'c']
Here's a quick solution that I managed to get working:
data = [['a', 'e', 'd', 'c', 'f'],
        ['e', 'g', 'a', 'd', 'c'], ['c', 'a', 'h', 'e', 'j']]
# count number of times each character appears
char_count = {}
for arr in data:
    for char in arr:
        if char not in char_count:
            char_count[char] = 1
        else:
            char_count[char] += 1
# select characters that appear multiple times
common_chars = [i[0] for i in char_count.items() if i[1] > 1]
# remove characters that are not present in all lists
# (loop over a copy: removing from a list while iterating over it skips items)
for char in list(common_chars):
    count = 0
    for arr in data:
        if char in arr:
            count += 1
    if count < len(data):
        common_chars.remove(char)
# final result with common characters
print(common_chars)
Resulting output:
['a', 'e', 'c']
Probably not the most efficient solution if you're working with lots of data though.

Return a list of items without any elements with the same value next to each other

Implement the function unique_in_order which takes as argument a sequence and returns a list of items without any elements with the same value next to each other and preserving the original order of elements.
For example:
unique_in_order('AAAABBBCCDAABBB') == ['A', 'B', 'C', 'D', 'A', 'B']
unique_in_order('ABBCcAD') == ['A', 'B', 'C', 'c', 'A', 'D']
unique_in_order([1,2,2,3,3]) == [1,2,3]
My code returns the correct output:
def unique_in_order(iterable):
    list = []
    for i in range(0, len(iterable)):
        if iterable[i] != iterable[i-1]:
            list.append(iterable[i])
    return list
It passes the sample tests but fails on submission, saying:
should work with one element:
[] should equal ['A']
should reduce duplicates:
[] should equal ['A']
I want to know what is wrong with my code. Thanks.
Use existing libraries to perform that task, like itertools.groupby:
import itertools
def unique_in_order(iterable):
    return [k for k,_ in itertools.groupby(iterable)]
print(unique_in_order('AAAABBBCCDAABBB')) # ['A', 'B', 'C', 'D', 'A', 'B']
print(unique_in_order(['A'])) # ['A']
With the default group key, groupby groups identical consecutive elements, yielding tuples with the value and the group of values (that we ignore here, we just need the key).
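As for why the loop in the question fails: when i is 0, iterable[i-1] is iterable[-1], which is the last element. The first item is therefore dropped whenever the sequence starts and ends with the same value, and that is always the case for a single-element input such as 'A'. A minimal sketch of a plain-loop fix, comparing each item with the last item kept instead of indexing backwards:

```python
def unique_in_order(iterable):
    """Collapse runs of equal adjacent items, keeping the original order."""
    result = []
    for item in iterable:
        # Compare with the last kept item; `not result` covers the first item.
        if not result or result[-1] != item:
            result.append(item)
    return result


print(unique_in_order('A'))                # ['A']
print(unique_in_order('AAAABBBCCDAABBB'))  # ['A', 'B', 'C', 'D', 'A', 'B']
print(unique_in_order([1, 2, 2, 3, 3]))    # [1, 2, 3]
print(unique_in_order(''))                 # []
```

Naming the accumulator result rather than list also avoids shadowing the built-in list type.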

Basic Sorting / Order Algorithm

Trying to implement and form a very simple algorithm. This algorithm takes in a sequence of letters or numbers. It first creates an array (list) out of each character or digit. Then it checks each individual character compared with the following character in the sequence. If the two are equal, it removes the character from the array.
For example the input: 12223344112233 or AAAABBBCCCDDAAABB
And the output should be: 1234123 or ABCDAB
I believe the issue stems from the fact I created a counter and increment each loop. I use this counter for my comparison using the counter as an index marker in the array. Although, each time I remove an item from the array it changes the index while the counter increases.
Here is the code I have:
def sort(i):
    iter = list(i)
    counter = 0
    for item in iter:
        if item == iter[counter + 1]:
            del iter[counter]
        counter = counter + 1
    return iter
You're iterating over the same list that you are deleting from. That usually causes behaviour that you would not expect. Make a copy of the list & iterate over that.
However, there is a simpler solution: Use itertools.groupby
import itertools
def sort(i):
    return [x for x, _ in itertools.groupby(list(i))]
print(sort('12223344112233'))
Output:
['1', '2', '3', '4', '1', '2', '3']
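To make the point about deleting from a list while iterating over it concrete, here is a small demonstration. The variable names and the 'AAAB' input are made up for illustration.

```python
# Deleting while iterating: the loop's internal index keeps advancing
# while the list shrinks, so some items are never visited.
items = list('AAAB')
visited = []
for item in items:
    visited.append(item)
    if item == 'A':
        items.remove(item)  # shifts the remaining items one slot to the left

print(visited)  # ['A', 'A']  (the third 'A' and the 'B' were never visited)
print(items)    # ['A', 'B']

# Looping over a copy visits every original item.
items = list('AAAB')
visited_all = []
for item in list(items):
    visited_all.append(item)
    if item == 'A':
        items.remove(item)

print(visited_all)  # ['A', 'A', 'A', 'B']
print(items)        # ['B']
```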
A few alternatives, all using s = 'AAAABBBCCCDDAAABB' as setup:
>>> import re
>>> re.sub(r'(.)\1+', r'\1', s)
'ABCDAB'
>>> p = None
>>> [c for c in s if p != (p := c)]
['A', 'B', 'C', 'D', 'A', 'B']
>>> [c for c, p in zip(s, [None] + list(s)) if c != p]
['A', 'B', 'C', 'D', 'A', 'B']
>>> [c for i, c in enumerate(s) if not s.endswith(c, None, i)]
['A', 'B', 'C', 'D', 'A', 'B']
The other answers are good. This one iterates over the list in reverse to prevent skipping items, and uses the look-ahead type of algorithm the OP described. Quick note, OP: this really isn't a sorting algorithm.
def sort(input_str: str) -> str:
    as_list = list(input_str)
    # walk backwards so a deletion never shifts items that are still to be checked
    for idx in range(len(as_list) - 1, 0, -1):
        if as_list[idx] == as_list[idx-1]:
            del as_list[idx]
    return ''.join(as_list)

Obtain all subtrees in value

Given "a.b.c.d.e" I want to obtain all subtrees, efficiently, e.g. "b.c.d.e" and "c.d.e", but not "a.d.e" or "b.c.d".
Real world situation:
I have foo.bar.baz.example.com and I want all possible subdomain trees.
listed = "a.b.c.d.e".split('.')
subtrees = ['.'.join(listed[idx:]) for idx in range(len(listed))]  # use xrange on Python 2
Given your sample data, subtrees equals ['a.b.c.d.e', 'b.c.d.e', 'c.d.e', 'd.e', 'e'].
items = data.split('.')  # data is the dotted input string, e.g. "a.b.c.d.e"
['.'.join(items[i:]) for i in range(0, len(items))]
def parts( s, sep ):
    while True:
        yield s
        try:
            # cut the string after the next sep
            s = s[s.index(sep)+1:]
        except ValueError:
            # no `sep` left
            break
print(list(parts("a.b.c.d.e", '.')))
# ['a.b.c.d.e', 'b.c.d.e', 'c.d.e', 'd.e', 'e']
Not sure if this is what you want, but slicing the list at successive start offsets yields that.
>>> x = "a.b.c.d.e"
>>> k = x.split('.')
>>> k
['a', 'b', 'c', 'd', 'e']
>>> l = []
>>> for el in range(len(k)): l.append(k[el+1:])
...
>>> l
[['b', 'c', 'd', 'e'], ['c', 'd', 'e'], ['d', 'e'], ['e'], []]
>>> [".".join(l1) for l1 in l if l1]
['b.c.d.e', 'c.d.e', 'd.e', 'e']
>>>
Of course, the above was to illustrate the process. You could combine the steps into a one-liner.
[Edit: I thought the answer is same as any here and explains it well]
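Applied to the real-world case from the question, the split-and-rejoin approach might look like this in Python 3. The function name suffix_subtrees is made up for illustration, and whether the full string and the bare top-level label should be part of the result is an assumption.

```python
def suffix_subtrees(value, sep='.'):
    """Return every suffix of value that starts at a separator boundary."""
    parts = value.split(sep)
    return [sep.join(parts[i:]) for i in range(len(parts))]


print(suffix_subtrees('a.b.c.d.e'))
# ['a.b.c.d.e', 'b.c.d.e', 'c.d.e', 'd.e', 'e']
print(suffix_subtrees('foo.bar.baz.example.com'))
# ['foo.bar.baz.example.com', 'bar.baz.example.com', 'baz.example.com', 'example.com', 'com']
```

Slice the result (for example with [1:] or [:-1]) if the full name or the last label should be left out.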
