Basic Sorting / Order Algorithm - python

Trying to implement and form a very simple algorithm. This algorithm takes in a sequence of letters or numbers. It first creates an array (list) out of each character or digit. Then it checks each individual character compared with the following character in the sequence. If the two are equal, it removes the character from the array.
For example the input: 12223344112233 or AAAABBBCCCDDAAABB
And the output should be: 1234123 or ABCDAB
I believe the issue stems from the counter I increment on each loop and use as an index into the array for the comparison. Each time I remove an item from the array, the remaining items shift down by one index while the counter still increases.
Here is the code I have:
def sort(i):
    iter = list(i)
    counter = 0
    for item in iter:
        if item == iter[counter + 1]:
            del iter[counter]
        counter = counter + 1
    return iter

You're iterating over the same list that you are deleting from. That usually causes behaviour that you would not expect. Make a copy of the list & iterate over that.
However, there is a simpler solution: Use itertools.groupby
import itertools

def sort(i):
    return [x for x, _ in itertools.groupby(list(i))]

print(sort('12223344112233'))
Output:
['1', '2', '3', '4', '1', '2', '3']
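The first suggestion above (don't delete from the list you are looping over) can also be sketched without itertools by building a new list instead of mutating the input. This is only a minimal illustration; the name collapse_runs is made up here and is not from the question.

```python
def collapse_runs(seq):
    """Return the items of seq with each run of equal neighbours collapsed to one."""
    result = []
    for item in seq:
        # Keep an item only if it differs from the last item we kept.
        if not result or result[-1] != item:
            result.append(item)
    return result


print(''.join(collapse_runs('12223344112233')))     # 1234123
print(''.join(collapse_runs('AAAABBBCCCDDAAABB')))  # ABCDAB
```

Because nothing is ever removed from the sequence being iterated, there is no index drift to worry about.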

A few alternatives, all using s = 'AAAABBBCCCDDAAABB' as setup:
>>> import re
>>> re.sub(r'(.)\1+', r'\1', s)
'ABCDAB'
>>> p = None
>>> [c for c in s if p != (p := c)]
['A', 'B', 'C', 'D', 'A', 'B']
>>> [c for c, p in zip(s, [None] + list(s)) if c != p]
['A', 'B', 'C', 'D', 'A', 'B']
>>> [c for i, c in enumerate(s) if not s.endswith(c, None, i)]
['A', 'B', 'C', 'D', 'A', 'B']

The other answers are good. This one iterates over the list in reverse to prevent skipping items, and uses the look-ahead style of algorithm the OP described. Quick note to the OP: this really isn't a sorting algorithm.
def sort(input_str: str) -> str:
    as_list = list(input_str)
    # Walk backwards so a deletion never shifts the indices still to be visited.
    for idx in range(len(as_list) - 1, 0, -1):
        if as_list[idx] == as_list[idx - 1]:
            del as_list[idx]
    return ''.join(as_list)

Related

How to find common elements from a list of lists such that order of occurrence is maintained?

I have a list of lists where the length of the lists are same. I need to find the common elements from them with the order of occurrence maintained.
For example:
Suppose the list of lists is [['a','e','d','c','f'], ['e','g','a','d','c'], ['c','a','h','e','j']]
The output list should contain ['a','e','c']. Priority should be given to elements which occur earlier in most of the lists. In this example 'a' occurs earlier, then 'e' and so on.
How to proceed with this?
You could find the common items first, then sort them by the sum of their positions across the lists:
from collections import defaultdict
data = [['a','e','d','c','f'],['e','g','a','d','c'],['c','a','h','e','j']]
common = set(data[0])
for line in data:
    common = common.intersection(set(line))
res = defaultdict(int)
for line in data:
    for idx, item in enumerate(line):
        if item in common:
            res[item] += idx
[item[0] for item in sorted(res.items(), key=lambda x: x[1])]
output:
['a', 'e', 'c']
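The same idea can be wrapped in a function. This is a sketch under two assumptions that are not stated in the question: no list contains the same item twice, and items are comparable so that ties in the position sum can be broken deterministically. The name common_in_order is made up for illustration.

```python
from collections import defaultdict


def common_in_order(lists):
    """Items present in every list, ordered by the sum of their positions."""
    if not lists:
        return []
    # set.intersection accepts the remaining lists directly as iterables.
    common = set(lists[0]).intersection(*lists[1:])
    position_sum = defaultdict(int)
    for line in lists:
        for idx, item in enumerate(line):
            if item in common:
                position_sum[item] += idx
    # Ties on the position sum fall back to the item itself (an assumption).
    return sorted(common, key=lambda item: (position_sum[item], item))


data = [['a', 'e', 'd', 'c', 'f'],
        ['e', 'g', 'a', 'd', 'c'],
        ['c', 'a', 'h', 'e', 'j']]
print(common_in_order(data))  # ['a', 'e', 'c']
```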
Here's a quick solution that I managed to get working:
data = [['a', 'e', 'd', 'c', 'f'],
        ['e', 'g', 'a', 'd', 'c'], ['c', 'a', 'h', 'e', 'j']]
# count number of times each character appears
char_count = {}
for arr in data:
    for char in arr:
        if char not in char_count:
            char_count[char] = 1
        else:
            char_count[char] += 1
# select characters that appear multiple times
common_chars = [i[0] for i in char_count.items() if i[1] > 1]
# remove characters that are not present in all lists
# (loop over a copy: removing from a list while iterating over it skips items)
for char in common_chars[:]:
    count = 0
    for arr in data:
        if char in arr:
            count += 1
    if count < len(data):
        common_chars.remove(char)
# final result with common characters
print(common_chars)
Resulting output:
['a', 'e', 'c']
Probably not the most efficient solution if you're working with lots of data though.
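For larger inputs, one possible cheaper variant is to let set intersection find the shared items and then walk the first list once to fix the order. Like the code above, this sketch orders the result by first appearance in the first list rather than by the question's priority rule; the name common_items is made up for illustration.

```python
def common_items(lists):
    """Items found in every list, in the order they first appear in lists[0]."""
    if not lists:
        return []
    shared = set(lists[0]).intersection(*lists[1:])
    seen = set()
    result = []
    for item in lists[0]:
        # Membership tests on sets are O(1) on average, unlike 'in' on a list.
        if item in shared and item not in seen:
            seen.add(item)
            result.append(item)
    return result


data = [['a', 'e', 'd', 'c', 'f'],
        ['e', 'g', 'a', 'd', 'c'],
        ['c', 'a', 'h', 'e', 'j']]
print(common_items(data))  # ['a', 'e', 'c']
```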

Get all possible ordered sublists of a list

Let's say I have a list with the following letters:
lst=['A','B','C','D']
And I need to get all the possible sublists of that list that maintain the order. Thus, the result must be:
res=['A'
'AB'
'ABC'
'ABCD'
'B'
'BC'
'BCD'
'C'
'CD'
'D']
I had implemented the following for loop, but an error occurs, saying: TypeError: can only concatenate str (not "list") to str
res=[]
for x in range(len(lst)):
    for y in range(len(lst)):
        if x==y:
            res.append(x)
        if y>x:
            res.append(lst[x]+lst[y:len(lst)-1])
Is there a better and more efficient way to do this?
lst=['A','B','C','D']
out = []
for i in range(len(lst)):
    for j in range(i, len(lst)):
        out.append( ''.join(lst[i:j+1]) )
print(out)
Prints:
['A', 'AB', 'ABC', 'ABCD', 'B', 'BC', 'BCD', 'C', 'CD', 'D']
Rather than nested loops with redefined inner loop bounds on each go, you can use itertools to generate the bounds for you:
from itertools import combinations
lst = ['A','B','C','D']
out = []
for s, e in combinations(range(len(lst) + 1), 2):
    out.append(''.join(lst[s:e]))
combinations conveniently produces all possible start and end indices from a single range, producing each set one at a time in your desired order. It also simplifies the code enough that the equivalent listcomp isn't too unreadable, allowing you to condense three lines of code down to one:
out = [''.join(lst[s:e]) for s, e in combinations(range(len(lst) + 1), 2)]
Either way, out ends up with the value:
['A', 'AB', 'ABC', 'ABCD', 'B', 'BC', 'BCD', 'C', 'CD', 'D']
This is probably the closest to what you have got and will produce the desired result:
res=[]
for x in range(len(lst)):
    for y in range(len(lst)):
        if x==y:
            res.append(lst[x])
        if y>x:
            res.append(''.join(lst[x:y+1]))
The error you are describing means that you are trying to add a character to a list:
lst[x]+lst[y:len(lst)-1]
lst[x] is a character (a str) and lst[y:len(lst)-1] is a list of characters, and Python does not know how to add a str and a list together. It can concatenate two strings, though, so turn the list into a string with join first.
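A small sketch of that point, using the same lst as in the question: adding a str to a list slice raises TypeError, while joining the slice first (or joining one longer slice) works.

```python
lst = ['A', 'B', 'C', 'D']

# str + list is not defined, so this raises TypeError.
try:
    lst[0] + lst[1:3]
except TypeError as exc:
    print('TypeError:', exc)

# Joining the slice first gives a str, and str + str works.
print(lst[0] + ''.join(lst[1:3]))  # ABC

# Slicing from the start index and joining once avoids the concatenation.
print(''.join(lst[0:3]))           # ABC
```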

Access list elements that are not equal to a specific value

I am searching through a list like this:
my_list = [['a','b'],['b','c'],['a','x'],['f','r']]
and I want to see which elements come with 'a'. So first I have to find lists in which 'a' occurs. Then get access to the other element of the list. I do this by abs(pair.index('a')-1)
for pair in my_list:
    if 'a' in pair:
        print( pair[abs(pair.index('a')-1)] )
Is there any better pythonic way to do that?
Something like: pair.index(not 'a') maybe?
UPDATE:
Maybe it is good to point out that 'a' is not necessarily the first element.
In my case, ['a','a'] doesn't happen, but in general it's probably good to choose a solution which handles this situation too.
Are you looking for elements that accompany a? If so, a simple list comprehension will do:
In [110]: [x for x in my_list if 'a' in x]
Out[110]: [['a', 'b'], ['a', 'x']]
If you just want the elements and not the pairs, how about getting rid of a before printing:
In [112]: [(set(x) - {'a'}).pop() for x in my_list if 'a' in x]
Out[112]: ['b', 'x']
I use a set because a could either be the first or second element in the pair.
If I understand your question correctly, the following should work:
my_list = filter(
    lambda e: 'a' not in e,
    my_list
)
Note that in python 3, this returns a filter object instance. You may want to wrap the code in a list() command to get a list instance instead.
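A short sketch of that wrapping, using the list from the question. Note that the lambda as written keeps the pairs that do not contain 'a'; the variable name without_a is made up for illustration.

```python
my_list = [['a', 'b'], ['b', 'c'], ['a', 'x'], ['f', 'r']]

without_a = filter(lambda e: 'a' not in e, my_list)
# In Python 3 this is a lazy filter object, so materialise it to see the items.
without_a = list(without_a)
print(without_a)  # [['b', 'c'], ['f', 'r']]

# The same selection as a list comprehension, which always produces a list.
assert without_a == [pair for pair in my_list if 'a' not in pair]
```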
That technique works ok here, but it may be more efficient, and slightly more readable, to do it using sets. Here's one way to do that.
def paired_with(seq, ch):
    chset = set(ch)
    return [(set(pair) - chset).pop() for pair in seq if ch in pair]

my_list = [['a','b'], ['b','c'], ['x','a'], ['f','r']]
print(paired_with(my_list, 'a'))
output
['b', 'x']
If you want to do lots of tests on the same list, it would be more efficient to build a list of sets.
def paired_with(seq, ch):
    chset = set(ch)
    return [(pair - chset).pop() for pair in seq if ch in pair]

my_list = [['a','b'], ['b','c'], ['x','a'], ['f','r']]
my_sets = [set(u) for u in my_list]
print(my_sets)
print(paired_with(my_sets, 'a'))
output
[{'b', 'a'}, {'c', 'b'}, {'x', 'a'}, {'r', 'f'}]
['b', 'x']
This will fail if there's a pair like ['a', 'a'], but we can easily fix that:
def paired_with(seq, ch):
    chset = set(ch)
    return [(pair - chset or chset).pop() for pair in seq if ch in pair]

my_list = [['a','b'], ['b','c'], ['x','a'], ['f','r'], ['a', 'a']]
my_sets = [set(u) for u in my_list]
print(paired_with(my_sets, 'a'))
output
['b', 'x', 'a']
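If the sublists are always exactly two items long (an assumption taken from the examples, not something the question guarantees), a set-free sketch is to unpack each pair and pick the other side. It also handles ['a', 'a']. The helper name partner_of is made up for illustration.

```python
def partner_of(pair, ch):
    """Return the element of a two-item pair that accompanies ch."""
    first, second = pair
    # If ch is first, its partner is second; otherwise ch must be second.
    return second if first == ch else first


def paired_with(seq, ch):
    return [partner_of(pair, ch) for pair in seq if ch in pair]


my_list = [['a', 'b'], ['b', 'c'], ['x', 'a'], ['f', 'r'], ['a', 'a']]
print(paired_with(my_list, 'a'))  # ['b', 'x', 'a']
```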

Finding letters that match in every sublist

candidates = ['A', 'B', 'C', 'D']
If a candidate appears in every sublist at least once they must be returned
listOfData = [['B','C','B','A'],     # D is no longer a candidate
              ['B', 'C', 'B', 'D'],  # A is no longer a candidate
              ['A','D','C','B'],     # B and C are still candidates
              ['D', 'C', 'B', 'A']]  # B and C are solid matches!
In this case the matches are [B,C]
I'm having trouble keeping track of the candidate that appears in every sublist at least once.
matches =[]
def lettersThatMatchInEverySublist():
    i=0
    for candidate in candidates:
        for sublist in listOfData:
            for char in sublist:
                pass
                if char == candidate:
                    matches.append(candidate)
    return matches
Easiest way - with sets
>>> valid_vals = tuple(set(row) for row in listOfData)
>>> candidates = set(['A', 'B', 'C', 'D'])
>>> for validator in valid_vals:
...     candidates &= validator
...
>>> candidates
set(['C', 'B'])
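The same set approach can be sketched as a function that also keeps the original candidate order, which a bare set does not. The function name and parameter names here are made up for illustration.

```python
def letters_in_every_sublist(candidates, list_of_data):
    """Return the candidates that occur in every sublist, in candidate order."""
    # set.intersection accepts any number of iterables, so rows need not be sets.
    common = set(candidates).intersection(*list_of_data)
    return [c for c in candidates if c in common]


candidates = ['A', 'B', 'C', 'D']
listOfData = [['B', 'C', 'B', 'A'],
              ['B', 'C', 'B', 'D'],
              ['A', 'D', 'C', 'B'],
              ['D', 'C', 'B', 'A']]
print(letters_in_every_sublist(candidates, listOfData))  # ['B', 'C']
```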
Here are a few guiding measures that can get you started, but beyond this, you will need to restate your problem more clearly.
Try using itertools for your listOfOptions:
import itertools
options = itertools.product('ACTG', repeat=3) # This yields every ordered length-3 sequence of A, C, T, and G (4**3 = 64 tuples).
listOfOptions = [''.join(option) for option in options] # This uses a list comprehension to join each tuple into a string.
Clean up the findKmersSet function:
def findKmersSet(k, dataset):
    dataset = dataset.splitlines()
    kmers = []
    for line in dataset:
        line_list = []
        for i in range(len(line)-k+1):
            line_list.append(line[i:i+k])
        kmers.append(line_list)
    return kmers
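To see what the cleaned-up function returns, here is a compact equivalent run on a tiny input. The sample string is invented for this illustration (the original dataset is not shown in the thread), and find_kmers is a made-up name for the list-comprehension form.

```python
def find_kmers(k, dataset):
    """For each line of dataset, list every substring of length k, left to right."""
    return [[line[i:i + k] for i in range(len(line) - k + 1)]
            for line in dataset.splitlines()]


# Hypothetical two-line input, made up for this example.
sample = "ACGT\nTTGAC"
print(find_kmers(3, sample))
# [['ACG', 'CGT'], ['TTG', 'TGA', 'GAC']]
```

A line shorter than k simply contributes an empty list, because the range is empty.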

Clone elements of a list

Let's say I have a Python list that looks like this:
list = [ a, b, c, d]
I am looking for the most efficient way, performance-wise, to get this:
list = [ a, a, a, a, b, b, b, c, c, d ]
So if the list is N elements long then the first element is cloned N-1 times, the second element N-2 times, and so forth...the last element is cloned N-N times or 0 times. Any suggestions on how to do this efficiently on large lists.
Note that I am testing speed, not correctness. If someone wants to edit in a unit test, I'll get around to it.
pyfunc_fastest: 152.58769989 usecs
pyfunc_local_extend: 154.679298401 usecs
pyfunc_iadd: 158.183312416 usecs
pyfunc_xrange: 162.234091759 usecs
pyfunc: 166.495800018 usecs
Ignacio: 238.87629509 usecs
Ishpeck: 311.713695526 usecs
FabrizioM: 456.708812714 usecs
JohnKugleman: 519.239497185 usecs
Bwmat: 1309.29429531 usecs
Test code here. The second revision is trash because I was rushing to get everybody tested that posted after my first batch of tests. These timings are for the fifth revision of the code.
Here's the fastest version that I was able to get.
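def pyfunc_fastest(x):
    t = []
    lenList = len(x)
    extend = t.extend  # bind the method once to skip the attribute lookup in the loop
    for l in xrange(0, lenList):  # xrange is Python 2; use range on Python 3
        extend([x[l]] * (lenList - l))
    return t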
Oddly, a version that I modified to avoid indexing into the list by using enumerate ran slower than the original.
>>> items = ['a', 'b', 'c', 'd']
>>> [item for i, item in enumerate(items) for j in xrange(len(items) - i)]
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
First we use enumerate to pull out both indexes and values at the same time. Then we use a nested for loop to iterate over each item a decreasing number of times. (Notice that the variable j is never used. It is junk.)
This should be near optimal, with minimal memory usage, because enumerate and xrange are lazy and build no intermediate lists besides the result itself.
How about this - A simple one
>>> x = ['a', 'b', 'c', 'd']
>>> t = []
>>> lenList = len(x)
>>> for l in range(0, lenList):
...     t.extend([x[l]] * (lenList - l))
...
>>> t
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
>>>
Lazy mode:
import itertools
l = ['foo', 'bar', 'baz', 'quux']
for item in itertools.chain.from_iterable(itertools.repeat(e, len(l) - i)
                                          for i, e in enumerate(l)):
    print(item)
Just shove it through list() if you really do need a list instead.
list(itertools.chain.from_iterable(itertools.repeat(e, len(l) - i)
                                   for i, e in enumerate(l)))
My first instinct..
l = ['a', 'b', 'c', 'd']
nl = []
i = 0
while len(l[i:])>0:
    nl.extend( [l[i]]*len(l[i:]) )
    i+=1
print(nl)
The trick is in using repeat from itertools
from itertools import repeat
alist = "a b c d".split()
print([x for idx, value in enumerate(alist) for x in repeat(value, len(alist) - idx)])
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
Use a generator: it's O(1) memory and O(N^2) cpu, unlike any solution that produces the final list which uses O(N^2) memory and cpu. This means it'll be massively faster as soon as the input list is large enough that the constructed list fills memory and swapping starts. It's unlikely you need to have the final list in memory unless this is homework.
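def triangle(seq):
    for i, x in enumerate(seq):
        # len(seq) - i copies of each item, matching the output in the question
        for _ in xrange(len(seq) - i):  # xrange is Python 2; use range on Python 3
            yield x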
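A sketch of how such a generator might be consumed lazily in Python 3. The name triangle_iter and the use of itertools.islice are choices made for this illustration, not part of the answer above.

```python
from itertools import islice


def triangle_iter(seq):
    """Yield seq[0] len(seq) times, seq[1] len(seq) - 1 times, and so on."""
    n = len(seq)
    for i, x in enumerate(seq):
        for _ in range(n - i):
            yield x


# Only the requested items are ever produced, however large the input is.
big = range(1_000_000)
print(list(islice(triangle_iter(big), 5)))  # [0, 0, 0, 0, 0]

# Materialise the full result only when a list is really needed.
print(list(triangle_iter('abcd')))
# ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
```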
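Creating that new list, list = [ a, a, a, a, b, b, b, c, c, d ], takes time proportional to the number of elements written. If every element were repeated a fixed 4 times, that would be O(4n) = O(n); with the decreasing repeat counts in the question the output has n(n+1)/2 elements, so building it is O(n^2). aaronasterling gives a solution that is linear in the size of the output.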
You could cheat and just not create the new list. Simply, get the index value as input. Divide the index value by 4. Use the result as the index value of the original list.
In pseudocode:
function getElement(int i)
{
int trueIndex = i / 4;
return list[trueIndex]; // Note: that integer division will lead us to the correct index in the original array.
}
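The division by 4 in the pseudocode maps indices correctly only when every element is repeated exactly four times. For the decreasing repeat counts in the question, the same "don't build the list" idea can be sketched by precomputing where each element's block ends and binary-searching that table. The names make_lookup and get_element are made up for this illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def make_lookup(items):
    """Return a function mapping an index in the expanded list to its item."""
    n = len(items)
    # ends[i] is one past the last expanded index occupied by items[i];
    # the block sizes are n, n - 1, ..., 1.
    ends = list(accumulate(range(n, 0, -1)))
    total = ends[-1] if ends else 0

    def get_element(k):
        if not 0 <= k < total:
            raise IndexError(k)
        # The first block whose end is greater than k contains index k.
        return items[bisect_right(ends, k)]

    return get_element


get_element = make_lookup(['a', 'b', 'c', 'd'])
print([get_element(k) for k in range(10)])
# ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
```

This uses O(N) extra memory for the table instead of O(N^2) for the expanded list, and each lookup costs O(log N).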
fwiw:
>>> lst = list('abcd')
>>> [i for i, j in zip(lst, range(len(lst), 0, -1)) for _ in range(j)]
['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
def gen_indices(list_length):
    for index in range(list_length):
        for _ in range(list_length - index):
            yield index

new_list = [list[i] for i in gen_indices(len(list))]
untested but I think it'll work
