Sort characters by frequency [duplicate]

This question already has answers here:
Sorting a List by frequency of occurrence in a list
(7 answers)
Closed 4 years ago.
Input:
"tree"
Output:
"eert"
Explanation:
'e' appears twice while 'r' and 't' both appear once.
So 'e' must appear before both 'r' and 't'. Therefore "eetr" is also a valid answer.
I tried something like this:
class Solution(object):
    def frequencySort(self, s):
        """
        :type s: str
        :rtype: str
        """
        has = dict()
        l = list()
        for c in s:
            if c not in has:
                has[c] = 1
            else:
                has[c] += 1
        for k in sorted(has, key=has.get, reverse=True):
            for i in range(has[k]):
                l.extend(k)
        return "".join(l)
but I think it's O(n * m), where
n = length of the string, m = maximum occurrence of a character.
How can I improve this to O(n)?

Is there a reason you cannot use the built in sort with a lambda key?
>>> a = 'aabbbcccccd'
>>> sorted(a, key=lambda c: a.count(c))
['d', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c']
>>> sorted(a, key=lambda c: a.count(c), reverse=True)
['c', 'c', 'c', 'c', 'c', 'b', 'b', 'b', 'a', 'a', 'd']
>>> ''.join(sorted(a, key=lambda c: a.count(c), reverse=True))
'cccccbbbaad'
I believe Python's sort is O(n log n), but calling count for every character makes this O(n^2).
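For the O(n) target in the question, one possible sketch counts the characters once and then drops them into buckets indexed by count, so no comparison sort is needed. The name frequency_sort and the bucket layout are illustrative assumptions, not taken from the answers above.

```python
from collections import Counter

def frequency_sort(s):
    """Return the characters of s ordered by descending frequency."""
    counts = Counter(s)                        # single pass over s: O(n)
    # buckets[n] holds every character that occurs exactly n times;
    # no character can occur more than len(s) times.
    buckets = [[] for _ in range(len(s) + 1)]
    for ch, n in counts.items():
        buckets[n].append(ch)
    parts = []
    for n in range(len(s), 0, -1):             # highest count first
        for ch in buckets[n]:
            parts.append(ch * n)
    return "".join(parts)

print(frequency_sort("aabbbcccccd"))  # cccccbbbaad
print(frequency_sort("tree"))         # 'e' first; the order of 't' and 'r' is not significant
```

Characters with equal counts come out in whatever order they were inserted into the bucket, which the original problem statement allows.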

Related

How to remove elements from a list that appear less than k = 2?

I am trying to keep elements of a list that appear at least twice, and remove the elements that appear less than twice.
For example, my list can look like:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
I want to get a list of the letters that appear at least twice, i.e. the following list:
letters_appear_twice = ['a', 'b']
But since this is part of a bigger program, I don't know exactly what my lists look like, only that I want to keep the letters that are repeated at least twice. For the sake of understanding, we can assume I know what the list looks like!
I have tried the following:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
for x in set(letters):
    if letters.count(x) > 2:
        while x in letters:
            letters.remove(x)
print(letters)
But this doesn't quite work like I want it to...
Thank you in advance for any help!

letters = ['a', 'a', 'b', 'b', 'b', 'c']
res = []
for x in set(letters):
    if letters.count(x) >= 2:
        res.append(x)
print(res)
Prints:
['b', 'a']

Starting from your code above, you can make a new list and append to it instead of removing from the original:
new_list = []
for x in set(letters):
    if letters.count(x) >= 2:
        new_list.append(x)
print(new_list)
Output
['b', 'a']

Easier to create a new list instead of manipulating the source list
def letters_more_or_equal_to_k(letters, k):
    result = []
    for x in set(letters):
        if letters.count(x) >= k:
            result.append(x)
    result.sort()
    return result

def main():
    letters = ['a', 'a', 'b', 'b', 'b', 'c']
    k = 2
    result = letters_more_or_equal_to_k(letters, k)
    print(result)  # prints ['a', 'b']

if __name__ == "__main__":
    main()

If you don't mind shuffling the values, here's one possible solution:
from collections import Counter
letters = ['a', 'a', 'b', 'b', 'b', 'c']
c = Counter(letters)
to_remove = {x for x, i in c.items() if i < 2}
result = list(set(letters) - to_remove)
print(result)
Output:
['a', 'b']
You can always sort later.
This solution is efficient for lists with more than ~10 unique elements.
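If the original order matters, a variant can avoid the set difference and rely on Counter remembering the order in which keys were first seen (dict ordering is guaranteed from Python 3.7). This is only a sketch; keep_at_least_k is a hypothetical helper name.

```python
from collections import Counter

def keep_at_least_k(items, k=2):
    """Return the distinct items that occur at least k times, in first-seen order."""
    counts = Counter(items)          # one pass instead of one count() call per item
    return [x for x in counts if counts[x] >= k]

letters = ['a', 'a', 'b', 'b', 'b', 'c']
print(keep_at_least_k(letters))      # ['a', 'b']
```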

Problems removing element while iterating over list [duplicate]

This question already has answers here:
Modify a list while iterating [duplicate]
(4 answers)
Closed 2 years ago.
As a beginner, I am writing a simple script to better acquaint myself with python. I ran the code below and I am not getting the expected output. I think the for-loop ends before the last iteration and I don't know why.
letters = ['a', 'b', 'c', 'c', 'c']
print(letters)
for item in letters:
    if item != 'c':
        print('not c')
    else:
        letters.remove(item)
        continue
print(letters)
output returned:
['a', 'b', 'c', 'c', 'c']
not c
not c
['a', 'b', 'c']
Expected Output:
['a', 'b', 'c', 'c', 'c']
not c
not c
['a', 'b']
Basically, I am not expecting to have 'c' within my list anymore.
If you have a better way to write the code that would be appreciated as well.

WARNING: This is an inefficient solution that I will provide to answer your question. I'll post a more concise and faster solution in answer #2.
Answer #1
Removing items like this changes the length of the list while the loop is still walking it. One workaround is to loop over a copy instead: letters[::-1] builds a reversed copy of the list, so calling letters.remove(item) no longer disturbs the iteration (remove still deletes the first matching element each time):
letters = ['a', 'b', 'c', 'c', 'c']
print(letters)
for item in letters[::-1]:
    if item != 'c':
        print('not c')
    else:
        letters.remove(item)
        continue
print(letters)
output:
['a', 'b', 'c', 'c', 'c']
not c
not c
['a', 'b']
Answer #2 - Use list comprehension instead of looping (more detail: Is there a simple way to delete a list element by value?):
letters = ['a', 'b', 'c', 'c', 'c']
letters = [x for x in letters if x != 'c']
output:
['a', 'b']

letters.remove(item) removes only a single instance of the element, but it has the unintended effect of shrinking the list while you are iterating over it. Modifying the list you are iterating over is something you generally want to avoid. As a result, the list becomes shorter and the iterator believes you have traversed all of the elements, even though the last 'c' is still in the list. This can be seen in the output of:
letters = ['a', 'b', 'c', 'c', 'c']
print(letters)
for idx, item in enumerate(letters):
    print("Index: {} Len: {}".format(idx, len(letters)))
    if item != 'c':
        print('not c')
    else:
        letters.remove(item)
        continue
print(letters)
"""Index: 0 Len: 5
not c
Index: 1 Len: 5
not c
Index: 2 Len: 5
Index: 3 Len: 4"""
You never reach the remaining 'c' because the next index (4) would exceed the indexable elements of the list (0-2 now)!
If you want to filter a list, you can use the built-in filter function (in Python 3 it returns an iterator, so wrap it in list):
list(filter(lambda x: x != 'c', letters))
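If other references to the same list must see the change, one option is slice assignment, which replaces the contents in place instead of rebinding the name. A minimal sketch; the name alias is only there to illustrate the point:

```python
letters = ['a', 'b', 'c', 'c', 'c']
alias = letters                      # second reference to the same list object

# Build the filtered list first, then overwrite the contents of the
# existing list object; nothing is removed while iterating.
letters[:] = [x for x in letters if x != 'c']

print(letters)            # ['a', 'b']
print(alias is letters)   # True: both names still refer to the one list
```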

Split a list in python with an element as the delimiter? [duplicate]

This question already has answers here:
Python split for lists
(6 answers)
Closed 2 years ago.
I want to create sub-lists from a list which has many repeating elements, ie.
l = ['a', 'b', 'c', 'c', 'b', 'a', 'b', 'c', 'b', 'a']
Wherever the 'a' begins the list should be split. (preferably removing 'a' but not a must)
As such:
l = [ ['b', 'c', 'c', 'b'], ['b', 'c', 'b'] ]
I have tried new_list = [x.split('a')[-1] for x in l] but I am not getting the desired "New list" effect.

When you write,
new_list = [x.split('a')[-1] for x in l]
you are essentially performing,
result = []
for elem in l:
    result.append(elem.split('a')[-1])
That is, you're splitting each string contained in l on the letter 'a', and collecting the last element of each of the strings into the result.
Here's one possible implementation of the mechanic you're looking for:
def extract_parts(my_list, delim):
    # Locate the indices of all instances of ``delim`` in ``my_list``
    indices = [i for i, x in enumerate(my_list) if x == delim]
    # Collect each end-exclusive sublist bounded by each pair of adjacent indices
    sublists = []
    for i in range(len(indices) - 1):
        part = my_list[indices[i] + 1:indices[i + 1]]
        sublists.append(part)
    return sublists
Using this function, we have
>>> l = ['a', 'b', 'c', 'c', 'b', 'a', 'b', 'c', 'b', 'a']
>>> extract_parts(l, 'a')
[['b', 'c', 'c', 'b'], ['b', 'c', 'b']]

You can use zip and enumerate to do that. Build a list of split positions (the index just past each delimiter) and slice the list at those points. Note that each resulting chunk still ends with its delimiter.
size = len(l)
# Index just past each 'a'
id_list = [idx + 1 for idx, val in
           enumerate(l) if val == 'a']
result = [l[i:j] for i, j in zip([0] + id_list, id_list +
          ([size] if id_list[-1] != size else []))]

It will not include the delimiter
import itertools
lst = ['a', 'b', 'c', 'c', 'b', 'a', 'b', 'c', 'b', 'a']
delimiter = lst[0]
li = [list(value) for key, value in itertools.groupby(lst, lambda e: e == delimiter) if not key]
print(li)
Explanation: groupby starts a new group each time the key changes:
Key     Value
True    itertools._grouper object for the group 'a'
False   itertools._grouper object for the group 'b', 'c', 'c', 'b'
True    itertools._grouper object for the group 'a'
False   itertools._grouper object for the group 'b', 'c', 'b'
True    itertools._grouper object for the group 'a'
The if condition keeps only the groups whose key is False (the runs between delimiters), and each itertools._grouper object is then passed to list to materialize it.
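To see those key/group pairs directly, a small sketch can materialize each group as groupby produces it. The variable name pairs is only for illustration.

```python
from itertools import groupby

lst = ['a', 'b', 'c', 'c', 'b', 'a', 'b', 'c', 'b', 'a']
delimiter = 'a'

# Convert each group to a list right away: a group object is only
# valid until groupby advances to the next key.
pairs = [(key, list(group))
         for key, group in groupby(lst, lambda e: e == delimiter)]

for key, group in pairs:
    print(key, group)
```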

Create a counters array for each element you want to split at, then write a condition in this fashion:
l = ['a', 'b', 'c', 'c', 'b', 'a', 'b', 'c', 'b', 'a']
counters = [0, 0, 0]  # to count instances
index = 0             # index of l
startIndex = 0        # where to start split
endIndex = 0          # where to end split
splitLists = []       # container for splits
for element in l:
    if element == 'a':                          # find 'a'
        counters[0] += 1                        # increase counter
        if counters[0] == 1:                    # if first instance
            startIndex = index + 1              # start split after it
        if counters[0] == 2:                    # if second instance
            endIndex = index                    # end split here
            splitList = l[startIndex:endIndex]
            counters[0] = 1                     # this 'a' also starts the next split
            startIndex = index + 1
            splitLists.append(splitList)        # append to main list of lists
    index += 1
print(splitLists)
So basically you are finding the start and end index of the matching pattern within the list. You use these to split the list, and append this list into a main list of lists (2d list).
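The start and end index bookkeeping can also be folded into a single pass that collects items until the next delimiter. The sketch below mirrors str.split, so a leading or trailing delimiter yields an empty sublist; split_list is a hypothetical helper name, not something from the answers above.

```python
def split_list(items, delim):
    """Split items on delim, like str.split does for strings."""
    parts, current = [], []
    for x in items:
        if x == delim:
            parts.append(current)    # close the current run
            current = []
        else:
            current.append(x)
    parts.append(current)            # whatever follows the last delimiter
    return parts

l = ['a', 'b', 'c', 'c', 'b', 'a', 'b', 'c', 'b', 'a']
print(split_list(l, 'a'))                   # [[], ['b', 'c', 'c', 'b'], ['b', 'c', 'b'], []]
print([p for p in split_list(l, 'a') if p]) # [['b', 'c', 'c', 'b'], ['b', 'c', 'b']]
```

Dropping the empty sublists gives the result asked for in the question.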

Python sort list with list.count does not work if list has items of equal occurrence [duplicate]

This question already has answers here:
Python list sort in descending order
(6 answers)
Sort Python list using multiple keys
(6 answers)
Closed 4 years ago.
I have been trying to sort a list of elements (strings) by their number of occurrences in Python 3. I have been using the built-in sorted() function with str.count as the key, as shown below.
p = "acaabbcabac"
print(sorted(p, key=p.count))
# Output : ['c', 'b', 'b', 'c', 'b', 'c', 'a', 'a', 'a', 'a', 'a']
#But expected output is ['a','a','a','a','a','b','b','b','c','c','c']
p = "acaabbcb"
print(sorted(p, key=p.count))
# Output : ['c', 'c', 'a', 'a', 'a', 'b', 'b', 'b']
#Output is as expected
p = "ababab"
print(sorted(p, key=p.count))
# Output : ['a', 'b', 'a', 'b', 'a', 'b']
#But expected output is ['a','a','a','b','b','b']
What I have observed is that the above sort orders elements by their number of occurrences, but it only groups them properly if every element has a different count. If two or more elements occur the same number of times, they are listed in the same order in which they appear in the string/list.
Am I doing something wrong, or is there a better approach to this? I tried searching for answers to this issue but could not find any, so I am posting it here.
Thanks in advance.

Use a lambda function as your sorting key that returns a tuple: the first item is p.count(x), and the second is the element itself, which breaks ties between equal counts (and ends up being alphabetical):
p = "ababab"
sorted(p, key=lambda x: (p.count(x), x))
# ['a', 'a', 'a', 'b', 'b', 'b']
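Calling p.count inside the key rescans the string for every character. A possible variant counts once with collections.Counter and uses the same tuple-key idea; negating the count gives the descending order expected in the question's first example. The function name sort_by_count and its descending flag are assumptions made for this sketch.

```python
from collections import Counter

def sort_by_count(s, descending=False):
    """Sort characters by occurrence count, breaking ties alphabetically."""
    counts = Counter(s)              # counted once, looked up per character
    if descending:
        return sorted(s, key=lambda ch: (-counts[ch], ch))
    return sorted(s, key=lambda ch: (counts[ch], ch))

print(sort_by_count("ababab"))
# ['a', 'a', 'a', 'b', 'b', 'b']
print(sort_by_count("acaabbcabac", descending=True))
# ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']
```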

Removing duplicates (not by using set)

My data look like this:
let = ['a', 'b', 'a', 'c', 'a']
How do I remove the duplicates? I want my output to be something like this:
['b', 'c']
When I use the set function, I get:
set(['a', 'c', 'b'])
This is not what I want.

One option would be (as derived from Ritesh Kumar's answer here)
let = ['a', 'b', 'a', 'c', 'a']
onlySingles = [x for x in let if let.count(x) < 2]
which gives
>>> onlySingles
['b', 'c']

Try this,
>>> let
['a', 'b', 'a', 'c', 'a']
>>> dict.fromkeys(let).keys()
['a', 'c', 'b']
>>>

Sort the input, then removing duplicates becomes trivial:
data = ['a', 'b', 'a', 'c', 'a']
def uniq(data):
    last = None
    result = []
    for item in data:
        # In sorted input, equal items are adjacent
        if item != last:
            result.append(item)
            last = item
    return result
print(uniq(sorted(data)))
# prints ['a', 'b', 'c']
This is basically the shell's cat data | sort | uniq idiom.
The cost is O(N * log N), same as with a tree-based set.

Instead of sorting, or linearly scanning and re-counting the main list each time, count the number of occurrences once and then filter on items that appear only once:
>>> from collections import Counter
>>> let = ['a', 'b', 'a', 'c', 'a']
>>> [k for k, v in Counter(let).items() if v == 1]
['c', 'b']
You have to look at the sequence at least once regardless, although it makes sense to limit the number of times you do so.
If you really want to avoid any type of set or other hashed container (perhaps because you can't use them?), then yes, you can sort it, then use:
>>> from itertools import groupby, islice
>>> [k for k,v in groupby(sorted(let)) if len(list(islice(v, 2))) == 1]
['b', 'c']
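When hashed containers are acceptable and the original order should be kept, the counting approach can filter the input list itself rather than the Counter items. A minimal sketch, with only_singles as a hypothetical name:

```python
from collections import Counter

def only_singles(items):
    """Keep the items that occur exactly once, in their original order."""
    counts = Counter(items)          # one counting pass
    return [x for x in items if counts[x] == 1]

let = ['a', 'b', 'a', 'c', 'a']
print(only_singles(let))             # ['b', 'c']
```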
