How to order an array and count it in Python? - python

I want to find only the top 3 distinct items in descending order. If there's a tiebreaker, sort by alphabetical order. If there are 3 items or fewer, returning the distinct list of items is sufficient.
So if I have input of: ["a","a","b","b","c","c","c","d","d","d","d"]
The output will be ["d","c","a"]
Because d has 4 counts, c 3 counts, a and b have the same frequency, but a is alphabetically first.
In MySQL, I would usually use this:
SELECT id, COUNT(*) AS frequency FROM mylist GROUP BY id ORDER BY frequency DESC, id
How can I do that in Python?
I use this code based on SAI SANTOH CHIRAG's solution:
def main(output):
    arr = sorted(output,key=lambda i:[output.count(i),-ord(i)],reverse=True)
    out = []
    for i in arr:
        if i not in out: out.append(i)
        print(out[:3])
but why is the result like this:
Input (stdin) = a a a b b c d d d d
output = ['d']
['d']
['d']
['d']
['d', 'a']
['d', 'a']
['d', 'a']
['d', 'a', 'b']
['d', 'a', 'b']
['d', 'a', 'b']
instead of what I want, which would be:
['d','a','b']

You can use sorted with a key function for that. Try it this way:
arr = sorted(x,key=lambda i:[x.count(i),-ord(i)],reverse=True)
This gives you all the elements sorted by decreasing count and, for equal counts, in alphabetical order. Then do this to keep each element only once:
out = []
for i in arr:
    if i not in out:
        out.append(i)
print(out[:3])
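The repeated output shown in the question is what you would expect if print(out[:3]) runs inside the for loop, once per element of arr. A minimal sketch of the same approach with the result produced once, after the loop (the function name top_three is only illustrative, and the ord() trick assumes every item is a single character):

```python
def top_three(items):
    # Sort by descending count, breaking ties alphabetically.
    # Assumption: each item is a single character, because of ord().
    arr = sorted(items, key=lambda i: [items.count(i), -ord(i)], reverse=True)
    out = []
    for i in arr:
        if i not in out:
            out.append(i)
    # Slicing after the loop yields a single result instead of one per element.
    return out[:3]


print(top_three("a a a b b c d d d d".split()))  # ['d', 'a', 'b']
```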

collections.Counter will do:
from collections import Counter
the_list = ["a","a","b","b","c","c","c","d","d","d","d"]
counter = Counter(sorted(the_list))
top_3 = counter.most_common(3)
At this point, top_3 is a list of (<entry>, <freq>) tuples, e.g.
[('d', 4), ('c', 3), ('a', 2)]
Take out the first elements from it via list comprehension:
result = [item for item, freq in top_3]
and we get
['d', 'c', 'a']
Notes:
We pass the sorted list to Counter because most_common otherwise breaks ties by insertion order (the order in which elements were first encountered); sorting first makes that order alphabetical.
.most_common(3) returns at most 3 elements, so having fewer than 3 unique entries is fine. E.g. if the_list = ["b", "a"], result will be ["a", "b"].
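To see why the sort matters, here is a small sketch comparing the two tie orders. The votes list is made up for illustration, and the behaviour shown assumes Python 3.7 or later, where most_common is documented to keep equal counts in first-encountered order:

```python
from collections import Counter

votes = ["b", "b", "a", "a", "c"]  # hypothetical input: 'a' and 'b' are tied

# Without sorting, tied entries come out in first-seen order.
unsorted_top = [item for item, _ in Counter(votes).most_common(2)]

# Sorting first makes the first-seen order alphabetical.
sorted_top = [item for item, _ in Counter(sorted(votes)).most_common(2)]

print(unsorted_top)  # ['b', 'a']
print(sorted_top)    # ['a', 'b']
```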

You can use Counter from collections:
from collections import Counter
inputs = ["a","a","b","b","c","c","c","d","d","d","d"]
counts = Counter(inputs)
final = [i[0] for i in counts.most_common(3)]
Output of counts.most_common():
[('d', 4), ('c', 3), ('a', 2), ('b', 2)]

Simpler and more efficient than the accepted answer.
>>> a = ["a","a","b","b","c","c","c","d","d","d","d"]
>>> sorted(set(a), key=lambda s: (-a.count(s), s))[:3]
['d', 'c', 'a']
This removes duplicates first and thus counts each string only once. It is also much cleaner to negate the count than to negate the character code and reverse the sort. If the strings had multiple characters, you couldn't use the character code at all.
Another, using two simpler sorts (might actually be faster):
>>> sorted(sorted(set(a)), key=a.count, reverse=True)[:3]
['d', 'c', 'a']
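If the list is long, calling a.count once per distinct item rescans the list each time. A possible variant, sketched here with an illustrative helper name top_n, counts everything in one pass with Counter and keeps the same (-count, item) sort key, so it also works for multi-character strings:

```python
from collections import Counter


def top_n(items, n=3):
    counts = Counter(items)  # one pass over the data
    # Negative count puts the most frequent first; the item itself breaks ties.
    return sorted(counts, key=lambda s: (-counts[s], s))[:n]


print(top_n(["a","a","b","b","c","c","c","d","d","d","d"]))  # ['d', 'c', 'a']
print(top_n(["pear", "fig", "fig", "pear", "kiwi"]))         # ['fig', 'pear', 'kiwi']
```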

Related

List of index where corresponding elements of two lists are same

I want to compare two different lists and return the indexes at which the strings match.
For example, if I have two lists like:
grades = ['A', 'B', 'A', 'E', 'D']
scored = ['A', 'B', 'F', 'F', 'D']
My expected output is:
[0, 1, 4] #The indexes of similar strings in both lists
However this is the result I am getting at the moment:
[0, 1, 2, 4] #Problem: The 2nd index being counted again
I have tried coding this using two approaches.
First Approach:
def markGrades(grades, scored):
    indices = [i for i, item in enumerate(grades) if item in scored]
    return indices
Second Approach:
def markGrades(grades, scored):
    indices = []
    for i, item in enumerate(grades):
        if i in scored and i not in indices:
            indices.append(i)
    return indices
The second approach returns correct strings but not the indexes.
You can use enumerate along with zip in list comprehension to achieve this as:
>>> grades = ['A', 'B', 'A', 'E', 'D']
>>> scored = ['A', 'B', 'F', 'F', 'D']
>>> [i for i, (g, s) in enumerate(zip(grades, scored)) if g==s]
[0, 1, 4]
The issue with your code is that you are not comparing the elements at the same index. By using in, you are only checking whether an element of one list is present anywhere in the other list.
Because 'A' at index 2 of grades is present in the scored list, you get index 2 in your resulting list.
Your logic fails in that it doesn't check whether the elements are in the same position, merely that the grades element appears somewhere in scored. If you simply check corresponding elements, you can do this simply.
Using your second approach:
indices = []
for i, item in enumerate(grades):
    if item == scored[i]:
        indices.append(i)
The solution that Anonymous gives is what I was about to add as the "Pythonic" way to solve the problem.
You can access the two lists in pairs (to avoid the over-generalization of finding a match anywhere in the other array) with zip
grades = ['A', 'B', 'A', 'E', 'D']
scored = ['A', 'B', 'F', 'F', 'D']
matches = []
for ix, (gr, sc) in enumerate(zip(grades,scored)):
    if gr == sc:
        matches.append(ix)
or more compactly with list comprehension, if that suits your purpose
matches = [ix for ix, (gr, sc) in enumerate(zip(grades,scored)) if gr == sc]
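One detail worth knowing when the two lists can differ in length: zip stops at the shorter input, so trailing items are silently ignored rather than raising an IndexError the way scored[i] would. A small sketch (the helper name matching_indices is just for illustration):

```python
def matching_indices(first, second):
    # zip pairs items position by position and stops at the shorter list.
    return [i for i, (a, b) in enumerate(zip(first, second)) if a == b]


grades = ['A', 'B', 'A', 'E', 'D']
scored = ['A', 'B', 'F', 'F', 'D']
print(matching_indices(grades, scored))      # [0, 1, 4]
print(matching_indices(grades, scored[:2]))  # [0, 1]
```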

How to run a i<j loop in Python, and not repeat (j,i) if (i,j) has already been done?

I am trying to implement an "i not equal to j" (i<j) loop, which skips cases where i = j, but I would further like to make the additional requirement that the loop does not repeat the permutation of (j,i), if (i,j) has already been done (since, due to symmetry, these two cases give the same solution).
First Attempt
In the code to follow, I make the i<j loop by iterating through the following lists, where the second list is just the first list rolled with np.roll(mylist, 2):
mylist = ['a', 'b', 'c']
np.roll(mylist,2).tolist()  # ['b', 'c', 'a']
The sequence generated by the code below turns out to not be what I want:
import numpy as np
mylist = ['a', 'b', 'c']
for i in mylist:
    for j in np.roll(mylist,2).tolist():
        print(i,j)
since it returns self-pairs such as a a and repeats each pair in both orders, such as a b and b a:
a b
a c
a a
b b
b c
b a
c b
c c
c a
The desired sequence should instead be the pair-wise combinations of the elements in mylist, since for N=3 elements, there should only be N*(N-1)/2 = 3 pairs to loop through:
a b
a c
b c
You can do this using quite a hacky method just by removing the first element and appending it:
mylist.append(mylist.pop(0))
Where .append(...) will append an element to the end of a list, and .pop(...) will remove an element from a given index and return it.
You can read up on the built-in list methods in the Data Structures section of the Python tutorial.
You can use list.pop together with list.append and list.insert to do a left shift or a right shift.
list.pop removes an element from the original list and returns it. list.append adds the returned element at the end of the list, and list.insert adds it at a given index (0 for the right shift below). Note that list.insert(-1, ...) would place the element before the last one, not at the end. NOTE: these operations are in place!
#Left shift
mylist = ['apples', 'guitar', 'shirt']
mylist.append(mylist.pop(0))
mylist
### ['guitar', 'shirt', 'apples']
#Right shift
mylist = ['apples', 'guitar', 'shirt']
mylist.insert(0,mylist.pop(-1))
mylist
### ['shirt', 'apples', 'guitar']
A better way to do this is with collections.deque. This will allow you to work with multiple shifts and has some other neat queue functions available as well.
from collections import deque
mylist = ['apples', 'guitar', 'shirt']
q = deque(mylist)
q.rotate(1)   # Right shift by 1: deque(['shirt', 'apples', 'guitar'])
q.rotate(-1)  # Left shift by 1, undoing the previous call: deque(['apples', 'guitar', 'shirt'])
q.rotate(3)   # Right shift by 3, a full turn for 3 elements: deque(['apples', 'guitar', 'shirt'])
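If you would rather not mutate the list or convert it to a deque, slicing can build a rotated copy. This is a sketch with a hypothetical helper named rotated, following the same sign convention as deque.rotate (positive shifts right, negative shifts left):

```python
def rotated(items, k):
    """Return a new list rotated right by k positions (left if k is negative)."""
    if not items:
        return []
    k %= len(items)  # normalise, so k may exceed the length or be negative
    return items[-k:] + items[:-k]


mylist = ['apples', 'guitar', 'shirt']
print(rotated(mylist, 1))   # ['shirt', 'apples', 'guitar']
print(rotated(mylist, -1))  # ['guitar', 'shirt', 'apples']
print(mylist)               # ['apples', 'guitar', 'shirt'] (unchanged)
```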
EDIT: Based on your comments, you are trying to get permutations -
from itertools import product
l = ['a', 'b', 'c']
[i for i in product(l, repeat=2) if len(set(i))>1]
[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
OR
for i in l:
    for j in l:
        if len(set([i,j]))>1:
            print(i,j)
a b
a c
b a
b c
c a
c b
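The output above still contains both (a, b) and (b, a), whereas the question asks for each unordered pair once. A sketch of two ways to get exactly the N*(N-1)/2 pairs, using itertools.combinations from the standard library or an index loop in which j always starts one past i:

```python
from itertools import combinations

mylist = ['a', 'b', 'c']

# combinations yields each unordered pair once, in input order.
for i, j in combinations(mylist, 2):
    print(i, j)

# Equivalent index-based loop: j always starts at i + 1, so i < j holds.
pairs = [(mylist[i], mylist[j])
         for i in range(len(mylist))
         for j in range(i + 1, len(mylist))]
print(pairs)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```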

counting a permutation 4 characters long with 2 characters on each result

from itertools import permutations
perm=permutations(['A','B','C','C','D','D','D','D'],4)
for i in perm:
    print (i)
How can I print only those permutations in perm that are made up of exactly 2 distinct letters? (pardon my English)
Example: ADDD, DADD, BDDD, CCDD, CDDD, etc. (only 2 distinct characters in every permutation)
I think for this you will have to generate all permutations, and filter down to the condition you want.
Keep this bit the same:
from itertools import permutations
perm = permutations(['A', 'B', 'C', 'C', 'D', 'D', 'D', 'D'], 4)
But then keep only the elements which satisfy your condition, by using a list comprehension. Convert the element to a set, and check the length of the set. An element like ('A', 'B', 'B') gets converted to {'A', 'B'}.
perm = [x for x in perm if len(set(x))==2]
for i in perm:
    if len(set(list(i))) == 2:
        print (i)
Let's say i is ABAA.
list(i) will result in [A,B,A,A]
set() of that will result in {A,B}
then len() of that will be 2
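Because the input list contains repeated letters, permutations treats each 'C' and each 'D' as a separate item, so the same tuple is produced many times. If duplicates are unwanted, one option (a sketch, with variable names chosen for illustration) is to collect the joined strings in a set:

```python
from itertools import permutations

letters = ['A', 'B', 'C', 'C', 'D', 'D', 'D', 'D']

# The set comprehension drops repeated results; sorted() gives a stable order.
two_letter_words = sorted({
    "".join(p) for p in permutations(letters, 4) if len(set(p)) == 2
})

print(len(two_letter_words))
print(two_letter_words[:5])
```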

return a list of items without any elements with the same value next to each other

Implement the function unique_in_order which takes as argument a sequence and returns a list of items without any elements with the same value next to each other and preserving the original order of elements.
For example:
unique_in_order('AAAABBBCCDAABBB') == ['A', 'B', 'C', 'D', 'A', 'B']
unique_in_order('ABBCcAD') == ['A', 'B', 'C', 'c', 'A', 'D']
unique_in_order([1,2,2,3,3]) == [1,2,3]
My code returns the correct output for these examples:
def unique_in_order(iterable):
    list = []
    for i in range(0, len(iterable)):
        if iterable[i] != iterable[i-1]:
            list.append(iterable[i])
    return list
It passes the sample tests but fails on submission, saying:
should work with one element:
[] should equal ['A']
should reduce duplicates:
[] should equal ['A']
I want to know what is wrong with my code, thanks.
Use existing libraries to perform that task, like itertools.groupby
import itertools
def unique_in_order(iterable):
    return [k for k,_ in itertools.groupby(iterable)]
print(unique_in_order('AAAABBBCCDAABBB')) # ['A', 'B', 'C', 'D', 'A', 'B']
print(unique_in_order(['A'])) # ['A']
With the default group key, groupby groups identical consecutive elements, yielding tuples with the value and the group of values (which we ignore here; we just need the key).
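As for what goes wrong in the question's loop: at i = 0, iterable[i-1] is iterable[-1], the last element. Whenever the first and last elements are equal (always the case for a single element), the first item is skipped, which matches the "[] should equal ['A']" failures. A loop-based sketch that avoids this by comparing with the last item kept instead:

```python
def unique_in_order(iterable):
    result = []
    for item in iterable:
        # Compare with the last kept item rather than iterable[i - 1],
        # so the first element is never compared with the final one.
        if not result or item != result[-1]:
            result.append(item)
    return result


print(unique_in_order('A'))            # ['A']
print(unique_in_order('AA'))           # ['A']
print(unique_in_order([1, 2, 2, 3, 3]))  # [1, 2, 3]
```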

Increment the next element based on previous element

When looping through a list, you can work with the current item of the list. For example, if you want to replace certain items with others, you can use:
a=['a','b','c','d','e']
b=[]
for i in a:
    if i=='b':
        b.append('replacement')
    else:
        b.append(i)
print(b)
['a', 'replacement', 'c', 'd', 'e']
However, I wish to replace certain values based not on the item at index i, but on the item at index i+1. I've been trying for ages and I can't seem to make it work. I would like something like this:
c=['a','b','c','d','e']
d=[]
for i in c:
    if i+1=='b':
        d.append('replacement')
    else:
        d.append(i)
print(d)
d=['replacement','b','c','d','e']
Is there any way to achieve this?
Use a list comprehension along with enumerate
>>> ['replacement' if a[i+1]=='b' else v for i,v in enumerate(a[:-1])]+[a[-1]]
['replacement', 'b', 'c', 'd', 'e']
The code replaces all those elements where the next element is b. However to take care of the last index and prevent IndexError, we just append the last element and loop till the penultimate element.
Without a list comprehension
a=['a','b','c','d','e']
d=[]
for i,v in enumerate(a[:-1]):
    if a[i+1]=='b':
        d.append('replacement')
    else:
        d.append(v)
d.append(a[-1])
print(d)
It's generally better style to not iterate over indices in Python. A common way to approach a problem like this is to use zip (or the similar zip_longest in itertools, called izip_longest in Python 2) to see multiple values at once:
In [32]: from itertools import zip_longest
In [33]: a=['a','b','c','d','e']
In [34]: b = []
In [35]: for c, nxt in zip_longest(a, a[1:]):
   ....:     if nxt == 'd':
   ....:         b.append("replacement")
   ....:     else:
   ....:         b.append(c)
   ....:
In [36]: b
Out[36]: ['a', 'b', 'replacement', 'd', 'e']
I think there's a confusion in your post between the list indices and list elements. In the loop as you have written it i will be the actual element (e.g. 'b') and not the index, thus i+1 is meaningless and will throw a TypeError exception.
I think one of the smallest set of changes you can do to your example to make it work is:
c = ['a', 'b', 'c', 'd', 'e']
d = []
for i, el in enumerate(c[:-1]):
    if c[i + 1] == 'b':
        d.append('replacement')
    else:
        d.append(el)
print(d)
# Output...
# ['replacement', 'b', 'c', 'd']
Additionally it's undefined how you should deal with the boundaries. Particularly when i points to the last element 'e', what should i+1 point to? There are many possible answers here. In the example above I've chosen one option, which is to end the iteration one element early (so we never point to the last element e).
If I was doing this I would do something similar to a combination of the other answers:
c = ['a', 'b', 'c', 'd', 'e']
d = ['replacement' if nxt == 'b' else current
     for current, nxt in zip(c[:-1], c[1:])]
print(d)
# Output...
# ['replacement', 'b', 'c', 'd']
where I have used a list comprehension to avoid the loop, and zip on the list and a shifted list to avoid the explicit indices.
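Try using the index of the current element to check the next element in the list.
Replace
if i+1=='b':
with
if c[c.index(i)+1]=='b':
Note that this raises an IndexError on the last element, and c.index(i) returns only the first occurrence of i, so it gives wrong results if the list contains duplicates.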
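Pulling the answers together, here is a sketch that keeps the last element (as in the desired output) without using explicit indices. It pairs each item with its successor via itertools.zip_longest and pads the final pair with a private sentinel. The function name replace_before and the _MISSING sentinel are illustrative choices, not something taken from the answers above:

```python
from itertools import zip_longest

_MISSING = object()  # sentinel that can never equal a real list item


def replace_before(items, target, replacement='replacement'):
    # Pair every item with the one after it; the last item gets _MISSING.
    return [
        replacement if following == target else current
        for current, following in zip_longest(items, items[1:], fillvalue=_MISSING)
    ]


print(replace_before(['a', 'b', 'c', 'd', 'e'], 'b'))
# ['replacement', 'b', 'c', 'd', 'e']
```

Unlike the c.index(i) approach, this also handles lists with repeated values, because each item is paired with its actual neighbour rather than looked up by value.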
