Countletters(sorted) - python

Below is my code for counting letters. I need the output to be
[('e', 1), ('g', 2), ('l', 1), ('o', 2)]
but my output is
[('e', 1), ('g', 2), ('g', 2), ('l', 1), ('o', 2), ('o', 2)]
This is my code:
def countLetters(word):
    word=list(word)
    word.sort()
    trans=[]
    for j in word:
        row=[]
        a=word.count(j)
        row.append(j)
        row.append(a)
        trans.append(tuple(row))
    return trans
Can anyone explain how to get the expected output with my code?
Thank you

Why not just use a Counter?
Example:
from collections import Counter
c = Counter("Foobar")
print(sorted(c.items()))
Output:
[('F', 1), ('a', 1), ('b', 1), ('o', 2), ('r', 1)]
Another way is to use a dict or, better, a defaultdict (useful on Python 2.6 or lower, since Counter was added in Python 2.7).
Example:
from collections import defaultdict
def countLetters(word):
    d = defaultdict(int)  # missing keys start at 0
    for j in word:
        d[j] += 1
    return sorted(d.items())
print(countLetters("Foobar"))
Output:
[('F', 1), ('a', 1), ('b', 1), ('o', 2), ('r', 1)]
Or use a simple generator expression over the distinct letters:
word = "Foobar"
print(sorted((letter, word.count(letter)) for letter in set(word)))

>>> from collections import Counter
>>> Counter('google')
Counter({'o': 2, 'g': 2, 'e': 1, 'l': 1})
>>> from operator import itemgetter
>>> sorted(Counter('google').items(), key=itemgetter(0))
[('e', 1), ('g', 2), ('l', 1), ('o', 2)]
>>>
Actually, there is no need for key:
>>> sorted(Counter('google').items())
[('e', 1), ('g', 2), ('l', 1), ('o', 2)]
As tuples are sorted first by the first item, then by the second, etc.
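To illustrate that ordering rule, here is a small sketch. The list `pairs` is made up for the example and is not from the question.

```python
from collections import Counter

# Hypothetical data: two pairs share the first item 'g'.
pairs = [('g', 2), ('e', 1), ('g', 1), ('o', 2)]

# Tuples compare element by element: first item, then second item on ties.
ordered = sorted(pairs)
print(ordered)
# [('e', 1), ('g', 1), ('g', 2), ('o', 2)]

# The same rule is why Counter items sort by letter without a key function.
by_letter = sorted(Counter('google').items())
print(by_letter)
# [('e', 1), ('g', 2), ('l', 1), ('o', 2)]
```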

def countLetters(word):
    k=[]
    Listing=[]
    Cororo=[]
    # collect each distinct letter once, in order of first appearance
    for warm in word:
        if warm not in k:
            k.append(warm)
    # count every distinct letter and pair it with its count
    for cold in range(len(k)):
        Listing.append(word.count(k[cold]))
        Cororo.append((k[cold],Listing[cold]))
    return sorted(Cororo)
This is a somewhat old-fashioned way of doing it, since you can use the Counter class from the collections module, as in the answer above, and make life easier.

You can modify your code like this (Python 2.5+):
def countLetters(word):
    word=list(word)
    word.sort()
    trans=[]
    for j in word:
        row=[]
        a=word.count(j)
        row.append(j)
        row.append(a)
        trans.append(tuple(row))
    ans = list(set(trans))
    ans.sort()
    return ans

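The problem is that you're not accounting for duplicate occurrences of the letters in your j loop: a letter that appears twice is visited twice, so its pair is appended twice.
I think a quick fix is to change the iteration to for j in sorted(set(word)).
This ensures each letter is visited once, and sorted() keeps the output alphabetical, since a set has no defined order.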
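A minimal sketch of that quick fix, with `sorted()` wrapped around the set so the result comes out alphabetically (sets have no defined order). The name `count_letters` is chosen for this example only.

```python
def count_letters(word):
    """Return sorted (letter, count) pairs, one pair per distinct letter."""
    pairs = []
    # sorted(set(word)) visits each distinct letter once, in alphabetical order
    for letter in sorted(set(word)):
        pairs.append((letter, word.count(letter)))
    return pairs

print(count_letters("google"))
# [('e', 1), ('g', 2), ('l', 1), ('o', 2)]
```

Note that `word.count` rescans the string for every distinct letter, which is fine for short words but slower than a single-pass Counter on long inputs.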

trans = sorted(set(trans))
Converting a list to a set removes duplicates (which I think is what you want to do), and sorted() turns the set back into a list in alphabetical order.

Related

Find the most common element in list of lists

This is my list
a=[ ['a','b','a','a'],
['c','c','c','d','d','d']]
I want to find the most common elements.
I have tried this:
from collections import Counter
words = ['hello', 'hell', 'owl', 'hello', 'world', 'war', 'hello',
         'war','aa','aa','aa','aa']
counter_obj = Counter(words)
counter_obj.most_common()
but it only works for a flat list.
My output should look like this:
[('a', 3), ('c', 3), ('d', 3), ('b', 1)]
Apply Counter.update() to the elements of your list, based on a suggestion from @BlueSheepToken:
from collections import Counter
words = [['a','b','a','a'],['c','c','c','d','d','d']]
counter = Counter(words[0])
for i in words[1:]:
    counter.update(i)
counter.most_common()
output:
[('a', 3), ('c', 3), ('d', 3), ('b', 1)]
itertools.chain.from_iterable
collections.Counter accepts any iterable of hashable elements. So you can chain your list of lists via itertools.chain. The benefit of this solution is it works for any number of sublists.
from collections import Counter
from itertools import chain
counter_obj = Counter(chain.from_iterable(a))
print(counter_obj.most_common())
[('a', 3), ('c', 3), ('d', 3), ('b', 1)]

Compare List of Tuples and Return Indices of Matched Values

I'm new to programming and am having some trouble with this exercise. The goal is to write a function that returns a list of matching items.
Items are defined by a tuple with a letter and a number and we consider item 1 to match item 2 if:
Both their letters are vowels (aeiou), or both are consonants
AND
The sum of their numbers is a multiple of 3
NOTE: The return list should not include duplicate matches --> (1,2) contains the same information as (2,1), the output list should only contain one of them.
Here's an example:
***input:*** [('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f',6)]
***output:*** [(0,4), (1,2), (3,5)]
Any help would be much appreciated!
from itertools import combinations
lst = [('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f',6)]
vowels = 'aeiou'
matched = [(i[0],j[0]) for (i,j) in combinations(enumerate(lst),2) if (i[1][0] in vowels) == (j[1][0] in vowels) and ((i[1][1] + j[1][1]) % 3 == 0)]
print(matched)
Sorry, I don't have enough rep to comment, but I'll edit/update once I can.
I'm a little confused about the question: what is the purpose of the letters? Should we be using their position in the alphabet as their value, i.e. a=0, b=1?
What are we comparing one tuple to?
Thanks
You can use itertools.combinations with enumerate to iterate all combinations and output indices. Combinations do not include permutations, so you will not see duplicates.
from itertools import combinations
lst = [('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f',6)]
def checker(lst):
    vowels = set('aeiou')
    for (idx_i, i), (idx_j, j) in combinations(enumerate(lst), 2):
        if ((i[0] in vowels) == (j[0] in vowels)) and ((i[1] + j[1]) % 3 == 0):
            yield idx_i, idx_j
res = list(checker(lst))
# [(0, 4), (1, 2), (3, 5)]

python: dedup and count a given list

I am using the following code to dedup and count a given list:
def my_dedup_count(l):
    l.append(None)
    new_l = []
    current_x = l[0]
    current_count = 1
    for x in l[1:]:
        if x == current_x:
            current_count += 1
        else:
            new_l.append((current_x, current_count))
            current_x = x
            current_count = 1
    return new_l
With my testing code:
my_test_list = ['a','a','b','b','b','c','c','d']
my_dedup_count(my_test_list)
result is:
[('a', 2), ('b', 3), ('c', 2), ('d', 1)]
The code works and the output is correct. However, I feel my code is quite lengthy. Can anyone suggest a more elegant way to write it? Thanks!
Yes, don't re-invent the wheel. Use the standard library instead; you want to use the collections.Counter() class here:
from collections import Counter
def my_dedup_count(l):
    return Counter(l).items()
You may want to just return the counter itself and use all the functionality it provides (such as giving you a key-count list sorted by counts).
If you expected only consecutive runs to be counted (so ['a', 'b', 'a'] results in [('a', 1), ('b', 1), ('a', 1)]), then use itertools.groupby():
from itertools import groupby
def my_dedup_count(l):
    return [(k, sum(1 for _ in g)) for k, g in groupby(l)]
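To make the difference between the two approaches concrete, here is a short side-by-side sketch on one input. The list `data` and the names `totals` and `runs` are made up for this illustration.

```python
from collections import Counter
from itertools import groupby

# Hypothetical input where 'a' appears in two separate runs.
data = ['a', 'a', 'b', 'a']

# Counter tallies every occurrence, wherever it appears;
# most_common() returns the pairs sorted by count, highest first.
totals = Counter(data).most_common()
print(totals)
# [('a', 3), ('b', 1)]

# groupby only merges neighbouring equal items, so the trailing 'a'
# becomes its own run.
runs = [(k, sum(1 for _ in g)) for k, g in groupby(data)]
print(runs)
# [('a', 2), ('b', 1), ('a', 1)]
```

Which one is right depends on whether the input is guaranteed to be grouped already, as in the question's test list, where both give the same pairs.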
I wrote two versions of some shorter ways to write what you accomplished.
This first option ignores ordering, and all like values in the list will be deduplicated.
from collections import defaultdict
def my_dedup_count(test_list):
    foo = defaultdict(int)
    for el in test_list:
        foo[el] += 1
    return foo.items()
my_test_list = ['a','a','b','b','b','c','c','d', 'a', 'a', 'd']
>>> [('a', 4), ('c', 2), ('b', 3), ('d', 2)]
This second option respects order and only deduplicates consecutive duplicate values.
def my_dedup_count(my_test_list):
    output = []
    succession = 1
    for idx, el in enumerate(my_test_list):
        if idx+1 < len(my_test_list) and el == my_test_list[idx+1]:
            succession += 1
        else:
            output.append((el, succession))
            succession = 1
    return output
my_test_list = ['a','a','b','b','b','c','c','d', 'a', 'a', 'd']
>>> [('a', 2), ('b', 3), ('c', 2), ('d', 1), ('a', 2), ('d', 1)]

Iterate over OrderedDict in Python

I have the following OrderedDict:
OrderedDict([('r', 1), ('s', 1), ('a', 1), ('n', 1), ('y', 1)])
This represents the frequency of each letter in a word.
In the first step, I take the last two elements and create a union tuple like this:
pair1 = list.popitem()
pair2 = list.popitem()
merge_list = (pair1[0],pair2[0])
new_pair = {}
new_pair[merge_list] = str(pair1[1] + pair2[1])
list.update(new_pair)
This gave me the following OrderedDict:
OrderedDict([('r', 1), ('s', 1), ('a', 1), (('y', 'n'), '2')])
I would now like to iterate over the elements, each time taking the last three and merging the pair with the lowest sum of values.
For instance, the dictionary above would become:
OrderedDict([('r', 1), (('s', 'a'), '2'), (('y', 'n'), '2')])
but if it had been:
OrderedDict([ ('r', 1), ('s', 2), ('a', 1), (('y', 'n'), '2')])
the result would be:
OrderedDict([('r', 1), ('s', 2), (('a','y', 'n'), '3')])
because I want the elements on the left to have the smaller values.
I tried to do this myself but don't understand how to iterate over an OrderedDict from end to beginning.
How can I do it?
EDITED
Answering the comment:
I get a dictionary of frequency of a letter in a sentence:
{ 's':1, 'a':1, 'n':1, 'y': 1}
and need to create a huffman tree from it.
for instance:
((s,a),(n,y))
I am using python 3.3
Simple example
from collections import OrderedDict
d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['c'] = 3
for key, value in d.items():
    print(key, value)
Output:
a 1
b 2
c 3
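How to iterate from end to beginning over an OrderedDict?
Either (list() is needed on Python 3, where items() returns a view that cannot be sliced):
z = OrderedDict( ... )
for item in list(z.items())[::-1]:
    # operate on item
Or (works on Python 2, and on Python 3.5+ where OrderedDict views support reversed(); on older Python 3, wrap the view in list() first):
z = OrderedDict( ... )
for item in reversed(z.items()):
    # operate on item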
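As a runnable sketch of reverse iteration that does not depend on the Python 3 minor version, the items can be copied into a list first. The sample dictionary and the name `backwards` are made up for this example, and the copy costs O(n) extra memory.

```python
from collections import OrderedDict

# Hypothetical sample data, in insertion order r, s, a.
d = OrderedDict([('r', 1), ('s', 1), ('a', 1)])

# list(...) materialises the items, so reversed() gets a real sequence.
backwards = list(reversed(list(d.items())))
for key, value in backwards:
    print(key, value)
# a 1
# s 1
# r 1

# reversed(d) on the OrderedDict itself walks the keys from last to first.
print(list(reversed(d)))
# ['a', 's', 'r']
```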
You can iterate using enumerate and iteritems (Python 2; on Python 3 use items() instead):
d = OrderedDict()
# ...
for i, (key, value) in enumerate(d.iteritems()):
    # Do what you want here
For Python 3.x
d = OrderedDict( ... )
for key, value in d.items():
    print(key, value)
For Python 2.x
d = OrderedDict( ... )
for key, value in d.iteritems():
    print key, value
Note that, as noted in the comments by adsmith, this is probably an instance of an XY Problem and you should reconsider your data structures.
Having said that, if you need to operate only on last three elements, then you don't need to iterate. For example:
from collections import OrderedDict, namedtuple
MergeInfo = namedtuple('MergeInfo', ['sum', 'toMerge1', 'toMerge2', 'toCopy'])
def mergeLastThree(letters):
    if len(letters) < 3:
        return False
    # popitem() removes and returns the last (key, value) pair
    last = letters.popitem()
    last_1 = letters.popitem()
    last_2 = letters.popitem()
    # the three possible pairings, each with the element left unmerged
    sum01 = MergeInfo(int(last[1]) + int(last_1[1]), last, last_1, last_2)
    sum12 = MergeInfo(int(last_1[1]) + int(last_2[1]), last_1, last_2, last)
    sum02 = MergeInfo(int(last[1]) + int(last_2[1]), last, last_2, last_1)
    # pick the pairing with the smallest combined value
    mergeInfo = min((sum01, sum12, sum02), key = lambda s: s.sum)
    merged = ((mergeInfo.toMerge1[0], mergeInfo.toMerge2[0]), str(mergeInfo.sum))
    letters[merged[0]] = merged[1]
    letters[mergeInfo.toCopy[0]] = mergeInfo.toCopy[1]
    return True
Then having:
letters = OrderedDict([('r', 1), ('s', 1), ('a', 1), ('n', 1), ('y', 1)])
print(letters)
mergeLastThree(letters)
print(letters)
mergeLastThree(letters)
print(letters)
Produces:
>>> OrderedDict([('r', 1), ('s', 1), ('a', 1), ('n', 1), ('y', 1)])
OrderedDict([('r', 1), ('s', 1), (('y', 'n'), '2'), ('a', 1)])
OrderedDict([('r', 1), (('a', 's'), '2'), (('y', 'n'), '2')])
And to merge the whole structure completely you need to just:
print(letters)
while mergeLastThree(letters):
    pass
print(letters)
Which gives:
>>> OrderedDict([('r', 1), ('s', 1), ('a', 1), ('n', 1), ('y', 1)])
OrderedDict([((('a', 's'), 'r'), '3'), (('y', 'n'), '2')])
>>>
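Following up on the remark about reconsidering the data structure: since the stated goal is a Huffman tree, a priority queue is the usual alternative to merging inside an OrderedDict. The sketch below swaps in the standard-library `heapq` module, which is a different technique from the `mergeLastThree` function above. The function name `huffman_tree` and the tie-breaking counter are assumptions made for this example.

```python
import heapq
from collections import OrderedDict
from itertools import count

def huffman_tree(freqs):
    """Build a nested-tuple Huffman tree from a {symbol: frequency} mapping."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in freqs.items()]
    if not heap:
        return None
    heapq.heapify(heap)
    while len(heap) > 1:
        # Pop the two lightest subtrees and merge them into one node.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    return heap[0][2]

# The frequencies from the question, in a fixed order so ties break predictably.
freqs = OrderedDict([('s', 1), ('a', 1), ('n', 1), ('y', 1)])
tree = huffman_tree(freqs)
print(tree)
# (('s', 'a'), ('n', 'y'))
```

The counter in the middle of each heap entry means two entries with equal weight are never compared by their symbol or subtree, which would otherwise fail when a string meets a tuple.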

Smart way to delete tuples

I have a list of tuples as described below (the list is sorted in decreasing order of the second value):
from string import ascii_letters
myTup = zip (ascii_letters, range(10)[::-1])
threshold = 5.5
>>> myTup
[('a', 9), ('b', 8), ('c', 7), ('d', 6), ('e', 5), ('f', 4), ('g', 3), ('h', 2), \
('i', 1), ('j', 0)]
Given a threshold, what is the best way to discard all tuples whose second value is less than the threshold?
I have more than 5 million tuples, so I don't want to compare them tuple by tuple and then delete each one or copy it to another list of tuples.
Since the tuples are sorted, you can simply search for the first tuple with a value lower than the threshold, and then delete the remaining values using slice notation:
index = next(i for i, (t1, t2) in enumerate(myTup) if t2 < threshold)
del myTup[index:]
As Vaughn Cato points out, a binary search would speed things up even more. bisect.bisect would be useful, except that it won't work with your current data structure unless you create a separate key sequence, as documented here. But that violates your prohibition on creating new lists.
Still, you could use the source code as the basis for your own binary search. Or, you could change your data structure:
>>> myTup
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'),
(6, 'g'), (7, 'h'), (8, 'i'), (9, 'j')]
>>> index = bisect.bisect(myTup, (threshold, None))
>>> del myTup[:index]
>>> myTup
[(6, 'g'), (7, 'h'), (8, 'i'), (9, 'j')]
The disadvantage here is that the deletion may occur in linear time, since Python will have to shift the entire block of memory back... unless Python is smart about deleting slices that start from 0. (Anyone know?)
Finally, if you're really willing to change your data structure, you could do this:
[(-9, 'a'), (-8, 'b'), (-7, 'c'), (-6, 'd'), (-5, 'e'), (-4, 'f'),
(-3, 'g'), (-2, 'h'), (-1, 'i'), (0, 'j')]
>>> index = bisect.bisect(myTup, (-threshold, None))
>>> del myTup[index:]
>>> myTup
[(-9, 'a'), (-8, 'b'), (-7, 'c'), (-6, 'd')]
(Note that Python 3 will complain about the None comparison, so you could use something like (-threshold, chr(0)) instead.)
My suspicion is that the linear time search I suggested at the beginning is acceptable in most circumstances.
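For the option of writing your own binary search on the original, descending data, a hand-rolled sketch might look like the following. The function name `first_below` is hypothetical, and the code assumes the list is sorted by second value in decreasing order, as in the question.

```python
def first_below(pairs, threshold):
    """Index of the first pair whose second value is < threshold.

    Assumes pairs is sorted by second value in descending order.
    Returns len(pairs) when no value is below the threshold.
    """
    lo, hi = 0, len(pairs)
    while lo < hi:
        mid = (lo + hi) // 2
        if pairs[mid][1] < threshold:
            hi = mid          # mid is already below the threshold; look left
        else:
            lo = mid + 1      # mid is kept; the cut point is to the right
    return lo

my_tup = list(zip('abcdefghij', range(9, -1, -1)))
del my_tup[first_below(my_tup, 5.5):]
print(my_tup)
# [('a', 9), ('b', 8), ('c', 7), ('d', 6)]
```

The search takes O(log n) comparisons and creates no key list, and deleting a tail slice does not shift any remaining elements.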
Here's an exotic approach that wraps the list in a list-like object before performing bisect.
import bisect
def revkey(items):
    class Items:
        def __getitem__(self, index):
            assert 0 <= index < _len
            return items[_max-index][1]
        def __len__(self):
            return _len
        def bisect(self, value):
            return _len - bisect.bisect_left(self, value)
    _len = len(items)
    _max = _len-1
    return Items()
tuples = [('a', 9), ('b', 8), ('c', 7), ('d', 6), ('e', 5), ('f', 4), ('g', 3), ('h', 2), ('i', 1), ('j', 0)]
for x in range(-2, 12):
    assert len(tuples) == 10
    t = tuples[:]
    stop = revkey(t).bisect(x)
    del t[stop:]
    assert t == [item for item in tuples if item[1] >= x]
Maybe slightly faster code than @Curious's:
newTup=[]
for tup in myTup:
    if tup[1]>threshold:
        newTup.append(tup)
    else:
        break
Because the tuples are ordered, you do not have to go through all of them.
Another possibility would be to use bisection to find the index i of the first element that is not above the threshold. Then you would do:
newTup=myTup[:i]
I think this last method would be the fastest.
Given the number of tuples you're dealing with, you may want to consider using NumPy.
Define a structured array like
import numpy as np
my_array = np.array(myTup, dtype=[('f0', '|S10'), ('f1', float)])
You can access the second elements of your tuples with my_array['f1'], which gives you a float array. You can now use fancy indexing techniques to filter the elements you want, like
my_array[my_array['f1'] >= threshold]
keeping only the entries where f1 is at least the threshold, i.e. discarding those below it.
You can also use itertools (Python 2), e.g.
from itertools import ifilter
iterable_filtered = ifilter(lambda x: x[1] > threshold, myTup)
if you want a lazily filtered iterable, or just:
filtered = filter(lambda x: x[1] > threshold, myTup)
to go straight to a list. (On Python 3, ifilter is gone and the built-in filter is itself lazy, so wrap it in list() to get a list.)
I'm not too familiar with the relative performance of these methods and would have to test them (e.g. in IPython using %timeit).
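Outside IPython, the standard-library `timeit` module can run the same kind of comparison. This is only a sketch: the data size, the threshold and the two function names are arbitrary choices for the example, and the timings will vary by machine, so none are shown.

```python
import timeit

# Hypothetical test data: second values descend from 100000 to 1.
my_tup = list(zip(range(100000), range(100000, 0, -1)))
threshold = 50000.5

def with_filter():
    # Python 3: filter() is lazy, so list() forces the full result.
    return list(filter(lambda x: x[1] > threshold, my_tup))

def with_comprehension():
    return [t for t in my_tup if t[1] > threshold]

# Both approaches must agree before their speed is worth comparing.
assert with_filter() == with_comprehension()

for fn in (with_filter, with_comprehension):
    # timeit accepts a zero-argument callable and returns total seconds.
    print(fn.__name__, timeit.timeit(fn, number=5))
```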
