python: dedup and count a given list - python

I am using the following code to dedup and count a given list:
def my_dedup_count(l):
    # Sentinel so the final run is flushed (note: this mutates the caller's list)
    l.append(None)
    new_l = []
    current_x = l[0]
    current_count = 1
    for x in l[1:]:
        if x == current_x:
            current_count += 1
        else:
            new_l.append((current_x, current_count))
            current_x = x
            current_count = 1
    return new_l
With my testing code:
my_test_list = ['a','a','b','b','b','c','c','d']
my_dedup_count(my_test_list)
result is:
[('a', 2), ('b', 3), ('c', 2), ('d', 1)]
The code works and the output is correct. However, it feels quite lengthy, and I am wondering whether anyone could suggest a more elegant way to write it. Thanks!

Yes, don't re-invent the wheel. Use the standard library instead; you want to use the collections.Counter() class here:
from collections import Counter
def my_dedup_count(l):
    return Counter(l).items()
You may want to just return the counter itself and use all functionality it provides (such as giving you a key-count list sorted by counts).
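As a small sketch of that suggestion, Counter.most_common() returns the (value, count) pairs sorted by descending count. The sample list is the one from the question; the variable names counts, by_count and top_one are only illustrative.

```python
from collections import Counter

my_test_list = ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
counts = Counter(my_test_list)

# most_common() sorts the pairs by count, highest first;
# values with equal counts keep the order in which they were first seen
by_count = counts.most_common()
print(by_count)

# most_common(n) returns only the n most frequent values
top_one = counts.most_common(1)
print(top_one)
```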
If you expected only consecutive runs to be counted (so ['a', 'b', 'a'] results in [('a', 1), ('b', 1), ('a', 1)]), then use itertools.groupby():
from itertools import groupby
def my_dedup_count(l):
    return [(k, sum(1 for _ in g)) for k, g in groupby(l)]
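To make the difference between the two approaches concrete, here is a minimal sketch that runs both on a list where 'a' appears in two separate runs. The input list and variable names are made up for illustration.

```python
from collections import Counter
from itertools import groupby

data = ['a', 'a', 'b', 'a']

# Counter tallies every occurrence, regardless of position
total_counts = sorted(Counter(data).items())

# groupby only merges adjacent equal items, so 'a' shows up twice
run_counts = [(k, sum(1 for _ in g)) for k, g in groupby(data)]

print(total_counts)
print(run_counts)
```

Which one is right depends on whether the position of the runs matters for your data.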

Here are two shorter ways to write what you accomplished.
This first option ignores ordering, and all like values in the list will be deduplicated.
from collections import defaultdict
def my_dedup_count(test_list):
    foo = defaultdict(int)
    for el in test_list:
        foo[el] += 1
    return foo.items()
my_test_list = ['a','a','b','b','b','c','c','d', 'a', 'a', 'd']
>>> [('a', 4), ('c', 2), ('b', 3), ('d', 2)]
This second option respects order and only deduplicates consecutive duplicate values.
def my_dedup_count(my_test_list):
    output = []
    succession = 1
    for idx, el in enumerate(my_test_list):
        if idx+1 < len(my_test_list) and el == my_test_list[idx+1]:
            succession += 1
        else:
            output.append((el, succession))
            succession = 1
    return output
my_test_list = ['a','a','b','b','b','c','c','d', 'a', 'a', 'd']
>>> [('a', 2), ('b', 3), ('c', 2), ('d', 1), ('a', 2), ('d', 1)]

Related

Ngram in python with start_pad

I know some basics about lists and tuples in Python, but I don't fully understand my code. I want to create a list with three elements, each of which is a tuple with two items, like this: [('~','a'),('a','b'),('b','c')]. The first item in each tuple is the context, which may be two characters long (the context length), like this: [('~a','a'),('ab','b'),('bc',' c')]. Can anyone help me? Here is my code:
def getNGrams(wordlist, n):
    ngrams = []
    padded_tokens = "~"*(n) + wordlist
    t = tuple(wordlist)
    for i in range(3):
        t = tuple(padded_tokens[i:i+n])
        ngrams.append(t)
    return ngrams
IIUC, You can change the function like below and get what you want:
def getNGrams(wordlist, n):
    ngrams = []
    padded_tokens = "~"*n + wordlist
    for idx, i in enumerate(range(len(wordlist))):
        t = tuple((padded_tokens[i:i+n], wordlist[idx]))
        ngrams.append(t)
    return ngrams
print(getNGrams('abc',1))
print(getNGrams('abc',2))
print(getNGrams('abc',3))
Output:
[('~', 'a'), ('a', 'b'), ('b', 'c')]
[('~~', 'a'), ('~a', 'b'), ('ab', 'c')]
[('~~~', 'a'), ('~~a', 'b'), ('~ab', 'c')]
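The same (context, next character) pairs can also be built with a single list comprehension. This is only a sketch: the name get_ngrams and the pad parameter are hypothetical and not part of the answer above.

```python
def get_ngrams(word, n, pad="~"):
    """Return (context, char) pairs, where context is the n characters before char."""
    padded = pad * n + word
    # padded[i:i + n] is exactly the n characters that precede word[i]
    return [(padded[i:i + n], ch) for i, ch in enumerate(word)]

print(get_ngrams("abc", 1))
print(get_ngrams("abc", 2))
```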

Markov Chain from String

I am currently stuck on a problem concerning Markov chains, where the input is given as a list of strings. This input has to be transformed into a Markov chain. I have already been working on this problem for a couple of hours.
My idea: As you can see below I have tried to use the counter from collections to count all transitions, which has worked. Now I am trying to count all the tuples where A and B are the first elements. This gives me all possible transitions for A.
Then I'll count the transitions like (A, B).
Then I want to use these to create a matrix with all probabilities.
def markov(seq):
    states = Counter(seq).keys()
    liste = []
    print(states)
    a = zip(seq[:-1], seq[1:])
    print(list(a))
print(markov(["A","A","B","B","A","B","A","A","A"]))
So far I can't get the counting of the tuples to work.
Any help or new ideas on how to solve this would be appreciated.
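To count the tuples, you can create another Counter.
# In Python 3, zip() returns an iterator: if list(a) was already called
# (as in the question's print), a is exhausted and must be rebuilt first.
b = Counter()
for word_pair in a:
    b[word_pair] += 1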
b will keep the count of the pair.
To create the matrix, you can use numpy.
import numpy as np
c = np.array([[b[(i,j)] for j in states] for i in states], dtype = float)
I will leave the task of normalizing each row sum to 1 as an exercise.
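For reference, one possible way to do the normalization is sketched below. To keep the example dependency-free it uses plain nested lists instead of numpy, and the function name transition_matrix is an assumption for illustration, not something from the answer.

```python
from collections import Counter

def transition_matrix(seq):
    """Return (states, matrix), where matrix[i][j] estimates P(states[j] | states[i])."""
    states = sorted(set(seq))
    # Count each (current, next) pair of neighbouring elements
    pair_counts = Counter(zip(seq[:-1], seq[1:]))
    matrix = []
    for src in states:
        row = [pair_counts[(src, dst)] for dst in states]
        total = sum(row)
        # A state with no outgoing transitions keeps an all-zero row
        matrix.append([c / total if total else 0.0 for c in row])
    return states, matrix

states, matrix = transition_matrix(["A", "A", "B", "B", "A", "B", "A", "A", "A"])
print(states)
print(matrix)
```

Each row is divided by its own sum, so every state that has at least one outgoing transition ends up with a row summing to 1.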
I didn't get exactly what you wanted but here is what I think it is:
from collections import Counter
def count_occurence(seq):
    counted_states = []
    transition_dict = {}
    for tup in seq:
        if tup not in counted_states:
            transition_dict[tup] = seq.count(tup)
            counted_states.append(tup)
    print(transition_dict)
    #{('A', 'A'): 3, ('A', 'B'): 2, ('B', 'B'): 1, ('B', 'A'): 2}
def markov(seq):
    states = Counter(seq).keys()
    print(states)
    #dict_keys(['A', 'B'])
    a = list(zip(seq[:-1], seq[1:]))
    print(a)
    #[('A', 'A'), ('A', 'B'), ('B', 'B'), ('B', 'A'), ('A', 'B'), ('B',
    #'A'), ('A', 'A'), ('A', 'A')]
    return a
seq = markov(["A","A","B","B","A","B","A","A","A"])
count_occurence(seq)

Populate list with tuples

I'm just fiddling with a simulation of Mendel's First Law of Inheritance.
Before I can let the critters mate and analyze the outcome, the population has to be generated, i.e., a list has to be filled with varying numbers of three different types of tuples without unpacking them.
While trying to get familiar with itertools (I'll need combinations later in the mating part), I came up with the following solution:
import itertools
k = 2
m = 3
n = 4
hd = ('A', 'A') # homozygous dominant
het = ('A', 'a') # heterozygous
hr = ('a', 'a') # homozygous recessive
fhd = itertools.repeat(hd, k)
fhet = itertools.repeat(het, m)
fhr = itertools.repeat(hr, n)
population = [x for x in fhd] + [x for x in fhet] + [x for x in fhr]
which would result in:
[('A', 'A'), ('A', 'A'), ('A', 'a'), ('A', 'a'), ('A', 'a'), ('a', 'a'), ('a', 'a'), ('a', 'a'), ('a', 'a')]
Is there a more reasonable, Pythonic, or memory-saving way to build the final list, e.g. without first generating the lists for the three types of individuals?
You could use itertools.chain to combine the iterators:
population = list(itertools.chain(fhd, fhet, fhr))
Though I would say there's no need to use itertools.repeat when you could simply do [hd] * k. Indeed, I would approach this simulation as follows:
pops = (20, 30, 44)
alleles = (('A', 'A'), ('A', 'a'), ('a', 'a'))
population = [a for n, a in zip(pops, alleles) for _ in range(n)]
or perhaps
allele_freqs = ((20, ('A', 'A')),
                (30, ('A', 'a')),
                (44, ('a', 'a')))
population = [a for n, a in allele_freqs for _ in range(n)]
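If avoiding the intermediate lists is the main concern, the repeat iterators from the question can also be chained lazily. The sketch below assumes the counts 2, 3 and 4 from the question (k, m, n); the name genotype_counts is made up for illustration.

```python
from itertools import chain, repeat

# (count, genotype) pairs; the counts match k, m, n in the question
genotype_counts = [(2, ('A', 'A')), (3, ('A', 'a')), (4, ('a', 'a'))]

# chain.from_iterable consumes the repeat iterators lazily, one after another,
# so no per-genotype list is built along the way
lazy_population = chain.from_iterable(repeat(g, n) for n, g in genotype_counts)

# Materialize only when a real list is needed
population = list(lazy_population)
print(population)
```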
This should work I suppose.
pops = [2,3,4]
alleles = [('A','A'), ('A', 'a'), ('a','a')]
out = [pop*[allele] for pop, allele in zip(pops,alleles)]
print([item for sublist in out for item in sublist])
I have put the code on CodeBunk so you could run it too.
population = 2*[('A', 'A')] + 3*[('A', 'a')] + 4*[('a', 'a')]
or
hd = ('A', 'A') # homozygous dominant
het = ('A', 'a') # heterozygous
hr = ('a', 'a') # homozygous recessive
population = 2*[hd] + 3*[het] + 4*[hr]

Countletters(sorted)

The following is my code for counting letters. I need the output to be
[('e', 1), ('g', 2), ('l', 1), ('o', 2)]
but my output is
[('e', 1), ('g', 2), ('g', 2), ('l', 1), ('o', 2), ('o', 2)]
This is my code:
def countLetters(word):
    word=list(word)
    word.sort()
    trans=[]
    for j in word:
        row=[]
        a=word.count(j)
        row.append(j)
        row.append(a)
        trans.append(tuple(row))
    return trans
Can anyone explain how to get the expected output with my code?
Thank you
Why not just use a Counter?
Example:
from collections import Counter
c = Counter("Foobar")
print(sorted(c.items()))
Output:
[('F', 1), ('a', 1), ('b', 1), ('o', 2), ('r', 1)]
Another way is to use a dict, or better, a defaultdict (when running python 2.6 or lower, since Counter was added in Python 2.7)
Example:
from collections import defaultdict
def countLetters(word):
    d = defaultdict(lambda: 0)
    for j in word:
        d[j] += 1
    return sorted(d.items())
print(countLetters("Foobar"))
Output:
[('F', 1), ('a', 1), ('b', 1), ('o', 2), ('r', 1)]
Or use a simple list comprehension
word = "Foobar"
print(sorted((letter, word.count(letter)) for letter in set(word)))
>>> from collections import Counter
>>> Counter('google')
Counter({'o': 2, 'g': 2, 'e': 1, 'l': 1})
>>> from operator import itemgetter
>>> sorted(Counter('google').items(), key=itemgetter(0))
[('e', 1), ('g', 2), ('l', 1), ('o', 2)]
>>>
Actually, there is no need for key:
>>> sorted(Counter('google').items())
[('e', 1), ('g', 2), ('l', 1), ('o', 2)]
As tuples are sorted first by the first item, then by the second, etc.
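A short sketch of that ordering rule, plus a hypothetical variation (not asked for in the question) that sorts by count instead, using a key function:

```python
from collections import Counter

pairs = Counter('google').items()

# Default tuple ordering: compare the letter first, then the count
by_letter = sorted(pairs)

# Hypothetical variation: highest count first, ties broken alphabetically
by_count = sorted(pairs, key=lambda item: (-item[1], item[0]))

print(by_letter)
print(by_count)
```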
def countLetters(word):
    k=[]
    Listing=[]
    Cororo=[]
    # Collect each distinct letter once, in order of first appearance
    for warm in word:
        if warm not in k:
            k.append(warm)
    # Count every distinct letter and pair it with its count
    for cold in range(len(k)):
        Listing.append(word.count(k[cold]))
        Cororo.append((k[cold],Listing[cold]))
    return sorted(Cororo)
This is a somewhat old-fashioned way of doing it, since you can use the Counter class as in the answer above and make life easier.
You can modify your code like this (Python 2.5+):
def countLetters(word):
    word=list(word)
    word.sort()
    trans=[]
    for j in word:
        row=[]
        a=word.count(j)
        row.append(j)
        row.append(a)
        trans.append(tuple(row))
    ans = list(set(trans))
    ans.sort()
    return ans
The problem is that you're not accounting for letters that occur more than once in your j loop.
I think a quick fix is to change the iteration to for j in sorted(set(word)).
This ensures each letter is visited only once; sorted() is needed because a set does not keep the sorted order of the list.
trans = list(set(trans))
Converting a list to a set removes duplicates (which I think is what you want to do).
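One caveat worth illustrating: a set has no defined order, so the result generally has to be sorted again (or deduplicated in an order-preserving way). The sketch below starts from the duplicated output shown in the question; dict.fromkeys keeps first-occurrence order on Python 3.7 and later.

```python
trans = [('e', 1), ('g', 2), ('g', 2), ('l', 1), ('o', 2), ('o', 2)]

# set() drops the duplicates but not in any guaranteed order, so sort afterwards
deduped_sorted = sorted(set(trans))

# dict.fromkeys keeps the first occurrence of each item in its original position
deduped_ordered = list(dict.fromkeys(trans))

print(deduped_sorted)
print(deduped_ordered)
```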

List multiplication [duplicate]

I have a list L = [a, b, c] and I want to generate a list of tuples :
[(a,a), (a,b), (a,c), (b,a), (b,b), (b,c)...]
I tried doing L * L but it didn't work. Can someone tell me how to get this in Python?
You can do it with a list comprehension:
[ (x,y) for x in L for y in L]
edit
You can also use itertools.product as others have suggested, but only if you are using Python 2.6 or later. The list comprehension will work with all versions of Python from 2.0. If you do use itertools.product, bear in mind that it returns an iterator instead of a list, so you may need to convert it (depending on what you want to do with it).
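A minimal sketch of the point about iterators: the object returned by itertools.product can be consumed only once, so convert it to a list if it is needed more than once. The repeat=2 argument pairs L with itself; the variable names are illustrative.

```python
from itertools import product

L = ['a', 'b', 'c']

# product returns a lazy iterator; repeat=2 is equivalent to product(L, L)
pairs_iter = product(L, repeat=2)
pairs = list(pairs_iter)

# The iterator is exhausted after one pass, so a second pass yields nothing
leftover = list(pairs_iter)

comprehension = [(x, y) for x in L for y in L]
print(pairs == comprehension)
print(leftover)
```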
The itertools module contains a number of helpful functions for this sort of thing. It looks like you may be looking for product:
>>> import itertools
>>> L = [1,2,3]
>>> itertools.product(L,L)
<itertools.product object at 0x83788>
>>> list(_)
[(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
Take a look at the itertools module, which provides a product member.
L =[1,2,3]
import itertools
res = list(itertools.product(L,L))
print(res)
Gives:
[(1,1),(1,2),(1,3),(2,1), .... and so on]
Two main alternatives:
>>> L = ['a', 'b', 'c']
>>> import itertools
>>> list(itertools.product(L, L))
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'b'), ('c', 'c')]
>>> [(one, two) for one in L for two in L]
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'b'), ('c', 'c')]
>>>
The former needs Python 2.6 or later; the latter works in just about any Python version you might be tied to.
x = [a,b,c]
y = []
for item in x:
    for item2 in x:
        y.append((item, item2))
Maybe not the Pythonic way, but it works.
OK, I tried:
L2 = [(x,y) for x in L for y in L] and this gave me L squared.
Is this the best pythonic way to do this? I would expect L * L to work in python.
The most old fashioned way to do it would be:
def perm(L):
    result = []
    for i in L:
        for j in L:
            result.append((i,j))
    return result
This has a runtime of O(n^2), which is unavoidable for any approach that produces all n^2 pairs, but you could consider it to be "vintage" style code.
