I have a quick question about how to check and compare two or more consecutive characters in a list in Python.
For example, I have the string "cdcdccddd". I turned this string into a list to make comparing the characters easier. The output I need is:
c: 1 d: 1 c: 1 d: 1 c: 2 d: 3
So it counts runs of characters: if a character is not the same as the next one, its counter is 1; if it is the same as the next one, the counter is incremented and the next pair has to be checked, and so on.
So far I have this algorithm:
text = "cdcdccddd"
l = []
l = list(text)
print list(text)
for n in range(0,len(l)):
    le = len(l[n])
    if l[n] == l[n+1]:
        le += 1
        if l[n+1] == l[n+2]:
            le += 1
        print l[n], ':' , le
    else:
        print l[n], ':', le
But it does not work well, because it counts the first and second elements but not the second and third. For this input the output is:
c : 1
d : 1
c : 1
d : 1
c : 2
c : 1
d : 3
How can I improve this algorithm?
Thank you!
You can use itertools.groupby:
from itertools import groupby
s = "cdcdccddd"
print([(k, sum(1 for _ in v)) for k,v in groupby(s)])
[('c', 1), ('d', 1), ('c', 1), ('d', 1), ('c', 2), ('d', 3)]
Consecutive chars are grouped together, so each k is the char of that group. Calling sum(1 for _ in v) gives us the length of each group, so we end up with (char, len(group)) pairs.
If we run it in ipython and call list on each v it should be really clear what is happening:
In [3]: from itertools import groupby
In [4]: s = "cdcdccddd"
In [5]: [(k, list(v)) for k,v in groupby(s)]
Out[5]:
[('c', ['c']),
('d', ['d']),
('c', ['c']),
('d', ['d']),
('c', ['c', 'c']),
('d', ['d', 'd', 'd'])]
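To get the "c: 1" style lines the question asks for, the pairs can be formatted after grouping. This is only a minimal sketch; run_lengths is a hypothetical helper name, not something from the answer above.

```python
from itertools import groupby

def run_lengths(text):
    """Return (char, run_length) pairs for consecutive runs in text."""
    return [(char, sum(1 for _ in group)) for char, group in groupby(text)]

pairs = run_lengths("cdcdccddd")
# Format each pair the way the question prints it.
lines = ["{}: {}".format(char, count) for char, count in pairs]
print("\n".join(lines))
```

An empty string simply produces an empty list, so no special case is needed.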
We can also roll our own pretty easily:
def my_groupby(s):
    # create an iterator
    it = iter(s)
    # set consec_count to one and pull first char from s
    consec_count, prev = 1, next(it)
    # iterate over the rest of the string
    for ele in it:
        # if last and current char are different
        # yield previous char, consec_count and reset
        if prev != ele:
            yield prev, consec_count
            consec_count = 0
        prev = ele
        consec_count += 1
    # yield the final run (prev also works for a one-char string)
    yield prev, consec_count
Which gives us the same:
In [8]: list(my_groupby(s))
Out[8]: [('c', 1), ('d', 1), ('c', 1), ('d', 1), ('c', 2), ('d', 3)]
That looks like a pattern of repeating characters, so you can use a regex with a backreference to match each run and then find the length of each match:
import re
text = "cdcdccddd"
matches = re.findall(r'(.)(\1*)', text)
result = ['{}: {}'.format(match[0], len(''.join(match))) for match in matches]
Result:
>>> print(*result, sep='\n')
c: 1
d: 1
c: 1
d: 1
c: 2
d: 3
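A variation on the regex idea, offered as a sketch rather than a drop-in replacement: re.finditer gives one match object per run, so the run length is just the length of the whole match. The name regex_runs is hypothetical, and re.DOTALL is an added assumption so that newline characters are counted too (without it, '.' skips them).

```python
import re

def regex_runs(text):
    """Return (char, run_length) pairs using one regex match per run."""
    # (.) captures a character; \1* matches further copies of that character.
    # re.DOTALL lets '.' match newlines, so no character is skipped.
    return [(m.group(1), len(m.group(0)))
            for m in re.finditer(r'(.)\1*', text, flags=re.DOTALL)]

print(regex_runs("cdcdccddd"))
```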
First thing, strings are already sequences in Python (you can iterate over and index them just like lists), so you can just say for character in text: to get each of the characters out.
I would try something like this:
currentchar = text[0]
currentcount = 0
for c in text[1:]:
    if c == currentchar:
        currentcount += 1
    else:
        print(currentchar + ": " + str(currentcount+1))
        currentchar = c
        currentcount = 0
print(currentchar + ": " + str(currentcount+1))
I've attached the problem statement in the image.
I managed to get a "similar" output in the form of a list, because a dictionary needs its keys to be unique.
Here's my code:
l=[]
words=[a for a in 'aaabccdd']
for i in words:
    l.append((i,words.count(i)))
h = list(set(l))
print(h)
output of my code is:
[('a', 3), ('d', 2), ('b', 1), ('c', 2)]
but I want it in the form of a dictionary like this :
{1:['b'], 2:['c', 'd'], 3:['a']}
where the count (frequency) acts as the dictionary key and the elements that share a frequency are collected in a list, as shown above.
Since you want to go from
[('a', 3), ('d', 2), ('b', 1), ('c', 2)]
to
{1:['b'], 2:['c', 'd'], 3:['a']}
Here is the piece of code to achieve this:
from collections import defaultdict
d = defaultdict(list)
for word, count in h:
    d[count].append(word)
print(d)
Or the full program:
from collections import defaultdict
l=[]
words=[a for a in 'aaabccdd']
for i in words:
    l.append((i,words.count(i)))
h = list(set(l))
d = defaultdict(list)
for word, count in h:
    d[count].append(word)
print(d)
If you are not allowed to use defaultdict, you can use the following:
l=[]
words=[a for a in 'aaabccdd']
for i in words:
    l.append((i,words.count(i)))
h = list(set(l))
d = {}
for word, count in h:
    if count in d:
        d[count].append(word)
    else:
        d[count] = [word]
print(d)
from collections import Counter
new_dict = {}
for key,val in dict(Counter('aaabccdd')).items():
    if val not in new_dict:
        new_dict[val] = key
    elif type(new_dict[val]) == list:
        new_dict[val].append(key)
    else:
        new_dict[val] = [new_dict[val]]+[key]
print (new_dict)
output:
{3: 'a', 1: 'b', 2: ['c', 'd']}
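Note that the last snippet stores a bare string when a frequency has a single character and a list otherwise, while the question asks for lists throughout. One possible way to get the requested shape, sketched here with a hypothetical helper name (group_by_frequency), is to count once with Counter and collect characters with dict.setdefault. Sorting the keys and values is an added assumption to make the result deterministic.

```python
from collections import Counter

def group_by_frequency(text):
    """Map each frequency to the sorted list of characters having it."""
    groups = {}
    for char, count in Counter(text).items():
        # setdefault creates the list the first time a count is seen.
        groups.setdefault(count, []).append(char)
    # Sort keys and values so the result does not depend on input order.
    return {count: sorted(chars) for count, chars in sorted(groups.items())}

print(group_by_frequency('aaabccdd'))
```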
dna_string = 'ATGCTTCAGAAAGGTCTTACG'
length = len(dna_string)
print("There are %d letters in this DNA string." % length)
print('Now here are the amounts for the letters "A", "C", "T", "G" in order.\n')
combien_a = dna_string.count('A')
combien_c = dna_string.count('C')
combien_t = dna_string.count('T')
combien_g = dna_string.count('G')
print(str(combien_a) + ' ' + str(combien_c) + ' ' + str(combien_g) + ' ' + str(combien_t))
You can try this:
dna_string = 'ATGCTTCAGAAAGGTCTTACG'
print(*[dna_string.count(a) for a in ['A','C','T','G']],sep=" ")
You can add these to a list like:
combien = [dna_string.count(x) for x in ['A','C','T','G']]
collections.Counter is useful for counting hashable items, such as the characters of a string.
from collections import Counter
dna_string = 'ATGCTTCAGAAAGGTCTTACG'
# Counter object works like a dictionary with element as key and count as value
combien = Counter(dna_string)
print(f"There are {len(dna_string)} letters in this DNA string.")
# you can convert the Counter to a list of (elem, count) tuples
print(list(combien.items()))
Output:
There are 21 letters in this DNA string.
[('A', 6), ('T', 6), ('G', 5), ('C', 4)]
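Elements in a Counter are kept in the order they were first encountered (Python 3.7+); here that happens to coincide with descending count. Use combien.most_common() if you need them sorted by count. If you want another order for the result, you can sort like this: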
print(sorted(list(combien.items()), key=lambda x: ["A", "C", "T", "G"].index(x[0])))
Output:
[('A', 6), ('C', 4), ('T', 6), ('G', 5)]
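As a small sketch of the two orderings discussed above (the variable names counts and in_order are illustrative, not from the answers): indexing the Counter directly gives a fixed order without sorting, and most_common() gives the by-count order explicitly.

```python
from collections import Counter

dna_string = 'ATGCTTCAGAAAGGTCTTACG'
counts = Counter(dna_string)

# Fixed order: look each base up directly; a missing base counts as 0.
in_order = [counts[base] for base in "ACTG"]
print(*in_order)

# Descending by count: most_common() sorts, items() does not.
print(counts.most_common())
```

Looking keys up directly also avoids the repeated list.index() calls in the sort key.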
The first function is able to separate each letter of a string and list how many times that letter appears. For example:
print(rlencode("Hello!"))
[('H', 1), ('e', 1), ('l', 2), ('o', 1), ('!', 1)]
How do I get rldecode(rle) to do the complete opposite of rlencode(s), so that rldecode(rlencode(x)) == x returns True?
def rlencode(s):
    """
    signature: str -> list(tuple(str, int))
    """
    string=[]
    count=1
    for i in range(1,len(s)):
        if s[i] == s[i-1]:
            count += 1
        else:
            string.append((s[i-1], count))
            count=1
        if i == len(s)-1:
            string.append((s[i], count))
    return string
def rldecode(rle):
    """
    #signature: list(tuple(str, int)) -> str
    #"""
    string=" "
    count=1
    for i in rle:
        if i == rle:
            string += i
    return string
You can use the fact that multiplying a string by a number repeats it, and use ''.join() to bring the elements of the list together.
To show the effect of string multiplication, I multiplied "a" by 5
"a"*5 #'aaaaa'
Using that in a comprehension over the list of tuples will give you
rle = [('H', 1), ('e', 1), ('l', 2), ('o', 1), ('!', 1)]
parts = [char[0]*char[1] for char in rle] #['H', 'e', 'll', 'o', '!']
Then add in the ''.join() and you have your answer.
decoded = ''.join(char[0]*char[1] for char in rle) #'Hello!'
So your function would be
def rldecode(rle):
    """
    signature: list(tuple(str, int)) -> str
    """
    return ''.join(char[0]*char[1] for char in rle)
Also, if you would like to make your rlencode a little cleaner, you can use enumerate to keep track of your position in the string and check whether you're about to hit either a new character or the end of the string. You just have to increment the counter on each loop.
def rlencode(s):
    output = []
    count = 0
    for i, char in enumerate(s):
        count += 1
        if (i == (len(s)-1)) or (char != s[i+1]):
            output.append((char, count))
            count = 0
    return output
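The round-trip requirement from the question, rldecode(rlencode(x)) == x, can be checked directly. The sketch below swaps in itertools.groupby for the encoder (a different technique from the enumerate loop above, used here only to keep the example short) and tests a few inputs, including the empty and single-character cases.

```python
from itertools import groupby

def rlencode(s):
    """str -> list of (char, run_length) tuples."""
    return [(char, len(list(group))) for char, group in groupby(s)]

def rldecode(rle):
    """list of (char, run_length) tuples -> str."""
    return ''.join(char * count for char, count in rle)

# Round-trip check over a few inputs, including the edge cases.
for sample in ["", "a", "Hello!", "aabaa", "cdcdccddd"]:
    assert rldecode(rlencode(sample)) == sample

print(rlencode("Hello!"))
```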
Use join:
b = [('H', 1), ('e', 1), ('l', 2), ('o', 1), ('!', 1)]
''.join([c[0] * c[1] for c in b])
Hello!
You can also use list comprehensions for your initial function.
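You can use collections.Counter.elements(), although this only round-trips when no character occurs in more than one run, because dict(l) keeps a single count per character: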
from collections import Counter
l = [('H', 1), ('e', 1), ('l', 2), ('o', 1), ('!', 1)]
print(''.join(Counter(dict(l)).elements()))
This outputs:
Hello!
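A short sketch of where the Counter-based approach and the plain join differ. The input below is an assumed example (the encoding of "aabaa"), chosen because the same character appears in two separate runs.

```python
from collections import Counter

rle = [('a', 2), ('b', 1), ('a', 2)]  # run-length encoding of "aabaa"

# dict() keeps only one entry for 'a', so one of the two runs is lost.
via_counter = ''.join(Counter(dict(rle)).elements())
# Joining the tuples in order keeps every run.
via_join = ''.join(char * count for char, count in rle)

print(via_counter)
print(via_join)
```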
A simple, readable solution is to iterate over all of the tuples in the list returned by rlencode and construct a new string from each letter (and its frequency) like so:
def rldecode(rle):
    string = ''
    for letter, n in rle:
        string += letter*n
    return string
An answer that's easy to read but also accounts for ordering in the problem:
def rlencode(s):
    """
    signature: str -> list(tuple(str, int, list(int)))
    """
    result=[]
    for i in range(len(s)):
        # letters seen so far, in first-seen order
        letters = [item[0] for item in result]
        if s[i] in letters:
            idx = letters.index(s[i])
            frequency = result[idx][1] + 1
            positions = result[idx][2]
            positions.append(i)
            # store the updated frequency and positions for this letter
            result[idx] = (s[i], frequency, positions)
        else:
            result.append((s[i],1,[i]))
    return result
def rldecode(rle):
    """
    #signature: list(tuple(str, int, list(int))) -> str
    #"""
    frequencies = [i[1] for i in rle]
    total_length = sum(frequencies)
    char_list=[None]*total_length
    for c in rle:
        for pos in c[2]:
            char_list[pos] = c[0]
    return "".join(char_list)
text = "This is a lot of text where ordering matters"
encoded = rlencode(text)
print(encoded)
decoded = rldecode(encoded)
print(decoded)
I adapted it from the answer posted by @Brian Cohan.
It should be noted that this answer becomes computationally expensive, because of the `in` test and .index() call, if the list of distinct letters grows really long, as explained in this SO post.
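One way to avoid the repeated list scans mentioned above is to key the positions by character in a dict, so each lookup is constant time on average. This is a sketch under the assumption that Python 3.7+ is used (dicts keep insertion order); the function names rlencode_positions and rldecode_positions are hypothetical.

```python
def rlencode_positions(s):
    """str -> list of (char, frequency, positions) tuples, in first-seen order."""
    positions = {}  # dicts keep insertion order in Python 3.7+
    for i, char in enumerate(s):
        # setdefault creates the position list the first time a char is seen.
        positions.setdefault(char, []).append(i)
    return [(char, len(pos), pos) for char, pos in positions.items()]

def rldecode_positions(rle):
    """Inverse of rlencode_positions: place every char back at its positions."""
    chars = [None] * sum(freq for _, freq, _ in rle)
    for char, _, pos in rle:
        for p in pos:
            chars[p] = char
    return ''.join(chars)

text = "This is a lot of text where ordering matters"
assert rldecode_positions(rlencode_positions(text)) == text
```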
I am using the following code to dedup and count a given list:
def my_dedup_count(l):
    l.append(None)
    new_l = []
    current_x = l[0]
    current_count = 1
    for x in l[1:]:
        if x == current_x:
            current_count += 1
        else:
            new_l.append((current_x, current_count))
            current_x = x
            current_count = 1
    return new_l
With my testing code:
my_test_list = ['a','a','b','b','b','c','c','d']
my_dedup_count(my_test_list)
result is:
[('a', 2), ('b', 3), ('c', 2), ('d', 1)]
The code works and the output is correct. However, I feel my code is quite lengthy, and I am wondering whether anyone could suggest a more elegant way to write it. Thanks!
Yes, don't re-invent the wheel. Use the standard library instead; you want to use the collections.Counter() class here:
from collections import Counter
def my_dedup_count(l):
    return Counter(l).items()
You may want to just return the counter itself and use all functionality it provides (such as giving you a key-count list sorted by counts).
If you expected only consecutive runs to be counted (so ['a', 'b', 'a'] results in [('a', 1), ('b', 1), ('a', 1)]), then use itertools.groupby():
from itertools import groupby
def my_dedup_count(l):
    return [(k, sum(1 for _ in g)) for k, g in groupby(l)]
I wrote two versions of some shorter ways to write what you accomplished.
This first option ignores ordering, and all like values in the list will be deduplicated.
from collections import defaultdict
def my_dedup_count(test_list):
    foo = defaultdict(int)
    for el in test_list:
        foo[el] += 1
    return foo.items()
my_test_list = ['a','a','b','b','b','c','c','d', 'a', 'a', 'd']
>>> [('a', 4), ('c', 2), ('b', 3), ('d', 2)]
This second option respects order and only deduplicates consecutive duplicate values.
def my_dedup_count(my_test_list):
    output = []
    succession = 1
    for idx, el in enumerate(my_test_list):
        if idx+1 < len(my_test_list) and el == my_test_list[idx+1]:
            succession += 1
        else:
            output.append((el, succession))
            succession = 1
    return output
my_test_list = ['a','a','b','b','b','c','c','d', 'a', 'a', 'd']
>>> [('a', 2), ('b', 3), ('c', 2), ('d', 1), ('a', 2), ('d', 1)]
I'm trying to loop over a list of strings:
lst = ["Canada", "USA", "England"]
for charNum in range(len(lst)): #charNum the length of the characters
    print(lst[charNum])
    for country in range(len([charNum])): #country is the string
        print(lst[charNum][country])
The output I'm trying to achieve is this:
C U E
a S n
n A g
a   l
d   a
a   n
    d
More details:
for k in range(len(lst[0])):
    print(lst[0][k])
If I run this, it prints out
C
a
n
a
d
a
This is because it only uses the length of the string at index 0. But I have to be able to loop through all the indices: 0, 1, 2.
I made some progress and created a nested for-loop:
for i in range(len(lst)): # length of list
    for j in range(len(lst[i])): # length of each string
        print(lst[i][j])
Use itertools.zip_longest (izip_longest in Python 2) to loop over all of the strings simultaneously:
from itertools import zip_longest # izip_longest in python 2
places = ["Canada", "USA", "England"]
for chars in zip_longest(*places, fillvalue=' '):
    print(' '.join(chars))
Output:
C U E
a S n
n A g
a   l
d   a
a   n
    d
The Process:
The output of izip_longest (zip_longest in Python 3) is an iterator; converted to a list, it is:
[('C', 'U', 'E'), ('a', 'S', 'n'), ('n', 'A', 'g'), ('a', ' ', 'l'), ('d', ' ', 'a'), ('a', ' ', 'n'), (' ', ' ', 'd')]
The for loop then assigns each "row" to chars sequentially, starting with ('C', 'U', 'E')
' '.join(chars) combines that tuple into a string, with spaces between each list member. For the first element, that would be 'C U E'
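For comparison, the same column-wise walk can be written with plain indexing, closer to the nested loops the question started from. This is only a sketch; the variable names longest and rows are illustrative.

```python
places = ["Canada", "USA", "England"]

# The longest word decides how many rows are produced.
longest = max(len(word) for word in places)

rows = []
for i in range(longest):
    # Take the i-th character of each word, or a space if the word is too short.
    row = [word[i] if i < len(word) else ' ' for word in places]
    rows.append(' '.join(row))

print('\n'.join(rows))
```

The outer loop runs over character positions and the inner comprehension over words, which is the reverse of the nesting in the question and is what produces rows instead of one character per line.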
To iterate over a list, you need to have one.
my_list = ["Paris"]
some_places = ['Canada', 'USA', 'England']  # a list
for i in range(len(some_places)):
    print('Places :', some_places[i])
Note: I didn't test it, sorry if I'm wrong.