Run Length encoding of symbols - python

I am trying to write run-length encoding in Python. If a message consists of a long sequence of symbols, I am meant to encode it as a list of each symbol and the number of times it occurs consecutively. This is my code:
alphabets = ['a','b','c','d','e','f','g','h','i','j','k',
             'l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
char_count = 0
translate = ''
words = input('Enter your word: ')
for char in words:
    if char in alphabets:
        char_count += 1
        translate += char + str(char_count)
print(translate)
When I run my program this is what I get.
Enter your word: abbbbaaabbaaa
a1b2b3b4b5a6a7a8b9b10a11a12a13
The output is actually meant to be:
a1b4a3b2a3
Is there a way to fix this?

You can simply use regular expressions to solve the problem:
import re
translate = re.sub(r"((.)\2*)", lambda x: x.group(2) + str(len(x.group(1))), words)
This regex matches each run of identical consecutive symbols in the words string and replaces it with the symbol followed by the length of the run.
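As a minimal sketch, the regex approach can be wrapped in a function and checked against the example from the question. The function name rle_regex is made up for illustration, and the re.DOTALL flag is an addition (an assumption that newlines should count as symbols too); neither is part of the original answer.

```python
import re

def rle_regex(text):
    """Run-length encode text with a backreference-based regex."""
    # (.) captures one symbol; \2* extends the match over its repeats.
    # re.DOTALL (an addition here) lets '.' match newlines as well.
    pattern = re.compile(r"((.)\2*)", re.DOTALL)
    return pattern.sub(lambda m: m.group(2) + str(len(m.group(1))), text)

print(rle_regex("abbbbaaabbaaa"))  # a1b4a3b2a3
```

Group 1 is the whole run and group 2 is the single repeated symbol, so len(m.group(1)) is the run length.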

One possible way is to use itertools.groupby:
from itertools import groupby
''.join([f'{letter}{len(list(grouper))}' for letter, grouper in groupby(words)])
Explanation
itertools.groupby splits the string into chunks of identical consecutive letters, turns each chunk into a pair (letter, grouper), and returns an iterator that yields these pairs:
>>> groupby('abbbbaaabbaaa')
<itertools.groupby at 0x6fffeafa098>
>>> for chunk in groupby('abbbbaaabbaaa'):
...     print(chunk)
...
('a', <itertools._grouper object at 0x6fffeaf2cf8>)
('b', <itertools._grouper object at 0x6fffeae9908>)
('a', <itertools._grouper object at 0x6fffeae9898>)
('b', <itertools._grouper object at 0x6fffeaf2320>)
('a', <itertools._grouper object at 0x6fffeae9898>)
Each itertools._grouper object is itself an iterator that yields all the letters in the corresponding chunk. By converting it to a list, we can take its length and append it to the result.
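For comparison, here is a sketch of the groupby approach as a function, next to a plain loop that shows what the code in the question was missing: the count has to restart at 1 whenever the symbol changes. The names rle_groupby and rle_loop are hypothetical, chosen only for this example.

```python
from itertools import groupby

def rle_groupby(text):
    """Run-length encode text, counting each run without building a list."""
    return "".join(
        f"{symbol}{sum(1 for _ in run)}" for symbol, run in groupby(text)
    )

def rle_loop(text):
    """Same encoding with an explicit loop that resets the count per run."""
    pieces = []
    previous, count = None, 0
    for symbol in text:
        if symbol == previous:
            count += 1
        else:
            # A new run starts: emit the finished run, then restart at 1.
            if previous is not None:
                pieces.append(f"{previous}{count}")
            previous, count = symbol, 1
    if previous is not None:
        pieces.append(f"{previous}{count}")
    return "".join(pieces)

print(rle_groupby("abbbbaaabbaaa"))  # a1b4a3b2a3
print(rle_loop("abbbbaaabbaaa"))     # a1b4a3b2a3
```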

Related

Continuous letter check for items in list [duplicate]

This question already has answers here:
Determine prefix from a set of (similar) strings
(11 answers)
Closed 2 years ago.
I need to know how to identify prefixes in strings in a list. For example,
list = ['nomad', 'normal', 'nonstop', 'noob']
Its answer should be 'no' since every string in the list starts with 'no'
I was wondering if there is a method that iterates each letter in strings in the list at the same time and checks each letter is the same with each other.
Use os.path.commonprefix; it does exactly what you want.
In [1]: list = ['nomad', 'normal', 'nonstop', 'noob']
In [2]: import os.path as p
In [3]: p.commonprefix(list)
Out[3]: 'no'
As an aside, naming a list "list" shadows the built-in list class, so list() stops working in that scope. I would recommend using a different variable name.
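A short sketch of the same call with a non-shadowing variable name (words is chosen here for illustration). It also shows that commonprefix compares strings character by character, which is why it works on arbitrary strings and not only on paths.

```python
import os.path

words = ['nomad', 'normal', 'nonstop', 'noob']  # renamed to avoid shadowing list

# commonprefix works character by character, not path component by component.
print(os.path.commonprefix(words))           # no
print(os.path.commonprefix(['abc', 'abd']))  # ab
print(repr(os.path.commonprefix([])))        # ''
```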
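Here is code that uses no libraries (l is the list of strings):
for i in range(1, len(l[0]) + 1):
    # Stop at the first length i where some word's i-letter prefix differs.
    if any(l[0][:i] != j[:i] for j in l):
        print(l[0][:i-1])
        break
else:
    # No mismatch found: the whole first word is the common prefix.
    print(l[0])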
gives output:
no
There is no built-in function to do this (os.path.commonprefix lives in the standard library, not in the builtins). If you are looking for short Python code that can do this for you, here's my attempt:
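def longest_common_prefix(words):
    i = 0
    # Bound i by the shortest word so identical words cannot loop forever.
    limit = min(len(word) for word in words)
    while i <= limit and len(set([word[:i] for word in words])) <= 1:
        i += 1
    return words[0][:i-1]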
Explanation: words is an iterable of strings. The list comprehension
[word[:i] for word in words]
uses string slices to take the first i letters of each string. At the beginning, these would all be empty strings. Then, it would consist of the first letter of each word. Then the first two letters, and so on.
Casting to a set removes duplicates. For example, set([1, 2, 2, 3]) = {1, 2, 3}. By casting our list of prefixes to a set, we remove duplicates. If the length of the set is less than or equal to one, then they are all identical.
The counter i just keeps track of how many letters are identical so far.
We return words[0][:i-1]. We arbitrarily choose the first word and take its first i-1 letters (which are the same for every word in the list). The reason that it's i-1 and not i is that i has already been incremented past the last shared length by the time the loop stops.
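To make the explanation concrete, the following sketch prints the set of i-letter prefixes for the example list as i grows. The helper name prefix_sets is hypothetical and exists only for this illustration.

```python
def prefix_sets(words, max_len):
    """Return the set of i-letter prefixes for i = 0 .. max_len."""
    return [{word[:i] for word in words} for i in range(max_len + 1)]

words = ['nomad', 'normal', 'nonstop', 'noob']
for i, prefixes in enumerate(prefix_sets(words, 3)):
    status = "shared" if len(prefixes) <= 1 else "differs"
    print(i, sorted(prefixes), status)
# 0 [''] shared
# 1 ['n'] shared
# 2 ['no'] shared
# 3 ['nom', 'non', 'noo', 'nor'] differs
```

The set first grows beyond one element at i = 3, so the common prefix has i - 1 = 2 letters.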
Here's a fun one:
l = ['nomad', 'normal', 'nonstop', 'noob']
def common_prefix(lst):
    for s in zip(*lst):
        if len(set(s)) == 1:
            yield s[0]
        else:
            return
result = ''.join(common_prefix(l))
Result:
'no'
To answer the spirit of your question - zip(*lst) is what allows you to "iterate letters in every string in the list at the same time". For example, list(zip(*lst)) would look like this:
[('n', 'n', 'n', 'n'), ('o', 'o', 'o', 'o'), ('m', 'r', 'n', 'o'), ('a', 'm', 's', 'b')]
Now all you need to do is check, for each group, whether all its letters are the same (len(set(s)) == 1), and join the letters back together for as long as that holds.
As an aside, you probably don't want to call your list by the name list. Any time you call list() afterwards is gonna be a headache. It's bad practice to shadow built-in names.
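The same zip(*lst) idea can also be sketched with itertools.takewhile, which stops at the first column whose letters disagree. This is an alternative formulation, not the answer's own code, and common_prefix_zip is a name invented for the example.

```python
from itertools import takewhile

def common_prefix_zip(strings):
    """Join the leading columns of zip(*strings) whose letters all agree."""
    columns = zip(*strings)  # one tuple per character position
    agreeing = takewhile(lambda column: len(set(column)) == 1, columns)
    return "".join(column[0] for column in agreeing)

print(common_prefix_zip(['nomad', 'normal', 'nonstop', 'noob']))  # no
```

Because zip stops at the shortest string, the result can never be longer than the shortest input.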

Create a list of lists of tuples where each tuple is the first occurrence of a letter along with its row and column in the list of lists

I need to write a function that creates a list of tuples of the first occurrence of a letter followed by its row and column in a list of lists.
Example Input and Output:
#Input:
lot2 = [['.','M','M','H','H'],
        ['A','.','.','.','f'],
        ['B','C','D','.','f']]
#Output: [('M', 0, 1), ('H', 0, 3), ('f', 1, 4), ('B', 2, 0)]
As you can see the function should only look for the first occurrence of a letter and not all occurrences. Thanks for any help.
Code:
letter = '.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
def list_cars(lst):
    for y, row in enumerate(lst):
        if letter in row:
            return letter, y, row.index(letter)
First off, use the string module to get a string of all upper and lower case letters:
import string
string.ascii_letters
Out[40]: 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
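collector = []
output_list = []
# enumerate gives the row index directly; lot2.index(i) would return the
# wrong row if two rows had identical contents.
for row_index, i in enumerate(lot2):
    for j in i:
        if j in string.ascii_letters and j not in collector:
            tmp = (j, row_index, i.index(j))
            output_list.append(tmp)
            collector.append(j)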
output_list then holds the first occurrence of each letter together with its row and column.
edit: If you also want to capture full stops, use string.printable, although that string contains additional punctuation and whitespace characters as well.
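A sketch of the same scan as a reusable function, using a set for the "already seen" check. The function name first_occurrences and its wanted parameter are assumptions made for this example. Note that it reports every letter's first occurrence, so 'A', 'C' and 'D' appear in the result even though the sample output in the question omits them; the question does not state the rule for leaving them out, so the sketch does not try to guess it.

```python
import string

def first_occurrences(grid, wanted=string.ascii_letters):
    """Return (symbol, row, column) for the first sighting of each wanted symbol."""
    seen = set()  # set membership is O(1), unlike a list
    found = []
    for row_index, row in enumerate(grid):
        for col_index, symbol in enumerate(row):
            if symbol in wanted and symbol not in seen:
                seen.add(symbol)
                found.append((symbol, row_index, col_index))
    return found

lot2 = [['.', 'M', 'M', 'H', 'H'],
        ['A', '.', '.', '.', 'f'],
        ['B', 'C', 'D', '.', 'f']]
print(first_occurrences(lot2))
# [('M', 0, 1), ('H', 0, 3), ('A', 1, 0), ('f', 1, 4), ('B', 2, 0), ('C', 2, 1), ('D', 2, 2)]
```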

How to find the position of a character and list the position in reverse order in Python?

How can I get the position of a character inside a string in python, and list the position in reverse order? Also how can I make it look for both uppercase and lowercase character in the string?
e.g. if I put in AvaCdefh and look for 'a' (both uppercase and lowercase), I want the positions of 'a' in my initial string. In this example 'a' is located at positions 0 and 2, so how can I make Python return them as '2 0' (separated by a space)?
This is easily achieved using the re module:
import re
x = "AvaCdefh"
" ".join([str(m.start()) for m in re.finditer("[Aa]",x)][::-1])
... which produces:
'2 0'
The list is reversed with the [::-1] slice before the string is constructed, using the method described in the second answer to "How can I reverse a list in python?".
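The character class [Aa] only covers one letter. As a sketch of a more general version, the search can take any character and ignore case through re.IGNORECASE. The function name positions_reversed is hypothetical, and re.escape is an addition so that characters such as '.' are matched literally.

```python
import re

def positions_reversed(text, char):
    """Return positions of char in text (case-insensitive), highest first."""
    # re.escape guards against characters that are special in a regex, e.g. '.'.
    matches = re.finditer(re.escape(char), text, flags=re.IGNORECASE)
    return " ".join(str(m.start()) for m in reversed(list(matches)))

print(positions_reversed("AvaCdefh", "a"))  # 2 0
```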
You can use str.index() to find the first occurrence of a character.
w = "AvaCdefh"
To change the string to upper case:
print(w.upper()) #Output: AVACDEFH
To change the string to lower case:
print(w.lower()) #Output: avacdefh
To find the first occurrence of a character using the built-in str.index():
print(w.lower().index('a')) #Output: 0
print(w.index('a')) #Output: 2
To reverse a word:
print(w[::-1]) #Output: hfedCavA
To find every occurrence, you can use a list comprehension:
char = 'a'
# Finding a character in the word
findChar = [(c, index) for index, c in enumerate(w.lower()) if char == c]
# Finding a character in the reversed word
inverseFindChar = [(c, index) for index, c in enumerate(w[::-1].lower()) if char == c]
print(findChar) #Output: [('a', 0), ('a', 2)]
print(inverseFindChar) #Output: [('a', 5), ('a', 7)]
Another way uses map with a lambda to look up the original characters at those positions:
l = [index for index, c in enumerate(w.lower()) if char == c]
ll = list(map(lambda x: w[x], l))
print(ll) #Output: ['A', 'a']
Then, you can wrap this in functions:
def findChar(char):
    return " ".join([str(index) for index, c in enumerate(w.lower()) if char == c])
def findCharInReversedWord(char):
    return " ".join([str(index) for index, c in enumerate(w[::-1].lower()) if char == c])
print(findChar('a')) #Output: 0 2
print(findChar('c')) #Output: 3
print(findCharInReversedWord('a')) #Output: 5 7
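The functions above return positions in ascending order, or positions within the reversed word. A sketch that returns the original positions in descending order, as the question asks, could look like this (positions_reversed_plain is a name made up for the example):

```python
def positions_reversed_plain(text, char):
    """Positions of char in text, ignoring case, listed from last to first."""
    target = char.lower()
    hits = [i for i, c in enumerate(text) if c.lower() == target]
    return " ".join(str(i) for i in reversed(hits))

print(positions_reversed_plain("AvaCdefh", "a"))  # 2 0
```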

What is the inbuilt .count in python?

I've been solving problems in checkio.com and one of the questions was: "Write a function to find the letter which occurs the maximum number of times in a given string"
The top solution was:
import string
def checkio(text):
    """
    We iterate through latin alphabet and count each letter in the text.
    Then 'max' selects the most frequent letter.
    For the case when we have several equal letter,
    'max' selects the first from they.
    """
    text = text.lower()
    return max(string.ascii_lowercase, key=text.count)
I didn't understand what text.count is when it is used as the key in the max function.
Edit: Sorry for not being more specific. I know what the program does as well as the function of str.count(). I want to know what text.count is. If .count is a method, shouldn't it be followed by parentheses?
key=text.count passes the method text.count itself, without calling it, which is why there are no parentheses. max() then calls it once for each letter of the alphabet to count how often that letter appears in the string, and returns the letter with the highest count.
When the following code is run, the result is e, which is, if you count, the most frequent letter.
import string
def checkio(text):
    """
    We iterate through latin alphabet and count each letter in the text.
    Then 'max' selects the most frequent letter.
    For the case when we have several equal letter,
    'max' selects the first from they.
    """
    text = text.lower()
    return max(string.ascii_lowercase, key=text.count)
print(checkio('hello my name is heinst'))
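To illustrate what text.count is without the parentheses, here is a small sketch. The variable name counter is chosen only for this example.

```python
text = "hello my name is heinst"

counter = text.count      # no parentheses: a reference to the bound method
print(callable(counter))  # True
print(counter("e"))       # 3, same as text.count("e")

# max() calls the key function on every candidate and compares the results.
print(max("helo", key=text.count))  # e
```

Writing text.count("e") calls the method right away; writing text.count hands the method itself to max(), which calls it later with each candidate letter.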
max() calls the key function once for each element and compares the returned values instead of the elements themselves, which in this case isn't all that efficient.
Essentially, the line max(string.ascii_lowercase, key=text.count) can be translated to:
max_character, max_count = None, -1
for character in string.ascii_lowercase:
    count = text.count(character)
    if count > max_count:
        # Remember the new best count as well as the character.
        max_character, max_count = character, count
return max_character
where str.count() loops through the whole of text counting how often character occurs.
You should really use a multiset / bag here instead; in Python that's provided by the collections.Counter() type:
from collections import Counter
max_character = Counter(text.lower()).most_common(1)[0][0]
Counter() takes O(N) time to count the characters in a string of length N, and finding the highest count takes another O(K), where K is the number of unique characters. Since K is at most N, the whole process takes O(N) time, asymptotically speaking.
The max() approach takes O(MN) time, where M is the length of string.ascii_lowercase.
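One caveat when swapping in Counter: it counts every character, including spaces, whereas the max() version only considers the letters in string.ascii_lowercase. A minimal sketch that keeps the letters-only behaviour follows; the function name most_frequent_letter is an assumption for this example.

```python
import string
from collections import Counter

def most_frequent_letter(text):
    """Return the most frequent ASCII letter in text, ignoring case."""
    # Keep only letters so spaces and punctuation cannot win the count.
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    return Counter(letters).most_common(1)[0][0]

print(most_frequent_letter("hello my name is heinst"))  # e
# Without the filter, the space (4 occurrences) beats 'e' (3 occurrences).
print(Counter("hello my name is heinst").most_common(1)[0][0] == " ")  # True
```

The sketch raises IndexError for text without any letters, and ties may be resolved differently than in the max() version, which always prefers the alphabetically first letter.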
Use the Counter class from the collections module.
>>> import collections
>>> word = "supercalafragalistic"
>>> c = collections.Counter(word)
>>> c.most_common()
[('a', 4), ('c', 2), ('i', 2), ('l', 2), ('s', 2), ('r', 2), ('e', 1), ('g', 1), ('f', 1), ('p', 1), ('u', 1), ('t', 1)]
>>> c.most_common()[0]
('a', 4)

Manipulating counter information - Python 2.7

I'm fairly new to Python and I have this program that I was tinkering with. It's supposed to get a string from input and display which character is the most frequent.
stringToData = raw_input("Please enter your string: ")
# imports the collections module
import collections
# gets the most common character and its count
letter, count = collections.Counter(stringToData).most_common(1)[0]
# prints the results
print "The most frequent character is %s, which occurred %d times." % (
    letter, count)
However, if the string has one of each character, it only displays one letter and says it's the most frequent character. I thought about changing the number in the parentheses in most_common(number), but I didn't want it to display the counts of the other letters every time.
Thank you to all that help!
As I explained in the comment:
You can leave off the parameter to most_common to get a list of all characters, ordered from most common to least common. Then just loop through that result and collect the characters as long as the counter value is still the same. That way you get all characters that are most common.
Counter.most_common(n) returns the n most common elements from the counter. If n is not specified, it returns all elements from the counter, ordered by count.
>>> collections.Counter('abcdab').most_common()
[('a', 2), ('b', 2), ('c', 1), ('d', 1)]
You can use this behavior to simply loop through all elements, ordered by their count. As long as the count is the same as that of the first element in the output, you know that the element occurred just as often in the string.
>>> c = collections.Counter('abcdefgabc')
>>> maxCount = c.most_common(1)[0][1]
>>> elements = []
>>> for element, count in c.most_common():
...     if count != maxCount:
...         break
...     elements.append(element)
...
>>> elements
['a', 'c', 'b']
>>> [e for e, n in c.most_common() if n == maxCount]
['a', 'c', 'b']
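The same idea can be sketched as a function that returns every character tied for the highest count. The name all_most_common is hypothetical, and the results are sorted before printing because the order of tied characters depends on the Python version.

```python
from collections import Counter

def all_most_common(text):
    """Return every character that shares the highest count in text."""
    counts = Counter(text)
    if not counts:
        return []
    top = max(counts.values())
    return [char for char, n in counts.items() if n == top]

print(sorted(all_most_common("abcdefgabc")))  # ['a', 'b', 'c']
print(sorted(all_most_common("abc")))         # ['a', 'b', 'c']
```

For a string in which each character occurs once, this returns all of them, which addresses the case described in the question.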
