How to do run-length encoding in Python recursively

I want to implement run-length encoding recursively, but I can't figure out how to convert my loop-based code into a recursive version. This is in Python. Here is the loop version that I want to make recursive:
def runLengthEncoding(words):
    mylist=[]
    count=1
    for i in range(1,len(words)):
        if words[i] == words[i-1]:
            count=count+1
        else:
            mylist.append(words[i-1])
            mylist.append(count)
            count=1
    if words:
        mylist.append(words[-1])
        mylist.append(count)
    return mylist
I expect the result ['A', 7, 'B', 3, 'C', 1, 'E', 1, 'Z', 1] for runLengthEncoding("AAAAAAABBBCEZ"), which is what the code above returns. I just want to rewrite it recursively.

What about using a built-in?
from collections import Counter
letter_counter = Counter(list("AAAAAAABBBCEZ"))
print(dict(letter_counter))
The result is {'A': 7, 'B': 3, 'C': 1, 'E': 1, 'Z': 1}
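Note that Counter tallies total occurrences rather than consecutive runs, so it only matches run-length encoding when every character appears in a single run, as in this input. A minimal sketch of a run-aware alternative using itertools.groupby is below; the function name rle_groupby is made up for illustration.

```python
from itertools import groupby

def rle_groupby(text):
    """Return a flat [char, count, char, count, ...] list of consecutive runs."""
    encoded = []
    # groupby yields one (key, iterator) pair per run of equal characters
    for char, run in groupby(text):
        encoded.extend([char, sum(1 for _ in run)])
    return encoded

print(rle_groupby("AAAAAAABBBCEZ"))
print(rle_groupby("AABAA"))  # 'A' appears in two separate runs here
```

With an input such as "AABAA", Counter would report 'A' once with a count of 4, while the run-based version keeps the two runs of 'A' apart.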

This could have been solved easily by other methods, but since you specifically want a recursive solution that ends with a list, here is my solution.
String = "AAAAAAABBBCEZ"
Global_List = []
StartWord = String[0]
Count = 0
def RecursiveLength(String):
    global Global_List
    global StartWord
    global Count
    if len(String)==0:
        Global_List.append(StartWord)
        Global_List.append(Count)
        return
    else:
        if String[0] == StartWord:
            Count += 1
            String = String[1:]
            return RecursiveLength(String)
        else:
            Global_List.append(StartWord)
            Global_List.append(Count)
            StartWord = String[0]
            Count = 1
            String = String[1:]
            return RecursiveLength(String)
RecursiveLength(String)
print(Global_List)
This gave me the following output. However, there are better ways than recursion to solve this.
['A', 7, 'B', 3, 'C', 1, 'E', 1, 'Z', 1]
All the best
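The answer above relies on module-level globals, so it cannot be called twice without resetting them and it fails on an empty string. A possible variant that carries the current run in the arguments instead is sketched below; the name rle_recursive and the parameters prev and count are made up for illustration.

```python
def rle_recursive(text, prev=None, count=0):
    """Recursively build [char, count, char, count, ...] for the runs in text."""
    if not text:
        # End of input: flush the run in progress, if there is one
        return [prev, count] if count else []
    if text[0] == prev:
        # Same character as the current run: extend it
        return rle_recursive(text[1:], prev, count + 1)
    # Different character: emit the finished run and start a new one
    finished = [prev, count] if count else []
    return finished + rle_recursive(text[1:], text[0], 1)

print(rle_recursive("AAAAAAABBBCEZ"))
# ['A', 7, 'B', 3, 'C', 1, 'E', 1, 'Z', 1]
```

This recurses once per character, so very long strings will hit Python's default recursion limit; that is one reason the loop version is usually preferable.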

It would be better to put the result in a dictionary, and you can use str.count() to count the occurrences of each character in the string. The code is below:
data = "AAAAAAABBBCEZ"
# collect each distinct character once, keeping first-seen order
chrs = list(dict.fromkeys(data))
res = {ch: data.count(ch) for ch in chrs}
print(res)
Output:
{'A': 7, 'B': 3, 'C': 1, 'E': 1, 'Z': 1}

Related

Python: regex condition to find lower case/digit before capital letter

I would like to split a string in Python and turn it into a dictionary in which each key is a chunk of characters between two capital letters and each value is the number of occurrences of that chunk in the string.
As an example: string = 'ABbACc1Dd2E' should return this: {'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
I have found two working solutions so far (see below), but I am looking for a more general and elegant solution, possibly a one-line regex.
Thank you

Solution 1
import re
string = 'ABbACc1Dd2E'
string = ' '.join(string)
for ii in re.findall("([A-Z] [a-z])",string) + \
          re.findall("([A-Z] [0-9])",string) + \
          re.findall("([a-z] [0-9])",string):
    new_ii = ii.replace(' ','')
    string = string.replace(ii, new_ii)
string = string.split()
all_dict = {}
for elem in string:
    all_dict[elem] = all_dict[elem] + 1 if elem in all_dict.keys() else 1
print(all_dict)
{'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
Solution 2
string = 'ABbACc1Dd2E'
all_upper = [ (pos,char) for (pos,char) in enumerate(string) if char.isupper() ]
all_dict = {}
for (pos,char) in enumerate(string):
    if (pos,char) in all_upper:
        new_elem = char
    else:
        new_elem += char
    if pos < len(string) -1 :
        if string[pos+1].isupper():
            all_dict[new_elem] = all_dict[new_elem] + 1 if new_elem in all_dict.keys() else 1
        else:
            pass
    else:
        all_dict[new_elem] = all_dict[new_elem] + 1 if new_elem in all_dict.keys() else 1
print(all_dict)
{'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}

Thanks to usr2564301 for this suggestion:
The right regex is '[A-Z][a-z]*\d*'
import re
string = 'ABbACc1Dd2E'
print(re.findall(r'[A-Z][a-z]*\d*', string))
['A', 'Bb', 'A', 'Cc1', 'Dd2', 'E']
One can then use itertools.groupby to make an iterator that returns consecutive keys and groups from the iterable.
from itertools import groupby
all_dict = {}
for i,j in groupby(re.findall(r'[A-Z][a-z]*\d*', string)):
    all_dict[i] = all_dict[i] + 1 if i in all_dict.keys() else 1
print(all_dict)
{'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
Ultimately, one could use sorted() to get this in one line with the correct counting:
print({i:len(list(j)) for i,j in groupby(sorted(re.findall(r'[A-Z][a-z]*\d*', string))) })
{'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
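As a possible alternative to groupby plus sorted(), the matches from the same regex can be fed straight into collections.Counter, which does not need the input to be sorted. This is only a sketch; the function name count_chunks is made up for illustration.

```python
import re
from collections import Counter

def count_chunks(text):
    """Count chunks made of one capital letter, then lowercase letters, then digits."""
    chunks = re.findall(r'[A-Z][a-z]*\d*', text)
    # Counter tallies equal chunks wherever they occur in the list
    return dict(Counter(chunks))

print(count_chunks('ABbACc1Dd2E'))
# {'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
```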

Update a dictionary with values from a list in Python

I have a Dictionary here:
dic = {'A':1, 'B':6, 'C':42, 'D':1, 'E':12}
and a list here:
lis = ['C', 'D', 'C', 'C', 'F']
What I'm trying to do (it is also a homework requirement) is check whether each value in lis matches a key in dic. If it does, that key's value is incremented by 1 (for example, there are three 'C's in lis, so 'C' should be 45 in the output). If it does not, a new item is created in dic with its value set to 1.
So the example output should look like this:
dic = {'A':1, 'B':6, 'C':45, 'D':2, 'E':12, 'F':1}
Here's what my code is:
def addToInventory(dic, lis):
    for k,v in dic.items():
        for i in lis:
            if i == k:
                dic[k] += 1
            else:
                dic[i] = 1
        return dic
and execute by this code:
dic = addToInventory(dic,lis)
It runs without error, but the output is strange: it added the missing F to the dic but didn't update the values correctly.
dic = {'A':1, 'B':6, 'C':1, 'D':1, 'E':12, 'F':1}
What am I missing here?

There's no need to iterate over a dictionary when it supports direct key lookup. You can use if x in dict to do this. Furthermore, your return statement needs to be outside the loop.
Try, instead:
def addToInventory(dic, lis):
    for i in lis:
        if i in dic:
            dic[i] += 1
        else:
            dic[i] = 1
    return dic
out = addToInventory(dic, lis)
print(out)
{'A': 1, 'B': 6, 'C': 45, 'D': 2, 'E': 12, 'F': 1}
As Harvey suggested, you can shorten the function a little by making use of dict.get.
def addToInventory(dic, lis):
    for i in lis:
        dic[i] = dic.get(i, 0) + 1
    return dic
The dic.get method takes two parameters: the key, and a default value to return if that key is not already in the dictionary.
If your professor allows the use of libraries, you can use the collections.Counter data structure, which is meant precisely for keeping counts.
from collections import Counter
c = Counter(dic)
for i in lis:
    c[i] += 1
print(dict(c))
{'A': 1, 'B': 6, 'C': 45, 'D': 2, 'E': 12, 'F': 1}
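The explicit loop over lis can also be replaced by Counter.update, which adds one to the count of every element of an iterable. A minimal sketch follows; the function and parameter names are made up for illustration, and unlike the versions above it returns a new dict instead of modifying the one passed in.

```python
from collections import Counter

def add_to_inventory(inventory, items):
    """Return a new dict with the count of each item in items added to inventory."""
    counts = Counter(inventory)   # copy the starting counts
    counts.update(items)          # add 1 per occurrence, creating missing keys
    return dict(counts)

dic = {'A': 1, 'B': 6, 'C': 42, 'D': 1, 'E': 12}
lis = ['C', 'D', 'C', 'C', 'F']
print(add_to_inventory(dic, lis))
# {'A': 1, 'B': 6, 'C': 45, 'D': 2, 'E': 12, 'F': 1}
```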

My NLTK code almost does what I need it to, but not quite

Code:
def add_lexical_features(fdist, feature_vector):
    for word, freq in fdist.items():
        fname = "unigram:{0}".format(word)
        if selected_features == None or fname in selected_features:
            feature_vector[fname] = 1
        if selected_features == None or fname in selected_features:
            feature_vector[fname] = float(freq) / fdist.N()
    print(feature_vector)
if __name__ == '__main__':
    file_name = "restaurant-training.data"
    p = process_reviews(file_name)
    for i in range(0, len(p)):
        print(p[i]+ "\n")
        uni_dist = nltk.FreqDist(p[0])
        feature_vector = {}
        x = add_lexical_features(uni_dist, feature_vector)
This is trying to output the frequency of words in the list of reviews (p is the list of reviews and p[0] is a string). It works, except that it counts by letter, not by word.
I am still new to NLTK, so this might be obvious, but I really can't get it.
For example, this currently outputs a large list of things like:
{'unigram:n': 0.0783132530120482}
This is fine, and I think it is the right number (the number of times n appears over the total number of letters), but I want it to count by word, not by letter.
I also want to do the same for bigrams. Once it works for single words, extending it to word pairs might be easy, but I am not quite seeing it, so some guidance there would be nice.
Thanks.

The input to nltk.FreqDist should be a list of strings, not just a string. See the difference:
>>> import nltk
>>> uni_dist = nltk.FreqDist(['the', 'dog', 'went', 'to', 'the', 'park'])
>>> uni_dist
FreqDist({'the': 2, 'went': 1, 'park': 1, 'dog': 1, 'to': 1})
>>> uni_dist2 = nltk.FreqDist('the dog went to the park')
>>> uni_dist2
FreqDist({' ': 5, 't': 4, 'e': 3, 'h': 2, 'o': 2, 'a': 1, 'd': 1, 'g': 1, 'k': 1, 'n': 1, ...})
You can convert your string into a list of individual words using split.
Side note: I think you might want to be calling nltk.FreqDist on p[i] rather than p[0].
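The split suggestion, together with the bigram part of the question, can be sketched with the standard library alone. Here collections.Counter stands in for nltk.FreqDist (an assumption, made so the example runs without NLTK installed), and relative_frequencies is a hypothetical helper name.

```python
from collections import Counter

def relative_frequencies(tokens):
    """Map each token to its count divided by the total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}

review = "the dog went to the park"
words = review.split()                 # a list of words, not a string of letters
bigrams = list(zip(words, words[1:]))  # each word paired with the word after it

unigram_features = {"unigram:{0}".format(w): f
                    for w, f in relative_frequencies(words).items()}
bigram_features = {"bigram:{0} {1}".format(a, b): f
                   for (a, b), f in relative_frequencies(bigrams).items()}
print(unigram_features)
print(bigram_features)
```

The same lists of words or word pairs could be passed to nltk.FreqDist instead of Counter, since it also expects a sequence of tokens.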

If Statement on Dict

I am trying to write a function that returns True if a word is in the list and is made up only of letters in the hand. Checking whether the word is in the list is fine, but I cannot figure out how to iterate through the word to check the second part. The code below incorrectly returns True:
word = 'chayote'
hand = {'a': 1, 'c': 2, 'u': 2, 't': 2, 'y': 1, 'h': 1, 'z': 1, 'o': 2}
list = ['peach', 'chayote']
def ValidWord(word, hand, list):
    if word in list:
        for i in word:
            if i in hand:
                return True
        return False
    else:
        return False
ValidWord(word, hand, list)

The simplest way to solve this would be to use collections.Counter, like this
from collections import Counter
def is_valid_word(word, hand, my_list):
    if word in my_list:
        # the difference is empty when hand covers every letter of word
        return len(Counter(word) - Counter(hand)) == 0
    return False
my_list = ['peach', 'chayote']
hand = {'a': 1, 'c': 2, 'u': 2, 't': 2, 'y': 1, 'h': 1, 'z': 1, 'o': 2}
print(is_valid_word("chayote", hand, my_list))
# False
print(is_valid_word("peach", hand, my_list))
# False
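Counter subtraction keeps only the letters whose count stays positive, so the result lists exactly what the hand is short of. A small sketch that makes this visible is below; the helper name missing_letters is made up for illustration.

```python
from collections import Counter

def missing_letters(word, hand):
    """Return the letters (with counts) that word needs but hand cannot supply."""
    return dict(Counter(word) - Counter(hand))

hand = {'a': 1, 'c': 2, 'u': 2, 't': 2, 'y': 1, 'h': 1, 'z': 1, 'o': 2}
print(missing_letters("chayote", hand))  # {'e': 1}
print(missing_letters("peach", hand))    # {'p': 1, 'e': 1}
```

A word is valid for the hand exactly when this dictionary is empty.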

I think you can use all to do this, which might be more concise:
if all(i in hand for i in word):
    return True
If the count matters, see @thefourtheye's answer.

You can change the returns inside the for loop:
word = 'chayote'
word2 = 'peach'
hand = {'a': 1, 'c': 2, 'u': 2, 't': 2, 'y': 1, 'h': 1, 'z': 1, 'o': 2, 'e': 2} # I added 'e' so chayote is true and peach is false
list = ['peach', 'chayote']
def ValidWord(word, hand, list):
    if word in list:
        for i in word:
            if i not in hand:
                return False
        return True
    else:
        return False
print(ValidWord(word, hand, list))
print(ValidWord(word2, hand, list))

If count matters expand Gnijuohz' solution:
def ValidWord(word,hand,list):
    return word in list and all(i in hand for i in word) and all(hand[key]>=word.count(key) for key in hand.keys())
This will return False if a letter appears more often in word than the value hand[letter].
If you want to have at least the amount of each letter in the word specified by the hand dictionary, just change it to hand[key]<=word.count(key).
As far as I see the last way would be equivalent to the solution by thefourtheye using Counter

Program optimisation and working of dictionary in adding key value pairs

this is my program on counting the number of vowels
'''Program to count number of vowels'''
str=input("Enter a string\n")
a=0
e=0
i=0
o=0
u=0
for x in str:
    if x=='a':
        a=a+1
        continue
    if x=='e':
        e=e+1
        continue
    if x=='i':
        i=i+1
        continue
    if x=='o':
        o=o+1
        continue
    if x=='u':
        u=u+1
        continue
count={}
if a>0:
    count['a']=a
if e>0:
    count['e']=e
if i>0:
    count['i']=i
if o>0:
    count['o']=o
if u>0:
    count['u']=u
print(count)
How can I improve the initial comparison loop and the process of filling the dictionary?
While running the program several times I have obtained the following output:
>>>
Enter a string
abcdefgh
{'e': 1, 'a': 1}
>>> ================================ RESTART ================================
>>>
Enter a string
abcdefghijklmnopqrstuvwxyz
{'u': 1, 'a': 1, 'o': 1, 'e': 1, 'i': 1}
>>> ================================ RESTART ================================
>>>
Enter a string
abcdeabcdeiopiop
{'a': 2, 'o': 2, 'i': 2, 'e': 2}
From this I could not figure out how exactly are the key value pairs being added to the dictionary count against my expectation of:
Case 1:
{'a':1, 'e':1}
Case 2:
{'a':1, 'e':1, 'i':1, 'o':1, 'u':1}
Case 3:
{'a':2, 'e':2, 'i':2, 'o':2}
Any help is appreciated.

>>> import collections
>>> s = "aacbed"
>>> count = collections.Counter(c for c in s if c in "aeiou")
>>> count
Counter({'a': 2, 'e': 1})
Or - if you really need to maintain insertion order:
>>> s = 'debcaa'
>>> count=collections.OrderedDict((c, s.count(c)) for c in s if c in "aeiou")
>>> count
OrderedDict([('e', 1), ('a', 2)])
Finally, if you want lexicographic ordering, you can turn your dict/Counter/OrderedDict into a sorted list of tuples:
>>> sorted(count.items())
[('a', 2), ('e', 1)]
Or, if you want a lexicographically sorted OrderedDict:
>>> sorted_count = collections.OrderedDict(sorted(count.items()))
>>> sorted_count
OrderedDict([('a', 2), ('e', 1)])

A more Pythonic way to do what you want is:
'''Program to count number of vowels'''
s = input("Enter a string\n")
count = {v: s.count(v) for v in "aeiou" if s.count(v) > 0}
print(count)
You shouldn't use str as a variable name, as that is the name of the built-in string type.

Just put the a=0, e=0, i=0, o=0, u=0 counters inside a dictionary, like this:
myDict = {'a':0, 'e':0, 'i':0, 'o':0, 'u':0}
for x in string:
    myDict[x] += 1
print(myDict)
If a character is not one of the dictionary's keys, a KeyError is raised.
So you can do something like this:
myDict = {'a': 0, 'e': 0, 'i': 0, 'o': 0, 'u': 0}
for x in string:
    try:
        myDict[x] += 1
    except KeyError:
        continue
print(myDict)
Note: I've changed the name str to string.
You can also see a very good solution by @Amber here
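A possible single-pass variant of the dictionary approach is sketched below. It uses a membership test instead of catching KeyError and drops vowels that never occur, as the question's output does. The function name count_vowels is made up for illustration, and the key order relies on plain dicts preserving insertion order, which is guaranteed from Python 3.7 on.

```python
def count_vowels(text):
    """Count each vowel in one pass and omit vowels that never occur."""
    counts = dict.fromkeys("aeiou", 0)   # keys inserted in a, e, i, o, u order
    for ch in text:
        if ch in counts:                 # skip anything that is not a vowel
            counts[ch] += 1
    return {vowel: n for vowel, n in counts.items() if n > 0}

print(count_vowels("abcdeabcdeiopiop"))
# {'a': 2, 'e': 2, 'i': 2, 'o': 2}
```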
