Python: regex condition to find lower case/digit before capital letter - python

I would like to split a string in Python and turn it into a dictionary in which each key is a chunk of characters that starts at a capital letter and runs up to the next capital letter, and each value is the number of occurrences of that chunk in the string.
As an example: string = 'ABbACc1Dd2E' should return this: {'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
I have found two working solutions so far (see below), but I am looking for a more general/elegant solution, possibly a one-line regex.
Thank you

Solution 1
import re

string = 'ABbACc1Dd2E'
string = ' '.join(string)
for ii in re.findall("([A-Z] [a-z])", string) + \
        re.findall("([A-Z] [0-9])", string) + \
        re.findall("([a-z] [0-9])", string):
    new_ii = ii.replace(' ', '')
    string = string.replace(ii, new_ii)
string = string.split()
all_dict = {}
for elem in string:
    all_dict[elem] = all_dict[elem] + 1 if elem in all_dict.keys() else 1
print(all_dict)
{'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
Solution 2
string = 'ABbACc1Dd2E'
all_upper = [(pos, char) for (pos, char) in enumerate(string) if char.isupper()]
all_dict = {}
for (pos, char) in enumerate(string):
    if (pos, char) in all_upper:
        new_elem = char
    else:
        new_elem += char
    if pos < len(string) - 1:
        if string[pos+1].isupper():
            all_dict[new_elem] = all_dict[new_elem] + 1 if new_elem in all_dict.keys() else 1
        else:
            pass
    else:
        all_dict[new_elem] = all_dict[new_elem] + 1 if new_elem in all_dict.keys() else 1
print(all_dict)
{'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}

Thanks to usr2564301 for this suggestion:
The right regex is '[A-Z][a-z]*\d*'
import re
string = 'ABbACc1Dd2E'
print(re.findall(r'[A-Z][a-z]*\d*', string))
['A', 'Bb', 'A', 'Cc1', 'Dd2', 'E']
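One can then use itertools.groupby, which makes an iterator that returns consecutive keys and groups from the iterable. Because only consecutive equal items are grouped, the loop below counts runs rather than items, so it would undercount a string such as 'AAB':
from itertools import groupby
all_dict = {}
for i, j in groupby(re.findall(r'[A-Z][a-z]*\d*', string)):
    all_dict[i] = all_dict[i] + 1 if i in all_dict.keys() else 1
print(all_dict)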
{'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
Ultimately, one could use sorted() to get this in one line with the correct counting:
print({i:len(list(j)) for i,j in groupby(sorted(re.findall(r'[A-Z][a-z]*\d*', string))) })
{'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
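As a possible alternative to groupby, collections.Counter can do the counting directly. The sketch below is not part of the original answer: the function name count_chunks is made up for illustration, and the pattern [A-Z][^A-Z]* is a swapped-in assumption that a chunk is a capital letter plus everything up to the next capital (so digits and lower-case letters may appear in any order).

```python
import re
from collections import Counter


def count_chunks(text):
    """Count chunks that start at a capital letter (hypothetical helper)."""
    # Assumption: a chunk is one capital letter followed by any run of
    # non-capital characters, i.e. everything up to the next capital.
    chunks = re.findall(r'[A-Z][^A-Z]*', text)
    # Counter tallies every item, so no sorting or grouping is needed.
    return dict(Counter(chunks))


print(count_chunks('ABbACc1Dd2E'))
# {'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
```

Any characters before the first capital letter are ignored by this pattern, which may or may not be what you want.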

Related

Python: How can I sort the values in a dictionary according to size and display the 3 largest values?

For example: I have the following string s = "I do not understand this". I would now like to display the 3 most frequently occurring characters in the string s, without using "sorted" or packages.
I have the following code:
s = "I do not understand this"
d = {}
spaces = " "
for b in s:
    if b not in spaces:
        if b not in d:
            d[b] = 1
        else:
            d[b] = d[b]+ 1
But I don't know how to go on.
You can create your own list with (key, value) tuples where you'll define the order. You'll then be able to convert this list back to a sorted dictionary:
s = "I do not understand this"
d = {}
for b in s:
    if b != " ":
        d[b] = d.get(b, 0) + 1
# the sorting part:
most_occurring_letters = [list(d.items())[0]] # list with first key/value of your dict
# going through all other keys
for b in list(d)[1:]:
    for i, (letter, count) in enumerate(most_occurring_letters):
        if d[b] >= d[letter]:
            most_occurring_letters.insert(i, (b, d[b]))
            break
    else:
        # b is rarer than every entry so far, so it goes at the end
        most_occurring_letters.append((b, d[b]))
print(most_occurring_letters[:3]) # the 3 most occurring elements
Output:
[('t', 3), ('n', 3), ('d', 3)]
Convert to dict to get a sorted dictionary:
print(dict(most_occurring_letters))
Output:
{'t': 3, 'n': 3, 'd': 3, 's': 2, 'o': 2, 'i': 1, 'h': 1, 'a': 1, 'r': 1, 'e': 1, 'u': 1, 'I': 1}
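If only the three largest values are needed, a full ordering may be unnecessary. The following is a minimal sketch, assuming the built-in max() is acceptable under the "no sorted, no packages" constraint; the helper name top_n is hypothetical. It repeatedly picks the largest remaining entry and removes it from a copy of the counts.

```python
def top_n(counts, n=3):
    """Return up to n (key, count) pairs with the largest counts (hypothetical helper)."""
    remaining = dict(counts)  # work on a copy so the caller's dict is untouched
    result = []
    for _ in range(min(n, len(remaining))):
        # max() with key=remaining.get returns the key with the largest count;
        # on ties it returns the first such key in iteration order.
        best = max(remaining, key=remaining.get)
        result.append((best, remaining.pop(best)))
    return result


s = "I do not understand this"
counts = {}
for ch in s:
    if ch != " ":
        counts[ch] = counts.get(ch, 0) + 1

print(top_n(counts))
```

With this input three letters share the top count of 3, so the order among them depends only on how ties are broken and can differ from the insertion-based approach above.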

why doesn't this Python dictionary comprehension work for counting words in an input()?

I have to count how many times each word is used in a sentence. I have found a solution, but I want the code to be more concise and would like to solve it with a dictionary comprehension. Can someone please help me understand what is wrong with the following code?
user_input = input().lower().split()
usage_dict = dict()
usage_dict = {word:usage_dict.get(word, 0) + 1 for word in user_input}
print(usage_dict)
Input : a aa abC aa ac abc bcd a
Output : {'a': 1, 'aa': 1, 'abc': 1, 'ac': 1, 'bcd': 1}
Expected Output: {'a': 2, 'aa': 2, 'abc': 2, 'ac': 1, 'bcd': 1}
Problem
The name usage_dict is only rebound once the dict comprehension has finished; the comprehension has no access to its own partial result, so usage_dict.get(word, 0) reads the empty dict created on the previous line and always gives 0.
The incremental approach only works in a for loop:
usage_dict = {}
for word in user_input:
    usage_dict[word] = usage_dict.get(word, 0) + 1
Solutions
Use list.count()
usage_dict = {word: user_input.count(word) for word in set(user_input)}
Use collections.Counter, which uses the same kind of for loop as above
from collections import Counter
usage_dict = Counter(user_input)
Try to use Counter
from collections import Counter
user_input = input().lower().split()
usage_dict = Counter(user_input)
print(usage_dict)
Or:
usage_dict = {x:user_input.count(x) for x in user_input}
Returns
{'a': 2, 'aa': 2, 'abc': 2, 'ac': 1, 'bcd': 1}
Use Counter. See below
from collections import Counter
data = 'a aa abC aa ac abc bcd a'.split()
c = Counter(data)
print(c)
output
Counter({'a': 2, 'aa': 2, 'abC': 1, 'ac': 1, 'abc': 1, 'bcd': 1})
You could use a dictionary comprehension:
user_input = input().lower().split()
usage_dict = {word: user_input.count(word) for word in user_input}
print(usage_dict)
Or use collections.Counter:
import collections
user_input = input().lower().split()
usage_dict = collections.Counter(user_input)
print(usage_dict)
Both give the same counts (the Counter version prints them wrapped in Counter(...)):
{'a': 2, 'aa': 2, 'abc': 2, 'ac': 1, 'bcd': 1}
The reason it doesn't work is that the dictionary comprehension is evaluated in full before the result is assigned to the name usage_dict. While the comprehension runs, usage_dict still refers to the empty dict, so looking up a word's count in the middle of the comprehension always finds nothing.
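The evaluation order described in the answers can be checked with a small experiment. This is only an illustrative sketch: it uses a fixed word list instead of input(), and the variable names words and counts are not from the original question.

```python
from collections import Counter

# Fixed sample data in place of input(), taken from the question's example.
words = "a aa abC aa ac abc bcd a".lower().split()

usage_dict = {}
# The right-hand side is evaluated completely before the name is rebound,
# so every .get() call below reads the empty dict and returns 0.
usage_dict = {word: usage_dict.get(word, 0) + 1 for word in words}
print(usage_dict)  # every word ends up with a count of 1

# Counter keeps a running tally internally, so repeated words accumulate.
counts = dict(Counter(words))
print(counts)
```

The first print shows a count of 1 for every word, while the second shows the expected totals.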

Ignore Whitespace while counting number of characters in a String

I am trying to write a function that counts the characters in an input string and stores them as key-value pairs in a dictionary. The code is partially working, i.e. it also counts the whitespace between two words. How do I avoid counting the whitespace?
#Store Characters of a string in a Dictionary
def char_dict(string):
    char_dic = {}
    for i in string:
        if i in char_dic:
            char_dic[i]+= 1
        else:
            char_dic[i]= 1
    return char_dic
print(char_dict('My name is Rajib'))
You could just continue if the character is a white space:
def char_dict(string):
    char_dic = {}
    for i in string:
        if ' ' == i:
            continue
        if i in char_dic:
            char_dic[i] += 1
        else:
            char_dic[i]= 1
    return char_dic
print(char_dict('My name is Rajib')) # {'j': 1, 'm': 1, 'M': 1, 'i': 2, 'b': 1, 'e': 1, 'a': 2, 'y': 1, 'R': 1, 'n': 1, 's': 1}
A cleaner solution would be:
from collections import defaultdict
def countNonSpaceChars(string):
    charDic = defaultdict(lambda: 0)
    for char in string:
        if char.isspace():
            continue
        charDic[char] += 1
    return dict(charDic)
print(countNonSpaceChars('My name is Rajib')) # {'i': 2, 'a': 2, 'R': 1, 'y': 1, 'M': 1, 'm': 1, 'e': 1, 'n': 1, 'j': 1, 's': 1, 'b': 1}
You can delete the spaces first with string = string.replace(" ", ""):
def char_dict(string):
    char_dic = {}
    string=string.replace(" ","")
    for i in string:
        if i in char_dic:
            char_dic[i]+= 1
        else:
            char_dic[i]= 1
    return char_dic
print(char_dict('My name is Rajib'))
To simplify things for you, there's a standard library module called collections that has a Counter class, which will produce a dictionary-like object of values and their occurrences in a string. Then, I would simply remove the whitespace key from it, if it is present, using the del keyword.
from collections import Counter
def char_dict(string):
    c = Counter(string)
    if ' ' in c:
        del c[' ']
    return c
print(char_dict('My name is Rajib'))
This method is very readable and doesn't require too much reinventing.
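The filtering and the counting can also be combined in one step by feeding Counter a generator expression. This is a minimal sketch rather than any of the answers' exact code: the function name count_non_space_chars is hypothetical, and it assumes every kind of whitespace (tabs and newlines too) should be skipped, which is what str.isspace() tests for.

```python
from collections import Counter


def count_non_space_chars(text):
    """Count characters in text, skipping all whitespace (hypothetical helper)."""
    # The generator drops whitespace before Counter ever sees it,
    # so there is no ' ' key to delete afterwards.
    return dict(Counter(ch for ch in text if not ch.isspace()))


print(count_non_space_chars('My name is Rajib'))
```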

Nested for-loops and dictionaries in finding value occurrence in string

I've been tasked with creating a dictionary whose keys are the elements found in a string and whose values count the number of occurrences of each element.
Ex.
"abracadabra" → {'r': 2, 'd': 1, 'c': 1, 'b': 2, 'a': 5}
I have the for-loop logic behind it here:
xs = "hshhsf"
xsUnique = "".join(set(xs))
occurrences = []
freq = []
counter = 0
for i in range(len(xsUnique)):
    for x in range(len(xs)):
        if xsUnique[i] == xs[x]:
            occurrences.append(xs[x])
            counter += 1
    freq.append(counter)
    freq.append(xsUnique[i])
    counter = 0
This does exactly what I want it to do, except with lists instead of dictionaries. How can I make it so counter becomes a value, and xsUnique[i] becomes a key in a new dictionary?
The easiest way is to use a Counter:
>>> from collections import Counter
>>> Counter("abracadabra")
Counter({'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1})
If you can't use a Python library, you can use dict.get with a default value of 0 to make your own counter:
s="abracadabra"
count={}
for c in s:
    count[c] = count.get(c, 0)+1
>>> count
{'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}
Or, you can use dict.fromkeys() to set all the values in a counter to zero and then use that:
>>> counter={}.fromkeys(s, 0)
>>> counter
{'a': 0, 'r': 0, 'b': 0, 'c': 0, 'd': 0}
>>> for c in s:
...     counter[c]+=1
...
>>> counter
{'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}
If you truly want the least Pythonic approach, i.e., what you might do in C, you could:
create a list for all possible ascii values set to 0
loop over the string and count characters that are present
Print non zero values
Example:
ascii_counts=[0]*255
s="abracadabra"
for c in s:
    ascii_counts[ord(c)]+=1
for i, e in enumerate(ascii_counts):
    if e:
        print chr(i), e
Prints:
a 5
b 2
c 1
d 1
r 2
That does not scale to use with Unicode, however, since you would need more than 1 million list entries...
You can use the zip function to convert your list to a dictionary:
>>> dict(zip(freq[1::2],freq[0::2]))
{'h': 3, 's': 2, 'f': 1}
But as a more Pythonic and better-optimized way, I suggest using collections.Counter:
>>> from collections import Counter
>>> Counter("hshhsf")
Counter({'h': 3, 's': 2, 'f': 1})
And since you said you don't want to import any module, you can use a dictionary with the dict.setdefault method and a simple loop:
>>> d={}
>>> for i in xs:
...     d[i]=d.setdefault(i,0)+1
...
>>> d
{'h': 3, 's': 2, 'f': 1}
I'm guessing there's a learning reason why you're using two for loops?
Anyway, here are a few different solutions:
# Method 1
xs = 'hshhsf'
xsUnique = ''.join(set(xs))
freq1 = {}
for i in range(len(xsUnique)):
    for x in range(len(xs)):
        if xsUnique[i] == xs[x]:
            if xs[x] in freq1:
                freq1[xs[x]] += 1
            else:
                freq1[xs[x]] = 1 # Introduce a new key, value pair
# Method 2
# Or use a defaultdict that auto-initializes new values in a dictionary
# https://docs.python.org/2/library/collections.html#collections.defaultdict
from collections import defaultdict
freq2 = defaultdict(int) # new values initialize to 0
for i in range(len(xsUnique)):
    for x in range(len(xs)):
        if xsUnique[i] == xs[x]:
            # no need to check if xs[x] is in the dict because
            # defaultdict(int) will set any new key to zero, then
            # performs its operation.
            freq2[xs[x]] += 1
# I don't understand why you're using 2 for loops though
# Method 3
string = 'hshhsf' # the variable name `xs` confuses me, sorry
freq3 = defaultdict(int)
for char in string:
    freq3[char] += 1
# Method 4
freq4 = {}
for char in string:
    if char in freq4:
        freq4[char] += 1
    else:
        freq4[char] = 1
print 'freq1: %r\n' % freq1
print 'freq2: %r\n' % freq2
print 'freq3: %r\n' % freq3
print 'freq4: %r\n' % freq4
print '\nDo all the dictionaries equal each other as they stand?'
print 'Answer: %r\n\n' % (freq1 == freq2 and freq1 == freq3 and freq1 == freq4)
# convert the defaultdict's to a dict for consistency
freq2 = dict(freq2)
freq3 = dict(freq3)
print 'freq1: %r' % freq1
print 'freq2: %r' % freq2
print 'freq3: %r' % freq3
print 'freq4: %r' % freq4
Output
freq1: {'h': 3, 's': 2, 'f': 1}
freq2: defaultdict(<type 'int'>, {'h': 3, 's': 2, 'f': 1})
freq3: defaultdict(<type 'int'>, {'h': 3, 's': 2, 'f': 1})
freq4: {'h': 3, 's': 2, 'f': 1}
Do all the dictionaries equal each other as they stand?
Answer: True
freq1: {'h': 3, 's': 2, 'f': 1}
freq2: {'h': 3, 's': 2, 'f': 1}
freq3: {'h': 3, 's': 2, 'f': 1}
freq4: {'h': 3, 's': 2, 'f': 1}
[Finished in 0.1s]
Or like dawg stated, use Counter from the collections standard library
counter docs
https://docs.python.org/2/library/collections.html#collections.Counter
defaultdict docs
https://docs.python.org/2/library/collections.html#collections.defaultdict
collections library docs
https://docs.python.org/2/library/collections.html
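To connect this back to the original question (making counter a value and the unique character a key), here is a minimal sketch that keeps the asker's two-loop structure but writes into a dictionary instead of a list. The function name count_with_nested_loops is hypothetical, and the loops iterate over characters directly rather than over indices.

```python
def count_with_nested_loops(xs):
    """Count characters using the question's nested-loop idea (hypothetical helper)."""
    freq = {}
    for unique_char in set(xs):      # outer loop: each distinct character
        counter = 0
        for char in xs:              # inner loop: scan the whole string
            if unique_char == char:
                counter += 1
        freq[unique_char] = counter  # the character is the key, the count the value
    return freq


print(count_with_nested_loops("hshhsf"))
```

The single-loop versions in the answers above do less work, since this one rescans the string once per distinct character.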

Python Counting Vowels

I have started on a program to count vowels but seem to be getting nowhere. I need to count the vowels in a string and then display them. I need to do this by storing the number of occurrences in variables, like this:
a = 0
b = 0
....
then print the lowest.
Current code (it's not much):
string = str(input("please input a string: "))
edit= ''.join(string)
print(edit)
I have tried a number of methods by myself and don't seem to get anywhere.
You could use a dictionary comprehension:
>>> example = 'this is an example string'
>>> vowel_counts = {c: example.count(c) for c in 'aeoiu'}
>>> vowel_counts
{'i': 2, 'o': 0, 'e': 5, 'u': 0, 'a': 2}
Then finding the minimum, maximum etc. is trivial.
>>> a="hello how are you"
>>> vowel_count = dict.fromkeys('aeiou',0)
>>> vowel_count
{'a': 0, 'i': 0, 'e': 0, 'u': 0, 'o': 0}
>>> for x in 'aeiou':
...     vowel_count[x]=a.count(x)
...
>>> vowel_count
{'a': 1, 'i': 0, 'e': 2, 'u': 1, 'o': 3}
From here you can print the lowest and highest counts.
You can use dictionary for this problem. Iterate over each character and if the character is a vowel, put it in dictionary with count 0 and increment its count by 1, and for every next occurrence keep incrementing the count.
>>> string = str(input("please input a string: "))
please input a string: 'Hello how are you'
>>> dt={} # initialize dictionary
>>> for i in string: # iterate over each character
...     if i in ['a','e','i','o','u']: # if vowel
...         dt.setdefault(i,0) # at first occurrence set count to 0
...         dt[i]+=1 # increment count by 1
...
>>> dt
{'a': 1, 'u': 1, 'e': 2, 'o': 3}
word = input('Enter Your word : ')
vowel = 'aeiou'
vowel_counter = {}
for char in word:
    if char in vowel:
        vowel_counter[char] = vowel_counter.setdefault(char,0)+1
sorted_result = sorted(vowel_counter.items(), reverse=True,key=lambda x : x[1])
for key,val in sorted_result:
    print(key,val)
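None of the answers above shows the last step the question asks for, printing the lowest count. One possible way is sketched below; the helper name lowest_vowel is hypothetical, and it assumes that vowels with zero occurrences should still be considered and that only lower-case vowels are counted.

```python
def lowest_vowel(text):
    """Return (vowel, count) for the least frequent vowel (hypothetical helper)."""
    # Count every vowel, including those that never occur (count 0).
    vowel_counts = {v: text.count(v) for v in "aeiou"}
    # min() with key=vowel_counts.get picks the vowel with the smallest count;
    # on ties it returns the first one in "aeiou" order.
    vowel = min(vowel_counts, key=vowel_counts.get)
    return vowel, vowel_counts[vowel]


print(lowest_vowel("hello how are you"))
```

To ignore vowels that do not appear at all, the zero counts could be filtered out of the dictionary before calling min().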
