How can I find the consonant letters that come directly after vowels in the words of a string, and count their frequency?
str = 'car regular double bad '
result19 = re.findall(r'\b\w*[aeiou][^ aeiou]\w*\b' , str)
print(result19) #doesn't work
Expected output
letter r count = 2
letter b count = 1
letter d count = 1
I am not sure whether this is what you want or not, but it might help as an answer and not a comment.
I think you are on the right track, but you need a few modifications and some extra lines to get the expected result:
import re
myStr = 'car regular double bad '
result19 = re.findall(r'[aeiou][^aeiou\s]+' , myStr)
myDict = {}
for value in result19:
    if value[1] not in myDict:
        myDict[value[1]] = 0
    myDict[value[1]] += 1
myDict
This will result in a dictionary containing the letters and the number of times they have appeared:
{'b': 1, 'd': 1, 'g': 1, 'l': 1, 'r': 2}
For having a better output you can use a for loop to print each key and its value:
for letter, value in myDict.items():
    print(letter, "->", value)
Output
r -> 2
g -> 1
l -> 1
b -> 1
d -> 1
Your pattern \b\w*[aeiou][^ aeiou]\w*\b matches zero or more repetitions of a word character using \w* and only matches a single occurrence of [aeiou][^ aeiou] in the "word"
If you want to match all consonant letters based on the alphabet a-z after a vowel, you can match a single occurrence of [aeiou] and use a capture group matching a single consonant.
Then make use of re.findall to return a list of the group values.
import re
txt = 'car regular double bad '
lst = re.findall(r'[aeiou]([b-df-hj-np-tv-z])', txt)
dct = {c: lst.count(c) for c in lst}
print(dct)
Output
{'r': 2, 'g': 1, 'l': 1, 'b': 1, 'd': 1}
If you want to match a non whitespace char other than a vowel after matching a vowel, you can use this pattern [aeiou]([^\saeiou])
Note that the l is also in the output as it comes after the u in ul
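The counting step can also be handed to collections.Counter. This is only a minimal sketch using the same sample text; the function name count_consonants_after_vowels is made up for illustration.

```python
import re
from collections import Counter


def count_consonants_after_vowels(text):
    # Capture the single consonant that immediately follows a vowel.
    consonants = re.findall(r'[aeiou]([b-df-hj-np-tv-z])', text)
    return Counter(consonants)


counts = count_consonants_after_vowels('car regular double bad ')
for letter, count in counts.items():
    print(f'letter {letter} count = {count}')
```

Counter behaves like a dictionary, so the result can be indexed or iterated the same way as the dict built above.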
I need to create a function that, given an input like 999933, returns "ze". It should basically work like a numeric mobile phone keypad. How can I do this? I searched the internet for samples, but everything I found did the opposite: you give text as input and get the number back. I couldn't work out how to achieve the reverse. Please let me know how I can do that.
def number_to_text(val):
    pass
Create a mapping pad_number to letter.
Use itertools.groupby to iterate over consecutive pad presses and calculate which letter we get.
import itertools
letters_by_pad_number = {"3": "def", "9": "wxyz"}
def number_to_text(val):
    message = ""
    # change val to string, so we can iterate over digits
    digits = str(val)
    # group consecutive numbers: itertools.groupby("2244") -> ('2', '22'), ('4','44')
    for digit, group in itertools.groupby(digits):
        # get the pad letters, i.e. "def" for "3" pad
        letters = letters_by_pad_number[digit]
        # get how many consecutive times it was pressed
        presses_number = len(list(group))
        # calculate the index of the letter, cycling through if we pressed
        # more times than the pad has letters
        letter_index = (presses_number - 1) % len(letters)
        message += letters[letter_index]
    return message
print(number_to_text(999933))
# ze
And hardcore one-liner just for fun:
letters = {"3": "def", "9": "wxyz"}
def number_to_text(val):
    return "".join([letters[d][(len(list(g)) - 1) % len(letters[d])] for d, g in itertools.groupby(str(val))])
print(number_to_text(999933))
# ze
The other answers are correct, but I tried to write a less brief, more real-world explanation (including doctests) of how the previous results work:
dialpad_text.py:
# Import the groupby function from itertools;
# it takes any iterable and yields consecutive groups by some key
from itertools import groupby

# Use a dictionary as a lookup table
dialpad = {
    '2': ['a', 'b', 'c'],
    '3': ['d', 'e', 'f'],
    '4': ['g', 'h', 'i'],
    '5': ['j', 'k', 'l'],
    '6': ['m', 'n', 'o'],
    '7': ['p', 'q', 'r', 's'],
    '8': ['t', 'u', 'v'],
    '9': ['w', 'x', 'y', 'z'],
}


def dialpad_text(numbers):
    """
    Takes in either a number or a string of numbers and creates
    a string of characters just like a Nokia without T9 support

    Default usage:
    >>> dialpad_text(2555222)
    'alc'

    Handle string inputs:
    >>> dialpad_text('2555222')
    'alc'

    Handle wrapped groups:
    >>> dialpad_text(2222555222)
    'alc'

    Throw an error if an invalid input is given
    >>> dialpad_text('1BROKEN')
    Traceback (most recent call last):
        ...
    ValueError: Unrecognized input "1"
    """
    # Convert to string if given a number
    if isinstance(numbers, int):
        numbers = str(numbers)
    # Create our string output for the dialed numbers
    output = ''
    # Group each set of numbers in the order
    # they appear and iterate over the groups.
    # (eg. '222556' yields ('2', ['2', '2', '2']), ('5', ['5', '5']), ('6', ['6']))
    # We can use the second element of each tuple to find
    # our index into the dictionary at the given number!
    for number, letters in groupby(numbers):
        # Convert the groupby group iterator into a list and
        # get the offset into our array at the specified key
        offset = len(list(letters)) - 1
        # Check if the number is a valid dialpad key (eg. 1 for example isn't)
        if number in dialpad:
            # Add the character to our output string and wrap
            # if the number is greater than the length of the character list
            output += dialpad[number][offset % len(dialpad[number])]
        else:
            raise ValueError(f'Unrecognized input "{number}"')
    return output
Hope this helps you understand what's going on at a lower level! Also, if you don't trust my code, just save it to a file and run python -m doctest dialpad_text.py, and it will pass the doctests from the module.
(Notes: without the -v flag it won't output anything, silence is golden!)
You need to:
- group the same digits together with the regex (\d)\1*, which captures a digit followed by the same digit any number of times
- use the value of a digit in the group to get the key
- use the length of the group to get the letter
import re

phone_letters = ["", "", "abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"]
def number_to_text(val):
    # each match is one run of identical digits, e.g. "9999" or "33"
    groups = [match.group() for match in re.finditer(r'(\d)\1*', val)]
    result = ""
    for group in groups:
        keynumber = int(group[0])
        count = len(group)
        result += phone_letters[keynumber][count - 1]
    return result
print(number_to_text("999933")) # ze
Using list comprehension
def number_to_text(val):
    groups = [match.group() for match in re.finditer(r'(\d)\1*', val)]
    return "".join(phone_letters[int(group[0])][len(group) - 1] for group in groups)
A slightly modified version of RaFalS's answer that does not use itertools. Note that it counts each digit over the whole input, so it only works when every digit appears in a single consecutive run (as in 999933):
from collections import defaultdict
letters_by_pad_number = {"3": "def", "9": "wxyz"}
val = 999933
message = ""
digits = str(val)
# total number of presses per digit
num_group = defaultdict(int)
for digit in digits:
    num_group[digit] += 1
for num in num_group.keys():
    message += letters_by_pad_number[num][num_group[num]-1]
print(message)
# ze
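Counting how often each digit occurs overall only matches keypad behaviour when every digit appears in one run; for an input such as 3933 the runs have to be tracked separately. The sketch below is one possible way to do that without itertools, assuming the same two-key mapping as above (a real keypad would need entries for keys 2 to 9).

```python
letters_by_pad_number = {"3": "def", "9": "wxyz"}


def number_to_text(val):
    digits = str(val)
    message = ""
    i = 0
    while i < len(digits):
        # Find the end of the current run of identical digits.
        j = i
        while j < len(digits) and digits[j] == digits[i]:
            j += 1
        letters = letters_by_pad_number[digits[i]]
        # Run length minus one, wrapped around the letters on that key.
        message += letters[(j - i - 1) % len(letters)]
        i = j
    return message


print(number_to_text(999933))
print(number_to_text(3933))
```

The two index variables mark the start and end of each run, which is the same information itertools.groupby would provide.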
Hi,
I want to count how often the phrases from a given list occur in the first element (position zero) of another list.
list_given_atoms= ['C', 'Cl', 'Br']
list_of_molecules= ['C(B2Br)[Cl{H]Cl}P' ,'NAME']
When Python finds a match, it should be saved in a dictionary like
countdict = [ 'Cl : 2', 'C : 1', 'Br : 1']
I already tried
re.findall(r'\w+', list_of_molecules[0])
but that results in words like "B2Br", which is definitely not what I want.
Can someone help me?
[a-zA-Z]+ should be used instead of \w+ because \w+ will match both letters and numbers, while you are just looking for letters:
import re
list_given_atoms= ['C', 'Cl', 'Br']
list_of_molecules= ['C(B2Br)[Cl{H]Cl}P' ,'NAME']
molecules = re.findall('[a-zA-Z]+', list_of_molecules[0])
final_data = {i:molecules.count(i) for i in list_given_atoms}
Output:
{'C': 1, 'Br': 1, 'Cl': 2}
You could use something like this:
>>> Counter(re.findall('|'.join(sorted(list_given_atoms, key=len, reverse=True)), list_of_molecules[0]))
Counter({'Cl': 2, 'C': 1, 'Br': 1})
You have to sort the elements by their length, so 'Cl' matches before 'C'.
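For reference, here is the same idea as a self-contained script with its imports. It is a sketch rather than a definitive version; re.escape is an addition that is not in the one-liner above and only matters if a symbol ever contains a regex metacharacter.

```python
import re
from collections import Counter

list_given_atoms = ['C', 'Cl', 'Br']
list_of_molecules = ['C(B2Br)[Cl{H]Cl}P', 'NAME']

# Longest symbols first, so 'Cl' is tried before 'C'.
symbols = sorted(list_given_atoms, key=len, reverse=True)
pattern = '|'.join(re.escape(s) for s in symbols)

counts = Counter(re.findall(pattern, list_of_molecules[0]))
print(counts)
```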
Short re.findall() solution:
import re
list_given_atoms = ['C', 'Cl', 'Br']
list_of_molecules = ['C(B2Br)[Cl{H]Cl}P' ,'NAME']
d = {a: len(re.findall(a + r'(?=[^a-z]|$)', list_of_molecules[0], re.I))
     for a in list_given_atoms}
print(d)
The output:
{'C': 1, 'Cl': 2, 'Br': 1}
I tried your solutions and figured out that there can also be several C atoms one after another. So I came up with this:
for element in re.findall(r'([A-Z])([a-zA-Z])?', list_of_molecules[0]):
    if element[1].islower():
        counter = element[0] + element[1]
        if counter not in counter_dict:
            counter_dict[counter] = 1
        else:
            counter_dict[counter] += 1
I checked for single-letter elements the same way and added them to the dictionary. There is probably a better way.
You can't use \w, as a word character is equivalent to:
[a-zA-Z0-9_]
which clearly includes numbers so therefore "B2Br" is matched.
You also can't just use the regex:
[a-zA-Z]+
as that would produce one atom for something like "CO2", which should produce two separate atoms: C and O.
However the regex I came up with (regex101) just checks for a capital letter and then between 0 and 1 (so optional) lower case letter.
Here it is:
[A-Z][a-z]{0,1}
and it will correctly produce the atoms.
So to incorporate this into your original lists of:
list_given_atoms= ['C', 'Cl', 'Br']
list_of_molecules= ['C(B2Br)[Cl{H]Cl}P' ,'NAME']
we want to first find all the atoms in list_of_molecules and then create a dictionary of the counts of the atoms in list_given_atoms.
So to find all the atoms, we can use re.findall on the first element in the molecules list:
atoms = re.findall("[A-Z][a-z]{0,1}", list_of_molecules[0])
which gives a list:
['C', 'B', 'Br', 'Cl', 'H', 'Cl', 'P']
then, to get the counts in a dictionary, we can use a dictionary-comprehension:
counts = {a: atoms.count(a) for a in list_given_atoms}
which gives the desired result of:
{'Cl': 2, 'C': 1, 'Br': 1}
And would also work when we have molecules like CO2 etc.
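Put together, the approach might look like the sketch below. The helper name count_atoms is hypothetical, and note that it counts how often a symbol appears in the string; it does not interpret numeric subscripts such as the 2 in CO2.

```python
import re
from collections import Counter


def count_atoms(formula, wanted):
    # One capital letter, optionally followed by one lowercase letter.
    atoms = Counter(re.findall(r'[A-Z][a-z]?', formula))
    # A Counter returns 0 for symbols that never occur.
    return {symbol: atoms[symbol] for symbol in wanted}


print(count_atoms('C(B2Br)[Cl{H]Cl}P', ['C', 'Cl', 'Br']))
print(count_atoms('CO2', ['C', 'O']))
```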
This first function counts the string's characters
def character_count(sentence):
    characters = {}
    for char in sentence:
        if char in characters:
            characters[char] = characters[char] + 1
        else:
            characters[char] = 1
    return characters
This second function finds the most common character: it looks up each character's count in the dictionary returned by the previous helper function and keeps the one with the highest count.
def most_common_character(sentence):
    chars = character_count(sentence)
    most_common = ""
    max_times = 0
    for curr_char in chars:
        if chars[curr_char] > max_times:
            most_common = curr_char
            max_times = chars[curr_char]
    return most_common
Why not simply use what Python provides?
>>> from collections import Counter
>>> sentence = "This is such a beautiful day, isn't it"
>>> c = Counter(sentence).most_common(3)
>>> c
[(' ', 7), ('i', 5), ('s', 4)]
And if you really want to leave the spaces out of the count:
>>> from collections import Counter
>>> sentence = "This is such a beautiful day, isn't it"
>>> res = Counter(sentence.replace(' ', ''))
>>> res.most_common(1)
[('i', 5)]
You actually don't have to change anything! Your code will work with a list as is (the variable names just become misleading). Try it:
most_common_character(['this', 'is', 'a', 'a', 'list'])
Output:
'a'
This will work for lists with any kind of elements that are hashable (numbers, strings, characters, etc)
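The same list can also be passed to collections.Counter, which accepts any iterable of hashable items. A minimal sketch, with variable names chosen only for illustration:

```python
from collections import Counter

items = ['this', 'is', 'a', 'a', 'list']
# most_common(1) returns a list holding one (element, count) pair.
element, count = Counter(items).most_common(1)[0]
print(element, count)
```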
How do I count the number of occurrences of a character in a string?
e.g. 'a' appears in 'Mary had a little lamb' 4 times.
str.count(sub[, start[, end]])
Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
>>> sentence = 'Mary had a little lamb'
>>> sentence.count('a')
4
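The optional start argument and the "non-overlapping" wording are easiest to see with a small example. This is just an illustrative sketch; the variable names are not from the documentation.

```python
sentence = 'Mary had a little lamb'

# Count only within sentence[5:], i.e. after 'Mary '.
after_first_word = sentence.count('a', 5)

# Matches do not overlap: 'aaaa' contains 'aa' twice, not three times.
non_overlapping = 'aaaa'.count('aa')

print(after_first_word, non_overlapping)
```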
You can use .count() :
>>> 'Mary had a little lamb'.count('a')
4
To get the counts of all letters, use collections.Counter:
>>> from collections import Counter
>>> counter = Counter("Mary had a little lamb")
>>> counter['a']
4
Regular expressions maybe?
import re
my_string = "Mary had a little lamb"
len(re.findall("a", my_string))
Python-3.x:
"aabc".count("a")
str.count(sub[, start[, end]])
Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
myString.count('a')
str.count(a) is the best solution to count a single character in a string. But if you need to count several characters, you would have to scan the whole string once for each character you want to count.
A better approach for this job would be:
from collections import defaultdict
text = 'Mary had a little lamb'
chars = defaultdict(int)
for char in text:
    chars[char] += 1
So you'll have a dict that returns the number of occurrences of every letter in the string and 0 if it isn't present.
>>> chars['a']
4
>>> chars['x']
0
For a case insensitive counter you could override the mutator and accessor methods by subclassing defaultdict (base class' ones are read-only):
class CICounter(defaultdict):
    def __getitem__(self, k):
        return super().__getitem__(k.lower())
    def __setitem__(self, k, v):
        super().__setitem__(k.lower(), v)
chars = CICounter(int)
for char in text:
    chars[char] += 1
>>> chars['a']
4
>>> chars['M']
2
>>> chars['x']
0
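If subclassing feels heavy, a simpler alternative (a different technique from the one above, shown only as a sketch) is to fold the case once and let collections.Counter do the counting. The trade-off is that lookups must then use lowercase keys.

```python
from collections import Counter

text = 'Mary had a little lamb'
# Fold case once up front instead of on every lookup.
chars = Counter(text.lower())

print(chars['m'], chars['a'], chars['x'])
```

A Counter also returns 0 for missing keys, so the behaviour for absent letters is the same as with defaultdict(int).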
This easy and straightforward function might help:
def check_freq(x):
    freq = {}
    for c in set(x):
        freq[c] = x.count(c)
    return freq
check_freq("abbabcbdbabdbdbabababcbcbab")
{'a': 7, 'b': 14, 'c': 3, 'd': 3}
If a comprehension is desired:
def check_freq(x):
    return {c: x.count(c) for c in set(x)}
Regular expressions are very useful if you want case-insensitivity (and of course all the power of regex).
my_string = "Mary had a little lamb"
# simplest solution, using count, is case-sensitive
my_string.count("m") # yields 1
import re
# case-sensitive with regex
len(re.findall("m", my_string))
# three ways to get case insensitivity - all yield 2
len(re.findall("(?i)m", my_string))
len(re.findall("m|M", my_string))
len(re.findall(re.compile("m",re.IGNORECASE), my_string))
Be aware that the regex version takes on the order of ten times as long to run, which will likely be an issue only if my_string is tremendously long, or the code is inside a deep loop.
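To check the speed difference on your own machine, something like the following sketch could be used. The timings depend on the Python version and hardware, so no numbers are given here; the variable names are only illustrative.

```python
import re
import timeit

my_string = "Mary had a little lamb"
pattern = re.compile("m", re.IGNORECASE)

# Two case-insensitive counts that should agree.
by_count = my_string.lower().count("m")
by_regex = len(pattern.findall(my_string))

# Time 10,000 repetitions of each approach.
t_count = timeit.timeit(lambda: my_string.lower().count("m"), number=10_000)
t_regex = timeit.timeit(lambda: len(pattern.findall(my_string)), number=10_000)
print(by_count, by_regex, t_count, t_regex)
```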
I don't know about 'simplest' but simple comprehension could do:
>>> my_string = "Mary had a little lamb"
>>> sum(char == 'a' for char in my_string)
4
This takes advantage of the built-in sum, a generator expression, and the fact that bool is a subclass of int: it adds up how many times a character is equal to 'a'.
a = 'have a nice day'
symbol = 'abcdefghijklmnopqrstuvwxyz'
for key in symbol:
    print(key, a.count(key))
An alternative way to get all the character counts without using Counter(), count and regex
sentence = 'Mary had a little lamb'
counts_dict = {}
for c in sentence:
    if c not in counts_dict:
        counts_dict[c] = 0
    counts_dict[c] += 1
for key, value in counts_dict.items():
    print(key, value)
I am a fan of the pandas library, in particular the value_counts() method. You could use it to count the occurrence of each character in your string:
>>> import pandas as pd
>>> phrase = "I love the pandas library and its `value_counts()` method"
>>> pd.Series(list(phrase)).value_counts()
8
a 5
e 4
t 4
o 3
n 3
s 3
d 3
l 3
u 2
i 2
r 2
v 2
` 2
h 2
p 1
b 1
I 1
m 1
( 1
y 1
_ 1
) 1
c 1
dtype: int64
count is definitely the most concise and efficient way of counting the occurrence of a character in a string but I tried to come up with a solution using lambda, something like this :
sentence = 'Mary had a little lamb'
sum(map(lambda x : 1 if 'a' in x else 0, sentence))
This will result in :
4
There is one more advantage: if sentence is a list of substrings, this still gives the correct result because of the use of in, provided each substring contains the character at most once. Have a look:
sentence = ['M', 'ar', 'y', 'had', 'a', 'little', 'l', 'am', 'b']
sum(map(lambda x : 1 if 'a' in x else 0, sentence))
This also results in :
4
Of course, this only works when checking for the occurrence of a single character, such as 'a' in this particular case.
a = "I walked today,"
c=['d','e','f']
count=0
for i in a:
    if i in c:
        count += 1
print(count)
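The loop above counts how many characters of the string belong to a list of target characters. A more compact sketch of the same idea, assuming the same sample string and using a set for the membership test:

```python
a = "I walked today,"
targets = set('def')

# True counts as 1, so summing the membership tests gives the total.
count = sum(ch in targets for ch in a)
print(count)
```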
I know the question asks to count a particular letter. Here is generic code that does not use any counting method.
sentence1 = " Mary had a little lamb"
count = {}
for i in sentence1:
    # use the lowercased character for both the lookup and the update
    key = i.lower()
    if key in count:
        count[key] = count[key] + 1
    else:
        count[key] = 1
print(count)
output
{' ': 5, 'm': 2, 'a': 4, 'r': 1, 'y': 1, 'h': 1, 'd': 1, 'l': 3, 'i': 1, 't': 2, 'e': 1, 'b': 1}
Now if you want any particular letter frequency, you can print like below.
print(count['m'])
2
the easiest way is to code in one line:
'Mary had a little lamb'.count("a")
but if you want can use this too:
sentence = 'Mary had a little lamb'
count = 0
for letter in sentence:
    if letter == "a":
        count += 1
print(count)
To find the occurrences of the characters in a sentence, you may use the code below.
First, I take the unique characters from the sentence, and then I count the occurrences of each one in the sentence. This includes the occurrences of the blank space too.
ab = set("Mary had a little lamb")
test_str = "Mary had a little lamb"
for i in ab:
    counter = test_str.count(i)
    if i == ' ':
        i = 'Space'
    print(counter, ':', i, ',')
Output of the above code is below.
1 : r ,
1 : h ,
1 : e ,
1 : M ,
4 : a ,
1 : b ,
1 : d ,
2 : t ,
3 : l ,
1 : i ,
4 : Space ,
1 : y ,
1 : m ,
"Without using count to find you want character in string" method.
import re
def count(s, ch):
    # re.escape keeps characters such as '.' or '*' literal
    return len(re.findall(re.escape(ch), s))
def main():
    s = input("Enter any string you like, for example 'welcome': ")
    ch = input("Enter the character to count (works best with a single character): ")
    print(count(s, ch))
main()
Python 3
There are two ways to achieve this:
1) With built-in function count()
sentence = 'Mary had a little lamb'
print(sentence.count('a'))
2) Without using a function
sentence = 'Mary had a little lamb'
count = 0
for i in sentence:
    if i == "a":
        count = count + 1
print(count)
Use count:
sentence = 'A man walked up to a door'
print(sentence.count('a'))
# 3 (count is case-sensitive, so the leading 'A' is not counted)
Taking up a comment of this user:
import numpy as np
sample = 'samplestring'
np.unique(list(sample), return_counts=True)
Out:
(array(['a', 'e', 'g', 'i', 'l', 'm', 'n', 'p', 'r', 's', 't'], dtype='<U1'),
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1]))
Check 's'. You can filter this tuple of two arrays as follows, assuming the result above was assigned to a variable named a:
a[1][a[0]=='s']
Side-note: It works like Counter() of the collections package, just in numpy, which you often import anyway. You could as well count the unique words in a list of words instead.
This is an extension of the accepted answer, should you look for the count of all the characters in the text.
# Objective: count only non-whitespace characters
text = "count a character occurrence"
unique_letters = set(text)
result = dict((x, text.count(x)) for x in unique_letters if x.strip())
print(result)
# {'a': 3, 'c': 6, 'e': 3, 'u': 2, 'n': 2, 't': 2, 'r': 4, 'h': 1, 'o': 2}
No more than this IMHO - you can add the upper or lower methods
def count_letter_in_str(string, letter):
    return string.count(letter)
You can use loop and dictionary.
def count_letter(text):
    result = {}
    for letter in text:
        if letter not in result:
            result[letter] = 0
        result[letter] += 1
    return result
spam = 'have a nice day'
var = 'd'
def count(spam, var):
    found = 0
    for key in spam:
        if key == var:
            found += 1
    return found
count(spam, var)
print('count %s is: %s' % (var, count(spam, var)))