I have the following dictionary:
mydict = {'mindestens': 2,
'Situation': 3,
'österreichische': 2,
'habe.': 1,
'Über': 1,
}
How can I get a list or a string out of it in which each key is repeated as many times as the number it maps to, like this:
mylist = ['mindestens', 'mindestens', 'Situation', 'Situation', 'Situation',.., 'Über']
mytext = 'mindestens mindestens Situation Situation Situation ... Über'
You might just use loops:
mylist = []
for word, times in mydict.items():
    for i in range(times):
        mylist.append(word)
The itertools library has convenient tools for such cases:
from itertools import chain, repeat
mydict = {'mindestens': 2, 'Situation': 3, 'österreichische': 2,
'habe.': 1, 'Über': 1,
}
res = list(chain.from_iterable(repeat(k, v) for k, v in mydict.items()))
print(res)
The output:
['mindestens', 'mindestens', 'Situation', 'Situation', 'Situation', 'österreichische', 'österreichische', 'habe.', 'Über']
For the text version, joining the items of a list is trivial: ' '.join(<iterable>)
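As a small sketch of the text version, assuming the mydict from the question and Python 3.7 or later (where dicts keep insertion order), the repetition and the join can be combined in one expression:

```python
mydict = {'mindestens': 2, 'Situation': 3, 'österreichische': 2,
          'habe.': 1, 'Über': 1}

# Repeat each word as often as its count, then join with single spaces.
mytext = ' '.join(word for word, times in mydict.items() for _ in range(times))
print(mytext)
```

The nested generator expression does the same work as the two nested loops, without building the intermediate list.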
How do I count the number of occurrences of a character in a string?
e.g. 'a' appears in 'Mary had a little lamb' 4 times.
str.count(sub[, start[, end]])
Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
>>> sentence = 'Mary had a little lamb'
>>> sentence.count('a')
4
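The optional start and end arguments and the "non-overlapping" rule from the quoted documentation are easy to miss. A minimal sketch of both (the variable names are only illustrative):

```python
sentence = 'Mary had a little lamb'

# Count only within sentence[0:8], i.e. 'Mary had'.
in_prefix = sentence.count('a', 0, 8)

# Occurrences are non-overlapping: 'aaaa' contains 'aa' twice, not three times.
non_overlapping = 'aaaa'.count('aa')

print(in_prefix, non_overlapping)
```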
You can use .count() :
>>> 'Mary had a little lamb'.count('a')
4
To get the counts of all letters, use collections.Counter:
>>> from collections import Counter
>>> counter = Counter("Mary had a little lamb")
>>> counter['a']
4
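Beyond looking up a single key, a Counter can also rank the characters. A short sketch using the same sentence (top_three is just an illustrative name):

```python
from collections import Counter

counter = Counter("Mary had a little lamb")

# most_common(n) returns the n most frequent characters with their counts.
top_three = counter.most_common(3)
print(top_three)

# Characters that never occur report 0 instead of raising KeyError.
print(counter['x'])
```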
Regular expressions maybe?
import re
my_string = "Mary had a little lamb"
len(re.findall("a", my_string))
Python-3.x:
"aabc".count("a")
str.count(sub[, start[, end]])
Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
myString.count('a')
str.count(a) is the best solution to count a single character in a string. But if you need to count several different characters, you have to scan the whole string once for every character you want to count.
A better approach for this job would be:
from collections import defaultdict
text = 'Mary had a little lamb'
chars = defaultdict(int)
for char in text:
    chars[char] += 1
So you'll have a dict that returns the number of occurrences of every letter in the string and 0 if it isn't present.
>>> chars['a']
4
>>> chars['x']
0
For a case insensitive counter you could override the mutator and accessor methods by subclassing defaultdict (base class' ones are read-only):
class CICounter(defaultdict):
    def __getitem__(self, k):
        return super().__getitem__(k.lower())
    def __setitem__(self, k, v):
        super().__setitem__(k.lower(), v)
chars = CICounter(int)
for char in text:
    chars[char] += 1
>>> chars['a']
4
>>> chars['M']
2
>>> chars['x']
0
This easy and straightforward function might help:
def check_freq(x):
    freq = {}
    for c in set(x):
        freq[c] = x.count(c)
    return freq
check_freq("abbabcbdbabdbdbabababcbcbab")
{'a': 7, 'b': 14, 'c': 3, 'd': 3}
If a comprehension is desired:
def check_freq(x):
    return {c: x.count(c) for c in set(x)}
Regular expressions are very useful if you want case-insensitivity (and of course all the power of regex).
my_string = "Mary had a little lamb"
# simplest solution, using count, is case-sensitive
my_string.count("m") # yields 1
import re
# case-sensitive with regex
len(re.findall("m", my_string))
# three ways to get case insensitivity - all yield 2
len(re.findall("(?i)m", my_string))
len(re.findall("m|M", my_string))
len(re.findall(re.compile("m",re.IGNORECASE), my_string))
Be aware that the regex version takes on the order of ten times as long to run, which will likely be an issue only if my_string is tremendously long, or the code is inside a deep loop.
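The speed difference can be checked with timeit. This is only a rough sketch: the repetition factor and the number of runs are arbitrary choices, and the ratio you see depends on your machine and Python version.

```python
import re
import timeit

my_string = "Mary had a little lamb" * 1000

# Both approaches must agree before their speed is compared.
assert my_string.count("m") == len(re.findall("m", my_string))

count_time = timeit.timeit(lambda: my_string.count("m"), number=1000)
regex_time = timeit.timeit(lambda: len(re.findall("m", my_string)), number=1000)
print(f"str.count: {count_time:.4f}s  re.findall: {regex_time:.4f}s")
```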
I don't know about 'simplest' but simple comprehension could do:
>>> my_string = "Mary had a little lamb"
>>> sum(char == 'a' for char in my_string)
4
This takes advantage of the built-in sum, a generator expression and the fact that bool is a subclass of int: it adds up how many times a character is equal to 'a'.
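To see why summing comparisons works, here is a small sketch that spells out the intermediate steps (matches and total are illustrative names):

```python
my_string = "Mary had a little lamb"

# bool is a subclass of int, so True behaves as 1 and False as 0 in arithmetic.
print(issubclass(bool, int))
print(True + True)

# Each comparison yields a bool; sum() adds them up as integers.
matches = [char == 'a' for char in my_string]
total = sum(matches)
print(total)
```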
a = 'have a nice day'
symbol = 'abcdefghijklmnopqrstuvwxyz'
for key in symbol:
    print(key, a.count(key))
An alternative way to get all the character counts without using Counter(), count() or regex:
sentence = 'Mary had a little lamb'
counts_dict = {}
for c in sentence:
    if c not in counts_dict:
        counts_dict[c] = 0
    counts_dict[c] += 1
for key, value in counts_dict.items():
    print(key, value)
I am a fan of the pandas library, in particular the value_counts() method. You could use it to count the occurrence of each character in your string:
>>> import pandas as pd
>>> phrase = "I love the pandas library and its `value_counts()` method"
>>> pd.Series(list(phrase)).value_counts()
8
a 5
e 4
t 4
o 3
n 3
s 3
d 3
l 3
u 2
i 2
r 2
v 2
` 2
h 2
p 1
b 1
I 1
m 1
( 1
y 1
_ 1
) 1
c 1
dtype: int64
count is definitely the most concise and efficient way of counting the occurrences of a character in a string, but I tried to come up with a solution using lambda, something like this:
sentence = 'Mary had a little lamb'
sum(map(lambda x : 1 if 'a' in x else 0, sentence))
This results in:
4
There is one more advantage: if sentence is a list of substrings containing the same characters as above, this still gives the correct result because of the use of in, as long as no substring contains 'a' more than once. Have a look:
sentence = ['M', 'ar', 'y', 'had', 'a', 'little', 'l', 'am', 'b']
sum(map(lambda x : 1 if 'a' in x else 0, sentence))
This also results in:
4
Of course, this works only when checking for the occurrence of a single character, such as 'a' in this particular case, and it counts the elements that contain it rather than every occurrence.
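The difference between "elements that contain 'a'" and "occurrences of 'a'" shows up as soon as one element contains the letter more than once. A small sketch with a made-up list (parts is a hypothetical example, not from the answer):

```python
parts = ['banana', 'had', 'lamb']

# Counts the elements that contain 'a' at least once.
elements_with_a = sum(map(lambda x: 1 if 'a' in x else 0, parts))

# Counts every occurrence of 'a' across all elements.
total_occurrences = sum(x.count('a') for x in parts)

print(elements_with_a, total_occurrences)
```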
a = "I walked today,"
c = ['d', 'e', 'f']
count = 0
for i in a:
    if i in c:
        count += 1
print(count)
I know the question asks how to count a particular letter. Here is generic code that does not use count().
sentence1 = " Mary had a little lamb"
count = {}
for i in sentence1:
    key = i.lower()  # fold case so 'M' and 'm' share one entry
    if key in count:
        count[key] = count[key] + 1
    else:
        count[key] = 1
print(count)
Output:
{' ': 5, 'm': 2, 'a': 4, 'r': 1, 'y': 1, 'h': 1, 'd': 1, 'l': 3, 'i': 1, 't': 2, 'e': 1, 'b': 1}
Now, if you want the frequency of any particular letter, you can print it like below.
print(count['m'])
2
The easiest way is a one-liner:
'Mary had a little lamb'.count("a")
But if you want, you can use this too:
sentence = 'Mary had a little lamb'
count = 0
for letter in sentence:
    if letter == "a":
        count += 1
print(count)
To find the number of occurrences of each character in a sentence, you may use the code below.
First I extract the unique characters from the sentence, then I count the occurrences of each of them in the sentence; this includes the occurrences of the blank space too.
test_str = "Mary had a little lamb"
ab = set(test_str)
for i in ab:
    counter = test_str.count(i)
    label = 'Space' if i == ' ' else i
    print(counter, ':', label, ',')
The output of the above code is below (the order can vary because sets are unordered).
1 : r ,
1 : h ,
1 : e ,
1 : M ,
4 : a ,
1 : b ,
1 : d ,
2 : t ,
3 : l ,
1 : i ,
4 : Space ,
1 : y ,
1 : m ,
"Without using count to find you want character in string" method.
import re
def main():
    s = input("Enter any string you like, for example 'welcome': ")
    ch = input("Enter the character to count (works best with a single character): ")
    # re.escape makes characters such as '.' or '*' match literally
    print(len(re.findall(re.escape(ch), s)))
main()
Python 3
There are two ways to achieve this:
1) With the built-in method count()
sentence = 'Mary had a little lamb'
print(sentence.count('a'))
2) Without using a function
sentence = 'Mary had a little lamb'
count = 0
for i in sentence:
    if i == "a":
        count = count + 1
print(count)
Use count:
sentence = 'A man walked up to a door'
print(sentence.count('a'))
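# 3 (count is case-sensitive, so the uppercase 'A' is not counted; sentence.lower().count('a') gives 4)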
Taking up a comment of this user:
import numpy as np
sample = 'samplestring'
a = np.unique(list(sample), return_counts=True)
a
Out:
(array(['a', 'e', 'g', 'i', 'l', 'm', 'n', 'p', 'r', 's', 't'], dtype='<U1'),
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1]))
Check 's'. You can filter this tuple of two arrays as follows:
a[1][a[0]=='s'] # array([2])
Side-note: It works like Counter() of the collections package, just in numpy, which you often import anyway. You could as well count the unique words in a list of words instead.
This is an extension of the accepted answer, should you look for the count of all the characters in the text.
# Objective: count only non-whitespace characters
text = "count a character occurrence"
unique_letters = set(text)
result = {x: text.count(x) for x in unique_letters if x.strip()}
print(result)
# {'a': 3, 'c': 6, 'e': 3, 'u': 2, 'n': 2, 't': 2, 'r': 3, 'h': 1, 'o': 2}
No more than this, IMHO. You can add the upper or lower methods if you need them:
def count_letter_in_str(string, letter):
    return string.count(letter)
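One way the suggested lower method could be added is sketched below; count_letter_ignore_case is a hypothetical name, not part of the answer:

```python
def count_letter_ignore_case(string, letter):
    # Lowercase both sides so 'M' and 'm' are treated as the same letter.
    return string.lower().count(letter.lower())

print(count_letter_ignore_case('Mary had a little lamb', 'M'))
```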
You can use a loop and a dictionary.
def count_letter(text):
    result = {}
    for letter in text:
        if letter not in result:
            result[letter] = 0
        result[letter] += 1
    return result
spam = 'have a nice day'
var = 'd'
def count(spam, var):
    found = 0
    for key in spam:
        if key == var:
            found += 1
    return found
count(spam, var)
print('count %s is: %s' % (var, count(spam, var)))
I have a system log that looks like the following:
{
a = 1
b = 2
c = [
x:1,
y:2,
z:3,
]
d = 4
}
I want to parse this in Python into a dictionary, splitting key-value pairs on =. At the same time, the array enclosed by [] should be preserved. I want to keep this as generic as possible so that the parser can also handle future variations.
What I have tried so far (the code is still to be written): split each line on "=" into a key-value pair, determine where [ and ] start and end, and then split the lines in between on ":" into key-value pairs. That seems a little hard-coded. Any better ideas?
This could be pretty easily converted to YAML. Run pip install pyyaml, then set up like so:
import string, yaml
data = """
{
a = 1
b = 2
c = [
x:1,
y:2,
z:3,
]
d = 4
}
"""
With this setup, you can use the following to parse your data:
data2 = data.replace(":", ": ").replace("=", ":").replace("[","{").replace("]","}")
lines = data2.splitlines()
for i, line in enumerate(lines):
    # add a comma after a trailing digit, or after a closing brace that is not on the last line
    if (len(line)>0 and line[-1] in string.digits and not line.endswith(",")) or (i < len(lines) - 1 and line.endswith("}")):
        lines[i] += ","
data3 = "\n".join(lines)
yaml.safe_load(data3) # {'a': 1, 'b': 2, 'c': {'x': 1, 'y': 2, 'z': 3}, 'd': 4}
Explanation
In the first line, we perform some simple substitutions:
YAML requires a space after the colon in key/value pairs. replace(":", ": ") ensures this.
Since YAML key/value pairs are always denoted by a colon and your format sometimes uses equals signs, we replace equals signs with colons using .replace("=", ":")
Your format sometimes uses square brackets where curly brackets should be used in YAML. We fix this using .replace("[","{").replace("]","}")
At this point, your data looks like this:
{
a : 1
b : 2
c : {
x: 1,
y: 2,
z: 3,
}
d : 4
}
Next, we have a for loop. This is simply responsible for adding commas after lines where they're missing. The two cases in which commas are missing are:
- They're absent after a numeric value
- They're absent after a closing bracket
We match the first of these cases using len(line)>0 and line[-1] in string.digits (the last character in the line is a digit)
The second case is matched using i < len(lines) - 1 and line.endswith("}"). This checks if the line ends with }, and also checks that the line is not the last, since YAML won't allow a comma after the last bracket.
After the loop, we have:
{
a : 1,
b : 2,
c : {
x: 1,
y: 2,
z: 3,
},
d : 4,
}
which is valid YAML. All that's left is to load it (yaml.safe_load is the safe choice for plain data like this), and you've got yourself a Python dict.
If anything isn't clear please leave a comment and I'll happily elaborate.
There is probably a better answer, but I would take advantage of all your dictionary keys being at the same indentation level. There's no obvious way to do this with newline splitting, JSON loading, or that sort of thing, since the list structure is a bit weird (it seems like a cross between a list and a dictionary).
Here's an implementation that parses keys based on indentation level:
import re
log = '''{
  a = 1
  b = 2
  c = [
    x:1,
    y:2,
    z:3,
  ]
  d = 4
}'''
log_lines = log.split('\n')[1:-1] # strip bracket lines
# A key line starts with exactly two spaces followed by a word character,
# so the deeper list items and the closing "]" count as continuation lines.
KEY_REGEX = re.compile(r' {2}\w')
d = {}
current_pair = ''
for line in log_lines:
    if KEY_REGEX.match(line):
        # a new key starts: store the pair collected so far
        if current_pair:
            key, value = current_pair.split('=', 1)
            d[key.strip()] = value.strip()
        current_pair = line
    else:
        current_pair += line.strip()
# store the final pair
if current_pair:
    key, value = current_pair.split('=', 1)
    d[key.strip()] = value.strip()
print(d)
Output (key order may differ on older Python versions):
{'a': '1', 'b': '2', 'c': '[x:1,y:2,z:3,]', 'd': '4'}
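In that output the value for c is still a raw string. A minimal sketch of turning it into a nested dict, assuming the bracketed items are always key:integer pairs separated by commas (parse_bracketed is a hypothetical helper, not part of the answer above):

```python
def parse_bracketed(value):
    """Turn a string such as '[x:1,y:2,z:3,]' into {'x': 1, 'y': 2, 'z': 3}."""
    inner = value.strip().strip('[]')
    result = {}
    for item in inner.split(','):
        if not item.strip():
            continue  # skip the empty piece left by a trailing comma
        key, _, number = item.partition(':')
        result[key.strip()] = int(number)
    return result

parsed = parse_bracketed('[x:1,y:2,z:3,]')
print(parsed)
```

If the log can contain non-integer values, the int() call would need to be replaced by whatever conversion fits the data.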
I have a sample list
a = ['be','see','tree'....]
The user will provide a name via raw_input. The program then has to print the name together with each word in the list, and find the total number of characters in the name plus each word.
Finally, I need to store the results in a dictionary.
E.g.:
if the raw_input name is 'jean', it has to print:
jean be
jean see
jean tree
I then need to store it in a dictionary as:
{'jean be':'6','jean see':'7','jean tree':'8'}
My code:
a=['be','see','tree']
x = raw_input("Enter the query x ")
for item in a:
    length =len(item[i] + x)
I am not sure whether this is correct, and I don't know how to store the result in a dict.
You can use a dict comprehension to save your items in a dictionary:
>>> inp=raw_input()
>>> {inp+' '+i:len(inp+i) for i in a}
{'jean see': 7, 'jean be': 6, 'jean tree': 8}
and use a for loop to print the desired pairs:
>>> for i in a:
...     print inp+' '+i
...
jean be
jean see
jean tree
But since dictionaries are not ordered, you can use collections.OrderedDict to create an ordered dict:
>>> from collections import OrderedDict
>>> D=OrderedDict()
>>> for k,v in sorted(((inp+' '+i,len(inp+i)) for i in a),key=lambda x:x[1]):
...     D[k]=v
...
>>> D
OrderedDict([('jean be', 6), ('jean see', 7), ('jean tree', 8)])
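The remark about ordering applies to the Python 2 used in this answer. In Python 3.7 and later, plain dicts preserve insertion order, so a sketch like the following gives the same ordering without OrderedDict (name is a hard-coded stand-in for the user's input, and the variable names are illustrative):

```python
a = ['be', 'see', 'tree']
name = 'jean'  # stand-in for the user's input

# Build (key, length) pairs, sort them by length, and let dict() keep that order.
pairs = sorted(((name + ' ' + word, len(name + word)) for word in a),
               key=lambda pair: pair[1])
ordered = dict(pairs)
print(ordered)
```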
Your error is to try using item[i] (i is undefined anyway) when you have access to item. Check this out:
a=['be','see','tree']
x = raw_input("Enter the query x ")
d = dict()
for item in a:
    d[x + ' ' + item] = len(x + item)
print d
a=['be','see','tree']
somedict = {}
x = raw_input("Enter the query x ")
for i in a:
    neww = x+" "+i
    print neww
    somedict[neww] = len(neww)
print somedict
Output :
Enter the query x {'john tree': 9, 'john be': 7, 'john see': 8}