Nested for-loops and dictionaries in finding value occurrence in string - python

I've been tasked with creating a dictionary whose keys are elements found in a string and whose values count the number of occurrences per value.
Ex.
"abracadabra" → {'r': 2, 'd': 1, 'c': 1, 'b': 2, 'a': 5}
I have the for-loop logic behind it here:
xs = "hshhsf"
xsUnique = "".join(set(xs))
occurrences = []
freq = []
counter = 0
for i in range(len(xsUnique)):
    for x in range(len(xs)):
        if xsUnique[i] == xs[x]:
            occurrences.append(xs[x])
            counter += 1
    freq.append(counter)
    freq.append(xsUnique[i])
    counter = 0
This does exactly what I want it to do, except with lists instead of dictionaries. How can I make it so counter becomes a value, and xsUnique[i] becomes a key in a new dictionary?

The easiest way is to use a Counter:
>>> from collections import Counter
>>> Counter("abracadabra")
Counter({'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1})
If you can't use a Python library, you can use dict.get with a default value of 0 to make your own counter:
s = "abracadabra"
count = {}
for c in s:
    count[c] = count.get(c, 0) + 1
>>> count
{'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}
Or, you can use dict.fromkeys() to set all the values in a counter to zero and then use that:
>>> counter={}.fromkeys(s, 0)
>>> counter
{'a': 0, 'r': 0, 'b': 0, 'c': 0, 'd': 0}
>>> for c in s:
...     counter[c] += 1
...
>>> counter
{'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}
If you truly want the least Pythonic approach, i.e., what you might do in C, you could:
create a list of zeros, one entry per possible character code
loop over the string and increment the entry for each character that is present
print the non-zero entries
Example:
ascii_counts = [0] * 256  # one slot per character code 0..255
s = "abracadabra"
for c in s:
    ascii_counts[ord(c)] += 1
for i, e in enumerate(ascii_counts):
    if e:
        print chr(i), e
Prints:
a 5
b 2
c 1
d 1
r 2
That does not scale to use with Unicode, however, since you would need more than 1 million list entries...
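For comparison, a dictionary only stores the characters that actually occur, so the same counting loop handles any Unicode text without preallocating a table. A minimal sketch in Python 3; the function name count_chars and the sample string are made up for illustration:

```python
def count_chars(text):
    """Count every character in text, whatever its code point."""
    counts = {}
    for ch in text:
        # dict.get returns 0 for a character that has not been seen yet
        counts[ch] = counts.get(ch, 0) + 1
    return counts


# "naive cafe" with accented letters, followed by two U+2615 symbols
sample = "na\u00efve caf\u00e9 \u2615\u2615"
counts = count_chars(sample)
print(counts["\u2615"])  # 2
print(len(counts))       # 10 distinct characters, not a million slots
```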

You can use the zip function to convert your list to a dictionary:
>>> dict(zip(freq[1::2],freq[0::2]))
{'h': 3, 's': 2, 'f': 1}
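This works because the question's loop appends the count first and the character second, so freq alternates count, character, count, character. A small sketch of what the two slices pick out; the literal list is written out by hand here (the real order depends on how set(xs) happens to iterate), and the printed dict order assumes Python 3.7 or later:

```python
# freq as the question's loop would build it: count, char, count, char, ...
freq = [3, "h", 2, "s", 1, "f"]

keys = freq[1::2]    # every second item starting at index 1: the characters
values = freq[0::2]  # every second item starting at index 0: the counts

print(keys)                     # ['h', 's', 'f']
print(values)                   # [3, 2, 1]
print(dict(zip(keys, values)))  # {'h': 3, 's': 2, 'f': 1}
```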
But as a more Pythonic and well-optimized way, I suggest using collections.Counter:
>>> from collections import Counter
>>> Counter("hshhsf")
Counter({'h': 3, 's': 2, 'f': 1})
And since you said you don't want to import any module, you can use the dict.setdefault method and a simple loop:
>>> d = {}
>>> for i in xs:
...     d[i] = d.setdefault(i, 0) + 1
...
>>> d
{'h': 3, 's': 2, 'f': 1}

I'm guessing there's a learning reason why you're using two for loops?
Anyway, here are a few different solutions:
# Method 1
xs = 'hshhsf'
xsUnique = ''.join(set(xs))
freq1 = {}
for i in range(len(xsUnique)):
    for x in range(len(xs)):
        if xsUnique[i] == xs[x]:
            if xs[x] in freq1:
                freq1[xs[x]] += 1
            else:
                freq1[xs[x]] = 1  # Introduce a new key, value pair

# Method 2
# Or use a defaultdict that auto-initializes new values in a dictionary
# https://docs.python.org/2/library/collections.html#collections.defaultdict
from collections import defaultdict
freq2 = defaultdict(int)  # new values initialize to 0
for i in range(len(xsUnique)):
    for x in range(len(xs)):
        if xsUnique[i] == xs[x]:
            # no need to check if xs[x] is in the dict because
            # defaultdict(int) will set any new key to zero, then
            # perform its operation.
            freq2[xs[x]] += 1
# I don't understand why you're using 2 for loops though

# Method 3
string = 'hshhsf'  # the variable name `xs` confuses me, sorry
freq3 = defaultdict(int)
for char in string:
    freq3[char] += 1

# Method 4
freq4 = {}
for char in string:
    if char in freq4:
        freq4[char] += 1
    else:
        freq4[char] = 1

print 'freq1: %r\n' % freq1
print 'freq2: %r\n' % freq2
print 'freq3: %r\n' % freq3
print 'freq4: %r\n' % freq4
print '\nDo all the dictionaries equal each other as they stand?'
print 'Answer: %r\n\n' % (freq1 == freq2 and freq1 == freq3 and freq1 == freq4)
# convert the defaultdicts to dicts for consistency
freq2 = dict(freq2)
freq3 = dict(freq3)
print 'freq1: %r' % freq1
print 'freq2: %r' % freq2
print 'freq3: %r' % freq3
print 'freq4: %r' % freq4
Output
freq1: {'h': 3, 's': 2, 'f': 1}
freq2: defaultdict(<type 'int'>, {'h': 3, 's': 2, 'f': 1})
freq3: defaultdict(<type 'int'>, {'h': 3, 's': 2, 'f': 1})
freq4: {'h': 3, 's': 2, 'f': 1}
Do all the dictionaries equal each other as they stand?
Answer: True
freq1: {'h': 3, 's': 2, 'f': 1}
freq2: {'h': 3, 's': 2, 'f': 1}
freq3: {'h': 3, 's': 2, 'f': 1}
freq4: {'h': 3, 's': 2, 'f': 1}
Or like dawg stated, use Counter from the collections standard library
counter docs
https://docs.python.org/2/library/collections.html#collections.Counter
defaultdict docs
https://docs.python.org/2/library/collections.html#collections.defaultdict
collections library docs
https://docs.python.org/2/library/collections.html

Related

How to return nested dictionary from function

Is it possible to make a function that will return a nested dict depending on the arguments?
def foo(key):
    d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}, }
    return d[key]

foo(['c']['d'])
I expect:
3
I'm getting:
TypeError: list indices must be integers or slices, not str
I understand that it's possible to return the whole dict, or to hard-code the function to return a particular part of the dict, like
    if 'c' and 'd' in kwargs:
        return d['c']['d']
    elif 'c' and 'e' in kwargs:
        return d['c']['e']
but that would be very inflexible.
When you write ['c']['d'], you index the list ['c'] with the string 'd', which isn't possible. So you can either correct the indexing:
foo('c')['d']
Or you could alter your function to walk down the keys:
def foo(*args):
    d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}, }
    d_old = dict(d)  # in case you need to keep the original dict for other operations in the function
    for i in args:
        d = d[i]
    return d
>>> foo('c','d')
3
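To see where the original error comes from: Python evaluates the argument expression ['c']['d'] before foo is ever called, and that expression indexes a one-element list with a string. A short Python 3 sketch reproducing this outside the function (the variable names are illustrative only):

```python
keys = ["c"]  # this is the list the original call builds first

try:
    keys["d"]  # indexing a list with the string 'd', as ['c']['d'] does
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # list indices must be integers or slices, not str
```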
d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}, }
def funt(keys):
    val = d
    for key in keys:
        if val:
            val = val.get(key)
    return val

funt(['c', 'd'])
This additionally handles the case where a key is not present.
One possible solution would be to iterate over multiple keys:
def foo(keys, d=None):
    if d is None:
        d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}, }
    if len(keys) == 1:
        return d[keys[0]]
    return foo(keys[1:], d[keys[0]])

foo(['c', 'd'])
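The same walk over a list of keys can also be written with functools.reduce. This is only a sketch in Python 3: get_nested and its default parameter are names invented here rather than taken from any of the answers, and returning a default on failure is just one possible policy.

```python
from functools import reduce


def get_nested(data, keys, default=None):
    """Follow keys one level at a time; return default if the path breaks."""
    try:
        return reduce(lambda level, key: level[key], keys, data)
    except (KeyError, TypeError):
        # KeyError: the key is absent at this level
        # TypeError: tried to index a non-dict value such as an int
        return default


d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}}
print(get_nested(d, ['c', 'd']))         # 3
print(get_nested(d, ['c', 'x'], 'n/a'))  # n/a
print(get_nested(d, ['a', 'd']))         # None
```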

count characters of a string using function

Write a function named count_letters that takes as a parameter a string and returns a dictionary that tabulates how many of each letter is in that string. The string can contain characters other than letters, but only the letters should be counted. The string could even be the empty string. Lower-case and upper-case versions of a letter should be part of the same count. The keys of the dictionary should be the upper-case letters. If a letter does not appear in the string, then it would not get added to the dictionary. For example, if the string is
"AaBb"
then the dictionary that is returned should contain these key-value pairs:
{'A': 2, 'B': 2}
def count_letters(string):
    """counts all the letters in a given string"""
    your_dict = dict()
    for x in string:
        x = x.upper()  # makes lowercase upper
        if x not in your_dict:
            your_dict[x] = 1
        else:
            your_dict[x] += 1
    return your_dict
I am getting the following error when I go to upload:
Test Failed: {'Q': 1, 'U': 3, 'I': 3, 'S': 6, ' ': 3, 'C[48 chars]': 1} != {'S': 6, 'U': 3, 'I': 3, 'T': 3, 'O': 3, 'C[32 chars]': 1}
+ {'C': 2, 'D': 2, 'E': 2, 'I': 3, 'O': 3, 'P': 1, 'Q': 1, 'S': 6, 'T': 3, 'U': 3}
- {' ': 3,
- '?': 1,
- 'C': 2,
- 'D': 2,
- 'E': 2,
- 'I': 3,
- 'O': 3,
- 'P': 1,
- 'Q': 1,
- 'S': 6,
- 'T': 3,
- 'U': 3}
Try something like this. Feel free to adjust it to your requirements:
import collections

def count_letters(string):
    return collections.Counter(string.upper())

print(count_letters('Google'))
Output: Counter({'G': 2, 'O': 2, 'L': 1, 'E': 1})
For documentation of the Counter dict subclass in collections module, check this.
Update without using collections module:
def count_letters(string):
    your_dict = {}
    for i in string.upper():
        if i in your_dict:
            your_dict[i] += 1
        else:
            your_dict[i] = 1
    return your_dict
Output: {'G': 2, 'O': 2, 'L': 1, 'E': 1}
This solution does use collections, but unlike with Counter we aren’t getting the entire solution from a single library function. I hope it’s permitted, and if it isn’t, that it will at least be informative in some way.
import collections as colls

def count_letters(str_in):
    str_folded = str_in.casefold()
    counts = colls.defaultdict(int)
    for curr_char in str_folded:
        counts[curr_char] += 1
    return counts
defaultdict is extremely practical. As the name indicates, when we try to index a dictionary with a key that doesn’t exist, it creates a default value for that key and carries out our original operation. In this case, since we declare that our defaultdict will use integers for its values, the default value is 0.
str.casefold() is a method designed specifically for the complex problem that is case-insensitive comparison. While it is unlikely to make a difference here, it’s a good function to know.
Let me know if you have any questions :)
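To illustrate where str.casefold() differs from str.lower(), here is a small Python 3 sketch using the German sharp s; the example word is chosen here for illustration. Note that casefold() produces lower-case text, so for the exercise in the question, which wants upper-case keys, upper() is still the call to use.

```python
word = "Stra\u00dfe"  # German for "street", spelled with a sharp s

print(word.lower())     # lower() keeps the sharp s as it is
print(word.casefold())  # strasse
print(word.upper())     # STRASSE

# Comparing against the all-caps spelling only matches after casefolding
print(word.lower() == "STRASSE".lower())        # False
print(word.casefold() == "STRASSE".casefold())  # True
```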
Without using collections, here is a solution:
def count_letters(string):
    string = string.upper()
    counts = {}
    for a in set(string):
        counts[a] = string.count(a)
    return counts
This function iterates over set(string), which is equal to all the letters used in your word, without duplicates, and in uppercase. Then it counts how many times each letter appears in your string, and adds it to your counts dictionary.
I hope this answers your question. :)
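Judging by the failed test in the question, the difference between the two dictionaries is the ' ' and '?' keys: the original function and the versions above count every character, while the exercise asks for letters only. A minimal Python 3 sketch that adds a str.isalpha() check. Note that isalpha() also accepts non-ASCII letters, which may or may not be what the grader expects.

```python
def count_letters(string):
    """Count letters only, case-insensitively, keyed by upper-case letter."""
    counts = {}
    for ch in string.upper():
        if ch.isalpha():  # skip spaces, digits and punctuation
            counts[ch] = counts.get(ch, 0) + 1
    return counts


print(count_letters("AaBb"))     # {'A': 2, 'B': 2}
print(count_letters("Hi, hi?"))  # {'H': 2, 'I': 2}
print(count_letters(""))         # {}
```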

Program optimisation and working of dictionary in adding key value pairs

this is my program on counting the number of vowels
'''Program to count number of vowels'''
str=input("Enter a string\n")
a=0
e=0
i=0
o=0
u=0
for x in str:
    if x=='a':
        a=a+1
        continue
    if x=='e':
        e=e+1
        continue
    if x=='i':
        i=i+1
        continue
    if x=='o':
        o=o+1
        continue
    if x=='u':
        u=u+1
        continue
count={}
if a>0:
    count['a']=a
if e>0:
    count['e']=e
if i>0:
    count['i']=i
if o>0:
    count['o']=o
if u>0:
    count['u']=u
print(count)
How can I improve the initial loop for comparison along with the process of filling the dictionary.
While running the program several times I have obtained the following output:
>>>
Enter a string
abcdefgh
{'e': 1, 'a': 1}
>>> ================================ RESTART ================================
>>>
Enter a string
abcdefghijklmnopqrstuvwxyz
{'u': 1, 'a': 1, 'o': 1, 'e': 1, 'i': 1}
>>> ================================ RESTART ================================
>>>
Enter a string
abcdeabcdeiopiop
{'a': 2, 'o': 2, 'i': 2, 'e': 2}
From this I could not figure out how exactly are the key value pairs being added to the dictionary count against my expectation of:
Case 1:
{'a':1, 'e':1}
Case 2:
{'a':1, 'e':1, 'i':1, 'o':1, 'u':1}
Case 3:
{'a':2, 'e':2, 'i':2, 'o':2}
Any help is appreciated.
>>> import collections
>>> s = "aacbed"
>>> count = collections.Counter(c for c in s if c in "aeiou")
>>> count
Counter({'a': 2, 'e': 1})
Or - if you really need to maintain insertion order:
>>> s = 'debcaa'
>>> count=collections.OrderedDict((c, s.count(c)) for c in s if c in "aeiou")
>>> count
OrderedDict([('e', 1), ('a', 2)])
Finally, if you want lexicographic ordering, you can turn your dict/Counter/OrderedDict into a sorted list of tuples:
>>> sorted(count.items())
[('a', 2), ('e', 1)]
or, if you want a lexicographically ordered OrderedDict:
>>> sorted_count = collections.OrderedDict(sorted(count.items()))
>>> sorted_count
OrderedDict([('a', 2), ('e', 1)])
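As background for the ordering part of the question: in the Python versions current when this was asked, a plain dict did not keep its keys in any particular order, so the printed order was unrelated to the order in which keys were added. Since Python 3.7, dicts are guaranteed to preserve insertion order. A sketch that yields the a, e, i, o, u order on 3.7+ by looping over the vowels rather than over the input (count_vowels is a name made up here):

```python
def count_vowels(text):
    """Return vowel counts in a, e, i, o, u order, skipping absent vowels."""
    counts = {}
    for vowel in "aeiou":  # this fixed loop order decides the key order
        n = text.count(vowel)
        if n:
            counts[vowel] = n
    return counts


print(count_vowels("abcdefgh"))          # {'a': 1, 'e': 1}
print(count_vowels("abcdeabcdeiopiop"))  # {'a': 2, 'e': 2, 'i': 2, 'o': 2}
```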
A more Pythonic way to do what you want is:
'''Program to count number of vowels'''
s = input("Enter a string\n")
count = {v: s.count(v) for v in "aeiou" if s.count(v) > 0}
print(count)
You shouldn't use str as a variable name, as that is the name of the built-in string type.
Just put a=0 e=0 i=0 o=0 u=0 inside a dictionary, like this:
myDict = {'a':0, 'e':0, 'i':0, 'o':0, 'u':0}
for x in string:
    myDict[x] += 1
print myDict
If a character is not one of those keys, a KeyError will be raised.
So you can do something like this:
myDict = {'a': 0, 'e': 0, 'i': 0, 'o': 0, 'u': 0}
for x in string:
    try:
        myDict[x] += 1
    except KeyError:
        continue
print myDict
Note: I've changed the name str to string.
You can also see a very good solution by #Amber here

I have a list of words. I want to add a counter variable associated with each word. How do I do this?

I have a list of words, let's say it's
['a', 'b', 'c', 'd']
I have a document where I've already pre-processed a text file into a matrix, and it goes like this:
a,b,c,d
0,1,1,0
1,1,0,0
1,1,1,1
Where 1 is the presence of the word in a sentence, and 0 is the absence of that word in a sentence. I would like to go through that matrix, line by line, and increment some sort of counter associated with the original word list up above, so I can know how many of each word was found in the sentences at the end.
How can I make this? Do I have to create an associative array, or a 2d array? Is there a way to create a new variable within the array associated with each word that I can increment?
Thanks!
All you have to do is sum each column since it's just 0s and 1s!
import numpy as np
array = np.array(matrix[1:])  # data rows only; the header row holds the words
answer = array.sum(axis=0)  # sum each column
my_dict = dict(zip(matrix[0], answer))
Now you have a dictionary where the keys are the words and the values are the total number of appearances!
You can use collections.Counter to tally the word counts:
>>> from collections import Counter
>>> filedata = '''\
0,1,1,0
1,1,0,0
1,1,1,1
'''
>>> counter = Counter()
>>> for line in filedata.splitlines():
...     a, b, c, d = map(int, line.split(','))
...     counter['a'] += a
...     counter['b'] += b
...     counter['c'] += c
...     counter['d'] += d
...
>>> counter
Counter({'b': 3, 'a': 2, 'c': 2, 'd': 1})
I'd prefer not to hardcode the keys, so maybe something like:
import csv
from collections import Counter
with open("abcd.txt", "rb") as fp:
    reader = csv.DictReader(fp)
    c = Counter()
    for row in reader:
        c.update({k: int(v) for k,v in row.iteritems()})
which produces
>>> c
Counter({'b': 3, 'a': 2, 'c': 2, 'd': 1})
from collections import defaultdict
with open("abc") as f:
    next(f)  # skip header
    dic = defaultdict(int)
    for line in f:
        for x,y in zip("abcd",map(int,line.split(","))):
            dic[x] += y
print dic
output:
defaultdict(<type 'int'>, {'a': 2, 'c': 2, 'b': 3, 'd': 1})
using collections.Counter:
from collections import Counter
with open("abc") as f:
    next(f)  # skip header
    c = Counter()
    for line in f:
        c.update(dict(zip("abcd", map(int, line.split(",")))))
print c
output:
Counter({'b': 3, 'a': 2, 'c': 2, 'd': 1})
If you already have the matrix described, you can do this:
mat = [['a', 'b', 'c', 'd'],
       [0, 1, 1, 0],
       [1, 1, 0, 0],
       [1, 1, 1, 1]]
print {t[0]: sum(t[1:]) for t in zip(*mat)}
prints:
{'a': 2, 'c': 2, 'b': 3, 'd': 1}

How to convert dictionary into string

I'm trying to use the solution provided here
Instead of getting a dictionary, how can I get a string with the same output i.e. character followed by the number of occurrences
Example:d2m2e2s3
To convert from the dict to the string in the format you want:
''.join('{}{}'.format(key, val) for key, val in adict.items())
if you want them alphabetically ordered by key:
''.join('{}{}'.format(key, val) for key, val in sorted(adict.items()))
Is this what you are looking for?
#!/usr/bin/python
dt={'d': 2, 'f': 2, 'g': 2, 'q': 5, 'w': 3}
st=""
for key,val in dt.iteritems():
    st = st + key + str(val)
print st
output: q5w3d2g2f2
Or this?
#!/usr/bin/python
dt={'d': 2, 'f': 2, 'g': 2, 'q': 5, 'w': 3}
dt=sorted(dt.iteritems())
st=""
for key,val in dt:
    st = st + key + str(val)
print st
output: d2f2g2q5w3
Example with join:
#!/usr/bin/python
adict=dt={'d': 2, 'f': 2, 'g': 2, 'q': 5, 'w': 3}
' '.join('{0}{1}'.format(key, val) for key, val in sorted(adict.items()))
output: 'd2 f2 g2 q5 w3'
>>> result = {'d': 2, 'f': 2, 'g': 2, 'q': 5, 'w': 3}
>>> ''.join('%s%d' % (k,v) for k,v in result.iteritems())
'q5w3d2g2f2'
or if you want them alphabetically...
>>> ''.join('%s%d' % (k,v) for k,v in sorted(result.iteritems()))
'd2f2g2q5w3'
or if you want them in increasing order of count...
>>> ''.join('%s%d' % (k,v) for k,v in sorted(result.iteritems(),key=lambda x:x[1]))
'd2g2f2w3q5'
Once you have the dict solution, just use join to join them into a string:
''.join([k+str(v) for k,v in result.iteritems()])
You can replace the '' with whatever separator (including none) you want between the pairs.
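As a quick illustration of changing the separator, here is a Python 3 sketch; it uses items() instead of iteritems() and sorts the pairs so the output is predictable, both of which are choices made here rather than part of the answer:

```python
result = {'d': 2, 'f': 2, 'g': 2, 'q': 5, 'w': 3}

# Build the "letter + count" pieces once, sorted by letter
pairs = [key + str(count) for key, count in sorted(result.items())]

print(''.join(pairs))    # d2f2g2q5w3
print(', '.join(pairs))  # d2, f2, g2, q5, w3
print('-'.join(pairs))   # d2-f2-g2-q5-w3
```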
Another approach, avoiding the % interpolation (or format()) by using join() only:
''.join(''.join((k, str(v))) for k,v in mydict.items())
