How to alter a function so it can count any alphabet character?

I have a function that counts DNA bases within a sequence and returns a count of them separately. The function is
def baseCounts(DNA):
    for base in DNA:
        numofAs = DNA.count('A')
        numofCs = DNA.count('C')
        numofGs = DNA.count('G')
        numofTs = DNA.count('T')
    return numofAs, numofCs, numofGs, numofTs
Now I need to alter the function so that it is not restricted to the DNA alphabet of A, C, G and T.
I know I need to add an alphabet argument to the function:
def baseCounts(DNA, alphabet):
However, I don't know how to write the rest of the for loop so that it works for any character. Keep in mind that each character has to be counted separately.

You can use collections.Counter:
from collections import Counter
DNA = 'ATCGBBHHTTCCGGHH'
c = Counter(DNA)
print(c)
Output:
Counter({'C': 3, 'B': 2, 'H': 4, 'A': 1, 'G': 3, 'T': 3})
This returns a Counter object, a specialized dictionary in which the keys are the distinct characters encountered in the sequence DNA (they constitute your alphabet) and the values are the number of times each character occurs in DNA.
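To connect this back to the question's `alphabet` argument, here is a minimal sketch. The name `base_counts` and the choice to return a tuple in alphabet order are assumptions made for illustration, not part of the original answer. It relies on the fact that a Counter returns 0 for keys it has never seen.

```python
from collections import Counter

def base_counts(sequence, alphabet):
    """Return one count per character of `alphabet`, in alphabet order."""
    counts = Counter(sequence)  # one pass over the sequence
    # Indexing a Counter with a missing key yields 0 instead of raising KeyError.
    return tuple(counts[char] for char in alphabet)

print(base_counts('ATCGBBHHTTCCGGHH', 'ACGT'))    # (1, 3, 3, 3)
print(base_counts('ATCGBBHHTTCCGGHH', 'ACGTBH'))  # (1, 3, 3, 3, 2, 4)
```

Characters that appear in the sequence but not in the alphabet are simply ignored by this sketch.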

Related

Keeping the same order for a dictionary

I need to write a function called updateHand(hand, word) which does this:
Assumes that 'hand' has all the letters in word. In other words, this
assumes that however many times a letter appears in 'word', 'hand' has
at least as many of that letter in it.
Updates the hand: uses up the letters in the given word and returns
the new hand, without those letters in it.
Has no side effects: does not modify hand.
word: string
hand: dictionary (string -> int)
returns: dictionary (string -> int)
I wrote the code and everything is working except the fact that when 'hand' is returned, it is not in the same order:
updateHand({'u': 1, 'q': 1, 'a': 1, 'm': 1, 'l': 2, 'i': 1}, 'quail')
{'u': 0, 'i': 0, 'm': 1, 'a': 0, 'l': 1, 'q': 0}
Could someone give me the solution or even just a hint to this problem because I don't understand...

In older Python versions, a plain dict does not return its items in insertion order.
What you need there is an OrderedDict:
from collections import OrderedDict
my_dict = OrderedDict()
my_dict['a'] = 1
...
This only concerns Python versions before 3.6. CPython 3.6 preserves insertion order as an implementation detail, and from Python 3.7 onward it is guaranteed by the language.
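As a rough sketch of how this applies to the question (assuming Python 3.7 or later, where a copied dict keeps the key order of the original), the hand can be copied first so the input is not modified. The name `update_hand` is a hypothetical snake_case variant of the asker's `updateHand`.

```python
def update_hand(hand, word):
    """Return a new hand with the letters of `word` used up; `hand` is untouched."""
    new_hand = dict(hand)  # shallow copy, same key order as `hand` on Python 3.7+
    for letter in word:
        new_hand[letter] -= 1  # assumes `hand` contains every letter of `word`
    return new_hand

hand = {'u': 1, 'q': 1, 'a': 1, 'm': 1, 'l': 2, 'i': 1}
print(update_hand(hand, 'quail'))
# {'u': 0, 'q': 0, 'a': 0, 'm': 1, 'l': 1, 'i': 0}
```

On older interpreters, the same idea would need `OrderedDict` for both the input and the copy.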

count_bases return as dictionary

So I was wondering if anybody wants to help me with this. I don't even understand where to begin? Any help would be appreciated.
Write a function called count_bases that counts the number of times each letter occurs in a given string. The results should be returned as a dictionary, with letters in upper case as keys and the number of occurrences as (integer) values
For example when the function is called with the string 'ATGATAGG', it should return {'A': 3, 'T': 2, 'G': 3, 'C': 0}. Please ensure your function uses return, not print(). The order of the keys in the dictionary does not need to follow this order (2 marks).
Make sure that your function works when passed any lower and/or uppercase DNA characters in the sequence string. (2 marks)
DNA sequences sometimes contain letters other than A, C, G and T to indicate degenerate nucleotides. For example, R can represent A or G (the purine bases). If the program encounters any letter other than A, C, G or T, it should also count the frequency of that letter and return it within the dictionary object. (2 marks).

Use the following code:
def count_bases(input_str):
    result = {}
    for s in input_str:
        try:
            result[s] += 1
        except KeyError:
            # first time this character is seen
            result[s] = 1
    return result
print(count_bases('ATGATAGG'))
Output:
{'A': 3, 'T': 2, 'G': 3}

Try this:
def f(input_str):
    d = {}
    for s in input_str:
        # get() returns 0 when the character has not been seen yet
        d[s] = d.get(s, 0) + 1
    return d

from collections import Counter
def count_bases(sequence):
    # Since you want to count both lowercase and uppercase letters,
    # convert the input sequence to upper case first.
    sequence = sequence.upper()
    # Counter (from collections) does the counting for you. It accepts any iterable,
    # and iterating over a string yields its characters ('abc' => 'a', 'b', 'c').
    # It returns a Counter object. Since you want a dictionary, convert it with dict().
    return dict(Counter(sequence))
print(count_bases('ATGATAGGaatdga'))
{'A': 6, 'T': 3, 'G': 4, 'D': 1}
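None of the answers above returns 'C': 0 for the assignment's example 'ATGATAGG'. One possible way to cover all three requirements is sketched below; pre-seeding the dictionary with zeros via `dict.fromkeys` is an assumption about what the assignment expects, not something the answers state.

```python
from collections import Counter

def count_bases(sequence):
    """Count letters case-insensitively; A, C, G and T are always present."""
    counts = dict.fromkeys('ACGT', 0)         # start every standard base at 0
    counts.update(Counter(sequence.upper()))  # overwrite with the actual counts
    return counts

print(count_bases('ATGATAGG'))  # {'A': 3, 'C': 0, 'G': 3, 'T': 2}
print(count_bases('atgR'))      # {'A': 1, 'C': 0, 'G': 1, 'T': 1, 'R': 1}
```

Any extra letter such as R is added after the four standard bases because `update` inserts new keys at the end.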

How does dict.setdefault() count the number of characters?

I got this code from the book Automate the boring stuff with Python, and I don't understand how the setdefault() method counts the number of unique characters.
Code:
message = 'It was a bright cold day in April, and the clocks were striking thirteen.'
count = {}
for character in message:
    count.setdefault(character, 0)
    count[character] = count[character] + 1
print(count)
According to the book, the setdefault() method searches for the key in the dictionary and if not found updates the dictionary, if found does nothing.
But I don't understand the counting behaviour of setdefault and how it is done?
Output:
{' ': 13, ',': 1, '.': 1, 'A': 1, 'I': 1, 'a': 4, 'c': 3, 'b': 1, 'e': 5, 'd': 3, 'g': 2,
'i': 6, 'h': 3, 'k': 2, 'l': 3, 'o': 2, 'n': 4, 'p': 1, 's': 3, 'r': 5, 't': 6, 'w': 2, 'y': 1}
Please explain this to me.

In your example setdefault() is equivalent to this code...
if character not in count:
    count[character] = 0
This is a nicer way (arguably) to do the same thing:
from collections import defaultdict
message = 'It was a bright cold day in April, and the clocks were striking thirteen.'
count = defaultdict(int)
for character in message:
    count[character] = count[character] + 1
print(count)
It works because int() called with no arguments returns 0, so every missing key starts at 0.
An even nicer way is as follows:
from collections import Counter
print(Counter(
'It was a bright cold day in April, '
'and the clocks were striking thirteen.'))
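As a quick sanity check, the three approaches discussed here (setdefault, defaultdict and Counter) can be run side by side. This is only an illustrative sketch; the variable names are made up for the comparison.

```python
from collections import Counter, defaultdict

message = 'It was a bright cold day in April, and the clocks were striking thirteen.'

# 1. setdefault: insert the key with 0 only if it is missing, then increment.
with_setdefault = {}
for character in message:
    with_setdefault.setdefault(character, 0)
    with_setdefault[character] += 1

# 2. defaultdict(int): a missing key is created with int() == 0 on first access.
with_defaultdict = defaultdict(int)
for character in message:
    with_defaultdict[character] += 1

# 3. Counter: counts every item of the iterable in one call.
with_counter = Counter(message)

print(with_setdefault == dict(with_defaultdict) == dict(with_counter))  # True
print(with_setdefault[' '])  # 13
```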

It would be better to use defaultdict in at least this case.
from collections import defaultdict
count = defaultdict(int)
for character in message:
    count[character] += 1
A defaultdict is constructed with a zero-argument function that creates the default value. If a key is missing, this function provides a value for it and the key-value pair is inserted into the dictionary for you. Since int() returns 0, the count is initialized correctly in this case. If you wanted it initialized to some other value, n, you would do something like
count = defaultdict(lambda : n)
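A small sketch of that last variant, with n = 10 chosen arbitrarily for illustration:

```python
from collections import defaultdict

n = 10  # arbitrary starting value, chosen only for this example
count = defaultdict(lambda: n)

count['a'] += 1       # 'a' is missing, so it starts at n and becomes n + 1
print(count['a'])     # 11
print(count['b'])     # 10 (merely reading a missing key inserts it)
print(sorted(count))  # ['a', 'b']
```

Note the side effect shown in the last two lines: looking up a missing key in a defaultdict adds that key to the dictionary.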

I am using the same textbook and I had the same problem. The answers provided are more sophisticated than the example in question, so they don't actually address the issue. The question is: how does the code know that it should count the number of occurrences? It turns out it doesn't really "count". It just keeps updating the values until the loop ends. So here is how I explained it to myself after long and painful research:
message='It was a bright,\
cold day in April,\
and the clocks were \
striking thirteen.'
count = {}  # "count" starts as an empty dictionary, which we want to fill
for character in message:  # for each character, do the following:
    count.setdefault(character, 0)  # if the character is not in the dictionary yet,
    # take it from the message above
    # and add it to the dictionary,
    # so the new key is a character (e.g. 'a') and its value is 0
    # (we start at 0 rather than 1 because the next line adds 1
    # on every occurrence, including the first one)
    # but if the character already exists, this line does nothing! (this is pointed out in the same book, page 110)
    # the next time it finds the same character
    # - which means, its second occurrence -
    # it won't change the key (letter)
    # But we still want to change the value, so we write the following line:
    count[character] = count[character] + 1  # and this line changes the value, i.e. increases it by 1
    # because "count[character]"
    # is a number, e.g. count['a'] is 1 after its first occurrence
    # This is not a loop counter like the "n=n+1" line that we remember from while loops
    # it doesn't mean "increase the number by 1
    # and do the same operation from the start"
    # it simply increases the value (which is an integer)
    # of the key we are currently processing in our dictionary by 1
    # to summarize: we want the code to go through the characters
    # and only add a key on the first occurrence of each of them;
    # setdefault does exactly that; it leaves existing values alone;
    # second, we want the code to increase the value by 1
    # each time it encounters the same key;
    # So in short:
    # setdefault deals with the key only if it is new (first occurrence)
    # and the value is changed at each occurrence
    # by a simple statement with a "+" operator
    # the most important thing to understand here is that
    # setdefault leaves existing values alone, so to speak,
    # and only adds keys, and even then only if they are newly introduced.
print(count)  # prints the whole resulting dictionary

The answer by @egon is very good and addresses the doubts raised here. I just modified the code a little bit, and I hope it is easier to understand now.
message = 'naga'
count = {}
for character in message:
    count.setdefault(character, 0)
    print(count)
    count[character] = count[character] + 1
    print(count)
print(count)
and the output will be as follows
{'n': 0} # first key and value set in count
{'n': 1} # set to this value when count[character] = count[character] + 1 is executed.
{'n': 1, 'a': 0} # so on
{'n': 1, 'a': 1}
{'g': 0, 'n': 1, 'a': 1}
{'g': 1, 'n': 1, 'a': 1}
{'g': 1, 'n': 1, 'a': 1}
{'g': 1, 'n': 1, 'a': 2}
{'g': 1, 'n': 1, 'a': 2}

message = 'It was a bright cold day in April, and the clocks were striking thirteen.'
count = {} #This is an empty dictionary. We will add key-value pairs to this dictionary with the help of the following lines of code (The for-loop and its code block).
for character in message: #The loop will run for the number of single characters (Each letter & each space between the words are characters) in the string assigned to 'message'.
#That means the loop will run for 73 times.
#In each iteration of the loop, 'character' will be set to the current character of the string for the running iteration.
#That means the loop will start with 'I' and end with '.'(period). 'I' is the current character of the first iteration and '.' is the current character of the last iteration of the for-loop.
    count.setdefault(character, 0) #If the character assigned to 'character' is not in the 'count' dictionary, then the character will be added to the dictionary as a key with its value being set to 0.
    count[character] = count[character] + 1 #The value of the key (character added as key) of 'count' in the running iteration of the loop is incremented by one.
#As a result of a key's value being incremented, we can track how many times a particular character in the string was iterated.
#^It's because the existing value of the existing key will be incremented by 1 for the number of times the particular character is iterated.
#The count stays accurate because setdefault() never overwrites the value of a key that already exists; it only adds the key (with value 0) when it is missing from the dictionary.
print(count) #Prints out the dictionary with all the key-value pairs added.
#The key and its value in each key-value pair represent a specific character from the string assigned to 'message' and the number of times it's found in the string, respectively.

Key error in dictionary. How to make Python print my dictionary?

My homework asks me to write a function that builds a dictionary of how many symmetrical words in a long string start with each letter. Symmetrical means the word starts and ends with the same letter. I do not need help with the algorithm, which I am confident is right; I just need to fix a KeyError that I cannot figure out. I wrote d[word[0]] += 1 to add 1 to the count of words that start with that particular letter.
The output should look like this (using the string I provided below):
{'d': 1, 'i': 3, 't': 1}
t = '''The sun did not shine
it was too wet to play
so we sat in the house
all that cold cold wet day
I sat there with Sally
we sat there we two
and I said how I wish
we had something to do'''
def symmetry(text):
    from collections import defaultdict
    d = {}
    wordList = text.split()
    for word in wordList:
        if word[0] == word[-1]:
            d[word[0]] += 1
    print(d)
print(symmetry(t))

You're trying to increment the value of an entry that has not been created yet, which results in the KeyError. You could use get(), which returns a default of 0 (or any other value you choose) when there is no entry for a key yet. With this method, you do not need defaultdict (although it is very useful in certain cases).
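def symmetry(text):
    d = {}
    wordList = text.split()
    for word in wordList:
        key = word[0]
        if key == word[-1]:
            # get() returns 0 for a letter that has no entry yet
            d[key] = d.get(key, 0) + 1
    # return the dictionary so that print(symmetry(t)) does not print None
    return d
print(symmetry(t))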
Sample Output
{'I': 3, 'd': 1, 't': 1}

You never actually use collections.defaultdict, although you import it. Initialize d as defaultdict(int), instead of as {}, and you're good to go.
def symmetry(text):
    from collections import defaultdict
    d = defaultdict(int)
    wordList = text.split()
    for word in wordList:
        if word[0] == word[-1]:
            d[word[0]] += 1
    # return the dictionary so that print(symmetry(t)) does not print None
    return d
print(symmetry(t))
Results in:
defaultdict(<class 'int'>, {'I': 3, 't': 1, 'd': 1})
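For comparison, the same counting could also be written with collections.Counter and a generator expression. This is only a sketch of an alternative; the shortened `sample` text is made up for the example and is not the asker's full string.

```python
from collections import Counter

def symmetry(text):
    """Count, per first letter, the words that start and end with the same letter."""
    # split() never yields empty strings, so word[0] and word[-1] are safe.
    return dict(Counter(word[0] for word in text.split() if word[0] == word[-1]))

sample = ('The sun did not shine it was too wet to play '
          'all that cold cold wet day I sat there and I said how I wish')
print(symmetry(sample))  # {'d': 1, 't': 1, 'I': 3}
```

A one-letter word such as 'I' counts as symmetrical here, which matches the behaviour of the answers above.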

How do I create a dictionary from a string returning the number of characters [duplicate]

I want a string such as 'ddxxx' to be returned as {'d': 2, 'x': 3}. So far I've attempted:
result = {}
for i in s:
    if i in s:
        result[i] += 1
    else:
        result[i] = 1
return result
where s is the string. However, I keep getting a KeyError. For example, if s is 'hello', the error is:
    result[i] += 1
KeyError: 'h'

The problem is with your condition. if i in s checks for the character in the string itself, not in the dictionary. It should instead be if i in result.keys() or, as Neil mentioned, simply if i in result.
Example:
def fun(s):
    result = {}
    for i in s:
        if i in result:
            result[i] += 1
        else:
            result[i] = 1
    return result
print(fun('hello'))
This would print
{'h': 1, 'e': 1, 'l': 2, 'o': 1}

You can solve this easily by using collections.Counter. Counter is a subtype of the standard dict that is made to count things. It will automatically make sure that keys are created when you try to increment something that hasn’t been in the dictionary before, so you don’t need to check it yourself.
You can also pass any iterable to the constructor to make it automatically count the occurrences of the items in that iterable. Since a string is an iterable of characters, you can just pass your string to it, to count all characters:
>>> import collections
>>> s = 'ddxxx'
>>> result = collections.Counter(s)
>>> result
Counter({'x': 3, 'd': 2})
>>> result['x']
3
>>> result['d']
2
Of course, doing it the manual way is fine too, and your code almost works for that. Since you get a KeyError, you are trying to access a key in the dictionary that does not exist. This happens when you come across a new character that you haven’t counted before. You already tried to handle that with your if i in s check, but you are checking containment in the wrong thing. s is your string, and since you are iterating over the characters i of the string, i in s will always be true. What you want to check instead is whether i already exists as a key in the dictionary result. If it doesn’t, you add it as a new key with a count of 1:
if i in result:
    result[i] += 1
else:
    result[i] = 1

Using collections.Counter is the sensible solution. But if you do want to reinvent the wheel, you can use the dict.get() method, which allows you to supply a default value for missing keys:
s = 'hello'
result = {}
for c in s:
    result[c] = result.get(c, 0) + 1
print(result)
Output:
{'h': 1, 'e': 1, 'l': 2, 'o': 1}

Here is a simple way of doing this if you don't want to use the collections module:
>>> st = 'ddxxx'
>>> {i:st.count(i) for i in set(st)}
{'x': 3, 'd': 2}
