So I was wondering if anybody wants to help me with this. I don't even understand where to begin? Any help would be appreciated.
Write a function called count_bases that counts the number of times each letter occurs in a given string. The results should be returned as a dictionary, with letters in upper case as keys and the number of occurrences as (integer) values
For example when the function is called with the string 'ATGATAGG', it should return {'A': 3, 'T': 2, 'G': 3, 'C': 0}. Please ensure your function uses return, not print(). The order of the keys in the dictionary does not need to follow this order (2 marks).
Make sure that your function works when passed any lower and/or uppercase DNA characters in the sequence string. (2 marks)
DNA sequences sometimes contain letters other than A, C, G and T to indicate degenerate nucleotides. For example, R can represent A or G (the purine bases). If the program encounters any letter other than A, C, G or T, it should also count the frequency of that letter and return it within the dictionary object. (2 marks)
Use the following code:
def count_bases(input_str):
    result = {}
    for s in input_str:
        try:
            result[s] += 1
        except KeyError:
            # first occurrence of this letter
            result[s] = 1
    return result
print(count_bases('ATGATAGG'))
Output:
{'A': 3, 'T': 2, 'G': 3}
Try it:
def f(input):
    d = {}
    for s in input:
        d[s] = d.get(s,0)+1
    return d
from collections import Counter
def count_bases(sequence):
    # Since you want to count lower- and upper-case letters together,
    # convert the input sequence to upper case first.
    sequence = sequence.upper()
    # Counter (from collections) does the counting for you. It accepts any iterable,
    # so a string can be passed directly and each character is counted separately.
    # It returns a Counter object. Since you want a dictionary, convert it with dict().
    return dict(Counter(sequence))
count_bases('ATGATAGGaatdga')
{'A': 6, 'T': 3, 'G': 4, 'D': 1}
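Note that the answers above omit a base that never occurs, while the expected output {'A': 3, 'T': 2, 'G': 3, 'C': 0} includes it with a count of 0. A minimal sketch that combines the three requirements, assuming the four standard bases should always appear as keys and that any other character is simply counted after upper-casing:

```python
from collections import Counter

def count_bases(sequence):
    # Start every standard base at 0 so absent bases still appear as keys.
    counts = {base: 0 for base in "ACGT"}
    # Upper-case first so 'a' and 'A' are counted together; Counter tallies
    # every character, including degenerate codes such as R or N.
    counts.update(Counter(sequence.upper()))
    return counts

print(count_bases('ATGATAGG'))  # {'A': 3, 'C': 0, 'G': 3, 'T': 2}
print(count_bases('atgRn'))     # {'A': 1, 'C': 0, 'G': 1, 'T': 1, 'R': 1, 'N': 1}
```

The function returns a plain dict, as the assignment asks, and never prints.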
I have a function that counts DNA bases within a sequence and returns a count of them separately. The function is
def baseCounts(DNA):
    for base in DNA:
        numofAs = DNA.count('A')
        numofCs = DNA.count('C')
        numofGs = DNA.count('G')
        numofTs = DNA.count('T')
    return numofAs, numofCs, numofGs, numofTs
Now, I need to alter the function so it is not restricted to just the DNA alphabet of A, C, G and T.
I know I need to add an alphabet argument to the function:
def baseCounts(DNA, alphabet):
However, I don't know how to write the rest of the for loop so that it works for any character. Keep in mind that each character has to be counted separately.
You can use Counter:
from collections import Counter
DNA = 'ATCGBBHHTTCCGGHH'
c = Counter(DNA)
print(c)
Output:
Counter({'C': 3, 'B': 2, 'H': 4, 'A': 1, 'G': 3, 'T': 3})
This returns a Counter object, which is a specialized dictionary: the keys are the distinct characters encountered in the sequence DNA (they constitute your alphabet), and the values are the counts of those characters in DNA.
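If the alphabet has to be passed in explicitly, as the question suggests, one possible sketch is shown here. Returning a dictionary instead of the original tuple is an assumption, made so that each count stays attached to its symbol:

```python
from collections import Counter

def baseCounts(DNA, alphabet):
    # Count every character once, then look up only the requested symbols.
    counts = Counter(DNA)
    # A Counter returns 0 for keys it has never seen, so symbols missing
    # from DNA are reported as 0 instead of raising KeyError.
    return {symbol: counts[symbol] for symbol in alphabet}

print(baseCounts('ATCGBBHHTTCCGGHH', 'ACGTBH'))
# {'A': 1, 'C': 3, 'G': 3, 'T': 3, 'B': 2, 'H': 4}
```

Characters in DNA that are not in alphabet are ignored by this version.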
To finish my code, I am trying to write one particular line as a single statement that does not call any function or method, and I wondered whether that is even possible in my case. I searched for information about this, but it seems to be rarely discussed. In my current work I must keep the rest of the code intact and change only that one line.
Hope you could give me a hand. Any help is welcome.
This is my current progress.
def count_chars(s):
    '''(str) -> dict of {str: int}

    Return a dictionary where the keys are the characters in s and the values
    are how many times those characters appear in s.

    >>> count_chars('abracadabra')
    {'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}
    '''
    d = {}
    for c in s:
        if not (c in d):
            # This is the line to be filled in without calling any function or method
        else:
            d[c] = d[c] + 1
    return d
How about this? As mentioned in the comments, it does implicitly use functions, but I think it may be the sort of thing you are looking for.
s='abcab'
chars={}
for char in s:
    if char not in chars:
        chars[char]=0
    chars[char]+=1
Result
{'a': 2, 'b': 2, 'c': 1}
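Applied to the count_chars skeleton from the question, the missing line would presumably be a plain item assignment, which is a statement and not an explicit function or method call. A sketch under that reading of the requirement:

```python
def count_chars(s):
    """Return a dict mapping each character in s to its number of occurrences."""
    d = {}
    for c in s:
        if not (c in d):
            # First time we see c: plain item assignment, no explicit call.
            d[c] = 1
        else:
            d[c] = d[c] + 1
    return d

print(count_chars('abracadabra'))
```

The rest of the function is unchanged from the question; only the body of the if branch is filled in.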
I want a string such as 'ddxxx' to be returned as {'d': 2, 'x': 3}. So far I've attempted
result = {}
for i in s:
    if i in s:
        result[i] += 1
    else:
        result[i] = 1
return result
where s is the string, however I keep getting a KeyError. E.g. if I put s as 'hello' the error returned is:
result[i] += 1
KeyError: 'h'
The problem is with your if condition. if i in s checks for the character in the string itself, not in the dictionary. It should instead be if i in result.keys() or, as Neil mentioned, simply if i in result.
Example:
def fun(s):
    result = {}
    for i in s:
        if i in result:
            result[i] += 1
        else:
            result[i] = 1
    return result
print(fun('hello'))
This would print
{'h': 1, 'e': 1, 'l': 2, 'o': 1}
You can solve this easily by using collections.Counter. Counter is a subtype of the standard dict that is made to count things. It will automatically make sure that indexes are created when you try to increment something that hasn’t been in the dictionary before, so you don’t need to check it yourself.
You can also pass any iterable to the constructor to make it automatically count the occurrences of the items in that iterable. Since a string is an iterable of characters, you can just pass your string to it, to count all characters:
>>> import collections
>>> s = 'ddxxx'
>>> result = collections.Counter(s)
>>> result
Counter({'x': 3, 'd': 2})
>>> result['x']
3
>>> result['d']
2
Of course, doing it the manual way is fine too, and your code almost works for that. Since you get a KeyError, you are trying to access a key in the dictionary that does not exist. This happens when you come across a new character that you haven’t counted before. You already tried to handle that with your if i in s check, but you are checking containment in the wrong thing. s is your string, and since you are iterating over the characters i of the string, i in s will always be true. What you want to check instead is whether i already exists as a key in the dictionary result. If it doesn’t, you add it as a new key with a count of 1:
if i in result:
    result[i] += 1
else:
    result[i] = 1
Using collections.Counter is the sensible solution. But if you do want to reinvent the wheel, you can use the dict.get() method, which allows you to supply a default value for missing keys:
s = 'hello'
result = {}
for c in s:
    result[c] = result.get(c, 0) + 1
print(result)
output
{'h': 1, 'e': 1, 'l': 2, 'o': 1}
Here is a simple way of doing this if you don't want to use the collections module:
>>> st = 'ddxxx'
>>> {i:st.count(i) for i in set(st)}
{'x': 3, 'd': 2}
I have a dict and a string here, with the dict containing char-count key-value pairs. I want to check whether all the characters in the string are completely contained in the dict.
This means that the dict should contain all the chars of the string, with their counts less than or equal to their corresponding values in the dict.
def isValidWord(strng, dct):
    """
    Returns True if strng is entirely
    composed of letters in the dct.
    Otherwise, returns False.
    Does not mutate hand or dct.
    """
    d={}
    for x in strng:
        d[x]=d.get(x,0)
    for x in d:
        if d[x]> dct.get(x,0):
            return False
    return True
It seems to work well for most cases, but for some cases it doesn't. For example -
isValidWord('chayote', {'a': 1, 'c': 2, 'u': 2, 't': 2, 'y': 1, 'h': 1, 'z': 1,
'o': 2})
This gives output True, however the correct output is False.
This is because there is no e in the dict.
Where is the bug here? And how can I check whether all the pairs in one dict also exist in another dict, possibly with equal or lower corresponding values?
You meant for the line
d[x]=d.get(x,0)
to be
d[x]=d.get(x,0) + 1
Otherwise, every value in d stays 0, so no count can exceed the corresponding value in dct, and the function returns True for any dct with non-negative counts.
Also note that it would be easier to use collections.Counter for your first loop:
d = collections.Counter(strng)
As for your question of testing whether one dict is in another, you can do:
all(k in dct and v <= dct[k] for k, v in d.items())
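Putting the two suggestions together, a possible corrected version of the function might look like the sketch below. The variable name hand is only illustrative, and the comparison uses "less than or equal" because the question asks for counts less than or equal to the values in the dict:

```python
from collections import Counter

def isValidWord(strng, dct):
    # Count how many times each letter is needed by the word.
    needed = Counter(strng)
    # Every needed letter must be available at least that many times;
    # dct.get(k, 0) treats a missing letter as zero available.
    return all(v <= dct.get(k, 0) for k, v in needed.items())

hand = {'a': 1, 'c': 2, 'u': 2, 't': 2, 'y': 1, 'h': 1, 'z': 1, 'o': 2}
print(isValidWord('chayote', hand))  # False: there is no 'e' in hand
print(isValidWord('chat', hand))     # True
```

Neither strng nor dct is modified by this version.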
I have been trying to build a function to get letter frequencies from a string and store them in a dictionary.
I have done something like this:
s="today the weather was really nice"
def get_letter_freq(s):
    for letter in(s):
        x=letter.split()
    f=dict()
    for each_letter in x:
        if f.has_key(x):
            f[x]+=1
        else:
            f[x]=1
print f
Could you help me put things into order and find my mistakes?
Why do I get an error that my 'f' is not defined?
In your code, the first for loop, the one with the letter.split() statement, seems useless. Why would you want to split a single character that you get in your loop?
Secondly, you have defined f = dict() inside your function and are using it outside. It will not be visible outside.
Third, you should not use f.has_key. Just write key in my_dict to check whether a key is in the dict.
And lastly, you can pass your dictionary as a parameter to your function, modify it there, and finally return it. (You can also do it without passing the dict to your function: just create a new one there and return it.)
So, in your code, almost everything is fine. You just need to remove the first for loop in the function, move f = dict() outside the function, before invoking it, and pass it as a parameter.
Way 1:
So, you can try the following modified version of your code:
def get_letter_freq(my_dict, s):
    for letter in s:
        if letter in my_dict:
            my_dict[letter] += 1
        else:
            my_dict[letter] = 1
    return my_dict
my_dict = dict()
my_str = "today the weather was really nice"
print(get_letter_freq(my_dict, my_str))
Way 2:
Alternatively, you can also use a pre-defined library function Counter from collections, which does exactly what you want.
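A minimal sketch of what that could look like, keeping the function name from the question. Note that Counter counts every character, including spaces:

```python
from collections import Counter

def get_letter_freq(s):
    # Counter tallies every character in s, including spaces.
    return Counter(s)

freq = get_letter_freq("today the weather was really nice")
print(freq['e'])  # 5
print(freq[' '])  # 5
print(freq['z'])  # 0 (a Counter returns 0 for missing keys)
```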
Way 3:
As suggested by @thebjorn in a comment, you can also use defaultdict, which makes the task easier because you don't have to check whether a key is in the dictionary before updating it. The count automatically defaults to 0:
from collections import defaultdict
def get_letter_freq(s):
    my_dict = defaultdict(int)
    for letter in s:
        my_dict[letter] += 1
    return my_dict
my_str = "today the weather was really nice"
print(list(get_letter_freq(my_str).items()))
Besides that indentation error, your program has many other problems. A corrected version:
s = "today the weather was really nice"
def get_letter_freq(s):
    f = dict()
    for each_letter in s:  # you can iterate over a string directly, so no need for split()
        if each_letter in f:  # has_key() has been deprecated
            f[each_letter] += 1
        else:
            f[each_letter] = 1
    return f  # better to return the result from the function
print(get_letter_freq(s))
By the way collections.Counter() is good for this purpose:
In [61]: from collections import Counter
In [62]: strs = "today the weather was really nice"
In [63]: Counter(strs)
Out[63]: Counter({' ': 5, 'e': 5, 'a': 4, 't': 3, 'h': 2, 'l': 2, 'r': 2, 'w': 2, 'y': 2, 'c': 1, 'd': 1, 'i': 1, 'o': 1, 'n': 1, 's': 1})
f is defined inside get_letter_freq, you can't access it from outside.
Your function should return the constructed dictionary.
You should actually call the function.
What do you expect from splitting a single letter? Just leave that part out, and you don't need the inner loop.
print f needs to be indented if it is meant to be part of get_letter_freq.
Also, f does not exist outside get_letter_freq. Hence the error.
import string
s="today the weather was really nice"
# string.ascii_lowercase covers all 26 letters, a to z
print(dict([(letter, s.count(letter)) for letter in string.ascii_lowercase]))
To ignore case, use s.lower().count(letter) instead.