How to store parts of a string into a dictionary - python

For example, I have
from collections import Counter
cnt = Counter()
text = 'CTGGAT'
def freqWords(text, k):
    for i in text:
        cnt[i] += 1
    print cnt
Outputs: Counter({'A': 10, 'C': 9, 'T': 8, 'G': 4})
This returns a nice dictionary. However, I want to count substrings whose length is given by k instead of single characters. For example, if k=2, the dict should be populated with:
CT, TG, GG, GA, AT. If k=3, then: CTG, TGG, GGA, GAT.

Your for i in text iterates over the individual characters of text. Instead, iterate over every start position from 0 through len(text) - k (hence the + 1 in range) and take the substring of length k at each one:
from collections import Counter
def freqWords(text, k):
    return Counter(text[i:i+k] for i in range(len(text) - k + 1))
It works like this:
freqWords('CTGGAT', 2)
# Counter({'CT': 1, 'TG': 1, 'GG': 1, 'GA': 1, 'AT': 1})
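If you'd rather get a plain dict, or want to see what Counter is doing here, the same sliding window can be written with dict.get. This is a sketch: kmer_counts is a hypothetical name, and it assumes 1 <= k <= len(text). The valid start positions run from 0 through len(text) - k inclusive.

```python
def kmer_counts(text, k):
    """Count every length-k substring (k-mer) of text using a plain dict."""
    counts = {}
    # Start positions 0 .. len(text) - k inclusive, hence the + 1.
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts

print(kmer_counts('CTGGAT', 3))
# {'CTG': 1, 'TGG': 1, 'GGA': 1, 'GAT': 1}
```

Overlapping windows are counted separately, so kmer_counts('AAAA', 2) gives {'AA': 3}.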

Related

Histogram of list entries

I have a number of lists as follows:
list1 = ['a_1','a_2','b_17','c_19']
list2 = ['aa_1','a_12','b_15','d_39']
list3 = ['a_1','a_200','ba_1','u_0']
I wish to generate a histogram based on the labels, ignoring the numbering; that is, a has 4 entries over all the lists, ba 1 entry, u 1 entry, and so on. The labels are file names from a specific folder (before the numbers are appended), so the set of labels is finite and known.
How can I perform such a count without a bunch of ugly loops? Can I use unique here, somehow?
Some loop is unavoidable, but you can use a list comprehension to keep it to a single line. Something like this:
list1 = ['a_1','a_2','b_17','c_19']
list2 = ['aa_1','a_12','b_15','d_39']
list3 = ['a_1','a_200','ba_1','u_0']
lst = [x.split('_')[0] for x in (list1 + list2 + list3)]
print({x: lst.count(x) for x in lst})
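The dict comprehension above calls lst.count once per element, so it rescans the whole list every time. collections.Counter produces the same tally in a single pass, and itertools.chain walks the three lists without building a concatenated copy. A minimal sketch:

```python
from collections import Counter
from itertools import chain

list1 = ['a_1', 'a_2', 'b_17', 'c_19']
list2 = ['aa_1', 'a_12', 'b_15', 'd_39']
list3 = ['a_1', 'a_200', 'ba_1', 'u_0']

# Keep only the label before the first underscore and tally it in one pass.
histo = Counter(x.split('_')[0] for x in chain(list1, list2, list3))
print(histo)
# Counter({'a': 5, 'b': 2, 'c': 1, 'aa': 1, 'd': 1, 'ba': 1, 'u': 1})
```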
You can use a defaultdict(int), whose missing keys start at 0, to count the occurrences and get a convenient container with the required information.
So, define the container:
from collections import defaultdict
histo = defaultdict(int)
I'd split the operation into functions.
First get the prefix from the string, to be used as key in the dictionary:
def get_prefix(string):
    return string.split('_')[0]
This works like
get_prefix('az_1')
#=> 'az'
Then a function to update the dictionary:
def count_elements(lst):
    for e in lst:
        histo[get_prefix(e)] += 1
Finally, call it like this:
count_elements(list1)
count_elements(list2)
count_elements(list3)
dict(histo)
#=> {'a': 5, 'b': 2, 'c': 1, 'aa': 1, 'd': 1, 'ba': 1, 'u': 1}
Or directly, on a fresh histo:
count_elements(list1 + list2 + list3)
To get the unique count (again on a fresh histo), pass a set:
count_elements(set(list1 + list2 + list3))
dict(histo)
#=> {'ba': 1, 'a': 4, 'aa': 1, 'b': 2, 'u': 1, 'd': 1, 'c': 1}
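Because histo is a module-level global, every call to count_elements adds to the previous totals, so the outputs above only hold when histo starts empty. A sketch that builds a fresh dictionary on each call avoids that; make_histogram is a hypothetical name, and the prefix extraction from get_prefix is inlined so the block runs on its own.

```python
from collections import defaultdict

def make_histogram(items):
    """Return a new prefix histogram for items; no global state is kept."""
    histo = defaultdict(int)
    for e in items:
        histo[e.split('_')[0]] += 1  # same logic as get_prefix
    return dict(histo)

list1 = ['a_1', 'a_2', 'b_17', 'c_19']
list2 = ['aa_1', 'a_12', 'b_15', 'd_39']
list3 = ['a_1', 'a_200', 'ba_1', 'u_0']

all_counts = make_histogram(list1 + list2 + list3)          # every entry
unique_counts = make_histogram(set(list1 + list2 + list3))  # duplicates once
```

Calling make_histogram twice gives the same result both times, which is the main point of returning a new dict.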

Character count in string

def charcount(stri):
    for i in stri:
        count = 0
        for j in stri:
            if stri[i] == stri[j]:
                count += 1
I am new to Python and currently learning string operations. Can anyone tell me what is wrong with this program? The function tries to print a count of each character in a given string.
For example: string = "There is shadow behind you"
I want to count how many times each character occurs in the string.
Counting characters in a string can be done with the collections.Counter class:
Code:
from collections import Counter
def charcount(stri):
    return Counter(stri)
print(charcount('The function try to print count of each character '
                'in given string . Please help'))
Results:
Counter({' ': 14, 'e': 7, 'n': 7, 't': 7, 'c': 5, 'i': 5,
'r': 5, 'h': 4, 'o': 4, 'a': 4, 'f': 2, 'u': 2,
'p': 2, 'g': 2, 's': 2, 'l': 2, 'T': 1, 'y': 1,
'v': 1, '.': 1, 'P': 1})
Feedback on code:
In these lines:
for i in stri:
    count = 0
    for j in stri:
The outer loop iterates over each character in stri, and the inner loop iterates over every character in stri again. This is like a Cartesian product of the characters in the string, and it is not necessary here.
Secondly, in this line:
if stri[i] == stri[j]:
You are indexing stri with i and j, but they are not indices; they are the characters themselves. Characters are not valid indices for a string, so stri[i] raises a TypeError. If you want to loop over the indices instead, use range(len(stri)):
for i in range(len(stri)):
    count = 0
    for j in range(len(stri)):
        if stri[i] == stri[j]:
Or if you want to access the elements and their indices, you can use enumerate().
Having said this, your approach is too complicated and needs to be redone: you need to group the characters and count them, and nested loops are overkill for that. Note also that count is reset for every character and is never printed or returned.
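For comparison, this is roughly the smallest change that makes the original nested loops work: compare the characters i and j directly instead of indexing with them, and store each count instead of discarding it. It is only a sketch (charcount_nested is a hypothetical name), and it still compares every pair of characters, which is the quadratic cost the answer warns about.

```python
def charcount_nested(stri):
    """Nested-loop character count; works, but compares every pair of characters."""
    counts = {}
    for i in stri:
        count = 0
        for j in stri:
            if i == j:  # compare the characters themselves, not stri[i]
                count += 1
        counts[i] = count  # repeated characters just overwrite the same value
    return counts
```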
Alternative approaches:
There are better ways to do this, such as collections.Counter or a plain dictionary; both are well suited to counting.
Since it also looks like you're struggling with loops, I suggest going back to the basics and then attempting this problem with a dictionary.
Here is what you need to do: iterate through the input string and use a hash map to keep track of the counts. In Python, the built-in hash map is the dictionary (dict).
def charCounter(string):
    d = {}  # initialize a new dictionary
    for s in string:
        if s not in d:
            d[s] = 1
        else:
            d[s] += 1
    return d
print(charCounter("apple"))
# prints {'a': 1, 'p': 2, 'l': 1, 'e': 1}
Your solution needs just a small modification; first, you are looping incorrectly.
Take a look:
def charcount(stri):
    d = {}
    for i in stri:
        if i in d:
            d[i] = d[i] + 1
        else:
            d[i] = 1
    return d
print(charcount("hello"))  # expected output: {'h': 1, 'e': 1, 'l': 2, 'o': 1}
Counting each character in a string:
>>> from collections import Counter
>>> string = "There is shadow behind you"
>>> Counter(string)
Counter({' ': 4, 'h': 3, 'e': 3, 'i': 2, 's': 2, 'd': 2, 'o': 2, 'T': 1, 'r': 1, 'a': 1, 'w': 1, 'b': 1, 'n': 1, 'y': 1, 'u': 1})
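Counter also has a most_common(n) method that returns the n most frequent elements as (element, count) pairs, which is handy when you only care about the top few; looking up a missing key returns 0 instead of raising KeyError. A short example with the same string:

```python
from collections import Counter

string = "There is shadow behind you"
counts = Counter(string)

print(counts.most_common(3))
# [(' ', 4), ('h', 3), ('e', 3)]
print(counts['z'])  # characters that never occur count as 0
# 0
```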
If you don't want to use any imports:
def charcount(string):
    occurrences = {}
    for char in string:
        if char not in occurrences:
            occurrences[char] = 1
        else:
            occurrences[char] += 1
    return occurrences
You can use the following code, which counts each distinct character with list.count:
in_l = list(input('Put a string: '))  # one list element per character
d1 = {}
for i in set(in_l):
    d1[i] = in_l.count(i)
print(d1)
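A Java version, which counts the occurrences of a single character ('a') rather than of every character:
public class Z {
    public static void main(String[] args) {
        int count = 0;
        String str = "aabaaaababa";
        // count how many characters in str are 'a'
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) == 'a') {
                count++;
            }
        }
        System.out.println(count); // prints 8
    }
}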
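For a single character, Python needs no explicit loop at all: str.count returns the number of non-overlapping occurrences of a substring. The equivalent of the Java snippet:

```python
s = "aabaaaababa"
count = s.count('a')  # non-overlapping occurrences of 'a'
print(count)
# 8
```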

Finding item frequency in list of lists

Let's say I have a list of lists and I want to find how often pairs (or larger groups) of elements appear in total.
For example, if I have [[a,b,c],[b,c,d],[c,d,e]]
I want: (a,b) = 1, (b,c) = 2, (c,d) = 2, etc.
I tried to find an apriori algorithm implementation that would let me do this, but I couldn't find one in Python that was easy to use.
How would I approach this problem in a better way?
This is a way to do it:
from itertools import combinations
l = [['a','b','c'],['b','c','d'],['c','d','e']]
d = {}
for i in l:
    # for every sublist of l, take all the possible combinations of 2 items
    comb = combinations(i, 2)
    for c in comb:
        k = ''.join(c)  # e.g. ('b', 'c') -> 'bc'
        d[k] = d.get(k, 0) + 1
Result:
>>> d
{'ab': 1, 'ac': 1, 'bc': 2, 'bd': 1, 'cd': 2, 'ce': 1, 'de': 1}
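The question also asks about groups larger than pairs and writes the keys as tuples like (a,b). A variation sketched with collections.Counter keeps the combinations as tuples (so multi-character items can't run together the way ''.join merges them) and takes the group size r as a parameter. group_frequencies is a hypothetical name; each sublist is sorted first so ('a','b') and ('b','a') count as the same pair, which assumes the items are mutually comparable and unique within each sublist.

```python
from collections import Counter
from itertools import combinations

def group_frequencies(lists, r=2):
    """Count how often each r-element group appears across all sublists."""
    counts = Counter()
    for sub in lists:
        # Sorting makes tuple order canonical; combinations yields r-tuples.
        counts.update(combinations(sorted(sub), r))
    return counts

l = [['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]
pairs = group_frequencies(l)
print(pairs[('b', 'c')], pairs[('c', 'd')])
# 2 2
triples = group_frequencies(l, r=3)
```

pairs.most_common() then lists the most frequent pairs first.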

Python count/dictionary count

dct = {}
with open("grades_single.txt", "r") as g:
    content = g.readlines()[1].strip('\n')
for item in content:
    dct[item] = content.count(item)
LetterA = max(dct.values())
print(dct)
I'm very new to Python, so please excuse me. This is my code so far; it works, but not as intended. I'm trying to count the frequency of certain letters on a line so I can do a mathematical calculation with each letter. The program counts all the letters and prints them, but I'd like to be able to count each letter one by one, i.e. 7 As, then in a new function 4 Bs, etc.
At the moment the program prints them all together, like this: {'A': 9, 'C': 12, 'B': 19, 'E': 4, 'D': 5, 'F': 1}. I'd like to split them up so I can work with each letter individually.
Does anyone know how to count the frequency of each letter individually?
Example of what I'd like to count:
ADCBCBBBADEBCCBADBBBCDCCBEDCBACCFEABBCBBBCCEAABCBB
>>> from collections import Counter
>>> s = "ADCBCBBBADEBCCBADBBBCDCCBEDCBACCFEABBCBBBCCEAABCBB"
>>> Counter(s)
Counter({'B': 19, 'C': 14, 'A': 7, 'D': 5, 'E': 4, 'F': 1})
collections.Counter is clean, but you could also iterate over all of the letters and build the dictionary yourself:
s = 'ADCBCBBBADEBCCBADBBBCDCCBEDCBACCFEABBCBBBCCEAABCBB'
grades = {}
for letter in s:
    grades[letter] = grades.get(letter, 0) + 1
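Once the counts are in a dictionary (grades here, or the Counter), you can pull out each letter's count individually and pass it to whatever calculation you need. The sketch below assumes the grade letters are A to F; grades.get(letter, 0) returns 0 for a letter that never appears instead of raising KeyError.

```python
s = 'ADCBCBBBADEBCCBADBBBCDCCBEDCBACCFEABBCBBBCCEAABCBB'
grades = {}
for letter in s:
    grades[letter] = grades.get(letter, 0) + 1

# Work with one letter at a time, in alphabetical order.
for letter in 'ABCDEF':
    count = grades.get(letter, 0)
    print(letter, count)

num_a = grades.get('A', 0)  # e.g. the number of As on its own
```

For this string the loop prints A 7, B 19, C 14, D 5, E 4 and F 1, one per line.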

How to return the number of characters whose frequency is above a threshold

How do I print the number of uppercase characters whose frequency is above a threshold (in the tutorial)?
The homework question is:
Your task is to write a function which takes as input a single non-negative number and returns (not print) the number of characters in the tally whose count is strictly greater than the argument of the function. Your function should be called freq_threshold.
My answer is:
mobyDick = "Blah blah A B C A RE."
def freq_threshold(threshold):
    tally = {}
    for char in mobyDick:
        if char in tally:
            tally[char] += 1
        else:
            tally[char] = 1
    for key in tally.keys():
        if key.isupper():
            print tally[key],tally.keys
            if threshold>tally[key]:return threshold
            else:return tally[key]
It doesn't work, but I don't know where it is wrong.
Your task is to return the number of characters that satisfy the condition, but your code returns the number of occurrences of a single character (or the threshold itself). Try this:
result = 0
for key in tally.keys():
    if key.isupper() and tally[key] > threshold:
        result += 1
return result
This could be made more Pythonic; I wrote it this way to make it clearer.
The part where you tally up the number of each character is fine:
>>> pprint.pprint(tally)
{' ': 6,
 '.': 1,
 'A': 2,
 'B': 2,
 'C': 1,
 'E': 1,
 'R': 1,
 'a': 2,
 'b': 1,
 'h': 2,
 'l': 2}
The error is in how you are summarising the tally.
Your assignment asked you to return (not print) the number of characters occurring more than n times in the string.
What you are returning is either n or the number of times one particular character occurred.
You instead need to step through your tally of characters and character counts, and count how many characters have frequencies exceeding n.
Don't reinvent the wheel; use a Counter object, e.g.:
>>> from collections import Counter
>>> mobyDick = "Blah blah A B C A RE."
>>> c = Counter(mobyDick)
>>> c
Counter({' ': 6, 'B': 2, 'l': 2, 'a': 2, 'h': 2, 'A': 2, 'b': 1, 'C': 1, 'R': 1, 'E': 1, '.': 1})
from collections import Counter
def freq_threshold(s, n):
    cnt = Counter(s)
    # the assignment asks for the number of characters, so return the length
    return len([i for i in cnt if cnt[i] > n and i.isupper()])
To reinvent the wheel:
def freq_threshold(s, n):
    d = {}
    for i in s:
        d[i] = d.get(i, 0) + 1
    return len([i for i in d if d[i] > n and i.isupper()])
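The assignment itself asks for a function with a single argument, freq_threshold(threshold), that returns a number. A sketch of that shape, assuming the text lives in a module-level mobyDick string as in the question and that only uppercase characters should be counted (as the question title says):

```python
from collections import Counter

mobyDick = "Blah blah A B C A RE."

def freq_threshold(threshold):
    """Return how many uppercase characters occur strictly more than threshold times."""
    tally = Counter(mobyDick)
    return sum(1 for char, count in tally.items()
               if char.isupper() and count > threshold)

print(freq_threshold(1))
# 2
```

Here only 'A' and 'B' occur more than once among the uppercase characters, so the result for threshold 1 is 2.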
