Incrementing a dictionary across multiple strings - not using any import functions

I am trying to increment a dictionary in Python 3, counting the number of occurrences of certain characters in a string.
I cannot use any imports or the count function to do so, and must iterate through one character at a time. All other questions on here seem to rely on an import in their answers.
So far I have a dictionary that doesn't count duplicate characters, nor does it update when presented with a new string:
def IncrementalCount(counts, line, charList):
    counts={}
    for letter in charList:
        if letter in line:
            if letter in counts.keys():
                counts[letter] += 1
            else:
                counts[letter] = 1
        if letter not in line:
            counts[letter] = 0
    return counts
counts = {}
counts= IncrementalCount(counts,"{hello!{}","}{#")
print(counts)
counts= IncrementalCount(counts,"#Goodbye!}","}{#!#")
print(counts)
Current result:
{'}': 1, '{': 1, '#': 0}
{'}': 1, '{': 0, '#': 1, '!': 1, '#': 0}
Desired result:
{'}': 1, '{': 2, '#': 0}
{'}': 2, '{': 2, '#': 1, '!': 1, '#': 0}
Any help on what edits I need to make would be really appreciated. I don't understand why my "counts[letter] += 1" doesn't count duplicate entries.

You iterate over every letter and test whether the letter occurs in line, but you never count how many times it occurs there. The function also rebinds counts to an empty dictionary on its first line, so totals from earlier calls are lost.
Nevertheless, you make things too complex: we can construct the dictionary with line.count(..) and write this as a dictionary comprehension:
def IncrementalCount(line, charList):
    return { letter: line.count(letter) for letter in charList }
This then produces:
>>> IncrementalCount("{hello!{}","}{#")
{'}': 1, '{': 2, '#': 0}
>>> IncrementalCount("#Goodbye!}","}{#!#")
{'}': 1, '{': 0, '#': 1, '!': 1}
In case we wish to increment an existing dictionary, we can use:
def IncrementalCount(counts, line, charList):
    for letter in charList:
        counts[letter] += line.count(letter)
Or, in case not all keys may be present, we can use for instance a defaultdict (which is usually a more compact and efficient way), or we can initialize missing keys with setdefault:
def IncrementalCount(counts, line, charList):
    for letter in charList:
        counts.setdefault(letter, 0)
        counts[letter] += line.count(letter)
Or we can use .get(..) to retrieve it with a default value, but usually a defaultdict is a better design decision here.
N.B.: function names in Python are usually all lower case, with underscores (_) in place of spaces, so here it would be Pythonic to name it incremental_count.
N.B.: there are more efficient ways to count: right now we iterate through the entire string once for every character in charList.
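Since the question rules out both imports and count, here is a minimal sketch of the single-pass idea mentioned above. The name incremental_count follows the naming note; treating a character listed twice in char_list as one key is an assumption about the intended behaviour, not something the question states.

```python
def incremental_count(counts, line, char_list):
    """Add to `counts` how often each tracked character occurs in `line`."""
    # Make sure every tracked character has a key, without resetting old totals.
    for ch in char_list:
        counts.setdefault(ch, 0)
    # Single pass over the string: one increment per matching character.
    for ch in line:
        if ch in char_list:
            counts[ch] += 1
    return counts


counts = {}
incremental_count(counts, "{hello!{}", "}{#")
print(counts)   # {'}': 1, '{': 2, '#': 0}
incremental_count(counts, "#Goodbye!}", "}{#!#")
print(counts)   # {'}': 2, '{': 2, '#': 1, '!': 1}
```

Because the dictionary is passed in and never rebound inside the function, totals carry over from one string to the next.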

Try this
def IncrementalCount(counts, line, charList):
    for letter in charList:
        counts[letter]=counts.get(letter,0)+line.count(letter)
    return counts
counts = {}
counts= IncrementalCount(counts,"{hello!{}","}{#")
print(counts)
counts= IncrementalCount(counts,"#Goodbye!}","}{#!#")
print(counts)
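Output
{'}': 1, '{': 2, '#': 0}
{'}': 2, '{': 2, '#': 2, '!': 1}
('#' is listed twice in the second charList, so it is counted twice; a dictionary cannot hold the same key twice.)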

Related

How to loop through dictionary to get both frequency of words and symbols?

I have set up a function that counts how often each word appears in a text file, but the frequency is wrong for a couple of words because the function does not separate words from symbols, as in "happy,".
I have already tried using the split function to split on every "," and every ".", but that does not work. I am also not allowed to import anything into the function, as the professor does not want us to.
The code below reads the text file into a dictionary, using each word or symbol as the key and its frequency as the value.
def getTokensFreq(file):
    dict = {}
    with open(file, 'r') as text:
        wholetext = text.read().split()
        for word in wholetext:
            if word in dict:
                dict[word] += 1
            else:
                dict[word] = 1
    return dict
We are using a text file named "f". This is what is inside the file:
I felt happy because I saw the others were happy and because I knew I should feel happy, but I was not really happy.
The desired result is this, where both words and symbols are counted:
{'i': 5, 'felt': 1, 'happy': 4, 'because': 2, 'saw': 1,
'the': 1, 'others': 1, 'were': 1, 'and': 1, 'knew': 1, 'should': 1,
'feel': 1, ',': 1, 'but': 1, 'was': 1, 'not': 1, 'really': 1, '.': 1}
This is what I am getting, where some words and symbols are counted as a separate word
{'I': 5, 'felt': 1, 'happy': 2, 'because': 2, 'saw': 1, 'the': 1, 'others': 1, 'were': 1, 'and': 1, 'knew': 1, 'should': 1, 'feel': 1, 'happy,': 1, 'but': 1, 'was': 1, 'not': 1, 'really': 1, 'happy.': 1}

This is how to generate your desired frequency dictionary for one sentence. To do it for the whole file, call this code for each line to update the contents of your dictionary.
# init vars
f = "I felt happy because I saw the others were happy and because I knew I should feel happy, but I was not really happy."
d = {}
# count punctuation chars
d['.'] = f.count('.')
d[','] = f.count(',')
# remove . and ,
for word in f.replace(',', '').replace('.','').split(' '):
    if word not in d.keys():
        d[word] = 1
    else:
        d[word] += 1
Alternatively, if imports are allowed, you can use a mix of regex and list comprehensions, like the following:
import re
# filter words and symbols
words = re.sub(r'[^A-Za-z0-9\s]+', '', f).split(' ')
symbols = re.sub(r'[A-Za-z0-9\s]+', ' ', f).strip().split(' ')
# count occurrences
count_words = dict(zip(set(words), [words.count(w) for w in set(words)]))
count_symbols = dict(zip(set(symbols), [symbols.count(s) for s in set(symbols)]))
# parse results in dict
d = count_symbols.copy()
d.update(count_words)
Output:
{',': 1,
'.': 1,
'I': 5,
'and': 1,
'because': 2,
'but': 1,
'feel': 1,
'felt': 1,
'happy': 4,
'knew': 1,
'not': 1,
'others': 1,
'really': 1,
'saw': 1,
'should': 1,
'the': 1,
'was': 1,
'were': 1}
When I ran each of the two approaches 1000 times in a loop and recorded the run-times, the second approach was faster than the first.
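Because the question forbids imports, the following is a rough sketch of how the same result could be reached with plain string methods. The function name tokens_freq is made up for illustration, and lower-casing the words is an assumption based on the lower-case 'i' in the desired output.

```python
def tokens_freq(text):
    """Count lower-cased words and individual punctuation marks in `text`."""
    freq = {}
    word = ""
    for ch in text + " ":            # the extra space flushes the final word
        if ch.isalnum():
            word += ch.lower()       # still inside a word
            continue
        if word:                     # a word has just ended
            freq[word] = freq.get(word, 0) + 1
            word = ""
        if not ch.isspace():         # any other visible character is a symbol
            freq[ch] = freq.get(ch, 0) + 1
    return freq


sentence = ("I felt happy because I saw the others were happy and because "
            "I knew I should feel happy, but I was not really happy.")
result = tokens_freq(sentence)
print(result)
```

For a file, the same function could be applied to the string returned by text.read() inside the question's with block.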

My solution is to first replace all symbols with a space and then split on spaces. We will need a little help from regular expressions.
import re
a = 'I felt happy because I saw the others were happy and because I knew I should feel happy, but I was not really happy.'
b = re.sub('[^A-Za-z0-9]+', ' ', a)
print(b)
wholetext = b.split(' ')
print(wholetext)

My solution is similar to Verse's, but it also builds a list of the symbols in the sentence. Afterwards, you can use the for loop and the dictionary to determine the counts.
import re
a = 'I felt happy because I saw the others were happy and because I knew I should feel happy, but I was not really happy.'
b = re.sub(r'[^A-Za-z0-9\s]+', ' ', a)
print(b)
wholetext = b.split(' ')
print(wholetext)
c = re.sub(r'[A-Za-z0-9\s]+', ' ', a)
symbols = c.strip().split(' ')
print(symbols)
# do the for loop stuff you did in your question but with wholetext and symbols
Oh, I missed that you couldn't import anything :(

Character count in string

def charcount(stri):
    for i in stri:
        count = 0
        for j in stri:
            if stri[i] == stri[j]:
                count += 1
I am new to Python and currently learning string operations. Can anyone tell me what is wrong with this program? The function tries to print a count of each character in the given string.
For example: string = "There is shadow behind you"
I want to count how many times each character occurs in the string.

Counting characters in a string can be done with the Counter() class like:
Code:
from collections import Counter
def charcount(stri):
    return Counter(stri)
print(charcount('The function try to print count of each character '
                'in given string . Please help'))
Results:
Counter({' ': 14, 'e': 7, 'n': 7, 't': 7, 'c': 5, 'i': 5,
'r': 5, 'h': 4, 'o': 4, 'a': 4, 'f': 2, 'u': 2,
'p': 2, 'g': 2, 's': 2, 'l': 2, 'T': 1, 'y': 1,
'v': 1, '.': 1, 'P': 1})

Feedback on code:
In these lines:
for i in stri:
    count = 0
    for j in stri:
The outer loop is looping over each character in stri, and the inner loop is looping over every character in stri. This is like a Cartesian product of the elements in the list, and is not necessary here.
Secondly, in this line:
if stri[i] == stri[j]:
You are accessing stri by index, but i and j are not indices; they are the characters themselves. Treating them as indices does not work here, since characters are not valid indices for strings. If you wanted to loop over just the indices, you could use range(len()):
for i in range(len(stri)):
    count = 0
    for j in range(len(stri)):
        if stri[i] == stri[j]:
Or if you want to access the elements and their indices, you can use enumerate().
Having said this, your approach is too complicated and needs to be redone. You need to group your characters and count them. Using nested loops is overkill here.
Alternative approaches:
There are lots of better ways to do this such as using collections.Counter() and dictionaries. These data structures are very good for counting.
Since it also looks like you're struggling with loops, I suggest going back to the basics and then attempting this problem with a dictionary.
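To make the index-versus-character point concrete, here is one possible repair of the nested-loop version using enumerate(). The function name char_counts_nested and the choice to return a list of pairs are illustrative assumptions; it is still quadratic, so it is only a sketch of how the loops fit together, not a recommended approach.

```python
def char_counts_nested(stri):
    """Return (character, count) pairs in order of first appearance."""
    pairs = []
    for i, ch in enumerate(stri):    # i is the index, ch is the character
        if ch in stri[:i]:           # already counted at an earlier index
            continue
        count = 0
        for other in stri:           # compare characters directly, no indexing
            if ch == other:
                count += 1
        pairs.append((ch, count))
    return pairs


print(char_counts_nested("hello"))   # [('h', 1), ('e', 1), ('l', 2), ('o', 1)]
```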

This is what you need to do. Iterate through the input string and use a hash to keep track of the counts. In python, the basic hash is a dictionary.
def charCounter(string):
    d = {} # initialize a new dictionary
    for s in string:
        if s not in d:
            d[s] = 1
        else:
            d[s] += 1
    return d
print(charCounter("apple"))
# prints {'a': 1, 'p': 2, 'l': 1, 'e': 1}

Your solution needs just a little modification.
First, you are looping incorrectly.
Take a look:
def charcount(stri):
    d = {}
    for i in stri:
        if i in d:
            d[i] = d[i] +1
        else:
            d[i] = 1
    return d
print (charcount("hello")) #expected outpu

Counting each characters in a string
>>> from collections import Counter
>>> string ="There is shadow behind you"
>>> Counter(string)
Counter({' ': 4, 'h': 3, 'e': 3, 'i': 2, 's': 2, 'd': 2, 'o': 2, 'T': 1, 'r': 1, 'a': 1, 'w': 1, 'b': 1, 'n': 1, 'y': 1, 'u': 1})

If you don't want to use any import :
def charcount(string):
    occurenceDict = dict()
    for char in string:
        if char not in occurenceDict:
            occurenceDict[char] = 1
        else:
            occurenceDict[char] += 1
    return occurenceDict

You can use the following code.
in_l = list(input('Put a string: '))  # list() keeps commas in the input intact
d1 = {}
for i in set(in_l):
    d1[i] = in_l.count(i)
print(d1)

public class Z {
    public static void main(String[] args) {
        int count=0;
        String str="aabaaaababa";
        for(int i=0;i<str.length();i++) {
            if(str.charAt(i)=='a') {
                count++;
            }
        }
        System.out.println(count);
    }
}

Using hashing to find a repeated substring inside a string

Given the problem: Find a repeated substring in a string, is it possible to use hashing? I want to create a dictionary with the substrings as keys and the number of repeated instances as values. Here is what I have so far. I am getting an error because I am using a substring as a key for the dictionary. Can anyone spot my mistake? Thank you!!!
def findsubs(str):
    d={}
    for i in range(len(str)-1):
        for j in range(i+2, len(str)-2):
            if d[str[i:j]]>1:
                return str[i:j]
            else:
                d[str[i:j]] = d[str[i:j]] +1
    return 0
print findsubs("abcbc")

The general idea should work. It's just that if a key isn't found in the dictionary when you do a lookup, you get a KeyError, so you have to check whether the key exists before the lookup and initialize it if it doesn't:
def findsubs(str):
    d={}
    for i in range(len(str)-1):
        for j in range(i+2, len(str)-2):
            if str[i:j] not in d:
                d[str[i:j]] = 0
            if d[str[i:j]]>1:
                return str[i:j]
            else:
                d[str[i:j]] = d[str[i:j]] +1
    return 0
Note that instead of if str[i:j] not in d: d[str[i:j]] = 0, you can do d.setdefault(str[i:j], 0), which sets the value to 0 if the key isn't in the dict, and leaves it unchanged if it is.
A few more comments though:
You should return None, not 0, if you don't find anything.
You shouldn't call a variable str, since that shadows the built-in str type.
You want to iterate j until the end of the string.
As written, it'll only return a substring once it has been found 3 times. Really you can rewrite it using a set of previously found substrings instead.
So:
def findsubs(s):
    found = set()
    for i in range(len(s)-1):
        for j in range(i+2, len(s)+1):
            substr = s[i:j]
            if substr in found:
                return substr
            found.add(substr)
    return None
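If the goal is the dictionary of counts described in the question rather than just the first repeat, the setdefault idea from above can be sketched as follows. The name repeated_substrings and the min_len parameter are assumptions added for illustration; min_len=2 mirrors the i+2 bound used in the code above.

```python
def repeated_substrings(s, min_len=2):
    """Map every substring of at least `min_len` characters that occurs
    more than once in `s` to its number of occurrences."""
    counts = {}
    for i in range(len(s)):
        for j in range(i + min_len, len(s) + 1):
            sub = s[i:j]
            counts.setdefault(sub, 0)   # initialize the key on first sight
            counts[sub] += 1
    # Keep only the substrings that actually repeat.
    return {sub: n for sub, n in counts.items() if n > 1}


print(repeated_substrings("abcbc"))   # {'bc': 2}
```

This examines every substring, so it takes quadratic time and memory in the length of the string; for long inputs a different technique would be needed.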

You were almost there
def findsubs(instr):
    d={}
    for i in range(len(instr)):
        for j in range(i+2, len(instr)+1):
            print(instr[i:j])  # debug output
            d[instr[i:j]] = d.get(instr[i:j],0) + 1
    return d
instr = 'abcdbcab'
print(instr)
print(findsubs('abcdbcab'))
This will work. I added a print inside the loop for debugging purposes; remove it after you test it.
The result is the dict with the substring counts, as you asked for :)
{'abcd': 1, 'ab': 2, 'cdb': 1, 'dbc': 1, 'cdbcab': 1, 'cd': 1, 'abc': 1, 'cdbc': 1, 'bcab': 1, 'abcdbc': 1, 'ca': 1,
'dbca': 1, 'bc': 2, 'dbcab': 1, 'db': 1, 'cab': 1, 'bcdbcab': 1, 'bcdbc': 1, 'abcdbca': 1, 'cdbca': 1, 'abcdbcab': 1,
'bcdb': 1, 'bcd': 1, 'abcdb': 1, 'bca': 1, 'bcdbca': 1}

Word frequency in a string without spaces and with special characters?

Let's say I have the following string:
"hello&^uevfehello!`.<hellohow*howdhAreyou"
How would I go about counting the frequency of english words that are substrings of it? In this case I would want a result such as:
{'hello': 3, 'how': 2, 'are': 1, 'you': 1}
I searched previous questions similar to this one, but I couldn't really find anything that works. A close solution seemed to be using regular expressions, but it didn't work either. It might be because I was implementing it wrong, since I'm not familiar with how it actually works.
The closest I found was the last answer to "How to find the count of a word in a string?":
from collections import *
import re
Counter(re.findall(r"[\w']+", text.lower()))
I also tried creating a very bad function that iterates through every single possible arrangement of consecutive letters in the string (up to a max of 8 letters or so). The problem with doing that is
1) it's way longer than it should be and
2) it adds extra words. ex: if "hello" was in the string, "hell" would also be found.
I'm not very familiar with regex which is probably the right way to do this.

d, w = "hello&^uevfehello!`.<hellohow*howdhAreyou", ["hello","how","are","you"]
import re, collections
pattern = re.compile("|".join(w), flags = re.IGNORECASE)
print(collections.Counter(pattern.findall(d)))
Output
Counter({'hello': 3, 'how': 2, 'you': 1, 'Are': 1})

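from collections import defaultdict
# is_english_word() is not defined in this answer; it is assumed to
# return True when its argument is a valid English word.
s = 'hello&^uevfehello!`.<hellohow*howdhAreyou'
word_counts = defaultdict(lambda: 0)
i = 0
while i < len(s):
    j = len(s)
    # try the longest candidate starting at i first
    while j > i:
        if is_english_word(s[i:j]):
            word_counts[s[i:j]] += 1
            break
        j -= 1
    if j == i:
        i += 1
    else:
        i = j
print(word_counts)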
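The loop above leaves is_english_word undefined. As a self-contained sketch of the same longest-match-first scan, the version below plugs in a tiny hard-coded word set; KNOWN_WORDS, count_words and the lower-casing of keys are all assumptions for illustration, and a real dictionary lookup would replace the set.

```python
# Assumption: a stand-in word list; a real English dictionary would go here.
KNOWN_WORDS = {"hello", "how", "are", "you"}


def is_english_word(candidate):
    """Hypothetical helper: case-insensitive membership test."""
    return candidate.lower() in KNOWN_WORDS


def count_words(s):
    """Scan left to right, always taking the longest known word at each position."""
    word_counts = {}
    i = 0
    while i < len(s):
        for j in range(len(s), i, -1):       # longest candidate first
            if is_english_word(s[i:j]):
                key = s[i:j].lower()
                word_counts[key] = word_counts.get(key, 0) + 1
                i = j                        # continue after the matched word
                break
        else:
            i += 1                           # no word starts here, skip one character
    return word_counts


print(count_words("hello&^uevfehello!`.<hellohow*howdhAreyou"))
# {'hello': 3, 'how': 2, 'are': 1, 'you': 1}
```

Taking the longest match first is what keeps "hell" from being reported inside "hello", which was one of the problems described in the question.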

You need to extract all words from the string; then, for each word, you need to find its substrings and check whether any of them is an English word. I have used the English dictionary from an answer to "How to check if a word is an English word with Python?"
There are some false positives in the result, however, so you may want to use a better dictionary or write a custom method to check for the desired words.
import re
import enchant  # third-party package (pyenchant)
from collections import defaultdict
# Get all substrings in given string.
def get_substrings(string):
    for i in range(0, len(string)):
        for j in range(i, len(string)):
            yield string[i:j+1]
text = "hello&^uevfehello!`.<hellohow*howdhAreyou"
strings = re.split(r"[^\w']+", text.lower())
# Use english dictionary to check if a word exists.
dictionary = enchant.Dict("en_US")
counts = defaultdict(int)
for s in strings:
    for word in get_substrings(s):
        if (len(word) > 1 and dictionary.check(word)):
            counts[word] += 1
print(counts)
Output:
defaultdict(<type 'int'>, {'are': 1, 'oho': 1, 'eh': 1, 'ell': 3,
'oh': 1, 'lo': 3, 'll': 3, 'yo': 1, 'how': 2, 'hare': 1, 'ho': 2,
'ow': 2, 'hell': 3, 'you': 1, 'ha': 1, 'hello': 3, 're': 1, 'he': 3})

How to return the number of characters whose frequency is above a threshold

How do I print the number of upper case characters whose frequency is above a threshold (in the tutorial)?
The homework question is:
Your task is to write a function which takes as input a single non-negative number and returns (not print) the number of characters in the tally whose count is strictly greater than the argument of the function. Your function should be called freq_threshold.
My answer is:
mobyDick = "Blah blah A　B C A RE."
def freq_threshold(threshold):
    tally = {}
    for char in mobyDick:
        if char in tally:
            tally[char] += 1
        else:
            tally[char] = 1
    for key in tally.keys():
        if key.isupper():
            print tally[key],tally.keys
            if threshold>tally[key]:return threshold
            else:return tally[key]
It doesn't work, but I don't know where it is wrong.

Your task is to return the number of characters that satisfy the condition. You're trying to return the count of occurrences of a single character. Try this:
result = 0
for key in tally.keys():
    if key.isupper() and tally[key] > threshold:
        result += 1
return result
You can make this code more pythonic. I wrote it this way to make it more clear.
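One way the more Pythonic form might look is sketched below. Passing the text in as a parameter is an assumption made so the example is self-contained; the assignment's function takes only the threshold and reads the text from a global.

```python
def freq_threshold(text, threshold):
    """Return how many upper-case characters occur more than `threshold` times."""
    tally = {}
    for char in text:
        tally[char] = tally.get(char, 0) + 1
    # Count the qualifying characters instead of returning a single tally value.
    return sum(1 for char, n in tally.items()
               if char.isupper() and n > threshold)


print(freq_threshold("Blah blah A B C A RE.", 1))   # 2 ('A' and 'B')
```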

The part where you tally up the number of each character is fine:
>>> pprint.pprint ( tally )
{' ': 5,
'.': 1,
'A': 2,
'B': 2,
'C': 1,
'E': 1,
'R': 1,
'a': 2,
'b': 1,
'h': 2,
'l': 2,
'\x80': 2,
'\xe3': 1}
The error is in how you are summarising the tally.
Your assignment asked you to return (not print) the number of characters occurring more than n times in the string.
What you are returning is either n or the number of times one particular character occurred.
You instead need to step through your tally of characters and character counts, and count how many characters have frequencies exceeding n.

Do not reinvent the wheel, but use a counter object, e.g.:
>>> from collections import Counter
>>> mobyDick = "Blah blah A B C A RE."
>>> c = Counter(mobyDick)
>>> c
Counter({' ': 6, 'a': 2, 'B': 2, 'h': 2, 'l': 2, 'A': 2, 'C': 1, 'E': 1, '.': 1, 'b': 1, 'R': 1})

from collections import Counter
def freq_threshold(s, n):
    cnt = Counter(s)
    return [i for i in cnt if cnt[i]>n and i.isupper()]
To reinvent the wheel:
def freq_threshold(s, n):
    d = {}
    for i in s:
        d[i] = d.get(i, 0)+1
    return [i for i in d if d[i]>n and i.isupper()]
