function for word frequency + dictionary - python

I am trying to create a function to take in a string and return how many times a word in it has been used (with the word) as a dictionary. I also want it to look for a specific list of words to search up the string when provided and return the frequency of the words in the given list found in the string.
Example,
stringfunc = "I went to school today, to learn!"
print(wordfunc(stringfunc))
should return
{'i':1 , 'went':1, 'to':2, 'school':1, 'today':1, 'learn':1}
And,
stringfunc = "I went to school today, to learn!"
print(wordfunc(stringfunc,wordlist=["I", "feel", "Great"]))
should return
{'i':1, 'feel':0, 'great':0}
This is what I have so far
def wordfunc(stringfunc,wordlist=[]):
    count_dict = dict()
    stringfunc=stringfunc.lower() # i want it to be case insensitive
    word = stringfunc.split()
    for i in range(len(word)):
        x = ord(word[i][-1]) # in the next few lines I am trying to get rid of special characters
        if (not(x>=97 and x<=112) or (x>=65 and x<= 90)):
            word[i]=word[i][:-1] # if a word ends with , or ! i want it to discount last character
    for i in wordlist:
        if (i not in word):
            count_dict[i]=0
        else:
            count_dict[i]=word.count(i)
    return count_dict
When I try
stringfunc = "I went to school today, to learn!"
print(wordfunc(stringfunc,wordlist=["I", "feel", "Great"]))
I get
{'I':1, 'feel':0, 'Great':0} # i can't get a lower case i don't know why
and when I try
stringfunc = "I went to school today, to learn!"
print(wordfunc(stringfunc))
I get an empty dictionary {}
Can you help me identify my error? Thanks!

You "can't get lower case" because you didn't program it. If the input supplies wordlist, then you blithely accept whatever is there. In the given case, you have two words capitalized, so that's what comes out. Instead, you need to convert every element of wordlist to lower case, just as you did with the input string.
BTW, do not give misleading names to variables: stringfunc is not a function.
The main loop will be much easier to read if you quit playing games with ASCII code values. Instead, simply use the string method isalpha. If this is new to you, then I strongly recommend that you repeat your tutorial on string processing; you missed some useful things that you will now recognize.
That said, also look up the collections module, notably the Counter type. Once you've cleaned out all but letters and spaces in your input string, you can do the main processing with
count_dict = Counter(stringfunc.split())
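Putting those suggestions together, here is a minimal sketch of what the whole function might look like. It is an illustration, not the only way to do it: the parameter name text, the use of re.findall to keep only runs of letters, and treating an empty or missing wordlist as "count every word" are all assumptions on my part.

```python
import re
from collections import Counter


def wordfunc(text, wordlist=None):
    """Count words case-insensitively; restrict to wordlist when one is given."""
    # Keep only runs of the letters a-z, so "today," becomes "today".
    # (Assumption: words with apostrophes or accents are not a concern here.)
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    if not wordlist:
        # No word list supplied: report every word that was found.
        return dict(counts)
    # Lower-case the requested words too; a Counter returns 0 for missing keys.
    return {w.lower(): counts[w.lower()] for w in wordlist}


sentence = "I went to school today, to learn!"
print(wordfunc(sentence))
print(wordfunc(sentence, wordlist=["I", "feel", "Great"]))
```

Using None as the default also sidesteps the usual surprise with a mutable default argument such as wordlist=[].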


How to understand the flaw in my simple three part python code?

My Python exercise in 'classes' is as follows:
You have been recruited by your friend, a linguistics enthusiast, to create a utility tool that can perform analysis on a given piece of text. Complete the class "analyzedText" with the following methods:
Constructor (__init__) - This method should take the argument text, make it lowercase and remove all punctuation. Assume only the following punctuation is used: period (.), exclamation mark (!), comma (,), and question mark (?). Assign this newly formatted text to a new attribute called fmtText.
freqAll - This method should create and return dictionary of all unique words in the text along with the number of times they occur in the text. Each key in the dictionary should be the unique word appearing in the text and the associated value should be the number of times it occurs in the text. Create this dictionary from the fmtText attribute.
This was my code:
class analysedText(object)
    def __init__ (self, text):
        formattedText = text.replace('.',' ').replace(',',' ').replace('!',' ').replace('?',' ')
        formattedText = formattedText.lower()
        self.fmtText = formattedText
    def freqAll(self):
        wordList = self.fmtText.split(' ')
        wordDict = {}
        for word in set(wordList):
            wordDict[word] = wordList(word)
        return wordDict
I get errors on both of these and I can't seem to figure it out after a lot of little adjustments. I suspect the issue in the first part is when I try to assign a value to the newly formatted text but I cannot think of a workable solution. As for the second part, I am at a complete loss - I was wrongfully confident my answer was correct but I received a fail error when I ran it through the classroom's code cell to test it.
On the assumption that by 'errors' you mean a TypeError, this is caused by the line wordDict[word] = wordList(word).
wordList is a list, and by putting parentheses after it you're telling Python that you want to call that list as a function, which it cannot do.
According to your task, you instead need to find the number of occurrences of each word in the list, which you could achieve with the .count() method. This method returns the total number of occurrences of an element in a list. (The Python documentation on list methods covers it in more detail.)
With this modification, (this is assuming you want wordDict to contain a dictionary with the word as the key, and the occurrence as the value) your freqAll function would look something like this:
def freqAll(self):
    wordList = self.fmtText.split()
    wordDict = {}
    for word in set(wordList):
        wordDict[word] = wordList.count(word) # wordList.count(word) returns the number of times the string word appears as an element in wordList
    return wordDict
You could also achieve the same result with the collections.Counter class (which means you have to import it from the collections module); the Python documentation for collections describes it in more detail.
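For illustration, a Counter-based version of the whole class might look like the sketch below. The class name AnalysedText is only a placeholder (the exercise asks for analyzedText), and removing the four punctuation marks outright rather than replacing them with spaces is my reading of "remove all punctuation".

```python
from collections import Counter


class AnalysedText:
    """Sketch of the exercise class; the name is a placeholder."""

    def __init__(self, text):
        # Remove the four punctuation marks the exercise mentions.
        for mark in ".!,?":
            text = text.replace(mark, "")
        self.fmtText = text.lower()

    def freqAll(self):
        # split() with no argument ignores runs of whitespace,
        # so no empty strings end up as keys.
        return dict(Counter(self.fmtText.split()))


sample = AnalysedText("Hello, hello! World?")
print(sample.fmtText)    # hello hello world
print(sample.freqAll())  # {'hello': 2, 'world': 1}
```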

Python - Finding all uppercase letters in string

I'm a real beginner with Python and I'm trying to modify code that I have seen in lessons. I have tried to find all uppercase letters in a string, but the problem is that it only gives me one uppercase letter even when there is more than one.
def finding_upppercase_itterative(string_input):
    for i in range(len(string_input)):
        if string_input[i].isupper:
            return string_input[i]
    return "No uppercases found"
How should I modify this code to give me all uppercase letters in a given string? If someone could explain the logic behind it, I would be glad.
Thank you!
Edit 1: Thanks to S3DEV, I had mistyped "binary search algorithm".
If you are looking for only small changes that make your code work, one way is to use a generator function, using the yield keyword:
def finding_upppercase_itterative(string_input):
    for i in range(len(string_input)):
        if string_input[i].isupper():
            yield string_input[i]
print(list(finding_upppercase_itterative('test THINGy')))
If you just print finding_upppercase_itterative('test THINGy'), it shows a generator object, so you need to convert it to a list in order to view the results.
For more about generators, see here: https://wiki.python.org/moin/Generators
This is the fixed code written out with a lot of detail to each step. There are some other answers with more complicated/'pythonic' ways to do the same thing.
def finding_upppercase_itterative(string_input):
    uppercase = []
    for i in range(len(string_input)):
        if string_input[i].isupper():
            uppercase.append(string_input[i])
    if(len(uppercase) > 0):
        return "".join(uppercase)
    else:
        return "No uppercases found"
# Try the function
test_string = input("Enter a string to get the uppercase letters from: ")
uppercase_letters = finding_upppercase_itterative(test_string)
print(uppercase_letters)
Here's the explanation:
create a function that takes string_input as a parameter
create an empty list called uppercase
loop through every character in string_input
[in the loop] if it is an uppercase letter, add it to the uppercase list
[out of the loop] if the length of the uppercase list is more than 0
[in the if] return the list characters all joined together with nothing as the separator ("")
[in the else] otherwise, return "No uppercases found"
[out of the function] get a test_string and store it in a variable
get the uppercase_letters from test_string
print the uppercase_letters to the user
There are shorter (and more complex) ways to do this, but this is just a way that is easier for beginners to understand.
Also: you may want to fix your spelling, because it makes code harder to read and understand, and also makes it more difficult to type the name of that misspelled identifier. For example, upppercase and itterative should be uppercase and iterative.
Something simple like this would work:
s = "My Word"
s = ''.join(ch for ch in s if ch.isupper())
print(s)  # MW
Inverse idea behind other StackOverflow question: Removing capital letters from a python string
The return statement in a function will stop the function from executing. When it finds an uppercase letter, it will see the return statement and stop.
One way to do this is to append letters to list and return them at the end:
def finding_uppercase_iterative(string_input):
    letters = []
    for i in range(len(string_input)):
        if string_input[i].isupper():
            letters.append(string_input[i])
    if letters:
        return letters
    return "No uppercases found"
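One thing to watch with the version above is that it returns a list in one case and a string in the other. If a consistent return type matters to you, a minimal sketch (the name uppercase_letters is made up for this example) could always return a list and let the caller decide what to print:

```python
def uppercase_letters(text):
    """Return every uppercase character in text, in order (possibly empty)."""
    return [ch for ch in text if ch.isupper()]


found = uppercase_letters("test THINGy")
print(found)  # ['T', 'H', 'I', 'N', 'G']

# The caller handles the "nothing found" case explicitly.
nothing = uppercase_letters("all lowercase")
print(nothing if nothing else "No uppercases found")
```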

Looping and Lists - BaSe fOO ThE AttAcK

In the war against Skynet, humans are trying to pass messages to each other without the computers realising what's happening.
To do this, they are using a simple code:
They read the words in reverse order
They only pay attention to the words in the message that start with an uppercase letter
So, something like:
BaSe fOO ThE AttAcK
contains the message:
attack the base
However, the computers have captured you and forced you to write a program so they can understand all the human messages (we won't go into what terrible tortures you've undergone). Your program must work as follows:
code: soMe SuPPLies liKE Ice-cREAm aRe iMPORtant oNly tO THeir cReaTORS. tO DestroY thEm iS pOInTLess.
says: destroy their ice-cream supplies
Notice that, as well as extracting the message, we make every word lowercase so it's easier to read.
Could you please help me with my code? This is my code so far:
output=[]
b=0
d=0
code=input("code: ")
code=code.split()
print(code)
a=len(code)
print(a)
while b<a:
    c=code[b]
    if c.isupper:
        output.append(c)
        b=b+1
    elif c.islower:
        b=b+1
    else:
        b=b+1
print(output)
I need the last line to say "BaSe ThE AttAcK", eliminating "fOO", and I will reverse the words in the last step so that the message makes sense, but the code is not differentiating between words that start with a lowercase letter and words that start with an uppercase letter.
I have rewritten your code.
#code=input("code: ")
code = "soMe SuPPLies liKE Ice-cREAm aRe iMPORtant oNly tO THeir cReaTORS. tO DestroY thEm iS pOInTLess"
code=code.split()
output = []
for word in reversed(code): #iterate over the list in reverse
    if word[0].isupper(): #check if the FIRST letter (word[0]) is uppercase.
        output.append(word.lower()) #append word in lowercase to list.
output = " ".join(output) #join the elements of the list together in a string separated by a space " "
print(output)
Output:
destroy their ice-cream supplies
Here's my answer, tested on Grok Learning and green across the board:
code = input('code: ')
code=code.split()
output = []
for word in reversed(code):
    if word[0].isupper():
        output.append(word.lower())
output = " ".join(output)
print('says:', output)
There are two issues with your code:
isupper and islower are methods, i.e. you need to call them by writing ().
c.isupper() will check if the entire word is upper-case. However, your problem description says to just consider the first character of each word. Hence, try using c[0].isupper().
Now, after that has been fixed, you're still left with reversing the output list (and making each word lowercase) but I suppose you didn't get to that just yet. :-)
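To see why the original loop kept every word, here is a small sketch of the difference between referring to the method and calling it. The variable names are only for illustration.

```python
word = "fOO"

# Without parentheses this is the bound method object itself,
# which is always truthy, so "if c.isupper:" is always taken.
method_is_truthy = bool(word.isupper)

# With parentheses the method runs. It reports whether ALL cased
# characters are uppercase, which is not what the task asks for.
whole_word_upper = word.isupper()

# Checking only the first character matches the problem statement.
first_letter_upper = word[0].isupper()

print(method_is_truthy, whole_word_upper, first_letter_upper)  # True False False
print("BaSe".isupper(), "BaSe"[0].isupper())                   # False True
```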

How to think about generating all words and retrieving the best word given letters for an input

I decided to write a little application in python to help me learn to type using the dvorak keyboard layout. In my algorithms class, we discussed trees, and tries, and implemented an autocomplete function.
I grabbed a word list from this site. Then I loaded all the words in it into a trie, (which surprisingly only took about a third of a second) and now I am trying to figure out how to make words that are relevant.
I currently am maintaining a priority queue to keep track of which letters the user is typing wrongly the most, and so I remove say 3 letters from this queue to start. If I wanted all the words that started with each of these letters, I could do this, and then probably just filter out all words that don't have any of the other letters that the user types wrongly the most.
Is it possible to efficiently (or maybe even not efficiently) get a list of all words with the letters from the priority queue in them, and then filter it so that I get the word that will be the biggest challenge to the typist?
I was able to do this with characters, but the words present an interesting challenge, because the nature of the trie only gets words that have prefixes that start with the letters we have in the queue.
Do you need a trie here at all? I think you either don't need any advanced structure, or you need something else.
How many words do you want to process? If it takes only a third of a second to load them into a trie, then it will not take much longer to just go through all of them and choose whatever you want. You will have to do this every time, but if it's just 1/3 of a second, it will not be a problem.
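As a rough sketch of that "just go through all of them" idea: the function below scans a plain list of words and ranks them by how many of the wanted letters each one contains. The name best_words, the limit parameter, and the tie-breaking rule (shorter words first, then alphabetical) are assumptions made for this example, not part of the question.

```python
def best_words(words, wanted, limit=5):
    """Rank words by how many distinct wanted letters they contain."""
    wanted = set(wanted)
    scored = []
    for word in words:
        hits = len(wanted & set(word))  # distinct wanted letters in this word
        if hits:
            scored.append((hits, word))
    # Most hits first; among equal hits prefer shorter words, then alphabetical.
    scored.sort(key=lambda pair: (-pair[0], len(pair[1]), pair[1]))
    return [word for _, word in scored[:limit]]


print(best_words(["has", "babe", "shame", "king"], "abcdef"))
# ['babe', 'shame', 'has']
```

A single pass like this does work proportional to the total number of letters in the word list, which is usually fine for a list that loads in a fraction of a second.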
You could re-calculate the TRIE to hold all the sub strings (on top of the real words themselves) as well, where the end of the sub string points to the real word in the TRIE.
This way you can use the code you already have and apply it to sub strings.
Okay. The solution I came up with combined @shapiro-yaacov's answer with code I wrote.
I scrapped the trie, and used a thing with bins for each letter. Each word is put into a bin for each letter, and then the algorithm adds up letters to find which words have the most wanted letters. I also take a 10th of a point away from words for each letter that I don't want, to encourage my program to give reasonable words, because if I simply were to add up all words with the most letters, I would get huge words.
Here is my Words.py:
import string
import random
import operator

class Bin:
    """
    A bin is a container that stores words given in a dictionary file.
    It is designed to retrieve all words in this file with the given letters.
    The words are stored in this container in an array and when new words get added,
    the container automatically adds the word to the words list,
    and places them into as many bins as need be.
    For example,
    >>> bin = Bin("words.txt")  # get all words from words.txt
    >>> bin.addWord("about")
    now, the bins for a, b, o, u, t will have a pointer to "about".
    Now imagine the bin has the words "king", "fish", and "dish" in it.
    >>> d = bin.getWordsWithLetters("sh")
    >>> print(d)
    ["fish", "dish"]
    """
    def __init__(self, wordsFile):
        """initialize the container from the given file,
        if None, just initialize an empty container.
        """
        self.bins = {}
        for i in string.ascii_lowercase + ".'&":  # these are the letters I need.
            self.bins[i] = []  # initialize an empty list for each bin.
        if wordsFile is None:
            return
        with open(wordsFile) as words:
            for i in words:
                self.addWord(i.strip("\n"))

    def addWord(self, word):
        for i in word:
            self.bins[i].append(word)  # add the word to the bin for each letter in that word.

    def getWordsWithLetters(self, lrs):
        """Gets best word that has the letters lrs in it.
        For example, if abcdef is given, and the words [has, babe, shame] are there,
        [babe] would be returned because it is the word with the maximum return,
        since it contains b,a,e."""
        words = []
        seen = set()
        for i in lrs:
            for word in self.bins[i]:
                if word not in seen:  # a word can sit in several bins; keep it once.
                    seen.add(word)
                    words.append(word)
        # Now we go through the words, and calculate the score of each word.
        # A score is calculated by adding up the number of letters from lrs that appear in each word.
        # Then we subtract a tenth of a point for every letter that does not add a point.
        scored = []
        for item in words:
            base = random.randint(0, 10)  # give some randomness for the typing thing.
            # base = 0  # to make it deterministic.
            score = base
            itCounts = {}
            for i in lrs:
                itCounts[i] = False
            for letter in item:
                if letter in lrs and (not itCounts[letter]):
                    score += 1
                    itCounts[letter] = True
                else:
                    score -= .1
            # keep the word only if it ended up above its own random starting value.
            if score > base:
                scored.append((item, score))
        scored = sorted(scored, key=operator.itemgetter(1), reverse=True)
        return [item for item, score in scored[:50]]

Using Python to check words

I'm stuck on a simple problem. I've got a dictionary of words in the English language, and a sample text that is to be checked. I've got to check every word in the sample against the dictionary, and the code I'm using is wrong.
for word in checkList: # iterates through every word in the sample
    if word not in refDict: # checks if word is not in the dictionary
        print(word) # just to see if it's recognizing misspelled words
The only problem is, as it goes through the loop it prints out every word, not just the misspelled ones. Can someone explain this and offer a solution possibly? Thank you so much!
The snippet you have is functional. See for example
>>> refDict = {'alpha':1, 'bravo':2, 'charlie':3, 'delta':4}
>>> s = 'he said bravo to charlie O\'Brian and jack Alpha'
>>> for word in s.split():
...     if word not in refDict:
...         print(repr(word))  # by temporarily using repr() we can see exactly
...                            # what the words are like
...
'he'
'said'
'to'
"O'Brian"
'and'
'jack'
'Alpha' # note how Alpha was not found in refDict (u/l case difference)
Therefore, the dictionary contents must differ from what you think, or the words out of checklist are not exactly as they appear (eg. with whitespace or capitalization; see the use of repr() (*) in print statement to help identify cases of the former).
Debugging suggestion: FOCUS on the first word from checklist (or the first that you suspect is to be found in dictionary). Then for this word and this word only, print it in details, with its length, with bracket on either side etc., for both the word out of checklist and the corresponding key in the dictionary...
(*) repr() was a suggestion from John Machin. Instead I often use brackets or other characters as in print('[' + word + ']'), but repr() is more exacting in its output.
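As a small illustration of why repr() helps here, the sketch below uses a made-up word with a trailing newline, which is a common leftover when words are read from a file line by line. The names ref_words and word are only for this example.

```python
ref_words = {"alpha", "bravo"}
word = "bravo\n"  # e.g. a line read from a file without stripping

print(word)                    # looks like a plain "bravo" on screen
print(repr(word))              # 'bravo\n' makes the hidden newline visible
print('[' + word + ']')        # brackets show it too, but less precisely

found_raw = word in ref_words
found_clean = word.strip().lower() in ref_words
print(found_raw, found_clean)  # False True
```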
Consider stripping your words of any whitespace that might be there, and changing all the words of both sets to the same case. Like this:
word.strip().lower()
That way you can make sure you're comparing apples to apples.
Clearly "word not in refDict" always evaluates to True. This is probably because the contents of refDict or checkList are not what you think they are. Are they both tuples or lists of strings?
The code you have would work if the keys in refDict are the correctly spelt words. If the correctly spelt words are the values in your dict then you need something like this:
for word in checkList:
    if word not in refDict.values():
        print(word)
Is there a reason your dictionary is stored as a mapping as opposed to a list or a set? A Python dict contains name-value pairs; for example, I could use this mapping: {"dog":23, "cat":45, "pony":67} to store an index of a word and the page number it is found on in some book. In your case, your dict is a mapping of what to what?
Are the words in the refDict the keys or the values?
Your code will only see keys: e.g.:
refDict = { 'w':'x', 'y':'z' }
for word in [ 'w','x','y','z' ]:
    if word not in refDict:
        print(word)
prints:
x
z
Otherwise you want:
if word not in refDict.values():
Of course this rather assumes that your dictionary is an actual python dictionary which seems an odd way to store a list of words.
Your refDict is probably wrong. The in keyword checks if the value is in the keys of the dictionary. I believe you've put your words in as values.
I'd propose using a set instead of a dictionary.
knownwords = {"dog", "cat"}
knownwords.add("apple")
text = "The dog eats an apple."
for word in text.split(" "):
    # to ignore case word is converted to lowercase
    if word.lower() not in knownwords:
        print(word)
# The
# eats
# an
# apple. <- doesn't work because of the dot
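To get past the trailing-dot problem, one option is to strip punctuation from both ends of each token before the lookup. This is only a sketch: the function name unknown_words is made up, and string.punctuation covers ASCII punctuation only.

```python
import string


def unknown_words(text, known_words):
    """Return the lower-cased words from text that are not in known_words."""
    known = {w.lower() for w in known_words}
    unknown = []
    for token in text.split():
        # Drop leading/trailing punctuation such as "." or ",", then lower-case.
        word = token.strip(string.punctuation).lower()
        if word and word not in known:
            unknown.append(word)
    return unknown


print(unknown_words("The dog eats an apple.", ["dog", "cat", "apple"]))
# ['the', 'eats', 'an']
```

Because strip() only touches the ends of the token, words such as O'Brian keep their inner apostrophe.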
