How to understand the flaw in my simple three part python code? - python

My Python exercise in 'classes' is as follows:
You have been recruited by your friend, a linguistics enthusiast, to create a utility tool that can perform analysis on a given piece of text. Complete the class "analyzedText" with the following methods:
Constructor (__init__) - This method should take the argument text, make it lowercase, and remove all punctuation. Assume only the following punctuation is used: period (.), exclamation mark (!), comma (,), and question mark (?). Assign this newly formatted text to a new attribute called fmtText.
freqAll - This method should create and return a dictionary of all unique words in the text along with the number of times they occur in the text. Each key in the dictionary should be a unique word appearing in the text, and the associated value should be the number of times it occurs in the text. Create this dictionary from the fmtText attribute.
This was my code:
class analysedText(object)
    def __init__ (self, text):
        formattedText = text.replace('.',' ').replace(',',' ').replace('!',' ').replace('?',' ')
        formattedText = formattedText.lower()
        self.fmtText = formattedText
    def freqAll(self):
        wordList = self.fmtText.split(' ')
        wordDict = {}
        for word in set(wordList):
            wordDict[word] = wordList(word)
        return wordDict
I get errors on both of these and I can't seem to figure it out after a lot of little adjustments. I suspect the issue in the first part is when I try to assign a value to the newly formatted text but I cannot think of a workable solution. As for the second part, I am at a complete loss - I was wrongfully confident my answer was correct but I received a fail error when I ran it through the classroom's code cell to test it.

Assuming that by 'errors' you mean a TypeError, it is caused by the line wordDict[word] = wordList(word).
wordList is a list, and by putting parentheses after it you're telling Python to call that list as a function, which it cannot do.
According to your task, you are instead meant to find the number of occurrences of each word in the list, which you can do with the .count() method. This method returns the total number of occurrences of an element in a list. (See the Python documentation for list.count for details.)
With this modification, (this is assuming you want wordDict to contain a dictionary with the word as the key, and the occurrence as the value) your freqAll function would look something like this:
def freqAll(self):
    wordList = self.fmtText.split()
    wordDict = {}
    for word in set(wordList):
        wordDict[word] = wordList.count(word) # wordList.count(word) returns the number of times the string word appears as an element in wordList
    return wordDict
You could also achieve the same result with the collections.Counter class (which means you have to import collections); see the Python documentation for collections.Counter.
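As a rough sketch of the Counter approach, the whole class might look like the following. It assumes the exercise really wants punctuation removed rather than replaced by spaces, and it reuses the asker's class name; neither choice is confirmed by the grader. Calling split() with no argument also avoids the empty-string "words" that split(' ') produces when two spaces are adjacent.

```python
from collections import Counter


class analysedText(object):
    def __init__(self, text):
        # Lowercase first, then drop the four punctuation marks named in the exercise.
        formatted = text.lower()
        for mark in ".!,?":
            formatted = formatted.replace(mark, "")
        self.fmtText = formatted

    def freqAll(self):
        # split() with no argument splits on any run of whitespace,
        # so no empty strings end up in the word list.
        return dict(Counter(self.fmtText.split()))


sample = analysedText("Hello, hello! How are you? Hello.")
print(sample.fmtText)
print(sample.freqAll())
```

Counter does the counting in a single pass, whereas calling list.count() inside a loop rescans the list once per unique word.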

Related

Using difflib.get_close_matches to replace word in string - Python

difflib.get_close_matches can return a single close match when I supply the sample string and the list of candidate words. How can I use that close match to replace the token it matched in the string?
# difflibQuestion.py
import difflib
word = ['Summerdalerise', 'Winterstreamrise']
line = 'I went up to Winterstreamrose.'
result = difflib.get_close_matches(line,word,n=1)
print(result)
Output:
['Winterstreamrise']
I want to produce the line:
I went up to Winterstreamrise.
For many lines and words.
I have checked the docs
I can't find any reference to the string index of the match found by difflib.get_close_matches
the other module classes & functions return lists
I Googled "python replace word in line using difflib" etc. I can't find any reference to anyone else asking/writing about it. It would seem a common scenario to me.
This example is of course a simplified version of my 'real world' scenario, which may be of help, since I am dealing with table data (rather than lines):
Surname, First names, Street Address, Town, Job Description
My 'words' are a large list of street base names, e.g. MAIN, EVERY, EASY, LOVERS (without the Road, Street, Lane), so difflib.get_close_matches could be used to substitute the string in column x (the 'line') with the closest matching 'word'.
However I would appreciate anyone suggesting an approach to either of these examples.
You could try something like this:
import difflib
possibilities = ['Summerdalerise', 'Winterstreamrise']
line = 'I went up to Winterstreamrose.'
newWords = []
for word in line.split():
    result = difflib.get_close_matches(word, possibilities, n=1)
    newWords.append(result[0] if result else word)
result = ' '.join(newWords)
print(result)
Output:
I went up to Winterstreamrise
Explanation:
The docs show a first argument named word, and there is no suggestion that get_close_matches() has any awareness of sub-words within this argument; rather, it reports on the closeness of a match between this word atomically and the list of possibilities supplied as the second argument.
We can add the awareness of words within line by splitting it into a list of such words which we iterate over, calling get_close_matches() for each word separately and modifying the word in our result only if there is a match.
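Note that the output above loses the trailing period, because the token 'Winterstreamrose.' is replaced as a whole. One possible variation, sketched here under the assumption that a "word" is any run of \w characters, is to let re.sub find the tokens so that punctuation and spacing are left untouched. The function name replace_close_matches and its cutoff parameter are illustrative, not part of difflib.

```python
import difflib
import re


def replace_close_matches(line, possibilities, cutoff=0.6):
    """Replace each word in line by its closest match, if one exists."""
    def fix(match):
        word = match.group(0)
        close = difflib.get_close_matches(word, possibilities, n=1, cutoff=cutoff)
        # Keep the original word when nothing is close enough.
        return close[0] if close else word

    # Only runs of word characters are passed to fix();
    # punctuation and whitespace are copied through unchanged.
    return re.sub(r"\w+", fix, line)


possibilities = ['Summerdalerise', 'Winterstreamrise']
fixed = replace_close_matches('I went up to Winterstreamrose.', possibilities)
print(fixed)
```

For the table scenario, the same helper could be applied to just the street-address column of each row.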

Can't get an acronym generating function to work for the last input word on python

I'm trying to write a function that takes a string of words as input and prints the first letter of each word, upper-cased, as an acronym. The closest I have got is this function, which works only for the first two words. How can I get it to work for every word, no matter how many space-separated words the input string contains?
Here's the code I am running:
def fxn(stng):
    out=stng[0]
    for i in range(1, len(stng)):
        if stng[i-1]==' ':
            out+=stng[i]
            out=out.upper()
            return out
input1=input()
print(fxn(input1))
This is an example of an input and the output I'm currently getting. I would expect it to be SOS.
save our souls
SO
The problem with your code is that it doesn't go through your whole string. As soon as it arrives on a new word (it goes into your if statement), it adds the initial of this word and immediately exits the function.
So, you have to go through your whole string (ie your for loop must end) then you return the result.
def fxn(stng):
    out=stng[0]
    for i in range(1, len(stng)):
        if stng[i-1]==' ':
            out+=stng[i]
    out=out.upper()
    # return outside of for loop
    return out
input1=input()
print(fxn(input1))
There is a much easier way to go about this, which is to split your string into words on the space delimiter and then take the first letter of each:
first_letters = [w[0].upper() for w in input_str.split(' ')]
output = "".join(first_letters)
All this is doing is using the split function to split up your string into words, and then for each word, in a list comprehension, it is taking the first letter (w[0]), upper-casing it and saving it to a list. Then we can use join to concatenate them together.
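One caveat with split(' ') is that consecutive or leading spaces produce empty strings, and w[0] then raises an IndexError. A small sketch that sidesteps this by calling split() with no argument (the function name acronym is just an illustrative choice):

```python
def acronym(phrase):
    # split() with no argument splits on any run of whitespace and
    # never yields empty strings, so word[0] is always safe.
    return "".join(word[0].upper() for word in phrase.split())


print(acronym("save our souls"))
```

With this version an empty or all-whitespace input simply gives an empty string instead of an error.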

function for word frequency + dictionary

I am trying to write a function that takes a string and returns a dictionary mapping each word to the number of times it is used. When an optional list of words is provided, the function should instead look only for those words and return how often each of them occurs in the string.
Example,
stringfunc = "I went to school today, to learn!"
print(wordfunc(stringfunc))
should return
{'i':1 , 'went':1, 'to':2, 'school':1, 'today':1, 'learn':1}
And,
stringfunc = "I went to school today, to learn!"
print(wordfunc(stringfunc,wordlist=["I", "feel", "Great"]))
should return
{'i':1, 'feel':0, 'great':0}
This is what I have so far
def wordfunc(stringfunc,wordlist=[]):
    count_dict = dict()
    stringfunc=stringfunc.lower() # i want it to be case insensitive
    word = stringfunc.split()
    for i in range(len(word)):
        x = ord(word[i][-1]) # in the next few lines I am trying to get rid of special characters
        if (not(x>=97 and x<=112) or (x>=65 and x<= 90)):
            word[i]=word[i][:-1] # if a word ends with , or ! i want it to discount last character
    for i in wordlist:
        if (i not in word):
            count_dict[i]=0
        else:
            count_dict[i]=word.count(i)
    return count_dict
When I try
stringfunc = "I went to school today, to learn!"
print(wordfunc(stringfunc,wordlist=["I", "feel", "Great"]))
I get
{'I':1, 'feel':0, 'Great':0} # i can't get a lower case i don't know why
and when I try
stringfunc = "I went to school today, to learn!"
print(wordfunc(stringfunc))
I get an empty dictionary {}
Can you help me identify my error? Thanks!
You "can't get lower case" because you didn't program it. If the input supplies wordlist, then you blithely accept whatever is there. In the given case, you have two words capitalized, so that's what comes out. Instead, you need to convert every element of wordlist to lower case, just as you did with the input string.
BTW, do not give misleading names to variables: stringfunc is not a function.
The main loop will be much easier to read if you quit playing games with ASCII code values. Instead, simply use str.isalpha. If this is new to you, then I strongly recommend that you repeat your tutorial on string processing; you missed some useful things that you will now recognize.
That said, also look up the collections module, notably the Counter type. Once you've cleaned out all but letters and spaces in your input string, you can do the main processing with
from collections import Counter
count_dict = Counter(stringfunc.split())
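Putting these suggestions together, a minimal sketch might look like this. The name word_frequencies and the choice to treat a "word" as a run of ASCII letters are assumptions made for illustration; an empty or missing wordlist is taken to mean "count every word", which matches the behaviour the question asks for.

```python
import re
from collections import Counter


def word_frequencies(text, wordlist=None):
    # Lowercase everything and keep only runs of letters, which drops
    # punctuation such as ',' and '!' wherever it appears.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    if not wordlist:
        return dict(counts)
    # Lowercase the requested words too; a Counter returns 0 for
    # words it has never seen.
    return {w.lower(): counts[w.lower()] for w in wordlist}


sentence = "I went to school today, to learn!"
print(word_frequencies(sentence))
print(word_frequencies(sentence, wordlist=["I", "feel", "Great"]))
```

Using None as the default also avoids the mutable default argument wordlist=[] from the original signature.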

Replacing a list of words with a certain word in python [duplicate]

Say I have a paragraph and I want to find certain words in it and replace each of them with one particular word.
And I'm trying to do this using a for loop, after defining my word list.
Here's my code
script = """ In this sense, netting can represent , which gives Howie return on Zachary."""
ROE = ["In", "this"] #the word list I'm defining (the list of words I want it replaced)
for ROE in script:
    script.replace(ROE, "ROE")
#desired output = ROE ROE sense, netting can represent , which gives Howie return on Zachary.
It doesn't really work, can someone help me fix it?
You have several problems:
You're not looping over the list of words to replace, you're looping over the characters in script.
You're not assigning the result of replace anywhere. It's not an in-place operation, since strings are immutable.
You're reassigning the ROE variable.
for word in ROE:
    script = script.replace(word, 'ROE')
Note that replace() doesn't know anything about word boundaries. Your code will convert Inside to ROEside. If you want better, you can use regular expressions and wrap the words in \b boundaries. A regular expression would also allow you to perform all the replacements at once.
import re
regex = re.compile(r'\b(?:' + '|'.join(re.escape(word) for word in ROE) + r')\b')
script = regex.sub('ROE', script)
This creates a regular expression \b(?:In|this)\b, which matches either word.
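To make the word-boundary difference concrete, here is a small side-by-side sketch. The helper names and the sample sentence are made up for illustration.

```python
import re


def replace_naive(text, words, replacement):
    # str.replace matches substrings anywhere, including inside longer words.
    for word in words:
        text = text.replace(word, replacement)
    return text


def replace_whole_words(text, words, replacement):
    # \b anchors each alternative at word boundaries; re.escape guards
    # against words that contain regex metacharacters.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(replacement, text)


sample = "In this sense, Inside thistle stays."
print(replace_naive(sample, ["In", "this"], "ROE"))
print(replace_whole_words(sample, ["In", "this"], "ROE"))
```

The first call also rewrites "Inside" and "thistle", while the second leaves them alone.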
The string str data type in Python is immutable. This means that if you want to change a string, you basically have to create a new string that has the changes and then you can assign the result to a variable.
Of course, you can assign the result to the same variable the original string was assigned to, which may have had the last reference to the old string, causing it to get cleaned up. But for a brief moment, there will always be a new copy of the string.
For example:
s = 'Hello'
s += ' world!'
print(s)
This seems to add ' world!' onto the existing s with 'Hello', but it really just creates a new string 'Hello world!' and assigns that to s, replacing the old one.
In your case, this explains why you can't just call .replace() on a string and expect it to change. Instead, that method returns the new string you want and you can assign it to a variable:
script = """ In this sense, netting can represent , which gives Howie return on Zachary."""
roe = ["In", "this"]
for word_to_replace in roe:
    script = script.replace(word_to_replace, 'ROE')
(note that there were some other issues as well, but the above should work)
I found a solution that is relatively easy:
stopwords=['In','this','to']
# a is assumed to already hold the text to process
for i in stopwords:
    n=a.replace(i,'ROE')
    a=n
and I was helped by this link: Removing list of words from a string
script = """ In this sense, netting can represent , which gives Howie return on Zachary."""
ROE = ["In", "this"] #the word list I'm defining (the list of words I want it replaced)
for word in ROE:
    script = script.replace(word, "ROE")
print(script)
Output:
ROE ROE sense, netting can represent , which gives Howie return on Zachary.
i.e. identical with your desired one.

Remove strings containing words from list, without duplicate strings

I'm trying to get my code to extract sentences from a file that contain certain words. I have the code seen here below:
import re
f = open('RedCircle.txt', 'r')
text = ' '.join(f.readlines())
sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text)
def finding(q):
    for item in sentences:
        if item.lower().find(q.lower()) != -1:
            list.append(item)
    for sentence in list:
        outfile.write(sentence+'\r\n')
finding('cats')
finding('apples')
finding('doggs')
But this will of course write the same sentence to the outfile three times if the sentence is:
'I saw doggs and cats eating apples'
Is there a way to easily remove these duplicates, or make the code so that there will not be any duplicates in the file?
There are a few options in Python that you can use to remove duplicate elements (in this case, I believe, sentences):
Using Set.
Using itertools.groupby
OrderedDict as an OrderedSet, if Order is important
All you need to do is collect the results in a single list and then apply one of these techniques to build your own recipe for removing duplicates.
Also, instead of dumping the results to the file after each search, defer writing until all duplicates have been removed.
A Few Suggested Changes
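As one hedged illustration of the "collect first, deduplicate afterwards" idea, a plain dict can stand in for an ordered set, since dicts preserve insertion order in Python 3.7 and later. The helper name unique_in_order and the sample list are hypothetical.

```python
def unique_in_order(items):
    # dict.fromkeys keeps the first occurrence of each item and
    # preserves insertion order (guaranteed from Python 3.7).
    return list(dict.fromkeys(items))


collected = [
    'I saw doggs and cats eating apples',   # found by the 'cats' search
    'Cats sleep a lot',
    'I saw doggs and cats eating apples',   # found again by 'apples'
    'I saw doggs and cats eating apples',   # and again by 'doggs'
]
print(unique_in_order(collected))
```

The deduplicated list can then be written to the output file in a single pass.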
Using Sets
Convert Your function to a Generator
def finding(q):
    return (item for item in sentences
            if item.lower().find(q.lower()) != -1)
Chain the result of each search
from itertools import chain
chain.from_iterable(finding(key) for key in ['cats', 'apples', 'doggs'])
Pass the result to a Set
set(chain.from_iterable(finding(key) for key in ['cats', 'apples', 'doggs']))
Using Decorators
def uniq(fn):
    uniq_elems = set()
    def handler(*args, **kwargs):
        uniq_elems.update(fn(*args, **kwargs))
        return uniq_elems
    return handler
@uniq
def finding(q):
    return (item for item in sentences
            if item.lower().find(q.lower()) != -1)
If Order is Important
Change the Decorator to use OrderedDict
from collections import OrderedDict
def uniq(fn):
    uniq_elems = OrderedDict()
    def handler(*args, **kwargs):
        uniq_elems.update(uniq_elems.fromkeys(fn(*args, **kwargs)))
        return uniq_elems.keys()
    return handler
Note
Avoid variable names that shadow Python built-ins (such as naming a variable list).
Firstly, does the order matter?
Second, should duplicates appear if they're actually duplicated in the original text file?
If no to the first and yes to the second:
If you rewrite the function to take a list of search strings and iterate over that (such that it checks the current sentence for each of the words you're after), then you could break out of the loop once you find it.
If yes to the first and yes to the second,
Before adding an item to the list, check whether it's already there. Specifically, keep a note of which list items you've passed in the original text file and which is going to be the next one you'll see. That way you don't have to check the whole list, but only a single item.
A set, as Abhijit suggests, would work if you answer no to both questions, since it discards order and also collapses sentences that are genuinely repeated in the original text.
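The "take a list of search strings" idea can be sketched as follows. Each sentence is examined once and kept if it contains any keyword, so original order is preserved and a sentence is repeated in the output only if it is repeated in the text. The function name sentences_containing and the sample text are illustrative; the splitting regex is the one from the question.

```python
import re


def sentences_containing(text, keywords):
    sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text)
    lowered = [k.lower() for k in keywords]
    # any() stops at the first keyword that matches, which plays the
    # role of breaking out of the inner loop.
    return [s for s in sentences
            if s and any(k in s.lower() for k in lowered)]


sample_text = "I saw doggs and cats eating apples. The sky is blue. Cats sleep a lot!"
found = sentences_containing(sample_text, ['cats', 'apples', 'doggs'])
print(found)
```

Because every sentence is visited only once, no separate deduplication step is needed for this case.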
