I'm trying to take a string input, like a sentence, and find all the words that have their reverse words in the sentence. I have this so far:
s = "Although he was stressed when he saw his desserts burnt, he managed to stop the pots from getting ruined"
def semordnilap(s):
s = s.lower()
b = "!##$,"
for char in b:
s = s.replace(char,"")
s = s.split(' ')
dict = {}
index=0
for i in range(0,len(s)):
originalfirst = s[index]
sortedfirst = ''.join(sorted(str(s[index])))
for j in range(index+1,len(s)):
next = ''.join(sorted(str(s[j])))
if sortedfirst == next:
dict.update({originalfirst:s[j]})
index+=1
print (dict)
semordnilap(s)
So this works for the most part, but if you run it, you can see that it's also pairing "he" with "he", since a word trivially matches itself; that's not what I'm looking for. Any suggestions on how to fix that? Also, is it possible to make the runtime faster, in case I input a large text file instead?
You could split the string into a list of words and then compare lowercase versions of all combinations where one of the pair is reversed. The following example uses re.findall() to split the string into a list of words and itertools.combinations() to compare them:
import itertools
import re
s = "Although he was stressed when he saw his desserts burnt, he managed to stop the pots from getting ruined"
words = re.findall(r'\w+', s)
pairs = [(a, b) for a, b in itertools.combinations(words, 2) if a.lower() == b.lower()[::-1]]
print(pairs)
# OUTPUT
# [('was', 'saw'), ('stressed', 'desserts'), ('stop', 'pots')]
EDIT: I still prefer the solution above, but per your comment about doing this without importing any packages, see below. Note, however, that stripping punctuation this way may have unintended consequences depending on the nature of your text (like removing @ from email addresses) - in other words, you may need to deal with punctuation more carefully than this. Also, I would typically import string and use string.punctuation rather than the literal string of punctuation characters passed to str.maketrans(), but avoided that below in keeping with your request to do this without imports.
s = "Although he was stressed when he saw his desserts burnt, he managed to stop the pots from getting ruined"
words = s.translate(str.maketrans('', '', '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')).split()
length = len(words)
pairs = []
for i in range(length - 1):
    for j in range(i + 1, length):
        if words[i].lower() == words[j].lower()[::-1]:
            pairs.append((words[i], words[j]))
print(pairs)
# OUTPUT
# [('was', 'saw'), ('stressed', 'desserts'), ('stop', 'pots')]
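On the runtime question: both snippets above compare every pair of words, which is O(n²) in the number of words. For a large file, a single pass with a set is enough; here is a minimal sketch (the function name is mine), which also avoids the "he"/"he" self-pairing:

import re

def find_semordnilaps(text):
    # Single pass: remember the words seen so far and check each word's reverse.
    words = re.findall(r'\w+', text.lower())
    seen = set()
    pairs = set()
    for word in words:
        reverse = word[::-1]
        # Require reverse != word so palindromes and repeats like "he" are skipped.
        if reverse in seen and reverse != word:
            pairs.add((reverse, word))
        seen.add(word)
    return pairs

s = "Although he was stressed when he saw his desserts burnt, he managed to stop the pots from getting ruined"
print(find_semordnilaps(s))
# e.g. {('was', 'saw'), ('stressed', 'desserts'), ('stop', 'pots')} (set order may vary)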
I am trying to write a CLI for generating Python classes. Part of this requires validating the identifiers provided in user input, and for Python this means making sure identifiers conform to the PEP 8 conventions: classes in CapsCase, fields in all_lowercase_with_underscores, packages and modules in their own style, and so forth.
# It is easy to correct when there is an identifier
# with underscores or whitespace, and when correcting for a class.
def package_correct_convention(item):
    return item.strip().lower().replace(" ", "").replace("_", "")
But when there are no whitespaces or underscores between tokens, I'm not sure how to correctly capitalize the first letter of each word in an identifier. Is it possible to implement something like that without using AI?
say for example:
# providing "ClassA" returns "classa" because there is no delimiter between "class" and "a"
def class_correct_convention(item):
    if item.count(" ") or item.count("_"):
        # Check whether space or underscore was used as the word delimiter.
        if item.count(" ") > item.count("_"):
            item = item.split(" ")
        elif item.count(" ") < item.count("_"):
            item = item.split("_")
        item = list(map(lambda x: x.title(), item))
        return ("".join(item)).replace("_", "").replace(" ", "")
    # If there is no whitespace, the best we can do is capitalize the first letter.
    return item[0].upper() + item[1:]
Well, an AI-based approach would be difficult, imperfect, and a lot of work. It is probably not worth it, since there is a simpler approach that will likely perform comparably.
I understand the worst case scenario to be something like "todelineatewordsinastringlikethat".
I would recommend downloading a text file of English words, one word per line, and proceeding this way:
import re

string = "todelineatewordsinastringlikethat"
#with open("mydic.dat", "r") as msg:
#    lst = msg.read().splitlines()
lst = ['to', 'string', 'in']  # Let's say the dict contains 3 words
lst = sorted(lst, key=len, reverse=True)
replaced = []
for elem in lst:
    if elem in string:  # Very fast
        replaced_str = " ".join(replaced)  # Faster to check elem in a string than elem in a list
        capitalized = elem[0].upper() + elem[1:]  # Prepare your capitalized word
        if elem not in replaced_str:  # Check if elem could be a substring of something you replaced already
            string = re.sub(elem, capitalized, string)
        elif elem in replaced_str:  # If elem is a sub of something you replaced, you'll protect it
            protect_replaced = [item for item in replaced if elem in item]  # Replaced items containing the substring elem
            for protect in protect_replaced:  # Uppercase the whole word to protect, as we do a case-sensitive re.sub()
                string = re.sub(protect, protect.upper(), string)
            string = re.sub(elem, capitalized, string)
            for protect in protect_replaced:  # Deprotect by doing the reverse, full uppercase to capitalized
                string = re.sub(protect.upper(), protect, string)
        replaced.append(capitalized)  # Append the replaced element to the list

print(string)
Output:
TodelIneatewordsInaStringlikethat
#You see that String has been protected but not delIneate, because "delineate" was not in our dict.
This is certainly not optimal, but it will perform comparably to an AI approach for a problem which would not be presented like this to an AI anyway (input preparation matters a great deal in AI).
Note that it is important to reverse-sort the list of words by length, because you want to detect full words first, not substrings: in beforehand you want to match the full word, not before or and.
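A quick illustration of why the length sort matters (a hypothetical three-word list):

words = ['and', 'before', 'beforehand']
print(sorted(words, key=len, reverse=True))
# ['beforehand', 'before', 'and'] -- the full word is tried before its substrings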
I want to count only the words themselves, without the punctuation attached to them.
For example :
There is a text :
Children can bye (paid) by credit card.
I want to count just paid.
But my code counts (paid).
import re, sys
d = {}
m = "children can bye (paid) by credit card."
n = m.split()
for i in n:
    d[i] = 0
for j in n:
    d[j] = d[j] + 1
Is there any advice?
You can split the string with the following regex to split by nonword chars:
import re
n = re.split(r'\W+', m)
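From there the counting is straightforward; a minimal sketch using collections.Counter (my addition, not the original code):

import re
from collections import Counter

m = "children can bye (paid) by credit card."
counts = Counter(w for w in re.split(r'\W+', m) if w)  # drop empty strings at the edges
print(counts['paid'])  # 1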
You just need to remove the punctuation from your individual tokens. Assuming you want to remove all the punctuation, take a look at the string module. Then (for example), you can go through each token and remove the punctuation. You can do this with one list comprehension:
import string

words = [''.join(ch for ch in token if ch not in string.punctuation)
         for token in m.split()]
All this code does is run through each character (ch) in each token (the results of m.split()), keeping every character except those in string.punctuation. Of course, if you want a different set of characters (say you want to allow apostrophes), you can just define that set of characters and use that instead.
I am trying to count the number of times punctuation characters appear in a novel. For example, I want to find the occurrences of question marks and periods along with all the other non-alphanumeric characters. Then I want to insert them into a CSV file. I am not sure how to write the regex because I don't have much experience with Python. Can someone help me out?
import re, csv, string
from collections import Counter

texts = string.punctuation
counts = dict(Counter(w.lower() for w in re.findall(r"\w+", open(cwd + "/" + book).read())))
writer = csv.writer(open("author.csv", 'a'))
writer.writerow([counts.get(fieldname, 0) for fieldname in texts])
In [1]: from string import punctuation
In [2]: from collections import Counter
In [3]: counts = Counter(open('novel.txt').read())
In [4]: punctuation_counts = {k: v for k, v in counts.items() if k in punctuation}
from string import punctuation
from collections import Counter
with open('novel.txt') as f: # closes the file for you which is important!
c = Counter(c for line in f for c in line if c in punctuation)
This also avoids loading the whole novel into memory at once.
Btw this is what string.punctuation looks like:
>>> punctuation
'!"#$%&\'()*+,-./:;<=>?#[\\]^_`{|}~'
You may want to add or detract symbols from here depending on your needs.
Also, Counter defines a __missing__ that simply does return 0. So instead of down-converting it into a dictionary and then calling .get(x, 0), just leave it as a Counter and access it like c[x]; if the key doesn't exist, its count is 0. I'm not sure why everybody has the sudden urge to downgrade their Counters into dicts just because of the scary-looking Counter([...]) you see when you print one, when in fact Counters are dictionaries too and deserve respect.
writer.writerow([counts.get(c, 0) for c in punctuation])
If you leave your counter you can just do this:
writer.writerow([counts[c] for c in punctuation])
which is much easier.
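Putting the pieces together, a minimal end-to-end sketch (the file names are placeholders, not from the question):

import csv
from string import punctuation
from collections import Counter

with open('novel.txt') as f:
    counts = Counter(c for line in f for c in line if c in punctuation)

with open('author.csv', 'a', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(list(punctuation))                  # header: one column per symbol
    writer.writerow([counts[c] for c in punctuation])   # a missing symbol counts as 0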
import re

def count_puncts(x):
    # Substitute every non-word character (including whitespace) with ''
    # and return the number of replacements re.subn() reports.
    new_str, count = re.subn(r'\W', '', x)
    return count
The code you have is very close to what you'd need if you were counting words. If you were trying to count words, the only modification you'd have to make would probably be to change the last line to this:
writer.writerows(counts.items())
Unfortunately, you're not trying to count words here. If you're looking for counts of single characters, I'd avoid using regular expressions and go straight to count. Your code might look like this:
book_text = open(cwd + "/" + book).read()
counts = {}
for character in texts:
    counts[character] = book_text.count(character)
writer.writerows(counts.items())
As you might be able to tell, this makes a dictionary with the characters as keys and the number of times that character appears in the text as the value. Then we write it as we would have done for counting words.
Using curses:
import curses.ascii

str1 = "real, and? or, and? what."
t = (c for c in str1 if curses.ascii.ispunct(c))
d = dict()
for p in t:
    d[p] = 1 if p not in d else d[p] + 1
I'm trying to write a function process(s,d) to replace abbreviations in a string with their full meaning by using a dictionary. where s is the string input and d is the dictionary. For example:
>>>d = {'ASAP':'as soon as possible'}
>>>s = "I will do this ASAP. Regards, X"
>>>process(s,d)
>>>"I will do this as soon as possible. Regards, X"
I have tried using the split function to separate the string and compare each part with the dictionary.
def process(s, d):
    return ''.join(d[ch] if ch in d else ch for ch in s)
However, it returns the exact same string. I suspect the code doesn't work because of the full stop behind ASAP in the original string. If so, how do I ignore the punctuation and get ASAP replaced?
Here is a way to do it with a single regex:
In [24]: d = {'ASAP':'as soon as possible', 'AFAIK': 'as far as I know'}
In [25]: s = 'I will do this ASAP, AFAIK. Regards, X'
In [26]: re.sub(r'\b(?:' + '|'.join(d.keys()) + r')\b', lambda m: d[m.group(0)], s)
Out[26]: 'I will do this as soon as possible, as far as I know. Regards, X'
Unlike versions based on str.replace(), this observes word boundaries and therefore won't replace abbreviations that happen to appear in the middle of other words (e.g. "etc" in "fetch").
Also, unlike most (all?) other solutions presented thus far, it iterates over the input string just once, regardless of how many search terms there are in the dictionary.
You can do something like this:
def process(s, d):
    for key in d:
        s = s.replace(key, d[key])
    return s
Here is a working solution: use re.split(), and split by word boundaries (preserving the interstitial characters):
''.join(d.get(word, word) for word in re.split(r'(\W+)', s))
One significant difference that this code has from Vaughn's or Sheena's answer is that this code takes advantage of the O(1) lookup time of the dictionary, while their solutions look at every key in the dictionary. This means that when s is short and d is very large, their code will take significantly longer to run. Furthermore, parts of words will still be replaced in their solutions: if d = { "lol": "laugh out loud" } and s="lollipop" their solutions will incorrectly produce "laugh out loudlipop".
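To make that concrete, here is the one-liner in context (the extra 'lol' entry is mine, purely to show the lollipop case):

import re

d = {'ASAP': 'as soon as possible', 'lol': 'laugh out loud'}
s = "I will do this ASAP. Hands off my lollipop, lol!"

print(''.join(d.get(word, word) for word in re.split(r'(\W+)', s)))
# I will do this as soon as possible. Hands off my lollipop, laugh out loud!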
use regular expressions:
re.sub(pattern,replacement,s)
In your application:
ret = s
for key in d:
    ret = re.sub(r'\b' + key + r'\b', d[key], ret)
return ret
\b matches the beginning or end of a word. Thanks, Paul, for the comment.
Instead of splitting by spaces, use re.split():
re.split(r"\W", s)
It will split on anything that's not a character that would be part of a word.
Python 3.2:
for i, v in d.items():
    s = s.replace(i, v)
(Note: a list comprehension like [s.replace(i, v) for i, v in d.items()] does not work here; it builds a list of separate strings, each with only one abbreviation replaced, rather than applying all replacements to s.)
This is string replacement as well (+1 to @VaughnCato). It uses the reduce function to iterate through your dictionary, replacing any instances of the keys in the string with the values. s in this case is the accumulator, which is reduced (i.e. fed to the replace function) on every iteration, maintaining all past replacements (also, per @PaulMcGuire's point above, this replaces keys starting with the longest and ending with the shortest).
In [1]: from functools import reduce  # built in to Python 2; needed in Python 3
In [2]: d = {'ASAP':'as soon as possible', 'AFAIK': 'as far as I know'}
In [3]: s = 'I will do this ASAP, AFAIK. Regards, X'
In [4]: reduce(lambda x, y: x.replace(y, d[y]), sorted(d, key=lambda i: len(i), reverse=True), s)
Out[4]: 'I will do this as soon as possible, as far as I know. Regards, X'
As for why your function didn't return what you expected: when you iterate through s, you are actually iterating through the characters of the string, not the words. Your version could be tweaked by iterating over s.split() (which gives a list of the words), but you then run into the issue that the punctuation causes words not to match your dictionary. You can get them to match by importing string and stripping string.punctuation from each word, but that removes the punctuation from the final string (so regex is likely the best option if exact replacement matters); see the sketch below.
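For completeness, a minimal sketch of that tweaked version (with the punctuation-dropping caveat just described):

import string

def process(s, d):
    # Strip punctuation from each token before the lookup; note that a matched
    # token loses its trailing punctuation ("ASAP." becomes the expansion alone).
    return ' '.join(d.get(word.strip(string.punctuation), word) for word in s.split())

print(process("I will do this ASAP. Regards, X", {'ASAP': 'as soon as possible'}))
# I will do this as soon as possible Regards, X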
So I've been learning Python for some months now and was wondering how I would go about writing a function that will count the number of times a word occurs in a sentence. I would appreciate if someone could please give me a step-by-step method for doing this.
Quick answer:
def count_occurrences(word, sentence):
return sentence.lower().split().count(word)
'some string'.split() will split the string on whitespace (spaces, tabs and linefeeds) into a list of word-ish things. Then ['some', 'string'].count(item) returns the number of times item occurs in the list.
That doesn't handle removing punctuation. You could do that using str.maketrans and str.translate.
# Build the collection of chars to keep, and delete everything else
import string

keep = string.ascii_lowercase + string.digits + string.whitespace
delete = ''.join(set(string.printable) - set(keep))
table = str.maketrans('', '', delete)

def count_occurrences(word, sentence):
    return sentence.lower().translate(table).split().count(word)
The key here is that we've constructed the string delete so that it contains all the ASCII characters except letters, numbers and whitespace; str.maketrans('', '', delete) then builds a translation table that leaves the string unchanged apart from stripping those characters out.
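For instance, a quick check of the function above (my example):

print(count_occurrences('sentence', 'This sentence is a simple sentence.'))  # 2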
wilberforce has the quick, correct answer, and I'll give the long-winded 'how to get to that conclusion' answer.
First, here are some tools to get you started, and some questions you need to ask yourself.
You need to read the section on Sequence Types, in the python docs, because it is your best friend for solving this problem. Seriously, read it. Once you have read that, you should have some ideas. For example you can take a long string and break it up using the split() function. To be explicit:
mystring = "This sentence is a simple sentence."
result = mystring.split()
print result
print "The total number of words is: " + str(len(result))
print "The word 'sentence' occurs: " + str(result.count("sentence"))
Takes the input string and splits it on any whitespace, and will give you:
["This", "sentence", "is", "a", "simple", "sentence."]
The total number of words is 6
The word 'sentence' occurs: 1
Now note here that you do have the period still at the end of the second 'sentence'. This is a problem because 'sentence' is not the same as 'sentence.'. If you are going to go over your list and count words, you need to make sure that the strings are identical. You may need to find and remove some punctuation.
A naive approach to this might be:
no_period_string = mystring.replace(".", " ")
print(no_period_string)
To get me a period-less sentence:
"This sentence is a simple sentence"
You also need to decide if your input is going to be just a single sentence, or maybe a paragraph of text. If you have many sentences in your input, you might want to find a way to break them up into individual sentences, by finding the periods (or question marks, or exclamation marks, or other punctuation that ends a sentence). Once you find out where in the string the 'sentence terminator' is, you could split up the string at that point, as sketched below.
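For instance, a rough sketch of that kind of sentence splitting, using re (my example, not part of the exercise):

import re

text = "First sentence. Second one! A third? Yes."
# Split on runs of sentence-ending punctuation and drop the empty pieces.
sentences = [part.strip() for part in re.split(r'[.!?]+', text) if part.strip()]
print(sentences)
# ['First sentence', 'Second one', 'A third', 'Yes']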
You should give this a try yourself - hopefully I've peppered in enough hints to get you to look at some specific functions in the documentation.
Simplest way:
def count_occurrences(word, sentence):
return sentence.count(word)
text = input("Enter your sentence: ")
print("'the' appears", text.count("the"), "times")
This is the simplest way to do it.
The problem with using the count() method is that it does not always give the correct number of occurrences when matches overlap. For example:
print('banana'.count('ana'))
output:
1
but 'ana' occurs twice in 'banana'.
To solve this issue, I used:
def total_occurrence(string, word):
    count = 0
    temp_string = string
    while word in temp_string:
        count += 1
        temp_string = temp_string[temp_string.index(word) + 1:]
    return count
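An equivalent approach (my variant, not the poster's) uses str.find with a moving start index instead of re-slicing the string:

def total_occurrence(string, word):
    # Count overlapping occurrences by resuming the search one past each match.
    count = 0
    start = string.find(word)
    while start != -1:
        count += 1
        start = string.find(word, start + 1)
    return count

print(total_occurrence('banana', 'ana'))  # 2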
You can do it like this:
def countWord(text):
    # Slide a window of len('word') characters across the text.
    numWord = 0
    for i in range(1, len(text) - 1):
        if text[i - 1:i + 3] == 'word':
            numWord += 1
    print('Number of times "word" occurs is:', numWord)
then calling the function:
countWord('wordetcetcetcetcetcetcetcword')
will print: Number of times "word" occurs is: 2
def check_Search_WordCount(mySearchStr, mySentence):
    # Delete every occurrence of the search string, then divide the number
    # of removed characters by the length of the search string.
    len_mySentence = len(mySentence)
    len_Sentence_without_Find_Word = len(mySentence.replace(mySearchStr, ""))
    len_Remaining_Sentence = len_mySentence - len_Sentence_without_Find_Word
    count = len_Remaining_Sentence / len(mySearchStr)
    return int(count)
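A quick check of the arithmetic (my example); note that str.replace() removes non-overlapping matches only, so overlapping occurrences are undercounted:

print(check_Search_WordCount("sentence", "This sentence is a simple sentence."))
# (35 - 19) / 8 = 2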
I assume that you only know about Python strings and for loops.
def count_occurences(s, word):
    count = 0
    for i in range(len(s)):
        if s[i:i + len(word)] == word:
            count += 1
    return count
mystring = "This sentence is a simple sentence."
myword = "sentence"
print(count_occurences(mystring,myword))
Explanation:
s[i:i+len(word)]: slices the string s to extract a candidate substring of the same length as the word argument.
count += 1: increases the counter whenever a match is found.