I am iterating through a .txt file and trying to find the palindromic phrases in it, but when I run this it only prints an empty list.
file = open("dictionary.txt", "r")# Load digital dictionary as a list of words
def find_palingram():
    palingram_list = [] # Start an empty list to hold palingrams
    for word in file: # For word in list
        word = word.split()
        end = len(word) # Get length of word
        rev_word = word[::-1]
        if(end > 1):#If Length > 1
            for i in range(end): # Loop through the letters in the word
                """If reversed word fragment at front of word is in word list and letters after form a
                palindromic sequence"""
                if(word[i:] == rev_word[:end-i] and rev_word[end-i:] in file):
                    palingram_list.append(word, rev_word[end-i:])#Append word and reversed word to palingram list
                """If reversed word fragment at end of word is in word list and letters
                before form a palindromic sequence"""
                if(word[:i] == rev_word[end-i:] and rev_word[:end-i] in file):
                    palingram_list.append(rev_word[:end-i], word) # Append reversed word and word to palingram list
    return palingram_list
    file.close()
# Sort palingram list alphabetically
palingram = find_palingram()
palingram_sorted = sorted(palingram)
print(palingram_sorted)
print(file.read())
Checking whether a word is a palindrome is really easy:
word[::-1] == word
or, if your definition of a palindrome should also include words like "Eve" (ignoring case):
word_lower = word.lower()
word_lower[::-1] == word_lower
So your program could be reduced to:
def find_palindroms(text):
    palindrom_list = []
    for line in text:
        for word in line.rstrip().split():
            word_lower = word.lower() # might be unnecessary
            if word_lower[::-1] == word_lower:
                palindrom_list.append(word)
    return palindrom_list

with open("dictionary.txt", "r") as file:
    print(find_palindroms(file))
You should pass the file into the function as an argument. Also, file.close() comes after the return statement inside the function, so it will never execute.
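The question's code was actually after palingrams (two-word phrases such as "nurses run" that read the same backwards), not single-word palindromes. A minimal sketch of that search, assuming dictionary.txt holds one word per line: the words are loaded into a set first, so the `in` checks test whole-word membership instead of consuming the open file mid-loop, and each pair is appended as one tuple, since list.append() takes a single argument. The helper names load_words and find_palingrams are hypothetical.

```python
def load_words(path):
    """Read one word per line, dropping blank lines and surrounding whitespace."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}


def find_palingrams(words):
    """Return sorted (first, second) pairs where first + second is a palindrome."""
    word_set = set(words)
    pairs = []
    for word in word_set:
        end = len(word)
        rev_word = word[::-1]
        if end > 1:
            for i in range(end):
                # Tail word[i:] is a palindrome and the reversed head is a word
                if word[i:] == rev_word[:end - i] and rev_word[end - i:] in word_set:
                    pairs.append((word, rev_word[end - i:]))
                # Head word[:i] is a palindrome and the reversed tail is a word
                if word[:i] == rev_word[end - i:] and rev_word[:end - i] in word_set:
                    pairs.append((rev_word[:end - i], word))
    return sorted(pairs)
```

With a real dictionary you would call find_palingrams(load_words("dictionary.txt")); for the words nurses, run, stop and pots it finds ("nurses", "run"), ("pots", "stop") and ("stop", "pots").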
Related
Here is my code (a palindrome checker which also checks whether the input is a word by comparing it against a text file). I am trying to make the code say it's a word only if it's actually a word, because right now fragments of words like 'pr' and 'la' are counted as words.
a = 0
with open('wordlist.txt') as file:
    contents = file.read()
print('Hi. This program will check if the word that you entered is a palindrome.')
while a==0:
    wordlist = open('wordlist.txt' , 'r+')
    index = 0
    letters = []
    lettersreversed = []
    word = input('Enter a word: ')
    wordwithspace = word + '\n'
    for i in range(len(word)):
        letters.append(word[index])
        index = index + 1
    index1 = len(word) -1
    for i in range(len(word)):
        lettersreversed.append(word[index1])
        index1 = index1 - 1
    if letters == lettersreversed and wordwithspace in contents:
        print(word, 'is a palindrome!','\n')
    elif letters == lettersreversed and wordwithspace not in contents:
        print(word, 'is a palindrome however it is not a word.','\n')
    elif letters != lettersreversed and wordwithspace in contents:
        print(word, 'is not a palindrome however it is a word.','\n')
    else:
        print(word, 'is not a palindrome.', '\n')
Your contents = file.read() is one big string, so checking whether a string is in contents is true for any sequence of characters anywhere in it, even if they do not form a whole word.
Instead, split it into words:
contents = file.read().split()
Now contents is a list of words, so word in contents is only true when word is a whole word in that list.
Even better,
contents = set(file.read().split())
Now it is a set of words, and checking membership in a set is much faster than scanning a list.
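To see the difference, here is a small sketch with a hypothetical three-word sample standing in for wordlist.txt: a substring test on the raw text accepts fragments like 'pr' and 'la' (and even 'ast\n', which is why appending a newline does not help), while a lookup in the set only accepts whole words. The names substring_hit and word_hit are made up for illustration.

```python
# Hypothetical sample standing in for the contents of wordlist.txt
text = "prune\nlake\ntoast\n"

words_list = text.split()      # ['prune', 'lake', 'toast']
words_set = set(text.split())  # same words, hashed for fast lookup


def substring_hit(fragment):
    # Substring test on the whole file: also matches pieces of words
    return fragment in text


def word_hit(fragment):
    # Membership test on the set: matches whole words only
    return fragment in words_set
```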
I've refactored some of your code. Your issue was checking whether the word is in contents, which is one huge string. Adding the \n only makes sure the match ends at a line break; it doesn't make sure the match starts at the beginning of a word (e.g. 'ast\n' is found inside 'toast\n').
First, load your wordlist as a set:
with open('wordlist.txt') as file:
    # Strip whitespace from every line and put the words into a set for fast lookup.
    wordlist = frozenset(map(str.strip, file))
Then ask for a word:
word = input('Enter a word: ')
Lastly, check whether it's a palindrome and a real word:
if word == word[::-1] and word in wordlist:
    print("Palindrome and a real word!")
All together:
with open('wordlist.txt') as file:
    # Strip whitespace from every line and put the words into a set for fast lookup.
    wordlist = frozenset(map(str.strip, file))
while True:
    word = input('Enter a word: ')
    if word == word[::-1]:
        if word in wordlist:
            print("Palindrome and a real word!")
        else:
            print("Palindrome and a fake word!")
    else:
        if word in wordlist:
            print("Not a palindrome but a real word!")
        else:
            print("Not a palindrome and not a word!")
def is_palendromic(string: str) -> bool:
    """
    | The first `if` statement checks if the
    | length of the string is even; if it is,
    | it splits the string in half.
    | If the length is odd, it still splits
    | the string but ignores the middle letter,
    | as it is irrelevant.
    """
    half = len(string) // 2
    if len(string) % 2 == 0:
        firstpart, secondpart = string[:half], string[half:]
    else:
        # will ignore the middle letter if the string is odd
        firstpart, secondpart = string[:half], string[half + 1:]
    # [::-1] reverses the first part, then == compares it to the
    # second part of the string, returning a boolean
    return firstpart[::-1] == secondpart

# example
print(is_palendromic('racecar'))
# will print True
print(is_palendromic('racecar_'))
# will print False
I am working on a small problem for fun, sent to me by a friend. The problem requires me to populate an array with common words from a text file, and then print all the words from this list containing certain characters provided by the user. I am able to populate my array no problem, but it seems the part of the code that actually compares the two lists is not working. Below is the function I've written to compare the 2 lists.
#Function that prompts user for the set of letters to match and then compares that list of letters to each word in our wordList.
def getLetters():
    #Prompt user for list of letters and convert that string into a list of characters
    string = input("Enter your target letters: ")
    letterList = list(string)
    #For each word in the wordList, loop through each character in the word and check to see if the character is in our letter list, if it is increase matchCount by 1.
    for word in wordList:
        matchCount = 0
        for char in word:
            if char in letterList:
                matchCount+=1
        #If matchCount is equal to the length of the word, all of the characters in the word are present in our letter list and the word should be added to our matchList.
        if matchCount == len(word):
            matchList.append(word)
            print(matchList)
The code runs just fine and I don't get any error output, but once the user enters their list of letters, nothing happens. To test, I've tried a few inputs matching words I know are in my wordList (e.g. added, axe, tree, etc.), but nothing ever prints after I enter my letter string.
This is how I populate my wordList:
def readWords(filename):
    try:
        with open(filename) as file:
            #Load entire file as string, split string into word list using whitespace as delimiter
            s = file.read()
            wordList = s.split(" ")
            getLetters()
    #Error handling for invalid filename. Just prompts the user for filename again. Should change to use ospath.exists. But does the job for now
    except FileNotFoundError:
        print("File does not exist, check directory and try again. Dictionary file must be in program directory because I am bad and am not using ospath.")
        getFile()
Edit: Changed the function to reset matchCount to 0 before it starts looping characters, still no output.
Your code only needs a simple change:
Pass wordList as a parameter to getLetters. The wordList you assign inside readWords is a local variable, so getLetters never sees it. Also, if you like, you can use all() to check whether every letter of the word is in the letter list.
def getLetters(wordList):
    string = input("Enter your target letters: ")
    letterList = list(string)
    matchList = []
    for word in wordList:
        if all(letter in letterList for letter in word):
            matchList.append(word)
    return matchList
And in readWords:
def readWords(filename):
    try:
        with open(filename) as file:
            s = file.read()
            wordList = s.split(" ")
            result = getLetters(wordList)
    except FileNotFoundError:
        print("...")
    else:
        # No exceptions.
        return result
Edit: add a global declaration to modify your list from inside a function:
wordList = [] #['axe', 'tree', 'etc']
def readWords(filename):
    global wordList # must be declared to rebind the global list
    try:
        with open(filename) as file:
            s = file.read()
            wordList = s.split(" ")
    except FileNotFoundError:
        pass
Here is a working example:
wordList = ['axe', 'tree', 'etc']
# Function that prompts user for the set of letters to match and then compares that list of letters to each word in our wordList.
def getLetters():
    # Prompt user for list of letters and convert that string into a list of characters
    string = input("Enter your target letters: ")
    letterList = list(string)
    # For each word in the wordList, loop through each character in the word and check to see if the character is in our letter list, if it is increase matchCount by 1.
    matchList = []
    for word in wordList:
        matchCount = 0
        for char in word:
            if char in letterList:
                matchCount += 1
        # If matchCount is equal to the length of the word, all of the characters in the word are present in our letter list and the word should be added to our matchList.
        if matchCount == len(word):
            matchList.append(word)
    print(matchList)

getLetters()
output:
Enter your target letters: xae
['axe']
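One more thing worth checking in the original readWords: s.split(" ") splits only on single spaces, so if the file has one word per line (an assumption about the file), it stays one long string with embedded newlines and no word can ever match; s.split() with no argument splits on any whitespace. A compact alternative for the matching itself is a set subset test. Like the matchCount version, it only checks which letters appear, not how many times. The name words_from_letters is hypothetical.

```python
def words_from_letters(word_list, letters):
    """Return the words whose every character appears in `letters`."""
    allowed = set(letters)
    # set(word) <= allowed: every distinct character of word is allowed
    return [word for word in word_list if set(word) <= allowed]


# Hypothetical file contents, one word per line
raw = "axe\ntree\netc\n"
by_space = raw.split(" ")    # ['axe\ntree\netc\n'] (one string, newlines kept)
by_whitespace = raw.split()  # ['axe', 'tree', 'etc']
```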
I would like to split hashtags from a Twitter dataset into words. For instance, "#sunnyday" would become "sunny day".
I have found the following code:
The code finds the hashtags and looks up matching words in a file called "wordlist.txt", a huge text file containing many English words.
The .txt file can be downloaded here:
http://www-personal.umich.edu/~jlawler/wordlist
Source: Term split by hashtag of multiple words
I modified it a bit to make sure that it also works when a sentence is empty (" ").
# Returns a list of common english terms (words)
def initialize_words():
    content = None
    with open('wordlist.txt') as f: # A file containing common english words
        content = f.readlines()
    return [word.rstrip('\n') for word in content]

def parse_sentence(sentence, wordlist):
    new_sentence = "" # output
    # MODIFICATION: If the sentence is not empty
    if sentence != '':
        terms = sentence.split(' ')
        for term in terms:
            # MODIFICATION: If the term is not empty
            if term != '':
                if term[0] == '#': # this is a hashtag, parse it
                    new_sentence += parse_tag(term, wordlist)
                else: # Just append the word
                    new_sentence += term
                new_sentence += " "
    return new_sentence

def parse_tag(term, wordlist):
    words = []
    # Remove hashtag, split by dash
    tags = term[1:].split('-')
    for tag in tags:
        word = find_word(tag, wordlist)
        while word != None and len(tag) > 0:
            words.append(word)
            if len(tag) == len(word): # Special case for when eating rest of word
                break
            tag = tag[len(word):]
            word = find_word(tag, wordlist)
    return " ".join(words)

def find_word(token, wordlist):
    i = len(token) + 1
    while i > 1:
        i -= 1
        if token[:i] in wordlist:
            return token[:i]
    return None
The problem is that it takes forever to run!
How can I make it faster?
Use a set instead of a list for your wordlist variable.
This will be a massive performance improvement: with a list, each membership check may have to scan the entire word list, so it's O(n). With a set, it's O(1) on average, because membership is checked by calculating a hash of the item and using that as an index into the backing storage.
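A minimal sketch of that change, assuming wordlist.txt holds one word per line: initialize_words returns a set instead of a list (and, as an addition to the original, skips blank lines), while find_word keeps the same longest-prefix logic, because `in` works the same way on a set as on a list; only the cost of each lookup changes.

```python
def initialize_words(path="wordlist.txt"):
    """Load one word per line into a set for O(1) average membership checks."""
    with open(path) as f:
        return {line.rstrip("\n") for line in f if line.strip()}


def find_word(token, wordlist):
    """Return the longest prefix of token that is in wordlist, or None."""
    # Same prefixes as the original while-loop: len(token) down to 1
    for i in range(len(token), 0, -1):
        if token[:i] in wordlist:  # hash lookup when wordlist is a set
            return token[:i]
    return None
```

Load the set once with wordlist = initialize_words() and pass it to parse_sentence as before; parse_tag needs no changes.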
I'm new to programming and stuck on my current program. I have to read in a story from a file, sort the words, and count the number of occurrences of each word. It counts the words, but it won't sort them, remove the punctuation, or remove duplicate words. I'm lost as to why it's not working. Any advice would be helpful.
ifile = open("Story.txt",'r')
fileout = open("WordsKAI.txt",'w')
lines = ifile.readlines()
wordlist = []
countlist = []
for line in lines:
    wordlist.append(line)
    line = line.split()
    # line.lower()
    for word in line:
        word = word.strip(". , ! ? : ")
        # word = list(word)
        wordlist.sort()
        sorted(wordlist)
        countlist.append(word)
        print(word,countlist.count(word))
The main problem in your code is this line:
wordlist.append(line)
You are appending the whole line to wordlist, which is probably not what you want. Because of this, what gets added to wordlist is never .strip()ed.
Instead, add each word only after you have .strip()ed it, and only if it is not already in the list (no duplicates):
ifile = open("Story.txt",'r')
lines = ifile.readlines()
wordlist = []
countlist = []
for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()
        # Add the word into wordlist only if it is not in wordlist
        if word not in wordlist:
            wordlist.append(word)
        # Add the word to countlist so that it can be counted later
        countlist.append(word)
# Sort the wordlist
wordlist.sort()
# Print the wordlist
for word in wordlist:
    print(word, countlist.count(word))
Another way to do this is with a dictionary, storing the word as the key and the number of occurrences as the value:
ifile = open("Story.txt", "r")
lines = ifile.readlines()
word_dict = {}
for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()
        # Add the word to word_dict
        word_dict[word] = word_dict.get(word, 0) + 1
# Create a wordlist to display the words sorted
word_list = list(word_dict.keys())
word_list.sort()
for word in word_list:
    print(word, word_dict[word])
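The same dictionary approach can also be written with collections.Counter from the standard library. A minimal sketch, with the function name count_words hypothetical: it strips string.punctuation (ASCII punctuation only, an assumption about the story text), lowercases, skips tokens that consisted only of punctuation, and returns the (word, count) pairs sorted alphabetically.

```python
import string
from collections import Counter


def count_words(lines):
    """Return alphabetically sorted (word, count) pairs for the given lines."""
    counts = Counter()
    for line in lines:
        for word in line.split():
            word = word.strip(string.punctuation).lower()
            if word:  # skip tokens that were only punctuation
                counts[word] += 1
    return sorted(counts.items())
```

Because a file object iterates over its lines, you could pass it directly: with open("Story.txt") as f: print(count_words(f)).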
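Note that sorted() returns a new sorted list rather than sorting in place, so a bare sorted(wordlist) has no effect. To sort case-insensitively, also provide a key function.
Try this: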
r = sorted(wordlist, key=str.lower)
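A short illustration with a made-up word list: sorted() hands back a new list and leaves the original alone, list.sort() reorders the list in place, and key=str.lower changes where capitalized words land (by default, uppercase letters sort before all lowercase ones).

```python
words = ["banana", "Cherry", "apple"]

sorted(words)            # returns a new list; the result is discarded here
unchanged = list(words)  # words is still in its original order

by_code_point = sorted(words)                    # uppercase before lowercase
case_insensitive = sorted(words, key=str.lower)  # compares lowercased copies

words.sort(key=str.lower)  # list.sort() sorts in place and returns None
```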
punctuation = ".,!?: "
counts = {}
with open("Story.txt",'r') as infile:
    for line in infile:
        for word in line.split():
            # strip() removes any combination of these characters from both ends
            word = word.strip(punctuation)
            if word not in counts:
                counts[word] = 0
            counts[word] += 1
with open("WordsKAI.txt",'w') as outfile:
    for word in sorted(counts): # if you want to sort by counts instead, use sorted(counts, key=counts.get)
        outfile.write("{}: {}\n".format(word, counts[word]))
I have an assignment that reads:
Write a function which takes the input file name and list of words
and write into the file “Repeated_word.txt” the word and number of
times word repeated in input file?
word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
My code is below.
All it does is create the new file 'Repeated_word.txt'; it doesn't write the number of times each word from the word list appears in the file.
#obtain the name of the file
filename = raw_input("What is the file being used?: ")
fin = open(filename, "r")
#create list of words to see if repeated
word_list = ["Emma", "Woodhouse", "father", "Taylor", "Miss", "been", "she", "her"]
def repeatedWords(fin, word_list):
    #open the file
    fin = open(filename, "r")
    #create output file
    fout = open("Repeated_word.txt", "w")
    #loop through each word of the file
    for line in fin:
        #split the lines into words
        words = line.split()
        for word in words:
            #check if word in words is equal to a word from word_list
            for i in range(len(word_list)):
                if word == i:
                    #count number of times word is in word
                    count = words.count(word)
                    fout.write(word, count)
    fout.close
repeatedWords(fin, word_list)
These lines,
for i in range(len(word_list)):
    if word == i:
should be
for i in range(len(word_list)):
    if word == word_list[i]:
or
for i in word_list:
    if word == i:
As your code stands, word is a string whereas i is an integer. These are never equal, so nothing ever gets written to the file.
In response to your further question, you can either 1) use a dictionary to keep track of how many of each word you have, or 2) read in the whole file at once. This is one way you might do that:
words = fin.read().split()
for word in word_list:
    # write() takes a single string, so build the whole line first
    fout.write("{} {}\n".format(word, words.count(word)))
I leave it up to you to figure out where to put this in your code and what you need to replace. This is, after all, your assignment, not ours.
It seems you are making several mistakes here:
[1] for i in range(len(word_list)):
[2]     if word == i:
[3]         #count number of times word is in word
[4]         count = words.count(word)
[5]         fout.write(word, count)
First, you are comparing the word from fin with an integer from the range [line 2].
Then you are writing the count to fout on every match in every line [line 5]. You should probably keep the counts (e.g. in a dict) and write them all once you have finished parsing the input file.
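A minimal sketch of that dict-based approach, written for Python 3 (where raw_input is called input). The function names count_listed_words and repeated_words are hypothetical. Words are matched exactly and case-sensitively, so a token with attached punctuation such as "Emma," will not count unless you strip it first. Note also that the question's fout.close has no parentheses, so the method is never called; the with blocks below close both files automatically.

```python
def count_listed_words(lines, word_list):
    """Count how often each word in word_list occurs in the given lines."""
    counts = {word: 0 for word in word_list}
    for line in lines:
        for word in line.split():
            if word in counts:  # dict lookup, no inner loop over word_list
                counts[word] += 1
    return counts


def repeated_words(filename, word_list, out_name="Repeated_word.txt"):
    """Read filename, then write one 'word count' line per listed word."""
    with open(filename) as fin:
        counts = count_listed_words(fin, word_list)
    with open(out_name, "w") as fout:
        # Write everything once, after the whole input has been parsed
        for word, count in counts.items():
            fout.write("{} {}\n".format(word, count))
    return counts
```

For the assignment you would call repeated_words(input("What is the file being used?: "), word_list).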