List index out of range when decompressing text - python

The code I currently have is shown below. It first asks the user to input a sentence. The program then finds the position of each word in the sentence and also splits the sentence into a list to get the individual words. It then gets rid of any repeated words so the words in the list are unique. The program then saves (using json) the positions of the words in the sentence (e.g. 1,2,3,4,1,1,2,3,5) and the unique words to a separate file (which the user can name). The next part of the program tries to decompress the unique text from the separate file and recreate the original sentence from the word positions and unique words. I know this stage works as I have tested it separately. However, when I run the program now, I keep getting this error message:
File "/Users/Sid/Desktop/Task3New.py", line 70, in OutputDecompressed
decompression.append(orgwords[i])
IndexError: list index out of range
I have no idea why this isn't working, anyone care to help? All help appreciated, thanks.
import json
import os.path

def InputSentence():
    global sentence
    global words
    sentence = input("Enter a sentence: ")
    words = sentence.split(' ')

def Validation():
    if sentence == (""):
        print ("No sentence was inputted. \nPlease input a sentence...")
        Error()

def Uniquewords():
    print ("Words in the sentence: " + str(words))
    for i in range(len(words)):
        if words[i] not in unilist:
            unilist.append(words[i])
    print ("Unique words: " + str(unilist))

def PosText():
    global find
    global pos
    find = dict((sentence, words.index(sentence)+1) for sentence in list(words))
    pos = (list(map(lambda sentence: find[sentence], words)))
    return (pos)

def OutputText():
    print ("The positions of the word(s) in the sentence are: " + str(pos))

def SaveFile():
    filename = input("We are now going to save the contents of this program into a new file. \nWhat would you like to call the new file? ")
    newfile = open((filename)+'.txt', 'w')
    json.dump([unilist, pos], newfile)
    newfile.close

def InputFile():
    global compfilename
    compfilename = input("Please enter an existing compressed file to be decompressed: ")

def Validation2():
    if compfilename == (""):
        print ("Nothing was entered for the filename. Please re-enter a valid filename.")
        Error()
    if os.path.exists(filename + ".txt") == False:
        print ("No such file exists. Please enter a valid existing file.")
        Error()

def OutputDecompressed():
    newfile = open((compfilename)+'.txt', 'r')
    saveddata = json.load(newfile)
    orgpos = saveddata[1]
    orgwords = saveddata[0]
    print ("Unique words in the original sentence: " + str(orgwords) + "\nPosition of words in the sentence: " + str(orgpos))
    decompression = []
    prev = orgpos[0]
    x = 0
    # decomposing the index locations
    for cur in range(1, len(orgpos)):
        if (prev == orgpos[cur]): x += 1
        else:
            orgpos[cur] -= x
            x = 0
        prev = orgpos[cur]
    # Getting the output
    for i in orgpos:
        decompression.append(orgwords[i-1])
    finalsentence = (' '.join(decompression))
    print ("Original sentence from file: " + finalsentence)

def Error():
    MainCompression()

def MainCompression():
    global unilist
    unilist = []
    InputSentence()
    Uniquewords()
    PosText()
    OutputText()
    SaveFile()
    InputFile()
    Validation()
    OutputDecompressed()

MainCompression()

The problem is that you are using the indices from words as indices into unilist/orgwords.
Let's take a look at the problem:
def PosText():
    global find
    global pos
    find = dict((sentence, words.index(sentence)+1) for sentence in list(words))
    pos = (list(map(lambda sentence: find[sentence], words)))
    return (pos)
Here find maps every word to its position in the list words. (BTW why is the variable that iterates over words called sentence?) Then, for every word this position is stored in a new list. This process could be expressed in one line: pos = [words.index(word)+1 for word in words]
When you now look at OutputDecompressed, you see:
for i in orgpos:
    decompression.append(orgwords[i-1])
Here orgpos is pos and orgwords is the list of unique words. Every stored index is then used to get back the original word, but this is flawed: orgpos contains indices into words, even though they are used to index into orgwords.
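A small illustration of the mismatch, using an assumed example sentence (not taken from the question):
words = "the cat sat on the mat".split()
unilist = ["the", "cat", "sat", "on", "mat"]   # the 5 unique words
# positions taken from `words`, which is what PosText currently computes
pos = [words.index(word) + 1 for word in words]
print(pos)             # [1, 2, 3, 4, 1, 6]
print(unilist[6 - 1])  # IndexError: only indices 0-4 exist in unilist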
The solution to this problem is to rewrite PosText and parts of OutputDecompressed:
def PosText():
    global pos
    pos = [unilist.index(word)+1 for word in words]
    return pos

def OutputDecompressed():
    newfile = open((compfilename)+'.txt', 'r')
    saveddata = json.load(newfile)
    orgpos = saveddata[1]
    orgwords = saveddata[0]
    print ("Unique words in the original sentence: " + str(orgwords) + "\nPosition of words in the sentence: " + str(orgpos))
    decompression = []
    # I could not figure out what this middle part was doing, so I left it out
    for i in orgpos:
        decompression.append(orgwords[i-1])
    finalsentence = (' '.join(decompression))
    print ("Original sentence from file: " + finalsentence)
Some comments on your code:
After InputSentence(), Validation() should be called to validate the input.
After InputFile() you must call Validation2(), not Validation().
In Validation2() it should be compfilename, not filename.
You should use parameters instead of global variables. This makes it much clearer what the functions are supposed to do. For example, Uniquewords could accept the list of words and return the list of unique words (see the sketch after this list). It also makes the program much easier to debug by testing every function one by one, which is currently not possible.
To make it easier for other Python programmers to read your code, you could follow the Python coding style specified in PEP 8.
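For instance, a minimal sketch of a parameterised Uniquewords (the function name is kept from your code; the rest is just one possible way to restructure it):
def Uniquewords(words):
    """Return the unique words in the order they first appear."""
    unilist = []
    for word in words:
        if word not in unilist:
            unilist.append(word)
    return unilist

# No globals involved, so it is easy to test on its own:
print(Uniquewords("the cat sat on the mat".split()))
# ['the', 'cat', 'sat', 'on', 'mat']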

Related

Is there any way to obtain a random word from PyEnchant?

Is there a way to obtain a random word from PyEnchant's dictionaries?
I've tried doing the following:
enchant.Dict("<language>").keys() #Type 'Dict' has no attribute 'keys'
list(enchant.Dict("<language>")) #Type 'Dict' is not iterable
I've also tried looking into the module to see where it gets its wordlist from but haven't had any success.
Using the separate "Random-Words" module is a workaround, but as it doesn't follow the same wordlist as PyEnchant, not all words will match. It is also quite a slow method. It is, however, the best alternative I've found so far.
Your question really got me curious, so I thought of a way to make a random word using enchant.
import enchant
import random
import string
# here I am getting hold of all the letters
letters = string.ascii_lowercase
# creating a string of random length from random letters
word = "".join([random.choice(letters) for _ in range(random.randint(3, 8))])
d = enchant.Dict("en_US")
# using the `enchant` to suggest a word based on the random string we provide
random_word = d.suggest(word)
Sometimes the suggest method will not return any suggestions, so you will need a loop to check whether random_word has any value.
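A minimal sketch of such a loop, reusing d and letters from the snippet above (suggest returns a list, which may be empty):
# keep generating random strings until enchant suggests at least one word
suggestions = []
while not suggestions:
    candidate = "".join(random.choice(letters) for _ in range(random.randint(3, 8)))
    suggestions = d.suggest(candidate)
random_word = random.choice(suggestions)
print(random_word)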
With the help of @furas this question has been resolved.
Using the dict-en text file in furas' PyWordle, I wrote a short piece of code that filters out of that list any words pyenchant considers invalid.
import enchant

wordlist = enchant.Dict("en_US")
baseWordlist = open("dict-en.txt", "r")
lines = baseWordlist.readlines()
baseWordlist.close()

newWordlist = open("dict-en_NEW.txt", "w")  # write to new text file
for line in lines:
    word = line.strip("\n")
    if wordlist.check(word) == True:  # if word exists in pyenchant's dictionary
        print(word + " is valid.")
        newWordlist.write(line)
    else:
        print(word + " is invalid.")
newWordlist.close()
Afterwards, reading the new text file will allow you to gather the word on any given line.
validWords = open("dict-en_NEW.txt", "r")
wordList = validWords.readlines()
myWord = wordList[<line>]
# <line> can be any int (up to the number of lines in the .txt), either a chosen one or a random one.
# This will return the word located at line <line>.
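Putting the two parts together, a small sketch of picking a random valid word from the filtered file (this assumes dict-en_NEW.txt was produced by the code above):
import random

with open("dict-en_NEW.txt", "r") as validWords:
    wordList = [line.strip("\n") for line in validWords]

myWord = random.choice(wordList)  # a random word that pyenchant accepts
print(myWord)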

Split input string into single words, check each word's value; if a specific value exists, create a variable keeping the same index as in the original string

I am working on a program that will take a sentence, split it into individual words and then check each word for specific values. What I want to happen is: if a specific value exists in the input string, create a variable with that value while keeping the same index as in the original input string, so the sentence structure is preserved when the words are joined back together.
Here is my code so far. I have the split & rejoin sussed, I just can't seem to figure out how to go about the rest of it...
#emsg = "Word not found."
c_e = raw_input("Text: ")
wordblocks = c_e.split(' ',)
wordblocksrev = ' '.join(wordblocks[::1])
print wordblocks
print wordblocksrev
Edit: wordblocksrev refers to wordblocks after the words have been replaced and put back into a string for the output, so the output will be the corresponding dictionary words at the same index. I've already figured the word swap out.
Example:
text = input("Enter text: ")
#For arguments sake the user enters "I have a black dog"
words = text.split(' ')
#["I", "have", "a", "black", "dog"]
#set variable with each value
wordblock1 = "I"
wordblock2 = "have"
wordblock3 = "three"
wordblock4 = "black"
wordblock5 = "dogs"
altwords = ["You", "had", "four", "white", "cats"]
#if "have" in text, replace with "had" (same index separate lists)
#if "black" in text, replace with "white"
#I want to hold each word as a variable, change some words, join back together so that it makes sense as a sentence
Apologies for any bad code, I'm pretty new to Python and still learning the basics. TIA
I've figured it out; here's the code in case anyone else finds use in it. There is probably a much better way of doing this, but it's fast enough for me and does the job.
txt = input("Enter sentence: ")
split = txt.split(' ')
collectedwords = []

if not txt:
    print("No input entered.")
    wordblock1 = ""
else:
    wordblock1 = split[0]

wordtest1 = wordblock1
if wordtest1 == 'black':
    wordr1 = wordtest1.replace("black", "white")
    collectedwords.append(wordr1)
else:
    wordr1 = split[0]
    collectedwords.append(wordr1)

if "" in txt:
    wordblock2 = ""
else:
    wordblock2 = split[1]

wordtest2 = wordblock2
if wordtest2 == 'blue':
    wordr2 = wordtest2.replace("blue", "green")
    collectedwords.append(wordr2)
else:
    wordr2 = split[1]
    collectedwords.append(wordr2)

reform = ' '.join(collectedwords)
print(txt)
print("Collected Words:", collectedwords)
print("\nReformed Words:", reform)

How do I 'find' a string from a file and display it? [duplicate]

How do I find a string or a list in a text file? If I have a text file filled with words only (random words, not sentences), how do I find a string inside it? Let's say I take a user input; how do I test or check whether the string input by the user exists in the text file?
So far I have this:
x = open('sample.txt')
for a in x:
    b = a.split()  # converts the string from the txt to a list
    c = input('enter: ')  # user input
    if any(c in a for a in b):
        print('yes')
I want to make a simple spellchecker, so from the user input string I want to check whether that string matches the strings/list in the txt file.
You mean, how to find a word in this file? That's
with open("sample.txt") as input:
    dictionary = set(input.read().split())
then
w in dictionary
for your word w.
You asked three related but different questions:
1. "how do i find a string or a list in a text file."
text = input('enter: ')
data = open("sample.txt").read()
position = data.find(text)
if position != -1:
    print("Found at position", position)
else:
    print("Not found")
2. "how do I test or check if the string input by the user exist in the text file"
text = input('enter: ')
data = open("sample.txt").read()
if text in data:
    print("Found")
else:
    print("Not found")
3. "I want to make a simple spellchecker. so from the user input string i want to check if that string matches the strings/list in the txt file"
Make a dictionary like this, globally in a module:
dictionary = open("sample.txt").read().split()
Then you use the dictionary by importing it:
from themodule import dictionary
And you check if a word is in the dictionary like so:
'word' in dictionary
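As a usage sketch (themodule is the assumed module name from above), a minimal spellchecker loop could then look like this:
from themodule import dictionary  # the word list loaded from sample.txt

sentence = input('enter: ')
for word in sentence.split():
    if word in dictionary:
        print(word, "is spelled correctly")
    else:
        print(word, "is not in the word list")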

Having trouble with two of my functions for text analysis

I'm having trouble trying to find the number of unique words in a speech text file (well, actually 3 files). I'm just going to give you my full code so there are no misunderstandings.
# This program will serve to analyze text files for the number of words in
# the text file, number of characters, sentances, unique words, and the longest
# word in the text file. This program will also provide the frequency of unique
# words. In particular, the text will be three political speeches which we will
# analyze, building on searching techniques in Python.

def main():
    harper = readFile("Harper's Speech.txt")
    newWords = cleanUpWords(harper)
    print(numCharacters(harper), "Characters.")
    print(numSentances(harper), "Sentances.")
    print(numWords(newWords), "Words.")
    print(uniqueWords(newWords), "Unique Words.")
    print("The longest word is: ", longestWord(newWords))

    obama1 = readFile("Obama's 2009 Speech.txt")
    newWords = cleanUpWords(obama1)
    print(numCharacters(obama1), "Characters.")
    print(numSentances(obama1), "Sentances.")
    print(numWords(obama1), "Words.")
    print(uniqueWords(newWords), "Unique Words.")
    print("The longest word is: ", longestWord(newWords))

    obama2 = readFile("Obama's 2008 Speech.txt")
    newWords = cleanUpWords(obama2)
    print(numCharacters(obama2), "Characters.")
    print(numSentances(obama2), "Sentances.")
    print(numWords(obama2), "Words.")
    print(uniqueWords(newWords), "Unique Words.")
    print("The longest word is: ", longestWord(newWords))

def readFile(filename):
    '''Function that reads a text file, then prints the name of file without
    '.txt'. The fuction returns the read file for main() to call, and print's
    the file's name so the user knows which file is read'''
    inFile1 = open(filename, "r")
    fileContentsList = inFile1.read()
    inFile1.close()
    print("\n", filename.replace(".txt", "") + ":")
    return fileContentsList

def numCharacters(file):
    '''Fucntion returns the length of the READ file (not readlines because it
    would only read the amount of lines and counting characters would be wrong),
    which will be the correct amount of total characters in the text file.'''
    return len(file)

def numSentances(file):
    '''Function returns the occurances of a period, exclamation point, or
    a question mark, thus counting the amount of full sentances in the text file.'''
    return file.count(".") + file.count("!") + file.count("?")

def cleanUpWords(file):
    words = (file.replace("-", " ").replace(" ", " ").replace("\n", " "))
    onlyAlpha = ""
    for i in words:
        if i.isalpha() or i == " ":
            onlyAlpha += i
    return onlyAlpha.replace(" ", " ")

def numWords(newWords):
    '''Function finds the amount of words in the text file by returning
    the length of the cleaned up version of words from cleanUpWords().'''
    return len(newWords.split())

def uniqueWords(newWords):
    unique = sorted(newWords.split())
    unique = set(unique)
    return str(len(unique))

def longestWord(file):
    max(file.split())

main()
So, my last two functions, uniqueWords and longestWord, will not work properly, or at least my output is wrong. For unique words I'm supposed to get 527, but I'm actually getting 567 for some odd reason. Also, my longestWord function always prints None, no matter what I do. I've tried many ways to get the longest word; the above is just one of them, but all return None. Please help me with my two sad functions!
Try to do it this way:
def longestWord(file):
    return sorted(file.split(), key=len)[-1]
Or it would be even easier to do it in uniqueWords:
def uniqueWords(newWords):
    unique = set(newWords.split())
    return (str(len(unique)), max(unique, key=len))

info = uniqueWords("My name is Harper")
print("Unique words" + info[0])
print("Longest word" + info[1])
You don't need sorted before set to get all the unique words, because a set is an unordered collection of unique elements.
Also take a look at cleanUpWords, because if you have a string like "Hello I'm Harper. Harper I am.", after cleaning it up you will get 6 unique words, since you will end up with the word "Im".
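If the over-count comes from capitalisation and punctuation surviving the cleanup, a sketch of a stricter cleanup could look like this (an assumed variant, not a drop-in replacement for the original cleanUpWords):
import string

def clean_up_words(text):
    """Lowercase the text and turn punctuation into spaces before splitting."""
    text = text.lower()
    for ch in string.punctuation:
        text = text.replace(ch, " ")
    return text

sample = "Hello I'm Harper. Harper I am."
words = clean_up_words(sample).split()
print(sorted(set(words)))  # ['am', 'harper', 'hello', 'i', 'm']
print(len(set(words)))     # 5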

Search in matrix - python

I need to write a function that will search for words in a matrix. For the moment I'm trying to search line by line to see if the word is there. This is my code:
def search(p):
    w = []
    for i in p:
        w.append(i)
    s = read_wordsearch()  # This is my matrix full of letters
    for line in s:
        l = []
        for letter in line:
            l.append(letter)
        if w == l:
            return True
        else:
            pass
This code works only if my word begins in the first position of a line.
For example I have this matrix:
[[a,f,l,y],[h,e,r,e],[b,n,o,i]]
I want to find the word "fly" but can't, because my code only finds words like "here" or "her" that begin in the first position of a line...
Any form of help, hint or advice would be appreciated. (And sorry if my English is bad...)
You can convert each line in the matrix to a string and try to find the search word in it.
def search(p):
    s = read_wordsearch()
    for line in s:
        if p in ''.join(line):
            return True
I'll give you a tip to search within a text for a word. I think you will be able to extrapolate to your data matrix.
s = "xxxxxxxxxhiddenxxxxxxxxxxx"
target = "hidden"
for i in xrange(len(s)-len(target)):
if s[i:i+len(target)] == target:
print "Found it at index",i
break
If you want to search for words of all lengths, say if you had a list of possible solutions:
s = "xxxxxxxxxhiddenxxxtreasurexxxxxxxx"
targets = ["hidden","treasure"]
for i in xrange(len(s)-1):
for j in xrange(i+1,len(s)):
if s[i:j] in targets:
print "Found",s[i:j],"at index",
def search(p):
    w = ''.join(p)
    s = read_wordsearch()  # This is my matrix full of letters
    for line in s:
        word = ''.join(line)
        if word.find(w) >= 0:
            return True
    return False
Edit: there are already a lot of string functions available in Python; you just need to work with strings to be able to use them.
Join the characters in the inner lists to create a word and search with in.
def search(word, data):
    return any(word in ''.join(characters) for characters in data)

data = [['a','f','l','y'], ['h','e','r','e'], ['b','n','o','i']]
if search('fly', data):
    print('found')
data contains the matrix, characters is the name of each individual inner list. any will stop after it has found the first match (short circuit).
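If the word search also has to match words written right to left or down a column (an assumption; the question only shows row-wise matches), the same join-and-check idea extends naturally:
def search(word, data):
    rows = [''.join(characters) for characters in data]
    cols = [''.join(characters) for characters in zip(*data)]
    # check every row and column, forwards and backwards
    return any(word in line or word in line[::-1] for line in rows + cols)

data = [['a','f','l','y'], ['h','e','r','e'], ['b','n','o','i']]
print(search('fly', data))  # True: row, left to right
print(search('fen', data))  # True: column, top to bottom
print(search('ion', data))  # True: row, right to left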
