How to search for string within another string? - python

I am trying to create a simple word search program.
I have successfully opened an external file that contains the grid of the word search. I also have successfully opened a file that contains the words that are to be searched for. I have stored every line of the grid in a list and every word from the file in a list called words[].
I am attempting to search for the words in each line of the grid. My code currently does not search for the word in each line of the grid.
gridlines_horizontal = []
gridlines_vertical = []
words = []
not_found = []
found_words = {}
def puzzle(fname) :
    print ""
    for line in f :
        gridlines_horizontal.append(line)
    for line in gridlines_horizontal :
        print line,
    for item in zip(*(gridlines_horizontal[::-1])):
        gridlines_vertical.append(item)
Here I am trying to get each word in words[] one at a time and see if the word is in any of the lines of the word search grid. If the word is present in any of the lines I am then trying to print the word. The code currently does not do this.
def horizontal_search(word,gridlines_horizontal) :
    x = 0
    for line in gridlines_horizontal :
        if words[0] in line or words[0] in line[::-1]:
            found_words.update({words[0]:" "})
            print words[0]
        else :
            not_found.append(words)
        x = x + 1
def vertical_search(word,gridlines_vertical):
    x = 0
    for line in gridlines_vertical:
        if words[x] in line or words[x] in line[::-1]:
            print words[0]
            found_words.update({words[x]:" "})
        else:
            not_found.append(words[x])
        x = x + 1
while True:
    try:
        fname = input("Enter a filename between double quotation marks: ")
        with open(fname) as f:
            puzzle(fname)
        break
    except IOError as e :
        print""
        print("Problem opening file...")
        print ""
while True:
    try:
        fname2 = input("Enter a filename for your words between double quotation marks: ")
        with open(fname2) as f:
            for line in f:
                words.append(line)
            """ line in words:
            line = lin """
        break
    except IOError as e :
        print("")
        print("Problem opening file...")

There are a couple mistakes in your code:
- You aren't being consistent in using words[x]: in your code you would want to replace every words[0] with words[x], BUT
- this isn't necessary, because you can use nested for loops instead.
So for horizontal search:
def horizontal_search(words,gridlines_horizontal):
    for word in words:
        for line in gridlines_horizontal:
            if word in line or word in line[::-1]:
                found_words.update({word : " "})
                print(word)
                break
        else:
            not_found.append(word)
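Two further details in the question's code also prevent matches. Every line read from a file keeps its trailing "\n", so a word stored with words.append(line) can only match at the very end of a grid row. And zip(*rows) produces tuples of single characters, so word in column checks for a tuple element equal to the whole word rather than for a substring. A minimal Python 3 sketch that strips the newlines and joins each column back into a string before searching; the names load_lines and search_grid are hypothetical, and only rows and columns are searched (no diagonals):

```python
def load_lines(path):
    # Strip the trailing newline (and surrounding spaces) from every line
    # and skip blank lines.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def search_grid(words, rows):
    # zip(*rows) yields one tuple of characters per column; join each tuple
    # into a string so that "in" performs a substring test.
    columns = ["".join(col) for col in zip(*rows)]
    found, not_found = [], []
    for word in words:
        for line in list(rows) + columns:
            # Check both reading directions.
            if word in line or word in line[::-1]:
                found.append(word)
                break
        else:
            # The inner loop ended without "break": no row or column matched.
            not_found.append(word)
    return found, not_found


# Usage with the question's two files (hypothetical variable names):
# found, missing = search_grid(load_lines(words_file), load_lines(grid_file))
```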

Did you look at find?
a = 'this is a string'
b = 'string'
if (a.find(b) > -1):
    print('found substring in string')
else:
    print('substring not found in string')
EDIT:
I am not sure if it's a typo, but your parameter is named word while the function body uses words:
def horizontal_search(word,gridlines_horizontal) :        # the parameter is "word"...
    x = 0
    for line in gridlines_horizontal :
        if words[0] in line or words[0] in line[::-1]:    # ...but the body reads "words"
The same issue exists in def vertical_search(word,gridlines_vertical):
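For reference, find() returns the index at which the substring starts, or -1 when it is absent; if only a yes/no answer is needed, the in operator performs the same test more directly. A small Python 3 illustration using the strings from the answer above:

```python
a = 'this is a string'
b = 'string'

print(a.find(b))      # 10, the index where 'string' starts
print(a.find('xyz'))  # -1, not found
print(b in a)         # True, same test without the index
```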

Related

Is there a way to output a link to a file with Python?

I have some code to sort a text and output info on it.
How it works: you copy a text and paste it into a text (.txt) file, and save that file in the same folder as the Python file. Then you go into the command prompt and type python3 the_name_of_the_python_file.py the_name_of_the_text_file.txt. When you run it, it outputs "All counted!". After that there is a new .txt file in the same folder as the Python file that tells you the number of words and unique words in the text file you passed in. The new file also lists the words from most to least used.
Is there a way to get my code to output "All counted!" and then a link like thing that I can click on to open the new file?
Here is my code:
import sys
text_file = open(sys.argv[1], "r")
word_list = text_file.read().split(",")
word_list = "".join(word_list)
word_list = word_list.split(".")
word_list = "".join(word_list)
word_list = word_list.split(" ")
file_name = []
file_name = sys.argv[1].split(".")
text_file.close()
NumWords = 0
NumUniqueWords = 0
Words = {}
for i in word_list:
    i = i.lower()  # lowercase before the lookup too, so "The" and "the" share one entry
    if i not in Words.keys():
        NumWords += 1
        NumUniqueWords += 1
        Words[i] = 1
    else:
        NumWords += 1
        Words[i] += 1
def get_key(val):
    for key, value in Words.items():
        if value == val:
            return key
newfile = open(file_name[0] + "-count.txt", "w")
newfile.write("Total Words - {}\nUnique Words - {}\n\n".format(NumWords, NumUniqueWords))
for i in range(len(Words)):
    newfile.write("{} - {}\n".format(get_key(max(Words.values())), max(Words.values())))
    del(Words[get_key(max(Words.values()))])
newfile.close()
print("All counted!")
My code already removes commas and full stops, and it treats the capitalized and lowercase forms of a word as the same word.
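One possible approach, sketched here as an assumption rather than taken from the question: collections.Counter can replace the manual dictionary and the get_key()/del loop, because most_common() already lists words from most to least used, and pathlib's Path.as_uri() turns the output path into a file:// URI. Whether the printed URI is clickable depends on the terminal, not on Python. The word-splitting regular expression is also an assumption: it keeps runs of word characters and apostrophes and treats everything else as a separator.

```python
import re
import sys
from collections import Counter
from pathlib import Path


def count_words(text):
    # Lowercase first, then keep runs of word characters and apostrophes;
    # commas, full stops, spaces and newlines all act as separators.
    return Counter(re.findall(r"[\w']+", text.lower()))


def write_report(counts, out_path):
    with open(out_path, "w") as out:
        out.write("Total Words - {}\nUnique Words - {}\n\n".format(
            sum(counts.values()), len(counts)))
        # most_common() with no argument returns every word, most used first.
        for word, n in counts.most_common():
            out.write("{} - {}\n".format(word, n))


if __name__ == "__main__" and len(sys.argv) > 1:
    src = Path(sys.argv[1])
    counts = count_words(src.read_text())
    # "story.txt" -> "story-count.txt", next to the input file.
    out_path = src.with_name(src.stem + "-count.txt")
    write_report(counts, out_path)
    print("All counted!")
    # as_uri() needs an absolute path, hence resolve().
    print(out_path.resolve().as_uri())
```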

How to find a phrase in a large text file in Python?

I am trying to write an algorithm to find a phrase with words on different lines in a big text file using Python.
The file contents are as follows:
fkerghiohgeoihhgergerig ooetbjoptj
enbotobjeob hi how
are you lerjgoegjepogjejgpgrg]
ekrngeigoieghetghehtigehtgiethg
ieogetigheihietipgietigeitgegitie
.......
The algorithm should search for the phrase "hi how are you" and return True in this case.
Since the file can be huge, its contents cannot all be read at once.
You can read the file one character at a time and change line feeds to spaces. Then it's just a question of running down the list of wanted characters.
def find_words(text, fileobj):
    i = 0
    while True:
        c = fileobj.read(1)
        if not c:
            break
        if c == "\n":  # text mode already turns \r\n into \n
            c = " "
        if c != text[i]:
            i = 0
        if c == text[i]:
            i += 1
            if i == len(text):
                return True
    return False
If you want to be a little more liberal about whitespace and case sensitivity, you could remove all whitespace and lowercase everything before the comparison.
import re
import itertools
from string import whitespace
def find_words(text, fileobj):
    chars = list(itertools.chain.from_iterable(re.split(r"\s+", text.lower())))
    i = 0
    while True:
        c = fileobj.read(1)
        if not c:
            break
        c = c.lower()
        if c in whitespace:
            continue
        if c != chars[i]:
            i = 0
        if c == chars[i]:
            i += 1
            if i == len(chars):
                return True
    return False
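A line-based alternative keeps memory bounded without reading one character at a time: normalize each line's whitespace, and carry over only enough of the previous text (len(phrase) - 1 characters) to catch a phrase that straddles a line break. A minimal sketch; the function name phrase_in_file is hypothetical, the match is case-insensitive and treats any run of whitespace as a single space (drop the .lower() calls for an exact-case match), and a single enormous line would still be read into memory in one piece.

```python
def phrase_in_file(path, phrase):
    # Normalize the phrase: lowercase, single spaces between words.
    target = " ".join(phrase.lower().split())
    # Keeping the last len(target) - 1 characters is enough to catch a
    # match that starts on an earlier line and ends on the current one.
    keep = len(target) - 1
    tail = ""
    with open(path) as f:
        for line in f:
            # Collapse all whitespace in the line (including "\n") to single spaces.
            chunk = " ".join(line.lower().split())
            if not chunk:
                continue  # blank line: nothing to add
            window = tail + " " + chunk if tail else chunk
            if target in window:
                return True
            tail = window[-keep:] if keep > 0 else ""
    return False
```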
Here is one way to solve the problem:
import re
def find_phrase():
    phrase = "hi how are you"
    words = dict(zip(phrase.split(), [False]*len(phrase.split())))
    with open("data.txt", "r") as f:
        for line in f:
            for word in words:
                if re.search( r"\b" + word + r"\b", line):
                    words[word] = True
            if all(words.values()):
                return True
    return False
EDIT:
def find_phrase():
    phrase = "hi how are you"
    with open("data.txt", "r") as f:
        for line in f:
            if phrase in line:
                return True
    return False
If it is a "pretty large" file, access the lines sequentially and don't read the whole file into memory:
with open('largeFile', 'r') as inF:
    for line in inF:
        if 'myString' in line:
            # do_something
            break
Edit:
Since the words of the phrase can be on consecutive lines, you would want to use a counter to keep track of the words matched so far. For example:
counter = 0
words_list = ["hi","hello","how"]
with open('largeFile', 'r') as inF:
    for line in inF:
        # print( words_list[counter] ,line)
        if words_list[counter] in line and len(line.split()) == 1 :
            counter +=1
        else:
            counter = 0
        if counter == len(words_list):
            print ("here")
            break
Text file:
fkerghiohgeoihhgergerig ooetbjoptj enbotobjeob
hi
hello
how
goegjepogjejgpgrg] ekrngeigoieghetghehtigehtgiethg ieoge
It prints "here" because the three words are found on consecutive lines.

Counting number of words and letters from .txt file

I am a beginner in programming and am currently facing an issue with a simple task. I need to print the words, count the number of words and also the number of letters in a .txt file. I would appreciate it if someone could help me with this:
def main():
    file_name = input("Input file. \n")
    sum_of_letters = 0
    number_words = 0
    try:
        words = open(file_name, "r")
        print("File", file_name ,"includes the following words:")
        for line in words:
            line = line.rstrip()
            words= line.split()
            for i in words:
                print(i)
                sum_of_letters += len(i)
                number_words += 1
        print("---------------------------------------")
        print("Number of words",number_words ,"and", sum_of_letters, "letters.")
        close.words()
    except OSError:
        print("Error observed")
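Your code is almost correct, but you're using words to refer to both the file and the words in a single line, so after the loop words is a list and the file can no longer be closed through it. The closing call also has to be words.close(), not close.words(). I've updated the code below to use different variables: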
def main():
    file_name = input("Input file. \n")
    sum_of_letters = 0
    number_words = 0
    try:
        words = open(file_name, "r")
        print("File", file_name ,"includes the following words:")
        for line in words:
            line = line.rstrip()
            words_in_line = line.split()
            for i in words_in_line:
                print(i)
                sum_of_letters += len(i)
                number_words += 1
        print("---------------------------------------")
        print("Number of words",number_words ,"and", sum_of_letters, "letters.")
        words.close()
    except OSError:
        print("Error observed")
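A with block closes the file automatically, even when an error is raised, which removes the need for a separate close() call. A minimal sketch that returns the totals instead of printing them (count_file is a hypothetical name); note that len(word) also counts punctuation attached to a word, such as the comma in "hello,".

```python
def count_file(file_name):
    number_words = 0
    sum_of_letters = 0
    # The file is closed automatically when the with block ends.
    with open(file_name, "r") as f:
        for line in f:
            for word in line.split():
                number_words += 1
                sum_of_letters += len(word)
    return number_words, sum_of_letters


def main():
    file_name = input("Input file. \n")
    try:
        number_words, sum_of_letters = count_file(file_name)
    except OSError:
        print("Error observed")
        return
    print("Number of words", number_words, "and", sum_of_letters, "letters.")
```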

How to iterate through a file once a word is found

I am searching a text file for an input word. However, I am only meant to search the text in the file after the word "START". The first twenty-odd lines before "START" should be ignored. I know how to find "START", but not how to search the rest of the file once "START" is encountered. I would appreciate any guidance!
Here is what I have so far:
file = open("EnglishWords.txt", "r")
print("***** Anagram Finder *****")
word = input("Enter a word: ")
for line in file:
    if "START" in line:
        if word in line:
            print("Yes, ", word, " is in the file.", sep="")
        else:
            print("Sorry, ", word, " is not in the file.", sep="")
file.close()
Here is a sample of the text file:
The name of Princeton University or Princeton may not be
used in advertising or publicity pertaining to
distribution of the software and/or database. Title to
copyright in this software, database and any associated
documentation shall at all times remain with Princeton
University and LICENSEE agrees to preserve same.
START
clobber
transversalis
squinter
cunner
damson
extrovertive
absorptive
Modifying your code, we have:
file = open("EnglishWords.txt", "r")
print("***** Anagram Finder *****")
word = input("Enter a word: ")
start_looking = False
word_found = False
for line in file:
    if not start_looking:
        if "START" in line:
            start_looking = True
        continue  # skip the header lines and the START line itself
    if word in line:
        print("Yes, ", word, " is in the file.", sep="")
        word_found = True
        break
if not word_found:
    print("Sorry, ", word, " is not in the file.", sep="")
file.close()
As long as START hasn't been found, keep skipping over the lines of the file. Once you encounter START, set your flag and begin looking from the next line.
Start a second for loop once START is found:
with open(myfile, 'r') as f:
    for line in f:
        if 'START' in line:
            # do stuff to lines below 'START':
            # the inner loop continues from the next line of the same file object
            for line in f:
                print(line)  # just an example
Very similar to this other SO post. Credit for the syntax of my answer comes from its answer.
What about something with the re module?
import re
re.findall(r"START.*(" + re.escape(word_to_search) + r")", entire_text, re.DOTALL)
This should return a result only if there is a START somewhere before the word to search for; re.DOTALL lets . match newlines, so START and the word can be on different lines. I hope that's what you're looking for.
EDIT:
For a line-by-line solution, I would go with something like:
def search(bigfile, word_to_search):  # wrapped in a function so that return is valid
    start_search = False
    with open(bigfile, "r") as f:
        for line in f:
            if "START" in line:
                start_search = True
            if start_search and word_to_search in line:
                print("result found")
                return word_to_search
What about this?
Keep it short, simple and explicit:
with open("EnglishWords.txt", 'r') as fin:
    output = fin.read().splitlines()  # splitlines() drops the trailing "\n"
# Find the line that is exactly START
index = output.index("START")
# Search all the lines after that
for line in output[index+1:]:
    if word in line:
        print("Yes, ", word, " is in the file.", sep="")
        break
else:
    # runs only if the loop finished without break
    print("Sorry, ", word, " is not in the file.", sep="")
You could use Python's dropwhile() to locate the start of the words and iterate from there:
from itertools import dropwhile
print("***** Anagram Finder *****")
word = input("Enter a word: ").lower() + '\n'
with open("EnglishWords.txt") as f_words:
    if word in dropwhile(lambda r: not r.startswith("START"), f_words):
        print("Yes, {} is in the file".format(word.strip()))
    else:
        print("Sorry, {} is not in the file.".format(word.strip()))
You can use a boolean flag:
file = open("testfile.txt", "r")
foundStart = False
for line in file:
    if foundStart:
        pass  # do something with line...
    elif line.strip() == "START":  # strip() removes the trailing "\n"
        foundStart = True
file.close()
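The shared-iterator idea from the answers above can also be packaged as a function that skips everything up to the marker line and then tests only the remaining lines. A sketch under two assumptions: the word list has one word per line, so an exact match on the stripped line is wanted (word in line would also report "cat" inside "cattle"), and the marker line is exactly START. The function name word_after_start is hypothetical.

```python
def word_after_start(lines, word, marker="START"):
    lines = iter(lines)  # works for an open file or a plain list
    # Consume lines up to and including the marker line.
    for line in lines:
        if line.strip() == marker:
            break
    else:
        return False  # the marker never appeared
    # The same iterator continues from the line after the marker.
    return any(line.strip() == word for line in lines)


# Usage:
# with open("EnglishWords.txt") as f:
#     print(word_after_start(f, "cunner"))
```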

Replace four letter word in python

I am trying to write a program that opens a text document and replaces all four-letter words with **. I have been messing around with this program for several hours now and cannot seem to get anywhere. I was hoping someone would be able to help me out with this one. Here is what I have so far. Help is greatly appreciated!
def censor():
    filename = input("Enter name of file: ")
    file = open(filename, 'r')
    file1 = open(filename, 'w')
    for element in file:
        words = element.split()
        if len(words) == 4:
            file1 = element.replace(words, "xxxx")
            alist.append(bob)
        print (file)
    file.close()
Here is a revised version; I don't know if it is much better:
def censor():
    filename = input("Enter name of file: ")
    file = open(filename, 'r')
    file1 = open(filename, 'w')
    i = 0
    for element in file:
        words = element.split()
        for i in range(len(words)):
            if len(words[i]) == 4:
                file1 = element.replace(i, "xxxx")
                i = i+1
    file.close()
for element in file:
    words = element.split()
    for word in words:
        if len(word) == 4:
            ...  # etc.: replace the word here
Here's why:
Say the first line in your file is 'hello, my name is john'.
Then, for the first iteration of the loop, element = 'hello, my name is john'
and words = ['hello,','my','name','is','john'].
You need to check what is inside each word, hence for word in words.
Also, it might be worth noting that your current method does not pay any attention to punctuation. Note the first word in words above...
To get rid of punctuation, say instead:
import string
# ... rest of your code ...
for word in words:
    cleaned_word = word.strip(string.punctuation)
    if len(cleaned_word) == 4:
        ...  # etc.: replace the word here
Here is a hint: len(words) returns the number of words on the current line, not the length of any particular word. You need to add code that would look at every word on your line and decide whether it needs to be replaced.
Also, if the file is more complicated than a simple list of words (for example, if it contains punctuation characters that need to be preserved), it might be worth using a regular expression to do the job.
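A regular expression along those lines might look like the sketch below. It assumes a "four-letter word" means exactly four ASCII letters between word boundaries, so punctuation and spacing are preserved; because \b treats an apostrophe as a boundary, "that's" becomes "xxxx's". It also reads the whole file before reopening it for writing: the question's code opens the same file with 'w' while still reading it, and 'w' truncates the file to zero length immediately. The names censor_text and censor_file are hypothetical.

```python
import re

# Exactly four ASCII letters, not part of a longer word.
FOUR_LETTER = re.compile(r"\b[A-Za-z]{4}\b")


def censor_text(text, mask="xxxx"):
    return FOUR_LETTER.sub(mask, text)


def censor_file(filename, mask="xxxx"):
    # Read everything first; only then reopen the file for writing.
    with open(filename) as f:
        text = f.read()
    with open(filename, "w") as f:
        f.write(censor_text(text, mask))
```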
It can be something like this:
def censor():
    filename = input("Enter name of file: ")
    with open(filename, 'r') as f:
        lines = f.readlines()
    newLines = []
    for line in lines:
        words = line.split()
        for i, word in enumerate(words):
            if len(word) == 4:
                words[i] = '**'  # assignment; "==" would only compare
        newLines.append(' '.join(words))
    with open(filename, 'w') as f:
        for line in newLines:
            f.write(line + '\n')
def censor(filename):
    """Takes a file and writes it into file censored.txt with every 4-letter word replaced by xxxx"""
    infile = open(filename)
    content = infile.read()
    infile.close()
    outfile = open('censored.txt', 'w')
    # maketrans() needs both strings to have the same length: six marks -> six blanks
    table = str.maketrans('.,;:!?', ' ' * 6)
    noPunc = content.translate(table) #replace all punctuation marks with blanks, so they won't tie two words together
    wordList = noPunc.split(' ')
    for word in wordList:
        if '\n' in word:
            count = word.count('\n')
            wordLen = len(word)-count
        else:
            wordLen = len(word)
        if wordLen == 4:
            censoredWord = word.replace(word, 'xxxx ')
            outfile.write(censoredWord)
        else:
            outfile.write(word + ' ')
    outfile.close()
