Open text file and slice words based on blank spaces - python

I want to open a text file and split it into words on blank spaces, but the result comes out grouped per line (`\n`) instead of per word. Why does it behave like this? Is the problem in the text file, or is my code wrong?
def process(w):
    output = ""
    for ch in w:
        if ch.isalpha():
            output += ch
    return output.lower()

words = set()
fname = input("file name: ")
file = open(fname, "r")
for line in file:
    lineWords = line.split()
    for word in lineWords:
        words.add(process(lineWords))
print("Number of words used =", len(words))
print(words)
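The likely culprit is the inner loop: it passes the whole list lineWords to process instead of the single word, so every word on a line gets concatenated into one per-line string. A minimal corrected sketch (the sample file and its contents are stand-ins for the real input):

```python
def process(w):
    # Keep only alphabetic characters and lowercase the result.
    output = ""
    for ch in w:
        if ch.isalpha():
            output += ch
    return output.lower()

# A small stand-in for the real input file.
with open("sample.txt", "w") as f:
    f.write("Hello world\nhello, World!\n")

words = set()
with open("sample.txt", "r") as file:
    for line in file:
        for word in line.split():
            words.add(process(word))   # pass the single word, not lineWords

print("Number of words used =", len(words))
print(words)
```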


Words Search in a txt document, python

I have this simple code that reads a txt file and accepts a word from the user to check whether that word is in the txt document. It looks like this works only for a single word. I need to modify the code so that the user can input two or more words, e.g. GOING HOME instead of just HOME. Any help is appreciated.
word = input('Enter any word that you want to find in text File :')
f = open("AM30.EB", "r")
if word in f.read().split():
    print('Word Found in Text File')
else:
    print('Word not found in Text File')
I'm not sure this is exactly what you are looking for
f = open("AM30.EB", "r")
word_list = []
while True:
    word = input('Enter any word that you want to find in text File or 1 to stop entering words :')
    if word == "1":
        break
    word_list.append(word)
file_list = f.read().split()
for word in word_list:
    if word in file_list:
        print("Found word - {}".format(word))
These are case-sensitive solutions!
All words in query separately:
words = input('Enter all words that you want to find in text File: ').split()
f_data = []
with open("AM30.EB", "r") as f:
    f_data = f.read().split()
results = list(map(lambda x: any([y == x for y in f_data]), words))
print("Found ", end="")
for i in range(len(words)):
    print(f"'{words[i]}'", end="")
    if i < len(words) - 1:
        print(" and ", end="")
print(f": {all(results)}")
Any word in query:
words = input('Enter any word that you want to find in the text File: ').split()
f_data = []
with open("AM30.EB", "r") as f:
    f_data = f.read().split()
results = list(map(lambda x: any([y == x for y in f_data]), words))
if any(results):
    for i in range(len(words)):
        print(f"Found '{words[i]}': {results[i]}")
Exact phrase in query:
phrase = input('Enter a phrase that you want to find in the text File: ')
f_data = ""
with open("AM30.EB", "r") as f:
    f_data = f.read()
print(f"Found '{phrase}': {f_data.count(phrase) > 0}")
This is case sensitive and checks for each word individually. Not sure if this is what you were looking for but hope it helps!
file1 = open('file.txt', 'r').read().split()
wordsFoundList = []
userInput = input('Enter any word or words that you want to find in text File :').split()
for word in userInput:
    if word in file1:
        wordsFoundList.append(word)
if len(wordsFoundList) == 0:
    print("No words found in text file")
else:
    print("These words were found in text file: " + str(wordsFoundList))
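The snippets above are all case-sensitive, as noted. A case-insensitive variant could lowercase both sides before comparing; this is a sketch where the sample file and the hard-coded query stand in for the real AM30.EB and the input() call:

```python
# Create a small stand-in for the real text file.
with open("words.txt", "w") as f:
    f.write("I am GOING home\n")

with open("words.txt", "r") as f:
    # Lowercase every word from the file once, up front.
    file_words = {w.lower() for w in f.read().split()}

# "going HOME" stands in for the user's input() call.
query = "going HOME".split()
found = [w for w in query if w.lower() in file_words]
print("Found:", found)
```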

I have 200 text files in Hindi. I want to remove whitespace and special characters, and find bigrams and trigrams, in Python

import os
dir = os.getcwd()
print(dir)
dir1 = os.path.join(dir, "test")
filename = os.listdir(dir1)
bad_chars = [';', ':', '!', "*", "#", "%"]
for i in filename:
    filepath = os.path.join(dir1, i)  # the path
    file = open(filepath, "r", encoding="utf8")  # open first text file
    read_ = file.read()
    fields = read_.split(" ")
    print(fields)
    file1 = open(filepath, "w", encoding="utf8")
    file2 = open(filepath, "a", encoding="utf8")
    for j in range(len(fields)):
        for p in bad_chars:
            fields[j].replace(i, ' ')
        file2.write(fields[j])
        print("Resultant list is : ", fields[j])
    file.close()
    file1.close()
    file2.close()
I am trying to remove the special characters from all 200 text files. I also need code for bigrams. For example, for the text "my name is eshan", the expected output is:

my, name occurs 1
name, is occurs 1
is, eshan occurs 1

The occurrence count can be more than 1, depending on the text.
Try this way:
for file in filename:
    filepath = os.path.join(dir1, file)
    with open(filepath, 'r', encoding='utf8') as f:
        texts = f.read()
    for c in bad_chars:
        texts = texts.replace(c, ' ')
    # write the cleaned text back to the file
    with open(filepath, 'w', encoding='utf8') as f:
        f.write(texts)
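None of the snippets above produce the bigrams and trigrams the question asks for. A minimal sketch using collections.Counter, assuming the files are already cleaned and can be whitespace-tokenized (the sample sentence here is made up):

```python
from collections import Counter

def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "my name is eshan my name is john".split()
bigrams = Counter(ngrams(tokens, 2))
trigrams = Counter(ngrams(tokens, 3))

for pair, count in bigrams.items():
    print(", ".join(pair), "occurs", count)
```

The same Counter approach works unchanged on Hindi tokens, since it only compares strings.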

censor word from text file and create new file

Using Python, I want to censor the word "winter" in a user-supplied file, "y.txt".
I wrote some code but I'm running into errors. Every occurrence of the word "winter" should be gone. Help, and thanks!
filename = input("Enter file name (without extension): ")
file1 = filename + ".txt"
file2 = filename + "_censored.txt"
word = input("Enter the word you are searching for: ")
# In this case, the input would be "winter"
print("\nLooping through the file, line by line.")
in_text_file = open(file1, "r")
out_text_file = open(file2, "w")
for line in in_text_file:
    print(line)
    out_text_file.write(line)
n = []
def censor(word, filename):
    for i in text.split(""):
        if i == word:
            i = "*" * len(word)
            n.append(i)
        else:
            n.append(i)
    return "".join(n)
censor(word, filename)
in_text_file.close()
out_text_file.close()
I'm getting errors.
The word winter is not quoted: censor(winter, filename) should be censor("winter", filename) or censor(word, filename).
Edit: you also need to open the file and read a line into the text variable.
Personally, I would use regular expressions to avoid gotchas with periods and other punctuation.

import re

then for each line:

line = re.sub(r'(^| )winter($| |\.|,|;)', ' ***** ', line)
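Along the lines of that regex suggestion, a self-contained sketch (assuming whole-word, case-sensitive matching is what is wanted, so "wintertime" is left alone):

```python
import re

def censor(text, word):
    # \b matches word boundaries, so matches next to punctuation work too,
    # and the word is replaced by the same number of asterisks.
    return re.sub(r'\b' + re.escape(word) + r'\b', '*' * len(word), text)

line = "I like winter, even in wintertime."
print(censor(line, "winter"))
```

The same function can be applied line by line while copying the input file into the "_censored" output file.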

How to search for string within another string?

I am trying to create a simple word search program.
I have successfully opened an external file that contains the grid of the word search. I also have successfully opened a file that contains the words that are to be searched for. I have stored every line of the grid in a list and every word from the file in a list called words[].
I am attempting to search for the words in each line of the grid. My code currently does not search for the word in each line of the grid.
gridlines_horizontal = []
gridlines_vertical = []
words = []
not_found = []
found_words = {}

def puzzle(fname):
    print ""
    for line in f:
        gridlines_horizontal.append(line)
    for line in gridlines_horizontal:
        print line,
    for item in zip(*(gridlines_horizontal[::-1])):
        gridlines_vertical.append(item)
Here I am trying to take each word in words[] one at a time and see whether it appears in any line of the word search grid. If the word is present in any line, I want to print it. The code currently does not do this.
def horizontal_search(word, gridlines_horizontal):
    x = 0
    for line in gridlines_horizontal:
        if words[0] in line or words[0] in line[::-1]:
            found_words.update({words[0]: " "})
            print words[0]
        else:
            not_found.append(words)
        x = x + 1

def vertical_search(word, gridlines_vertical):
    x = 0
    for line in gridlines_vertical:
        if words[x] in line or words[x] in line[::-1]:
            print words[0]
            found_words.update({words[x]: " "})
        else:
            not_found.append(words[x])
        x = x + 1
while True:
    try:
        fname = input("Enter a filename between double quotation marks: ")
        with open(fname) as f:
            puzzle(fname)
        break
    except IOError as e:
        print ""
        print("Problem opening file...")
        print ""

while True:
    try:
        fname2 = input("Enter a filename for your words between double quotation marks: ")
        with open(fname2) as f:
            for line in f:
                words.append(line)
            """ line in words:
            line = lin """
        break
    except IOError as e:
        print("")
        print("Problem opening file...")
There are a couple of mistakes in your code:
- You aren't being consistent in using words[x]; you would want to replace every words[0] with words[x], BUT
- this isn't necessary, because you can use nested for loops instead.
So for horizontal search:
def horizontal_search(words, gridlines_horizontal):
    for word in words:
        for line in gridlines_horizontal:
            if word in line or word in line[::-1]:
                found_words.update({word: " "})
                print(word)
                break
        else:
            not_found.append(word)
Did you look at find?
a = 'this is a string'
b = 'string'
if a.find(b) > -1:
    print 'found substring in string'
else:
    print 'substring not found in string'
EDIT:
I am not sure if it's a typo, but you are passing word as a parameter while the body uses words[0]:

def horizontal_search(word, gridlines_horizontal):   # parameter is 'word'
    x = 0
    for line in gridlines_horizontal:
        if words[0] in line or words[0] in line[::-1]:   # but 'words[0]' is used here

There is a similar issue with def vertical_search(words, gridlines_vertical):
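For completeness, here is a self-contained Python 3 sketch (the grid and word list are made up) of the nested-loop idea applied in both directions, using zip(*grid) to transpose the rows for the vertical pass:

```python
def search(words, grid):
    found = set()
    # zip(*grid) turns columns into rows, so the vertical pass
    # can reuse the same substring test as the horizontal one.
    vertical = ["".join(col) for col in zip(*grid)]
    for word in words:
        for line in grid + vertical:
            if word in line or word in line[::-1]:  # forwards or backwards
                found.add(word)
                break
    return found

# Row 0 spells "cat"; column 0 spells "cow".
grid = ["cat", "owx", "wyz"]
print(search(["cat", "cow", "tac", "dog"], grid))
```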

Replace four letter word in python

I am trying to write a program that opens a text document and replaces all four-letter words with **. I have been messing around with this program for multiple hours now and cannot seem to get anywhere. I was hoping someone could help me out. Here is what I have so far. Help is greatly appreciated!
def censor():
    filename = input("Enter name of file: ")
    file = open(filename, 'r')
    file1 = open(filename, 'w')
    for element in file:
        words = element.split()
        if len(words) == 4:
            file1 = element.replace(words, "xxxx")
            alist.append(bob)
    print(file)
    file.close()
Here is a revised version; I don't know if this is much better:
def censor():
    filename = input("Enter name of file: ")
    file = open(filename, 'r')
    file1 = open(filename, 'w')
    i = 0
    for element in file:
        words = element.split()
        for i in range(len(words)):
            if len(words[i]) == 4:
                file1 = element.replace(i, "xxxx")
            i = i + 1
    file.close()
for element in file:
    words = element.split()
    for word in words:
        if len(word) == 4:
            etc etc
Here's why: say the first line in your file is 'hello, my name is john'. Then, for the first iteration of the loop, element = 'hello, my name is john' and words = ['hello,', 'my', 'name', 'is', 'john']. You need to check what is inside each word, thus for word in words.
Also, it might be worth noting that your current method pays no attention to punctuation. Note the first word in words above...
To get rid of punctuation, rather say:

import string
blah blah blah ...
for word in words:
    cleaned_word = word.strip(string.punctuation)
    if len(cleaned_word) == 4:
        etc etc
Here is a hint: len(words) returns the number of words on the current line, not the length of any particular word. You need to add code that would look at every word on your line and decide whether it needs to be replaced.
Also, if the file is more complicated than a simple list of words (for example, if it contains punctuation characters that need to be preserved), it might be worth using a regular expression to do the job.
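Following that hint, a regex-based sketch (an illustration, not the poster's code: it replaces each standalone four-letter word with asterisks while leaving punctuation in place):

```python
import re

def censor_line(line):
    # \b[A-Za-z]{4}\b matches exactly four letters bounded by
    # non-word characters, so punctuation is preserved.
    return re.sub(r'\b[A-Za-z]{4}\b', '****', line)

print(censor_line("john said: stop here, not now!"))
```

Applied line by line while copying the file, this preserves commas, colons, and spacing that split()-based approaches tend to mangle.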
It can be something like this:

def censor():
    filename = input("Enter name of file: ")
    with open(filename, 'r') as f:
        lines = f.readlines()
    newLines = []
    for line in lines:
        words = line.split()
        for i, word in enumerate(words):
            if len(word) == 4:
                words[i] = '**'  # was ==, which compares instead of assigning
        newLines.append(' '.join(words))
    with open(filename, 'w') as f:
        for line in newLines:
            f.write(line + '\n')
def censor(filename):
    """Takes a file and writes it into file censored.txt with every 4-letter word replaced by xxxx"""
    infile = open(filename)
    content = infile.read()
    infile.close()
    outfile = open('censored.txt', 'w')
    # both maketrans arguments must be the same length, so use six blanks
    table = str.maketrans('.,;:!?', '      ')
    noPunc = content.translate(table)  # replace all punctuation marks with blanks, so they won't tie two words together
    wordList = noPunc.split(' ')
    for word in wordList:
        if '\n' in word:
            count = word.count('\n')
            wordLen = len(word) - count
        else:
            wordLen = len(word)
        if wordLen == 4:
            outfile.write('xxxx ')
        else:
            outfile.write(word + ' ')
    outfile.close()
