Checking if string is in text file is not working - python

I am writing in Python 3.6 and am having trouble making my code match strings in a short text document. This is a simple example of the exact logic that is breaking my bigger program:
PATH = "C:\\Users\\JoshLaptop\\PycharmProjects\\practice\\commented.txt"
file = open(PATH, 'r')
words = ['bah', 'dah', 'gah', "fah", 'mah']
print(file.read().splitlines())
if 'bah' not in file.read().splitlines():
    print("fail")
with the text document formatted like so:
bah
gah
fah
dah
mah
and it is indeed printing out fail each time I run this. Am I using the incorrect method of reading the data from the text document?

The issue is that the print(file.read().splitlines()) call reads the file to the end, so the next call to file.read().splitlines() returns an empty list.
A better way to "grep" for your pattern is to iterate over the file's lines instead of reading it fully. That way, if you find the string early in the file, you save time:
with open(PATH, 'r') as f:
    for line in f:
        if line.rstrip()=="bah":
            break
    else:
        # else is reached when no break is called from the for loop: fail
        print("fail")
The small catch here is to remember to call line.rstrip(), because iterating over a file yields each line with its line terminator. Also, if a line in your file has trailing spaces, this code will still match the word (use strip() instead if you also want to match lines with leading blanks).
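To see what the loop actually compares, here is a small sketch. It uses io.StringIO as a stand-in for the real file, and the sample content (with a trailing space and a leading blank) is made up for illustration:

```python
import io

# Stand-in for the text file; the contents are an assumption for this demo.
fake_file = io.StringIO("bah \n gah\nfah\n")

raw = [line for line in fake_file]                 # lines keep their "\n"
right_stripped = [line.rstrip() for line in raw]   # drops trailing blanks and "\n"
fully_stripped = [line.strip() for line in raw]    # drops leading blanks as well

print(raw)
print(right_stripped)
print(fully_stripped)
```

With rstrip() the second line is still " gah", so it would only match "gah" after a full strip().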
If you want to match a lot of words, consider creating a set of lines:
lines = {line.rstrip() for line in f}
so that each "in lines" test will be a lot faster.
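As a rough sketch of the set approach applied to the words list from the question: the file is read once, and every word is then checked against the set. io.StringIO stands in for open(PATH), and 'zah' is an extra word added here only to show a miss:

```python
import io

# Stand-in for open(PATH); the contents mirror the question's sample file.
fake_file = io.StringIO("bah\ngah\nfah\ndah\nmah\n")

# 'zah' is not in the question; it is added to demonstrate a failed lookup.
words = ['bah', 'dah', 'gah', 'fah', 'mah', 'zah']

# Read the file once and keep the stripped lines in a set.
lines = {line.rstrip() for line in fake_file}

missing = [word for word in words if word not in lines]
print(missing or "all words found")
```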

Try it:
PATH = "C:\\Users\\JoshLaptop\\PycharmProjects\\practice\\commented.txt"
file = open(PATH, 'r')
words = file.read().splitlines()
print(words)
if 'bah' not in words:
    print("fail")

You can't read the file twice without rewinding it.
When you call print(file.read().splitlines()), the whole file is read, and the next call to file.read() returns an empty string because you are already at the end of the file.
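The behaviour can be reproduced with a minimal sketch. io.StringIO stands in for the opened file here, and seek(0) is shown as one possible way to rewind (it is not something the question's code uses):

```python
import io

# Stand-in for the opened file; contents mirror part of the question's sample.
f = io.StringIO("bah\ngah\nfah\n")

first = f.read().splitlines()    # consumes the whole stream
second = f.read().splitlines()   # nothing left, so read() returns ""

f.seek(0)                        # rewind to the beginning
third = f.read().splitlines()    # the content is available again

print(first, second, third)
```

Storing the result of the first read in a variable, as in the answer above, is usually simpler than rewinding.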

PATH = "your_file"
file = open(PATH, 'r')
words = ['bah', 'dah', 'gah', "fah", 'mah']
if 'bah' not in (file.read().splitlines()):
    print("fail")
As you can see, the output is not 'fail'. You must call file.read().splitlines() only once in your code, or save its result in a variable; otherwise you get the 'fail' message.

Related

Add 1 word after readlines()

I am still learning Python and have a question about the function readlines(). The following is a part of my script:
f = open("demofile.txt", "r")
text = "".join(f.readlines())
print(text)
demofile.txt contains:
This is the first line
This is the second line
This is the third line
Now I want to add a single word to this so I get:
This is the first line
This is the second line
This is the third line
Example
I thought of an easy way of doing it:
f = open("demofile.txt", "r")
text = "".join(f.readlines())."Example"
print(text)
But that doesn't work (of course). I googled and looked around here but didn't really have good keywords to search for this issue. Hopefully someone can point me in the right direction.
.readlines() returns a list, so you can append() to it:
with open("demofile.txt") as txt:
    lines = txt.readlines()
    lines.append("Example")
    text = "".join(lines)
print(text)
Or, since the file object txt is an iterator, you can unpack it into a new list together with the word you want to add:
with open("demofile.txt") as txt:
    text = "".join([*txt, "Example"])
print(text)
Firstly, the open function in python opens a file in read mode by default. Thus, you do not need to specify the mode r when opening the file. Secondly, you should always close a file after you are done with it. A with statement in python handles this for you. Moreover, instead of using . to add Example onto the end of the string, you should use the concatenation operator in python to add a newline character, \n, and the string, Example.
with open("demofile.txt") as f:
    text = "".join(f.readlines()) + "\nExample"
print(text)
This should help you. When dealing with files, it is always recommended to use with open('filename','r') as f instead of f=open('filename','r'). Using a context manager to open the file guarantees that the file is closed in any case, whether everything goes fine or an exception is raised, so you don't need to call f.close() explicitly.
end_text='Example'
with open('test.txt','r') as f:
    text=''.join(f.readlines())+'\n'+end_text
print(text)
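Whether an extra "\n" is needed depends on whether the last line of the file already ends with a newline. A small sketch that handles both cases follows; append_line is a hypothetical helper name and the sample strings are made up:

```python
def append_line(text, word):
    """Return text with word added as a new final line (hypothetical helper)."""
    # Add a separator only when the text does not already end with a newline.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + word


with_newline = "first line\nsecond line\n"
without_newline = "first line\nsecond line"

print(append_line(with_newline, "Example"))
print(append_line(without_newline, "Example"))
```

Both calls print the same three lines, so the word never ends up glued to the last line or separated from it by a blank line.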

Reading through a .m File and Python keeps reading a character in the .m File as a line?

I am trying to read the text within a .m file in Python and Python keeps reading a single character within the .m file as a line when I use file.readline(). I've also had issues with trying to remove certain parts of the line before adding it to a list.
I've tried adjusting where the readline is on for loops that I have set up since I have to read through multiple files in this program. No matter where I put it, the string always comes out separated by character. I'm new to Python so I'm trying my best to learn what to do.
# Example of what I did
with open('MyFile.m') as f:
    for line in f:
        text = f.readline()
        if text.startswith('%'):
            continue
        else:
            my_string = text.strip("=")
            my_list.append(my_string)
This has only partially worked: it still returns parts of lines that I do not want, and when I tried to format the output by putting spaces between new lines, it came out like so:
Expected: "The String"
What happened: "T h e S t r i n g"
Without your input file I've had to make some guesses here
Input file:
%
The
%
String
%
Solution:
my_list = []
with open('MyFile.m') as f:
    for line in f:
        if not line.startswith('%'):
            my_list.append(line.strip("=").strip())
print(' '.join(my_list))
The readline() call was unnecessary, as the for loop already gets you the line. The empty if was negated to only catch the part that you cared about. Without your actual input file I can't help with the '=' part. If you have any clarifications I'd be glad to help further.
As suggested by Xander, you shouldn't call readline since the for line in f does that for you.
my_list = []
with open('MyFile.m') as f:
    for line in f:
        line = line.strip() # lose the \n if you want to
        if line.startswith('%'):
            continue
        else:
            my_string = line.strip("=")
            my_list.append(my_string)
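Two effects may be behind the symptoms in the question, although this is a guess without the real file. Joining a single string (rather than a list of strings) puts the separator between every character, and calling readline() inside the for loop consumes every other line. A sketch with made-up input, using io.StringIO in place of the .m file:

```python
import io

# Joining a list of strings versus joining one string.
words_joined = " ".join(["The", "String"])
chars_joined = " ".join("The String")   # iterates over the characters
print(words_joined)
print(chars_joined)

# Calling readline() inside the loop: the loop and readline() share one position.
f = io.StringIO("a\nb\nc\nd\n")
seen_by_loop = []
seen_by_readline = []
for line in f:
    seen_by_loop.append(line)
    seen_by_readline.append(f.readline())
print(seen_by_loop, seen_by_readline)
```

In this sketch the loop variable only ever sees "a" and "c", while readline() returns "b" and "d".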

How do I compare a word from a text file?

I have a text file like below:
/john
/peter
/Sam
/Jennefer
Using the following script:
keyword_file = open(text_file)
j = keyword_file.readlines()
for i in range(len(j)):
    if j[i] == "/peter":
        print "yes"
Although /peter is in the text file, I don't get the printed yes. However, when I delete the "/"s, "yes" is printed. What is the problem?
First off, you're not just looking for /peter, you're looking for /peter\n.
Second, there's a lot here that you can do to improve your script:
Use with instead of forcing yourself to open and close your file:
with open(text_file) as fp:
    <your code here>
Instead of reading the entire file, read it line by line:
for line in fp:
    <your business logic here>
Compare your strings using ==, not is. An earlier version of this answer suggested is, which was wrong: is tests object identity rather than equality. See this SO answer for why.
if line == '/peter\n':
    <condition if peter is found>
Here's the combined script that matches what you're trying to do:
with open(text_file) as fp:
    for line in fp:
        if line == '/peter\n':
            print("yes") # please use print(<what you want to print here>) instead of print <what you want here> for compatibility with 3.0 and readability.
The problem here is that you are looking for an exact match on the whole line. This includes any special characters that may be present, such as a newline character.
If you instead read the text, split it by line, and iterate over the result, your code would work:
result = keyword_file.read()
for line in result.split('\n'):
    if line == "/peter":
        print("yes")
As an alternative you could use
for line in keyword_file:
    if line.startswith("/peter"): # or "/peter" in line
        print("yes")
If you want to avoid storing the whole file in memory and still have a clean if statement, you can use strip() to remove any surrounding whitespace, including the newline.
with open(file_name) as file_obj:
    for line in file_obj:
        if line.strip() == '/peter':
            print("yes")
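Printing the repr() of a line makes the hidden newline visible, which is often the quickest way to diagnose this kind of mismatch. A minimal sketch, with io.StringIO standing in for the keyword file from the question:

```python
import io

# Stand-in for the keyword file; contents mirror the question's sample.
keyword_file = io.StringIO("/john\n/peter\n/Sam\n/Jennefer\n")
lines = keyword_file.readlines()

print(repr(lines[1]))   # the trailing newline shows up as \n

# The exact comparison fails because the stored line still ends with "\n".
exact_match = "/peter" in lines
# Stripping each line first makes the comparison succeed.
stripped_match = "/peter" in [line.strip() for line in lines]

print(exact_match, stripped_match)
```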

How can I read a file to a string starting at a given word without knowing the line number?

I have test results in a log file that are formatted like:
useless info
useless info
======================
useful info
useful info
======================
test success
The number of lines in each section can vary, so I want to check for the first appearance of the double equal character '==' and read that line until the end of the file into a string. Currently I'm using the following code to read the whole file into the string.
with open("Report.txt", "r") as myfile:
    data = myfile.read()
Thanks for the help!
useful = []
with open("Report.txt", "r") as myfile:
    for line in myfile:
        if "===" in line:
            break
    for line in myfile:
        useful.append(line)
a_string = "".join(useful)
I would however prefer to hide it away in a generator, like this:
def report_iterator():
    with open("Report.txt", "r") as myfile:
        for line in myfile:
            if "===" in line:
                break
        for line in myfile:
            yield line
for line in report_iterator():
    pass  # do stuff with line
All the filtering and nitpicking is done in the generator function, and you can separate the logic of "filtering input" from the logic of "working with the input".
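A related sketch uses itertools.dropwhile from the standard library instead of a hand-written generator. Unlike the loops above, it keeps the delimiter line itself, which is what the question literally asks for. io.StringIO stands in for Report.txt, and the sample text is adapted from the question:

```python
import io
from itertools import dropwhile

# Stand-in for Report.txt; the contents are adapted from the question.
report = io.StringIO(
    "useless info\n"
    "======================\n"
    "useful info\n"
    "======================\n"
    "test success\n"
)

# Skip lines until the first one containing "==", then yield everything else.
tail = dropwhile(lambda line: "==" not in line, report)
a_string = "".join(tail)
print(a_string)
```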
You could read line by line and, by default, not store the lines. Once you reach the line starting with '==', store every line you read from then on, until the second '==' line, in your string or list.
If you've got the whole file in memory, you can get "everything but the first section" like this:
useful = data.split('======================\n',1)[1]
That splits the data on the first occurrence of your delimiter, returning everything after the delimiter.
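Note that indexing the split result with [1] raises an IndexError when the delimiter never appears. One possible alternative, sketched here with sample data adapted from the question, is str.partition, which always returns three parts:

```python
data = (
    "useless info\n"
    "======================\n"
    "useful info\n"
    "======================\n"
    "test success\n"
)
delimiter = "======================\n"

# partition() returns (before, separator, after); separator is "" if not found.
before, sep, after = data.partition(delimiter)
useful = after if sep else ""

print(useful)
```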
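myfile = open("Report.txt", "r")
line = myfile.readline()
while line and line[:2] != '==':  # stop at the delimiter line or at end of file
    line = myfile.readline()
data = myfile.read()  # everything after the first delimiter line
myfile.close()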

Remove whitespaces in the beginning of every string in a file in python?

How to remove whitespaces in the beginning of every string in a file with python?
I have a file myfile.txt with the strings as shown below in it:
_ _ Amazon.inc
Arab emirates
_ Zynga
Anglo-Indian
Those underscores are spaces.
The code must go through every line of the file and remove all the whitespace at the beginning of each line.
I've tried using lstrip, but that's not working for multiple lines, and neither is readlines().
Would using a for loop make it better?
All you need to do is read the lines of the file one by one and remove the leading whitespace from each line. After that, you can join the lines again and you'll get back the original text without the leading whitespace:
with open('myfile.txt') as f:
    line_lst = [line.lstrip() for line in f.readlines()]
    lines = ''.join(line_lst)
print(lines)
Assuming that your input data is in infile.txt, and you want to write this file to output.txt, it is easiest to use a list comprehension:
inf = open("infile.txt")
stripped_lines = [l.lstrip() for l in inf.readlines()]
inf.close()
# write the new, stripped lines to a file
outf = open("output.txt", "w")
outf.write("".join(stripped_lines))
outf.close()
To read the lines from myfile.txt and write them to output.txt, use
with open("myfile.txt") as input:
    with open("output.txt", "w") as output:
        for line in input:
            output.write(line.lstrip())
That will make sure that you close the files after you're done with them, and it'll make sure that you only keep a single line in memory at a time.
The above code works in Python 2.5 and later because of the with keyword. For Python 2.4 you can use
input = open("myfile.txt")
output = open("output.txt", "w")
for line in input:
    output.write(line.lstrip())
if this is just a small script where the files will be closed automatically at the end. If this is part of a larger program, then you'll want to explicitly close the files like this:
input = open("myfile.txt")
try:
    output = open("output.txt", "w")
    try:
        for line in input:
            output.write(line.lstrip())
    finally:
        output.close()
finally:
    input.close()
You say you already tried with lstrip and that it didn't work for multiple lines. The "trick" is to run lstrip on each individual line, like I do above. You can try the code out online if you want.
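One detail worth knowing: lstrip() with no argument also strips newline characters, so a line that contains only whitespace collapses to an empty string and disappears from the output. If blank lines should be preserved, passing the characters to strip is one option. A sketch with made-up sample text (the whitespace-only second line is an assumption, and io.StringIO stands in for myfile.txt):

```python
import io

# Stand-in for myfile.txt; the whitespace-only second line is an assumption.
source = io.StringIO("  Amazon.inc\n   \n Zynga\n")
lines = source.readlines()

default_strip = [line.lstrip() for line in lines]        # blank line becomes ""
spaces_only = [line.lstrip(" \t") for line in lines]     # blank line keeps its "\n"

print(default_strip)
print(spaces_only)
```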
