How do I compare a word from a text file?


I have a text file like below:
/john
/peter
/Sam
/Jennefer
Using the following script:
keyword_file = open(text_file)
j = keyword_file.readlines()
for i in range(len(j)):
    if j[i] == "/peter":
        print "yes"
Although /peter is in the text file, "yes" is not printed. However, when I delete the "/"s, "yes" is printed. What is the problem?

First off, you're not just looking for /peter; you're looking for /peter\n.
Second, there's a lot you can do to improve your script:
Use with instead of forcing yourself to open and close your file:
with open(text_file) as fp:
    <your code here>
Instead of reading the entire file, read it line by line:
for line in fp:
    <your business logic here>
Compare your string using ==, not is. (An earlier version of this answer suggested is; that was wrong, because is tests object identity rather than equality.)
if line == '/peter\n':
    <condition if peter is found>
Here's the combined script that matches what you're trying to do:
with open(text_file) as fp:
    for line in fp:
        if line == '/peter\n':
            # use print(...) rather than the print statement for Python 3 compatibility and readability
            print("yes")
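To see the trailing newline for yourself, it can help to look at the repr() of a line. The following is only a minimal sketch: the temporary file, its contents, and the helper name has_keyword are made up for illustration and are not part of the original script.

```python
import os
import tempfile


def has_keyword(path, keyword):
    """Return True if any line of the file equals keyword, ignoring the line ending."""
    with open(path) as fp:
        for line in fp:
            # rstrip('\r\n') removes only the line terminator, not other whitespace
            if line.rstrip('\r\n') == keyword:
                return True
    return False


# Build a small throwaway file like the one in the question (assumed contents).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('/john\n/peter\n/Sam\n/Jennefer\n')
    sample_path = tmp.name

with open(sample_path) as fp:
    raw_lines = fp.readlines()

naive_match = raw_lines[1] == '/peter'        # the newline is still part of the string
found = has_keyword(sample_path, '/peter')
missing = has_keyword(sample_path, '/pete')

print(repr(raw_lines[1]), naive_match, found, missing)

os.remove(sample_path)
```

Stripping only '\r\n' keeps the comparison strict about other whitespace; use strip() instead if stray spaces should be ignored as well.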

The problem here is that you are looking for an exact match on the whole line. This includes any control characters in the line, such as the trailing newline.
If you instead read the text, split it by line, and iterate over the result, your code would work:
result = keyword_file.read()
for line in result.split('\n'):
    if line == "/peter":
        print("yes")
As an alternative you could use a looser test (note that it also matches longer lines such as /peterson):
for line in keyword_file:
    if line.startswith("/peter"):  # or: "/peter" in line
        print("yes")
If you want to avoid storing the whole file in memory and still have a clean if statement, you can use strip() to remove any surrounding whitespace, including the newline.
with open(file_name) as file_obj:
    for line in file_obj:
        if line.strip() == '/peter':
            print("yes")
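The three tests mentioned above do not accept the same lines. Here is a small sketch of the difference; the function names and the sample lines are invented for illustration.

```python
def exact_match(line, keyword):
    """True only when the whole line, minus surrounding whitespace, is the keyword."""
    return line.strip() == keyword


def prefix_match(line, keyword):
    """True when the line starts with the keyword; longer words also pass."""
    return line.startswith(keyword)


def substring_match(line, keyword):
    """True when the keyword appears anywhere in the line."""
    return keyword in line


sample_lines = ['/peter\n', '/peterson\n', 'home/peter\n']  # made-up sample lines
for line in sample_lines:
    print(repr(line),
          exact_match(line, '/peter'),
          prefix_match(line, '/peter'),
          substring_match(line, '/peter'))
```

Which one is right depends on whether partial matches should count; for a list of exact keywords, the strip() comparison is usually the safest choice.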

Related

Reading through a .m File and Python keeps reading a character in the .m File as a line?

I am trying to read the text within a .m file in Python and Python keeps reading a single character within the .m file as a line when I use file.readline(). I've also had issues with trying to remove certain parts of the line before adding it to a list.
I've tried adjusting where the readline is on for loops that I have set up since I have to read through multiple files in this program. No matter where I put it, the string always comes out separated by character. I'm new to Python so I'm trying my best to learn what to do.
# Example of what I did
with open('MyFile.m') as f:
    for line in f:
        text = f.readline()
        if text.startswith('%'):
            continue
        else:
            my_string = text.strip("=")
            my_list.append(my_string)
This has only partially worked: it still returns parts of lines that I do not want, and when I tried to format the output by putting spaces between lines, it came out like so:
Expected: "The String"
What happened: "T h e S t r i n g"

Without your input file, I've had to make some guesses here.
Input file:
%
The
%
String
%
Solution:
my_list = []
with open('MyFile.m') as f:
    for line in f:
        if not line.startswith('%'):
            # strip whitespace first so that a trailing '=' is not hidden behind the newline
            my_list.append(line.strip().strip("="))
print(' '.join(my_list))
The readline() call was unnecessary because the for loop already gives you each line. The empty if branch was negated so that only the lines you care about are handled. Without your actual input file I can't help with the '=' part. If you have any clarifications, I'd be glad to help further.
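One plausible cause of the spaced-out "T h e S t r i n g" output is joining a string instead of a list of strings. This is an assumption, since the question does not show the formatting code. A short sketch of that behaviour, and of what strip("=") actually removes:

```python
text = "The String"

# Joining a string iterates over its characters, so every character gets a separator.
spaced_chars = ' '.join(text)

# Joining a list of strings puts the separator between whole items.
spaced_words = ' '.join(["The", "String"])

# strip("=") only removes '=' at the two ends of the string, never in the middle.
stripped = "=name=value=\n".strip().strip("=")

print(repr(spaced_chars))
print(repr(spaced_words))
print(repr(stripped))
```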

As suggested by Xander, you shouldn't call readline since the for line in f does that for you.
my_list = []
with open('MyFile.m') as f:
    for line in f:
        line = line.strip()  # lose the \n if you want to
        if line.startswith('%'):
            continue
        else:
            my_string = line.strip("=")
            my_list.append(my_string)

Checking if string is in text file is not working

I am writing in Python 3.6 and am having trouble making my code match strings in a short text document. This is a simple example of the exact logic that is breaking my bigger program:
PATH = "C:\\Users\\JoshLaptop\\PycharmProjects\\practice\\commented.txt"
file = open(PATH, 'r')
words = ['bah', 'dah', 'gah', "fah", 'mah']
print(file.read().splitlines())
if 'bah' not in file.read().splitlines():
    print("fail")
with the text document formatted like so:
bah
gah
fah
dah
mah
and it is indeed printing out fail each time I run this. Am I using the incorrect method of reading the data from the text document?

The issue is that you're calling print(file.read().splitlines()) first.
That exhausts the file, so the next call to file.read().splitlines() returns an empty list.
A better way to "grep" for your pattern is to iterate over the file's lines instead of reading the whole file. If you find the string early in the file, you save time:
with open(PATH, 'r') as f:
    for line in f:
        if line.rstrip() == "bah":
            break
    else:
        # else is reached when no break is called from the for loop: fail
        print("fail")
The small catch here is not to forget line.rstrip(), because iterating over the file yields each line with its line terminator. Also, if a line has trailing spaces, this code will still match the word (use strip() if you also want to match lines with leading blanks).
If you want to match a lot of words, consider creating a set of lines:
lines = {line.rstrip() for line in f}
so your in lines checks will be a lot faster.
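As a rough sketch of the set approach for several words: io.StringIO stands in for the opened file so the example runs on its own, and the extra word 'zah' is added only to show a miss.

```python
import io

# io.StringIO stands in for the opened text file (assumed contents from the question).
f = io.StringIO("bah\ngah\nfah\ndah\nmah\n")

words = ['bah', 'dah', 'gah', 'fah', 'mah', 'zah']  # 'zah' is a made-up word that is absent

# Read the file once; membership tests on a set take constant time on average.
lines = {line.rstrip() for line in f}

missing_words = [word for word in words if word not in lines]
print(missing_words)
```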

Try it:
PATH = "C:\\Users\\JoshLaptop\\PycharmProjects\\practice\\commented.txt"
file = open(PATH, 'r')
words = file.read().splitlines()
print(words)
if 'bah' not in words:
    print("fail")

You can't read the file twice without rewinding it.
When you do print(file.read().splitlines()), the whole file is read, and the next call to file.read() returns an empty string because you are already at the end of the file.
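A minimal sketch of this behaviour, using io.StringIO in place of a real file so it is self-contained; a file object opened with open() has the same read() and seek() methods.

```python
import io

f = io.StringIO("bah\ngah\n")  # stand-in for the opened file

first = f.read().splitlines()   # consumes the whole stream
second = f.read().splitlines()  # the position is already at the end, so nothing is left
f.seek(0)                       # rewind to the start
third = f.read().splitlines()   # the content is available again

print(first, second, third)
```

Storing the result of the first read in a variable is usually simpler than rewinding.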

PATH = "your_file"
file = open(PATH, 'r')
words = ['bah', 'dah', 'gah', "fah", 'mah']
if 'bah' not in file.read().splitlines():
    print("fail")
As you can see, the output is no longer 'fail'. Call file.read().splitlines() only once in your code, or save its result in a variable; otherwise you get the 'fail' message.

Add lines to text file after occurrence of certain text

I have a text file that I need to manipulate. I want to add a line after each occurrence of the word "exactarch". That is, whenever "exactarch" occurs, I want to add text on the next line.
E.g. If this is the original file content,
[main]
cachedir=/var/cache/yum
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
distroverpkg=redhat-release
tolerant=1
exactarch=1
gpgcheck=1
plugins=1
I want to change it as below:
[main]
cachedir=/var/cache/yum
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
distroverpkg=redhat-release
tolerant=1
exactarch=1
obsoletes=1
gpgcheck=1
plugins=1
This is what I tried to do:
with open('file1.txt') as f:
    for line in input_data:
        if line.strip() == 'exactarch':
            f.write('obsoletes=1')
Obviously this is not working, as I can't figure out how to locate that line and write after it.

You ask for a Python solution, but tasks like this are made to be solved using simpler tools.
If you are using a system that has sed, you can do this in a simple one-liner:
$ sed '/exactarch/aobsoletes=1' < in.txt
What does this mean?
sed: the executable
/exactarch/: matches all lines that contain exactarch
a: after the current line, append a new line with the following text
obsoletes=1: the text to append in a new line
Output:
[main]
cachedir=/var/cache/yum
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
distroverpkg=redhat-release
tolerant=1
exactarch=1
obsoletes=1
gpgcheck=1
plugins=1
Edit:
To modify the file in place, use the option -i and pass the file as an argument (this is the GNU sed form; BSD/macOS sed expects a backup suffix argument after -i):
$ sed -i '/exactarch/aobsoletes=1' in.txt

Simple: read all lines, find the matching line, and insert the desired line after it. Then write the resulting lines to a file.
with open('lines.txt') as f:
    lines = f.readlines()
# index() raises ValueError if the exact line 'exactarch=1\n' is not present
lines.insert(lines.index('exactarch=1\n') + 1, 'obsoletes=1\n')
with open('dst.txt', 'w') as f:
    for l in lines:
        f.write(l)

The past says it's pretty simple - replacing words in files is not a new thing.
If you want to replace a word, you can use the solution implemented there. In your context:
import fileinput
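for line in fileinput.input(fileToSearch, inplace=True):  # fileToSearch is the path of your file
    # match the whole line so the inserted text does not land in the middle of "exactarch=1"
    print(line.replace("exactarch=1\n", "exactarch=1\nobsoletes=1\n"), end='')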

I am hesitant to use fileinput, because if something goes wrong during the 'analysis' phase you are left with the file in whatever state it was in at the moment of the failure. I would read everything in, and then do all the work on it. The code below ensures that:
Your inserted value ends with a newline '\n' if it's not going to be the last item.
No duplicate value is inserted, by checking the line below the match.
All lines are checked, in case multiple "exactarch=1"s were added since the snippet last ran.
Hope this helps, albeit not as stylish as a one/two liner.
with open('test.txt') as f:
    data = f.readlines()
insertValue = 'obsoletes=1'
for point, item in enumerate(data):
    if item.rstrip() == 'exactarch=1':  # find it whether it's in the middle or the last line (i.e. no '\n')
        if point + 1 == len(data):  # exactarch=1 is the last line, so the inserted value needs no trailing '\n'
            if not item.endswith('\n'):
                data[point] = item + '\n'  # terminate the last line so the two values don't run together
            data.insert(point + 1, insertValue)
        else:
            if data[point + 1].rstrip() != insertValue:  # make sure the value isn't already below exactarch=1
                data.insert(point + 1, insertValue + '\n')
                print('insertValue added below "exactarch=1"')
            else:
                print('insertValue already exists below exactarch=1')
with open('test.txt', 'w') as f:
    f.writelines(data)
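The same idea can be sketched as a function that works on a list of lines and returns a new list, which avoids modifying the list while looping over it. The name insert_after and the sample data are illustrative assumptions, not part of the answers above.

```python
def insert_after(lines, target, new_line):
    """Return a new list with new_line inserted after each line equal to target.

    Lines are compared without their line ending. The insert is skipped
    when the following line already equals new_line.
    """
    result = []
    for index, line in enumerate(lines):
        result.append(line)
        if line.rstrip('\r\n') != target:
            continue
        following = lines[index + 1].rstrip('\r\n') if index + 1 < len(lines) else None
        if following == new_line:
            continue  # already present, do not duplicate it
        if not line.endswith('\n'):
            # the target was the last line and had no terminator; add one
            result[-1] = line + '\n'
        result.append(new_line + '\n')
    return result


sample = ['tolerant=1\n', 'exactarch=1\n', 'gpgcheck=1\n']  # made-up excerpt
updated = insert_after(sample, 'exactarch=1', 'obsoletes=1')
print(''.join(updated), end='')
```

Reading with readlines(), calling the function, and writing back with writelines() would complete the round trip.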

Comparing Two Text Files, Removing the duplicate lines, and Writing results to a new text file

I have two text files (that are not equal in number of lines/size). I would like to compare each line of the shorter text file with every line of the longer text file. As it compares, if there are any duplicate strings, I would like to have those removed. Lastly, I would like to write the result to a new text file and print the contents.
Is there a simple script that can do this for me?
Any help would be much appreciated.
The text files are not very large. One has about 10 lines and the other has about 5. The code I have tried (that failed miserably) is below:
for line in file2:
    line1 = line
    for line in file1:
        requested3 = file('request2.txt','a')
        if fnmatch.fnmatch(line1,line):
            line2 = line.replace(line,"")
            requested3.write(line2)
        if not fnmatch.fnmatch(line1,line):
            requested3.write(line+'\n')
        requested3.close()

with open(longfilename) as longfile, open(shortfilename) as shortfile, open(newfilename, 'w') as newfile:
    longlines = set(longfile)  # build the set once; calling set(longfile) per line would find the file already exhausted
    newfile.writelines(line for line in shortfile if line not in longlines)
It's as simple as that. This copies each line of shortfile to newfile unless it also exists in longfile. Only the lines of longfile are held in memory (as a set); shortfile is streamed line by line.
If you're on Python 2.6 or older, you would need to nest the with statements:
with open(longfilename) as longfile:
    with open(shortfilename) as shortfile:
        with open(newfilename, 'w') as newfile:
If you're on Python 2.5, you need to either:
from __future__ import with_statement
at the very top of your file, or just use
longfile = open(longfilename)
etc. and close each file yourself.
If you need to manipulate the lines, an explicit for loop is fine; the important part is set(). Looking up an item in a set is fast, while looking up a line in a long list is slow.
longlines = set(line.strip() for line in longfile)  # strip() stands in for whatever normalization you need
for line in shortfile:
    if line.strip() not in longlines:  # normalize the same way before the lookup
        newfile.write(line)
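Put together as a self-contained sketch: io.StringIO objects stand in for the two files, and the function name unique_lines, the normalize parameter, and the sample contents are assumptions made for illustration.

```python
import io


def unique_lines(short_lines, long_lines, normalize=str.strip):
    """Yield lines from short_lines whose normalized form is absent from long_lines."""
    seen = {normalize(line) for line in long_lines}  # built once, before the loop
    for line in short_lines:
        if normalize(line) not in seen:
            yield line


short_file = io.StringIO("apple\nbanana\ncherry\n")
long_file = io.StringIO("banana \ndate\napple\nelderberry\n")

result = list(unique_lines(short_file, long_file))
print(result)
```

Because the normalization is applied on both sides, "banana " with a trailing space in the long file still counts as a duplicate of "banana".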

Assuming the files are both plain text, each string is on a new line delimited with \n newline characters:
small_file = open('file1.txt','r')
long_file = open('file2.txt','r')
output_file = open('output_file.txt','w')
try:
    small_lines = small_file.readlines()
    long_lines = long_file.readlines()
    small_lines_cleaned = [line.rstrip().lower() for line in small_lines]
    long_lines_cleaned = [line.rstrip().lower() for line in long_lines]
    for line in small_lines_cleaned:
        if line not in long_lines_cleaned:
            output_file.write(line + '\n')
finally:
    small_file.close()
    long_file.close()
    output_file.close()
Explanation:
Since you can't get 'with' statements working, we open the files first using regular open functions, then use a try...finally clause to close them at the end of the program.
We take the small file and the long file and first remove any trailing '\n' (newline) characters with .rstrip(), then make all the characters lower-case with .lower(). If you have two sentences identical in every aspect except that one has upper-case letters and the other doesn't, they won't match. Forcing them to lower case avoids that; if you prefer a case-sensitive compare, remove the .lower() method.
We go line by line in small_lines_cleaned (for line in...) and see if it is in the larger file.
Output each line if it is not in the longer file; we add the '\n' newline character so that each line will appear on a new line, insteadOfOneGiantLongSetOfStrings

I'd use difflib, it makes it easy to do comparisons/diffs. There is a nice tutorial for it here. If you just wanted the lines that were unique to the shorter file:
from difflib import ndiff
short = open('short.txt').readlines()
long = open('long.txt').readlines()
with open('unique.txt', 'w') as f:
    f.write(''.join(x[2:] for x in ndiff(short, long) if x.startswith('-')))

Your code as it stands checks each line against the line in the other file. But that's not what you want. For each line in the first file, you need to check whether any line in the other file matches and then print it out if there are no matches.

The following code reads file two and checks file one against it. Anything that's in file one but not in file two gets printed and also written to a new text file.
If you wanted the opposite, you'd replace "not in" with "in" in the if statement below, so that it prints anything that's in both file one and file two.
It works by putting the lines of the shorter file (file two) in a variable and then reading the longer file (file one) line by line. Each line is checked against the variable and is either written or not written to the text file according to its presence there.
fileOne = open("LONG FILE.ext","r")
fileTwo = open("SHORT FILE.ext","r")
fileThree = open("Results.txt","a+")
contents = fileTwo.read().splitlines()
for line in fileOne:
    # compare without the trailing newline
    if line.rstrip("\n") not in contents:
        print(line.rstrip("\n"))
        fileThree.write(line)
fileOne.close()
fileTwo.close()
fileThree.close()

Delete newline / return carriage in file output

I have a wordlist that contains returns to separate each new letter. Is there a way to programmatically delete each of these returns using file I/O in Python?
Edit: I know how to manipulate strings to delete returns. I want to physically edit the file so that those returns are deleted.
I'm looking for something like this:
wfile = open("wordlist.txt", "r+")
for line in wfile:
    if len(line) == 0:
        # note, the following is not real... this is what I'm aiming to achieve.
        wfile.delete(line)

>>> string = "testing\n"
>>> string
'testing\n'
>>> string = string[:-1]
>>> string
'testing'
This basically says "chop off the last character of the string", whatever that character is. The : is the "slice" operator. It would be a good idea to read up on how it works, as it is very useful.
EDIT
I just read your updated question. I think I understand now. You have a file, like this:
aqua:test$ cat wordlist.txt
Testing
This
Wordlist
With
Returns
Between
Lines
and you want to get rid of the empty lines. Instead of modifying the file while you're reading from it, create a new file and write the non-empty lines from the old file into it, like so:
# script
rf = open("wordlist.txt")
wf = open("newwordlist.txt","w")
for line in rf:
    newline = line.rstrip('\r\n')
    if newline:  # skip empty lines
        wf.write(newline)
        wf.write('\n')  # remove to leave out line breaks
rf.close()
wf.close()
You should get:
aqua:test$ cat newwordlist.txt
Testing
This
Wordlist
With
Returns
Between
Lines
If you want something like
TestingThisWordlistWithReturnsBetweenLines
just comment out
wf.write('\n')

You can use a string's rstrip method to remove the newline characters from a string.
>>> 'something\n'.rstrip('\r\n')
'something'

The simplest approach is to not specify a strip value:
'\nsomething\n'.strip() removes all leading and trailing whitespace, including newlines, from the string.

Simply use the following; it solves the issue:
string.strip("\r\n")
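The string methods mentioned in these answers behave differently on surrounding whitespace. A short side-by-side sketch, using a made-up sample string:

```python
raw = '  word \r\n'  # made-up sample with leading spaces, a trailing space and a CRLF ending

only_newlines = raw.rstrip('\r\n')  # removes only trailing '\r' and '\n'; spaces are kept
right_side = raw.rstrip()           # removes all trailing whitespace
both_sides = raw.strip()            # removes leading and trailing whitespace
pieces = raw.split()                # splits on whitespace and returns a list, not a string

print(repr(only_newlines), repr(right_side), repr(both_sides), pieces)
```

None of these touch the file itself; they return new strings (or a list), which then have to be written back out.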

Remove empty lines in the file:
#!/usr/bin/env python
import fileinput
for line in fileinput.input("wordlist.txt", inplace=True):
    if line != '\n':
        print(line, end='')  # Python 3 form; in Python 2 write: print line,
The file is moved to a backup file and standard output is directed to the input file.

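In Python 2:
'whatever\r\r\r\r\r\r\r\r\n\n\n\n\n'.translate(None, '\r\n')
returns
'whatever'
In Python 3, str.translate takes a single table argument, so the equivalent call is .translate(str.maketrans('', '', '\r\n')).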

This is also a possible solution
file1 = open('myfile.txt','r')
conv_file = open("numfile.txt","w")
temp = file1.read().splitlines()
for element in temp:
    conv_file.write(element)
file1.close()
conv_file.close()
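To edit the original file itself rather than produce a second file, one common pattern is to write the cleaned lines to a temporary file and then swap it into place. The following is a sketch under assumptions: the function name remove_blank_lines and the demo file are invented, and the replaced file takes on the temporary file's permissions, which may differ from the original's.

```python
import os
import tempfile


def remove_blank_lines(path):
    """Rewrite the file at path without lines that are empty or whitespace only."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in.
    with open(path) as src, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as dst:
        for line in src:
            if line.strip():
                dst.write(line)
        temp_path = dst.name
    # Both files are closed here; os.replace overwrites the original in one step.
    os.replace(temp_path, path)


# Demonstration on a throwaway file with assumed contents.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Testing\n\nThis\n\n\nWordlist\n')
    demo_path = tmp.name

remove_blank_lines(demo_path)
with open(demo_path) as fh:
    cleaned = fh.read()
os.remove(demo_path)

print(repr(cleaned))
```

If the write fails partway through, the original file is left untouched, which is the main advantage over truncating and rewriting it directly.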
