Python: Prevent fileinput from adding newline characters

I am using a Python script to find and replace certain strings in text files of a given directory. I am using the fileinput module to ease the find-and-replace operation, i.e., the file is read, text replaced and written back to the same file.
The code looks as follows:
import fileinput
def fixFile(fileName):
    # Open file for in-place replace
    for line in fileinput.FileInput(fileName, inplace=1):
        line = line.replace("findStr", "replaceStr")
        print line # Put back line into file
The problem is that the written files have:
One blank line inserted after every line.
Ctrl-M character at the end of every line.
How do I prevent these extra appendages from getting inserted into the files?

Your extra newlines are coming from the print statement, which appends a newline to whatever it prints.
Use:
import sys
sys.stdout.write(line)
and the extra line breaks will go away.

Use
print line,
or
file.write(line)
to fix extra newlines.
As for Ctrl-M: that is probably caused by input files with DOS-style (CRLF) line endings.

Instead of this:
print line # Put back line into file
use this:
print line, # Put back line into file

Change the first line in your for loop to:
line = line.rstrip().replace("findStr", "replaceStr")

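Because the print statement ends its output with a newline on every iteration, and each line read from the file already ends with one, you get a blank line between lines.
To overcome this problem, you can use strip along with print. Note that strip() also removes leading whitespace; use rstrip() if the indentation of the lines matters.
import fileinput
def fixFile(fileName):
    for line in fileinput.FileInput(fileName, inplace=1):
        line = line.replace("findStr", "replaceStr")
        print line.strip()
Now you can see that the blank lines are stripped.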

As an update for Python 3 (3.4 at the time of writing), you can just use:
print(line, end='')
to avoid the insertion of a new line.
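Putting the answers above together, a minimal Python 3 sketch of the in-place replace might look like the following. The function name fix_file and its parameters are illustrative and not from the original question; the only calls relied on are fileinput.input() with inplace=True and print() with end=''.

```python
import fileinput

def fix_file(file_name, find_str, replace_str):
    """Replace find_str with replace_str in file_name, editing the file in place."""
    # While the with block is active, fileinput redirects stdout into the file.
    with fileinput.input(file_name, inplace=True) as lines:
        for line in lines:
            # Each line already ends with its own newline, so tell print()
            # not to add a second one.
            print(line.replace(find_str, replace_str), end='')
```

Calling fix_file('notes.txt', 'findStr', 'replaceStr') (with a hypothetical file name) rewrites the file without adding a blank line after every line.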

Related

Unable to read multiline files in python using readline()

The following code is not working properly. It is unable to read multiline files in python using readline().
myobject=open("myfile.txt",'r')
while ((myobject.readline())):
    print(myobject.readline())
myobject.close()
It just prints the first line and then newlines. I don't understand why?
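It's because readline() reads one line per call, and your code calls it twice per iteration: once in the while condition and once inside print(). The code below will still print a blank line after each line, because the lines keep their trailing newlines and print() adds another.
The way to fix it would be to do this:
with open("myfile.txt", 'r') as f:
    for line in f:
        print(line)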
readline() returns the line that it is currently pointing to and moves to the next line. So, the calls to the function in the while condition and in the print statement are not the same. In fact, they are pointing to adjacent lines.
First, store the line in a temporary variable, then check and print.
myobject = open('myfile.txt')
while True:
    line = myobject.readline()
    if line:
        print(line)
    else:
        break
When you open the file in 'r' mode, the file object returned points at the beginning of the file.
Every time you call readline(), a line is read, and the object then points to the next line in the file.
Since your loop condition also reads a line and advances the file, you only get the lines at even positions, such as lines 2, 4 and 6. Lines 1, 3, 5, ... are read by while ((myobject.readline())): and discarded.
A simple solution will be
myobject = open("myfile.txt",'r')
for line in myobject:
    print(line, end='')
myobject.close()
OR for your case, when you want to use only readline()
myobject = open("myfile.txt",'r')
while True:
    x = myobject.readline()
    if len(x) == 0:
        break
    print(x, end='')
myobject.close()
This code works, because readline behaves in the following way.
According to python documentation, https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
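To make the documented behaviour concrete, here is a small sketch that uses io.StringIO as a stand-in for a real file (the sample text and the helper name read_all_lines are made up for illustration). It calls readline() exactly once per iteration and stops only on the empty string.

```python
import io

# In-memory stand-in for a text file: two lines, a blank line, and a last
# line that has no trailing newline.
stream = io.StringIO("first\nsecond\n\nlast")

def read_all_lines(f):
    """Collect every line using only readline(), one call per iteration."""
    lines = []
    while True:
        line = f.readline()
        if line == "":        # empty string: end of file reached
            break
        lines.append(line)    # a blank line arrives as "\n", not ""
    return lines

lines = read_all_lines(stream)
for line in lines:
    print(repr(line))
```

The blank line in the middle does not end the loop, because it is returned as '\n' rather than as an empty string.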

using readline() and then using it in write() with a message behind it will save to next line

When you readline() from a file and write the result to another txt file with some text appended to it, the appended text always ends up on the next line. Is there any way to keep it on the same line, directly behind the text that was read?
example code:
file = open('directory/whatever/file.txt', 'r')
file2 = open('directory/whatever/file2.txt', 'a')
line = file.readline()
file2.write(line + 'Thiswillprintonthenextline')
The message behind it ends up on the next line; that's what I want to prevent.
file2.write(line.rstrip('\n') + 'Thiswillprintonthenextline')
That is because the readline() function returns a string with the newline character at the end.
If you want to write it without the line break, simply strip the trailing newline:
file2.write(line.rstrip("\n") + 'Thiswillprintonthenextline')
You could also strip it immediately when reading, so you don't have to do it every time you want to print it...
Just replace the line
line = file.readline()
with the line
line = file.readline().rstrip("\n")
and you're done. ;)
(Thanks to bruno desthuilliers for suggesting rstrip.)
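As a quick illustration of the rstrip("\n") approach, the sketch below uses io.StringIO objects in place of the two files, so it runs without touching the disk. The helper name append_to_first_line and the sample text are assumptions made for the example.

```python
import io

def append_to_first_line(src, dst, suffix):
    """Read one line from src and write it to dst with suffix on the same line."""
    line = src.readline().rstrip("\n")   # drop only the trailing newline
    dst.write(line + suffix + "\n")

src = io.StringIO("hello\nworld\n")
dst = io.StringIO()
append_to_first_line(src, dst, " <- same line")
print(repr(dst.getvalue()))
```

Using rstrip("\n") rather than rstrip() keeps any trailing spaces or tabs that are part of the line.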

How to write to a file with newline characters and avoid empty lines

I'm trying to write encoded data to a file and separate each run with a newline character. However, when doing this there is an empty line between each run -- as shown below.
Using .rstrip()/.strip() only really works when reading the file -- and obviously this cannot be used directly when writing to the file as it would write all the data to a single line.
cFile = open('compFile', 'w')
for i in range(num_lines):
    line = validLine()
    cFile.write(line + "\n")
cFile.close()
cFile = open('compFile', 'r')
for line in cFile:
    print(line)
# Empty space output:
023

034

045

# Desired output:
023
034
045
I think you have already done what you want; have a look at your text file.
Be aware that Python keeps the \n at the end of each line it reads, and that print() adds a newline of its own after whatever it prints.
In your case that means your file should look like
023\n
034\n
045\n
When printing, you first read 023\n, and then print() appends another \n to your line.
That gives the 023\n\n you see in your console. But the file contains what you want.
If you just want to print without the extra line break, you can use
import sys
sys.stdout.write(line)
You could use
for i in range(num_lines):
    line = validLine()
    cFile.write(line.strip() + "\n")
    #               ^^^^^^^^
cFile.close()
Off-topic, but consider using a with statement as well.
You wrote: "Using .rstrip()/.strip() only really works when reading the file -- and obviously this cannot be used directly when writing to the file as it would write all the data to a single line."
This is a misconception. Using .rstrip() is exactly the correct tool if you need to write a series of strings, some of which may have a newline character attached:
with open('compFile', 'w') as cFile:
    for i in range(num_lines):
        line = validLine().rstrip("\n") # remove possible newline
        cFile.write(line + "\n")
Note that if all your lines already have a newline attached, you don't have to add more newlines. Just write the string directly to the file, no stripping needed:
with open('compFile', 'w') as cFile:
    for i in range(num_lines):
        line = validLine() # line with "\n" newline already present
        cFile.write(line) # no need to add a newline anymore
Next, you are reading lines with newlines from your file and then printing them with print(). By default, print() adds another newline, so you end up with double-spaced lines: your input file contains 023\n034\n045\n, but printing each line ('023\n', then '034\n', then '045\n') adds a newline after each one, so 023\n\n034\n\n045\n\n is written to stdout.
Either strip that newline when printing, or tell print() to not add a newline of its own by giving it an empty end parameter:
with open('compFile', 'r') as cFile:
    for line in cFile:
        print(line, end='')
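The write-then-read round trip can be sketched end to end as follows. The helper names write_runs and read_runs, the temporary directory, and the sample values are all assumptions standing in for the question's validLine() data; one sample deliberately carries its own newline to show that rstrip("\n") makes the writer safe either way.

```python
import os
import tempfile

def write_runs(path, runs):
    """Write each run on its own line, whether or not it already ends in a newline."""
    with open(path, "w") as out:
        for run in runs:
            out.write(run.rstrip("\n") + "\n")

def read_runs(path):
    """Return the runs without their trailing newlines."""
    with open(path) as src:
        return [line.rstrip("\n") for line in src]

# Hypothetical sample data standing in for the values validLine() would return.
path = os.path.join(tempfile.mkdtemp(), "compFile")
write_runs(path, ["023", "034\n", "045"])
for run in read_runs(path):
    print(run)   # print() supplies the only newline, so no blank lines appear
```

Stripping on read and letting print() add the newline is equivalent to keeping the newline and calling print(line, end=''); which one to use is mostly a matter of whether the rest of the program wants the newline in the string.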

Function call doesnt work python

I'm trying to print the palindrome words in a file (where each line is a word) in Python.
That's what I have so far:
I have to work in Unix, so I wrote my script in the file palindrome.py as below:
#!/usr/bin/python
def isPalindrome(a):
    if a == a[::-1]:
        print a
with open ('fileName') as f:
    for line in f:
        isPalindrome(line)
When I run the file it doesn't print anything, even though there are palindrome words in my file. I think the problem is related to my function call, because if I call isPalindrome('aha') instead of isPalindrome(line), it prints aha. I also tried printing each line inside the for loop, and that works as well: it prints all the lines of the file. So line does get different values, and I guess there might be something wrong with the call, but I am failing to find out what.
You need to strip the newlines from the end of your lines. Try calling it as isPalindrome(line.strip()).
Attention: reading lines from a file, whether by iterating over it or with file.readlines(), does not strip the end-of-line characters!
So if your file has aha on one line, the line will be aha\n (with the newline character), and its reverse, \naha, is not equal to it.
I suggest using the replace() string method.
Your code:
#!/usr/bin/python
def isPalindrome(a):
    if a == a[::-1]:
        print a
with open ('fileName') as f:
    for line in f:
        isPalindrome(line.replace('\n', '').replace("\r", "")) # replace carriage return / line feed chars
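For reference, a Python 3 sketch of the same idea is shown below. The function names and the sample list are illustrative; the list mimics what iterating over a file yields, with line endings still attached.

```python
def is_palindrome(word):
    """Return True if word reads the same forwards and backwards."""
    return word == word[::-1]

def palindromes(lines):
    """Yield the palindromes among lines, ignoring surrounding whitespace."""
    for line in lines:
        word = line.strip()   # removes "\n" and "\r\n" as well as spaces
        if word and is_palindrome(word):
            yield word

# Lines as they would come from iterating over a file, newline included.
sample = ["aha\n", "hello\n", "level\r\n", "\n"]
print(list(palindromes(sample)))
```

Separating the test (is_palindrome) from the printing makes it easy to check that 'aha\n' fails the comparison while 'aha' passes.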

Comparing Two Text Files, Removing the duplicate lines, and Writing results to a new text file

I have two text files (that are not equal in number of lines/size). I would like to compare each line of the shorter text file with every line of the longer text file. As it compares, if there are any duplicate strings, I would like to have those removed. Lastly, I would like to write the result to a new text file and print the contents.
Is there a simple script that can do this for me?
Any help would be much appreciated.
The text files are not very large. One has about 10 lines and the other has about 5. The code I have tried (that failed miserably) is below:
for line in file2:
    line1 = line
    for line in file1:
        requested3 = file('request2.txt','a')
        if fnmatch.fnmatch(line1,line):
            line2 = line.replace(line,"")
            requested3.write(line2)
        if not fnmatch.fnmatch(line1,line):
            requested3.write(line+'\n')
        requested3.close()
with open(longfilename) as longfile, open(shortfilename) as shortfile, open(newfilename, 'w') as newfile:
    longlines = set(longfile)  # read the long file once; set lookups are fast
    newfile.writelines(line for line in shortfile if line not in longlines)
It's as simple as that. This copies each line from shortfile to newfile unless it also exists in longfile. Only the lines of longfile are kept in memory (in the set); shortfile is streamed line by line.
If you're on Python 2.6 or older, you would need to nest the with statements:
with open(longfilename) as longfile:
    with open(shortfilename) as shortfile:
        with open(newfilename, 'w') as newfile:
            # ... same body as above
If you're on Python 2.5, you need to either:
from __future__ import with_statement
at the very top of your file, or just use
longfile = open(longfilename)
etc. and close each file yourself.
If you need to manipulate the lines, an explicit for loop is fine; the important part is set(). Looking up an item in a set is fast, while looking up a line in a long list is slow. Apply the same manipulation to both sides of the comparison:
longlines = set(line.strip() for line in longfile)  # or whatever normalization you need
for line in shortfile:
    if line.strip() not in longlines:
        newfile.write(line)
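The set-based lookup can be tried without any files, since any iterable of lines behaves like an open file here. The following is a sketch under that assumption; the function name lines_only_in_short and the sample lists are made up for illustration.

```python
def lines_only_in_short(short_lines, long_lines):
    """Return the lines of short_lines that do not occur in long_lines.

    Lines are compared without their trailing newline, so a final line that
    lacks one still matches.
    """
    seen = {line.rstrip("\n") for line in long_lines}   # one pass, fast lookups
    return [line.rstrip("\n") for line in short_lines
            if line.rstrip("\n") not in seen]

# Stand-ins for the two open files; any iterable of lines works.
long_lines = ["apple\n", "banana\n", "cherry"]
short_lines = ["banana\n", "date\n", "cherry\n"]
unique = lines_only_in_short(short_lines, long_lines)
print("\n".join(unique))
```

Comparing raw lines would treat "cherry" and "cherry\n" as different, which is why the sketch strips the newline on both sides before comparing.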
Assuming both files are plain text, with each string on its own line delimited by \n newline characters:
small_file = open('file1.txt','r')
long_file = open('file2.txt','r')
output_file = open('output_file.txt','w')
try:
    small_lines = small_file.readlines()
    long_lines = long_file.readlines()
    small_lines_cleaned = [line.rstrip().lower() for line in small_lines]
    long_lines_cleaned = [line.rstrip().lower() for line in long_lines]
    for line in small_lines_cleaned:
        if line not in long_lines_cleaned:
            output_file.write(line + '\n')
finally:
    small_file.close()
    long_file.close()
    output_file.close()
Explanation:
Since you can't get 'with' statements working, we open the files first using regular open functions, then use a try...finally clause to close them at the end of the program.
We take the small file and the long file and first remove any trailing '\n' (newline) characters with .rstrip(), then make all the characters lower-case with .lower(). If you have two sentences identical in every aspect except that one has upper-case letters and the other doesn't, they won't match. Forcing them to lower case avoids that; if you prefer a case-sensitive compare, remove the .lower() method.
We go line by line in small_lines_cleaned (for line in...) and see if it is in the larger file.
Output each line if it is not in the longer file; we add the '\n' newline character so that each line will appear on a new line, insteadOfOneGiantLongSetOfStrings
I'd use difflib, it makes it easy to do comparisons/diffs. There is a nice tutorial for it here. If you just wanted the lines that were unique to the shorter file:
from difflib import ndiff
short = open('short.txt').readlines()
long = open('long.txt').readlines()
with open('unique.txt', 'w') as f:
    f.write(''.join(x[2:] for x in ndiff(short, long) if x.startswith('-')))
Your code as it stands checks each line against the line in the other file. But that's not what you want. For each line in the first file, you need to check whether any line in the other file matches and then print it out if there are no matches.
The following code reads file two and checks it against file one. Anything that's in file one but not in file two will get printed and also written to a new text file.
If you wanted to do the opposite, you'd just get rid of the "not" from the if statement below. So it'd print anything that's in file one and in file two.
It works by putting the contents of the shorter file (file two) in a variable and then reading the longer file (file one) line by line. Each line is checked against the variable and then the line is either written or not written to the text file according to its presence in the variable.
fileOne = open("LONG FILE.ext", "r")
fileTwo = open("SHORT FILE.ext", "r")
fileThree = open("Results.txt", "a+")
contents = fileTwo.read().splitlines()  # lines of the shorter file
for line in fileOne:
    if not line.rstrip("\n") in contents:
        print(line, end="")
        fileThree.write(line)
fileOne.close()
fileTwo.close()
fileThree.close()
