I am trying to do some simple parsing of a text file in Python, something I have no trouble doing in bash with tr '\n' ' ': basically, getting all of the lines onto a single line. From what I understand, print works a bit differently in Python. re.sub cannot find my newline because it isn't there, even though a newline does appear when I print the output. Can someone explain how I can work around this issue in Python?
Here is my code so far:
# -*- iso-8859-1 -*-
import re
def proc():
    f= open('out.txt', 'r')
    lines=f.readlines()
    for line in lines:
        line = line.strip()
        if '[' in line:
            line_1 = line
            line_1_split = line_1.split(' ')[0]
            line_2 = re.sub(r'\n',r' ', line_1_split)
            print line_2
proc()
Edit: I know that "print line," will print without the newline. The issue is that I need to handle these lines both before and after doing operations line by line. My code in shell uses sed, awk and tr to do this.
You can write directly to stdout to avoid the automatic newline of print:
from sys import stdout
stdout.write("foo")
stdout.write("bar\n")
This will print foobar on a single line.
When you call the print statement, you automatically add a new line. Just add a comma:
print line_2,
And it will all print on the same line.
Mind you, if you're trying to get all lines of a file, and print them on a single line, there are more efficient ways to do this:
with open('out.txt', 'r') as f:
    lines = f.readlines()
    for line in lines:
        line = line.strip()
        # Some extra line formatting stuff goes here
        print line, # Note the comma!
Alternatively, just join the lines on a string:
everything_on_one_line = ''.join(i.strip() for i in f.readlines())
print everything_on_one_line
Using with ensures the file is closed after iteration.
Iterating over the file object saves memory because it doesn't load the entire file at once.
rstrip() removes the newline at the end of each line.
Combined:
with open('out.txt', 'r') as f:
    for line in f:
        print line.rstrip(),
Use the replace() method. It returns a new string rather than changing the original, so assign the result:
file = open('out.txt', 'r')
data = file.read()
file.close()
data = data.replace('\n', '')
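A minimal sketch of the tr '\n' ' ' behaviour from the question, written for Python 3 (the print statements in the answers above are Python 2). The helper name flatten_lines is hypothetical and not from the original posts. It relies on str.replace returning a new string, so the result has to be returned or assigned.

```python
def flatten_lines(text, sep=" "):
    """Return text with every newline replaced by sep."""
    # str.replace does not modify text in place; it builds a new string.
    return text.replace("\n", sep)


if __name__ == "__main__":
    sample = "first line\nsecond line\nthird line\n"
    # The trailing newline also becomes a separator, as it would with tr.
    print(repr(flatten_lines(sample)))
```

Passing sep="" instead would reproduce the replace('\n', '') call above, which glues the lines together with nothing in between.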
I have a file in which each line contains a sentence. Some sentences are however empty, i.e. in this case there is just "\n" newline character on the line.
What I want to do is: if I find an empty sentence, I want to replace it with some placeholder symbol.
If I replace "\n", it will be replaced at all places in the file.
However, I am not sure how to do this:
import sys
f = open(sys.argv[1], "wr")
for line in f:
    if len(line.strip())==0:
        line.replace("\n", "empty")
        # Then write the line back on the file
        f.write(line + "\n") # Will this replace the line in the file?
Is the above code correct? Can I simultaneously read the line and edit it too?
This is a quick way of solving the problem, but not the ideal way of doing it should you have memory constraints.
import sys
f = open(sys.argv[1], "r")
lines = f.readlines()
f.close()
# Keep the trailing newline so the replacement stays on its own line
lines = ['empty\n' if i == '\n' else i for i in lines]
f = open(sys.argv[1], "w")
f.writelines(lines)
f.close()
Should you have memory restrictions, creating a function utilising yield would be the best way to go about this.
Edit: I should also say unless there has been an update, I don't believe it is possible to overwrite a specific line in a file using Python without re-writing the entire file.
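The yield-based approach mentioned above could look roughly like the following Python 3 sketch. The names mark_empty_lines and rewrite_file are hypothetical, and writing to a temporary file before swapping it in is an assumption about how to avoid holding the whole file in memory; the file is still rewritten in full, as the edit note says.

```python
import os
import tempfile


def mark_empty_lines(lines, marker="empty"):
    """Yield each line, replacing blank lines with marker plus a newline."""
    for line in lines:
        if line.strip() == "":
            yield marker + "\n"
        else:
            yield line


def rewrite_file(path, marker="empty"):
    """Rewrite path one line at a time via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Stream from the source into a temporary file in the same directory,
    # so only one line is held in memory at a time.
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        dst.writelines(mark_empty_lines(src, marker))
    # Both files are closed here; swap the rewritten copy into place.
    os.replace(dst.name, path)
```

Like the len(line.strip())==0 test in the question, this treats whitespace-only lines as empty too.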
I'm trying to open a file, and edit a specific line. When I concatenate a character onto one of the lines, it works, but inserts a new line. However I don't want a new line. Here is the code:
def moveCurlyInline(line, i):
    with open('test.js', 'r') as inputFile:
        data = inputFile.readlines()
        print(data[0])
        print(data[0] + ' {')
The print outputs:
function hello()
then:
function hello()
{
I need the curly bracket to be on the same line as the function hello. Any idea what's wrong with my code?
readlines() returns each line of the file including the newline at the end of the line.
Try stripping the extra newline:
data = [line.rstrip("\n") for line in inputFile]
You can also drop the newline characters with splitlines(), which splits the text into lines and discards the line endings:
data = inputFile.read().splitlines()
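As a small Python 3 sketch of the splitlines() approach, the snippet below uses an in-memory io.StringIO in place of test.js so it runs on its own. The helper name read_lines_without_newlines is hypothetical.

```python
import io


def read_lines_without_newlines(stream):
    """Return the stream's lines with their line endings removed."""
    return stream.read().splitlines()


if __name__ == "__main__":
    # Stand-in for open('test.js'); the content mirrors the question.
    source = io.StringIO("function hello()\n{\n}\n")
    data = read_lines_without_newlines(source)
    print(data[0] + " {")  # prints: function hello() {
```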
I have a simple question about Python.
I wrote a working script but when I execute it, it gives me an answer in one line, as a string.
What I am looking is an answer to be on two separate lines.
Here is the code:
def test():
    fh=open('xxxxxxx.txt', 'r')
    fo=open('output.txt', 'a')
    for line in fh:
        line=line.strip()
        if(line.startswith('Total Sequences')):
            fo.write(line)
    fh.close()
    fh2=open('xxxxxxx.txt', 'r')
    fo2=open('output.txt', 'a')
    for line in fh2:
        line=line.strip()
        if(line.startswith('Sequence length')):
            fo2.write(line)
    fh2.close()
print(test())
You are removing the newline character "\n" from each line in the file with the statement:
line=line.strip()
Just remove it and it should work correctly.
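If you would rather keep the strip() call (for example, to drop stray leading spaces), an alternative is to add the newline back before writing. This is a Python 3 sketch under that assumption; collect_matching_lines is a hypothetical helper, and unlike the two loops in the question it makes a single pass, so matches come out in file order.

```python
def collect_matching_lines(lines, prefixes):
    """Return the lines that start with any prefix, each ending in a newline."""
    matches = []
    for line in lines:
        stripped = line.strip()
        # str.startswith accepts a tuple of alternative prefixes.
        if stripped.startswith(tuple(prefixes)):
            # strip() removed the newline, so add one back before writing.
            matches.append(stripped + "\n")
    return matches
```

The result can be passed to writelines() on the output file, for example fo.writelines(collect_matching_lines(fh, ['Total Sequences', 'Sequence length'])).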
The input file: a.txt
aaaaaaaaaaaa
bbbbbbbbbbb
cccccccccccc
The python code:
with open("a.txt") as f:
    for line in f:
        print line
The problem:
aaaaaaaaaaaa

bbbbbbbbbbb

cccccccccccc
As you can see, the output has an extra line between each item.
How can I prevent this?
print appends a newline, and the input lines already end with a newline.
A standard solution is to output the input lines verbatim:
import sys
with open("a.txt") as f:
    for line in f:
        sys.stdout.write(line)
PS: For Python 3 (or Python 2 with the print function), abarnert's print(…, end='') solution is the simplest one.
As the other answers explain, each line already ends with a newline, and when you print a bare string, print adds another newline at the end. There are two ways around this; everything else is a variation on the same two ideas.
First, you can strip the newlines as you read them:
with open("a.txt") as f:
    for line in f:
        print line.rstrip()
This will strip any other trailing whitespace, like spaces or tabs, as well as the newline. Usually you don't care about this. If you do, you probably want to use universal newline mode, and strip off the newlines:
with open("a.txt", "rU") as f:
    for line in f:
        print line.rstrip('\n')
However, if you know the text file will be, say, a Windows-newline file, or a native-to-whichever-platform-I'm-running-on-right-now-newline file, you can strip the appropriate endings explicitly:
with open("a.txt") as f:
    for line in f:
        print line.rstrip('\r\n')

import os
with open("a.txt") as f:
    for line in f:
        print line.rstrip(os.linesep)
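The difference between these rstrip variants is easiest to see on a single string. The sketch below (Python 3; the sample line is made up for illustration) shows what each call leaves behind. Note that the argument to rstrip is a set of characters to remove, not a fixed suffix.

```python
line = "some text \t\r\n"

# No argument: strips all trailing whitespace, including spaces and tabs.
stripped_all = line.rstrip()

# '\n' only: removes the trailing newline and stops at the '\r'.
stripped_lf = line.rstrip("\n")

# '\r\n' is treated as a set of characters, so any trailing run of
# '\r' and '\n' characters is removed, in whatever order they appear.
stripped_crlf = line.rstrip("\r\n")

print(repr(stripped_all))   # 'some text'
print(repr(stripped_lf))    # 'some text \t\r'
print(repr(stripped_crlf))  # 'some text \t'
```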
The other way to do it is to leave the original newline, and just avoid printing an extra one. While you can do this by writing to sys.stdout with sys.stdout.write(line), you can also do it from print itself.
If you just add a comma to the end of the print statement, instead of printing a newline, it adds a "smart space". Exactly what that means is a bit tricky, but the idea is supposed to be that it adds a space when it should, and nothing when it shouldn't. Like most DWIM algorithms, it doesn't always get things right—but in this case, it does:
with open("a.txt") as f:
    for line in f:
        print line,
Of course we're now assuming that the file's newlines match your terminal's—if you try this with, say, classic Mac files on a Unix terminal, you'll end up with each line printing over the last one. Again, you can get around that by using universal newlines.
Anyway, you can avoid the DWIM magic of smart space by using the print function instead of the print statement. In Python 2.x, you get this by using a __future__ declaration:
from __future__ import print_function
with open("a.txt") as f:
    for line in f:
        print(line, end='')
Or you can use a third-party wrapper library like six, if you prefer.
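To check the end='' behaviour without a terminal, the Python 3 sketch below prints into an in-memory buffer and inspects the result. The function name echo_lines is hypothetical; print's end and file keyword arguments are the standard ones.

```python
import io


def echo_lines(lines, out):
    """Write each line to out without adding a second newline."""
    for line in lines:
        # end='' stops print from appending its own newline.
        print(line, end="", file=out)


if __name__ == "__main__":
    buffer = io.StringIO()
    echo_lines(["aaaaaaaaaaaa\n", "bbbbbbbbbbb\n"], buffer)
    # Each line appears once, with no blank line in between.
    print(repr(buffer.getvalue()))
```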
What happens is that each line has a newline at the end, and the print statement in Python also adds a newline. You can strip the newlines:
with open("a.txt") as f:
    for line in f:
        print line.strip()
You could also try the splitlines() method, which strips the line endings automatically:
f = open('a.txt').read()
for l in f.splitlines():
    print l
It is not adding a newline, but each scanned line from your file has a trailing one.
Try:
with open("a.txt") as f:
    for line in (x.rstrip('\n') for x in f):
        print line
How to remove whitespaces in the beginning of every string in a file with python?
I have a file myfile.txt with the strings as shown below in it:
_ _ Amazon.inc
Arab emirates
_ Zynga
Anglo-Indian
Those underscores are spaces.
The code must go through each and every line of the file and remove all the whitespace at the beginning of the line.
I've tried using lstrip, but that doesn't work for multiple lines, and neither does readlines().
Would using a for loop help?
All you need to do is read the lines of the file one by one and remove the leading whitespace from each line. After that, you can join the lines again and you'll get back the original text without the leading whitespace:
with open('myfile.txt') as f:
    line_lst = [line.lstrip() for line in f.readlines()]
lines = ''.join(line_lst)
print lines
Assuming that your input data is in infile.txt, and you want to write this file to output.txt, it is easiest to use a list comprehension:
inf = open("infile.txt")
stripped_lines = [l.lstrip() for l in inf.readlines()]
inf.close()
# write the new, stripped lines to a file
outf = open("output.txt", "w")
outf.write("".join(stripped_lines))
outf.close()
To read the lines from myfile.txt and write them to output.txt, use
with open("myfile.txt") as input:
    with open("output.txt", "w") as output:
        for line in input:
            output.write(line.lstrip())
That will make sure that you close the files after you're done with them, and it'll make sure that you only keep a single line in memory at a time.
The above code works in Python 2.5 and later because of the with keyword. For Python 2.4 you can use
input = open("myfile.txt")
output = open("output.txt", "w")
for line in input:
    output.write(line.lstrip())
if this is just a small script where the files will be closed automatically at the end. If this is part of a larger program, then you'll want to explicitly close the files like this:
input = open("myfile.txt")
try:
    output = open("output.txt", "w")
    try:
        for line in input:
            output.write(line.lstrip())
    finally:
        output.close()
finally:
    input.close()
You say you already tried with lstrip and that it didn't work for multiple lines. The "trick" is to run lstrip on each individual line like I do above. You can try the code out online if you want.
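To illustrate why lstrip has to run per line, here is a short Python 3 sketch on an in-memory string. The helper name strip_leading_whitespace is hypothetical. One deliberate difference from the answers above: it strips only spaces and tabs, because a bare lstrip() also removes the newline of a whitespace-only line and would therefore swallow blank lines.

```python
def strip_leading_whitespace(text):
    """Remove leading spaces and tabs from every line of text."""
    # splitlines(True) keeps each line's original line ending.
    return "".join(line.lstrip(" \t") for line in text.splitlines(True))


if __name__ == "__main__":
    sample = "  Amazon.inc\nArab emirates\n Zynga\nAnglo-Indian\n"
    # lstrip on the whole text only touches the very first line.
    print(repr(sample.lstrip()))
    # Stripping line by line cleans every line.
    print(repr(strip_leading_whitespace(sample)))
```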