How to remove whitespace and comments from a text file?

I want to read a text file that contains Python source code and remove comments and extra whitespace from it.
file.txt (source code file)
# Python program to find the factorial of a number provided by the user.
num = 7
factorial = 1
if num < 0:
    print("Sorry, factorial does not exist for negative numbers")
elif num == 0:
    print("The factorial of 0 is 1")
else:
    for i in range(1, num + 1):
        factorial = factorial * i
    print("The factorial of", num, "is", factorial)
I have tried reading the file and processing the lines with a list comprehension, but it does not remove the comments, and it removes whitespace that I want to keep:
with open('file.txt', 'r') as file:
    lines = file.readlines()
lines = [line.replace(' ', '') for line in lines]
with open('file.txt', 'w') as file:
    file.writelines(lines)

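To remove blank lines, trailing whitespace, and full-line comments, you could read the lines first and then write the kept ones back (a file opened with "r" cannot be written to):
import re

with open("file.txt", "r") as file:
    lines = file.readlines()

with open("file.txt", "w") as file:
    for line in lines:
        line = line.rstrip()  # drop trailing whitespace and the newline
        if line and not re.match(r'\s*#', line):  # skip blank and comment-only lines
            file.write(line + "\n")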
Output
num = 7
factorial = 1
if num < 0:
    print("Sorry, factorial does not exist for negative numbers")
elif num == 0:
    print("The factorial of 0 is 1")
else:
    for i in range(1, num + 1):
        factorial = factorial * i
    print("The factorial of", num, "is", factorial)

Related

Multiple str edits to a single .txt file python

I've scraped some comments from a webpage using Selenium and saved them to a text file. Now I would like to perform multiple edits on the text file and save it again. I've tried to combine the following into one smooth flow, but I'm fairly new to Python and couldn't get it right (examples of what happened are at the bottom). The only way I could get it to work was to open and close the file over and over.
These are the actions I want to perform, in the order they need to happen:
import re

with open('results.txt', 'r') as f:
    lines = f.readlines()
with open("results.txt", "w") as f:
    for line in lines:
        f.write(line.replace("a sample text line", ' '))
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open("results.txt", "w") as f:
    pattern = r'\d in \d example text'
    for line in lines:
        f.write(re.sub(pattern, "", line))
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open('results.txt', 'w') as file:
    for line in lines:
        if not line.isspace():
            file.write(line)
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open("results.txt", "w") as f:
    for line in lines:
        f.write(line.replace(" ", '-'))
I've tried to loop them into one but I get doubled lines, words, or extra spaces.
Any help is appreciated, thank you.

If you want to do all of these in one pass, you'd better write the results to a separate file, e.g.:
import re

pattern = r"\d in \d example text"
# Open your results file for reading and another one for writing
with open("results.txt", "r") as fh_in, open("output.txt", "w") as fh_out:
    for line in fh_in:
        # Process the line
        line = line.replace("a sample text line", " ")
        line = re.sub(pattern, "", line)
        if line.isspace():
            continue
        line = line.replace(" ", "-")
        # Write out
        fh_out.write(line)
This processes each line in the order you described, and the resulting line goes to the output file.
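The answer above leaves the result in output.txt. If the edits should end up back in results.txt, one option (a sketch, not the only way) is to write to a temporary file in the same directory and then swap it in with os.replace(), so an interruption midway never leaves a half-written results.txt. The helper names clean_line and clean_in_place are hypothetical; the four edits are the ones from the question.

```python
import os
import re
import tempfile

PATTERN = re.compile(r"\d in \d example text")


def clean_line(line):
    """Hypothetical helper: apply the four edits in order; None means drop the line."""
    line = line.replace("a sample text line", " ")
    line = PATTERN.sub("", line)
    if line.isspace():
        return None
    return line.replace(" ", "-")


def clean_in_place(path):
    """Hypothetical helper: rewrite `path` in one pass via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the original so os.replace() stays on one filesystem
    with open(path) as fh_in, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as fh_out:
        for line in fh_in:
            cleaned = clean_line(line)
            if cleaned is not None:
                fh_out.write(cleaned)
    # Both files are closed here; swap the cleaned copy over the original
    os.replace(fh_out.name, path)
```

Calling clean_in_place("results.txt") then performs all four edits with a single read and a single write.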

Python: reading a file and excluding lines with certain characters

I am trying to figure out how to write a function that opens a file and reads it; however, I need it to ignore any lines that contain the character '-'.
This is what I have so far:
def read_from_file(filename):
    with open('filename', 'r') as file:
        content = file.readlines()
Any help would be appreciated.

Filter the lines containing '-' out of the lines you have read in:
filtered_lines = [x for x in content if '-' not in x]

I'd filter while reading the file, so the unwanted lines are never collected in the first place:
def read_from_file(filename):
    with open(filename) as file:
        content = [line for line in file if '-' not in line]
    return content
Also note that 'filename' in your open('filename', 'r') is wrong, since it opens a file literally named filename instead of using the parameter, and that 'r' is the default mode and therefore unnecessary, so I fixed/removed both.

Gwang-Jin Kim's and Heap Overflow's answers are both correct, but I always like to use the tools Python gives you, so here is a solution using the built-in filter() function:
list(filter(lambda line: "-" not in line, file.read().splitlines()))
def read_from_file(filename):
    with open(filename, "r") as file:
        content = filter(lambda line: "-" not in line, file.readlines())
        return list(content)
Here is a more verbose, but more memory-efficient, solution that iterates over the file directly instead of first loading every line with readlines():
def read_from_file(filename):
    content = []
    with open(filename, "r") as file:
        for line in file:
            if "-" not in line:
                content.append(line)
    return content
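If the file is large and you only need to loop over the kept lines once, a generator avoids building a list at all. A minimal sketch; iter_lines_without and its char parameter are hypothetical names:

```python
def iter_lines_without(filename, char="-"):
    """Hypothetical helper: yield the lines of `filename` that do not contain `char`."""
    with open(filename) as file:
        for line in file:
            # Lines are read one at a time, so memory use stays flat for big files
            if char not in line:
                yield line
```

Usage: for line in iter_lines_without("data.txt"): print(line, end=""). The file stays open until the loop finishes or the generator is closed.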

Python Make newline after character

I would like to insert a newline after every dot in a file.
For example:
Hello. I am damn cool. Lol
Output:
Hello.
I am damn cool.
Lol
I tried the following, but it's not working:
f2 = open(path, "w+")
for line in f2.readlines():
f2.write("\n".join(line))
f2.close()
Could you help me with this?
I don't just want one newline; I want a newline after every dot in the file. It should go through the whole file and insert a newline after every single dot.
Thank you in advance!

This should be enough to do the trick:
with open('file.txt', 'r') as f:
    contents = f.read()
with open('file.txt', 'w') as f:
    f.write(contents.replace('. ', '.\n'))

You could split your string on '.' to get a list, then print each item:
s = 'Hello. I am damn cool. Lol'
lines = s.split('.')
for line in lines:
    print(line)
If you do this, the output will be (note the leading spaces):
Hello
 I am damn cool
 Lol
To remove the leading spaces, you could split on '. ' (a dot followed by a space), or use lstrip() when printing.
So, to do this for a file:
# open file for reading
with open('file.txt') as fr:
    # get the text in the file
    text = fr.read()
# split up the file into lines based on '.'
lines = text.split('.')
# open the file for writing
with open('file.txt', 'w') as fw:
    # loop over each line
    for line in lines:
        # remove leading whitespace, and write to the file with a newline
        fw.write(line.lstrip() + '\n')
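The replace('. ', '.\n') answer only handles a dot followed by exactly one space, and the split('.') answer drops the dots themselves. A minimal sketch that keeps each dot and puts a newline after it, whatever whitespace follows, uses re.sub(). The helper names are hypothetical; be aware that it would also split decimals such as 3.14, and because \s* matches newlines, an existing line break right after a dot is collapsed into one.

```python
import re


def newline_after_dots(text):
    """Hypothetical helper: put exactly one newline after every dot."""
    # A dot plus any whitespace right after it becomes the dot and a single newline
    return re.sub(r"\.\s*", ".\n", text)


def rewrite_file(path):
    """Hypothetical helper: apply newline_after_dots to a file in place."""
    # Read everything first, then reopen for writing ("w" truncates the file)
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(newline_after_dots(text))
```

For the example in the question, newline_after_dots("Hello. I am damn cool. Lol") gives "Hello.\nI am damn cool.\nLol".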

how can i convert surname:name to name:surname? [duplicate]

In Python, calling e.g. temp = open(filename,'r').readlines() results in a list in which each element is a line from the file. However, these strings have a newline character at the end, which I don't want.
How can I get the data without the newlines?

You can read the whole file and split lines using str.splitlines:
temp = file.read().splitlines()
Or you can strip the newline by hand:
temp = [line[:-1] for line in file]
Note: this last solution only works if the file ends with a newline, otherwise the last line will lose a character.
This assumption is true in most cases (especially for files created by text editors, which often do add an ending newline anyway).
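To see the caveat concretely, here is a small demonstration that uses io.StringIO in place of a real file whose last line has no trailing newline:

```python
import io

# io.StringIO stands in for a file whose last line lacks a trailing newline
f = io.StringIO("first\nsecond")
chopped = [line[:-1] for line in f]
print(chopped)  # ['first', 'secon']
```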
If you want to avoid this, you can add a newline at the end of the file. In Python 3, end-relative seeks are only allowed in binary mode, so check the last byte that way:
with open(the_file, 'rb+') as f:
    f.seek(-1, 2)  # go to the last byte of the file
    if f.read(1) != b'\n':
        # add missing newline if not already present
        f.write(b'\n')
with open(the_file) as f:
    lines = [line[:-1] for line in f]
Or a simpler alternative is to strip the newline instead:
[line.rstrip('\n') for line in file]
Or even, although pretty unreadable:
[line[:-(line[-1] == '\n') or len(line)+1] for line in file]
This exploits the fact that or does not return a boolean: it returns its first operand if that is truthy, and otherwise its second operand.
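Here is the same expression pulled into a small helper (chop is a hypothetical name), with both branches of the or spelled out in the comments:

```python
def chop(line):
    # Hypothetical helper wrapping the one-liner above.
    # -(True) == -1 is truthy, so `or` returns -1 and the slice drops the last char;
    # -(False) == 0 is falsy, so `or` returns len(line) + 1 and the slice keeps it all.
    return line[:-(line[-1] == '\n') or len(line) + 1]


print(chop("abc\n"))  # abc
print(chop("abc"))    # abc
```

An empty string would raise IndexError on line[-1], but lines read from a file are never empty.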
The readlines method is actually equivalent to:
def readlines(self):
    lines = []
    for line in iter(self.readline, ''):
        lines.append(line)
    return lines

# or equivalently
def readlines(self):
    lines = []
    while True:
        line = self.readline()
        if not line:
            break
        lines.append(line)
    return lines
Since readline() keeps the newline, readlines() keeps it too.
Note: for symmetry with readlines(), the writelines() method does not add ending newlines, so f2.writelines(f.readlines()) produces an exact copy of f in f2.

temp = open(filename,'r').read().split('\n')
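One thing to be aware of with split('\n'): when the file ends with a newline, as most do, the result has an extra empty string at the end, whereas str.splitlines() does not. A quick comparison on an in-memory string:

```python
text = "alpha\nbeta\n"  # a typical file body that ends with a newline

print(text.split("\n"))   # ['alpha', 'beta', '']
print(text.splitlines())  # ['alpha', 'beta']
```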

Read the file one row at a time and remove the unwanted characters from the end of each string with str.rstrip(chars):
with open(filename, 'r') as fileobj:
    for row in fileobj:
        print(row.rstrip('\n'))
See also str.strip([chars]) and str.lstrip([chars]).

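I think this is the best option, as long as you also want other surrounding whitespace removed (strip() removes leading and trailing spaces too, not just the newline):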
temp = [line.strip() for line in file.readlines()]

temp = open(filename,'r').read().splitlines()

My preferred one-liner -- if you don't count from pathlib import Path :)
lines = Path(filename).read_text().splitlines()
It closes the file automatically, so there's no need for with open().
Added in Python 3.5.
https://docs.python.org/3/library/pathlib.html#pathlib.Path.read_text

Try this:
u=open("url.txt","r")
url=u.read().replace('\n','')
print(url)

To get rid of trailing end-of-line (\n) characters and of empty list values (''), try:
f = open(path_sample, "r")
lines = [line.rstrip('\n') for line in f.readlines() if line.strip() != '']

You can easily read the file as a list using a list comprehension:
with open("foo.txt", 'r') as f:
    lst = [row.rstrip('\n') for row in f]

my_file = open("first_file.txt", "r")
for line in my_file.readlines():
if line[-1:] == "\n":
print(line[:-1])
else:
print(line)
my_file.close()

This script takes each line from file, strips its newline, appends ,0, and saves the result to file2:
with open("temp.txt", "r") as file, open("res.txt", "w") as file2:
    for line in file:
        file2.write(f"{line.splitlines()[0]},0\n")
If you look at line, its value is data\n, so splitlines() turns it into a list and [0] picks out its single element, data.

import csv

with open(filename) as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        print(line[0])
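The csv answer works because csv.reader parses each line into a list of fields that does not include the line terminator; line[0] is then the first field, which is the whole line only if it contains no commas. A blank line comes back as an empty list, so line[0] would raise IndexError on it. A small demonstration with io.StringIO standing in for the opened file:

```python
import csv
import io

# io.StringIO stands in for the opened file
f = io.StringIO("name,score\nalice,3\n")
rows = list(csv.reader(f))
print(rows)  # [['name', 'score'], ['alice', '3']]
```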

How to completely remove "\n" in text file using python

So the text file I have is formatted something like this:
a

b

c
I know about strip() and rstrip(), but I want to get rid of the empty lines.
I want to make it shorter like this:
a
b
c

You could remove all blank lines (lines that contain only whitespace) from stdin and/or the files given on the command line using the fileinput module:
#!/usr/bin/env python
import sys
import fileinput

for line in fileinput.input(inplace=True):
    if line.strip():  # preserve non-blank lines
        sys.stdout.write(line)

You can use regular expressions:
import re

txt = """a

b

c"""
print(re.sub(r'\n+', '\n', txt))  # replace one or more consecutive \n by a single one
However, lines that contain only spaces won't be removed. A better solution is:
re.sub(r'(\n[ \t]*)+', '\n', txt)
This way, you will also remove leading spaces.

Simply skip any line that is exactly "\n":
in_filename = 'in_example.txt'
out_filename = 'out_example.txt'
with open(in_filename) as infile, open(out_filename, "w") as outfile:
    for line in infile.readlines():
        if line != "\n":
            outfile.write(line)
If you want to simply update the same file, close and reopen it to overwrite it with the new data:
filename = 'in_example.txt'
filedata = ""
with open(filename, "r") as infile:
    for line in infile.readlines():
        if line != "\n":
            filedata += line
with open(filename, "w") as outfile:
    outfile.write(filedata)
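Note that line != "\n" keeps lines that contain only spaces or tabs. A variant that drops those too and rewrites the file in place, sketched with pathlib (remove_blank_lines is a hypothetical name):

```python
from pathlib import Path


def remove_blank_lines(path):
    """Hypothetical helper: rewrite `path` without empty or whitespace-only lines."""
    p = Path(path)
    lines = p.read_text().splitlines(keepends=True)
    # line.strip() is '' for "\n", "   \n", "\t\n", ... so those lines are skipped
    p.write_text("".join(line for line in lines if line.strip()))
```

The whole file is held in memory, which is fine for typical text files; for very large files the fileinput approach above streams instead.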
