I've scraped some comments from a webpage using selenium and saved them to a text file. Now I would like to perform multiple edits to the text file and save it again. I've tried to group the following into one smooth flow but I'm fairly new to python so I just couldn't get it right. Examples of what happened to me at the bottom. The only way I could get it to work is to open and close the file over and over.
These are the actions I want to perform, in the order they need to happen:
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open("results.txt", "w") as f:
    for line in lines:
        f.write(line.replace("a sample text line", ' '))
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open("results.txt", "w") as f:
    pattern = r'\d in \d example text'
    for line in lines:
        f.write(re.sub(pattern, "", line))
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open('results.txt','w') as file:
    for line in lines:
        if not line.isspace():
            file.write(line)
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open("results.txt", "w") as f:
    for line in lines:
        f.write(line.replace(" ", '-'))
I've tried to loop them into one but I get doubled lines, words, or extra spaces.
Any help is appreciated, thank you.
If you want to do all of this in one pass, it is better to open a second file and write the results there, for example:
import re
pattern = r"\d in \d example text"
# Open your results file for reading and another one for writing
with open("results.txt", "r") as fh_in, open("output.txt", "w") as fh_out:
    for line in fh_in:
        # Process the line
        line = line.replace("a sample text line", " ")
        line = re.sub(pattern, "", line)
        if line.isspace():
            continue
        line = line.replace(" ", "-")
        # Write out
        fh_out.write(line)
Each line is processed in the order you described, and the resulting line is written to the output file.
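If the cleaned text has to end up in results.txt itself, one option is to stream into a temporary file and swap it in at the end. This is only a sketch: the helper names clean_line and rewrite_in_place are made up for illustration, and the search strings are the placeholders from the question.

```python
import os
import re
import tempfile

# Placeholder pattern taken from the question
PATTERN = re.compile(r"\d in \d example text")

def clean_line(line):
    """Apply the four edits in order; return None for lines that end up blank."""
    line = line.replace("a sample text line", " ")
    line = PATTERN.sub("", line)
    if not line or line.isspace():
        return None
    return line.replace(" ", "-")

def rewrite_in_place(path):
    """Stream path through clean_line into a temp file, then replace path."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "r") as fh_in, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as fh_out:
        for line in fh_in:
            cleaned = clean_line(line)
            if cleaned is not None:
                fh_out.write(cleaned)
    # Both files are closed here, so the swap is safe
    os.replace(fh_out.name, path)
```

Calling rewrite_in_place("results.txt") would then leave a single cleaned file behind, without the original ever being opened for writing while it is still being read.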
I have a txt file, and I need to add a phrase to the end of every line.
Note that the same phrase is added to every line.
So what I need is:
here are some words
some words are also here
vlavlavlavlavl
blaaablaabalbaaa
before
here are some words, the end
some words are also here, the end
vlavlavlavlavl, the end
blaaablaabalbaaa, the end
after
I also tried this method:
with open("Extracts.txt", encoding="utf-8") as f:
    for line in f:
        data = [line for line in f]
with open("new.txt", 'w', encoding="utf-8") as f:
    for line in data:
        f.write(", Deposited")
        f.write(line)
But the phrase showed up at the beginning of each line, not at the end.
Each line ends with a newline. Remove the newline, then write the line and the addition, followed by a newline.
There's also no need to read the lines into a list first, you can just iterate over the input file directly.
with open("Extracts.txt", encoding="utf-8") as infile, open("new.txt", 'w', encoding="utf-8") as outfile:
    for line in infile:
        line = line.rstrip("\n")
        outfile.write(f"{line}, Deposited\n")
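The same idea can be tried without touching any files. A small sketch, where with_suffix is a hypothetical helper name and the sample lines are borrowed from the question; it also shows that a last line with no trailing newline is handled the same way:

```python
def with_suffix(lines, suffix):
    """Yield each line with suffix inserted before its line ending."""
    for line in lines:
        # rstrip("\n") removes only the newline, so indentation and
        # trailing spaces inside the line are left alone
        yield line.rstrip("\n") + suffix + "\n"

sample = [
    "here are some words\n",
    "some words are also here\n",
    "last line without newline",
]
result = list(with_suffix(sample, ", the end"))
```

Because with_suffix accepts any iterable of lines, an open file object can be passed in directly and the output fed to writelines.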
You can first read all the lines of the text file with the readlines method, and then add the phrase you want to each one.
with open("Extracts.txt", encoding="utf-8") as f:
    data = f.readlines()
new_data = []
for line in data:
    # Drop the newline, append the phrase, then put the newline back
    line = line.replace("\n", "")
    line += ", Deposited\n"
    new_data.append(line)
with open("new.txt", "w", encoding="utf-8") as f:
    f.writelines(new_data)
As mkrieger1 already said, the order of operations here is wrong. You are writing the ", Deposited" to the file before you're writing the content of the line in question. So a working version of the code swaps those operations:
with open("Extracts.txt", encoding="utf-8") as f:
    # Read every line into a list so it is still available after the file closes
    data = f.readlines()
with open("new.txt", 'w', encoding="utf-8") as f:
    for line in data:
        f.write(line.strip())
        f.write(", Deposited\n")
Note that I also added strip() when handling the line of text. It removes whitespace at the start and end of the string, including the trailing newline, so nothing ends up between the text and ", Deposited". The newline is then added back explicitly at the end as the string literal "\n".
I'm trying to remove one line which matches a variable. But instead it is wiping the file clean.
a_file = open("./Variables/TxtFile.txt", "r")
lines = a_file.readlines()
a_file.close()
new_file = open("./Variables/TxtFile.txt", "w")
for line in lines:
    if line.strip("\n") == VariableStore:
        new_file.write(line)
        new_file.close()
The goal would be to remove the line that matches VariableStore rather than wiping the entire text file
Following up on my comment on your original post:
You only write to the file when a line matches the one you want to remove, and then you also close the file.
This seems not to be what you want.
You might want to change the if condition so that it runs for lines that do not match the one you want to remove, i.e., if not line.strip("\n") == VariableStore:, and close the file after your loop, i.e., at the same indentation level as the for loop.
Try the following, which incorporates these suggestions:
a_file = open("./Variables/TxtFile.txt", "r")
lines = a_file.readlines()
a_file.close()
new_file = open("./Variables/TxtFile.txt", "w")
for line in lines:
    if not line.strip("\n") == VariableStore:
        new_file.write(line)
new_file.close()
If your aim is to filter out the line matching VariableStore, do this:
with open("./Variables/TxtFile.txt", "r") as a_file:
    lines = a_file.readlines()
with open("./Variables/TxtFile.txt", "w") as new_file:
    for line in lines:
        if line.strip("\n") == VariableStore:
            continue  # Skip the VariableStore line
        new_file.write(line)  # Write other lines
When you use with statements, you don't need to manually close the file.
You just need to close the file later on, when you are done parsing all the lines.
Also, you need to write the lines that don't match, not the ones that do.
Note the changes below:
# Read file
a_file = open("./Variables/TxtFile.txt", "r")
lines = a_file.readlines()
a_file.close()
# Write file
new_file = open("./Variables/TxtFile.txt", "w")
for line in lines:
    if line.strip("\n") == VariableStore:
        # Don't write this line
        pass
    else:
        new_file.write(line)
new_file.close()
Let us assume that your text file TxtFile.txt contains this text
Hello
World
I'm
Python
Developer
And you have a variable var containing the string World, which we want to remove from the text file.
Here is Python code that does the job in a few lines:
var='World' # a string to remove
with open("TxtFile.txt","r+") as f:
    lines = f.readlines()
    lines = [line for line in lines if line.strip()!=var]
    f.seek(0)
    f.writelines(lines)
    f.truncate()
The text file after running this code:
Hello
I'm
Python
Developer
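The seek, writelines, truncate sequence can be checked without a real file. A minimal sketch using an in-memory io.StringIO buffer in place of the "r+" file handle; remove_line is a hypothetical helper name, and the sample text is the one from this answer:

```python
import io

def remove_line(stream, target):
    """Rewrite a seekable text stream without the lines equal to target."""
    lines = [line for line in stream.readlines() if line.strip() != target]
    stream.seek(0)            # go back to the start before writing
    stream.writelines(lines)  # overwrite the old content from the top
    stream.truncate()         # cut off what is left of the old, longer content

buf = io.StringIO("Hello\nWorld\nI'm\nPython\nDeveloper\n")
remove_line(buf, "World")
cleaned = buf.getvalue()
```

The truncate() call matters because the new content is shorter than the old one; without it, the tail of the original text would still sit after the rewritten lines.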
The problem is that you're opening the file with write mode instead of append mode. Replace
new_file = open("./Variables/TxtFile.txt", "w")
with
new_file = open("./Variables/TxtFile.txt", "a")
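and you'll append the data instead of overwriting it. Be aware that append mode keeps every existing line, so on its own it does not remove the line matching VariableStore.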
Also, it's generally recommended to open files using the 'with' statement, since that automatically closes the file for you.
with open("./Variables/TxtFile.txt", "a") as text_file:
    ...
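To make the difference between the two modes concrete, here is a small sketch that uses a throwaway temporary file (the file name and contents are invented for the demonstration). It shows that "w" empties the file on open while "a" keeps what is already there, which also means that appending filtered lines to a file that still holds the originals would duplicate them.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:      # "w" creates the file, or empties an existing one
    f.write("first\n")
with open(path, "a") as f:      # "a" keeps the content and adds to the end
    f.write("second\n")
with open(path) as f:
    after_append = f.read()

with open(path, "w") as f:      # reopening with "w" discards both earlier lines
    f.write("third\n")
with open(path) as f:
    after_write = f.read()
```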
I am trying to do what for many will be a very straightforward thing, but for me it is just infuriatingly difficult.
I am trying to search for a line in a file that contains certain words or phrases and modify that line... that's it.
I have been through the forum and the suggested similar questions and have found many hints, but none do quite what I want, or they are beyond my current ability to grasp.
This is the test file:
# 1st_word 2nd_word
# 3rd_word 4th_word
And this is my script so far:
############################################################
file = 'C:\lpthw\\text'
f1 = open(file, "r+")
f2 = open(file, "r+")
############################################################
def wrline():
    lines = f1.readlines()
    for line in lines:
        if "1st_word" in line and "2nd_word" in line:
            #f2.write(line.replace('#\t', '\t'))
            f2.write((line.replace('#\t', '\t')).rstrip())
    f1.seek(0)
wrline()
My problem is that the line below inserts a \n after the line every time and adds a blank line to the file.
f2.write(line.replace('#\t', '\t'))
The file becomes:
1st_word 2nd_word
# 3rd_word 4th_word
An extra blank line between the lines of text.
If I use the following:
f2.write((line.replace('#\t', '\t')).rstrip())
I get this:
1st_word 2nd_wordd
# 3rd_word 4th_word
No new blank line is inserted, but there is an extra "d" at the end instead.
What am I doing wrong?
Thanks
Your blank line is coming from the original blank line in the file. Writing a line with nothing in it writes a newline to the file. Instead of not putting anything into the written line, you have to completely skip the iteration, so it does not write that newline. Here's what I suggest:
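def wrline():
    # Read everything first, then reopen the file for writing
    with open('file.txt', 'r') as f1:
        lines = f1.readlines()
    with open('file.txt', 'w') as f2:
        for line in lines:
            if '1st_word' in line and '2nd_word' in line:
                # Keep the trailing newline so the next line stays separate
                f2.write(line.replace('# ', ' '))
            elif line != '\n':
                # Copy every other non-blank line unchanged
                f2.write(line)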
I would keep read and write operations separate.
#read
with open(file, 'r') as f:
    lines = f.readlines()
#parse, change and write back
with open(file, 'w') as f:
    for line in lines:
        if line.startswith('#\t'):
            line = line[1:]
        f.write(line)
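Keeping the two phases separate also avoids the stray characters from the question. A likely explanation, sketched here with an in-memory buffer standing in for the second "r+" handle: writing a shorter line over the start of the existing content leaves the tail of the old line in place. The sketch assumes the test file is tab separated, as the question's replace('#\t', '\t') suggests; overwrite_at_start is a hypothetical helper.

```python
import io

# Assumed file content: tab-separated, two commented lines
ORIGINAL = "#\t1st_word\t2nd_word\n#\t3rd_word\t4th_word\n"

def overwrite_at_start(text, replacement):
    """Write replacement over the beginning of text, like an 'r+' handle at offset 0."""
    buf = io.StringIO(text)
    buf.seek(0)
    buf.write(replacement)  # overwrites in place, nothing is shifted or removed
    return buf.getvalue()

first_line = ORIGINAL.splitlines(keepends=True)[0]
uncommented = first_line.replace("#\t", "\t")  # one character shorter than before

# New line ends one character early, so the old "\n" survives as a blank line
with_newline = overwrite_at_start(ORIGINAL, uncommented)
# Without the newline it ends two characters early, so the old "d\n" survives
without_newline = overwrite_at_start(ORIGINAL, uncommented.rstrip())
```

Reopening the file in "w" mode, as above, starts from an empty file, so no old characters can be left behind.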
You have not closed the files, and there is no need for the \t.
Also get rid of the rstrip().
Read in the file, replace the data, and write it back, opening and closing the file each time.
fn = 'example.txt'
new_data = []
# Read in the file
with open(fn, 'r+') as file:
    filedata = file.readlines()
# Replace the target string
for line in filedata:
    if "1st_word" in line and "2nd_word" in line:
        line = line.replace('#', '')
    new_data.append(line)
# Write the file out again
with open(fn, 'w+') as file:
    for line in new_data:
        file.write(line)
I have a large 11 GB .txt file with email addresses. I would like to save only the part of each line before the # symbol, one per line. My output only contains the first line. I reused this code from an earlier project. I would like to save the output in a different .txt file. I hope someone can help me out.
My code:
import re
def get_html_string(file,start_string,end_string):
    answer="nothing"
    with open(file, 'rb') as open_file:
        for line in open_file:
            line = line.rstrip()
            if re.search(start_string, line) :
                answer=line
                break
    start=answer.find(start_string)+len(start_string)
    end=answer.find(end_string)
    #print(start,end,answer)
    return answer[start:end]
beginstr=''
end='#'
file='test.txt'
readstring=str(get_html_string(file,beginstr,end))
print readstring
Your file is quite big (11 GB), so you shouldn't keep all those strings in memory. Instead, process the file line by line and write each result before reading the next line.
This should be efficient:
with open('test.txt', 'r') as input_file:
    with open('result.txt', 'w') as output_file:
        for line in input_file:
            prefix = line.split('#')[0]
            output_file.write(prefix + '\n')
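One detail worth guarding against: for a line that contains no '#', line.split('#')[0] returns the whole line including its newline, so adding '\n' produces an extra blank line. A possible variant, still streaming one line at a time, strips the newline first and skips empty lines. The function names local_parts and extract_prefixes are invented for this sketch.

```python
def local_parts(lines, separator="#"):
    """Yield the text before the first separator on each non-empty line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines instead of writing empty output
        prefix, _, _ = line.partition(separator)
        yield prefix

def extract_prefixes(in_path, out_path, separator="#"):
    """Stream in_path to out_path, keeping only the part before separator."""
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for prefix in local_parts(src, separator):
            dst.write(prefix + "\n")
```

Since local_parts is a generator, only one line is held in memory at a time, which is what matters for a file of this size.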
If your file looks like this example:
user#google.com
user2#jshds.com
Useruser#jsnl.com
You can use this:
def get_email_name(file_name):
    with open(file_name) as file:
        lines = file.readlines()
    result = list()
    for line in lines:
        result.append(line.split('#')[0])
    return result
get_email_name('emails.txt')
Out:
['user', 'user2', 'Useruser']
I've managed to extract data and write it to a file, but how do I add more processing steps?
I have this code:
f = open('input.csv', "r")
f2 = open('output.txt', "w+")
lines = f.readlines()
for line in lines:
    words = line.split("|")
    f2.writelines(words[5]+"|"+"\n")
f.close()
f2.close()
(I need to check for and remove blank lines in the input file, check for duplicates, and remove certain matching lines, all before writing to the output.)
I have input file:
hello|one|good|bad|weird|man|world|
hello|one|good|bad|weird|man|world|
hi|jungle|12345|present|small|ladie|world|
I need output file:
man|
ladie|
This will remove blank lines and duplicates before writing to the output file. What kind of matching lines do you want to remove?
f = open('input.csv', "r")
f2 = open('output.txt', "w+")
lines = f.readlines()
lines = set(lines)
for line in lines:
    if line != '\n' and line != "|||REMOVE ME|||\n":
        f2.writelines(line)
f.close()
f2.close()
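Note that a set does not keep the original line order. If order matters, and the sixth field should still be extracted as in the question, something along these lines might work. This is a sketch only: extract_field and its skip parameter are hypothetical, since the question does not say which lines should be removed.

```python
def extract_field(lines, index=5, skip=()):
    """Yield field `index` from each unique, non-blank, pipe-delimited line."""
    seen = set()
    for line in lines:
        line = line.strip()
        # Drop blank lines, duplicates, and any lines listed in skip
        if not line or line in seen or line in skip:
            continue
        seen.add(line)
        fields = line.split("|")
        if len(fields) > index:
            yield fields[index] + "|"

sample = [
    "hello|one|good|bad|weird|man|world|\n",
    "hello|one|good|bad|weird|man|world|\n",
    "\n",
    "hi|jungle|12345|present|small|ladie|world|\n",
]
output = list(extract_field(sample))
```

With the sample input from the question, the duplicate and the blank line are dropped and the remaining lines keep their original order.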