I want to write a function that searches each line of an input file for a particular string. If a line contains it, the whole line should be deleted from the input file and written to a newly created text file in a similar format.
The input files format is always like so:
Firstname:DOB
Firstname:DOB
Firstname:DOB
Firstname:DOB
etc...
I want to read this file, search each line for the DOB (19111991) and, if the line contains it, delete the line from the input file and write it to a new .txt document.
I'm pretty clueless, to be honest, but this is my attempt at the logic, even though some of the code may be wrong:
def snipper(iFile)
    with open(iFile, "a") as iFile:
        lines = iFile.readlines()
        for line in lines:
            string = line.split(':')
            if string[1] == "19111991":
                iFile.strip_line()
                with open("newfile.txt", "w") as oFile:
                    iFile.write(string[0] + ':' + '19 November' + '\n')
Any help would be great.
Try this code instead:
def snipper(filename):
    # Read everything first so the input file can be rewritten afterwards
    with open(filename, "r") as f:
        lines = f.readlines()
    # Lines containing the DOB move to the new file; the rest stay behind
    new_data = filter(lambda x: "19111991" in x, lines)
    remaining_old_data = filter(lambda x: "19111991" not in x, lines)
    with open("newfile.txt", "w") as oFile:
        for line in new_data:
            oFile.write(line.replace("19111991", "19th November 1991"))
    with open(filename, "w") as iFile:
        for line in remaining_old_data:
            iFile.write(line)
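The substring test above also matches lines where 19111991 appears anywhere, including inside the name. A minimal sketch of a stricter variant, assuming every line really has the Firstname:DOB shape from the question; the `out_path` and `dob` parameters are hypothetical additions, not part of the original answer:

```python
def snipper(in_path, out_path, dob="19111991"):
    """Move every 'Firstname:DOB' line whose DOB equals `dob` to out_path.

    Returns the number of lines moved. `out_path` and `dob` are
    illustrative parameters, not part of the original answer.
    """
    with open(in_path, "r") as f:
        lines = f.readlines()

    matched, kept = [], []
    for line in lines:
        # partition() never raises, even when a line has no ':' in it
        _name, _sep, value = line.rstrip("\n").partition(":")
        if value == dob:
            matched.append(line)
        else:
            kept.append(line)

    # Write the moved lines first, then rewrite the input without them
    with open(out_path, "w") as out:
        out.writelines(matched)
    with open(in_path, "w") as f:
        f.writelines(kept)
    return len(matched)
```

Comparing the field after the colon rather than searching the whole line avoids false matches, at the cost of assuming the format never varies.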
Here is what I have. I open the txt file and print it like this:
f = open('data.txt', 'r')
print(f.read())
The output shows:
['Cat\n','Dog\n','Cat\n','Dog\n'........]
But I would like to get this:
['C\n','D\n','C\n','D\n'........]
First you'll want to open the file in read mode (r flag in open), then you can iterate through the file object with a for loop to read each line one at a time. Lastly, you want to access the first element of each line at index 0 to get the first letter.
first_letters = []
with open('data.txt', 'r') as f:
    for line in f:
        first_letters.append(line[0])
print(first_letters)
If you want to keep the newline character in each string, change the append line above to:
        first_letters.append(line[0] + '\n')
f = open("data.txt", "r")
for x in f:
    print(x[0])
f.close()
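Both answers index `line[0]`, which returns the newline itself for a blank line. A short sketch that skips blank lines and builds the list in one expression; the function name and the `keep_newline` flag are hypothetical, chosen only for illustration:

```python
def first_letters(path, keep_newline=False):
    """Return the first character of every non-blank line in `path`."""
    suffix = "\n" if keep_newline else ""
    with open(path, "r") as f:
        # line.strip() is empty for blank lines, so they are skipped
        return [line[0] + suffix for line in f if line.strip()]
```

Calling `first_letters('data.txt', keep_newline=True)` would give the `'C\n'`, `'D\n'` form asked for in the question.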
I've scraped some comments from a webpage using selenium and saved them to a text file. Now I would like to perform multiple edits to the text file and save it again. I've tried to combine the following into one smooth flow, but I'm fairly new to Python and couldn't get it right. Examples of what happened are at the bottom. The only way I could get it to work is to open and close the file over and over.
These are the actions I want to perform, in the order they need to happen:
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open("results.txt", "w") as f:
    for line in lines:
        f.write(line.replace("a sample text line", ' '))
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open("results.txt", "w") as f:
    pattern = r'\d in \d example text'
    for line in lines:
        f.write(re.sub(pattern, "", line))
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open('results.txt','w') as file:
    for line in lines:
        if not line.isspace():
            file.write(line)
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open("results.txt", "w") as f:
    for line in lines:
        f.write(line.replace(" ", '-'))
I've tried to loop them into one but I get doubled lines, words, or extra spaces.
Any help is appreciated, thank you.
If you want to do these in one smooth pass, it is better to open a second file for the results, i.e.
import re
pattern = r"\d in \d example text"
# Open your results file for reading and another one for writing
with open("results.txt", "r") as fh_in, open("output.txt", "w") as fh_out:
    for line in fh_in:
        # Process the line
        line = line.replace("a sample text line", " ")
        line = re.sub(pattern, "", line)
        if line.isspace():
            continue
        line = line.replace(" ", "-")
        # Write out
        fh_out.write(line)
Each line is processed in the order you described, and the resulting line goes to the output file.
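If the result has to end up in the same file rather than in output.txt, one option is to read once, transform in memory, and write once. This is a sketch under the assumption that the file fits in memory; `clean_line` and `clean_file` are hypothetical helper names:

```python
import re

PATTERN = re.compile(r"\d in \d example text")

def clean_line(line):
    """Apply the edits in order; return None when the line should be dropped."""
    line = line.replace("a sample text line", " ")
    line = PATTERN.sub("", line)
    # Drop lines that are empty or whitespace-only after the substitutions
    if not line or line.isspace():
        return None
    return line.replace(" ", "-")

def clean_file(path):
    """Rewrite `path` in place with every line passed through clean_line."""
    with open(path, "r") as f:
        cleaned = [clean_line(line) for line in f]
    # The file is reopened for writing only after reading has finished
    with open(path, "w") as f:
        f.writelines(line for line in cleaned if line is not None)
```

Keeping the per-line logic in its own function makes it easy to check each step on a single string before touching the file.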
I am trying to do what for many will be a very straightforward thing, but for me it is infuriatingly difficult.
I am trying to search a file for a line that contains certain words or phrases and modify that line. That's it.
I have been through the forum and the suggested similar questions and have found many hints, but none do quite what I want, or they are beyond my current ability to grasp.
This is the test file:
# 1st_word 2nd_word
# 3rd_word 4th_word
And this is my script so far:
############################################################
file = 'C:\lpthw\\text'
f1 = open(file, "r+")
f2 = open(file, "r+")
############################################################
def wrline():
    lines = f1.readlines()
    for line in lines:
        if "1st_word" in line and "2nd_word" in line:
            #f2.write(line.replace('#\t', '\t'))
            f2.write((line.replace('#\t', '\t')).rstrip())
    f1.seek(0)
wrline()
My problem is that the below inserts a \n after the line every time and adds a blank line to the file.
f2.write(line.replace('#\t', '\t'))
The file becomes:
1st_word 2nd_word
# 3rd_word 4th_word
An extra blank line between the lines of text.
If I use the following:
f2.write((line.replace('#\t', '\t')).rstrip())
I get this:
1st_word 2nd_wordd
# 3rd_word 4th_word
No new blank line is inserted, but there is an extra "d" at the end instead.
What am I doing wrong?
Thanks
Your blank line is coming from the original blank line in the file. Writing a line with nothing in it writes a newline to the file. Instead of not putting anything into the written line, you have to completely skip the iteration, so it does not write that newline. Here's what I suggest:
def wrline():
    with open('file.txt', 'r') as f:
        lines = f.readlines()
    with open('file.txt', 'w') as f2:
        for line in lines:
            if '1st_word' in line and '2nd_word' in line:
                # Keep the newline so the next line is not joined onto this one
                f2.write(line.replace('# ', ' '))
            elif line != '\n':
                # Copy every other non-blank line unchanged
                f2.write(line)
I would keep read and write operations separate.
#read
with open(file, 'r') as f:
    lines = f.readlines()
#parse, change and write back
with open(file, 'w') as f:
    for line in lines:
        if line.startswith('#\t'):
            line = line[1:]
        f.write(line)
You have not closed the files, and there is no need for the \t.
Also get rid of the rstrip().
Read in the file, replace the data and write it back, opening and closing the file each time.
fn = 'example.txt'
new_data = []
# Read in the file
with open(fn, 'r+') as file:
    filedata = file.readlines()
# Replace the target string
for line in filedata:
    if "1st_word" in line and "2nd_word" in line:
        line = line.replace('#', '')
    # Keep every line, changed or not
    new_data.append(line)
# Write the file out again
with open(fn, 'w+') as file:
    for line in new_data:
        file.write(line)
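One likely explanation for the stray "d" in the question is that a file opened with "r+" is overwritten in place and never shortened: writing a replacement that is shorter than the original line leaves the old trailing characters behind. The sketch below reproduces this on a temporary file; it assumes the test file uses a tab after `#` and a single space between the two words, and `overwrite_demo` is a hypothetical name:

```python
import os
import tempfile

def overwrite_demo():
    """Overwrite the first line in 'r+' mode and return the resulting file text."""
    # Assumed layout of the test file: '#', tab, two words, newline
    original = "#\t1st_word 2nd_word\n#\t3rd_word 4th_word\n"
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline="" disables newline translation so offsets are predictable
        with open(path, "w", newline="") as f:
            f.write(original)
        with open(path, "r+", newline="") as f:
            first = f.readline()
            f.seek(0)
            # The replacement is two characters shorter than the original line
            f.write(first.replace("#\t", "\t").rstrip())
        with open(path, "r", newline="") as f:
            return f.read()
    finally:
        os.remove(path)

print(repr(overwrite_demo()))
```

The last two characters of the original first line (`d` and the newline) survive after the shorter text, which is why the answers that reopen the file with "w" and rewrite every line avoid the problem.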
I have a large 11 GB .txt file with email addresses. I would like to keep only the part of each line up to the # symbol, one per line. My output only contains the first line. I reused this code from an earlier project. I would like to save the output in a different .txt file. I hope someone can help me out.
my code:
import re
def get_html_string(file,start_string,end_string):
    answer="nothing"
    with open(file, 'rb') as open_file:
        for line in open_file:
            line = line.rstrip()
            if re.search(start_string, line) :
                answer=line
                break
    start=answer.find(start_string)+len(start_string)
    end=answer.find(end_string)
    #print(start,end,answer)
    return answer[start:end]
beginstr=''
end='#'
file='test.txt'
readstring=str(get_html_string(file,beginstr,end))
print readstring
Your file is quite big (11G) so you shouldn't keep all those strings in memory. Instead, process the file line by line and write the result before reading next line.
This should be efficient:
with open('test.txt', 'r') as input_file:
    with open('result.txt', 'w') as output_file:
        for line in input_file:
            prefix = line.split('#')[0]
            output_file.write(prefix + '\n')
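With `split('#')[0]`, a line that has no `#` comes back whole, newline included, so the output gains a blank line after it. A hedged variant of the same streaming loop that skips such lines; `write_prefixes` and its `sep` parameter are hypothetical names, and whether malformed lines should be skipped or kept is an assumption about the data:

```python
def write_prefixes(in_path, out_path, sep="#"):
    """Stream in_path line by line, writing the text before `sep` to out_path.

    Lines without `sep`, or with nothing before it, are skipped.
    Returns the number of prefixes written.
    """
    count = 0
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for line in src:
            # partition() reports whether the separator was found at all
            prefix, found, _rest = line.rstrip("\n").partition(sep)
            if found and prefix:
                dst.write(prefix + "\n")
                count += 1
    return count
```

Only one line is held in memory at a time, so the approach still suits an 11 GB input.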
If your file looks like this example:
user#google.com
user2#jshds.com
Useruser#jsnl.com
You can use this:
def get_email_name(file_name):
    with open(file_name) as file:
        lines = file.readlines()
    result = list()
    for line in lines:
        result.append(line.split('#')[0])
    return result
get_email_name('emails.txt')
Out:
['user', 'user2', 'Useruser']
Say I have a file my_file, and I want to search every line of it for a certain word x. Where the word exists, I want to attach my variable y to its left and right side. Then I want to write the new, modified line in place of the old one to my_new_file. How do I do this? So far I have:
output = open(my_new_file, "w")
for line in open(my_file):
    if (" " + x + "") in line:
You can try this:
y = "someword"
x = "target_string"
with open('filename.txt') as f:
    lines = [i.strip('\n') for i in f]
# Wrap each occurrence of x in y; lines without x come through unchanged
final_lines = [i.replace(x, y + x + y) for i in lines]
with open(my_new_file, "w") as f:
    for i in final_lines:
        f.write("{}\n".format(i))
with open('inputfile.txt', 'r') as infile:
    with open('outfile.txt', 'w') as outfile:
        for line in infile.readlines():
            outfile.write(line.replace('string', y + 'string' + y))
Try this:
with open("my_file", "r") as my_file:
    raw_data = my_file.read()
# READ YOUR FILE
new_data = raw_data.split("\n")
for line in new_data:
    if "sd" in line:
        my_new_line = "y" + line + "y"
        raw_data = raw_data.replace(line, my_new_line)
print(raw_data)
It's tough to replace a line in a file while reading it, for the same reason that it's tough to safely modify a list as you iterate over it.
It's much better to read through the file, collect a list of lines, then overwrite the original. If the file is particularly large (such that it would be infeasible to hold it all in memory at once), you can write to disk twice.
import tempfile
y = "***"
your_word = "Whatever you're filtering by"
with tempfile.TemporaryFile(mode="w+") as tmpf:
    with open(my_file, 'r') as f:
        for line in f:
            if your_word in line:
                line = f"{y}{line.strip()}{y}\n"
            tmpf.write(line)  # write to the temp file
    tmpf.seek(0)  # move back to the beginning of the tempfile
    with open(my_file, 'w') as f:
        for line in tmpf:  # reading from tempfile now
            f.write(line)
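The same two-pass idea can be packaged as a function that wraps only the matching word, which is closer to what the question asks for than wrapping the whole line. This is a sketch rather than a definitive implementation; `wrap_matching_lines` and its parameters are hypothetical names:

```python
import shutil
import tempfile

def wrap_matching_lines(path, word, y):
    """Rewrite `path` so every occurrence of `word` becomes y + word + y."""
    with tempfile.TemporaryFile(mode="w+") as tmpf:
        # First pass: stream the original into the temp file, editing as we go
        with open(path, "r") as f:
            for line in f:
                if word in line:
                    line = line.replace(word, y + word + y)
                tmpf.write(line)
        # Second pass: rewind and copy the edited text over the original
        tmpf.seek(0)
        with open(path, "w") as f:
            shutil.copyfileobj(tmpf, f)
```

For example, `wrap_matching_lines("notes.txt", "cat", "**")` would turn `a cat sat` into `a **cat** sat` while leaving other lines untouched.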