How to apply multiple functions to a data file - Python

I've managed to extract data and write it out to a file, but how do I add more processing steps before writing?
I have this code:
f = open('input.csv', "r")
f2 = open('output.txt', "w+")
lines = f.readlines()
for line in lines:
    words = line.split("|")
    f2.writelines(words[5] + "|" + "\n")
f.close()
f2.close()
(I need to remove blank lines from the input file before writing the output, check for duplicates before writing the output, and remove certain matching lines before writing the output.)
I have input file:
hello|one|good|bad|weird|man|world|
hello|one|good|bad|weird|man|world|
hi|jungle|12345|present|small|ladie|world|
I need output file:
man|
ladie|

This will remove blank lines and duplicates before writing to the output file. What kind of matching lines do you want to remove?
f = open('input.csv', "r")
f2 = open('output.txt', "w+")
lines = f.readlines()
lines = set(lines)
for line in lines:
    if line != '\n' and line != "|||REMOVE ME|||\n":
        f2.writelines(line)
f.close()
f2.close()
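For completeness, here is a sketch that folds the field extraction from the question into the same pass; it keeps the first occurrence of each line so the output order is preserved (a plain set() does not guarantee that), and it assumes the same "|||REMOVE ME|||" marker used above for the lines to drop:
seen = set()
with open('input.csv', 'r') as f, open('output.txt', 'w') as f2:
    for line in f:
        # skip blank lines and the lines that should be removed
        if line.strip() == '' or line == '|||REMOVE ME|||\n':
            continue
        # skip duplicates, keeping the first occurrence
        if line in seen:
            continue
        seen.add(line)
        words = line.split('|')
        f2.write(words[5] + '|\n')
With the sample input above this leaves man| and ladie| in output.txt, which matches the desired result.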

Related

Slice a given txtfile and write only part of it in a newfile in python

This is my original .txt data:
HKEY_CURRENT_USER\SOFTWARE\7-Zip
HKEY_CURRENT_USER\SOFTWARE\AppDataLow
HKEY_CURRENT_USER\SOFTWARE\Chromium
HKEY_CURRENT_USER\SOFTWARE\Clients
HKEY_CURRENT_USER\SOFTWARE\CodeBlocks
HKEY_CURRENT_USER\SOFTWARE\Discord
HKEY_CURRENT_USER\SOFTWARE\Dropbox
HKEY_CURRENT_USER\SOFTWARE\DropboxUpdate
HKEY_CURRENT_USER\SOFTWARE\ej-technologies
HKEY_CURRENT_USER\SOFTWARE\Evernote
HKEY_CURRENT_USER\SOFTWARE\GNU
And I need to have a new file where the new lines contain only part of those strings, like:
7-Zip
AppDataLow
Chromium
Clients
...
How can I do this in Python?
Try this:
## read file content as string
with open("file.txt", "r") as file:
    string = file.read()

## convert each line to list
lines = string.split("\n")

## write only last part after "\" in each line
with open("new.txt", "w") as file:
    for line in lines:
        file.write(line.split("\\")[-1] + "\n")
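One small caveat (an assumption about the input rather than something stated in the question): if file.txt ends with a newline, string.split("\n") leaves a trailing empty element and the loop writes an extra blank line to new.txt. A guard like this avoids it:
with open("file.txt", "r") as file, open("new.txt", "w") as out:
    for line in file:
        line = line.rstrip("\n")
        if line:  # skip empty lines, including a trailing one
            out.write(line.split("\\")[-1] + "\n")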
One approach would be to read the entire text file into a Python string. Then use split on each line to find the final path component.
import re

with open('file.txt', 'r') as file:
    data = file.read()

lines = re.split(r'\r?\n', data)
output = [x.split("\\")[-1] for x in lines]

# write to file if desired
text = '\n'.join(output)
f_out = open('output.txt', 'w')
f_out.write(text)
f_out.close()
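As an alternative sketch (not part of either answer above), the standard library's pathlib can pull off the last component of a backslash-separated path without manual splitting; this assumes every line really is a registry-style path:
from pathlib import PureWindowsPath

with open('file.txt', 'r') as f_in, open('output.txt', 'w') as f_out:
    for line in f_in:
        line = line.strip()
        if line:
            # PureWindowsPath treats the backslash as a separator on any OS
            f_out.write(PureWindowsPath(line).name + '\n')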

Flipping the contents of a file from bottom to top except for the first line

I'm trying to flip the lines upside down except the first line:
example:
Header
first
second
third
The output with the code below is:
third
second
first
Header
What I'm trying to do is:
Header
third
second
first
Code is below:
with open('file.txt', 'r') as f:
    lines = f.readlines()

with open('output.txt', 'w+') as f:
    for l in reversed(lines):
        f.write(l)
This seems to do what you're after:
with open('file.txt', 'r') as f:
    lines = f.readlines()

firstline = lines[0]  # get the first line from the file
lines.pop(0)          # remove the first line from the list, since it's stored separately

with open('output.txt', 'w+') as f:
    f.write(firstline)         # write the first line to the file
    for l in reversed(lines):  # write the rest of the lines to the file
        f.write(l)
It outputs in output.txt:
Header
third
second
first
with open('file.txt', 'r') as f:
    lines = f.readlines()

with open('output.txt', 'w+') as f:
    f.write(lines[0])  # write the header first
    for l in reversed(lines[1:]):
        f.write(l)
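The same result can be written a bit more compactly with slicing; this is just a condensed sketch of the approach above and assumes the file is not empty:
with open('file.txt', 'r') as f:
    lines = f.readlines()

with open('output.txt', 'w+') as f:
    # header first, then the remaining lines from last to second
    f.writelines([lines[0]] + lines[:0:-1])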

Multiple str edits to a single .txt file python

I've scraped some comments from a webpage using Selenium and saved them to a text file. Now I would like to perform multiple edits to the text file and save it again. I've tried to group the following into one smooth flow, but I'm fairly new to Python and just couldn't get it right. Examples of what happened to me are at the bottom. The only way I could get it to work was to open and close the file over and over.
These are the actions I want to perform, in the order they need to happen:
import re

# 1) remove a known text snippet
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open("results.txt", "w") as f:
    for line in lines:
        f.write(line.replace("a sample text line", ' '))

# 2) strip text matching a regex pattern
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open("results.txt", "w") as f:
    pattern = r'\d in \d example text'
    for line in lines:
        f.write(re.sub(pattern, "", line))

# 3) drop whitespace-only lines
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open('results.txt', 'w') as file:
    for line in lines:
        if not line.isspace():
            file.write(line)

# 4) replace spaces with hyphens
with open('results.txt', 'r') as f:
    lines = f.readlines()
with open("results.txt", "w") as f:
    for line in lines:
        f.write(line.replace(" ", '-'))
I've tried to combine them into one loop, but I get doubled lines, words, or extra spaces.
Any help is appreciated, thank you.
If you want to do these in one smooth pass, you are better off opening another file and writing the desired results into it, i.e.:
import re

pattern = r"\d in \d example text"

# Open your results file for reading and another one for writing
with open("results.txt", "r") as fh_in, open("output.txt", "w") as fh_out:
    for line in fh_in:
        # Process the line
        line = line.replace("a sample text line", " ")
        line = re.sub(pattern, "", line)
        if line.isspace():
            continue
        line = line.replace(" ", "-")
        # Write out
        fh_out.write(line)
We process each line in the order you described, and the resulting line goes to the output file.
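If the cleaned text has to end up back in results.txt, one follow-up (my assumption, not part of the answer above) is to swap the processed file into place once the pass has finished:
import os

# replace the original file with the processed one
os.replace("output.txt", "results.txt")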

How to modify a line in a file using Python

I am trying to do what for many will be a very straightforward thing, but for me is just infuriatingly difficult.
I am trying search for a line in a file that contains certain words or phrases and modify that line...that's it.
I have been through the forum and suggested similar questions and have found many hints but none do just quite what I want or are beyond my current ability to grasp.
This is the test file:
# 1st_word 2nd_word
# 3rd_word 4th_word
And this is my script so far:
############################################################
file = 'C:\lpthw\\text'
f1 = open(file, "r+")
f2 = open(file, "r+")
############################################################

def wrline():
    lines = f1.readlines()
    for line in lines:
        if "1st_word" in line and "2nd_word" in line:
            #f2.write(line.replace('#\t', '\t'))
            f2.write((line.replace('#\t', '\t')).rstrip())
    f1.seek(0)

wrline()
My problem is that the below inserts a \n after the line every time and adds a blank line to the file.
f2.write(line.replace('#\t', '\t'))
The file becomes:
1st_word 2nd_word
# 3rd_word 4th_word
There is an extra blank line between the lines of text.
If I use the following:
f2.write((line.replace('#\t', '\t')).rstrip())
I get this:
1st_word 2nd_wordd
# 3rd_word 4th_word
No new blank line is inserted, but there is an extra "d" at the end instead.
What am I doing wrong?
Thanks
Your blank line is coming from the original blank line in the file. Writing a line with nothing in it writes a newline to the file. Instead of not putting anything into the written line, you have to completely skip the iteration, so it does not write that newline. Here's what I suggest:
def wrline():
    lines = open('file.txt', 'r').readlines()
    f2 = open('file.txt', 'w')
    for line in lines:
        if '1st_word' in line and '2nd_word' in line:
            f2.write(line.replace('# ', ' '))  # keep the newline so the next line stays on its own line
        else:
            if line != '\n':
                f2.write(line)
    f2.close()
I would keep read and write operations separate.
# read
with open(file, 'r') as f:
    lines = f.readlines()

# parse, change and write back
with open(file, 'w') as f:
    for line in lines:
        if line.startswith('#\t'):
            line = line[1:]
        f.write(line)
You have not closed your files, and there is no need for the rstrip(): because you write back into the same file you are reading without truncating it, the shorter replacement line leaves the last character of the original ("d") in place, which is where the stray letter comes from. Reading everything first and then rewriting the whole file, as above, avoids that.
Read in the file, replace the data and write it back, opening and closing the file each time.
fn = 'example.txt'
new_data = []

# Read in the file
with open(fn, 'r+') as file:
    filedata = file.readlines()

# Replace the target string
for line in filedata:
    if "1st_word" in line and "2nd_word" in line:
        line = line.replace('#', '')
    new_data.append(line)

# Write the file out again
with open(fn, 'w+') as file:
    for line in new_data:
        file.write(line)
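For completeness, a pattern none of the answers here use is the standard library's fileinput module, which handles the read-then-rewrite cycle for you when editing a file in place; a minimal sketch, assuming the same matching rule as the question:
import fileinput

# inplace=True redirects stdout into the file being edited
for line in fileinput.input('example.txt', inplace=True):
    if '1st_word' in line and '2nd_word' in line:
        line = line.replace('#', '', 1)  # drop the leading '#'
    print(line, end='')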

Comparing two lines from two text files according to a single part of the text file

I have two text files and I want to write out two new text files according to whether there is a common section to each line in the two original text files.
The format of the text files is as follows:
commontextinallcases uniquetext2 potentiallycommontext uniquetext4
There are more than 4 columns but you get the idea. I want to check the 'potentiallycommontext' part in each text file and if they are the same write out the whole line of each text file to a new text file for each with its own unique text still in place.
Splitting it is fairly easy, just using the .split() method when reading it in. I have found the following code:
with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
But I am not sure this would work for my case where I need to split the lines. Is there a way to do this I am missing?
Thanks
I don't think that this set approach is suitable for your case.
I'd try something like:
with open('some_file_1.txt', 'r') as file1, \
     open('some_file_2.txt', 'r') as file2, \
     open('some_output_file.txt', 'w') as file_out:
    for line1, line2 in zip(file1, file2):
        if line1.split()[2] == line2.split()[2]:
            file_out.write(line1)
            file_out.write(line2)
There might be shorter solutions, but this should work:
PCT_IDX = _  # find which index of line.split() corresponds to potentiallycommontext

def lines(filename):
    with open(filename, 'r') as file:
        for line in file:
            line = line.rstrip('\n')
            yield line

lines_1 = lines('some_file_1.txt')
lines_2 = lines('some_file_2.txt')

with open('some_output_file.txt', 'w') as file_out:
    for (line_1, line_2) in zip(lines_1, lines_2):
        maybe_cmn1 = line_1.split()[PCT_IDX]
        maybe_cmn2 = line_2.split()[PCT_IDX]
        if maybe_cmn1 == maybe_cmn2:
            file_out.write(line_1 + '\n')  # add the newline back, it was stripped above
            file_out.write(line_2 + '\n')
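Both answers assume the matching lines sit at the same position in both files, because zip pairs them line by line. If that is not guaranteed (an assumption on my part, not something stated in the question), you can index one file by the common field first; PCT_IDX is the same placeholder index as above:
# build a lookup of file-2 lines keyed on the common field
by_common = {}
with open('some_file_2.txt', 'r') as file2:
    for line in file2:
        key = line.split()[PCT_IDX]
        by_common.setdefault(key, []).append(line)

with open('some_file_1.txt', 'r') as file1, open('some_output_file.txt', 'w') as file_out:
    for line in file1:
        key = line.split()[PCT_IDX]
        if key in by_common:
            file_out.write(line)
            file_out.writelines(by_common[key])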
