iterate over file

iterate over file - python

I would like to iterate over a file and remove every line that matches a given regex. I have the script below, but it only removes the first matching line. How can I iterate through the file to get it to work?
import glob
import re

read_files = glob.glob("*.agr")
with open("out.txt", "w") as outfile:
    for f in read_files:
        with open(f, "r") as infile:
            outfile.write(infile.read())
with open("out.txt", "r") as file:
    filedata = file.read()
filedata = re.sub(r'#time\s+residue\s+[0-9]\s+Total', '', filedata)
with open("out.txt", "w") as file:
    file.write(filedata)
Thanks

I solved the issue: I needed to modify the regex to match one or more digits, as follows: #time\s+residue\s+[0-9]+\s+Total. Previously the + quantifier after [0-9] was absent, so only single-digit residue numbers matched.
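To see why the quantifier matters, here is a minimal sketch. The sample header lines are made up for illustration (the real .agr contents are not shown in the question); only the two patterns come from the post.

```python
import re

# Hypothetical sample text; the real .agr header lines are not shown in the question.
sample = (
    "#time residue 7 Total\n"
    "1.0 2.0\n"
    "#time residue 12 Total\n"
    "3.0 4.0\n"
)

single_digit = r'#time\s+residue\s+[0-9]\s+Total'   # original pattern
multi_digit = r'#time\s+residue\s+[0-9]+\s+Total'   # corrected pattern

# re.sub already replaces every match, so no explicit loop over the file is needed.
after_single = re.sub(single_digit, '', sample)
after_multi = re.sub(multi_digit, '', sample)

print(after_single)
print(after_multi)
```

With the original pattern the "residue 12" header survives, because [0-9] consumes exactly one digit and \s+ then fails on the second digit.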

Related

Replace single line into multi line python

I have file.txt with the contents below:
{"action":"validate","completed_at":"2019-12-24T15:24:40+05:30"}{"action":"validate","completed_at":"2019-12-24T15:24:42+05:30"}{"action":"validate","completed_at":"2019-12-24T15:24:45+05:30"}{"action":"validate","completed_at":"2019-12-24T15:24:48+05:30"}
How can I convert it to the format below?
{"action":"validate","completed_at":"2019-12-24T15:24:40+05:30"}
{"action":"validate","completed_at":"2019-12-24T15:24:42+05:30"}
{"action":"validate","completed_at":"2019-12-24T15:24:45+05:30"}
{"action":"validate","completed_at":"2019-12-24T15:24:48+05:30"}
I tried:
with open('file.txt', w) as f:
    f.replace("}{", "}\n{")
Is there a better way to do the replacement?

If your file is small enough to fit in memory, you could try:
with open('file.txt', 'r+') as f:
    content = f.read()
    f.seek(0)
    f.truncate()
    f.write(content.replace("}{", "}\n{"))

I would not replace the content in place, but rather read it, split it, and then write it back using the simple regex {.*?}:
import re

with open('file.txt', 'r') as f:
    value = f.read()
contents = re.findall('{.*?}', value)
with open('file.txt', 'w') as f:
    for content in contents:
        f.write(content + "\n")
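One caveat with the {.*?} pattern is that it stops at the first closing brace, so it would cut a record short if the objects ever contained nested braces. As an alternative sketch (a swapped-in technique, not what either answer uses), the standard library's json.JSONDecoder.raw_decode can walk through concatenated JSON values one at a time. The helper name split_concatenated_json and the nested sample data are made up for illustration.

```python
import json

def split_concatenated_json(text):
    """Return the text with each top-level JSON value on its own line."""
    decoder = json.JSONDecoder()
    pos = 0
    pieces = []
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        if text[pos].isspace():
            pos += 1
            continue
        start = pos
        # raw_decode returns the parsed value and the index where it ended.
        _, pos = decoder.raw_decode(text, pos)
        pieces.append(text[start:pos])   # keep the original characters untouched
    return "\n".join(pieces) + "\n"

# Hypothetical input with a nested object, which '{.*?}' would truncate.
sample = '{"action":"validate","n":{"a":1}}{"action":"validate","n":{"a":2}}'
result = split_concatenated_json(sample)
print(result)
```

Slicing the original text rather than re-serializing the parsed value keeps the records byte-for-byte as they were in the input.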

Python - replace the startswith character

I want to replace the first character in each line from the text file.
2 1.510932 0.442072 0.978141 0.872182
5 1.510932 0.442077 0.978141 0.872181
Above is my text file.
import sys
import glob
import os.path
list_of_files = glob.glob('/path/txt/23.txt')
for file_name in list_of_files:
    f= open(file_name, 'r')
    lst = []
    for line in f:
        f = open(file_name , 'w')
        if line.startswith("2 "):
            line = line.replace("2 ","7")
        f.write(line)
    f.close()
What I want:
If a line starts with 2, I want to change that leading 2 into 7. The problem is that the same digit appears several times in a line, so when I change the starting character and save, every occurrence changes.
Thanks

The proper solution is (pseudocode):
open sourcefile for reading as input
open temporaryfile for writing as output
for each line in input:
    fix the line
    write it to output
close input
close output
replace sourcefile with temporaryfile
We use a temporary file and write as we go, so the whole file never has to be held in memory.
I leave it up to you to translate this to Python (hint: that's quite straightforward).
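One possible translation of the pseudocode, as a minimal sketch. The function names fix_line and rewrite_in_place are made up for illustration, and the fix itself (a leading "2 " becomes "7 ") is an assumption based on the question.

```python
import os
import tempfile

def fix_line(line):
    # Assumed fix from the question: only a leading "2 " becomes "7 ".
    if line.startswith("2 "):
        return "7 " + line[2:]
    return line

def rewrite_in_place(path, fix=fix_line):
    # Create the temporary file next to the source so that os.replace
    # stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        for line in src:           # one line in memory at a time
            tmp.write(fix(line))
    os.replace(tmp.name, path)     # swap the temporary file in for the source
```

Because only the prefix is rewritten, any other 2 later in the line is left alone, which is the behaviour the question asks for.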

This is one approach.
Example:
for file_name in list_of_files:  # list_of_files as defined in the question
    data = []
    with open(file_name) as infile:
        for line in infile:
            line = line.rstrip("\n")  # Drop the newline so it is not doubled on write
            if line.startswith("2 "):  # Check line
                line = " ".join(['7'] + line.split()[1:])  # Update only the first field
            data.append(line)
    with open(file_name, "w") as outfile:  # Write back to file
        for line in data:
            outfile.write(line + "\n")

Making the reading and writing of text files quicker

I have the following code, where I read an input list, split each line on its forward slashes, and then append the variable evid to the evids list. Next, I open a file called evids.txt and write the evids to that file. How do I speed up this code and reduce the number of lines? Thanks.
evids = []
with open('evid_list.txt', 'r') as infile:
    data = infile.readlines()
    for i in data:
        evid = i.split('/')[2]
        evids.append(evid)
with open('evids.txt', 'w') as f:
    for i in evids:
        f.write("%s" % i)

with open('evid_list.txt', 'r') as infile, open('evids.txt', 'w') as ofile:
    for line in infile:
        ofile.write('{}\n'.format(line.split('/')[2]))
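The one-pass version above assumes every line has at least three '/'-separated fields. A slightly more defensive sketch is shown below; the helper name extract_evids and the sample lines are made up, since the real format of evid_list.txt is not shown in the question.

```python
def extract_evids(lines, index=2):
    """Yield field number `index` of each '/'-separated line, without the newline."""
    for line in lines:
        fields = line.rstrip("\n").split("/")
        if len(fields) > index:   # skip blank or short lines instead of raising IndexError
            yield fields[index]

# Hypothetical input lines; the real file format is an assumption.
sample = ["data/2019/evid001/file.xml\n", "\n", "data/2019/evid002\n"]
evids = list(extract_evids(sample))
print(evids)
```

With real files, the generator can be fed straight from the open input, for example ofile.writelines(e + "\n" for e in extract_evids(infile)), so the lines are still streamed rather than collected in a list.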

Convert txt files in a folder to rows in csv file

I have 100 txt files in a folder. I would like to create a csv file in which the content of each text file becomes a single row (actually, a single cell in a row) in this csv file. So, the result would be a csv file with 100 rows.
I tried the following code:
import glob
read_files = glob.glob('neg/*')
with open("neg.csv", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            for line in infile:
                outfile.write(line)
This creates a CSV with thousands of rows, since each text file contains multiple paragraphs. Any suggestions?

Try:
import glob
import csv

read_files = glob.glob('neg/*')
# Python 3: open the CSV in text mode with newline="" so the csv module controls line endings
with open("neg.csv", "w", newline="") as outfile:
    w = csv.writer(outfile)
    for f in read_files:
        with open(f) as infile:
            w.writerow([line.rstrip("\n") for line in infile])
That makes each line a cell in the output and each file a row.
If you want each cell to be the entire contents of the file, try:
import glob
import csv

read_files = glob.glob('neg/*')
with open("neg.csv", "w", newline="") as outfile:
    w = csv.writer(outfile)
    for f in read_files:
        with open(f) as infile:
            # writerow expects a sequence of cells, so wrap the joined text in a list
            w.writerow([" ".join(line.rstrip("\n") for line in infile)])

Before writing each line, first do line.replace('\n', ' ') to replace all newline characters with spaces.
Obviously, adjust your newline character according to your OS.
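A minimal sketch that puts this together, assuming Python 3. The function name files_to_csv is made up for illustration; it uses str.splitlines() instead of a hard-coded '\n' so the newline convention does not have to be adjusted by hand.

```python
import csv
import glob

def files_to_csv(pattern, csv_path):
    """Write one CSV row per matching file, with the file's text in a single cell."""
    with open(csv_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for name in sorted(glob.glob(pattern)):
            with open(name) as infile:
                # splitlines() recognises \n, \r\n and \r line endings
                text = " ".join(infile.read().splitlines())
            writer.writerow([text])   # a one-element list means one cell
```

For the question's layout this would be called as files_to_csv('neg/*', 'neg.csv'). Commas inside the text are not a problem, because csv.writer quotes such cells.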

Read lines from a text file, reverse and save in a new text file

So far I have this code:
f = open("text.txt", "rb")
s = f.read()
f.close()
f = open("newtext.txt", "wb")
f.write(s[::-1])
f.close()
The text in the original file is:
This is Line 1
This is Line 2
This is Line 3
This is Line 4
And when it reverses it and saves it the new file looks like this:
4 eniL si sihT 3 eniL si sihT 2 eniL si sihT 1 eniL si sihT
When I want it to look like this:
This is line 4
This is line 3
This is line 2
This is line 1
How can I do this?

You can do something like:
with open('test.txt') as f, open('output.txt', 'w') as fout:
    fout.writelines(reversed(f.readlines()))

read() returns the whole file in a single string. That's why, when you reverse it, it reverses the characters within the lines too, not just the order of the lines. To reverse only the order of the lines, use readlines() to get a list of them (as a first approximation, it is equivalent to s = f.read().split('\n'), except that readlines() keeps the trailing newline on each line):
s = f.readlines()
...
f.writelines(s[::-1])
# or f.writelines(reversed(s))

f = open("text.txt", "r")
s = f.readlines()
f.close()
f = open("newtext.txt", "w")
s.reverse()
for item in s:
    f.write(item)  # each item already ends with a newline
f.close()

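The method file.read() returns the whole file as one string, not a list of lines.
And since s is a string of the whole file, you're reversing the characters, not the lines!
First, you'll have to split it into lines:
s = f.read()
lines = s.split('\n')
Or:
lines = f.readlines()
Your slicing approach is then already correct, but a list has to be written with writelines() rather than write():
f.writelines(lines[::-1])
(This works as is with readlines(), which keeps each line's newline. If you used split('\n'), write '\n'.join(lines[::-1]) instead.)
Hope this helps!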

There are a couple of steps here. First we want to get all the lines from the first file, and then we want to write them in reversed order to the new file. The code for doing this is as follows:
lines = []
with open('text.txt') as f:
    lines = f.readlines()
with open('newtext.txt', 'w') as f:
    for line in reversed(lines):
        f.write(line)
Firstly, we initialize a variable to hold our lines. Then we read all the lines from the 'text.txt' file.
Secondly, we open our output file. Here we loop through the lines in reversed order, writing them to the output file as we go.

A sample using a list, which may be easier to follow.
I'm sure there are more elegant answers, but this way is clear to understand. Note that it overwrites the input file.
f = open(r"c:\test.txt", "r")
s = f.read()
f.close()
rowList = []
for value in s.splitlines():  # iterate over lines, not single characters
    rowList.append(value + "\n")
rowList.reverse()
f = open(r"c:\test.txt", "w")
for value in rowList:
    f.write(value)
f.close()

You have to work line by line.
f = open("text.txt", "r")
s = f.read()
f.close()
f = open("newtext.txt", "w")
lines = s.splitlines()  # unlike split('\n'), leaves no empty string after a final newline
f.write('\n'.join(lines[::-1]) + '\n')
f.close()

Use it like this if your OS uses \n to break lines:
f = open("text.txt", "r")
s = f.read()
f.close()
f = open("newtext.txt", "w")
f.write("\n".join(reversed(s.split("\n"))))
f.close()
The main thing here is "\n".join(reversed(s.split("\n"))).
It does the following:
splits your string at line breaks (\n), resulting in a list
reverses the list
merges the list back into a string, with \n between the items
Here are the states:
string: line1 \n line2 \n line3
list: ["line1", "line2", "line3"]
list: ["line3", "line2", "line1"]
string: line3 \n line2 \n line1
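One detail worth checking with the split-and-join approach: if the text ends with a newline, split("\n") produces an extra empty string at the end, which becomes a blank first line after reversing. The short sketch below (sample text made up for illustration) compares it with str.splitlines().

```python
text = "line1\nline2\nline3\n"   # the file ends with a newline

# split("\n") leaves an empty string after the final newline ...
naive = "\n".join(reversed(text.split("\n")))

# ... whereas splitlines() does not, so the final newline is re-added explicitly.
clean = "\n".join(reversed(text.splitlines())) + "\n"

print(repr(naive))
print(repr(clean))
```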

If your input file is too big to fit in memory, here is an efficient way to reverse it:
Split input file into partial files (still in original order).
Read each partial file from last to first, reverse it and append to output file.
Implementation:
import os
from itertools import islice

input_path = "mylog.txt"
output_path = input_path + ".rev"
i = 0  # number of partial files written; stays 0 if the input is empty
with open(input_path) as fi:
    # Read up to 100000 lines at a time until islice returns an empty list
    for i, sli in enumerate(iter(lambda: list(islice(fi, 100000)), []), 1):
        with open(f"{output_path}.{i:05}", "w") as fo:
            fo.writelines(sli)
with open(output_path, "w") as fo:
    # Walk the partial files from last to first, reversing each one
    for file_index in range(i, 0, -1):
        path = f"{output_path}.{file_index:05}"
        with open(path) as fi:
            lines = fi.readlines()
        os.remove(path)
        for line in reversed(lines):
            fo.write(line)
