I have an input.txt file and an output.txt file whose names are passed as arguments to a Python script. I am reading the input file with readlines(). Before I update the current line and write it to the output file, I want to check some conditions on the upcoming lines, as described below. Could you please give me some guidance? Thank you.
I want to overwrite the current line with the internal_account value (a random 16-digit number), starting at the 11th character, if the line starts with 01065008 and the following conditions are met:
the 5th line after it starts with 06, and
that 06 line has the value USD starting at its 6th character.
input.txt
01065008200520P629658405456454
02BRYAN ANGUS 56425555643
0300000000000000000HUTS7858863
04PROSPECTUS ENCLOSYUSS574U623
05AS OF 05/13/20 45452366753
06Q47USDTFT 87845566765
The input.txt file has this pattern:
1st line will start with 01065008
2nd line will start with 02
...
6th line will start with 06
1st line will start with 01065008
...
What I have tried:
import random
import sys
infile=open(sys.argv[1], 'r')
lines=infile.readlines()
outfile=open(sys.argv[2], 'w')
internal_account = random.randint(1000000000000000,9999999999999999)
formattedStr = ''
for line in lines:
    if line[0:8] == '01065008':
        formattedStr='%s%s%s'%(line[0:10],internal_account,line[26:])
        outfile.write(formattedStr)
    else:
        outfile.write(line)
outfile.close()
To look ahead in the text file, read all the lines into a list, then use the line index to check the lines that follow. Use the enumerate function to track the line index.
ss = '''
01065008200520P629658405456454
02BRYAN ANGUS 56425555643
0300000000000000000HUTS7858863
04PROSPECTUS ENCLOSYUSS574U623
05AS OF 05/13/20 45452366753
06Q47USDTFT 87845566765
'''.strip()
with open('input.txt', 'w') as f: f.write(ss)  # write data file
################################
import random
import sys
infile = open('input.txt')  # open(sys.argv[1], 'r')
lines = infile.readlines()
infile.close()
outfile = open('output.txt', 'w')  # open(sys.argv[2], 'w')
internal_account = random.randint(1000000000000000, 9999999999999999)
print('internal_account', internal_account, end='\n\n')
formattedStr = ''
for i, line in enumerate(lines):
    # i < len(lines)-5 guarantees that lines[i+5] exists
    if line[0:8] == '01065008' and i < len(lines)-5 and lines[i+5].startswith('06') and lines[i+5][5:8] == 'USD':
        formattedStr = '%s%s%s' % (line[0:10], internal_account, line[26:])
        outfile.write(formattedStr)
        print(formattedStr.strip())
    else:
        outfile.write(line)
        print(line.strip())
outfile.close()
Output
internal_account 2371299802657810
010650082023712998026578106454
02BRYAN ANGUS 56425555643
0300000000000000000HUTS7858863
04PROSPECTUS ENCLOSYUSS574U623
05AS OF 05/13/20 45452366753
06Q47USDTFT 87845566765
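The look-ahead logic can also be isolated in a function that takes a list of lines and returns the modified list, which is easier to test than code that writes straight to a file. A minimal sketch; update_records, RECORD_PREFIX and LOOKAHEAD are illustrative names, not from the question. It uses 10**15 and 10**16 - 1 as bounds so the random number always has exactly 16 digits.

```python
import random

RECORD_PREFIX = "01065008"  # first line of a record (from the question)
LOOKAHEAD = 5               # the "06" line is 5 lines further down


def update_records(lines, account):
    """Return a copy of `lines` in which every qualifying "01065008" line
    has characters 11-26 (indexes 10..25) replaced by `account`."""
    result = []
    for i, line in enumerate(lines):
        j = i + LOOKAHEAD
        qualifies = (
            line.startswith(RECORD_PREFIX)
            and j < len(lines)             # enough lines left to look ahead
            and lines[j].startswith("06")
            and lines[j][5:8] == "USD"     # "USD" from the 6th character
        )
        if qualifies:
            line = line[:10] + str(account) + line[26:]
        result.append(line)
    return result


if __name__ == "__main__":
    sample = [
        "01065008200520P629658405456454\n",
        "02BRYAN ANGUS 56425555643\n",
        "0300000000000000000HUTS7858863\n",
        "04PROSPECTUS ENCLOSYUSS574U623\n",
        "05AS OF 05/13/20 45452366753\n",
        "06Q47USDTFT 87845566765\n",
    ]
    # 10**15 .. 10**16 - 1 always gives exactly 16 digits
    account = random.randint(10**15, 10**16 - 1)
    print("".join(update_records(sample, account)), end="")
```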
You were not far from a good solution. Using enumerate on the input lines lets you use the index to check future lines, so you can verify that all your conditions are fulfilled. You need to catch IndexError so that no exception is raised when there are not enough lines left.
Other minor modifications I made to your code:
Use a with statement to handle opening the files, so you don't have to close them yourself.
Use startswith wherever you can to make the code clearer.
Write the bounds as powers of ten (10**15) to make the code clearer. A float such as 1e15 does not work here: random.randint expects integers, and Python 3.12 raises TypeError for a float.
import random
import sys

input_file, output_file = sys.argv[1:3]  # sys.argv[0] is the script name
internal_account = random.randint(10**15, 10**16 - 1)  # always 16 digits
with open(input_file, "r") as stream:
    input_lines = stream.readlines()
with open(output_file, "w") as stream:
    for index, line in enumerate(input_lines):
        try:
            update_account = (
                line.startswith("01065008")
                and input_lines[index + 5].startswith("06")
                and input_lines[index + 5][5:8] == "USD"
            )
        except IndexError:
            # fewer than 5 lines left after this one
            update_account = False
        if update_account:
            line = line[0:10] + str(internal_account) + line[26:]
        stream.write(line)
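If the input is too large to load with readlines(), the same look-ahead can be done with a small sliding window so that only six lines are held in memory at a time. A sketch using collections.deque, assuming the "06" line is always exactly five lines after its "01065008" line; stream_update and qualifies are hypothetical names.

```python
import random
import sys
from collections import deque

LOOKAHEAD = 5  # distance from a "01065008" line to its "06" line


def qualifies(first, later):
    """True if `first` should get the account number, given the line
    LOOKAHEAD positions after it."""
    return (first.startswith("01065008")
            and later.startswith("06")
            and later[5:8] == "USD")


def stream_update(lines, account):
    """Yield the lines of `lines` (any iterable, e.g. an open file),
    updating qualifying lines while keeping at most LOOKAHEAD + 1 in memory."""
    window = deque()
    for line in lines:
        window.append(line)
        if len(window) > LOOKAHEAD:
            first = window.popleft()
            # After popleft, window[LOOKAHEAD - 1] is 5 lines after `first`
            if qualifies(first, window[LOOKAHEAD - 1]):
                first = first[:10] + str(account) + first[26:]
            yield first
    # The last lines have fewer than 5 successors, so none can qualify
    yield from window


if __name__ == "__main__" and len(sys.argv) == 3:
    input_file, output_file = sys.argv[1:3]
    account = random.randint(10**15, 10**16 - 1)
    with open(input_file) as src, open(output_file, "w") as dst:
        dst.writelines(stream_update(src, account))
```

Because the window never needs an index past its end, no IndexError handling is required.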
Related
I am trying to find a line that starts with a specific string and replace the entire line with a new string.
I tried this code:
import fileinput
filename = "settings.txt"
for line in fileinput.input(filename, inplace=True):
    print line.replace('BASE_URI =', 'BASE_URI = "http://example.net"')
This replaces only the matching string, not the entire line. What is the best way to replace an entire line that starts with a given string?
You don't need to know the old value; just redefine the entire line:
import sys
import fileinput
filename = "settings.txt"
for line in fileinput.input([filename], inplace=True):
    if line.strip().startswith('BASE_URI ='):
        line = 'BASE_URI = "http://example.net"\n'
    # with inplace=True, stdout is redirected into the file
    sys.stdout.write(line)
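If the whole file fits in memory, the same whole-line replacement can also be done with one regular expression in multiline mode. A minimal sketch; replace_setting is a hypothetical helper, and like the loop above it drops any indentation on the replaced line. To use it on a file, read settings.txt into a string, call the function, and write the result back.

```python
import re


def replace_setting(text, prefix, new_line):
    """Replace every whole line of `text` that starts with `prefix`
    (optionally indented) by `new_line`."""
    pattern = r"^[ \t]*" + re.escape(prefix) + r".*$"
    # re.MULTILINE makes ^ and $ match at every line boundary, and "." never
    # matches "\n", so only the rest of that one line is replaced. A function
    # is used as the replacement so backslashes in new_line are kept literally.
    return re.sub(pattern, lambda m: new_line, text, flags=re.MULTILINE)


if __name__ == "__main__":
    sample = 'DEBUG = True\nBASE_URI = "http://old.example"\nPORT = 8080\n'
    print(replace_setting(sample, "BASE_URI =", 'BASE_URI = "http://example.net"'), end="")
```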
It looks like you are using Python 2 syntax. Since Python 2 is discontinued, here is a solution in Python 3 syntax.
Suppose you need to replace the lines that start with "Hello" with "Not Found". Then you can do this:
with open("settings.txt") as fh:
    lines = fh.readlines()  # each line keeps its trailing "\n"
newlines = []
for line in lines:
    if not line.startswith("Hello"):
        newlines.append(line)
    else:
        newlines.append("Not Found\n")
with open("settings.txt", "w") as fh:
    fh.writelines(newlines)
This should do the trick:
def replace_line(source, destination, starts_with, replacement):
    # Open file path
    with open(source) as s_file:
        # Store all file lines in lines
        lines = s_file.readlines()
    # Iterate lines
    for i in range(len(lines)):
        # If a line starts with given string
        if lines[i].startswith(starts_with):
            # Replace the whole line, keeping its newline (the last line may have none)
            ending = "\n" if lines[i].endswith("\n") else ""
            lines[i] = replacement + ending
    # Open destination file and write modified lines list into it
    with open(destination, "w") as d_file:
        d_file.writelines(lines)
Call it with these parameters:
replace_line("settings.txt", "settings.txt", 'BASE_URI =', 'BASE_URI = "http://example.net"')
Cheers!
I have a similar question about deleting multiple lines.
I want to delete the matching line and the next 4 lines. This is my code:
bind = open('/etc/bind/named.conf.local','r')
a = dict['name']
for line in bind:
    if a in line:
        print('line exist')
        ''' and delete this line and 4 line after it'''
    else:
        print('line does not exist')
I want to save the modified text to /etc/bind/named.conf.local in place, without fileinput. I do not want to skip the 4 lines; I want to delete them from the file. I do not want to read the file and write it out again while skipping 4 lines.
What should I do?
I think the following code does what you're looking for. You will have to adjust the settings filename, keyword and delete to your needs. The code deletes delete lines from the file filename every time keyword is found in a line (the keyword line included).
# Settings
filename = "test.txt"
keyword = "def"
delete = 2
# Read lines from file
with open(filename) as f:
    lines = f.readlines()
# Process lines; a while loop is used because deleting from a list while
# enumerating it would skip the line that moves into position i
i = 0
while i < len(lines):
    if keyword in lines[i]:
        del lines[i:i + delete]
    else:
        i += 1
# Save modified lines to file
with open(filename, "w") as f:
    f.writelines(lines)
Example test.txt before:
abc
def
ghi
jkl
Example test.txt afterwards:
abc
jkl
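If you don't want to use fileinput, you can also read the lines of your file one at a time, write them (except for the lines you skip using next(infile)) to a tempfile.NamedTemporaryFile, and then replace the original file with the temporary file.
import shutil
from pathlib import Path
from tempfile import NamedTemporaryFile
named_conf = Path('/etc/bind/named.conf.local')

def line_should_be_deleted(line):
    # hypothetical test; replace with your own, e.g. dict['name'] in line
    return 'some text' in line

with open(named_conf) as infile:
    # create the temp file next to the original so replace() is a simple rename
    with NamedTemporaryFile("w", dir=named_conf.parent, delete=False) as outfile:
        for line in infile:
            if line_should_be_deleted(line):
                # skip this line and the 4 lines after it (stops quietly at EOF)
                for _ in range(4):
                    next(infile, None)
            else:
                outfile.write(line)
# the temp file is created with owner-only permissions; copy the original mode
shutil.copymode(named_conf, outfile.name)
Path(outfile.name).replace(named_conf)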
But you should just use fileinput, like the answer to the question you linked to says, since it does the tempfile stuff for you.
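Following that advice, a sketch of the fileinput version with a skip counter. delete_block_inplace is a hypothetical name and needle stands in for dict['name'] from the question. Note that fileinput still writes a complete new copy of the file behind the scenes: removing lines from the middle of a file always means moving everything after them.

```python
import fileinput
import os
import tempfile


def delete_block_inplace(path, needle, extra=4):
    """Delete each line containing `needle` plus the `extra` lines after it.
    fileinput writes the output to a new file and swaps it in for you."""
    skip = 0
    with fileinput.input(files=(path,), inplace=True) as lines:
        for line in lines:
            if skip:
                skip -= 1            # still inside a block being deleted
            elif needle in line:
                skip = extra         # drop this line and the next `extra`
            else:
                print(line, end="")  # stdout is redirected into the file


if __name__ == "__main__":
    # Demo on a throwaway file rather than /etc/bind/named.conf.local
    fd, demo = tempfile.mkstemp(suffix=".conf", text=True)
    with os.fdopen(fd, "w") as f:
        f.write('keep 1\nzone "example"\n1\n2\n3\n4\nkeep 2\n')
    delete_block_inplace(demo, 'zone "example"')
    with open(demo) as f:
        print(f.read(), end="")
    os.remove(demo)
```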
It all boils down to keeping a skip count that you initialize at the first occurrence of the matching line and increment afterward:
match = "text line to match"
with open('input.txt','r') as lines:
    with open('output.txt','w') as output:
        skip = -1
        for line in lines:
            # skip becomes 0 at the first match, then grows by 1 on every line
            skip += skip >= 0 or skip < 0 and line.strip("\n") == match
            # skip values 0-4 are the matching line and the 4 lines after it
            if skip not in range(5):
                output.write(line)
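The counter above only handles the first match, because skip keeps growing after the block. If every occurrence should be removed, the counter can count down instead. A minimal sketch as a generator; drop_blocks is a hypothetical name. With real files, pass the open input file and write the yielded lines to the output file.

```python
def drop_blocks(lines, match, block=5):
    """Yield `lines`, dropping every line equal to `match` (newline ignored)
    together with the lines after it: `block` lines in total per match."""
    remaining = 0  # lines of the current block still to be dropped
    for line in lines:
        if remaining == 0 and line.rstrip("\n") == match:
            remaining = block
        if remaining:
            remaining -= 1
            continue
        yield line


if __name__ == "__main__":
    data = ["keep\n", "text line to match\n", "1\n", "2\n", "3\n", "4\n", "after\n"]
    print("".join(drop_blocks(data, "text line to match")), end="")
```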
If what you're trying to avoid is reading lines one by one, you could write it like this (you still need to open the files, and lines.index raises ValueError if no line equals match):
match = "text line to match"
lines = open('input.txt','r').read().split("\n")
matchPos = lines.index(match)
del lines[matchPos:matchPos+5]
open('output.txt','w').write("\n".join(lines))
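bind = open('text.txt','r')
a = 'some text'  # the text to look for (dict['name'] in the question)
lines = bind.readlines()
bind.close()
i = 0
while i < len(lines):
    if a in lines[i]:
        # delete the matching line and the 4 after it; don't advance i,
        # because the next line has moved into position i
        del lines[i:i+5]
    else:
        i += 1
print(lines)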
bind = open('/etc/bind/named.conf.local','r')
textfile = bind.readlines()
bind.close()
a = 'some text'
for line_num in range(len(textfile)):
    try:
        if a in textfile[line_num]:
            print('line exists')
            del textfile[line_num:line_num+5]
    except IndexError:
        # the list is now shorter than the original range
        break
writer = open("/etc/bind/named.conf.local","w")
writer.write(''.join(textfile))
writer.close()
I'm trying to delete a specific line (10884121) in a text file that is about 30 million lines long. This is the method I first attempted; however, when I execute it, it runs for about 20 seconds and then gives me a "memory error". Is there a better way to do this? Thanks!
import fileinput
import sys
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
with open(f_in, 'r') as fin:
    with open(f_out, 'w') as fout:
        linenums = [10884121]
        s = [y for x, y in enumerate(fin) if x not in [line - 1 for line in linenums]]
        fin.seek(0)
        fin.write(''.join(s))
        fin.truncate(fin.tell())
First of all, you were not using the imports; you were trying to write to the input file, and your code read the whole file into memory.
Something like this might do the trick with less hassle: we read the file line by line, use enumerate to count the line numbers, and write each line to the output if its number is not in the set of ignored lines:
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
ignored_lines = {10884121}  # 1-based line numbers; a set makes "in" checks fast
with open(f_in, 'r') as fin, open(f_out, 'w') as fout:
    for lineno, line in enumerate(fin, 1):
        if lineno not in ignored_lines:
            fout.write(line)
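For a single line in a very large file, one possible refinement is to loop only until the line has been dropped and then let shutil.copyfileobj copy the rest in large blocks instead of line by line. A sketch; copy_without_line is a hypothetical name, lineno is 1-based as above, and whether it is measurably faster depends on the file and the system.

```python
import os
import shutil
import tempfile


def copy_without_line(src_path, dst_path, lineno):
    """Copy src_path to dst_path, leaving out line `lineno` (1-based)."""
    with open(src_path) as fin, open(dst_path, "w") as fout:
        for _ in range(lineno - 1):
            line = fin.readline()
            if not line:                # fewer than lineno lines: nothing to drop
                return
            fout.write(line)
        fin.readline()                  # read and discard the unwanted line
        shutil.copyfileobj(fin, fout)   # copy the remainder in large blocks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "w") as f:
            f.write("one\ntwo\nthree\n")
        copy_without_line(src, dst, 2)
        with open(dst) as f:
            print(f.read(), end="")
```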
Please try this:
import fileinput
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
f = open(f_out,'w')
counter=0
for line in fileinput.input([f_in]):
    counter=counter+1
    if counter != 10884121:
        f.write(line)  # text mode already converts "\n" to os.linesep; don't add another
f.close()  # CPython would also close it on garbage collection, but closing explicitly is safer
You are most likely running out of memory because you are storing the whole file in a list.
Try this instead:
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
_fileOne = open(f_in,'r')
_fileTwo = open(f_out,'w')
linenums = set([10884121])
for lineNumber, line in enumerate(_fileOne, 1):  # count lines from 1
    if lineNumber not in linenums:
        _fileTwo.write(line)
_fileOne.close()
_fileTwo.close()
This reads the file line by line and skips the lines that are not needed, so memory use stays small.
You can also experiment with the buffering argument of open().
Hope this helps.
How about a generic file filter function?
def file_filter(file_path, condition=None):
    """Yield lines from a file if condition(n, line) is true.
    The condition parameter is a callback that receives two
    parameters: the line number (first line is 1) and the
    line content."""
    if condition is None:
        condition = lambda n, line: True
    with open(file_path) as source:
        for n, line in enumerate(source):
            if condition(n + 1, line):
                yield line

with open(f_out, 'w') as destination:
    condition = lambda n, line: n != 10884121
    for line in file_filter(f_in, condition):
        destination.write(line)
I am working with a very large (~11GB) text file on a Linux system. I am running it through a program which is checking the file for errors. Once an error is found, I need to either fix the line or remove the line entirely. And then repeat...
Eventually once I'm comfortable with the process, I'll automate it entirely. For now however, let's assume I'm running this by hand.
What would be the fastest (in terms of execution time) way to remove a specific line from this large file? I thought of doing it in Python, but I am open to other approaches. The line might be anywhere in the file.
If Python, assume the following interface:
def removeLine(filename, lineno):
Thanks,
-aj
You can have two file objects for the same file at the same time (one for reading, one for writing). Here lineno is 0-based: removeLine(filename, 0) removes the first line.
def removeLine(filename, lineno):
    fro = open(filename, "rb")
    current_line = 0
    while current_line < lineno:
        fro.readline()
        current_line += 1
    seekpoint = fro.tell()
    frw = open(filename, "r+b")
    frw.seek(seekpoint, 0)
    # read the line we want to discard
    fro.readline()
    # now move the rest of the lines in the file
    # one line back
    chars = fro.readline()
    while chars:
        frw.write(chars)  # write(), since writelines() expects an iterable of lines
        chars = fro.readline()
    fro.close()
    frw.truncate()
    frw.close()
Modify the file in place: the offending line is replaced with spaces, so the remainder of the file does not need to be shuffled around on disk. You can also "fix" the line in place if the fix is not longer than the line you are replacing. (In Python 3, mmap works with bytes; lineno is 1-based here.)
import os
from mmap import mmap
def removeLine(filename, lineno):
    f = os.open(filename, os.O_RDWR)
    m = mmap(f, 0)
    p = 0
    for i in range(lineno-1):
        p = m.find(b'\n', p) + 1
    q = m.find(b'\n', p)
    if q == -1:  # last line has no trailing newline
        q = len(m)
    m[p:q] = b' '*(q-p)
    m.flush()  # write the change back to the file
    m.close()
    os.close(f)
If the other program can be changed to output the file offset instead of the line number, you can assign the offset to p directly and do without the for loop.
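A minimal Python 3 sketch of that offset-based variant, which also covers fixing a line in place when the fix is not longer than the original. overwrite_line_at_offset is a hypothetical name; it assumes offset is the byte position of the first character of a line and that replacement contains no newline.

```python
import mmap
import os
import tempfile


def overwrite_line_at_offset(filename, offset, replacement=b""):
    """Overwrite the line that starts at byte `offset` with `replacement`,
    padded with spaces to the old length; the newline and file size stay."""
    with open(filename, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            end = m.find(b"\n", offset)
            if end == -1:                 # last line without a trailing newline
                end = len(m)
            width = end - offset
            if len(replacement) > width:
                raise ValueError("replacement is longer than the existing line")
            m[offset:end] = replacement.ljust(width, b" ")
            m.flush()                     # push the change to the file


if __name__ == "__main__":
    fd, demo = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"good\nbad line\ngood\n")
    overwrite_line_at_offset(demo, 5)     # blank out "bad line"
    with open(demo, "rb") as f:
        print(f.read())
    os.remove(demo)
```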
As far as I know, you can't just open a text file with Python and remove a line. You have to make a new file and copy everything but that line to it. If you know the specific line number, you would do something like this:
linenumtoremove = 10  # 1-based number of the line to drop (example value)
f = open('in.txt')
fo = open('out.txt','w')
ind = 1
for line in f:
    if ind != linenumtoremove:
        fo.write(line)
    ind += 1
f.close()
fo.close()
You could of course check the contents of the line instead to decide whether to keep it. If you have a whole list of lines to remove or change, I also recommend making all those changes in one pass through the file.
If the lines are variable length then I don't believe that there is a better algorithm than reading the file line by line and writing out all lines, except for the one(s) that you do not want.
You can identify these lines by checking some criteria, or by keeping a running tally of lines read and suppressing the writing of the line(s) that you do not want.
If the lines are fixed length and you want to delete specific line numbers, then you may be able to use seek to move the file pointer... I doubt you're that lucky though.
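If the lines really are fixed-length, the byte offset of line n is simply (n - 1) * record_len, so the unwanted line can be located without reading anything before it. The lines after it still have to be shifted back (or the line blanked instead, as in the mmap answer). A sketch under that assumption; remove_fixed_record, record_len and chunk_size are hypothetical names, and record_len must include the newline.

```python
import os
import tempfile


def remove_fixed_record(filename, lineno, record_len, chunk_size=1 << 20):
    """Remove line `lineno` (1-based) from a file whose lines are all exactly
    `record_len` bytes long, newline included."""
    start = (lineno - 1) * record_len       # byte offset computed, not scanned
    if lineno < 1 or start + record_len > os.path.getsize(filename):
        raise ValueError("line number out of range")
    with open(filename, "r+b") as f:
        read_pos, write_pos = start + record_len, start
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            f.seek(write_pos)               # shift the chunk one record back
            f.write(chunk)
            read_pos += len(chunk)
            write_pos += len(chunk)
        f.truncate(write_pos)               # drop the now-duplicated last record


if __name__ == "__main__":
    fd, demo = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"aaa\nbbb\nccc\n")         # every line is 4 bytes
    remove_fixed_record(demo, 2, 4)
    with open(demo, "rb") as f:
        print(f.read())
    os.remove(demo)
```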
Update: solution using sed, as requested by the poster in a comment.
To delete, for example, the second line of a file:
sed '2d' input.txt
Use the -i switch to edit in place. Warning: this is a destructive operation. With GNU sed, a suffix after -i (for example -i.bak) keeps a backup copy of the original; see the sed documentation for details.
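import os
def removeLine(filename, lineno):
    src = open(filename)  # "in" is a reserved word and cannot be a variable name
    out = open(filename + ".new", "w")
    for i, l in enumerate(src, 1):
        if i != lineno:
            out.write(l)
    src.close()
    out.close()
    os.replace(filename + ".new", filename)  # unlike os.rename, also overwrites on Windows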
I think a similar, if not identical, question was asked here. Reading (and writing) line by line can be slow, but you can read a bigger chunk into memory at once, go through it line by line skipping the lines you don't want, and then write the result as a single chunk to a new file. Repeat until done. Finally, replace the original file with the new file.
The thing to watch out for is that when you read in a chunk, you need to deal with the last, potentially partial line you read, and prepend it to the next chunk you read.
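A sketch of that chunked approach, including the carry-over of the partial last line. remove_line_chunked and chunk_size are hypothetical names; the file is read in binary mode so chunk boundaries are plain byte counts, and "\n" line endings are assumed. Python's file objects already buffer their reads, so any speed-up comes from fewer per-line Python calls and should be measured. Replacing the original with the new file (for example with os.replace) is left out.

```python
import os
import tempfile


def remove_line_chunked(src_path, dst_path, lineno, chunk_size=1 << 20):
    """Copy src_path to dst_path without line `lineno` (1-based), reading
    `chunk_size` bytes at a time instead of one line at a time."""
    current = 1    # line number of the first complete line in the next batch
    carry = b""    # partial last line carried over into the next chunk
    with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            data = carry + chunk
            cut = data.rfind(b"\n") + 1           # 0 if there is no newline yet
            complete, carry = data[:cut], data[cut:]
            # Split into lines that keep their "\n"; the empty tail is dropped
            lines = [part + b"\n" for part in complete.split(b"\n")[:-1]]
            count = len(lines)
            if current <= lineno < current + count:
                del lines[lineno - current]
            fout.write(b"".join(lines))
            current += count
        # What is left over is a last line without a trailing newline
        if carry and current != lineno:
            fout.write(carry)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "wb") as f:
            f.write(b"alpha\nbeta\ngamma\n")
        remove_line_chunked(src, dst, 2, chunk_size=4)
        with open(dst, "rb") as f:
            print(f.read())
```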
@OP, if you can use awk, e.g. assuming the line number is 10:
$ awk 'NR!=10' file > newfile
I will provide two alternatives based on the look-up factor (line number or search string):
Line number (0-based: 0 removes the first line)
def removeLine2(filename, lineNumber):
    with open(filename, 'r+') as outputFile:
        with open(filename, 'r') as inputFile:
            currentLineNumber = 0
            while currentLineNumber < lineNumber:
                inputFile.readline()
                currentLineNumber += 1
            seekPosition = inputFile.tell()
            outputFile.seek(seekPosition, 0)
            inputFile.readline()
            currentLine = inputFile.readline()
            while currentLine:
                outputFile.write(currentLine)
                currentLine = inputFile.readline()
        outputFile.truncate()
String
def removeLine(filename, key):
    with open(filename, 'r+') as outputFile:
        with open(filename, 'r') as inputFile:
            seekPosition = 0
            currentLine = inputFile.readline()
            # stop at the matching line, or at end of file if there is none
            while currentLine and not currentLine.strip().startswith('"%s"' % key):
                seekPosition = inputFile.tell()
                currentLine = inputFile.readline()
            outputFile.seek(seekPosition, 0)
            currentLine = inputFile.readline()
            while currentLine:
                outputFile.write(currentLine)
                currentLine = inputFile.readline()
        outputFile.truncate()
>Sequence 1.1.1 ATGCGCGCGATAAGGCGCTA
ATATTATAGCGCGCGCGCGGATATATATATATATATATATT
>Sequence 1.2.2 ATATGCGCGCGCGCGCGGCG
ACCCCGCGCGCGCGCGGCGCGATATATATATATATATATATT
>Sequence 2.1.1 ATTCGCGCGAGTATAGCGGCG
Now I would like to remove the last digit, together with the period before it, from each line that starts with '>'. For example, in the first line I would like to remove '.1' (the rightmost one), in the second '.2', and then write the rest of the file to a new file. Thanks,
import fileinput
import re
for line in fileinput.input(inplace=True, backup='.bak'):
    line = line.rstrip()
    if line.startswith('>'):
        line = re.sub(r'\.\d$', '', line)
    print(line)
Many details can be changed depending on the processing you want, which you have not clearly communicated, but this is the general idea.
import re
trimmedtext = re.sub(r'(\d+\.\d+)\.\d', r'\1', text)  # text: the file contents as one string
That should do it. It is somewhat simpler than searching for start characters (and it won't affect your DNA chains).
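A quick way to check the pattern on the sample header from the question. trim_version and VERSION_TAIL are hypothetical names, and the final \d is widened to \d+ here so that a multi-digit last part is removed too.

```python
import re

# In a Python replacement string the first group is written \1 (not $1)
VERSION_TAIL = re.compile(r"(\d+\.\d+)\.\d+")


def trim_version(text):
    """Shorten ids like "1.1.1" to "1.1" wherever they occur in text."""
    return VERSION_TAIL.sub(r"\1", text)


if __name__ == "__main__":
    sample = ">Sequence 1.1.1 ATGCGCGCGATAAGGCGCTA\nATATTATAGCGCGCGCGCGGATAT\n"
    print(trim_version(sample), end="")
```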
if line.startswith('>Sequence'):
    line = line.rstrip('\n')[:-2]  # drop the newline, then the last 2 characters (".1")
or, if there could be more than one digit after the period:
if line.startswith('>Sequence'):
    line = line.rstrip('\n')  # add the newline back when writing the line out
    dot_pos = line.rfind('.')  # find position of rightmost period
    line = line[:dot_pos]  # truncate up to but not including the dot
Edit, for the case where the sequence is on the same line as >Sequence:
If we know there will always be exactly one digit to remove, we can cut out the period and the digit with:
line = line[:13] + line[15:]
This uses a Python feature called slicing. Indexes are zero-based and the end of the range is exclusive, so line[0:13] gives us the first 13 characters of line. If we start at the beginning, the 0 is optional, so line[:13] does the same thing. Similarly, line[15:] gives us the substring from index 15 to the end of the string.
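The slice positions can be checked on the sample header from the question. The sketch also includes a position-independent variant for ids whose parts have more than one digit; drop_last_component is a hypothetical name, and it assumes the id is the first space-separated field after ">Sequence".

```python
def drop_last_component(header):
    """'>Sequence 1.1.1 ATGC' -> '>Sequence 1.1 ATGC'. Works whether or not the
    sequence follows on the same line and for multi-digit parts."""
    body = header.rstrip("\n")
    newline = header[len(body):]           # keep the line ending, if any
    start = body.index(" ") + 1            # first character of the id
    end = body.find(" ", start)            # space after the id, if any
    if end == -1:
        end = len(body)
    dot = body.rfind(".", start, end)      # last "." inside the id
    if dot == -1:                          # id has no "." at all
        return header
    return body[:dot] + body[end:] + newline


if __name__ == "__main__":
    header = ">Sequence 1.1.1 ATGCGCGCGATAAGGCGCTA"
    print(header[:13])                     # the first 13 characters
    print(header[13:15])                   # the ".1" that gets cut out
    print(header[:13] + header[15:])       # fixed-position version
    print(drop_last_component(header))     # same result, position-independent
```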
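Map ".".join(line.split('.')[:-1]) over each header line (a line that starts with '>' and holds only the id, with its newline stripped). Joining with "." rather than "" keeps the remaining periods; sequence lines contain no period and must be left unchanged, because for them the expression returns an empty string.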
Here's a short script. Run it like: script [filename to clean]. Lots of error handling omitted.
It operates using generators, so it should work fine on huge files as well.
import sys

def clean_line(line):
    # Header lines start with ">": drop the trailing ".N" (last 2 characters)
    if line.startswith(">"):
        return line.rstrip()[:-2]
    else:
        return line.rstrip()

def clean(lines):
    # Generator: one line at a time, so the file is never held in memory
    for line in lines:
        yield clean_line(line)

if __name__ == "__main__":
    filename = sys.argv[1]
    print("Cleaning %s; output to %s.." % (filename, filename + ".clean"))
    # "with" closes both files even if an error occurs
    with open(filename, "r") as infile, open(filename + ".clean", "w") as outfile:
        for line in clean(infile):
            # text mode turns "\n" into the platform's line ending
            outfile.write(line + "\n")
            print(": " + line)
import re
with open('in') as input_file, open('out', 'w') as output_file:
    for line in input_file:
        # "1.1.1" -> "1.1": keep the first two numbers, drop the last ".N"
        line = re.sub(r'(\d+[.]\d+)[.]\d+', r'\1', line)
        output_file.write(line)