I am trying to replace a string in a file.
The code below only modifies certain substrings within the larger string read from the file. Any ideas on how I can actually replace line with current_line in the file filename?
from sys import *
import os
import re
import datetime
import fileinput
script, filename = argv
userhome = os.path.expanduser('~')
username = os.path.split(userhome)[-1]
print "\n"
print "User: " + username
today = datetime.date.today().strftime("%Y/%m/%d")
time = datetime.datetime.now().strftime("%H:%M:%S")
print "Date: " + str(today)
print "Current time: " + str(time)
print "Filename: %s\n" % filename
def replace_string():
    found = False
    with open(filename, 'r+') as f:
        for line in f:
            if re.search("CVS Header", line):
                print line
                ####################################################################################
                # Below logic: #
                # if length of revision number is 4 characters (e.g. 1.15) then increment by 0.01 #
                # else if it is 3 characters (e.g. 1.5) then increment by 0.1 #
                ####################################################################################
                if len(line.split("$Revision: ")[1].split()[0]) == 4:
                    new_line = str.replace(line, line.split("$Revision: ")[1].split()[0], str(float(line.split("$Revision: ")[1].split()[0]) + 0.01))
                elif len(line.split("$Revision: ")[1].split()[0]) == 3:
                    new_line = str.replace(line, line.split("$Revision: ")[1].split()[0], str(float(line.split("$Revision: ")[1].split()[0]) + 0.1))
                ###
                ###
                newer_line = str.replace(new_line, line.split("$Author: ")[1].split()[0], username)
                newest_line = str.replace(newer_line, line.split("$Date: ")[1].split()[0], today)
                current_line = str.replace(newest_line, line.split("$Date: ")[1].split()[1], time)
                print current_line
                found = True
    if not found:
        print "No CVS Header exists in %s" % filename
if __name__ == "__main__":
    replace_string()
I tried adding something like this:
f.write(f.replace(line, current_line))
but this just clears all the contents out of the file and leaves it blank, so obviously that is incorrect.
The fileinput module provides a way to edit a file in place. If you use the inplace parameter, the file is moved to a backup file and standard output is directed to the input file.
import fileinput
def clause(line):
    return len(line) < 5
for line in fileinput.input('file.txt', inplace=1):
    if clause(line):
        print '+ ' + line[:-1]
fileinput.close()
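For comparison, a minimal Python 3 sketch of the same in-place pattern (the function name replace_in_file is illustrative, not part of fileinput). With inplace=True, whatever is printed replaces the current line and anything not printed is dropped, so every line is printed back, changed or not:

```python
import fileinput


def replace_in_file(path, old, new):
    """Replace every occurrence of `old` with `new` in the file at `path`, in place."""
    # With inplace=True, standard output is redirected into the file being read,
    # so every line has to be printed back, whether it changed or not.
    with fileinput.input(path, inplace=True) as f:
        for line in f:
            # end='' because `line` already carries its own newline
            print(line.replace(old, new), end='')
```

For example, replace_in_file('file.txt', 'old', 'new') rewrites file.txt with every 'old' replaced by 'new'.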
Trying to apply this idea to your example, it could be something like this:
def replace_string():
    found = False
    for line in fileinput.input(filename, inplace=1): # <-
        if re.search("CVS Header", line):
            #print line
            ####################################################################################
            # Below logic: #
            # if length of revision number is 4 characters (e.g. 1.15) then increment by 0.01 #
            # else if it is 3 characters (e.g. 1.5) then increment by 0.1 #
            ####################################################################################
            if len(line.split("$Revision: ")[1].split()[0]) == 4:
                new_line = str.replace(line, line.split("$Revision: ")[1].split()[0], str(float(line.split("$Revision: ")[1].split()[0]) + 0.01))
            elif len(line.split("$Revision: ")[1].split()[0]) == 3:
                new_line = str.replace(line, line.split("$Revision: ")[1].split()[0], str(float(line.split("$Revision: ")[1].split()[0]) + 0.1))
            ###
            ###
            newer_line = str.replace(new_line, line.split("$Author: ")[1].split()[0], username)
            newest_line = str.replace(newer_line, line.split("$Date: ")[1].split()[0], today)
            current_line = str.replace(newest_line, line.split("$Date: ")[1].split()[1], time)
            print current_line[:-1] # <-
            found = True
        else:
            print line[:-1] # <- keep original line otherwise
    fileinput.close() # <-
    if not found:
        print "No CVS Header exists in %s" % filename
The solution proposed by user2040251 is the correct way, and the way used by all the text editors I know. The reason is that if a major problem occurs while writing the file, you keep the previous version unmodified until the new version is ready.
But of course you can edit in place if you accept the risk of completely losing the file in case of a crash. That can be acceptable for a file under version control, since you can always get the previously committed version.
The principle is then read before write: make sure you never overwrite something you have not yet read.
At the simplest level, you load everything into memory with readlines, replace the line, rewind the file to the correct position (or to the beginning) and write it back.
Edit: here is a simple implementation for when all the lines fit in memory:
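with open(filename, "r+") as fd:
    lines = fd.readlines()
    for i, line in enumerate(lines):
        # test whether this is the searched line (adapt the condition)
        if "CVS Header" in line:
            lines[i] = replacement_line  # should end with "\n"
            break
    fd.seek(0)
    fd.writelines(lines)
    # cut off leftover bytes if the new content is shorter than the old
    fd.truncate()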
It could be done even for a big file, using for example readlines(16384) instead of readlines() to read in chunks of a little more than 16K and always reading a chunk before writing the previous one, but that is much more complicated, and you should use a backup file anyway when processing big files.
You can create another file and write the output to it. After that, you can just remove the original file and rename the new file.
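A minimal Python 3 sketch of that approach, assuming the whole change can be expressed line by line (rewrite_file and transform are hypothetical names). It writes to a temporary file in the same directory and only then swaps it in with os.replace, which overwrites the destination and, on POSIX systems, does so atomically:

```python
import os
import shutil
import tempfile


def rewrite_file(path, transform):
    """Rewrite `path` line by line through `transform`, via a temporary file.

    The original is replaced only after the new version has been written
    completely, so an error half-way leaves the original untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the original, so the final rename stays on one file system
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'w') as out, open(path) as src:
            for line in src:
                out.write(transform(line))
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # swap the new file in (Python 3.3+)
    except BaseException:
        os.remove(tmp_path)              # leave no temporary file behind
        raise
```

For example, rewrite_file('file.txt', lambda line: line.replace('old', 'new')). The shutil.copymode call is there because tempfile.mkstemp creates the file readable and writable only by its owner.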
Related
Let's say I have a file Example.txt like this:
alpha_1 = 10
%alpha_2 = 20
Now I'd like a Python script that performs these tasks:
1. If the line containing the alpha_1 parameter is not commented out (with the % symbol), rewrite the line adding %, as is already done for alpha_2.
2. Perform the task in 1. independently of the line order.
3. Leave the rest of Example.txt untouched.
The script I wrote is:
with open('Example.txt', 'r+') as config:
    while 1:
        line = config.readline()
        if not line:
            break
        # remove line returns
        line = line.strip('\r\n')
        # make sure it has useful data
        if (not "=" in line) or (line[0] == '%'):
            continue
        # split across equal sign
        line = line.split("=",1)
        this_param = line[0].strip()
        this_value = line[1].strip()
        for case in switch(this_param):
            if case("alpha1"):
                string = ('% alpha1 =', this_value )
                s = str(string)
                config.write(s)
So far, the output is the same Example.txt with an additional line (%alpha1 =, 10) below the original line alpha1 = 10.
Thanks everybody
I found the solution after a while. Everything can easily be done by writing everything to another file and substituting it for the original at the end.
import os
configfile2 = open('Example.txt' + '_temp',"w")
with open('Example.txt', 'r') as configfile:
    while 1:
        line = configfile.readline()
        string = line
        if not line:
            break
        # remove line returns
        line = line.strip('\r\n')
        # make sure it has useful data
        if (not "=" in line) or (line[0] == '%'):
            configfile2.write(string)
        else:
            # split across equal sign
            line = line.split("=",1)
            this_param = line[0].strip()
            this_value = line[1].strip()
            #float values
            if this_param == "alpha1":
                # text mode already translates \n to the platform line ending
                stringalt = '% alpha1 = '+ this_value + '\n'
                configfile2.write(stringalt)
            else:
                configfile2.write(string)
configfile.close()
configfile2.close()
# the file is now replaced
os.remove('Example.txt' )
os.rename('Example.txt' + '_temp','Example.txt' )
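A shorter Python 3 sketch of the same temp-file idea, assuming the alpha_1 spelling used in Example.txt and '%' as the comment marker (comment_out is a hypothetical name). Unlike the code above, it prefixes the existing line with % instead of rebuilding it, and uses os.replace instead of os.remove followed by os.rename:

```python
import os


def comment_out(path, param, marker='%'):
    """Prefix the uncommented line that assigns `param` with `marker`.

    Commented lines and lines assigning other parameters are copied
    unchanged, whatever their order in the file.
    """
    tmp_path = path + '_temp'
    with open(path) as src, open(tmp_path, 'w') as dst:
        for line in src:
            stripped = line.strip()
            is_target = (not stripped.startswith(marker) and '=' in stripped
                         and stripped.split('=', 1)[0].strip() == param)
            dst.write(marker + line if is_target else line)
    # os.replace overwrites the original in one step (Python 3.3+)
    os.replace(tmp_path, path)
```

For the example above, comment_out('Example.txt', 'alpha_1') turns alpha_1 = 10 into %alpha_1 = 10 and leaves %alpha_2 = 20 alone.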
Say customPassFile.txt has two lines in it. First line is "123testing" and the second line is "testing321". If passwordCracking = "123testing", then the output would be that "123testing" was not found in the file (or something similar). If passwordCracking = "testing321", then the output would be that "testing321" was found in the file. I think that the for loop I have is only reading the last line of the text file. Any solutions to fix this?
import time
import linecache
def solution_one(passwordCracking):
    print("Running Solution #1 # " + time.strftime("%Y-%m-%d %H:%M:%S",time.localtime()))
    startingTimeSeconds = time.time()
    currentLine = 1
    attempt = 1
    passwordFound = False
    wordListFile = open("customPassFile.txt", encoding="utf8")
    num_lines = sum(1 for line in open('customPassFile.txt'))
    while(passwordFound == False):
        for i, line in enumerate(wordListFile):
            if(i == currentLine):
                line = line
        passwordChecking = line
        if(passwordChecking == passwordCracking):
            passwordFound = True
            endingTimeSeconds = time.time()
            overallTimeSeconds = endingTimeSeconds - startingTimeSeconds
            print("~~~~~~~~~~~~~~~~~")
            print("Password Found: {}".format(passwordChecking))
            print("ATTEMPTS: {}".format(attempt))
            print("TIME TO FIND: {} seconds".format(overallTimeSeconds))
            wordListFile.close()
            break
        elif(currentLine == num_lines):
            print("~~~~~~~~~~~~~~~~~")
            print("Stopping Solution #1 # " + time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
            print("REASON: Password could not be cracked")
            print("ATTEMPTS: {}".format(attempt))
            break
        else:
            attempt = attempt + 1
            currentLine = currentLine + 1
            continue
The main problem with your code is that you open the file once and read it multiple times. After the first read, the file object position is at the end of the file and stays there. The next time you read the file nothing happens, since you are already at the end.
Example
Sometimes an example is worth more than lots of words.
Take the file test_file.txt with the following lines:
line1
line2
Now open the file and read it twice:
f = open('./test_file.txt')
f.tell()
# -> 0
for l in f:
    print(l, end='')
else:
    print('nothing')
# -> line1
# -> line2
# -> nothing
f.tell()
# -> 12
for l in f:
    print(l, end='')
else:
    print('nothing')
# -> nothing
f.close()
The second time no line is read, because the file object is already at the end; only the else branch runs.
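The same behaviour can be reproduced without a file on disk; in this sketch io.StringIO stands in for the opened test_file.txt (an assumption made only to keep it self-contained, and read_twice is a hypothetical name). The second readlines() returns an empty list unless the position is reset with seek(0) first:

```python
import io


def read_twice(f, rewind=False):
    """Read all lines of the open file `f` twice, optionally rewinding in between."""
    first = f.readlines()
    if rewind:
        f.seek(0)  # move the position back to the start
    second = f.readlines()
    return first, second


# io.StringIO stands in for open('./test_file.txt') so no file is needed
print(read_twice(io.StringIO('line1\nline2\n')))
# (['line1\n', 'line2\n'], [])
print(read_twice(io.StringIO('line1\nline2\n'), rewind=True))
# (['line1\n', 'line2\n'], ['line1\n', 'line2\n'])
```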
Solution
Here you have two options:
1. Read the file only once, save all the lines in a list, and then use the list in your code. It should be enough to replace
wordListFile = open("customPassFile.txt", encoding="utf8")
num_lines = sum(1 for line in open('customPassFile.txt'))
with
with open("customPassFile.txt", encoding="utf8") as f:
    wordListFile = f.readlines()
num_lines = len(wordListFile)
# a list has no close(), so also remove the later wordListFile.close()
2. Reset the file object position with seek after you read the file. It would be something along these lines:
for i, line in enumerate(wordListFile):
    if(i == currentLine):
        line = line
wordListFile.seek(0)
I would go with option 1 unless you have memory constraints (e.g. the file is bigger than the available memory).
Notes
I have a few extra notes:
Python counts from 0 (like C/C++), not from 1 (like Fortran), and enumerate starts at 0 by default. So you probably want to set:
currentLine = 0
When you read a file, the newline character \n is not stripped, so you have to strip it (with strip) or account for it when comparing strings (e.g. using startswith). For example:
passwordChecking == passwordCracking
will likely always return False as passwordChecking contains \n and passwordCracking very likely doesn't.
Disclaimer
I haven't tried the code, nor my suggestions, so there might be other bugs lurking around.
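Putting option 1 and the notes together, a minimal sketch (not the OP's function: timing, attempt counters and printing are left out, and find_password is a hypothetical name). It reads the word list once and compares each line with its trailing newline removed, returning the 1-based line number of the match, or None:

```python
def find_password(path, password):
    """Return the 1-based line number of `password` in the word list, or None."""
    with open(path, encoding='utf8') as word_list:
        # enumerate(..., start=1) numbers lines the way a person would count them
        for line_number, candidate in enumerate(word_list, start=1):
            # drop the trailing newline before comparing
            if candidate.rstrip('\r\n') == password:
                return line_number
    return None
```

With the two-line customPassFile.txt from the question, "123testing" is found on line 1 and "testing321" on line 2.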
I will delete this answer once the OP understands the indentation problem, or once I understand the intention of the code.
for i, line in enumerate(wordListFile):
    if(i == currentLine):
        line = line
passwordChecking = line
#rest of the code.
Here your code is outside the for loop, so only the last line read by the loop is checked. It should be inside the if block:
for i, line in enumerate(wordListFile):
    if(i == currentLine):
        line = line
        passwordChecking = line
        #rest of the code.
I ran into a curious problem while parsing json objects in large text files, and the solution I found doesn't really make much sense. I was working with the following script. It copies bz2 files, unzips them, then parses each line as a json object.
import os, sys, json
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# USER INPUT
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
args = sys.argv
extractDir = outputDir = ""
if (len(args) >= 2):
    extractDir = args[1]
else:
    extractDir = raw_input('Directory to extract from: ')
if (len(args) >= 3):
    outputDir = args[2]
else:
    outputDir = raw_input('Directory to output to: ')
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# RETRIEVE FILE
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
tweetModel = [u'id', u'text', u'lang', u'created_at', u'retweeted', u'retweet_count', u'in_reply_to_user_id', u'coordinates', u'place', u'hashtags', u'in_reply_to_status_id']
filenames = next(os.walk(extractDir))[2]
for file in filenames:
    if file[-4:] != ".bz2":
        continue
    os.system("cp " + extractDir + '/' + file + ' ' + outputDir)
    os.system("bunzip2 " + outputDir + '/' + file)
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    # PARSE DATA
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    input = open (outputDir + '/' + file[:-4], 'r')
    output = open (outputDir + '/p_' + file[:-4], 'w+')
    for line in input.readlines():
        try:
            tweet = json.loads(line)
            for field in enumerate(tweetModel):
                if tweet.has_key(field[1]) and tweet[field[1]] != None:
                    if field[0] != 0:
                        output.write('\t')
                    fieldData = tweet[field[1]]
                    if not isinstance(fieldData, unicode):
                        fieldData = unicode(str(fieldData), "utf-8")
                    output.write(fieldData.encode('utf8'))
                else:
                    output.write('\t')
        except ValueError as e:
            print ("Parse Error: " + str(e))
            print line
            line = input.readline()
            quit()
            continue
        print "Success! " + str(len(line))
        input.flush()
        output.write('\n')
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    # REMOVE OLD FILE
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    os.system("rm " + outputDir + '/' + file[:-4])
While reading in certain lines in the for line in input.readlines(): loop, the lines would occasionally be truncated at inconsistent locations. Since the newline character was truncated as well, it would keep reading until it found the newline character at the end of the next json object. The result was an incomplete json object followed by a complete json object, all considered one line by the parser. I could not find the reason for this issue, but I did find that changing the loop to
filedata = input.read()
for line in filedata.splitlines():
worked. Does anyone know what is going on here?
After looking at the source code for file.readlines and str.splitlines, I think I see what's up. Note: this is Python 2.7 source code, so if you're using another version this answer may or may not apply.
readlines uses the function Py_UniversalNewlineFread to detect newlines, whereas splitlines uses STRINGLIB_ISLINEBREAK, which just tests for \n or \r. I suspect Py_UniversalNewlineFread is picking up some character in the file stream as a line break when it is not really intended as one, perhaps because of the encoding; I don't know for sure. But when you dump all that same data into a string, splitlines checks it only against \r and \n, finds no match, and moves on until the real line break is encountered, so you get your intended line.
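Separately from the explanation above, a Python 3 sketch that avoids both readlines() and the cp/bunzip2/rm shell calls by decompressing on the fly with bz2.open in text mode. It assumes each line holds one JSON object, and it does not explain the original truncation; bz2_json_to_tsv and cell are hypothetical names, and fields are written as plain text rather than with the question's Python 2 unicode handling. Unlike the question's script, it reports a bad line and carries on instead of calling quit():

```python
import bz2
import json

# Field names copied from tweetModel in the question
TWEET_FIELDS = ['id', 'text', 'lang', 'created_at', 'retweeted', 'retweet_count',
                'in_reply_to_user_id', 'coordinates', 'place', 'hashtags',
                'in_reply_to_status_id']


def cell(value):
    """Render one field as a TSV cell: None becomes '', tabs/newlines become spaces."""
    if value is None:
        return ''
    return str(value).replace('\t', ' ').replace('\r', ' ').replace('\n', ' ')


def bz2_json_to_tsv(src_path, dst_path, fields=TWEET_FIELDS):
    """Write one tab-separated row per JSON object in a .bz2 JSON-lines file."""
    with bz2.open(src_path, 'rt', encoding='utf-8') as src, \
            open(dst_path, 'w', encoding='utf-8') as dst:
        for line_number, line in enumerate(src, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                tweet = json.loads(line)
            except ValueError as e:  # json.JSONDecodeError is a ValueError
                print('Parse error on line {}: {}'.format(line_number, e))
                continue
            # missing or null fields become empty columns
            dst.write('\t'.join(cell(tweet.get(name)) for name in fields) + '\n')
```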
I've been trying to write lines to a file based on specific file names from the same directory, a search for those file names in another log file (given as input), and the modification dates of the files.
The output seems to limit me to under 80 characters per line.
import mmap
import os
import sys
import time

def getFiles(flag, file):
    if (flag == True):
        file_version = open(file)
        if file_version:
            s = mmap.mmap(file_version.fileno(), 0, access=mmap.ACCESS_READ)
            file_version.close()
    file = open('AllModules.txt', 'wb')
    for i, values in dict.items():
        # search keys in version file
        if (flag == True):
            index = s.find(bytes(i))
            if index > 0:
                s.seek(index + len(i) + 1)
                m = s.readline()
                line_new = '{:>0} {:>12} {:>12}'.format(i, m, values)
                file.write(line_new)
                s.seek(0)
        else:
            file.write(i +'\n')
    file.close()

if __name__ == '__main__':
    dict = {}
    for file in os.listdir(os.getcwd()):
        if os.path.splitext(file)[1] == '.psw' or os.path.splitext(file)[1] == '.pkw':
            time.ctime(os.path.getmtime(file))
            dict.update({str(os.path.splitext(file)[0]).upper():time.strftime('%d/%m/%y')})
    if (len(sys.argv) > 1) :
        if os.path.exists(sys.argv[1]):
            getFiles(True, sys.argv[1])
    else:
        getFiles(False, None)
The output always looks like this:
BW_LIB_INCL 13.1 rev. 259 [20140425 16:28]
16/05/14
The data is interpreted correctly, but the formatting is wrong: the date is put on the next line instead of on the same line.
This happens to every line of my new file.
Could someone give me a hint?
The string returned by m = s.readline() ends with \n. You then call .format(i, m, values), which puts m, newline included, in the middle of the string.
I leave it as an exercise to the reader to find out what happens when you write such a line to a file. :-)
(hint: m = s.readline().rstrip('\n'))
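Applied to the question's format call, the hint looks like this minimal sketch (format_row is a hypothetical helper, and the first field is written without the {:>0} spec). The question's line_new has no newline of its own, so the sketch also adds one at the end of each row:

```python
def format_row(name, version_line, date):
    """Format one AllModules.txt row on a single line.

    version_line is expected to come straight from readline(), so its
    trailing newline is stripped first; the row then gets exactly one
    newline of its own, at the end.
    """
    return '{} {:>12} {:>12}\n'.format(name, version_line.rstrip('\n'), date)
```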
I'm in trouble here. I need to read a .txt file that contains a sequence of records, check which records I want, and copy them to a new file.
The file content is like this (this is just an example, the original file has more than 30 000 lines):
AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460
99999|35|436 #end register
00000|46|316 #begin register
03000|SP|467
99999|33|130 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file
The registers whose 03000 record has the characters 'TO' must be written to a new file, from their 00000 line to their 99999 line, together with the first and last lines of the file. Based on the example, the new file should look like this:
AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460
99999|35|436 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file
Code:
file = open("file.txt",'r')
newFile = open("newFile.txt","w")
content = file.read()
file.close()
# here I need to check whether a 03000 record with the characters 'TO' exists and, if it does, copy the 00000-99999 record set to the new file.
I did multiple searches and found nothing to help me.
Thank you!
with open("file.txt",'r') as inFile, open("newFile.txt","w") as outFile:
    outFile.writelines(line for line in inFile
                       if line.startswith("03000") and "TO" in line)
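If you need the previous and the next line, then you have to iterate over inFile in triads. First define:
def gen_triad(lines, prev=''):
    # prev defaults to '' (not None) so ''.join() also works on the first triad
    lines = iter(lines)
    current = next(lines, None)
    if current is None:
        return  # empty input
    for after in lines:
        yield prev, current, after
        prev, current = current, after
Then use it as before:
with open("file.txt",'r') as inFile, open("newFile.txt","w") as outFile:
    outFile.writelines(''.join(triad) for triad in gen_triad(inFile)
                       if triad[1].startswith("03000") and "TO" in triad[1])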
import re
pat = (r'^00000\|\d+\|\d+.*\n'
       r'^03000\|TO\|\d+.*\n'
       r'^99999\|\d+\|\d+.*\n'
       r'|'
       r'^AAAAA\|\d+\|\d+.*\n'
       r'|'
       r'^ZZZZZ\|\d+\|\d+.*')
rag = re.compile(pat,re.MULTILINE)
with open('fifi.txt','r') as f,\
     open('newfifi.txt','w') as g:
    g.write(''.join(rag.findall(f.read())))
For files with additional lines between lines beginning with 00000, 03000 and 99999, I didn't find simpler code than this one:
import re
pat = (r'(^00000\|\d+\|\d+.*\n'
       r'(?:.*\n)+?'
       r'^99999\|\d+\|\d+.*\n)'
       r'|'
       r'(^AAAAA\|\d+\|\d+.*\n'
       r'|'
       r'^ZZZZZ\|\d+\|\d+.*)')
rag = re.compile(pat,re.MULTILINE)
pit = (r'^00000\|.+?^03000\|TO\|\d+.+?^99999\|')
rig = re.compile(pit,re.DOTALL|re.MULTILINE)
def yi(text):
    for g1,g2 in rag.findall(text):
        if g2:
            yield g2
        elif rig.match(g1):
            yield g1
with open('fifi.txt','r') as f,\
     open('newfifi.txt','w') as g:
    g.write(''.join(yi(f.read())))
file = open("file.txt",'r')
newFile = open("newFile.txt","w")
content = file.readlines()
file.close()
newFile.writelines(filter(lambda x:x.startswith("03000") and "TO" in x,content))
This seems to work. The other answers seem to write out only the records that contain '03000|TO|', but you have to write out the record before and after that one as well.
import sys
# ---------------------------------------------------------------
# ---------------------------------------------------------------
# import file
file_name = sys.argv[1]
file_path = 'C:\\DATA_SAVE\\pick_parts\\' + file_name
file = open(file_path,"r")
# ---------------------------------------------------------------
# create output files
output_file_path = 'C:\\DATA_SAVE\\pick_parts\\' + file_name + '.out'
output_file = open(output_file_path,"w")
# create output files
# ---------------------------------------------------------------
# process file
temp = ''
temp_out = ''
good_write = False
bad_write = False
for line in file:
    if line[:5] == 'AAAAA':
        temp_out += line
    elif line[:5] == 'ZZZZZ':
        temp_out += line
    elif good_write:
        temp += line
        temp_out += temp
        temp = ''
        good_write = False
    elif bad_write:
        bad_write = False
        temp = ''
    elif line[:5] == '03000':
        if line[6:8] != 'TO':
            temp = ''
            bad_write = True
        else:
            good_write = True
            temp += line
            temp_out += temp
            temp = ''
    else:
        temp += line
output_file.write(temp_out)
output_file.close()
file.close()
Output:
AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460
99999|35|436 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file
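Does it have to be Python? These shell commands would do the same thing in a pinch (with GNU grep, --no-group-separator stops grep from printing a -- line between non-adjacent matches):
head -1 inputfile.txt > outputfile.txt
grep -C 1 --no-group-separator "03000|TO" inputfile.txt >> outputfile.txt
tail -1 inputfile.txt >> outputfile.txt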
# Whenever I have to parse text files I prefer to use regular expressions
# You can also customize the matching criteria if you want to
import re
what_is_being_searched = re.compile("^03000.*TO")
# don't use "file" as a variable name, since it was a builtin
# in Python 2
with open("file.txt", "r") as source_file, open("newFile.txt", "w") as destination_file:
    for this_line in source_file:
        if what_is_being_searched.match(this_line):
            destination_file.write(this_line)
and for those who prefer a more compact representation:
import re
with open("file.txt", "r") as source_file, open("newFile.txt", "w") as destination_file:
    destination_file.writelines(this_line for this_line in source_file
                                if re.match("^03000.*TO", this_line))
code:
fileName = '1'
fil = open(fileName,'r')
##step 1: parse the file.
parsedFile = []
for i in fil:
    ##tuple1 = (1,2,3)
    i = i.rstrip('\n')  # drop the newline (the last line may not have one)
    firstPipe = i.find('|')
    secondPipe = i.find('|',firstPipe+1)
    tuple1 = (i[:firstPipe],
              i[firstPipe+1:secondPipe],
              i[secondPipe+1:])
    parsedFile.append(tuple1)
fil.close()
##search criteria:
searchFirst = '03000'
searchString = 'TO' ##can be changed if and when required
##step 2: use the parsed contents to write the new file
filout = open('newFile','w')
stringToWrite = parsedFile[0][0] + '|' + parsedFile[0][1] + '|' + parsedFile[0][2] + '\n'
filout.write(stringToWrite) ##to write the first entry
for i in range(1,len(parsedFile)):
    if parsedFile[i][1] == searchString and parsedFile[i][0] == searchFirst:
        for j in range(-1,2,1):
            stringToWrite = parsedFile[i+j][0] + '|' + parsedFile[i+j][1] + '|' + parsedFile[i+j][2] + '\n'
            filout.write(stringToWrite)
stringToWrite = parsedFile[-1][0] + '|' + parsedFile[-1][1] + '|' + parsedFile[-1][2] + '\n'
filout.write(stringToWrite) ##to write the last entry
filout.close()
I know this solution may be a bit long, but it is quite easy to understand and seems an intuitive way to do it. I have already checked it with the data you provided and it works perfectly.
Please tell me if you need more explanation of the code, and I will add it.
The tips (Beasley and Joran elyase) are very interesting, but they only get the contents of the 03000 line. I would like to get the contents of the lines from 00000 to 99999.
I did manage to do it, but I am not satisfied; I wanted a cleaner solution.
Here is how I did it:
file = open(url,'r')
newFile = open("newFile.txt",'w')
lines = file.readlines()
file.close()
i = 0
lineTemp = []
for line in lines:
    lineTemp.append(line)
    if line[0:5] == '03000':
        state = line[21:23]
    if line[0:5] == '99999':
        if state == 'TO':
            newFile.writelines(lineTemp)
        # start collecting the next register, whether or not this one was written
        lineTemp = []
    i = i+1
newFile.close()
Suggestions...
Thanks to all!
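One cleaner way to express the same grouping, as a sketch: collect the lines of each 00000...99999 register and yield them only if the register holds a 03000 line whose second field is exactly TO. It assumes the pipe-separated layout of the example (the real file apparently keeps the state at columns 21-23, so the field test may need adjusting), keeps every line outside a register (such as the AAAAA and ZZZZZ lines), and drops an unterminated register at the end of the file; filter_registers is a hypothetical name:

```python
def filter_registers(lines, code='03000', value='TO'):
    """Yield the lines to keep from a pipe-delimited record file.

    Lines outside a 00000...99999 register are always kept. A register is
    kept only if it holds a line whose first field is `code` and whose
    second field is `value`.
    """
    register = None  # lines of the register being collected, or None
    keep = False
    for line in lines:
        fields = line.split('|')
        if fields[0] == '00000':
            register, keep = [line], False  # a new register starts
        elif register is None:
            yield line  # file header/trailer or other line outside a register
        else:
            register.append(line)
            if fields[0] == code and len(fields) > 1 and fields[1] == value:
                keep = True
            if fields[0] == '99999':
                if keep:
                    yield from register
                register = None  # the register is complete
```

Usage: with open('file.txt') as src, open('newFile.txt', 'w') as dst: dst.writelines(filter_registers(src))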