Add lines to text file after occurrence of certain text - python

I have a text file that I need to manipulate. I want to add a line after each occurrence of the word "exactarch". That is, whenever "exactarch" occurs, I want to add text on the next line.
E.g. If this is the original file content,
[main]
cachedir=/var/cache/yum
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
distroverpkg=redhat-release
tolerant=1
exactarch=1
gpgcheck=1
plugins=1
I want to change it as below:
[main]
cachedir=/var/cache/yum
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
distroverpkg=redhat-release
tolerant=1
exactarch=1
obsoletes=1
gpgcheck=1
plugins=1
This is what I tried to do:
with open('file1.txt') as f:
    for line in input_data:
        if line.strip() == 'exactarch':
            f.write('obsoletes=1')
Obviously this does not work, as I can't figure out how to find that line and write after it.

You ask for a Python solution, but tasks like this are made to be solved with simpler tools.
If you are using a system that has sed, you can do this in a simple one-liner:
$ sed '/exactarch/aobsoletes=1' < in.txt
What does this mean?
sed: the executable
/exactarch/: matches all lines that contain exactarch
a: after the current line, append a new line with the following text
obsoletes=1: the text to append in a new line
Output:
[main]
cachedir=/var/cache/yum
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
distroverpkg=redhat-release
tolerant=1
exactarch=1
obsoletes=1
gpgcheck=1
plugins=1
Edit:
To modify the file in place, use the option -i and the file as an argument:
$ sed -i '/exactarch/aobsoletes=1' in.txt

Simple - read all lines, find the matching line and insert the desired line after it. Then dump the resulting lines to a file.
with open('lines.txt') as f:
    lines = f.readlines()
# insert the new line right after the first 'exactarch=1' line
lines.insert(lines.index('exactarch=1\n') + 1, 'obsoletes=1\n')
with open('dst.txt', 'w') as f:
    for l in lines:
        f.write(l)
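Note that `lines.index` finds only the first exact match and raises `ValueError` when the line is missing. A minimal sketch of a variant that handles every occurrence and a missing trailing newline; the helper name `insert_after` is hypothetical, not part of the answer above:

```python
def insert_after(lines, marker, new_line):
    """Return a new list with new_line placed after every line that starts with marker."""
    result = []
    for line in lines:
        matched = line.startswith(marker)
        # Terminate a matched last line so the new text starts on its own line.
        if matched and not line.endswith('\n'):
            line += '\n'
        result.append(line)
        if matched:
            result.append(new_line + '\n')
    return result


if __name__ == '__main__':
    sample = ['tolerant=1\n', 'exactarch=1\n', 'gpgcheck=1\n']
    print(''.join(insert_after(sample, 'exactarch', 'obsoletes=1')), end='')
```

The function works on a list of lines, so it can be fed the result of `f.readlines()` and its return value passed to `f.writelines()`.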

Past answers show it's pretty simple - replacing words in files is not a new thing.
If you want to replace a word, you can use the solution implemented there. In your context:
import fileinput
# fileToSearch is the path of the file to edit in place
for line in fileinput.input(fileToSearch, inplace=True):
    # match the whole setting, otherwise the "=1" ends up after the inserted text
    print(line.replace("exactarch=1", "exactarch=1\nobsoletes=1"), end='')

I am hesitant to use fileinput, because if something goes wrong during the 'analysis' phase you are left with the file in whatever condition it was in at the time of the failure. I would read everything in first and then do all the work on it. The code below ensures that:
The inserted value ends with a newline '\n' if it's not going to be the last item.
A duplicate value is not added, by checking the line below the match.
All lines are checked, in case multiple "exactarch=1" lines were added since the snippet last ran.
Hope this helps, albeit not as stylish as a one- or two-liner.
with open('test.txt') as f:
    data = f.readlines()
insertValue = 'obsoletes=1'
# enumerate gives the real position of each match, even if the line occurs more than once
for point, item in enumerate(data):
    if item.rstrip() == 'exactarch=1':  # find it in the middle or on the last line (i.e. no '\n')
        if point + 1 == len(data):  # exactarch=1 is the last line, so the new value gets no '\n'
            if not item.endswith('\n'):
                data[point] = item + '\n'  # terminate the line so the new value starts on its own line
            data.insert(point + 1, insertValue)
        elif data[point + 1].rstrip() != insertValue:  # make sure the value isn't already below exactarch=1
            data.insert(point + 1, insertValue + '\n')
            print('insertValue added below "exactarch=1"')
        else:
            print('insertValue already exists below exactarch=1')
with open('test.txt', 'w') as f:
    f.writelines(data)
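The concern about a half-written file can be taken one step further by writing the result to a temporary file and swapping it into place only once it is complete. A rough sketch of that idea, assuming Python 3.3+ for `os.replace`; the helper name `rewrite_atomically` and its `transform` callback are hypothetical:

```python
import os
import tempfile


def rewrite_atomically(path, transform):
    """Apply transform(list_of_lines) -> list_of_lines and swap the result into place."""
    with open(path) as f:
        lines = f.readlines()
    new_lines = transform(lines)
    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, 'w') as tmp:
            tmp.writelines(new_lines)
        os.replace(tmp_path, path)  # the original is untouched until this call
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temporary file
        raise
```

One trade-off to be aware of: the replacement file is created with the temporary file's permissions, so ownership or mode bits of the original may need to be restored separately.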

Related

python search for string in file return entire line + next line into new text file

I have a very large text file (50,000+ lines) that should always be in the same sequence. In Python I want to search the text file for each of the $INGGA lines and join this line with the subsequent $INHDT to create a new text file. I need to do this without reading the file into memory, as this causes it to crash every time. I can find and return the $INGGA line, but I'm not sure of the best memory-efficient way of then getting the next line and joining the two into a new string.
Thanks
Phil
=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2016.05.06 09:11:34 =~=~=~=~=~=~=~=~=~=~=~= > $PRDID,2.15,-0.10,31.87*6E
$INGGA,091124.00,5249.8336,N,00120.9619,W,1,20,0.6,95.0,M,49.4,M,,*50
$INHDT,31.9,T*1E $INZDA,091124.0055,06,05,2016,,*7F
$INVTG,22.0,T,,M,4.4,N,8.1,K,A*24 $PRDID,2.13,-0.06,34.09*6C
$INGGA,091124.20,5249.8338,N,00120.9618,W,1,20,0.6,95.0,M,49.4,M,,*5D
$INHDT,34.1,T*13 $INZDA,091124.2055,06,05,2016,,*7D
$INVTG,24.9,T,,M,4.4,N,8.1,K,A*2B $PRDID,2.16,-0.03,36.24*61
$INGGA,091124.40,5249.8340,N,00120.9616,W,1,20,0.6,95.0,M,49.4,M,,*5A
$INHDT,36.3,T*13 $INZDA,091124.4055,06,05,2016,,*7B
$INVTG,27.3,T,,M,4.4,N,8.1,K,A*22 $PRDID,2.11,-0.05,38.33*68
$INGGA,091124.60,5249.8343,N,00120.9614,W,1,20,0.6,95.1,M,49.4,M,,*58
$INHDT,38.4,T*1A $INZDA,091124.6055,06,05,2016,,*79
$INVTG,29.5,T,,M,4.4,N,8.1,K,A*2A $PRDID,2.09,-0.02,40.37*6D
$INGGA,091124.80,5249.8345,N,00120.9612,W,1,20,0.6,95.1,M,49.4,M,,*56
$INHDT,40.4,T*15 $INZDA,091124.8055,06,05,2016,,*77
$INVTG,31.7,T,,M,4.4,N,8.1,K,A*21 $PRDID,2.09,0.02,42.42*40
$INGGA,091125.00,5249.8347,N,00120.9610,W,1,20,0.6,95.1,M,49.4,M,,*5F
$INHDT,42.4,T*17
You can read the file one line at a time and write the lines you need to a new file.
Like this:
import re
# open the new file for appending and the source file for reading
with open('newfile', 'at') as nf, open('file', 'rt') as f:
    for line in f:
        r = re.match(r'\$INGGA', line)
        if r is not None:
            nf.write(line)
            # assumes the $INHDT record is on the line right after $INGGA
            nf.write(next(f, ''))
The mode 'at' opens a file for appending text and 'rt' opens it for reading text.
I ran this on a 150,000-line file and it worked well.
I suggest using a simple regex that will parse and capture the parts you care about. Here is an example that will capture the piece you care about:
(\$INGGA.*\n\$INHDT.*\n)
https://regex101.com/r/tK1hF0/3
As in my above link, you'll notice that I used the "global" g setting on the regex, telling it to capture all groups that match. Otherwise, it'll stop after the first match.
I also had trouble determining where the actual line breaks exist in your above example file, so you can tweak the above to match exactly where the breaks occur.
Here is some starter python example code:
import re
test_str = open('file').read()  # load your file here ('file' is a placeholder name)
p = re.compile(r'(\$INGGA.*\n\$INHDT.*\n)')
matches = re.findall(p, test_str)
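To see what the pattern captures, here is a small self-contained sketch. The sample string is made up and shortened; it only assumes that each $INHDT record sits on the line directly after its $INGGA record, which may not match where the real log breaks its lines:

```python
import re

# Shortened, made-up sample with the same record order as the log in the question.
sample = (
    "$PRDID,2.15,-0.10,31.87*6E\n"
    "$INGGA,091124.00,5249.8336,N\n"
    "$INHDT,31.9,T*1E\n"
    "$INZDA,091124.0055\n"
    "$INGGA,091124.20,5249.8338,N\n"
    "$INHDT,34.1,T*13\n"
)

# '.' does not match a newline by default, so each '.*' stays on one line.
pattern = re.compile(r'\$INGGA.*\n\$INHDT.*\n')
pairs = pattern.findall(sample)
for pair in pairs:
    print(pair, end='')
```

Because `findall` needs the whole text as one string, this approach reads the file into memory, which matters for the size concern raised in the question.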
In the example PuTTY log you give, it's all one line, separated by spaces.
In that case you can use this to replace each space with a newline and get a new file -
cat large_file | sed 's/ /\n/g' > new_large_file
To iterate over the newline-separated file, run this -
cat new_large_file | python your_script.py
Your script reads the input line by line, so your computer should not crash.
your_script.py -
import sys
INGGA_line = ""
for line in sys.stdin:
    line_striped = line.strip()
    if line_striped.startswith("$INGGA"):
        # remember the record until the line it should be joined with arrives
        INGGA_line = line_striped
    elif line_striped.startswith("$INZDA"):
        print(line_striped + " " + INGGA_line)
    else:
        print(line_striped)
This answer is aimed at python 3.
According to this other answer (and the docs), you can iterate your file line-by-line memory-efficiently:
with open(filename, 'r') as f:
    for line in f:
        pass  # ...process the line...
An example of how you could fulfill your above criteria could be
# Target file write-only, source file read-only
with open(targetfile, 'w') as tf, open(sourcefile, 'r') as sf:
    # Flag for whether we are looking for 1st or 2nd part
    look_for_ingga = True
    for line in sf:
        if look_for_ingga:
            if line.startswith('$INGGA,'):
                tf.write(line)
                look_for_ingga = False
        elif line.startswith('$INHDT,'):
            tf.write(line)
            look_for_ingga = True
In the case where you have multiple '$INGGA,' prior to the '$INHDT,', this grabs the first one and disregards the rest. In case you want to take only the last '$INGGA,' before the '$INHDT,', store the last '$INGGA,' in a variable instead of writing it to disk. Then, when you find your '$INHDT,', store both.
In case you meant that you want to write to a separate new file for each INGGA-INHDT pair, the target file with-statement should be nested inside for line in sf instead, or the results should be buffered in a list for later storage.
Refer to the docs for introductions to with-statements and file reading/writing.
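The "keep only the last '$INGGA,'" variant described above could look roughly like this. It is a sketch, not a drop-in implementation; the function names `pair_records` and `write_pairs` are made up for illustration:

```python
def pair_records(lines):
    """Yield (ingga, inhdt) pairs, keeping only the last $INGGA seen before each $INHDT."""
    last_ingga = None
    for line in lines:
        if line.startswith('$INGGA,'):
            last_ingga = line  # overwrite any earlier, still unpaired $INGGA
        elif line.startswith('$INHDT,') and last_ingga is not None:
            yield last_ingga, line
            last_ingga = None  # wait for the next $INGGA


def write_pairs(sourcefile, targetfile):
    """Stream sourcefile and write each pair to targetfile, one record per line."""
    with open(sourcefile, 'r') as sf, open(targetfile, 'w') as tf:
        for ingga, inhdt in pair_records(sf):
            tf.write(ingga)
            tf.write(inhdt)
```

Since `pair_records` is a generator fed by the file iterator, only one pending '$INGGA,' line is held in memory at a time.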

skip a line when it has a # character in python?

I would like some help with a problem that I'm facing as a new Python programmer. I created a .txt file in C++ in which some lines start with a # character, which marks a comment, and I want to skip those lines when I read the file in my Python script. How can I do that?
I think this should help you.
I'll read the whole file and save all lines into a list.
Then I'll iterate over this list looking for the first character in every line.
If the first char is equal to "#", go to the next line.
Otherwise, append this line to a new list called selected_lines.
My code isn't especially efficient or a one-liner, but I think it may help you.
lines = []
selected_lines = []
filepath = "/usr//home/Desktop/myfile.txt"
with open(filepath, "r") as f:
    # readlines() already returns a list, so extend rather than append
    lines.extend(f.readlines())
for line in lines:
    if line[0:1] == "#":
        continue
    else:
        selected_lines.append(line)
Something like this would work if it's just the beginning character. If you need it to ignore comments after code, you would need to modify it to if '#' in line: and handle it accordingly.
with open('somefile.txt', 'r') as f:
    for line in f:
        # Use continue so your code doesn't become a nested mess.
        # if this check passes, we can assume line is not a comment.
        if line[0] == '#':
            continue
        # Do stuff with line after checking for the comment.
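If comments can also follow code on the same line, one possible way to handle that is to cut each line at the first '#'. This is only a sketch: the helper names are invented here, and it assumes a '#' never appears inside the data itself (for example inside a quoted string):

```python
def strip_comment(line):
    """Return the part of line before the first '#', without trailing whitespace."""
    return line.split('#', 1)[0].rstrip()


def meaningful_lines(lines):
    """Yield non-empty lines with full-line and trailing comments removed."""
    for line in lines:
        content = strip_comment(line)
        if content:  # skips blank lines and lines that were only a comment
            yield content
```

It can be used directly on an open file, e.g. `for content in meaningful_lines(f): ...`.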

Python: Extracting lines from a file using another file as key

I have a 'key' file that looks like this (MyKeyFile):
afdasdfa ghjdfghd wrtwertwt asdf (these are in a column, but I never figured out the formatting, sorry)
I call these keys and they are identical to the first word of the lines that I want to extract from a 'source' file. So the source file (MySourceFile) would look something like this (again, bad formatting, but 1st column = the key, following columns = data):
afdasdfa (several tab delimited columns)
.
.
ghjdfghd ( several tab delimited columns)
.
wrtwertwt
.
.
asdf
And the '.' would indicate lines of no interest currently.
I am an absolute novice in Python and this is how far I've come:
with open('MyKeyFile','r') as infile, \
        open('MyOutFile','w') as outfile:
    for line in infile:
        for runner in source:
            # pick up the first word of the line in source
            # if match, print the entire line to MyOutFile
            # here I need help
    outfile.close()
I realize there may be better ways to do this. All feedback is appreciated - along my way of solving it, or along more sophisticated ones.
Thanks
jd
I think that this would be a cleaner way of doing it, assuming that your "key" file is called "key_file.txt" and your main file is called "main_file.txt"
keys = []
my_file = open("key_file.txt", "r")  # r is for reading files, w is for writing to them.
for line in my_file.readlines():
    key = line.strip()  # drop the trailing newline, otherwise the key only matches at the end of a line
    if key:
        keys.append(key)
# now you have a list of strings called keys.
my_file.close()
# take each line from the main text file and check to see if it contains any portion of a given key.
new_file = open("main_file.txt", "r")
for line in new_file.readlines():
    for key in keys:
        if line.find(key) > -1:
            print("I FOUND A LINE THAT CONTAINS THE TEXT OF SOME KEY " + line)
new_file.close()
You can modify the print function or get rid of it to do what you want with the desired line that contains the text of some key. Let me know if this works
As I understand it (correct me in the comments if I am wrong), you have 3 files:
MySourceFile
MyKeyFile
MyOutFile
And you want to:
Read keys from MyKeyFile
Read source from MySourceFile
Iterate over lines in the source
If line's first word is in keys: append that line to MyOutFile
Close MyOutFile
So here is the Code:
with open('MySourceFile', 'r') as sourcefile:
    source = sourcefile.read().splitlines()
with open('MyKeyFile', 'r') as keyfile:
    keys = keyfile.read().split()
with open('MyOutFile', 'w') as outfile:
    for line in source:
        words = line.split()
        if words and words[0] in keys:
            outfile.write(line + "\n")
# the with block closes MyOutFile, so no explicit close() is needed
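For a large source file, a variation on the same steps is to keep the keys in a set and stream the source instead of loading it whole. A minimal sketch under those assumptions; the function name `extract_matching` is hypothetical:

```python
def extract_matching(key_path, source_path, out_path):
    """Copy the lines of source_path whose first word is listed in key_path to out_path."""
    with open(key_path) as keyfile:
        keys = set(keyfile.read().split())  # set lookups stay fast for many keys
    with open(source_path) as source, open(out_path, 'w') as out:
        for line in source:  # reads one line at a time
            fields = line.split(None, 1)  # first word, then the rest of the line
            if fields and fields[0] in keys:
                out.write(line)
```

Calling `extract_matching('MyKeyFile', 'MySourceFile', 'MyOutFile')` would mirror the file names used in the question.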

Edit and save file

I need to edit my file and save it so that I can use it for another program . First I need to put "," in between every word and add a word at the end of every line.
In order to put "," in between every word , I used this command
for line in open('myfile','r+') :
    for word in line.split():
        new = ",".join(map(str,word))
        print new
I'm not too sure how to overwrite the original file or maybe create a new output file for the edited version . I tried something like this
with open('myfile','r+') as f:
    for line in f:
        for word in line.split():
            new = ",".join(map(str,word))
            f.write(new)
The output is not what i wanted (different from the print new) .
Second, I need to add a word at the end of every line. So, i tried this
source = open('myfile','r')
output = open('out','a')
output.write(source.read().replace("\n", "yes\n"))
The code to add the new word works perfectly. But I was thinking there should be an easier way to open a file, do both edits in one go and save it. I'm not too sure how. I've spent a tremendous amount of time trying to figure out how to overwrite the file, and it's about time I asked for help.
Here you go:
source = open('myfile', 'r')
output = open('out','w')
output.write('yes\n'.join(','.join(line.split()) for line in source.read().split('\n')))
One-liner:
open('out', 'w').write('yes\n'.join(','.join(line.split()) for line in open('myfile', 'r').read().split('\n')))
Or more legibly:
source = open('myfile', 'r')
processed_lines = []
for line in source:
    # split() already drops the newline, so add the word and the newline back explicitly
    line = ','.join(line.split()) + 'yes\n'
    processed_lines.append(line)
output = open('out', 'w')
output.write(''.join(processed_lines))
output.close()
EDIT
Apparently I misread everything, lol.
# It looks like you are writing the word yes to all of the lines, then splitting
# each word into letters and listing those words' letters on their own line?
source = open('myfile','r')
output = open('out','w')
for line in source:
    for word in line.split():
        new = ",".join(word)
        print >>output, new
    print >>output, 'y,e,s'
How big is this file?
Maybe you could create a temporary list containing everything from the file you want to edit, where every element represents one line.
Editing a list of strings is pretty simple.
After your changes you can open your file again with
writable = open('configuration', 'w')
and then write each changed line to the file with
writable.write(currentLine + '\n')
Hope that helps - even a little bit. ;)
For the first problem, you could read all the lines in f before overwriting f, assuming f is opened in 'r+' mode. Append all the results into a string, then execute:
f.seek(0) # reset file pointer back to start of file
f.write(new) # new should contain all concatenated lines
f.truncate() # get rid of any extra stuff from the old file
f.close()
For the second problem, the solution is similar: Read the entire file, make your edits, call f.seek(0), write the contents, f.truncate() and f.close().
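Put together, the read, seek, write, truncate sequence might look like the sketch below. It assumes the file fits in memory; the function name `edit_in_place` and the `suffix` parameter are illustrative choices, not something from the question:

```python
def edit_in_place(path, suffix='yes'):
    """Comma-join the words of each line, append suffix, and overwrite path."""
    with open(path, 'r+') as f:
        # Read and transform everything before touching the file contents.
        new_lines = [','.join(line.split()) + suffix + '\n' for line in f]
        f.seek(0)                    # reset file pointer back to the start
        f.write(''.join(new_lines))
        f.truncate()                 # drop leftovers if the new content is shorter
```

The `with` block closes the file, so no explicit `f.close()` is needed in this form.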

How to flush to a file in python?

I have this snippet and a weird thing is going on:
out = open("./out.txt","w+")
for line in open("./int.txt","r").readlines():
    for key in dic.keys():
        if line.count(key) > 0:
            line = re.sub(key,dic[key],line)
    print line
    out.write(line)
The output printed to the Python shell is correct: it contains all the lines after the substitutions from dic. The out file, however, contains only about half of the lines. Why?
I recommend using the with statement to manage the context of your output file handle so that it is closed at the end of the scope (so that when you view it, it is up-to-date). The for loop does this for you with your input file.
with open("./out.txt", "w+") as out:
    for line in open("./int.txt", "r"):
        for key in dic.keys():
            if key in line:
                line = re.sub(key, dic[key], line)
        print line,
        out.write(line)
A few other minor changes:
.readlines() is not required
if key in line: will stop searching for key in line once it has found the first instance, improving efficiency.
print line, will not add another new-line after line.
Also consider whether line = line.replace(key, dic[key]) would suffice, since you're not searching for a regular expression on the line.
Files don't always get written to disk straight away; the buffer needs flushing.
Try out.flush() at the end.
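As a small illustration of explicit flushing, the sketch below writes some lines and pushes them out before returning. The function name `write_and_sync` is made up; `flush()` and `os.fsync()` are the standard calls:

```python
import os


def write_and_sync(path, lines):
    """Write lines to path and push them to disk before returning."""
    with open(path, 'w') as out:
        for line in lines:
            out.write(line)
        out.flush()             # empty Python's own buffer into the OS
        os.fsync(out.fileno())  # ask the OS to commit its cache to disk
```

In most scripts closing the file (or leaving the `with` block) is enough, since closing flushes the buffer; `os.fsync` is only worth adding when the data must survive a crash.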
