Python Script to copy define line in C files

I am having some trouble with a script. My task is to read in a bunch of .h files (header files written in C), and I am able to do that using:
myfiles = glob.glob('*.h')
The struggle I am having is that once these files are read in, I need to take the #define line, paste a copy below it, and change it. Confusing, I know, but an example would be:
#ifndef _THIS_CODE_NEEDS_COPIED_H
#define _THIS_CODE_NEEDS_COPIED_H
#define THIS_CODE_NEEDS_COPIED_VERSION "10" <---- that's what I need to add!
Note: it loses one underscore after define, and the H is changed to VERSION with a string "10" at the end.
Yes, it seems as though it would be simple, but I am not sure how to read a file character by character in Python. Any suggestions?! Mind you, this is a new line copied and then edited below the others. Also, this spans many files; there are hundreds of them! And all of their #define lines read something different (for example, another might be #define _THIS_IS_DIFFERENT_H), so they do not all say the same thing. Please, help! My brain can't take any more!

This sounds like a job for the fileinput and re modules:
import fileinput
import glob
import re
import sys
files = glob.glob('*.h')
pattern = re.compile(r'#define\s+_([_A-Z]+)_H\s+$')
realstdout = sys.stdout
for line in fileinput.input(files, inplace=True, backup='.bak'):
    sys.stdout.write(line)
    m = pattern.match(line)
    if m:
        sys.stdout.write('\n#define %s_VERSION "10"\n' % m.group(1))
        realstdout.write('%s: %s\n' % (fileinput.filename(), m.group(1)))
Notes:
The call to fileinput.input() iterates over files in the list that is passed in as the first argument.
The inplace parameter to fileinput.input() indicates that you are editing the files in place. That is, the files will be replaced by whatever your program writes to standard output.
The regular expression matches the sort of #define that you say you are looking for. Additionally, the parentheses () in it capture a substring of that match.
Inside the loop, we maintain the existing content by writing every single line of every file. Additionally, if we see the magic #define, then we write one extra line.
The business with realstdout provides a log of which files were modified, and what patterns were detected.
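To see exactly what the regular expression captures, here is a quick check using the sample macro name from the question:
import re

pattern = re.compile(r'#define\s+_([_A-Z]+)_H\s+$')
m = pattern.match('#define _THIS_CODE_NEEDS_COPIED_H\n')
print(m.group(1))                              # THIS_CODE_NEEDS_COPIED
print('#define %s_VERSION "10"' % m.group(1))  # #define THIS_CODE_NEEDS_COPIED_VERSION "10"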

You don't need to read in character-by-character to do this in Python. This could be done in fewer lines of code but it would be even uglier and harder to follow than it is now:
with open("input_file.txt", "rb") as f: # you can use glob or os.walk as needed to get input files
with open("output_file.txt", "w") as out: # output file, adjust as needed
for line in f: # iterate through each line
new_line = line # make new_line to be written equal to current line
if line.startswith("#define"):
tokens = line.strip().split(' ')
sp = 1 if tokens[-1].startswith('_') else 0 # skip initial underscore if present by adjusting start position (sp)
def_list = tokens[-1][sp:].split('_')
if def_list[-1] == 'H':
def_list[-1] = 'VERSION'
new_line = '_'.join(def_list) + ' "10"'
out.write("%s\n" % new_line) # write new_line to file
This will change lines as needed and write unaffected lines to the new file as is. If it is required to have an underscore prefix to the ones needing to be changed that can be modified also, currently this script handles it either way by adjusting the start position (sp).


Delete a line in multiple text files with the same line beginning but varying line ending using Python v3.5

I have a folder full of .GPS files, e.g. 1.GPS, 2.GPS, etc...
Within each file is the following five lines:
Trace #1 at position 0.004610
$GNGSA,A,3,02,06,12,19,24,25,,,,,,,2.2,1.0,2.0*21
$GNGSA,A,3,75,86,87,,,,,,,,,,2.2,1.0,2.0*2C
$GNVTG,39.0304,T,39.0304,M,0.029,N,0.054,K,D*32
$GNGGA,233701.00,3731.1972590,S,14544.3073733,E,4,09,1.0,514.675,M,,,0.49,3023*27
...followed by the same data structure, with different values, over the next five lines:
Trace #6 at position 0.249839
$GNGSA,A,3,02,06,12,19,24,25,,,,,,,2.2,1.0,2.0*21
$GNGSA,A,3,75,86,87,,,,,,,,,,2.2,1.0,2.0*2C
$GNVTG,247.2375,T,247.2375,M,0.081,N,0.149,K,D*3D
$GNGGA,233706.00,3731.1971997,S,14544.3075178,E,4,09,1.0,514.689,M,,,0.71,3023*2F
(I realise the values after the $GNGSA lines don't vary in the above example. This is just a bad example... in the real dataset they do vary!)
I need to remove the lines that begin with "$GNGSA" and "$GNVTG" (i.e. I need to delete lines 2, 3, and 4 from each group of five lines within each .GPS file).
This five-line pattern continues for a varying number of times throughout each file (for some files, there might only be two five-line groups, while other files might have hundreds of the five-line groups). Hence, deleting these lines based on the line number will not work (because the line number would be variable).
The problem I am having (as seen in the above examples) is that the text that follows the "$GNGSA" or "$GNVTG" varies.
I'm currently learning Python (I'm using v3.5), so figured this would make for a good project for me to learn a few new tricks...
What I've tried already:
So far, I've managed to create the code to loop through the entire folder:
import os
indir = '/Users/dhunter/GRID01/' # input directory
for i in os.listdir(indir): # for each "i" (iteration) within the indir variable directory...
    if i.endswith('.GPS'): # if the filename of an iteration ends with .GPS, then...
        print(i + ' loaded') # print the filename to CLI, simply for debugging purposes.
        with open(indir + i, 'r') as my_file: # open the iteration file
            file_lines = my_file.readlines() # uses the readlines method to create a list of all lines in the file.
            print(file_lines) # this prints the entire contents of each file to CLI for debugging purposes.
Everything in the above works perfectly.
What I need help with:
How do I detect and delete the lines themselves, and then save the file (to the same location; there is no need to save to a different filename)?
The filenames - which usually end with ".GPS" - sometimes end with ".gps" instead (the only difference being the case). My above code will only work with the uppercase files. Besides completely duplicating the code and changing the endswith argument, how do I make it work with both cases?
In the end, my file needs to look something like this:
Trace #1 at position 0.004610
$GNGGA,233701.00,3731.1972590,S,14544.3073733,E,4,09,1.0,514.675,M,,,0.49,3023*27
Trace #6 at position 0.249839
$GNGGA,233706.00,3731.1971997,S,14544.3075178,E,4,09,1.0,514.689,M,,,0.71,3023*2F
Any suggestions, please? Thanks in advance. :)
You're almost there.
import os
indir = '/Users/dhunter/GRID01/' # input directory
for i in os.listdir(indir): # for each "i" (iteration) within the indir variable directory...
    if i.endswith('.GPS'): # if the filename of an iteration ends with .GPS, then...
        print(i + ' loaded') # print the filename to CLI, simply for debugging purposes.
        with open(indir + i, 'r') as my_file: # open the iteration file
            for line in my_file:
                if not line.startswith('$GNGSA') and not line.startswith('$GNVTG'):
                    print(line, end='') # the line already ends with '\n', so suppress print's extra newline
As per what the others have said, you're on the right track! Where you're going wrong is in the case-sensitive file extension check, and in reading in the entire file contents at once (this isn't per se wrong, but it's probably adding complexity we won't need).
I've commented your code, removing all the debug stuff for simplicity, to illustrate what I mean:
import os
indir = '/path/to/files'
for i in os.listdir(indir):
    if i.endswith('.GPS'): # This CASE SENSITIVELY checks the file extension
        with open(indir + i, 'r') as my_file: # Opens the file
            file_lines = my_file.readlines() # This reads the ENTIRE file at once into a list of lines
So we need to fix the case sensitivity issue, and instead of reading in all the lines, we'll instead read the file line-by-line, check each line to see if we want to discard it or not, and write the lines we're interested in into an output file.
So, incorporating #tdelaney's case-insensitive fix for the file name, we replace the extension check with
if i.lower().endswith('.gps'): # Case-insensitively check the file name
and instead of reading in the entire file at once, we'll instead iterate over the file stream and print each desired line out
with open(indir + i) as in_file, open(indir + i + 'new.gps', 'w') as out_file: # Open the input file for reading and create + open a new output file for writing - thanks #tdelaney once again!
    for line in in_file: # This reads each line one-by-one from the in file
        if not line.startswith('$GNGSA') and not line.startswith('$GNVTG'): # Check the line has what we want (thanks Avinash)
            out_file.write(line) # Write the line to the new output file (it already ends with '\n')
Note that you should make certain that you open the output file OUTSIDE of the 'for line in in_file' loop, or else the file will be overwritten on every iteration which will erase what you've already written to it so far (I suspect this is the issue you've had with the previous answers). Open both files at the same time and you can't go wrong.
Alternatively, you can specify the file access mode when you open the file, as per
with open(indir + i + 'new.gps', 'a') as out_file:
which will open the file in append mode, a specialised form of write mode that preserves the original contents of the file and appends new data to it instead of overwriting existing data.
Ok, based on suggestions by Avinash Raj, tdelaney, and Sampson Oliver, here on Stack Overflow, and another friend who helped privately, here is the solution that is now working:
import os
indir = '/Users/dhunter/GRID01/' # input directory
for i in os.listdir(indir): # for each "i" (iteration) within the indir variable directory...
    if i.lower().endswith('.gps'): # if the filename of an iteration ends with .GPS (in either case), then...
        if not i.lower().endswith('.gpsnew.gps'): # if the filename does not end with .gpsnew.gps, then...
            print(i + ' loaded') # print the filename to CLI.
            with open(indir + i, 'r') as my_file:
                for line in my_file:
                    if not line.startswith('$GNGSA'):
                        if not line.startswith('$GNVTG'):
                            with open(indir + i + 'new.gps', 'a') as outputfile:
                                outputfile.write(line.rstrip('\r\n')) # strip the original line ending first...
                                outputfile.write('\r\n') # ...then add a single line break at the end of each line
(You'll see I had to add another layer of if statement, "if not i.lower().endswith('.gpsnew.gps'):", to stop it from picking up the output files created by previous runs of the script; this line can easily be deleted by anyone who uses these instructions in future.)
We switched the open mode on the third-last line to "a" for append, so that it would save all the right lines to the file, rather than overwriting each time.
We also added in the final line to add a line break at the end of each line.
Thanks everyone for their help, explanations, and suggestions. Hopefully this solution will be useful to someone in future. :)
2. The filenames:
The if accepts any expression returning a truth value, and you can combine expressions with the standard boolean operators: if i.endswith('.GPS') or i.endswith('.gps').
You can also put the ... or ... expression after the if in brackets, to feel more sure, but it's not necessary.
Alternatively, as a less universal solution (but since you wanted to learn a few tricks :)), you can use string manipulation in this case: an object of type string has a lot of methods. '.gps'.upper() gives '.GPS' -- try it and see if you can make use of this! (Even a printed string is a string object, and your variables behave the same.)
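For example, a quick sketch of both variants (the file name is made up):
filename = 'SURVEY01.GPS'  # example name

# combine two checks with a boolean operator
if filename.endswith('.GPS') or filename.endswith('.gps'):
    print('matched with or')

# or normalise the case first, so a single check covers both spellings
if filename.lower().endswith('.gps'):
    print('matched with lower()')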
1. Finding the Lines:
As you can see in the other solution, you need not read out all of your lines; you can check whether you want to keep them 'on the fly'. But I will stick to your approach with readlines. It gives you a list, and lists support indexing and slicing. Try:
anylist[startindex:endindex:stride], for any values; so for example try: newlist = list(range(100))[1::5].
It's always helpful to try out the easy basic operations in interactive mode, or at the beginning of your script. Here range(100) is just a sample sequence of integers. You can also see how the Python for syntax works differently than in other languages: you can iterate over any sequence, and if you just need integers, you create them with range().
So this will work the same with any other list -- e.g. the one you get from readlines().
This selects a slice from the list, beginning with the second element, ending at the end (since the end index is omitted), and taking every 5th element. Once you have this sub-list, you can simply remove it from the original. So, for the example with the range:
a = list(range(100))
del a[1::5]
print(a)
So you see that the appropriate items have been removed. Now do the same with your file_lines, and then proceed to remove the other lines you want to remove.
Then, in a new with block, open the file for writing and do writelines(file_lines), so the remaining lines are written back to the file.
Of course you can also take the approach of looking at the content of each line with a for loop over your list and startswith(). Or you can combine the approaches and check whether deleting lines by number leaves the expected line starts, so you can print an error if something is unexpected...
3. Saving the file
You can close your file as soon as you have the lines saved from readlines(). In fact this is done automatically at the end of the with block. Then just open it in 'w' mode instead of 'r' and do yourfile.writelines(yourlist). You don't need to save explicitly; it's saved on closing.
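Putting the three steps together, a minimal sketch (the file name is an example, and it assumes the strict five-line pattern holds for every group in the file):
with open('1.GPS') as my_file:
    file_lines = my_file.readlines()

del file_lines[1::5]  # drop the first $GNGSA line of every five-line group
del file_lines[1::4]  # groups are now four lines each; drop the second $GNGSA line
del file_lines[1::3]  # groups are now three lines each; drop the $GNVTG line

with open('1.GPS', 'w') as my_file:  # reopen in 'w' mode and write the remaining lines back
    my_file.writelines(file_lines)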

Python - Calling lines from a text file to compile a pattern search of a second file

Forgive me if this is asked and answered. If so, chalk it up to my being new to programming and not knowing enough to search properly.
I have a need to read in a file containing a series of several hundred phrases, such as names or email addresses, one per line, to be used as part of a compiled search term - pattern = re.search(name). The 'pattern' variable will be used to search another file of over 5 million lines to identify and extract select fields from relevant lines.
The text of the name file being read in for variable would be in the format of:
John\n
Bill\n
Harry#helpme.com\n
Sally\n
So far I have the code below, which does not error out, but also does not finish processing and close out. If I pass the names manually using slightly different code with a sys.argv[1], everything works fine. The area I am having problems with starts at "lines = open....".
import sys
import re
import csv
import os
searchdata = open("reallybigfile", "r")
Certfile = csv.writer(open('Certfile.csv', 'ab'), delimiter=',')
lines = open("Filewithnames.txt", 'r')
while True:
    for line in lines:
        line.rstrip('\n')
    lines.seek(0)
    for nam in lines:
        pat = re.compile(nam)
    for f in searchdata.readlines():
        if pat.search(f):
            fields = f.strip().split(',')
            Certfile.writerow([nam, fields[3], fields[4]])
lines.close()
The code at the bottom (starting "for f in searchdata.readlines():") locates, extracts and writes the fields fine. I have been unable to find a way to read in the Filewithnames.txt file and have it use each line. It either hangs, as with this code, or it reads all lines of the file to the last line and returns data only for the last line, e.g. 'Sally'.
Thanks in advance.
while True is an infinite loop, and there is no way to break out of it that I can see. That will definitely cause the program to continue to run forever and not throw an error.
Remove the while True line and de-indent that loop's code, and see what happens.
EDIT:
I have resolved a few issues, as commented, but I will leave you to figure out the precise regex you need to accomplish your goal.
import sys
import re
import csv
import os
searchdata = open("c:\\dev\\in\\1.txt", "r")
# Certfile = csv.writer(open('c:\\dev\\Certfile.csv', 'ab'), delimiter=',') #moved to later to ensure the file will be closed
lines = open("c:\\dev\\in\\2.txt", 'r')
pats = [] # A list of patterns
# Add additional conditioning/escaping of input here.
for nam in lines:
    pats.append(re.compile(nam.rstrip())) # strip the trailing newline before compiling
with open('c:\\dev\\Certfile.csv', 'ab') as outfile: #This line opens the file
    Certfile = csv.writer(outfile, delimiter=',') #This line interprets the output into CSV
    for f in searchdata.readlines():
        for pat in pats: #A loop for processing all of the patterns
            if pat.search(f) is not None:
                fields = f.strip().split(',')
                Certfile.writerow([pat.pattern, fields[3], fields[4]])
lines.close()
searchdata.close()
First of all, make sure to close all the files, including your output file.
As stated before, the while True loop was causing you to run infinitely.
You need a regex or set of regexes to cover all of your possible "names." The code is simpler with a set of regexes, so that is what I have done here, though it may not be the most efficient. This includes a loop for processing all of the patterns.
I believe you need additional parsing of the input file to give you clean regular expressions. I have left some space for you to do that.
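If the names are meant to be matched as literal text rather than as regular expressions (my assumption, given that they are names and email addresses), re.escape is one way to do that conditioning:
import re

def compile_literal(name):
    # strip surrounding whitespace (including the trailing newline), then escape
    # regex metacharacters so 'Harry#helpme.com' matches only that exact text
    return re.compile(re.escape(name.strip()))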
Hope that helps!

how to add a new line in a text file using python without \n

I have a file that holds a list of files, but it adds \n at the end. How can I have Python write the info I need on a new line without \n getting in the way, so that my info will be called X.acc and not x.acc\n? Here is my code that writes the file:
def add(x):
    nl = "\n"
    acc = ".acc"
    xy = x + acc
    exyz = xy
    xyz = exyz
    xxx = str(xyz)
    tf = open('accounts.dat',"a+")
    tf.writelines(nl)
    tf.writelines(xxx)
    tf.close
Here is the code that calls upon the file:
import sys
tf = open('accounts.dat','r')
names = tf.readlines()
u = choicebox(msg="pick something",title = "Choose an account",choices=(names))
counter_file = open(u, 'r+')
content_lines = []
for line in counter_file:
    if line == "credits =":
        creds = line
    else:
        False
for line in counter_file:
    if 'credits =' in line:
        line_components = line.split('=')
        int_value = int(line_components[1]) + 1
        line_components[1] = str(int_value)
        updated_line = "=".join(line_components)
        content_lines.append(updated_line)
    else:
        msgbox(msg=(creds))
        content_lines.append(line)
counter_file.seek(0)
counter_file.truncate()
counter_file.writelines(content_lines)
counter_file.close()
Thank you for your help, and sorry if this is a trivial question; I'm still new to Python :)
Your question doesn't actually make sense, because of what a "line" actually is and what that '\n' character means.
Files don't have an intrinsic concept of lines. A file is just a sequence of bytes. '\n' is the line separator (as Python represents it with universal newlines). If you want your data to show up on different "lines", you must put a line separator between them. That's all that the '\n' character is. If you open up the file in a text editor after you write it, most editors won't explicitly show the newline character by default, because it's already represented by the separation of the lines.
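A tiny demonstration of the point (the file name is just an example):
with open('demo.txt', 'w') as f:
    f.write('X.acc')  # no separator yet: the next write would continue this same line
    f.write('\n')     # the newline character is what starts a new line
    f.write('Y.acc\n')

with open('demo.txt') as f:
    print(f.read().splitlines())  # ['X.acc', 'Y.acc']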
To break down what your code is doing, let's look at the add method, and fix some things along the way.
The first thing add does is name a variable called nl and assign it the newline character. From this, I can surmise that nl stands for "newline", but it would be much better if that was actually the variable name.
Next, we name a variable called acc and assign it the '.acc' suffix, presumably to be used as a file extension or something.
Next, we make a variable called xy and assign it to x + acc. xy is now a string, though I have no idea of what it contains from the variable name. With some knowledge of what x is supposed to be or what these lines represent, perhaps I could rename xy to something more meaningful.
The next three lines create three new variables called exyz, xyz, and xxx, and point them all to the same string that xy references. There is no reason for any of these lines whatsoever, since their values aren't really used in a meaningful way.
Now, we open a file. Fine. Maybe tf stands for "the file"? "text file"? Again, renaming would make the code much more friendly.
Now, we call tf.writelines(nl). This writes the newline character ('\n') to the file. Since the writelines method is intended for writing a whole list of strings, not just a single character, it'll be cleaner if we change this call to tf.write(nl). I'd also change this to write the newline at the end, rather than the beginning, so the first time you write to the file it doesn't insert an empty line at the front.
Next, we call writelines again, with our data variable (xxx, but hopefully this has been renamed!). What this actually does is break the iterable xxx (a string) into its component characters, and then write each of those to the file. Better replace this with tf.write(xxx) as well.
Finally, we have tf.close, which is a reference to the close function of the file object. It's a no-op, because what you presumably meant was to close the file, by calling the method: tf.close(). We could also wrap the file up as a context manager, to make its use a little cleaner. Also, most of the variables aren't necessary: we can use string formatting to do most of the work in one step. All in all, your method could look like this at the end of the day:
def add(x):
    with open('accounts.dat', "a+") as output_file:
        output_file.write('{0}.acc\n'.format(x))
So you can see, the reason the '\n' appears at the end of every line is because you are writing it between each line. Furthermore, this is exactly what you have to do if you want the lines to appear as "lines" in a text editor. Without the newline character, everything would appear all smashed together (take out the '\n' in my add method above and see for yourself!).
The problem you described in the comment is happening because names is a direct reading of the file. Looking at the readlines documentation, it returns a list of the lines in the file, breaking at each newline. So to clean those names up, you want line 4 of the code you posted to call str.strip on the individual lines. You can do that like this:
names = tf.readlines()
for i in range(len(names)):
    names[i] = names[i].strip() # remove all the outside whitespace, including \n
However, it's much cleaner, quicker, and generally nicer to take advantage of Python's list comprehensions, and the fact that file objects are already iterable line-by-line. So the expression below is equivalent to the previous one, but it looks far nicer:
names = [line.strip() for line in tf]
Just change add:
def add(x):
    nl = "\n"
    acc = ".acc"
    xy = x + acc
    exyz = xy
    xyz = exyz
    xxx = str(xyz)
    tf = open('accounts.dat',"a+")
    tf.writelines(xxx)
    tf.writelines(nl) # Write the newline AFTER instead of before the output
    tf.close() # close is a function so needs to be called by having () at the end.
See the comments for what has changed.
Why don't you just write a function that adds "\n" at the end of the line? Then there's no need to append "\n" every time.
I did it this way:
import os

log_path = r"c:\python27\Logs\log.txt"
if not os.path.exists(r"c:\python27\Logs"):
    os.mkdir(r"c:\python27\Logs")

def write_me_log(text):
    global log_path
    with open(log_path, "a+") as log:
        log.write(text + "\n")

write_me_log("Hello this is the first log text with new line")
file = open("accountfile.txt", "a")
file.write(username)
file.write(" ")
file.write(password)
file.write(" ")
file.write(age)
file.write("\n") # go down a line here so "hello world" is written on the next line
file.write("hello world")
file.close()

Replace text in the first line in a huge txt tab delimited file

I have a huge text file (19GB in size); it is a genetic data file with variables and observations.
The first line contains the variable names, and they are structured as follows:
id1.var1 id1.var2 id1.var3 id2.var1 id2.var2 id2.var3
I need to swap id1, id2, etc. with corresponding values that are in another text file (this file has about 7k rows); the ids are not in any particular order, and it's structured as follows:
oldId newIds
id1 rs004
id2 rs135
I have done some Google searching and could not really find a language that would allow me to do the following:
read the first line
replace the ids with the new ids
remove the first line from the original file and replace it with the new one
Is this a good approach or is there a better one?
Which is the best language to accomplish this?
We have people with experience in Python, VBScript and Perl.
The whole "replace" thing is possible in almost any language (I'm sure about Python and Perl), as long as the length of the replacement line is the same as the original, or if it can be made the same by padding with whitespace (otherwise, you'll have to rewrite the whole file).
Open the file for reading and writing (r+ mode, which preserves the existing contents), read the first line, prepare the new line, seek to position 0 in the file, write the new line, close the file.
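In Python, a minimal sketch of that approach (the file name and the inline mapping are examples; note that with the sample data the new ids are longer than the old ones, in which case the assert fires and you have to rewrite the whole file instead, as the other answers do):
idmap = {'id1': 'rs004', 'id2': 'rs135'}  # in reality, loaded from the 7k-row mapping file

with open('datafile.txt', 'r+') as f:
    old = f.readline()                # header line, including its trailing '\n'
    names = old.rstrip('\n').split()  # e.g. 'id1.var1' -> 'id1', 'var1'
    new = ' '.join('%s.%s' % (idmap.get(i, i), v)
                   for i, v in (n.split('.', 1) for n in names))
    padded = new.ljust(len(old) - 1) + '\n'  # pad with spaces so the length stays identical
    assert len(padded) == len(old), "new header must not be longer than the old one"
    f.seek(0)
    f.write(padded)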
I suggest you use the Tie::File module, which maps the lines in a text file to a Perl array and will make the rewriting of the lines after the header a simple job.
This program demonstrates. It first reads all of the old/new IDs into a hash, and then maps the data file using Tie::File. The first line of the file (in $file[0]) is modified using a substitution, and then the array is untied to rewrite and close the file.
You will need to change your file names from the ones I have used. Also beware that I have assumed that the IDs are always "word" characters (alphanumeric plus underscore) followed by a dot, and have no spaces. Of course you will want to back up your file before you modify it, and you should test the program on a smaller file before you update the real thing.
use strict;
use warnings;

use Tie::File;

my %ids;

open my $fh, '<', 'newids.txt' or die $!;
while (<$fh>) {
    my ($old, $new) = split;
    $ids{$old} = $new;
}

tie my @file, 'Tie::File', 'datafile.txt' or die $!;
$file[0] =~ s<(\w+)(?=\.)><$ids{$1} // $1>eg;
untie @file;
This should be pretty easy. I would use Python as I am a Python fan. Outline:
Read the mapping file, and save the mapping (in Python, use a dictionary).
Read the data file a line at a time, remap variable names, and output the edited line.
You really can't edit a file in-place... hmm, I guess you could if every new variable name was always exactly the same length as the old name. But for ease of programming, and safety while running, it would be best to always write a new output file and then delete the original. This means you will need at least 20 GB of free disk space before running this, but that shouldn't be a problem.
Here is a Python program that shows how to do it. I used your example data to make test files and this seems to work.
#!/usr/bin/python

import re
import sys

try:
    fname_idmap, fname_in, fname_out = sys.argv[1:]
except ValueError:
    print("Usage: remap_ids <id_map_file> <input_file> <output_file>")
    sys.exit(1)

# pattern to match an ID, only as a complete word (do not match inside another id):
# match start of line or whitespace, then non-period, non-space characters
# up to a literal period
pat_id = re.compile(r"(^|\s)([^.\s]+)\.")

idmap = {}

def remap_id(m):
    before_word = m.group(1)
    word = m.group(2)
    if word in idmap:
        return before_word + idmap[word] + "."
    else:
        return m.group(0)  # return full matched string unchanged

def replace_ids(line, idmap):
    return re.sub(pat_id, remap_id, line)

with open(fname_idmap, "r") as f:
    next(f)  # discard first line with column header: "oldId newIds"
    for line in f:
        key, value = line.split()
        idmap[key] = value

with open(fname_in, "r") as f_in, open(fname_out, "w") as f_out:
    for line in f_in:
        line = replace_ids(line, idmap)
        f_out.write(line)

Remove lines from files

I have a lot of files which have comments beginning with !, and I need to remove all of those, then replace the #Mhz with # Mhz on the next line, keeping the file name the same. What is an efficient way of doing this? I can read each file, write to a new file in a different directory, and manually delete the originals later, I guess, but is there a better way?
Here's a stupidly simple way:
for line in in_file:
    if line[0] == '!':
        continue
    if line.startswith('#Mhz'):
        line = '# MHz' + line[4:] # Assuming it's megahertz, it's spelled MHz.
    out_file.write(line)
You can read the whole input file and split it into lines, then open the same file for writing, if you want to do it in place, as sketched below.
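A minimal sketch of that read-everything-then-rewrite variant (the file name is an example):
with open('somefile.ext') as f:
    lines = f.readlines() # read all lines before truncating the file

with open('somefile.ext', 'w') as f:
    for line in lines:
        if line.startswith('!'):
            continue                  # drop comment lines
        if line.startswith('#Mhz'):
            line = '# Mhz' + line[4:] # insert the missing space
        f.write(line)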
The fileinput module is a good choice when you want to filter one (or more) files in-place:
import fileinput
import sys

files_ = fileinput.input(['somefile.ext', 'anotherfile'], inplace=1)
for line in files_:
    if line.startswith('#Mhz'):
        sys.stdout.write('# Mhz' + line[4:])
    elif line[0] != '!':
        sys.stdout.write(line)
files_.close() # cancel stdin & stdout redirection
The first argument to fileinput.input() can also be a single filename instead of a sequence of them; if left out, the names are automatically taken from successive sys.argv[1:] arguments, or sys.stdin if there aren't any -- allowing it to handle multiple files seamlessly as written. It can also automatically make backup files and has numerous other useful features, all of which are described in detail in the documentation.
In Python 3.2+ it can also be used in conjunction with a Python with statement, which would allow the code above to be simplified slightly.
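For example, a sketch of the same filter in that form (assuming Python 3.2+):
import fileinput
import sys

with fileinput.input(['somefile.ext', 'anotherfile'], inplace=True) as files_:
    for line in files_:
        if line.startswith('#Mhz'):
            sys.stdout.write('# Mhz' + line[4:])
        elif line[0] != '!':
            sys.stdout.write(line)
# no explicit close() needed: the with statement handles it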
You didn't say anything in the question about why/if it needed to be in python.
If you're only doing this to one or a few files, one very simple way to do this would be to open a file in vim and type
:%s/^!.*\n#Mhz/# Mhz/
and possibly
:%s/^!.*\n//
to get lines to remove that aren't followed by #Mhz, then save the file and quit with
:wq
With mode 'r+' there is no need to open in 'r', read, close, reopen in 'w', write, and close again; it can all be done within a single opening of the same file.
From this sentence:
"then replace the #Mhz with # Mhz on the next line"
I understood that '#Mhz' must be replaced with '# Mhz' only if '#Mhz' is present in a line that follows a line beginning with '!'.
If so, the following code does the job for files that are not too big (so that they can easily be loaded into RAM):
import re

regx = re.compile('^!.*\r?\n((?!!)(.*?)(#Mhz)(.*\r?\n))?', re.MULTILINE)

def repl(mat):
    # if a '#Mhz' line follows, keep it with the space inserted;
    # otherwise just drop the '!' comment line
    return (mat.group(2) + '# Mhz' + mat.group(4) if mat.group(2) is not None
            else '')

with open(filename, 'r+') as f:
    content = f.read()
    f.seek(0, 0)
    f.write(regx.sub(repl, content))
    f.truncate()
For enormous files, another algorithm must be employed.
