Search and Replace a string in text file - python

I would like to search a text file for a line that contains the string "SECTION=C-BEAM" and replace the first 13 characters of the next line with a value derived from a pattern in the matching line. In the example below, 1.558 is read from the first line and 1.558/2 = 0.779 is written to the second line. The number to read from the first line is always between the strings "H_" and "H_0".
Example Input:
SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;
0., 1, 2, 3, 4, 5
Output as follows:
SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;
0.779, 1, 2, 3, 4, 5
This is what I have tried so far.
file_in = open(test_input, 'rb')
file_out = open(test_output, 'wb')
lines = file_in.readlines()
print ("Total no. of lines to process: ", len(lines))
for i in range(len(lines)):
    if lines.startswith("SECTION") and "SECTION=C-BEAM" in lines:
        start_index = lines.find("H_")+1
        end_index = lines.find("H_0")
        x = lines[start_index:end_index]/2.0
        print (x)
        lines[i+1]= lines[i+1].replace(" 0.",x)+lines[i+1][13:]
    file_out.write(lines[i])
file_in.close()
file_out.close()

Since the content resides in a file, I put the target lines into a string together with some unrelated lines to test against.
I tested the code below and it works. I assume there is only one such occurrence in the file. If there are multiple occurrences, the same approach can be applied in a loop.
import re
st = '''These are some different lines - you need not worry about.
SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;
0., 1, 2, 3, 4, 5
These are more different lines - you need not worry about.
0.,2 numbers'''
num = str(float(re.findall('.*H_(.+)H_0.*SECTION=C-BEAM.*\n.*',st)[0].replace("_","."))/2)
print (re.sub(r'(.*SECTION=C-BEAM.*\n)(0\.)(,.*)',r'\g<1>'+num+r'\g<3>',st))
# re.findall('.*H_(.+)H_0.*SECTION=C-BEAM.*\n.*',st) --> returns ['1_558']; take the first element with [0].
# Then replace "_" with ".", convert to a float, divide by 2 and convert the result back to a string.
# .* means 0 or more non-newline characters, .+ means 1 or more non-newline characters, and "\n" stands for a newline.
# (.+) is a capturing group: only the characters matched inside the parentheses are returned.
# Second line of the code: replace "0." at the start of the line that follows the matching line.
# The pattern is divided into 3 groups: 1) everything before "0.", 2) "0." itself, 3) everything after "0.".
# "0." is replaced by num, so the result is group 1 + num + group 3.
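Output as shown below:
These are some different lines - you need not worry about.
SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;
0.779, 1, 2, 3, 4, 5
These are more different lines - you need not worry about.
0.,2 numbers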
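For files with several C-BEAM sections, the line-by-line loop mentioned in this answer could look like the sketch below. The helper name halve_into_next_line and the HEIGHT pattern are made up for illustration, and the sketch assumes the value to replace is the first comma-separated field of the following line rather than a fixed 13-character column.

```python
import re

# Hypothetical names for illustration; none of them come from the question.
# Matches the number between "H_" and "H_0", e.g. "1_558" in "DIORH_1_558H_0_76W".
HEIGHT = re.compile(r"H_(\d+(?:_\d+)?)H_0")

def halve_into_next_line(lines):
    """Return a copy of lines in which the line after every C-BEAM
    header starts with half of the height read from that header."""
    out = list(lines)
    for i in range(len(lines) - 1):
        line = lines[i]
        if line.startswith("SECTION") and "SECTION=C-BEAM" in line:
            match = HEIGHT.search(line)
            if match is None:
                continue  # header without a recognisable height: leave as is
            half = float(match.group(1).replace("_", ".")) / 2
            # Assumption: replace everything before the first comma.
            head, sep, tail = out[i + 1].partition(",")
            out[i + 1] = f"{half:g}{sep}{tail}"
    return out
```

To use it on a file, read the lines with readlines() in text mode, pass them through the function, and write the result with writelines(). Note that the :g format keeps at most six significant digits, which may or may not be what the target file format expects.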

Basic Python string methods and a regex should do it:
import re
my_text = """SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;\n0., 1, 2, 3, 4, 5"""
# Find the index of the first occurrence of the search string in my_text
index = my_text.find('SECTION=C-BEAM')
# Take everything before that occurrence
# and count the number of lines (\n is the newline character)
nb_line = my_text[:index].count('\n')
# Now find the index where line n + 1 begins.
# re.finditer yields the position of every newline;
# pick entry nb_line (Python indexing starts at 0), which is the newline that ends the matching line
index = [m.start() for m in re.finditer(r"\n",my_text)][nb_line]
# Then rebuild the wanted string from:
# the beginning of the string up to the start of line n + 1,
# the text you want (0.779),
# the text after the substring you removed (you need to know the length of the string to remove, here 2)
string_to_remove = "0."
my_text = my_text[:index+1] + '0.779' + my_text[index + 1 + len(string_to_remove):]
print(my_text)

Related

How to increment last number in file python

I have below text in the file
firefox-x 46.0:
google 5.1.0.1:
- request
- branch
I need to extract the last number on the first line, increase it by one, and write the result back to the same file. The updated content should be
firefox-x 46.1:
google 5.1.0.1:
- request
- branch
I am able to extract the last integer, but how do I update it and write it to the same file?
import re
with open('branch.txt','r') as fh:
    first_line = fh.readline()
    #print (first_line)
    last_number = re.findall(".*(?:\D|^)(\d+)", first_line)
    for i in last_number:
        to_int = int(i)
        #print (to_int)
        next_num = (to_int +1)
        print (next_num)
You may use
import re
rx = r'\d+(?=:$)'
s="""firefox-x 46.0:
google 5.1.0.1:
- request
- branch"""
print(re.sub(rx, lambda x: str(int(x.group(0)) + 1), s, count=1, flags=re.M))
Output:
firefox-x 46.1:
google 5.1.0.1:
- request
- branch
The \d+(?=:$) regex with the re.M flag matches 1+ digits that are followed by : at the end of a line, and the 1 passed as the count argument to re.sub limits it to a single replacement.
The lambda x: str(int(x.group(0)) + 1) part takes the match, casts the matched digits to an int, adds 1 to the value and casts it back to a string.
To read and write to another file:
import re
rx = r'\d+(?=:$)'
with open('branch.txt', 'r') as fr:
    data = fr.read()
with open('branch.out.txt', 'w') as fw:
    fw.write(re.sub(rx, lambda x: str(int(x.group(0)) + 1), data, count=1, flags=re.M))
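The question asks for the change to land in the same file rather than a second one. One possible sketch, assuming the file is small enough to hold in memory, reads the whole text, substitutes, and writes it back to the same path. The function name bump_first_version is hypothetical.

```python
import re
from pathlib import Path

def bump_first_version(path):
    """Increment the first number that directly precedes a line-ending ':'
    and overwrite the file with the result. Returns the new text."""
    p = Path(path)
    text = p.read_text()
    new_text = re.sub(
        r"\d+(?=:$)",                          # digits followed by ':' at end of line
        lambda m: str(int(m.group(0)) + 1),    # add 1 to the matched number
        text,
        count=1,                               # only the first match is changed
        flags=re.M,                            # let $ match at every line end
    )
    p.write_text(new_text)
    return new_text
```

Because the file is overwritten in place, keeping a backup copy first is a sensible precaution while testing.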

substring extract in a file using Python Regex

A file has n lines arranged in logically defined blocks. I'm parsing each line and capturing the required data based on some matching conditions.
I read through each line and find the blocks with this code:
#python
for lines in file.readlines():
    if re.match(r'block.+',lines)!= None:
        block_name = re.match(r'block.+', lines).group(0)
        # string matching code to be added here
Input File:
line1 select KT_TT=$TMTL/$SYSNAME.P1
line2 . $dhe/ISFUNC sprfl tm/tm1032 int 231
line3 select IT_TT=$TMTL/$SYSNAME.P2
line4 . $DHE/ISFUNC ptoic ca/ca256 tli 551
.....
.....
line89 CALLING IK02=$TMTL/$SYSNAME.P2
line90 CALLING KK01=$TMTL/$SYSNAME.P1
Matching conditions & expected output of each step:
1. While reading the lines, match the word "/ISFUNC", fetch the characters from the end of the line back to the last "/" and save them to a variable. Expected output: tm1032 int 231, ca256 tli 551 (matching strings found in line 2, line 4, etc.)
2. Once ISFUNC is found, read the immediately preceding line and fetch the data from that line, starting from the last character back to the last "/", and save it to a variable. Expected output: $SYSNAME.P1 and $SYSNAME.P2 (line 1, line 3, etc.)
3. Continue reading down and look for lines starting with "CALLING" whose last string after "/" matches the output of step 2 ($SYSNAME.P1 and $SYSNAME.P2). Capture just the data after the word CALLING and save it. Expected output: KK01 (line 90) and IK02 (line 89)
The final output should look like this:
FUNC            SYS          CALL
tm1032 int 231  $SYSNAME.P1  KK01
ca256 tli 551   $SYSNAME.P2  IK02
If all you need is the text after the last slash, you do not need a regex at all.
Simply call .split("/") on each line and take the last part:
sample = "$dhe/ISFUNC sprfl tm/tm1032 int 231"
sample.split("/")
This results in
['$dhe', 'ISFUNC sprfl tm', 'tm1032 int 231']
and you can then access the last element of the list with index -1 to get the value.
PS: Use split once you have found the corresponding line.
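As a small illustration of the indexing step, the snippet below takes the last piece directly. The rsplit variant is an addition not mentioned in the answer; it splits only once, from the right, and gives the same last element.

```python
sample = "$dhe/ISFUNC sprfl tm/tm1032 int 231"

# Split on every slash and take the last element
last_part = sample.split("/")[-1]

# Alternative: split only once, starting from the right
same_part = sample.rsplit("/", 1)[-1]

print(last_part)
```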
While reading the lines, match the word "/ISFUNC" and fetch the characters from the last till it matches a "/" and save it to a variable. Expected o/p->tm1032 int 231 (matching string found in line2)
char_list = re.findall(r'/ISFUNC.*/(.*)$', line)
if char_list:
    chars = char_list[0]
Once ISFUNC is found, read the immediate previous line and fetch the data from that line, start from the last character till it matches a "/" and save it to a variable. Expected o/p->$SYSNAME.P1 (line 1)
The ideal approach here is to either (a) read the lines into a list once and iterate over the list indices rather than the lines themselves (i.e. lines = file.readlines(), then for i in range(len(lines)): ... lines[i - 1]), or (b) keep a copy of the previous line (say, put last_line = line at the end of your for loop). Then use that previous line in this expression:
data_list = re.findall(r'/([^/]*)$', last_line)
if data_list:
    data = data_list[0]
Continue reading the lines down and look for the line starting with "CALLING" and the last string after "/" should match with o/p of step 2($SYSNAME.P1). Just capture the data after CALLING word and save it. expected o/p -> KK01 (line 90)
Assuming, from your example, that you mean just the data immediately after CALLING (i.e. up to the equals sign):
calling_list = re.findall(r'CALLING(.*)=.*/' + re.escape(data) + '$', line)
if calling_list:
    calling = calling_list[0]
You can move the parentheses around to change exactly what you capture from that line. re.findall() returns a list of matches, containing only the parts that were matched inside the parentheses.
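Put together, the three steps might look like the following sketch. It assumes the "line1", "line2" labels in the sample are annotations rather than file content, and that every CALLING line has the form CALLING NAME=.../SYS. The function name build_table is hypothetical.

```python
import re

def build_table(lines):
    """Return a list of (func, sys, call) tuples following the three steps."""
    pairs = []   # (func, sys) in the order the ISFUNC lines appear
    calls = {}   # sys -> name found after CALLING
    prev = ""    # the previous line, needed for step 2
    for raw in lines:
        line = raw.rstrip("\n")
        if "/ISFUNC" in line:
            func = line.rsplit("/", 1)[-1]      # step 1: text after the last "/"
            sysname = prev.rsplit("/", 1)[-1]   # step 2: same, on the previous line
            pairs.append((func, sysname))
        else:
            # step 3: capture the name after CALLING and the text after the last "/"
            m = re.search(r"CALLING\s+(\w+)=.*/([^/]*)$", line)
            if m:
                calls[m.group(2)] = m.group(1)
        prev = line
    # Join on the sys value; an empty string marks a missing CALLING line
    return [(func, sysname, calls.get(sysname, "")) for func, sysname in pairs]
```

The CALLING lines are collected in a dictionary first because they appear after the ISFUNC lines and in a different order, so the join has to wait until the whole file has been read.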

How to use re to search for negative number?

I have a script that grabs an integer from the user and saves it as results[1].
It then opens myfile.config and searches for an integer after the string locationID=.
So the string with the number should look like:
locationID="34"
or whatever number happens to be there.
The script replaces the current number with the number from results.
So far, my script handles the case where there is a number after locationID= and the case where there is none.
How can I make it check for and replace a negative number?
Here is the original:
def replaceid():
    source = "myfile.config"
    newtext = str(results[1])
    with fileinput.FileInput(source, inplace=True, backup='.bak') as file:
        for line in file:
            pattern = r'(?<=locationId=")\d+'  # find 1 or more digits that come
                                               # after the string locationid
            if re.search(pattern, line):
                sys.stdout.write(re.sub(pattern, newtext, line))  # adds number after locationid
                fileinput.close()
            else:
                sys.stdout.write(re.sub(r'(locationId=)"', r'\1"' + newtext, line))  # use sys.stdout.write instead of "print"
                # using re module to format
                # adds a location id number after locationid even if there was no number originally there
                fileinput.close()
This is not working for me:
def replaceid():
    source = "myfile.config"
    newtext = str(results[1])
    with fileinput.FileInput(source, inplace=True, backup='.bak') as file:
        for line in file:
            pattern = r'(?<=locationId=")\d+'  # find 1 or more digits that come
                                               # after the string locationid
            patternneg = r'(?<=locationId=-")\d+'
            if re.search(pattern, line):
                sys.stdout.write(re.sub(pattern, newtext, line))  # adds number after locationid
                fileinput.close()
            elif re.search(patternneg, line):  # if Location ID has a negative number
                sys.stdout.write(re.sub(patternneg, newtext, line))  # adds number after locationid
                fileinput.close()
            else:
                sys.stdout.write(re.sub(r'(locationId=)"', r'\1"' + newtext, line))  # use sys.stdout.write instead of "print"
                # using re module to format
                # adds a location id number after locationid even if there was no number originally there
                fileinput.close()
Since you are not using the matched digits (whether positive or negative), you could change the pattern to match anything between the double quotes with [^"]+, even if it is not a number:
pattern = r'(?<=locationId=")[^"]+'
To cover the case of an empty "", change the quantifier to *, i.e. [^"]*. You might also want to allow whitespace before and after the equals sign. A lookbehind must have a fixed width in Python's re module, so capture the prefix in a group instead, something like r'(locationId\s*=\s*")[^"]*', and put the group back in the replacement with r'\g<1>' + newtext.
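A minimal sketch of that idea on a single line, assuming the value always sits between double quotes. The function name set_location_id is made up for illustration; a callable replacement is used so the new digits cannot be misread as part of a group reference.

```python
import re

def set_location_id(line, new_id):
    """Replace whatever is between the quotes after locationId= with new_id.
    Works for positive, negative and empty values; other lines are returned unchanged."""
    return re.sub(
        r'(locationId\s*=\s*")[^"]*',            # group 1: prefix up to the opening quote
        lambda m: m.group(1) + str(new_id),      # keep the prefix, swap the value
        line,
    )
```

Inside the fileinput loop from the question, each line would then be written with sys.stdout.write(set_location_id(line, newtext)), with no need for separate branches.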

Can't figure out where function is going wrong

I have a text file IDlistfix, which contains a list of YouTube video IDs. I'm trying to make a new text file, newlist.txt, which contains the IDs from the first file with apostrophes around them and a comma between the IDs. This is what I've written to accomplish this:
n = open('IDlistfix','r+')
j = open('newlist.txt','w')
line = n.readline()
def listify(rd):
    return '\'' + rd + '\','
for line in n:
    j.write(listify(line))
This gives me an output of ','rUfg2SLliTQ where I'd expect the output to be 'rUfg2SLliTQ',. Where is my function going wrong?
You just have to strip it of newlines:
j.write(listify(line.strip())) # Notice the call of the .strip() method on the String
Try to remove trailing whitespace and return a formatted string:
n = open('IDlistfix','r+')
j = open('newlist.txt','w')
line = n.readline()
def listify(rd):
    # remove trailing whitespace
    rd = rd.rstrip()
    # return a formatted string
    # this is generally preferable to '+'
    return "'{0}',".format(rd)
for line in n:
    j.write(listify(line))
The problem must be in
return '\'' + rd + '\','
because rd ends with '\n'.
Remove the '\n' from rd and it should be fine.
It is a problem with the line break.
Change:
for line in n:
    j.write(listify(line.replace('\n','')))
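The answers above all come down to stripping the newline before quoting. A compact sketch of that, with a hypothetical generator named quote_ids, is shown below. It also skips blank lines, which is an added assumption about what the output should contain.

```python
def quote_ids(lines):
    """Yield each non-empty ID wrapped in single quotes and followed by a comma."""
    for line in lines:
        video_id = line.strip()   # drops the trailing '\n' and stray spaces
        if video_id:              # assumption: blank lines should be ignored
            yield "'{0}',".format(video_id)

# Possible usage with the file names from the question:
# with open('IDlistfix') as n, open('newlist.txt', 'w') as j:
#     j.writelines(quote_ids(n))
```

Note that the question's code calls n.readline() once before the loop, which consumes the first line, so the first ID never reaches the output. Iterating over the file directly, as in the usage comment, avoids that.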

Python: split line by comma, then by space

I'm using Python 3 and I need to parse a line like this
-1 0 1 0 , -1 0 0 1
I want to split this into two lists using Fraction so that I can also parse entries like
1/2 17/12 , 1 0 1 1
My program uses a structure like this
from sys import stdin
...
functions'n'stuff
...
for line in stdin:
and I'm trying to do
for line in stdin:
    X = [str(elem) for elem in line.split(" , ")]
    num = [Fraction(elem) for elem in X[0].split()]
    den = [Fraction(elem) for elem in X[1].split()]
but all I get is a list index out of range error:
den = [Fraction(elem) for elem in X[1].split()]
IndexError: list index out of range
I don't get it. I get a string from line. I split that string into two strings at " , " and should get one list X containing two strings. These I split at the whitespace into two separate lists while converting each element into Fraction. What am I missing?
I also tried adding X[-1] = X[-1].strip() to get rid of \n that I get from ending the line.
The problem is that your file has a line without a " , " in it, so the split doesn't return 2 elements.
I'd use split(',') instead, and then use strip to remove the leading and trailing blanks. Note that str(...) is redundant, split already returns strings.
X = [elem.strip() for elem in line.split(",")]
You might also have a blank line at the end of the file, which would still only produce one result for split, so you should have a way to handle that case.
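One way to handle lines without a comma is to check the number of parts before indexing. The sketch below uses a hypothetical helper parse_line that returns None for such lines so the caller can skip them.

```python
from fractions import Fraction

def parse_line(line):
    """Return (num, den) as lists of Fractions, or None when the line
    does not consist of exactly two comma-separated parts."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 2:
        return None  # blank line or malformed input
    num = [Fraction(tok) for tok in parts[0].split()]
    den = [Fraction(tok) for tok in parts[1].split()]
    return num, den

# Possible usage inside the loop from the question:
# for line in stdin:
#     parsed = parse_line(line)
#     if parsed is None:
#         continue
#     num, den = parsed
```

Fraction accepts strings such as "17/12" or "-1" directly, so no extra conversion is needed for either kind of entry.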
With valid input, your code actually works.
You probably get an invalid line, with too much space or even an empty line or so. So first thing inside the loop, print line. Then you know what's going on, you can see right above the error message what the problematic line was.
Or maybe you're not using stdin right. Write the input lines in a file, make sure you only have valid lines (especially no empty lines). Then feed it into your script:
python myscript.py < test.txt
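How about this one (each half is split on whitespace before being converted, and lines without exactly one comma are skipped):
pairs = [line.split(",") for line in stdin]
num = [[Fraction(x) for x in elem[0].split()] for elem in pairs if len(elem) == 2]
den = [[Fraction(x) for x in elem[1].split()] for elem in pairs if len(elem) == 2]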
