How to use re to search for negative number? - python

I have a script that grabs an integer from the user and saves it as results[1].
It then opens myfile.config and searches for an integer after a string. The string is locationID=.
So the string with the number should look like:
locationID="34"
or whatever random number is there.
It replaces the current number with the number from results.
So far, my script handles the case where a number is listed after locationID= and the case where no number is listed.
How can I make it check for and replace a negative number as well?
Here is the original:
def replaceid():
    source = "myfile.config"
    newtext = str(results[1])
    with fileinput.FileInput(source, inplace=True, backup='.bak') as file:
        for line in file:
            pattern = r'(?<=locationId=")\d+'  # find 1 or more digits that come
                                               # after the string locationid
            if re.search(pattern, line):
                sys.stdout.write(re.sub(pattern, newtext, line))  # adds number after locationid
                fileinput.close()
            else:
                sys.stdout.write(re.sub(r'(locationId=)"', r'\1"' + newtext, line))  # use sys.stdout.write instead of "print"
                # using re module to format
                # adds a location id number after locationid even if there was no number originally there
                fileinput.close()
This is not working for me:
def replaceid():
    source = "myfile.config"
    newtext = str(results[1])
    with fileinput.FileInput(source, inplace=True, backup='.bak') as file:
        for line in file:
            pattern = r'(?<=locationId=")\d+'  # find 1 or more digits that come
                                               # after the string locationid
            patternneg = r'(?<=locationId=-")\d+'
            if re.search(pattern, line):
                sys.stdout.write(re.sub(pattern, newtext, line))  # adds number after locationid
                fileinput.close()
            elif re.search(patternneg, line):  # if Location ID has a negative number
                sys.stdout.write(re.sub(patternneg, newtext, line))  # adds number after locationid
                fileinput.close()
            else:
                sys.stdout.write(re.sub(r'(locationId=)"', r'\1"' + newtext, line))  # use sys.stdout.write instead of "print"
                # using re module to format
                # adds a location id number after locationid even if there was no number originally there
                fileinput.close()

Since you are not using the matched digits (whether positive or negative), you could change the pattern to match anything between the double quotes, [^"]+, even if it is not a digit:
pattern = r'(?<=locationId=")[^"]+'
To cover the case of an empty "", change the pattern to [^"]*. Also, you might want to extend the pattern for cases where there are spaces before or after the equals sign. Python's re requires a fixed-width lookbehind, so capture the prefix in a group instead, something like r'(locationId\s*=\s*")[^"]*', and substitute r'\g<1>' + newtext.
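If you would rather keep matching numbers only, an optional minus sign in the pattern covers the negative case. This is a minimal sketch, assuming the value is always wrapped in double quotes; replace_location_id is a hypothetical helper name, not part of the original script.

```python
import re

# Matches locationId=" followed by an optional minus sign and zero or more
# digits, as long as the closing quote comes right after.
LOCATION_ID = re.compile(r'(locationId=")-?\d*(?=")')

def replace_location_id(line, new_id):
    """Return line with the locationId value replaced by new_id (hypothetical helper)."""
    # A function is used as the replacement so new_id is inserted literally.
    return LOCATION_ID.sub(lambda m: m.group(1) + str(new_id), line)

if __name__ == "__main__":
    for sample in ('locationId="34"', 'locationId="-34"', 'locationId=""'):
        print(replace_location_id(sample, 7))
```

Because the digits are optional, the same pattern also handles the empty "" case, so the separate else branch is no longer needed.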

Related

Search and Replace a string in text file

I would like to search for a line in a text file which contains the string "SECTION=C-BEAM" and replace the first 13 characters in the next line with a value derived from a pattern in the first line (see the example below: read 1.558 from the first line and write 1.558/2 = 0.779 into the second line). The number to read from the first line is always between the strings "H_" and "H_0".
Example Input:
SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;
0., 1, 2, 3, 4, 5
Output as follows:
SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;
0.779, 1, 2, 3, 4, 5
This is what I have tried so far.
file_in = open(test_input, 'rb')
file_out = open(test_output, 'wb')
lines = file_in.readlines()
print ("Total no. of lines to process: ", len(lines))
for i in range(len(lines)):
    if lines.startswith("SECTION") and "SECTION=C-BEAM" in lines:
        start_index = lines.find("H_")+1
        end_index = lines.find("H_0")
        x = lines[start_index:end_index]/2.0
        print (x)
        lines[i+1]= lines[i+1].replace(" 0.",x)+lines[i+1][13:]
    file_out.write(lines[i])
file_in.close()
file_out.close()
Since you mentioned that the content resides in a file, I put the lines of interest into a string together with some unrelated lines for testing.
I tested the code below and it works. I assume there is only one such occurrence in the file; multiple occurrences could be handled with a loop.
import re
st = '''These are some different lines - you need not worry about.
SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;
0., 1, 2, 3, 4, 5
These are more different lines - you need not worry about.
0.,2 numbers'''
num = str(float(re.findall('.*H_(.+)H_0.*SECTION=C-BEAM.*\n.*',st)[0].replace("_","."))/2)
print (re.sub(r'(.*SECTION=C-BEAM.*\n)(0\.)(,.*)',r'\g<1>'+num+r'\g<3>',st))
# re.findall('.*H_(.+)H_0.*SECTION=C-BEAM.*\n.*',st) --> Returns ['1_558']. Extract 1_558 by indexing it with [0].
# Then replace "_" with ".", convert to a float, divide by 2 and convert the result to a string again.
# .* means 0 or more non-newline characters, .+ means 1 or more non-newline characters, and "\n" stands for a newline.
# (.+) means the characters matched inside the parentheses are extracted from the overall pattern.
# Second line of the code: I replaced the desired number ("0.") in the line that follows the matching pattern.
# Divided the pattern into 3 groups: 1) before the pattern "0.", 2) the pattern "0." itself, 3) after the pattern "0.".
# Replaced the pattern "0." with "group 1 + num + group 3".
Output as shown below:
Basic Python string methods plus a regex should do it:
import re
my_text = """SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;\n0., 1, 2, 3, 4, 5"""
# This finds the index of the first occurrence of your search string in my_text
index = my_text.find('SECTION=C-BEAM')
# You select everything before that first occurrence
# and count the number of lines (\n is the newline character)
nb_line = my_text[:index].count('\n')
# Now you want to find the index of the beginning of line n + 1.
# You can do this with the finditer function.
# This creates the list of indices of every newline;
# you select the (n + 1)-th one (here it is nb_line because Python indexing starts at 0)
index = [m.start() for m in re.finditer(r"\n",my_text)][nb_line]
# Then you rebuild the wanted string from:
# the beginning of your string up to line n + 1,
# the text you want (0.779),
# the text after the substring you removed (you need to know the length of the string you want to remove, here 2)
string_to_remove = "0."
my_text = my_text[:index+1] + '0.779' + my_text[index + 1 + len(string_to_remove):]
print(my_text)
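For a file with several C-BEAM sections, a line-by-line variant may be easier to follow. The sketch below is not a drop-in solution: halve_beam_height is a hypothetical name, and it assumes the value to replace is the first comma-separated field of the following line rather than a fixed 13 characters.

```python
import re

def halve_beam_height(lines):
    """Return a copy of lines where the first field after each C-BEAM line is replaced.

    Hypothetical helper: the new value is half of the number found between
    "H_" and "H_0" on the SECTION=C-BEAM line (underscore = decimal point).
    """
    out = list(lines)
    for i, line in enumerate(out[:-1]):
        if "SECTION=C-BEAM" not in line:
            continue
        match = re.search(r"H_(\d+(?:_\d+)?)H_0", line)
        if match is None:
            continue
        half = float(match.group(1).replace("_", ".")) / 2
        # Assumption: replace everything up to the first comma of the next line.
        _, sep, rest = out[i + 1].partition(",")
        out[i + 1] = "{:g}{}{}".format(half, sep, rest)
    return out

if __name__ == "__main__":
    sample = [
        "SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;\n",
        "0., 1, 2, 3, 4, 5\n",
    ]
    print("".join(halve_beam_height(sample)), end="")
```

Working on a list of lines keeps the "next line" relationship explicit and handles every matching section in one pass.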

substring extract in a file using Python Regex

A file has n number of lines in blocks of logically defined strings. I'm parsing each line and capturing the required data based on some matching conditions.
I read through each line and find the blocks with this code:
#python
for lines in file.readlines():
    if re.match(r'block.+',lines)!= None:
        block_name = re.match(r'block.+', lines).group(0)
        # string matching code to be added here
Input File:
line1 select KT_TT=$TMTL/$SYSNAME.P1
line2 . $dhe/ISFUNC sprfl tm/tm1032 int 231
line3 select IT_TT=$TMTL/$SYSNAME.P2
line4 . $DHE/ISFUNC ptoic ca/ca256 tli 551
.....
.....
line89 CALLING IK02=$TMTL/$SYSNAME.P2
line90 CALLING KK01=$TMTL/$SYSNAME.P1
Matching conditions & expected output of each step:
1. While reading the lines, match the word "/ISFUNC", fetch the characters from the end of the line back to the last "/" and save them to a variable. Expected o/p -> tm1032 int 231, ca256 tli 551 (matching string found in line2 & line4, etc.)
2. Once ISFUNC is found, read the immediately preceding line, fetch the data from that line, starting from the last character back to the last "/", and save it to a variable. Expected o/p -> $SYSNAME.P1 & $SYSNAME.P2 (line1 & line3, etc.)
3. Continue reading down the lines and look for a line starting with "CALLING" whose last string after "/" matches the o/p of step 2 ($SYSNAME.P1 & $SYSNAME.P2). Just capture the data after the word CALLING and save it. Expected o/p -> KK01 (line90) & IK02 (line89)
The final output should look like:
FUNC            SYS          CALL
tm1032 int 231  $SYSNAME.P1  KK01
ca256 tli 551   $SYSNAME.P2  IK02
If all you need is the text after the last slash, you do not need a regex at all.
Simply use .split("/") on each line to get the last part after the slash:
sample = "$dhe/ISFUNC sprfl tm/tm1032 int 231"
sample.split("/")
will result in
['$dhe', 'ISFUNC sprfl tm', 'tm1032 int 231']
and then just access the last element of the list using -1 indexing to get the value.
PS: Use the split function once you have found the corresponding line.
While reading the lines, match the word "/ISFUNC" and fetch the characters from the last till it matches a "/" and save it to a variable. Expected o/p->tm1032 int 231 (matching string found in line2)
char_list = re.findall(r'/ISFUNC.*/(.*)$', line)
if char_list:
chars = char_list[0]
Once ISFUNC is found, read the immediate previous line and fetch the data from that line, start form the last character till it matches a "/" and save it to a variable. Expected o/p->$SYSNAME.P1 (line 1)
The ideal approach here is to either (a) read the lines into a list once and iterate through the list indices rather than the lines themselves (i.e. lines = file.readlines(), then for i in range(len(lines)): ... with lines[i - 1] as the previous line) or (b) maintain a copy of the last line (say, put last_line = line at the end of your for loop). Then, reference that last line for this expression:
data_list = re.findall(r'/([^/]*)$', last_line)
if data_list:
data = data_list[0]
Continue reading the lines down and look for the line starting with "CALLING" and the last string after "/" should match with o/p of step 2($SYSNAME.P1). Just capture the data after CALLING word and save it. expected o/p -> KK01 (line 90)
Assuming, from your example, that you mean "just the data immediately after" (i.e. up until the equals sign):
calling_list = re.findall(r'CALLING\s*(.*)=.*/' + re.escape(data) + '$', line)
if calling_list:
calling = calling_list[0]
You can move the parentheses around to change what from that line exactly you want to capture. re.findall() will output a list of matches, including only the bits inside the parentheses that were matched.
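The three steps can be combined into a single pass over the file. This is only a sketch under stated assumptions: build_rows is a hypothetical name, the "lineN" labels in the question are taken to be annotations rather than file content, and each CALLING line is assumed to look like "CALLING NAME=.../SYS".

```python
import re

def build_rows(lines):
    """Return (FUNC, SYS, CALL) tuples in the order the /ISFUNC lines appear."""
    funcs = {}   # SYS -> FUNC, filled when an /ISFUNC line is seen (steps 1 and 2)
    calls = {}   # SYS -> CALL, filled when a CALLING line is seen (step 3)
    last_line = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if "/ISFUNC" in line:
            func = line.rsplit("/", 1)[-1]           # text after the last slash
            sys_name = last_line.rsplit("/", 1)[-1]  # same, on the previous line
            funcs[sys_name] = func
        else:
            match = re.match(r"CALLING\s+(\w+)=.*/([^/]+)$", line)
            if match:
                calls[match.group(2)] = match.group(1)
        last_line = line
    # CALL is None when no CALLING line referenced that SYS value.
    return [(func, sys_name, calls.get(sys_name)) for sys_name, func in funcs.items()]

if __name__ == "__main__":
    sample = [
        "select KT_TT=$TMTL/$SYSNAME.P1",
        ". $dhe/ISFUNC sprfl tm/tm1032 int 231",
        "CALLING KK01=$TMTL/$SYSNAME.P1",
    ]
    for row in build_rows(sample):
        print(*row)
```

Keeping last_line around (option (b) above) avoids indexing into the list, and the two dictionaries let the CALLING lines appear anywhere after their /ISFUNC line.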

Python - How to make sure that a line being read from a file contain only a given string and nothing else

In order to make sure I start and stop reading a text file exactly where I want to, I am providing 'start1'<->'end1', 'start2'<->'end2' as tags in between the text file and providing that to my python script. In my script I read it as:
start_end = ['start1','end1']
line_num = []
with open(file_path) as fp1:
    for num, line in enumerate(fp1, 1):
        for i in start_end:
            if i in line:
                line_num.append(num)
fp1.close()
print '\nLine number: ', line_num
fp2 = open(file_path)
for k, line2 in enumerate(fp2):
    for x in range(line_num[0], line_num[1] - 1):
        if k == x:
            header.append(line2)
fp2.close()
This works well until I reach start10 <-> end10 and beyond. E.g. when it checks whether "start2" is in the line, it also picks up lines that contain "start21", and similarly for the end tag. So providing "start1, end1" as input also reads "start10, end10". If I replace the line:
if i in line:
with
if i == line:
it throws an error.
How can I make sure that the script reads the line that contains ONLY "start1" and not "start10"?
import re
prog = re.compile('start1$')
if prog.match(line):
    print(line)
That should return None if there is no match and return a regex match object if the line matches the compiled regex. The '$' at the end of the regex says that's the end of the line, so 'start1' works but 'start10' doesn't.
Or another way:
def test(line):
    import re
    prog = re.compile('start1$')
    return prog.match(line) is not None
> test('start1')
True
> test('start10')
False
Since your markers are always at the end of the line, change:
start_end = ['start1','end1']
to:
start_end = ['start1\n','end1\n']
You probably want to look into regular expressions. The Python re library has some good regex tools. It would let you define a string to compare your line to and it has the ability to check for start and end of lines.
If you can control the input file, consider adding an underscore (or any non-number character) to the end of each tag.
'start1_'<->'end1_'
'start10_'<->'end10_'
The regular expression solution presented in other answers is more elegant, but requires using regular expressions.
You can do this with find():
for num, line in enumerate(fp1, 1):
    for i in start_end:
        if i in line:
            # make sure the character right after the tag isn't another digit
            next_char = line[line.find(i) + len(i):][:1]
            if not next_char.isdigit():
                line_num.append(num)
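The == comparison in the question most likely fails because each line read from a file still ends with its newline, so no line ever equals 'start1' and line_num stays empty. A minimal sketch of an exact comparison after stripping whitespace follows; find_tag_lines is a hypothetical helper name, and it assumes each tag sits alone on its line.

```python
def find_tag_lines(lines, tags):
    """Return the 1-based numbers of lines that consist solely of one of the tags."""
    wanted = set(tags)
    # strip() removes the trailing newline and any surrounding spaces,
    # so 'start1\n' matches 'start1' but 'start10\n' does not.
    return [num for num, line in enumerate(lines, 1) if line.strip() in wanted]

if __name__ == "__main__":
    sample = ["start1\n", "payload\n", "end1\n", "start10\n", "end10\n"]
    print(find_tag_lines(sample, ["start1", "end1"]))
```

The same function works directly on an open file object, since iterating over a file yields its lines.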

Python - Concatenate a variable into string format

I'm trying to retrieve the number from a file, and determine the padding of it, so I can apply it to the new file name, but with an added number. I'm basically trying to do a file saver sequencer.
Ex.:
fileName_0026
0026 = 4 digits
add 1 to the current number and keep the same amount of digit
The result should be 0027 and on.
What I'm trying to do is retrieve the padding number from the file and use the '%04d'%27 string formatting. I've tried everything I know (my knowledge is very limited), but nothing works. I've looked everywhere to no avail.
What I'm trying to do is something like this:
O=fileName_0026
P=Retrieved padding from original file (4)
CN=retrieve current file number (26)
NN=add 1 to current file number (27)
'%0 P d' % NN
Result=fileName_0027
I hope this is clear enough, I'm having a hard time trying to articulate this.
Thanks in advance for any help.
Cheers!
There's a few things going on here, so here's my approach and a few comments.
import os

def get_next_filename(existing_filename):
    root, extension = os.path.splitext(existing_filename)  # extension is '' if there is none
    prefix, int_string = root.rsplit("_", 1)  # get string prior to the last _ and the number as a string
    int_new = int(int_string) + 1  # integer value of new filename
    digits = len(int_string)  # number of characters/padding in name
    formatter = "%0"+str(digits)+"d"  # creates a statement that int_string_new can use to create a number as a string with leading zeros
    int_string_new = formatter % (int_new,)  # applies that format
    new_filename = prefix+"_"+int_string_new+extension  # put it all together, keeping the extension if present
    return new_filename

# since we only want to do this when the file already exists, check if it exists and execute function if so
our_filename = 'file_0026.txt'
while os.path.isfile(our_filename):
    our_filename = get_next_filename(our_filename)  # loop until a unique filename found
I am writing some hints to achieve that. It's unclear what exactly you want to achieve.
import re
old_name = "fileName_0026.txt"              # Existing file name
name = re.split(r"[_.]", old_name)          # Output: ['fileName', '0026', 'txt']
n = str(int(name[1]) + 1)                   # '27'
s = n.zfill(len(name[1]))                   # '0027'
new_name = "_".join([name[0], s]) + ".txt"  # 'fileName_0027.txt'
fh = open(new_name, "w")                    # Create the new file
Use the rjust function from string
O=fileName_0026
P=Retrieved padding from original file (4)
CN=retrieve current file number (26)
NN=add 1 to current file number (27)
new_padding = str(NN).rjust(P, '0')
Result=fileName_ + new_padding
import re
m = re.search(r".*_(0*)(\d*)", "fileName_00023")
print(m.groups())
print("fileName_{0:04d}".format(int(m.groups()[1])+1))
{0:04d} means pad out to four digits wide with leading zeros.
As you can see, there are a few ways to do this that are quite similar. One thing the other answers haven't mentioned is how leading zeroes are handled: int() with its default base 10 accepts them, so int('0026') is 26, and the digits are only read as octal when written as a literal in Python 2 source or when you pass an explicit base of 8. Stripping the zeroes before converting, as the code below does, is therefore optional.
Edit:
I just realised that my previous code crashes if the file number is zero!
Here's a better version that also copes with a missing file number and names with multiple or no underscores.
#! /usr/bin/env python
def increment_filename(s):
    parts = s.split('_')
    #Handle names without a number after the final underscore
    if not parts[-1].isdigit():
        parts.append('0')
    tail = parts[-1]
    try:
        n = int(tail.lstrip('0'))
    except ValueError:
        #Tail was all zeroes
        n = 0
    parts[-1] = str(n + 1).zfill(len(tail))
    return '_'.join(parts)
def main():
    for s in (
        'fileName_0026',
        'data_042',
        'myfile_7',
        'tricky_99',
        'myfile_0',
        'bad_file',
        'worse_file_',
        '_lead_ing_under_score',
        'nounderscore',
    ):
        print("'%s' -> '%s'" % (s, increment_filename(s)))
if __name__ == "__main__":
    main()
output
'fileName_0026' -> 'fileName_0027'
'data_042' -> 'data_043'
'myfile_7' -> 'myfile_8'
'tricky_99' -> 'tricky_100'
'myfile_0' -> 'myfile_1'
'bad_file' -> 'bad_file_1'
'worse_file_' -> 'worse_file__1'
'_lead_ing_under_score' -> '_lead_ing_under_score_1'
'nounderscore' -> 'nounderscore_1'
Some additional refinements possible:
An optional arg to specify the number to add to the current file number,
An optional arg to specify the minimum width of the file number string,
Improved handling of names with weird number / position of underscores.
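The '%0 P d' idea from the question can be written directly with a nested format field, which lets the padding width be a variable. A minimal sketch, assuming the number is the run of digits at the very end of the name; next_filename and its step argument are hypothetical, the latter illustrating the first refinement listed above.

```python
import re

def next_filename(name, step=1):
    """Increment the trailing number of name, keeping its zero padding."""
    match = re.search(r"(\d+)$", name)
    if match is None:
        raise ValueError("no trailing number in %r" % name)
    digits = match.group(1)
    padding = len(digits)             # P in the question
    new_number = int(digits) + step   # NN in the question
    # {:0{width}d} pads with zeros to a width taken from a variable.
    return "{}{:0{width}d}".format(name[:match.start()], new_number, width=padding)

if __name__ == "__main__":
    print(next_filename("fileName_0026"))
```

The old-style equivalent is '%0*d' % (padding, new_number), where the * takes the width from the argument tuple.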

str.startswith() not working as I intended

I'm trying to test for a \t or a space character and I can't understand why this bit of code won't work. What I am doing is reading in a file, counting the LOC for the file, and then recording the names of each function present within the file along with their individual lines of code. The bit of code below is where I attempt to count the LOC for the functions.
import re
...
    else:
        loc += 1
        for line in infile:
            line_t = line.lstrip()
            if len(line_t) > 0 \
            and not line_t.startswith('#') \
            and not line_t.startswith('"""'):
                if not line.startswith('\s'):
                    print ('line = ' + repr(line))
                    loc += 1
                    return (loc, name)
                else:
                    loc += 1
            elif line_t.startswith('"""'):
                while True:
                    if line_t.rstrip().endswith('"""'):
                        break
                    line_t = infile.readline().rstrip()
        return(loc,name)
Output:
Enter the file name: test.txt
line = '\tloc = 0\n'
There were 19 lines of code in "test.txt"
Function names:
count_loc -- 2 lines of code
As you can see, my test print for the line shows a \t, but the if statement explicitly says (or so I thought) that it should only execute with no whitespace characters present.
Here is my full test file I have been using:
def count_loc(infile):
    """ Receives a file and then returns the amount
        of actual lines of code by not counting commented
        or blank lines """
    loc = 0
    for line in infile:
        line = line.strip()
        if len(line) > 0 \
        and not line.startswith('//') \
        and not line.startswith('/*'):
            loc += 1
            func_loc, func_name = checkForFunction(line);
        elif line.startswith('/*'):
            while True:
                if line.endswith('*/'):
                    break
                line = infile.readline().rstrip()
    return loc
if __name__ == "__main__":
    print ("Hi")
Function LOC = 15
File LOC = 19
\s is only whitespace to the re package when doing pattern matching.
For startswith, an ordinary method of ordinary strings, \s is nothing special. Not a pattern, just characters.
Your question has already been answered and this is slightly off-topic, but...
If you want to parse code, it is often easier and less error-prone to use a parser. If your code is Python code, Python comes with a couple of parsers (tokenize, ast, and, before Python 3.10, parser). For other languages, you can find a lot of parsers on the internet. ANTLR is a well-known one with Python bindings.
As an example, the following couple of lines of code print the first significant token of every line of a Python module, skipping comments and doc-strings:
import tokenize
ignored_tokens = [tokenize.NEWLINE,tokenize.COMMENT,tokenize.N_TOKENS
                 ,tokenize.STRING,tokenize.ENDMARKER,tokenize.INDENT
                 ,tokenize.DEDENT,tokenize.NL]
with open('test.py', 'r') as f:
    g = tokenize.generate_tokens(f.readline)
    line_num = 0
    for a_token in g:
        if a_token[2][0] != line_num and a_token[0] not in ignored_tokens:
            line_num = a_token[2][0]
            print(a_token)
As a_token above is already parsed, you can easily check for function definition, too. You can also keep track where the function ends by looking at the current column start a_token[2][1]. If you want to do more complex things, you should use ast.
Your string literals aren't what you think they are.
You can specify a space or TAB like so:
space = ' '
tab = '\t'
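As the answers above point out, str.startswith compares literal characters, so the regex class \s has no special meaning there. A short sketch of two ways to test for leading whitespace follows; starts_with_whitespace is a hypothetical helper name.

```python
def starts_with_whitespace(line):
    """Return True if line begins with a space or a tab."""
    # startswith accepts a tuple of prefixes and matches if any of them fits.
    return line.startswith((' ', '\t'))

def starts_with_any_whitespace(line):
    """Variant that accepts any whitespace character, like \\s in a regex."""
    # Slicing with [:1] yields '' for an empty line, and ''.isspace() is False.
    return line[:1].isspace()

if __name__ == "__main__":
    sample = '\tloc = 0\n'
    print(sample.startswith(r'\s'))        # literal backslash + s, not whitespace
    print(starts_with_whitespace(sample))
```

In the question's code, replacing line.startswith('\s') with one of these checks would make the branch depend on real indentation.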
