issues when using re.finditer with + sign character in string


I am using the following code to find the start index of several strings, plus a temperature, all of which are read from a text file.
The list searchString contains what I'm looking for, and the code does locate the index of the first character of each string. The issue is that unless I put a backslash in front of the string +25°C, finditer raises an error.
(Alternatively, if I remove the + sign it works, but I need to look for the specific +25.) My question is whether I am escaping the + sign correctly, since the line: print('Looking for: ' + headerName + ' in the file: ' + filename )
displays: Looking for: \+25°C in the file: 123.txt (with the backslash showing in front of the +).
Am I just 'getting away with this', or is this the right way to escape it?
Thanks.
import re
path = 'C:\mypath\\'
searchString =["Power","Cal", "test", "Frequency", "Max", "\+25°C"]
filename = '123.txt' # file name to check for text
def search_str(file_path):
    with open(file_path, 'r') as file:
        content = file.read()
        for headerName in searchString:
            print('Looking for: ' + headerName + ' in the file: ' + filename )
            match =re.finditer(headerName, content)
            sub_indices=[]
            for temp in match:
                index = temp.start()
                sub_indices.append(index)
            print(sub_indices ,'\n')

You should use the re.escape() function to escape your string pattern. It escapes all the characters in the given string that have a special meaning in a regular expression, for example:
>>> print(re.escape('+25°C'))
\+25°C
>>> print(re.escape('my_pattern with specials+&$#('))
my_pattern\ with\ specials\+\&\$\#\(
So replace the entries in your searchString with plain literal strings (no manual backslashes) and try it with:
def search_str(file_path):
    with open(file_path, 'r') as file:
        content = file.read()
        for headerName in searchString:
            print('Looking for: ' + headerName + ' in the file: ' + filename )
            match =re.finditer(re.escape(headerName), content)
            sub_indices=[]
            for temp in match:
                index = temp.start()
                sub_indices.append(index)
            print(sub_indices ,'\n')
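
As a self-contained illustration of the same idea, here is a minimal sketch that runs without a file. The names `search_strings`, `content`, and `find_all` are made up for this example, and `content` is only a stand-in for the text read from 123.txt.

```python
import re

# Plain literal search terms, written without any manual backslashes.
search_strings = ["Power", "+25°C"]

# Stand-in for the text that would be read from the file.
content = "Power 1.2 W at +25°C\nPower 1.1 W at +85°C\n"

def find_all(literal, text):
    """Return the start index of every occurrence of `literal` in `text`."""
    # re.escape turns the literal into a pattern that matches it character by character.
    return [m.start() for m in re.finditer(re.escape(literal), text)]

for term in search_strings:
    print("Looking for:", term, "->", find_all(term, content))
```

Because the escaping happens only at the point where the pattern is built, the printed search term stays free of backslashes.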

Related

Python - Parse SQL Query to return only fields names using regular expression

I'm trying to parse an SQL statement (stored in a file) and save only the names of the fields.
So for the following:
SELECT
Involved_Party_ID=pt.ProgramId,
Involved_Party_Type=2,
First_Name=REPLACE(REPLACE(REPLACE(pt.ProgramName, '|', '/'), CHAR(10), ' '), CHAR(13), ' '),
Registration_Status=pt.RegistrationStatus,
Involved_Party_Status=CONVERT(SMALLINT, pt.ProgramStatus),
Creation_Date=pt.CreationDate,
Incorporation_Country=CONVERT(VARCHAR(50), CASE WHEN pd.IncorporationCountry='UK' THEN 'GB' ELSE pd.IncorporationCountry END),
FROM
SomeTable AS pt
GO
The desired output would be:
Involved_Party_ID
Involved_Party_Type
First_Name
Registration_Status
Involved_Party_Status
Creation_Date
Incorporation_Country
Here's my code:
import re
File1 = open("File1 .sql", "r")
File2 = open("File2 .sql", "w")
for line in File1:
    if re.match('\s*SELECT\s*', line):
        continue
    if re.match('\s*FROM\s*', line):
        break
    if re.match('(\n).*?(?=(=))', line):
        Field_Name = re.search('(\n).*?(?=(=))', line)
        File2 .write(Field_Name.group(0) + '\n')
I first tried using this regular expression:
'.*?(?=(=))'
But then my result came out as:
Involved_Party_ID
Involved_Party_Type
First_Name
Registration_Status
Involved_Party_Status
Creation_Date
Incorporation_Country
CONVERT(VARCHAR(50), CASE WHEN pd.IncorporationCountry
Now that I've added (\n) to my regular expression, the output file comes out completely empty, although online regular-expression testing sites return the desired outcome.
(I'm not concerned about whitespace matching the regexp, as I'm only retrieving the first result per line.)

Judging by the patterns you use with re.match, you can do without regex here. Just skip the line that starts with SELECT, stop matching at the line starting with FROM and collect the parts of lines between them before the first =:
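# Open both files in one with statement so they are closed (and flushed) afterwards
with open("File1 .sql", "r") as File1, open("File2 .sql", "w") as File2:
    for line in File1:
        if line.strip().startswith("SELECT"):
            continue
        elif line.strip().startswith("FROM"):
            break
        else:
            # Keep only the text before the first "="
            result = line.strip().split("=", 1)[0]
            File2.write(result + '\n')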
The output I get is
Involved_Party_ID
Involved_Party_Type
First_Name
Registration_Status
Involved_Party_Status
Creation_Date
Incorporation_Country
See this Python demo.

Try (regex101):
import re
s = """\
SELECT
Involved_Party_ID=pt.ProgramId,
Involved_Party_Type=2,
First_Name=REPLACE(REPLACE(REPLACE(pt.ProgramName, '|', '/'), CHAR(10), ' '), CHAR(13), ' '),
Registration_Status=pt.RegistrationStatus,
Involved_Party_Status=CONVERT(SMALLINT, pt.ProgramStatus),
Creation_Date=pt.CreationDate,
Incorporation_Country=CONVERT(VARCHAR(50), CASE WHEN pd.IncorporationCountry='UK' THEN 'GB' ELSE pd.IncorporationCountry END),
FROM
SomeTable AS pt
GO"""
for v in re.findall(r"^([^=\s]+)=", s, flags=re.M):
    print(v)
Prints:
Involved_Party_ID
Involved_Party_Type
First_Name
Registration_Status
Involved_Party_Status
Creation_Date
Incorporation_Country
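
For context on why the (\n) version in the question wrote nothing: each line read from a file ends with a newline rather than starting with one, and re.match only matches at the beginning of the string. The sketch below shows this on sample lines; `field_pattern` is a name chosen for this example, not something from the question.

```python
import re

# A line as yielded when iterating over a file: the newline is at the END.
line = "Creation_Date=pt.CreationDate,\n"

# re.match anchors at the start of the string, and the line does not
# start with "\n", so the pattern from the question cannot match.
print(re.match(r"(\n).*?(?=(=))", line))

# Anchoring on the field name instead: optional whitespace, a run of
# word characters, then "=".
field_pattern = re.compile(r"\s*(\w+)\s*=")
m = field_pattern.match(line)
print(m.group(1))

# A continuation line that begins with a function call is skipped,
# because "(" follows the word instead of "=".
print(field_pattern.match("CONVERT(VARCHAR(50), CASE WHEN pd.IncorporationCountry='UK'\n"))
```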

Replace a line with a pattern

I am trying to replace a line when a pattern is found (the pattern occurs only once in that file) with the code below, but it replaces the whole content of the file.
Could you please advise, or suggest a better way with pathlib?
import datetime
def insert_timestamp():
    """ To Update the current date in DNS files """
    pattern = '; serial number'
    current_day = datetime.datetime.today().strftime('%Y%m%d')
    subst = "\t" + str(current_day) + "01" + " ; " + pattern
    print(current_day)
    with open(lab_net_file, "w+") as file:
        for line in file:
            file.write(line if pattern not in line else line.replace(pattern, subst))
lab_net_file = '/Users/kams/nameserver/10_15'
insert_timestamp()

Opening the file with "w+" truncates it before anything is read, which is why the whole content disappears. What you want to do instead is read the file, replace the pattern, and then write the result back, like this:
with open(lab_net_file, "r") as file:
    read = file.read()
read = read.replace(pattern, subst)
with open(lab_net_file, "w") as file:
    file.write(read)
You don't need the if/else: if pattern does not occur in read, .replace does nothing, so there is nothing to check. If pattern does occur in read, .replace replaces every occurrence in the string.
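
Since the question also asks about pathlib, here is a minimal sketch of the same read, replace, write sequence using Path.read_text and Path.write_text. The helper name `replace_in_file` and the sample zone-file text are assumptions made for this example; the demo runs against a temporary file rather than the real DNS file.

```python
import datetime
import tempfile
from pathlib import Path

def replace_in_file(path, old, new):
    """Read the whole file, replace every `old` with `new`, write it back."""
    path = Path(path)
    text = path.read_text()
    path.write_text(text.replace(old, new))

# Demo on a throwaway file; the contents are invented sample data.
with tempfile.TemporaryDirectory() as tmp:
    zone = Path(tmp) / "10_15"
    zone.write_text("@ IN SOA ns1 admin (\n    ; serial number\n    3600 )\n")
    stamp = datetime.date.today().strftime("%Y%m%d") + "01"
    replace_in_file(zone, "; serial number", stamp + " ; serial number")
    result = zone.read_text()

print(result)
```

The file is read completely before it is opened for writing, so the truncation that "w" causes can no longer destroy unread data.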

I am able to get the output I wanted with this block of code.
import datetime
import re
def insert_timestamp(self):
    """ To Update the current date in DNS files """
    pattern = re.compile(r'\s[0-9]*\s;\sserial number')
    current_day = datetime.datetime.today().strftime('%Y%m%d')
    subst = "\t" + str(current_day) + "01" + " ; " + 'serial number'
    with open(lab_net_file, "r") as file:
        reading_file = file.read()
    # Find the existing serial-number text, then swap it for the new one
    pattern = pattern.search(reading_file).group()
    reading_file = reading_file.replace(pattern, subst)
    with open(lab_net_file, "w") as file:
        file.write(reading_file)
Thank you @Timmy

Regular expression in Python issue

I have the below code in one of my configuration files:
appPackage_name = sqlncli
appPackage_version = 11.3.6538.0
The left side is the key and the right side is value.
Now I want to be able to replace the value part with something else, given a key, in Python.
import re
Filepath = r"C:\Users\bhatsubh\Desktop\Everything\Codes\Python\OO_CONF.conf"
key = "appPackage_name"
value = "Subhayan"
searchstr = re.escape(key) + " = [\da-zA-Z]+"
replacestr = re.escape(key) + " = " + re.escape(value)
filedata = ""
with open(Filepath,'r') as File:
    filedata = File.read()
File.close()
print ("Before change:",filedata)
re.sub(searchstr,replacestr,filedata)
print ("After change:",filedata)
I assume there is something wrong with the regex I am using, but I am not able to figure out what. Can someone please help me?

Use the following fix:
import re
#Filepath = r"C:\Users\bhatsubh\Desktop\Everything\Codes\Python\OO_CONF.conf"
key = "appPackage_name"
value = "Subhayan"
#searchstr = re.escape(key) + " = [\da-zA-Z]+"
#replacestr = re.escape(key) + " = " + re.escape(value)
searchstr = r"({} *= *)[\da-zA-Z.]+".format(re.escape(key))
replacestr = r"\1{}".format(value)
filedata = "appPackage_name = sqlncli"
#with open(Filepath,'r') as File:
# filedata = File.read()
#File.close()
print ("Before change:",filedata)
filedata = re.sub(searchstr,replacestr,filedata)
print ("After change:",filedata)
See the Python demo
There are several issues. You should not escape the replacement string, only the literal user-defined values in the regex pattern. You can use a capturing group (a pair of unescaped parentheses) and a backreference (here \1, since the pattern has only one group) to restore the part of the match you need to keep, rather than building that part of the replacement dynamically. As the version value contains dots, add a . to the character class: [\da-zA-Z.]. Finally, re.sub returns a new string, so you need to assign its result back to filedata to actually change it.
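
One caveat with a "\1" + value replacement string: if the value begins with a digit (as a version number does), the digits run into the backreference and are read as part of the group number. A minimal sketch of one way around this is to pass a function as the replacement, since its return value is inserted verbatim. The helper name `set_config_value` and the sample text are assumptions made for this example.

```python
import re

def set_config_value(text, key, value):
    """Return `text` with the value on the `key = ...` line replaced by `value`."""
    # Only the key is escaped, because it is literal text inside the pattern.
    # With re.M, ^ and $ match at line boundaries, so .* covers one line's value.
    pattern = re.compile(r"^({} *= *).*$".format(re.escape(key)), flags=re.M)
    # The function's return value is used as-is, so digits or backslashes
    # in `value` are never interpreted as backreferences or escapes.
    return pattern.sub(lambda m: m.group(1) + value, text)

sample = "appPackage_name = sqlncli\nappPackage_version = 11.3.6538.0\n"
updated = set_config_value(sample, "appPackage_version", "12.0")
print(updated)
```

Matching the rest of the line with .* instead of a fixed character class is a design choice of this sketch; it avoids having to list every character a value may contain.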

find and replace regular expression rather than full string

I've loaded a dictionary of "regex":"picture" pairs parsed from a json.
The regexes are meant to be matched against a message string, with each match replaced by its picture for display in a Flash plugin that renders HTML text.
For instance, typing:
Hello MVGame everyone.
Would return:
Hello <img src='http://static-cdn.jtvnw.net/jtv_user_pictures/chansub-global-emoticon-1a1a8bb5cdf6efb9-24x32.png' height = '32' width = '24'> everyone.
However:
If I type,
Hello :) everyone.
it will not parse the :) because this is encoded as a regular expression "\\:-?\\)" rather than just a string match.
How do I get it to parse the regular expression as the matching parameter?
Here is my test code:
# regular expression test
import urllib
import json # for loading json's for emoticons
import urllib.request # more for loadings jsons from urls
import re # allows pattern filtering for emoticons
def loademotes():
    #Create emoteicon dictionary
    try:
        print ("Trying to load emoteicons from twitch")
        response = urllib.request.urlopen('https://api.twitch.tv/kraken/chat/emoticons').read()
        mydata = json.loads(response.decode('utf-8'))
        for idx,item in enumerate(mydata['emoticons']):
            regex = item['regex']
            url = "<img src='" + item['images'][0]['url'] + "'" + " height = '" + str(item['images'][0]['height']) + "'" + " width = '" + str(item['images'][0]['width']) + "' >"
            emoticonDictionary[regex] = url
        print ("All emoteicons loaded")
    except IOError as e:
        print ("I/O error({0}) : {1}".format(e.errno, e.strerror))
        print ("Cannot load emoteicons.")
emoticonDictionary = {} # create emoticon dictionary indexed by words returns url in html image tags
loademotes()
while 1:
    myString = input ("Here you type something : ")
    pattern = re.compile(r'\b(' + '|'.join(emoticonDictionary.keys()) + r')\b')
    results = pattern.sub(lambda x: emoticonDictionary[x.group()], myString)
    print (results)

I think you could wrap each character that has a special meaning in a regular expression in a character class before you feed the pattern to re. For example, write something that takes :) and turns it into [:][)].
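
Two things in the question's code appear to get in the way regardless of escaping: \b does not match next to characters like : or ) when they are surrounded by spaces, and emoticonDictionary[x.group()] looks up the matched text (":)") while the key is the regex itself. A minimal sketch of an alternative, under the assumption that emoticons are separated from other words by single spaces, compiles each key separately and tests each word against it. The dictionary contents and image names below are invented stand-ins for the loaded JSON.

```python
import re

# Hypothetical stand-in for the dictionary built from the JSON feed:
# keys are regular expressions, values are the HTML to substitute.
emoticons = {
    r"\:-?\)": "<img src='smile.png'>",
    r"MVGame": "<img src='mvgame.png'>",
}

# Compile every key once and keep it paired with its replacement,
# so no lookup by matched text is needed later.
compiled = [(re.compile(regex), html) for regex, html in emoticons.items()]

def replace_emotes(message):
    """Replace each space-separated word that fully matches an emoticon regex."""
    words = []
    for word in message.split(" "):
        for pattern, html in compiled:
            if pattern.fullmatch(word):
                word = html
                break
        words.append(word)
    return " ".join(words)

print(replace_emotes("Hello :) everyone."))
print(replace_emotes("Hello MVGame everyone."))
```

Working word by word also means an already inserted image tag is never scanned again by a later pattern.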

Double quote string manipulation

I have some input data from ASCII files that uses double quotes to enclose strings but also uses double quotes inside those strings, for example:
"Reliable" "Africa" 567.87 "Bob" "" "" "" "S 05`56'21.844"" "No Shift"
Notice the double quote used in the coordinate.
So I have been using:
valList = shlex.split(line)
But shlex gets confused by the second double quote in the coordinate.
I've been doing a find and replace on '\"\"' to '\\\"\"'. This of course turns empty strings into \"" as well, so I do a find and replace on (this time with spaces) ' \\\"\" ' to ' \"\"" '. Not exactly the most efficient way of doing it!
Any suggestions on handling this double quote in the coordinate?

I would do it this way:
I would treat this line of text as a csv file. Then according to RFC 4180 :
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
Then all you would need to do is add another " to your coordinates, so the field looks like "S 05`56'21.844""" (note the extra quote). Then you can use the standard csv module to break the line apart and extract the necessary information.
>>> from io import StringIO
>>> import csv
>>>
>>> test = '''"Reliable" "Africa" 567.87 "Bob" "" "" "" "S 05`56'21.844""" "No Shift"'''
>>> test_obj = StringIO(test)
>>> reader = csv.reader(test_obj, delimiter=' ', quotechar='"', quoting=csv.QUOTE_ALL)
>>> for i in reader:
...     print(i)
...
The output would be :
['Reliable', 'Africa', '567.87', 'Bob', '', '', '', 'S 05`56\'21.844"', 'No Shift']
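
The extra quote could also be added programmatically before handing the line to csv. The sketch below is one possible way, under the assumption that a stray quote only ever appears directly before a field's closing quote (as in the coordinate); `split_quoted_fields` is a name made up for this example.

```python
import csv
import re

def split_quoted_fields(line):
    """Split a space-delimited, double-quoted line that may contain a stray quote."""
    # Assumption: a stray quote only sits directly before the closing quote.
    # A "" pair preceded by a non-space, non-quote character is then
    # "stray quote + closing quote", so one more quote is added to double
    # the stray one. Empty fields ("" after a space) are left untouched.
    fixed = re.sub(r'(?<=[^\s"])""(?=\s|$)', '"""', line)
    # csv.reader accepts any iterable of lines, so a one-element list works.
    return next(csv.reader([fixed], delimiter=" ", quotechar='"'))

line = '''"Reliable" "Africa" 567.87 "Bob" "" "" "" "S 05`56'21.844"" "No Shift"'''
print(split_quoted_fields(line))
```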

I'm not good with regexes, but this non-regex suggestion might help ...
INPUT = ('"Reliable" "Africa" 567.87 "Bob" "" "" "" "S 05`56'
         "'"
         '21.844"" "No Shift"')

def main(text):
    output = text
    # Temporary marker for the quotes that open or close a field
    surrounding_quote_symbol = '<!>'
    if text.startswith('"'):
        output = '%s%s' % (surrounding_quote_symbol, output[1:])
    if text.endswith('"'):
        output = '%s%s' % (output[:-1], surrounding_quote_symbol)
    output = output.replace('" ', '%s ' % surrounding_quote_symbol)
    output = output.replace(' "', ' %s' % surrounding_quote_symbol)
    print("Stage 1:", output)
    # Any quote still left is inside a field, so escape it with a backslash
    output = output.replace('"', '\\"')
    output = output.replace(surrounding_quote_symbol, '"')
    return output

if __name__ == "__main__":
    output = main(INPUT)
    print("End results:", output)
