Regular expression in Python issue - python

I have the below code in one of my configuration files:
appPackage_name = sqlncli
appPackage_version = 11.3.6538.0
The left side is the key and the right side is value.
Now I want to be able to replace the value part with something else, given a key, in Python.
import re
Filepath = r"C:\Users\bhatsubh\Desktop\Everything\Codes\Python\OO_CONF.conf"
key = "appPackage_name"
value = "Subhayan"
searchstr = re.escape(key) + " = [\da-zA-Z]+"
replacestr = re.escape(key) + " = " + re.escape(value)
filedata = ""
with open(Filepath,'r') as File:
    filedata = File.read()
File.close()
print ("Before change:",filedata)
re.sub(searchstr,replacestr,filedata)
print ("After change:",filedata)
I assume there is something wrong with the regex I am using, but I am not able to figure out what. Can someone please help me?

Use the following fix:
import re
#Filepath = r"C:\Users\bhatsubh\Desktop\Everything\Codes\Python\OO_CONF.conf"
key = "appPackage_name"
value = "Subhayan"
#searchstr = re.escape(key) + " = [\da-zA-Z]+"
#replacestr = re.escape(key) + " = " + re.escape(value)
searchstr = r"({} *= *)[\da-zA-Z.]+".format(re.escape(key))
replacestr = r"\1{}".format(value)
filedata = "appPackage_name = sqlncli"
#with open(Filepath,'r') as File:
# filedata = File.read()
#File.close()
print ("Before change:",filedata)
filedata = re.sub(searchstr,replacestr,filedata)
print ("After change:",filedata)
There are several issues. First, you should not escape the replacement string; only literal user-defined values inside the regex pattern need re.escape. Second, you can use a capturing group (a pair of unescaped parentheses) and a backreference (here \1, since it is the only group in the pattern) to restore the part of the match you want to keep, rather than building that part of the replacement string dynamically. Third, as the version value contains dots, you should add a . to the character class: [\da-zA-Z.]. Finally, re.sub returns a new string and does not modify its argument, so you need to assign the result back to filedata to actually change it.
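As a rough sketch of the points above (not part of the original answer), the replacement can be wrapped in a small helper. The name set_config_value and the sample text are hypothetical, and it assumes one "key = value" pair per line. It passes a function as the replacement instead of a "\1..." template, so a new value that starts with a digit or contains a backslash is inserted literally.

```python
import re

def set_config_value(text, key, new_value):
    """Return text with the value on the `key = value` line replaced by new_value."""
    # Group 1 keeps the key, the "=" and the spaces around it;
    # ".*" then consumes the old value up to the end of that line.
    pattern = re.compile(
        r"^({}[ \t]*=[ \t]*).*$".format(re.escape(key)), re.MULTILINE
    )
    # A function replacement avoids any backreference parsing of new_value.
    return pattern.sub(lambda m: m.group(1) + new_value, text)

conf = "appPackage_name = sqlncli\nappPackage_version = 11.3.6538.0\n"
updated = set_config_value(conf, "appPackage_version", "12.0.1")
print(updated)
```

Matching the whole rest of the line also sidesteps having to list every character a value may contain.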

Related

issues when using re.finditer with + sign character in string

I am using the following code to find the start index of some strings, as well as a temperature, all of which are read from a text file.
The array searchString contains what I'm looking for. It does locate the index of the first character of each string. The issue is that unless I put a backslash in front of the string +25°C, finditer gives an error.
(Alternatively, if I remove the + sign, it works, but I need to look for the specific +25.) My question is whether I am correctly escaping the + sign, since the line: print('Looking for: ' + headerName + ' in the file: ' + filename )
displays: Looking for: \+25°C in the file: 123.txt (with the slash showing in front of the +)
Am I just 'getting away with this', or is this escaping as it should?
Thanks
import re
path = 'C:\mypath\\'
searchString =["Power","Cal", "test", "Frequency", "Max", "\+25°C"]
filename = '123.txt' # file name to check for text
def search_str(file_path):
    with open(file_path, 'r') as file:
        content = file.read()
        for headerName in searchString:
            print('Looking for: ' + headerName + ' in the file: ' + filename )
            match =re.finditer(headerName, content)
            sub_indices=[]
            for temp in match:
                index = temp.start()
                sub_indices.append(index)
            print(sub_indices ,'\n')
You should use the re.escape() function to escape your string pattern. It will escape all the special characters in given string, for example:
>>> print(re.escape('+25°C'))
\+25°C
>>> print(re.escape('my_pattern with specials+&$#('))
my_pattern\ with\ specials\+\&\$\#\(
So write the entries in searchString as plain literal strings (for example "+25°C", without the backslash) and try it with:
def search_str(file_path):
    with open(file_path, 'r') as file:
        content = file.read()
        for headerName in searchString:
            print('Looking for: ' + headerName + ' in the file: ' + filename )
            match =re.finditer(re.escape(headerName), content)
            sub_indices=[]
            for temp in match:
                index = temp.start()
                sub_indices.append(index)
            print(sub_indices ,'\n')
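For a quick check without a file, here is a minimal sketch of the same idea on an in-memory string. The helper name find_all_starts and the sample content are made up for illustration.

```python
import re

def find_all_starts(needle, content):
    """Return the start index of every literal occurrence of needle in content."""
    # re.escape turns "+" (and any other metacharacter) into a literal match.
    return [m.start() for m in re.finditer(re.escape(needle), content)]

content = "Power at +25°C, Max at +25°C"
starts = find_all_starts("+25°C", content)
print(starts)
```

Because the escaping happens at the call site, the search strings can be printed exactly as they appear in the file, with no backslash in front of the +.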

Sort a file with a specific line pattern in Python

Given a file with the following content:
enum class Fruits(id: String) {
BANANA(id = "banana"),
LEMON(id = "lemon"),
DRAGON_FRUIT(id = "dragonFruit"),
APPLE(id = "apple"); }
I want to sort this file given the pattern "id = ", and then replace these lines with the new sorted lines.
I wrote a piece of code in Python that sorts the whole file, but I'm struggling with the regex to find the pattern so I can sort by it.
My Python script:
import re
fruitsFile = '/home/genericpath/Fruits.txt'
def sortFruitIds():
    # this is an attempt to get/find the pattern, but it return an AttributeError:
    # 'NoneType' object has no attribute 'group'
    with open(fruitsFile, "r+") as f:
        lines = sorted(f, key=lambda line: str(re.search(r"(?<=id = )\s+", line)))
        for line in lines:
            f.write(line)
When trying to find the pattern with regex, it returns an AttributeError: 'NoneType' object has no attribute 'group'
Any help is appreciated.
Looks like your main issue is that your regex uses \s, which matches whitespace characters, when what you want is \S, which matches any non-whitespace character. With that in mind, this should work:
import re
fruitsFile = 'Fruits.txt'
def sortFruitIds():
    with open(fruitsFile, "r+") as f:
        lines = f.readlines()
        lines_sorted = sorted(lines, key=lambda line: re.search(r"(?<=id = \")\S+|$", line).group())
        # Rewind and clear the file so the sorted lines replace the old content instead of being appended
        f.seek(0)
        f.truncate()
        for line in lines_sorted:
            f.write(line)
I also added |$ to the regex to return an empty string if there is no match, and added group() to grab the match.
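To see what the |$ fallback does to the sort key, here is a small sketch on in-memory lines. The names ID_PATTERN and id_key are hypothetical, and the pattern uses [^"]+ instead of \S+ so the key stops at the closing quote.

```python
import re

# First alternative: the text right after `id = "`, up to the closing quote.
# Second alternative: the end of the line, which always matches as an empty string.
ID_PATTERN = re.compile(r'(?<=id = ")[^"]+|$')

def id_key(line):
    # search() never returns None here, because "$" matches when no id is found.
    return ID_PATTERN.search(line).group()

lines = [
    'BANANA(id = "banana"),',
    'enum class Fruits(id: String) {',
    'APPLE(id = "apple"),',
]
keys = [id_key(line) for line in lines]
ordered = sorted(lines, key=id_key)
print(keys)
print(ordered)
```

Lines without an id get an empty key and therefore sort ahead of all entries, which is why the header line stays on top.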
We can approach this by using re.findall to collect all entries in the enum, then sorting them alphabetically by the id string value and joining them back together into the final enum code. Note that below I also extract the first line of the enum for use later in the output.
import re
inp = '''enum class Fruits(id: String) {
BANANA(id = "banana"),
LEMON(id = "lemon"),
DRAGON_FRUIT(id = "dragonFruit"),
APPLE(id = "apple"); }'''
header = re.search(r'enum.*?\{', inp).group()
items = re.findall(r'\w+\(id\s*=\s*".*?"\)', inp)
items.sort(key=lambda m: re.search(r'"(.*?)"', m).group(1))
output = header + '\n ' + ',\n '.join(items) + '; }'
print(output)
This prints:
enum class Fruits(id: String) {
APPLE(id = "apple"),
BANANA(id = "banana"),
DRAGON_FRUIT(id = "dragonFruit"),
LEMON(id = "lemon"); }

Replace a line with a pattern

I am trying to replace a line when a pattern is found (there is only one such pattern in the file) with the code below, but it replaces the whole content of the file.
Could you please advise, or suggest a better way using pathlib?
import datetime
def insert_timestamp():
    """ To Update the current date in DNS files """
    pattern = '; serial number'
    current_day = datetime.datetime.today().strftime('%Y%m%d')
    subst = "\t" + str(current_day) + "01" + " ; " + pattern
    print(current_day)
    with open(lab_net_file, "w+") as file:
        for line in file:
            file.write(line if pattern not in line else line.replace(pattern, subst))
lab_net_file = '/Users/kams/nameserver/10_15'
insert_timestamp()
What you would want to do is read the file, replace the pattern, and write to it again like this:
with open(lab_net_file, "r") as file:
    read = file.read()
read = read.replace(pattern, subst)
with open(lab_net_file, "w") as file:
    file.write(read)
You don't need if/else here: if pattern does not occur in read, .replace returns the string unchanged, so there is nothing to worry about. If pattern does occur, .replace replaces every occurrence in the string. Note also that opening the file with "w+" truncates it immediately, which is why the original loop had nothing to read and the file ended up empty.
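Since the question asks about pathlib, here is a minimal sketch of the same read, replace, write cycle using Path.read_text and Path.write_text. The helper name replace_in_file and the demo file contents are assumptions for illustration; the demo runs on a temporary file so nothing real is touched.

```python
import tempfile
from pathlib import Path

def replace_in_file(path, old, new):
    """Replace every occurrence of old with new in the file; return True if it changed."""
    file_path = Path(path)
    original = file_path.read_text()
    updated = original.replace(old, new)
    # Only rewrite the file when something actually changed.
    if updated != original:
        file_path.write_text(updated)
    return updated != original

# Demo on a throwaway file with made-up zone data
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "zone.txt"
    demo.write_text("\t2020010101 ; serial number\n\t3600 ; refresh\n")
    changed = replace_in_file(demo, "2020010101", "2024051701")
    result = demo.read_text()

print(changed)
print(result)
```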
I am able to get the output I wanted with this block of code.
import datetime
import re
def insert_timestamp(self):
    """ To Update the current date in DNS files """
    pattern = re.compile(r'\s[0-9]*\s;\sserial number')
    current_day = datetime.datetime.today().strftime('%Y%m%d')
    subst = "\t" + str(current_day) + "01" + " ; " + 'serial number'
    with open(lab_net_file, "r") as file:
        reading_file = file.read()
    pattern = pattern.search(reading_file).group()
    reading_file = reading_file.replace(pattern, subst)
    with open(lab_net_file, "w") as file:
        file.write(reading_file)
Thank you, @Timmy
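The search-then-replace steps above could probably be collapsed into a single regex substitution. This is only a sketch under assumptions: the name update_serial, the sample zone text, and the exact whitespace around the serial are made up, and the date is passed in so the result is reproducible.

```python
import datetime
import re

# A run of digits followed by "; serial number"; the whitespace on both sides
# is captured so it can be written back unchanged.
SERIAL = re.compile(r"([ \t]*)\d+([ \t]*;[ \t]*serial number)")

def update_serial(zone_text, day):
    """Return zone_text with the first serial replaced by YYYYMMDD01 for the given day."""
    new_serial = day.strftime("%Y%m%d") + "01"
    return SERIAL.sub(
        lambda m: m.group(1) + new_serial + m.group(2), zone_text, count=1
    )

zone = "@ IN SOA ns1 admin (\n\t2020010101 ; serial number\n\t3600 ; refresh\n)\n"
updated = update_serial(zone, datetime.date(2024, 5, 17))
print(updated)
```

If the file has no serial line, the text comes back unchanged instead of raising on a None match.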

Python string formatting creating a random new line on print statement

Here is my function. Trying to get this all to print to one line.
Here is the output ->
config::$var['pdf']['meta']['staff_member_name']
= ";"
The = ";" portion of the string prints to a new line in the console for some reason?
This is just a personal hack to help with a repetitive job requirement, so I'm not looking for anything fancy.
Here is my function ->
def auto_pdf_config(file):
    with open(file) as f:
        content = f.readlines()
    kill = " = array("
    start = "config::$var['intake']"
    new_line = ""
    for line in content:
        if kill not in line:
            pass
        elif start in line:
            new_line = line
            x = new_line.replace(kill, "")
            y = x.replace(start,"")
            pdf_end = ' = ";" '
            z = "config::$var['pdf']['meta']{}{}".format(y,pdf_end)
            print(z)
It seems your "y" variable has a newline in it. You can try to strip it off:
y = x.replace(start,"").strip('\n')
Since x = new_line.replace(kill, ""), y = x.replace(start,""), and new_line is a line returned by readlines(), it still ends with the newline character (\n). That newline survives both replacements, so it ends up in front of pdf_end. You just need to remove the newline from y.
You can do something like this:
y = y.strip('\n')
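Put together, the per-line transformation might look like the sketch below. The function name to_pdf_line and the sample input line are guesses based on the output shown in the question, not the asker's real data.

```python
def to_pdf_line(line, kill=" = array(", start="config::$var['intake']"):
    # Drop the trailing newline first so it cannot land in the middle of the output.
    suffix = line.rstrip("\r\n").replace(kill, "").replace(start, "")
    return "config::$var['pdf']['meta']{} = \";\"".format(suffix)

# Hypothetical input line, as readlines() would return it (newline included)
sample = "config::$var['intake']['staff_member_name'] = array(\n"
converted = to_pdf_line(sample)
print(converted)
```

Using rstrip("\r\n") rather than strip('\n') also copes with files that have Windows line endings.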

find and replace regular expression rather than full string

I've loaded a dictionary of "regex":"picture" pairs parsed from a json.
These values are intended to match the regex within a message string and replace it with the picture for display in a flash plugin that displays HTML text.
For instance, typing:
Hello MVGame everyone.
Would return:
Hello <img src='http://static-cdn.jtvnw.net/jtv_user_pictures/chansub-global-emoticon-1a1a8bb5cdf6efb9-24x32.png' height = '32' width = '24'> everyone.
However:
If I type,
Hello :) everyone.
it will not parse the :) because this is encoded as a regular expression "\\:-?\\)" rather than just a string match.
How do I get it to parse the regular expression as the matching parameter?
Here is my test code:
# regular expression test
import urllib
import json # for loading json's for emoticons
import urllib.request # more for loadings jsons from urls
import re # allows pattern filtering for emoticons
def loademotes():
    #Create emoteicon dictionary
    try:
        print ("Trying to load emoteicons from twitch")
        response = urllib.request.urlopen('https://api.twitch.tv/kraken/chat/emoticons').read()
        mydata = json.loads(response.decode('utf-8'))
        for idx,item in enumerate(mydata['emoticons']):
            regex = item['regex']
            url = "<img src='" + item['images'][0]['url'] + "'" + " height = '" + str(item['images'][0]['height']) + "'" + " width = '" + str(item['images'][0]['width']) + "' >"
            emoticonDictionary[regex] = url
        print ("All emoteicons loaded")
    except IOError as e:
        print ("I/O error({0}) : {1}".format(e.errno, e.strerror))
        print ("Cannot load emoteicons.")
emoticonDictionary = {} # create emoticon dictionary indexed by words returns url in html image tags
loademotes()
while 1:
    myString = input ("Here you type something : ")
    pattern = re.compile(r'\b(' + '|'.join(emoticonDictionary.keys()) + r')\b')
    results = pattern.sub(lambda x: emoticonDictionary[x.group()], myString)
    print (results)
I think you could make sure each regex metacharacter is wrapped in a character class before you feed the pattern to re. For example, write something that takes :) and turns it into [:][)].
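Another way to look at it, offered only as a sketch: in the test code, \b cannot match next to : or ) because neither is a word character, and emoticonDictionary[x.group()] looks up the matched text (":)") rather than the regex key that produced it. One option is to compile each key separately and test whole whitespace-delimited tokens against them. The dictionary below is a hypothetical stand-in for the loaded JSON, with shortened image tags.

```python
import re

# Hypothetical stand-in for the loaded data: regex source -> HTML snippet
emoticon_dictionary = {
    r"MVGame": "<img src='mvgame.png'>",
    r"\:-?\)": "<img src='smile.png'>",
}

# Compile each key once; the keys are regex sources, not literal strings.
compiled = [(re.compile(rx), html) for rx, html in emoticon_dictionary.items()]

def render(message):
    def replace_token(match):
        token = match.group()
        for pattern, html in compiled:
            # The whole token must match the emoticon's regex.
            if pattern.fullmatch(token):
                return html
        return token  # not an emoticon, keep it as typed
    # \S+ visits every whitespace-delimited token and leaves the spacing untouched.
    return re.sub(r"\S+", replace_token, message)

smile = render("Hello :) everyone.")
named = render("Hello MVGame everyone.")
print(smile)
print(named)
```

The trade-off is that an emoticon glued to punctuation, such as "MVGame!", is not replaced; whether that matters depends on how the chat text is expected to look.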
