Search Patterns replacement using lambda - python

I need to write the before and after version of each search-and-replace match into a file. I have written the code below. I used a function to write to the output file, and it works fine. But I have around 20 such replacement patterns, and I feel this is not good code, because I would need to create a function for every one of those replacements. Is there another way to implement this?
import re
Report_file = open("report.txt", "w")
st = '''<TimeLog>
<InTime='10Azx'>1056789</InTime>
<OutTime='14crg'>1056867</OutTime>
<PsTime='32lxn'>1056935</PsTime>
<ClrTime='09zvf'>1057689</ClrTime>
</TimeLog>'''
def tcnv(str):
    Report_file.write("Previous TS: " + str + "\n\n")
    v1 = re.search(r"(?i)<clrtime='(\d+\w+)'>", str)
    val1 = v1.group(1)
    v2 = re.search(r"(?i)(<clrtime='(\d+\w+)'>(.*?)</clrtime>)", str)
    val2 = v2.group(3)
    soutval = "<Clzone><clnvl='" + val1 + "'>" + val2 + "</clnvl></Clzone>"
    Report_file.write("New TS: " + soutval + "\n")
    return soutval
st = re.sub(r"(?i)(<clrtime='(\d+\w+)'>(.*?)</clrtime>)", lambda m: tcnv(m.group(1)), st)
st = re.sub(r"(?i)<intime='(\d+\w+)'>(.*?)</intime>", "<Izone><Invl='\\1'>\\2</Invl></Izone>", st)
st = re.sub(r"(?i)<outtime='(\d+\w+)'>(.*?)</outtime>", "<Ozone><onvl='\\1'>\\2</onnvl></Ozone>", st)
st = re.sub(r"(?i)<pstime='(\d+\w+)'>(.*?)</pstime>", "<Pszone><psnvl='\\1'>\\2</psnvl></Pszone>", st)
Report_file.close()

I didn't see why you used the re.IGNORECASE flag in its inline form (?i), so I don't use it in the following solution; instead, the pattern is written with uppercase letters where necessary, according to your sample.
Note that you should use the with statement to open files; it is far better:
with open('filename.txt','rb') as f:
    ch = f.read()
The answer
import re
st = '''<InTime='10Azx'>1056789</InTime>
<OutTime='14crg'>1056867</OutTime>
<PsTime='32lxn'>1056935</PsTime>
<ClrTime='09zvf'>1057689</ClrTime>
'''
d = dict(zip(('InTime','OutTime','PsTime','ClrTime'),
(('Izone><Invl','/Invl></Izone'),
('Ozone><onvl','/onnvl></Ozone'),
('Pszone><psnvl','/psnvl></Pszone'),
('Clzone><clnvl','/clnvl></Clzone'))
)
)
def ripl(ma, d=d):
    # d maps the matched tag name to (opening part, closing part) of the new tags
    return "<{}='{}'>{}<{}>".format(d[ma.group(1)][0],
                                    ma.group(2),
                                    ma.group(3),
                                    d[ma.group(1)][1])
st2 = re.sub(r"<(InTime|OutTime|PsTime|ClrTime)='(\d+\w+)'>(.*?)</\1>",
             ripl, st)
print('%s\n\n%s\n' % (st, st2))
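If you also want to keep the before/after report from the question, the same one-pattern idea can do the logging. This is only a sketch under a few assumptions: ZONES and convert are hypothetical names, the zone/value tag names are copied from the question (using onvl for both the opening and the closing Ozone value tag), matching stays case-insensitive like the original (?i) patterns, and the report is any writable file-like object you pass in.

```python
import re

# Hypothetical table: lower-case tag name -> (zone element, value element).
# Tag names are taken from the question; add one entry per extra pattern.
ZONES = {
    'intime': ('Izone', 'Invl'),
    'outtime': ('Ozone', 'onvl'),
    'pstime': ('Pszone', 'psnvl'),
    'clrtime': ('Clzone', 'clnvl'),
}

# One alternation built from the table; </\1> requires the matching close tag.
PATTERN = re.compile(
    r"<({})='(\d+\w+)'>(.*?)</\1>".format("|".join(map(re.escape, ZONES))),
    re.IGNORECASE)


def convert(text, report):
    """Replace every known time tag in one pass, logging each before/after pair."""
    def repl(m):
        zone, val = ZONES[m.group(1).lower()]
        new = "<{0}><{1}='{2}'>{3}</{1}></{0}>".format(
            zone, val, m.group(2), m.group(3))
        report.write("Previous TS: " + m.group(0) + "\n\n")
        report.write("New TS: " + new + "\n")
        return new
    return PATTERN.sub(repl, text)
```

With a real file this would be used as `with open("report.txt", "w") as report: st = convert(st, report)`. Adding another replacement then means adding one dictionary entry instead of writing another function.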

Related

String Operation on captured group in re Python

I have a string:
str1 = "abc = def"
I want to convert it to:
str2 = "abc = #Abc#"
I am trying this:
re.sub("(\w+) = (\w+)",r"\1 = %s" % ("#"+str(r"\1").title()+"#"),str1)
but it returns: (without the string operation done)
"abc = #abc#"
What is the reason .title() is not working?
How to use string operation on the captured group in python?
You can see what's going on with the help of a little function:
import re
str1 = "abc = def"
def fun(m):
    print("In fun(): " + m)
    return m
str2 = re.sub(r"(\w+) = (\w+)",
r"\1 = %s" % ("#" + fun(r"\1") + "#"),
# ^^^^^^^^^^
str1)
Which yields
In fun(): \1
So what you are actually doing is calling .title() on the literal string \1 (not on the captured text!), and the title-cased version of \1 is obviously still \1. The \1 is replaced with the captured content only later, inside re.sub, long after str.title() has already run.
Go with a lambda function as proposed by @Rakesh.
Try using lambda.
Ex:
import re
str1 = "abc = def"
print( re.sub(r"(?P<one>(\w+)) = (\w+)",lambda match: r'{0} = #{1}#'.format(match.group('one'), match.group('one').title()), str1) )
Output:
abc = #Abc#
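The same fix as a small reusable helper (title_values is a hypothetical name). It also shows that re.sub calls the function once per match, handing it a Match object, so .title() runs on the captured text of each match rather than on the literal "\1":

```python
import re


def title_values(text):
    # For every "key = value" match, keep the key and replace the value
    # with "#" + title-cased key + "#".
    return re.sub(r"(\w+) = \w+",
                  lambda m: "{0} = #{1}#".format(m.group(1), m.group(1).title()),
                  text)
```

For example, `title_values("abc = def\nxyz = uvw")` gives `"abc = #Abc#\nxyz = #Xyz#"`.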

Regular expression in Python issue

I have the following lines in one of my configuration files:
appPackage_name = sqlncli
appPackage_version = 11.3.6538.0
The left side is the key and the right side is value.
Now I want to be able to replace the value part with something else, given a key, in Python.
import re
Filepath = r"C:\Users\bhatsubh\Desktop\Everything\Codes\Python\OO_CONF.conf"
key = "appPackage_name"
value = "Subhayan"
searchstr = re.escape(key) + " = [\da-zA-Z]+"
replacestr = re.escape(key) + " = " + re.escape(value)
filedata = ""
with open(Filepath,'r') as File:
    filedata = File.read()
File.close()
print ("Before change:",filedata)
re.sub(searchstr,replacestr,filedata)
print ("After change:",filedata)
I assume there is something wrong with the regex I am using, but I am not able to figure out what. Can someone please help me?
Use the following fix:
import re
#Filepath = r"C:\Users\bhatsubh\Desktop\Everything\Codes\Python\OO_CONF.conf"
key = "appPackage_name"
value = "Subhayan"
#searchstr = re.escape(key) + " = [\da-zA-Z]+"
#replacestr = re.escape(key) + " = " + re.escape(value)
searchstr = r"({} *= *)[\da-zA-Z.]+".format(re.escape(key))
replacestr = r"\1{}".format(value)
filedata = "appPackage_name = sqlncli"
#with open(Filepath,'r') as File:
# filedata = File.read()
#File.close()
print ("Before change:",filedata)
filedata = re.sub(searchstr,replacestr,filedata)
print ("After change:",filedata)
See the Python demo
There are several issues. You should not escape the replacement pattern, only the literal user-defined values in the regex pattern. You can use a capturing group (a pair of unescaped (...)) and a backreference (here, \1, since there is only one group in the pattern) to restore the part of the matched string you need to keep, rather than building that replacement string dynamically. As the version value contains dots, you should add a . to the character class: [\da-zA-Z.]. Finally, re.sub does not modify the string in place (Python strings are immutable); it returns a new string, so you need to assign the result back to filedata.
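One more thing to watch: with the template r"\1{}".format(value), a value that starts with a digit (say 12.0) is glued onto the backreference, giving \112.0, which re then parses as a different escape or group number instead of "group 1 followed by 12.0". A minimal sketch that avoids this, assuming one "key = value" pair per line (set_conf_value is a hypothetical helper, not part of any library): pass a function as the replacement so the value is inserted literally, and anchor the key at the start of a line.

```python
import re


def set_conf_value(text, key, value):
    """Return text with the value of the 'key = value' line for key replaced."""
    # ^ together with re.MULTILINE anchors at the start of each line, so a
    # key such as "name" does not also match inside "appPackage_name".
    pattern = r"^({} *= *)[\da-zA-Z.]+".format(re.escape(key))
    # A function replacement is inserted as-is: no backslash escapes or
    # group numbers are interpreted in value.
    return re.sub(pattern, lambda m: m.group(1) + value, text, flags=re.MULTILINE)
```

Writing the result back would then be `filedata = set_conf_value(filedata, key, value)` followed by opening the file with 'w' and writing filedata.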

Delete Specific Part Of A String

I want to delete the part after the last '/' of a string, like this:
str = "live/1374385.jpg"
formated_str = "live/"
or
str = "live/examples/myfiles.png"
formated_str = "live/examples/"
I have tried this so far (it works):
import re
j = ''
for i in re.findall('(.*?)/', str):
    j += i
    j += '/'
Output :
live/ or live/examples/
I am a beginner in Python, so I'm just curious: is there any other way to do that?
Use rsplit:
str = "live/1374385.jpg"
print (str.rsplit('/', 1)[0] + '/')
live/
str = "live/examples/myfiles.png"
print (str.rsplit('/', 1)[0] + '/')
live/examples/
You can also use .rindex string method:
s = 'live/examples/myfiles.png'
s[:s.rindex('/')+1]
#!/usr/bin/python
def removePart(_str1):
    return "/".join(_str1.split("/")[:-1]) + "/"
def main():
    print(removePart("live/1374385.jpg"))
    print(removePart("live/examples/myfiles.png"))
main()
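Another option is str.rpartition, which splits at the last separator and returns the separator itself, so the trailing '/' comes for free. A small sketch (strip_last_part is a hypothetical name):

```python
def strip_last_part(path):
    # rpartition returns (before, '/', after) split at the LAST '/';
    # head + sep keeps everything up to and including that slash.
    head, sep, _tail = path.rpartition('/')
    return head + sep
```

A design note: when the string contains no '/', rpartition returns ('', '', path), so this gives an empty string, whereas the rsplit version returns the whole name with a '/' appended and s.rindex('/') raises ValueError.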

How do I replace a specific part of a string in Python

I am currently trying to scrape Good.is. The code as it stands gives me the regular image (turn the if statement to True), but I want the higher-resolution picture. I was wondering how I could replace a certain part of the text so that I can download the high-res picture. I want to change the URL http://awesome.good.is/transparency/web/1207/invasion-of-the-drones/flash.html to http://awesome.good.is/transparency/web/1207/invasion-of-the-drones/flat.html (only the end is different). My code is:
import os, urllib, urllib2
from BeautifulSoup import BeautifulSoup
import HTMLParser
parser = HTMLParser.HTMLParser()
# make folder.
folderName = 'Good.is'
if not os.path.exists(folderName):
    os.makedirs(folderName)
list = []
# Python ranges start from the first argument and iterate up to one
# less than the second argument, so we need 36 + 1 = 37
for i in range(1, 37):
    list.append("http://www.good.is/infographics/page:" + str(i) + "/sort:recent/range:all")
listIterator1 = []
# one index per page URL; range(0,37) would run one past the end of list
listIterator1[:] = range(0, len(list))
counter = 0
for x in listIterator1:
    soup = BeautifulSoup(urllib2.urlopen(list[x]).read())
    body = soup.findAll("ul", attrs = {'id': 'gallery_list_elements'})
    number = len(body[0].findAll("p"))
    listIterator = []
    listIterator[:] = range(0,number)
    for i in listIterator:
        paragraphs = body[0].findAll("p")
        nextArticle = body[0].findAll("a")[2]
        text = body[0].findAll("p")[i]
        if len(paragraphs) > 0:
            #print image['src']
            counter += 1
            print counter
            print parser.unescape(text.getText())
            print "http://www.good.is" + nextArticle['href']
            originalArticle = "http://www.good.is" + nextArticle['href']
            article = BeautifulSoup(urllib2.urlopen(originalArticle).read())
            title = article.findAll("div", attrs = {'class': 'title_and_image'})
            getTitle = title[0].findAll("h1")
            article1 = article.findAll("div", attrs = {'class': 'body'})
            articleImage = article1[0].find("p")
            betterImage = articleImage.find("a")
            articleImage1 = articleImage.find("img")
            paragraphsWithinSection = article1[0].findAll("p")
            print betterImage['href']
            if len(paragraphsWithinSection) > 1:
                articleText = article1[0].findAll("p")[1]
            else:
                articleText = article1[0].findAll("p")[0]
            print articleImage1['src']
            print parser.unescape(getTitle)
            if not articleText is None:
                print parser.unescape(articleText.getText())
            print '\n'
            link = articleImage1['src']
            x += 1
            actually_download = False
            if actually_download:
                filename = link.split('/')[-1]
                urllib.urlretrieve(link, filename)
Have a look at str.replace. If that isn't general enough to get the job done, you'll need to use a regular expression (the re module, probably re.sub).
>>> str1="http://awesome.good.is/transparency/web/1207/invasion-of-the-drones/flash.html"
>>> str1.replace("flash","flat")
'http://awesome.good.is/transparency/web/1207/invasion-of-the-drones/flat.html'
I think the safest and easiest way is to use a regular expression:
import re
url = 'http://www.google.com/this/is/sample/url/flash.html'
newUrl = re.sub(r'flash\.html$','flat.html',url)
The "$" means only match the end of the string. This solution will behave correctly even in the (admittedly unlikely) event that your url includes the substring "flash.html" somewhere other than the end, and also leaves the string unchanged (which I assume is the correct behavior) if it does not end with 'flash.html'.
See: http://docs.python.org/library/re.html#re.sub
@mgilson has a good solution, but the problem is that it replaces all occurrences of the substring; so if the word "flash" appears elsewhere in the URL (not just in the trailing file name), you'll get multiple replacements:
>>> str = 'hello there hello'
>>> str.replace('hello','world')
'world there world'
An alternate solution is to replace the last part after / with flat.html:
>>> url = 'http://www.google.com/this/is/sample/url/flash.html'
>>> url[:url.rfind('/')+1]+'flat.html'
'http://www.google.com/this/is/sample/url/flat.html'
Using urlparse you can do a few bits and bobs:
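from urlparse import urlsplit, urlunsplit, urljoin  # Python 3: from urllib.parse import ...
s = 'http://awesome.good.is/transparency/web/1207/invasion-of-the-drones/flash.html'
url = urlsplit(s)
head, tail = url.path.rsplit('/', 1)
# keep the trailing '/' on head; without it urljoin() also drops "invasion-of-the-drones"
new_path = head + '/', 'flat.html'
print(urlunsplit(url._replace(path=urljoin(*new_path))))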
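A Python 3 sketch of the same idea using urllib.parse and posixpath (replace_last_segment is a hypothetical name). Because only the path component is edited, a query string or fragment on the URL is left untouched, which plain string slicing would not guarantee:

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def replace_last_segment(url, new_name):
    parts = urlsplit(url)
    # URL paths always use '/', so posixpath.dirname gives everything
    # before the last segment, and posixpath.join re-attaches new_name.
    new_path = posixpath.join(posixpath.dirname(parts.path), new_name)
    return urlunsplit(parts._replace(path=new_path))
```

For the URL in the question, `replace_last_segment(url, 'flat.html')` returns `http://awesome.good.is/transparency/web/1207/invasion-of-the-drones/flat.html`.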

Select lines stack python

I wrote this code:
import os
import re
import string
##
Path = 'C:/RESULT/BATCH/'
##
Nfile = 'Skin_Refined_v05'
f=open(Path + Nfile + '.inp')
n=open(Path + 'newfile.inp', 'w')
for lines, text in enumerate(f):
    found = text.find('*SURFACE')
    while found > -1:
        print found, lines, text
        found = text.find('*SURFACE', found + 1)
    n.write(text)
##
f.close()
n.close()
This is what the *.inp file looks like (usually about 30 MB):
*SURFACE, NAME = BOTTOM, TYPE = ELEMENT
40012646, S2
40012647, S2
40012648, S2
40012649, S2
40012650, S2
40012651, S2
*SURFACE, NAME = ALL_INT_TIE_1, TYPE = ELEMENT
40243687, S3
40243703, S3
40243719, S3
40243735, S3
40243751, S3
40243767, S3
**
*TIE, NAME = INTERNAL_TIE, POSITION TOLERANCE = 1.0 , ADJUST=NO
SLAVE,MASTER
*TIE, NAME = SKN_REF_1
ALL_INT_FRONT, ALL_EXT_FRONT
*TIE, NAME = SKIN_LAT
ALL_INT_LAT, ALL_EXT_LAT
*TIE, NAME = SKIN_TIE_1
ALL_INT_TIE_1, ALL_INT_TIE_2
**
*SURFACE , NAME = TOP, COMBINE = UNION
TOP_1
TOP_2
**HM_UNSUPPORTED_CARDS
*END PART
*****
What it does is clear. What I would like to achieve is to get all the lines between the *SURFACE headers that begin with a number; I will then have to arrange them differently, but I will worry about that later.
I rewrote the code because I could not get it to work as suggested. Now it creates the blocks as I need them, but how do I work on each block?
I need to separate all the elements (a number followed by S1, S2 and so on) and create groups for each block, sorted by S1, S2 and so on. The final result should look like this:
*ELSET, ELSET=TOP_S1
40221320, 40221306, 40221305, 40221304, 40221290, 40221289, 40221288, 40221274,
40221273, 40221272, 40221258, 40221257, 40221256, 40221242, 40221241, 40221240,
*SURFACE, NAME = TOP, TYPE = ELEMENT
TOP_S1,S1
import os
import re
import string
##
Path = 'C:/RESULT/BATCH/'
##
Nfile = 'Skin_Refined_v05'
f=open(Path + Nfile + '.inp')
n=open(Path + 'newfile.inp', 'w')
in_surface_block = False
for line_num, text in enumerate(f):
    found = text.find('*SURFACE')
    if found > -1:
        in_surface_block = True
        print found, line_num, text
        surface_lines = []
        continue
    if in_surface_block:
        m = re.match(r'\s*\d+\,\s*\w\d+', text)
        if m:
            mtext = m.group(0)
            ## p=surface_lines.append(text)
            print mtext
            ## ntext = surface_lines.append(m.group(0))
            ## n.write(ntext)
##
f.close()
n.close()
I hope it is clear
I think this will do what you want:
import os
import re
##
Path = 'C:/RESULT/BATCH/'
##
Nfile = 'Skin_Refined_v05'
f=open(Path + Nfile + '.inp')
n=open(Path + 'newfile.inp', 'w')
in_surface_block = False
for line_num, text in enumerate(f):
    found = text.find('*SURFACE')
    if found > -1:
        in_surface_block = True
        print found, line_num, text
        surface_lines = []
        continue
    if in_surface_block:
        if re.match(r'\s*\d+', text):
            surface_lines.append(text)
        else:
            in_surface_block = False
            # do surface lines work here:
            # surface_lines is a list with all the lines in a surface block
            # that start with a number
            pass
##
f.close()
n.close()
Edit: Fixed logic error
