Python replace() not working as expected

My script is supposed to write html files changing the html menu to show the current page as class="current_page_item" so that it will be highlighted when rendered. It has to do two replacements, first set the previous current page to be not current, then set the new current page to current. The two writeText.replace lines do not appear to have any effect. It doesn't give me an error or anything. Any suggestions would be appreciated.
for each in startList:
    sectionName = s[each:s.find("\n",each)].split()[1]
    if sectionName[-3:] <> "-->":
        end = s.find("end "+sectionName+'-->')
        sectionText = s[each+len(sectionName)+12:end-1]
        writeText = templatetop+"\n"+sectionText+"\n"+templatebottom
        writeText.replace('<li class="current_page_item">','<li>')
        writeText.replace('<li><a href="'+sectionName+'.html','<li class="current_page_item"><a href="'+sectionName+'.html')
        f = open(sectionName+".html", 'w+')
        f.write(writeText)
        f.close()
Here is part of the string I am targeting (templatetop):
<li class="current_page_item">Home</li>
<li>History</li>
<li>Members</li>

replace returns the resulting string, so you need to do this:
writeText = writeText.replace('<li class="current_page_item">','<li>')
writeText = writeText.replace('<li><a href="'+sectionName+'.html','<li class="current_page_item"><a href="'+sectionName+'.html')
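To see why the original calls had no visible effect, here is a minimal sketch. The menu markup and variable names are made up for illustration and are not the asker's real template. Python strings are immutable, so str.replace never modifies the string it is called on; it returns a new one.

```python
# Illustrative menu markup; not the asker's real template.
menu = '<li class="current_page_item">Home</li>\n<li>History</li>'

# Calling replace() and discarding the return value changes nothing.
menu.replace('<li class="current_page_item">', '<li>')
unchanged = menu

# Binding the returned string to a name keeps the result.
updated = menu.replace('<li class="current_page_item">', '<li>')

print(unchanged)
print(updated)
```

The same applies to other string methods such as strip(), lstrip() and upper(): they all return a new string.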

You should not expect that to work. The documentation for str.replace says:
Return a copy of the string with all occurrences of substring old replaced by new.

So first you replace '<li class="current_page_item">' with '<li>' and then you replace '<li>' with '<li class="current_page_item">'. That's a bit funny, I have to say.
In addition to the problem pointed out by misha, that replace returns the result, your two replacements in fact cancel each other out.
>>> writeText = """<li class="current_page_item">Home</li>
... <li>History</li>
... <li>Members</li>"""
>>> result = writeText.replace('<li class="current_page_item">','<li>')
>>> result = result.replace('<li><a href="index.html','<li class="current_page_item"><a href="index.html')
>>> result == writeText
True
Now this is just the first iteration of replacements, but it's a good indication that you are using the wrong solution. It also means you can simply remove the first of the replacements and it will still work.
Also, why are you doing the replacement on writeText, when you are only targeting templatetop?

Related

Replacing [[Words]] with other [[Words]] from a reference file in Notepad++ using Javascript

I have a translation file that looks like this:
Apple=Apfel
Apple pie=Apfelkuchen
Banana=Banane
Bananaisland=Bananen Insel
Cherry=Kirsche
Train=Zug
...500+ more lines like that
now I have a file I need to work on with text. Only certain parts of text needs to be replaced, example:
The [[Apple]] was next to the [[Banana]]. Meanwhile the [[Cherry]] was chilling by the [[Train]].
The [[Apple pie]] tastes great on the [[Bananaisland]].
Result needs to be
The [[Apfel]] was next to the [[Banane]]. Meanwhile the [[Kirsche]] was chilling by the [[Zug]].
The [[Apfelkuchen]] tastes great on the [[Bananen Insel]].
There are way too many occurrences to copy/paste manually. What is an easy way to search for [[XXX]] and replace it using another file as described?
I tried getting help for this for many hours but to no avail. The closest I have gotten was this script:
import re
separators = "=", "\n"
def custom_split(sepr_list, str_to_split):
    # create regular expression dynamically
    regular_exp = '|'.join(map(re.escape, sepr_list))
    return re.split(regular_exp, str_to_split)
with open('D:/_working/paired-search-replace.txt') as f:
    for l in f:
        s = custom_split(separators, l)
        editor.replace(s[0], s[1])
However, this replaces too much, or replaces inconsistently. E.g. [[Apple]] is correctly replaced by [[Apfel]], but [[File:Apple.png]] is wrongly replaced by [[File:Apfel.png]] and [[Apple pie]] becomes [[Apfel pie]]. I tried tweaking the regular expression for hours on end to no avail. Does anyone have any info, in very simple terms please, on how I can fix this and achieve my goal?
This is a little tricky because [ is a meta character in regex.
I'm sure there is a more efficient way to do it but this works:
replaces="""Apple=Apfel
Apple pie=Apfelkuchen
Banana=Banane
Bananaisland=Bananen Insel
Cherry=Kirsche
Train=Zug"""
text = """
The [[Apple]] was next to the [[Banana]]. Meanwhile the [[Cherry]] was chilling by the [[Train]].
The [[Apple pie]] tastes great on the [[Bananaisland]].
"""
if __name__ == '__main__':
    import re
    for replace in replaces.split('\n'):
        english, german = replace.split('=')
        # re.escape keeps any regex metacharacters in the English phrase literal
        text = re.sub(rf'\[\[{re.escape(english)}\]\]', f'[[{german}]]', text)
    print(text)
outputs:
The [[Apfel]] was next to the [[Banane]]. Meanwhile the [[Kirsche]] was chilling by the [[Zug]].
The [[Apfelkuchen]] tastes great on the [[Bananen Insel]].
First, read in the file with translations:
translations={}
with open('file/with/translations.txt', 'r', encoding='utf-8') as f:
    for line in f:
        items = line.strip().split('=', 1)
        translations[items[0]] = items[1]
I assume the phrases/words are unique in the file.
Then, you need to match all substrings between [[ and ]], capture the text in between (with a regex like \[\[(.*?)]]), check whether the group 1 value is a key in the translations dictionary, and replace the match with [[ + dictionary value + ]] if it is, or return the whole match unchanged if there is no such translation:
text = """The [[Apple]] was next to the [[Banana]]. Meanwhile the [[Cherry]] was chilling by the [[Train]].
The [[Apple pie]] tastes great on the [[Bananaisland]]."""
import re
translated_text = re.sub(r"\[\[(.*?)]]", lambda x: f'[[{translations[x.group(1)]}]]' if x.group(1) in translations else x.group(), text)
Output:
>>> translated_text
'The [[Apfel]] was next to the [[Banane]]. Meanwhile the [[Kirsche]] was chilling by the [[Zug]]. \nThe [[Apfelkuchen]] tastes great on the [[Bananen Insel]].'
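The same idea can be sketched with a named callback instead of a lambda, which makes it easier to see why [[File:Apple.png]] and [[Apple pie]] are handled correctly: only the complete text between the brackets is looked up. The dictionary below is example data and translate_link is a hypothetical helper name; in practice the dictionary would be loaded from the translation file as shown above.

```python
import re

# Example data; in practice this would be loaded from the translation file.
translations = {
    "Apple": "Apfel",
    "Apple pie": "Apfelkuchen",
    "Bananaisland": "Bananen Insel",
}

def translate_link(match):
    """Translate the text inside [[...]] only when it is an exact dictionary key."""
    inner = match.group(1)
    # dict.get falls back to the original text when there is no translation.
    return "[[" + translations.get(inner, inner) + "]]"

text = "The [[Apple pie]] and the [[Apple]], see [[File:Apple.png]] on [[Bananaisland]]."
translated = re.sub(r"\[\[(.*?)\]\]", translate_link, text)
print(translated)
```

Because the lookup uses the whole captured phrase, the order of the entries in the translation file does not matter.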

Replacing specific substrings in a specific part of a string

I have the following text file that needs to be edited in a certain way. Only the part of the file inside the (:init section should be overwritten; nothing else should change.
File:
(define (problem bin-picking-doosra)
(:domain bin-picking-second)
;(:requirements :typing :negative-preconditions)
(:objects
)
(:init
(batsmen first_batsman)
(bowler none_bowler)
(umpire third_umpire)
(spectator no_spectator)
)
(:goal (and
(batsmen first_batsman)
(bowler last_bowler)
(umpire third_umpire)
(spectator full_spectator)
)
)
)
In this file I want to replace every line inside the (:init section with the required string. In this case, I want to replace:
1. (batsmen first_batsman) with (batsmen none_batsman)
2. (bowler none_bowler) with (bowler first_bowler)
3. (umpire third_umpire) with (umpire leg_umpire)
4. (spectator no_spectator) with (spectator empty_spectator)
The code I currently have is the following:
file_path = "/home/mus/problem_turtlebot.pddl"
s = open(file_path).read()
s = s.replace('(batsmen first_batsman)', '(batsmen '+ predicate_batsmen + '_batsman)')
f = open(file_path, 'w')
f.write(s)
f.close()
The variable predicate_batsmen here contains the word none. This works, but it only covers point 1 above.
I have three problems:
1. This code also changes '(batsmen first_batsman)' in the (:goal part, which I don't want. I only want to change the (:init part.
2. For the other strings in the (:init part, I currently have to repeat this code with a different statement. For example, for '(bowler none_bowler)', i.e. point 2 above, I need another copy of the same lines, which I don't think is good coding practice. Is there a better way?
3. Consider the first string in (:init that is to be overwritten, i.e. (batsmen first_batsman). Is there a way in Python to replace it no matter what is written in the question-mark part of (batsmen ??????_batsman)? For now it is 'first', but even if it were 'second' ((batsmen second_batsman)) or 'last' ((batsmen last_batsman)), I want to replace it with 'none' ((batsmen none_batsman)).
Any ideas on these issues?
Thanks
First of all you need to find the init-group. The init-group seems to have the structure:
(:init
...
)
where ... is a repeated run of text enclosed in parentheses, e.g. "(batsmen first_batsman)". Regular expressions are a powerful way to locate this kind of pattern in text. If you are not familiar with regular expressions (regex for short), it is worth reading an introduction to them first.
The following regex locates this group:
import re
#Matches the items in the init-group:
item_regex = r"\([\w ]+\)\s+"
#Matches the init-group including items:
init_group_regex = re.compile(r"(\(:init\s+({})+\))".format(item_regex))
init_group = init_group_regex.search(s).group()
Now you have the init-group in init_group. The next step is to locate the terms you want to replace and actually replace them. re.sub can do just that! First store the mappings in a dictionary:
mappings = {'batsmen first_batsman': 'batsmen '+ predicate_batsmen + '_batsman',
            'bowler none_bowler': 'bowler first_bowler',
            'umpire third_umpire': 'umpire leg_umpire',
            'spectator no_spectator': 'spectator empty_spectator'}
Finding the occurrences and replacing them by their corresponding value one-by-one:
for key, val in mappings.items():
    init_group = re.sub(key, val, init_group)
Finally you can replace the init-group in the original string:
s = init_group_regex.sub(init_group, s)
This is really flexible! Because the keys in mappings are treated as regular expressions, they can match anything you like, for example:
mappings = {r'batsmen \w+_batsman': 'batsmen '+ predicate_batsmen + '_batsman'}
to match 'batsmen none_batsman', 'batsmen first_batsman' etc.
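Put together, the steps above can be sketched as one self-contained script. This is a variation rather than the answer's exact code: it passes a callback to re.sub so the (:init block is found and rewritten in a single pass, and it uses a shortened copy of the file as sample data. The names pddl, init_re and rewrite_init are made up for the example.

```python
import re

# Shortened sample of the file from the question.
pddl = """(define (problem bin-picking-doosra)
(:init
(batsmen first_batsman)
(bowler none_bowler)
)
(:goal (and
(batsmen first_batsman)
(bowler last_bowler)
)
)
)"""

# Regex pattern -> replacement text, applied only inside the (:init block.
mappings = {
    r"batsmen \w+_batsman": "batsmen none_batsman",
    r"bowler \w+_bowler": "bowler first_bowler",
}

# "(:init", then one or more "(word word)" items, then the closing ")".
init_re = re.compile(r"\(:init\s+(?:\([\w ]+\)\s+)+\)")

def rewrite_init(match):
    """Apply every mapping to the matched (:init block and return the new block."""
    block = match.group()
    for pattern, replacement in mappings.items():
        block = re.sub(pattern, replacement, block)
    return block

result = init_re.sub(rewrite_init, pddl)
print(result)
```

Since the (:goal block never matches init_re, its (batsmen first_batsman) line is left alone. A string returned from a callback is inserted literally, so it is not scanned for backslash escapes the way a plain replacement string would be.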

Python: use a list index as a function argument

I'm trying to use list indices as arguments for a function that performs regex searches and substitutions over some text files. The different search patterns have been assigned to variables and I've put the variables in a list that I want to feed the function as it loops through a given text.
When I call the function using a list index as an argument nothing happens (the program runs, but no substitutions are made in my text files), however, I know the rest of the code is working because if I call the function with any of the search variables individually it behaves as expected.
When I give the print function the same list index as I'm trying to use to call my function it prints exactly what I'm trying to give as my function argument, so I'm stumped!
search1 = re.compile(r'pattern1')
search2 = re.compile(r'pattern2')
search3 = re.compile(r'pattern3')
searches = ['search1', 'search2', 'search2']
i = 0
for …
…
def fun(find):
…
fun(searches[i])
if i <= 2:
i += 1
…
As mentioned, if I use fun(search1) the script edits my text files as wished. Likewise, if I add the line print(searches[i]) it prints search1 (etc.), which is what I'm trying to give as an argument to fun.
Being new to Python and programming, I have a limited investigative skill set, but after poking around as best I could and then running print(searches.index(search1)) and getting a "pattern1 is not in list" error, my leading (and only) theory is that I'm giving my function the actual regular expression rather than the variable it's stored in?
Much thanks for any forthcoming help!
Try changing your searches list to [search1, search2, search3] instead of ['search1', 'search2', 'search2']. The quoted version holds plain strings, not the compiled regex objects.
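A small sketch of the difference, using made-up patterns and a hypothetical fun that takes the text as a second argument (the asker's real patterns and function body are elided in the question). A list of quoted names contains strings that merely spell the variable names, so the function ends up searching for the literal text "search1".

```python
import re

search1 = re.compile(r"\d+")      # made-up pattern: one or more digits
search2 = re.compile(r"[aeiou]")  # made-up pattern: a lowercase vowel

names = ["search1", "search2"]    # strings that only spell the variable names
patterns = [search1, search2]     # the compiled pattern objects themselves

def fun(find, text):
    """Replace every match of find in text with '#'."""
    return re.sub(find, "#", text)

text = "room 101"
by_name = fun(names[0], text)        # looks for the literal text "search1"
by_pattern = fun(patterns[0], text)  # looks for digits

print(by_name)
print(by_pattern)
```

print(names[0]) shows search1 because that is the content of the string, which is why the print check in the question looked correct.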
Thanks to all for the help. eyl327's comment that I should use a list or dictionary to store my regular expressions pointed me in the right direction.
However, because I was using regex in my search patterns, I couldn't get it to work until I also created a list of compiled expressions (discovered via this thread on stored regex strings).
I very much appreciate juanpa.arrivillaga's point that I should have provided an MRE (please forgive me; with a highly limited skill set, this in itself can be hard to do). I'll just give an excerpt of a slightly amended version of my actual code demonstrating the answer (once again, please forgive its long-windedness; I'm not presently able to do anything more elegant):
…
# put regex search patterns in a list
rawExps = ['search pattern 1', 'search pattern 2', 'search pattern 3']
# create a new list of compiled search patterns
compiledExps = [regex.compile(expression, regex.V1) for expression in rawExps]
i = 0
storID = 0
newText = ""
for file in filepathList:
    for expression in compiledExps:
        with open(file, 'r') as text:
            thisText = text.read()
            lines = thisText.splitlines()
            setStorID = regex.search(compiledExps[i], thisText)
            if setStorID is not None:
                storID = int(setStorID.group())
            for line in lines:
                def idSub(find):
                    global storID
                    global newText
                    match = regex.search(find, line)
                    if match is not None:
                        newLine = regex.sub(find, str(storID), line) + "\n"
                        newText = newText + newLine
                        storID = plus1(int(storID), 1)
                    else:
                        newLine = line + "\n"
                        newText = newText + newLine
                # list index number can be used as an argument in the function call
                idSub(compiledExps[i])
            if i <= 2:
                i += 1
        write()
        newText = ""
    i = 0

How to pass string variable into search function?

I am having issues passing a string variable into a search function.
Here is what I'm trying to accomplish:
I have a file full of values and I want to check the file to make sure a specific matching line exists before I proceed. I want to ensure that the line <endSW=UNIQUE-DNS-NAME-HERE<> exists if a valid <begSW=UNIQUE-DNS-NAME-HERE<> exists and is reachable.
Everything works fine until I call if searchForString(searchString,fileLoc): which always returns false. If I assign the variable 'searchString' a direct value and pass it it works, so I know it must be something with the way I'm combining the strings, but I can't seem to figure out what I'm doing wrong.
If I examine the data that 'searchForString' is using I see what seems to be valid values:
values in fileLines list:
['<begSW=UNIQUE-DNS-NAME-HERE<>', ' <begPortType=UNIQUE-PORT-HERE<>', ' <portNumbers=80,443,22<>', ' <endPortType=UNIQUE-PORT-HERE<>', '<endSW=UNIQUE-DNS-NAME-HERE<>']
value of searchVar:
<endSW=UNIQUE-DNS-NAME-HERE<>
An example of the entry in the file is:
<begSW=UNIQUE-DNS-NAME-HERE<>
<begPortType=UNIQUE-PORT-HERE<>
<portNumbers=80,443,22<>
<endPortType=UNIQUE-PORT-HERE<>
<endSW=UNIQUE-DNS-NAME-HERE<>
Here is the code in question:
def searchForString(searchVar,readFile):
    with open(readFile) as findMe:
        fileLines = findMe.read().splitlines()
        print fileLines
        print searchVar
        if searchVar in fileLines:
            return True
        return False
        findMe.close()
fileLoc = '/dir/folder/file'
fileLoc.lstrip()
fileLoc.rstrip()
with open(fileLoc,'r') as switchFile:
    for line in switchFile:
        #declare all the vars we need
        lineDelimiter = '#'
        endLine = '<>\n'
        begSWLine= '<begSW='
        endSWLine = '<endSW='
        begPortType = '<begPortType='
        endPortType = '<endPortType='
        portNumList = '<portNumbers='
        #skip over commented lines -(REMOVE THIS)
        if line.startswith(lineDelimiter):
            pass
        #checks the file for a valid switch name
        #checks to see if the host is up and reachable
        #checks to see if there is a file input is valid
        if line.startswith(begSWLine):
            #extract switch name from file
            switchName = line[7:-3]
            #check to make sure switch is up
            if pingCheck(switchName):
                print 'Ping success. Host is reachable.'
                searchString = endSWLine+switchName+'<>'
                #THIS PART IS SUCKING, WORKS WITH DIRECT STRING PASS
                #WONT WORK WITH A VARIABLE
                if searchForString(searchString,fileLoc):
                    print 'found!'
                else:
                    print 'not found'
Any advice or guidance would be extremely helpful.
Hard to tell without the file's contents, but I would try
switchName = line[7:-2]
So that would look like
>>> '<begSW=UNIQUE-DNS-NAME-HERE<>'[7:-2]
'UNIQUE-DNS-NAME-HERE'
Additionally, you could look into regex searches to make your cleanup more versatile.
import re
# re.findall(search_expression, string_to_search)
>>> re.findall(r'\=(.+)(?:\<)', '<begSW=UNIQUE-DNS-NAME-HERE<>')[0]
'UNIQUE-DNS-NAME-HERE'
>>> re.findall(r'\=(.+)(?:\<)', ' <portNumbers=80,443,22<>')[0]
'80,443,22'
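Fixed slice offsets such as [7:-3] depend on exactly how each line ends. As a hedged sketch, assuming the mismatch comes from stray whitespace or line endings (the question does not confirm this), the line can be stripped and parsed as a whole instead. The names TAG_RE, parse_tag and has_end_tag are hypothetical helpers, not part of the original code.

```python
import re

# Assumed line format: "<key=value<>", possibly surrounded by whitespace.
TAG_RE = re.compile(r"<(\w+)=(.*?)<>")

def parse_tag(line):
    """Return (key, value) for a tagged line, or None if the line does not match."""
    match = TAG_RE.fullmatch(line.strip())
    return match.groups() if match else None

def has_end_tag(lines, switch_name):
    """Check whether an <endSW=...<> line exists for the given switch name."""
    target = ("endSW", switch_name)
    return any(parse_tag(line) == target for line in lines)

# Sample lines with mixed indentation and line endings.
lines = [
    "<begSW=UNIQUE-DNS-NAME-HERE<>\r\n",
    " <portNumbers=80,443,22<>\n",
    "<endSW=UNIQUE-DNS-NAME-HERE<>",
]
name = parse_tag(lines[0])[1]
found = has_end_tag(lines, name)
print(name, found)
```

Comparing parsed (key, value) pairs rather than raw lines means indentation, "\n" versus "\r\n", and a missing final newline no longer affect the result.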
I found the question "How to recursively iterate over XML tags in Python using ElementTree?" and used the methods detailed there to parse an XML file instead of a TXT file.

String comparing in python

I have an array of strings like
urls_parts=['week', 'weeklytop', 'week/day']
And I need to check whether these strings are included in my URL, so this example should be triggered by the weeklytop part only:
url='www.mysite.com/weeklytop/2'
for part in urls_parts:
    if part in url:
        print part
But it is of course triggered by 'week' too.
What is the way to do it right?
Oops, let me make my question a bit more specific.
I need that code not to trigger when url='www.mysite.com/week/day/2' and part='week'
The only url needed to trigger on is when the part='week' and the url='www.mysite.com/week/2' or 'www.mysite.com/week/2-second' for example
This is how I would do it.
import re
urls_parts=['week', 'weeklytop', 'week/day']
urls_parts = sorted(urls_parts, key=len, reverse=True)
rexes = [re.compile(r'{part}\b'.format(part=part)) for part in urls_parts]
urls = ['www.mysite.com/weeklytop/2', 'www.mysite.com/week/day/2', 'www.mysite.com/week/4']
for url in urls:
    for i, rex in enumerate(rexes):
        if rex.search(url):
            print url
            print urls_parts[i]
            print
            break
OUTPUT
www.mysite.com/weeklytop/2
weeklytop
www.mysite.com/week/day/2
week/day
www.mysite.com/week/4
week
The suggestion to sort by length came from @Roman.
Sort your list by length and break from the loop at the first match.
Try something like this:
>>> import re
>>> print(re.findall(r'\bweeklytop\b', 'www.mysite.com/weeklytop/2'))
['weeklytop']
>>> print(re.findall(r'\bweek\b', 'www.mysite.com/weeklytop/2'))
[]
Program:
>>> urls_parts=['week', 'weeklytop', 'week/day']
>>> url='www.mysite.com/weeklytop/2'
>>> for parts in urls_parts:
...     if re.findall(r'\b' + re.escape(parts) + r'\b', url):
...         print(parts)
output:
weeklytop
Why not use urls_parts like this?
['/week/', '/weeklytop/', '/week/day/']
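The slash-delimited idea can be combined with the longest-first ordering suggested earlier. The sketch below is one possible way to do that, not code from any of the answers; match_part is a hypothetical helper, and it assumes URLs without a scheme such as "http://", as in the question.

```python
def match_part(url, parts):
    """Return the longest part that matches whole path segments of url, or None."""
    # Drop the host: everything up to the first "/" (assumes no "http://" prefix).
    _, _, path = url.partition("/")
    padded = "/" + path + "/"
    # Try longer parts first so 'week/day' wins over 'week'.
    for part in sorted(parts, key=len, reverse=True):
        if "/" + part + "/" in padded:
            return part
    return None

urls_parts = ['week', 'weeklytop', 'week/day']
for url in ['www.mysite.com/weeklytop/2',
            'www.mysite.com/week/day/2',
            'www.mysite.com/week/2-second']:
    print(url, match_part(url, urls_parts))
```

Wrapping both the path and the part in slashes means 'week' can only match a complete segment, so it no longer fires on 'weeklytop'.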
A slight change in your code would solve this issue:
>>> for part in urls_parts:
...     if part in url.split('/'):  # split the url string with '/' as delimiter
...         print part
...
weeklytop
