Replace substring surrounding main string

I have a lot of strings like the following:
\frac{l_{2}\,\mathrm{phi2dd}\,\sin\left(\varphi _{2}\right)}{2}
I want to replace \frac{***}{2} with \frac{1}{2} ***
The desired string would then become:
\frac{1}{2} l_{2}\,\mathrm{phi2dd}\,\sin\left(\varphi _{2}\right)
I thought I could use a regular expression to do so, but I can't quite figure out how to extract the 'main string' from the substring.
Update: I simplified the problem a bit too much. The strings I have to replace actually contain multiple 'fracs', like so:
I_{2}\,\mathrm{phi2dd}-\frac{l_{2}\,\mathrm{lm}_{4}\,\cos\left(\varphi _{2}\right)}{2}+\frac{l_{2}\,\mathrm{lm}_{3}\,\sin\left(\varphi _{2}\right)}{2}=0
I don't know how many occurrences each string contains; the number varies.

Match using \\frac\{(.*?)\}\{2} and substitute using \\frac{1}{2} \1
Updated code:
import re
regex = r"\\frac\{(.*?)\}\{2}"
test_str = r"I_{2}\,\mathrm{phi2dd}-\frac{l_{2}\,\mathrm{lm}_{4}\,\cos\left(\varphi _{2}\right)}{2}+\frac{l_{2}\,\mathrm{lm}_{3}\,\sin\left(\varphi _{2}\right)}{2}=0"
subst = r"\\frac{1}{2} \1"
# count=0 (the default) replaces every occurrence
result = re.sub(regex, subst, test_str, count=0)
print(result)
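The lazy (.*?) stops at the first }{2} after each \frac{, which is enough for the strings in the question. It can, however, run past a fraction whose denominator is not 2: on \frac{x}{3} + \frac{y}{2} it captures x}{3} + \frac{y. A minimal sketch that finds each numerator by counting brace depth instead, assuming the braces in the input are balanced (halve_fracs is a hypothetical helper name, not part of any library):

```python
def halve_fracs(s: str) -> str:
    """Rewrite each \\frac{X}{2} in s as \\frac{1}{2} X, matching braces by depth."""
    prefix = "\\frac{"
    out = []
    i = 0
    while True:
        j = s.find(prefix, i)
        if j == -1:
            out.append(s[i:])
            return "".join(out)
        # Walk from just after the opening brace, tracking brace depth,
        # until the brace that closes the numerator has been consumed.
        start = j + len(prefix)
        k, depth = start, 1
        while k < len(s) and depth:
            if s[k] == "{":
                depth += 1
            elif s[k] == "}":
                depth -= 1
            k += 1
        if depth == 0 and s.startswith("{2}", k):
            numerator = s[start:k - 1]
            out.append(s[i:j])
            # Recurse so that fractions nested in the numerator are rewritten too.
            out.append("\\frac{1}{2} " + halve_fracs(numerator))
            i = k + len("{2}")
        else:
            # Not a /2 fraction: keep "\frac{" and continue scanning inside it.
            out.append(s[i:start])
            i = start


eq = r"I_{2}\,\mathrm{phi2dd}-\frac{l_{2}\,\mathrm{lm}_{4}\,\cos\left(\varphi _{2}\right)}{2}+\frac{l_{2}\,\mathrm{lm}_{3}\,\sin\left(\varphi _{2}\right)}{2}=0"
print(halve_fracs(eq))
print(halve_fracs(r"\frac{x}{3} + \frac{y}{2}"))
```

Python's re module cannot count nested braces, which is why the sketch scans the characters directly rather than using a single pattern.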

Related

Python replace between two chars (no split function)

I am currently working on a problem where I want to replace something in a string.
For example, I have the following string:
'123.49, 19.30, 02\n'
I only want the first two numbers, like '123.49, 19.30'. The split function is not an option, because I have a lot of data and only some of it has the last number.
I tried something like this:
import re as regex
#result = regex.match(', (.*)\n', string)
result = re.search(', (.*)\\n', string)
print(result.group(1))
This does not work. Can someone help me?
Thanks in advance

You could do something like this:
import re
your_text = '123.49, 19.30, 02\n'
reg = r'(\d+\.\d+), (\d+\.\d+).*'
match = re.search(reg, your_text)  # search once, then check for None
if match:
    first_num = match.group(1)
    second_num = match.group(2)

Alternatively, add the ^ anchor at the beginning to make sure only the first two numbers are taken:
import re
string = '123.49, 19.30, 02\n'
# raw string; \. matches a literal dot
pattern = re.compile(r'^(\d*\.?\d*), (\d*\.?\d*)')
result = pattern.findall(string)
print(result)
Output:
[('123.49', '19.30')]

In your code you use import re as regex. If you do that, you have to call regex.search instead of re.search.
But in this case you can simply use import re.
If you use , (.*) you capture everything after the first occurrence of ", ", and you are not taking the digits into account.
If you want the first two numbers separated by commas, '123.49, 19.30' as stated in the question, you can match them without using capture groups:
\b\d+\.\d+,\s*\d+\.\d+\b
Or match one or more repetitions, each preceded by a comma:
\b\d+\.\d+(?:,\s*\d+\.\d+)+\b
Since re.search returns None when there is no match, check the result first; there is no need to call re.search twice:
import re
regex = r"\b\d+\.\d+(?:,\s*\d+\.\d+)+\b"
s = "123.49, 19.30, 02"
match = re.search(regex, s)
if match:
    print(match.group())
Output:
123.49, 19.30
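The question mentions a lot of data in which only some lines carry the third number. A minimal sketch applying an anchored pattern line by line; the sample lines are made up for illustration, and lines that do not start with two decimal numbers are simply skipped:

```python
import re

# Hypothetical sample data: some lines have a third value, some do not.
lines = ["123.49, 19.30, 02\n", "7.5, 0.25\n", "10.00, 3.14, 99\n"]

# Anchored at the start of each line: two decimals separated by a comma.
pattern = re.compile(r"^\s*(\d+\.\d+),\s*(\d+\.\d+)")

pairs = []
for line in lines:
    m = pattern.match(line)
    if m:  # skip lines that do not start with two decimal numbers
        pairs.append((m.group(1), m.group(2)))

print(pairs)
```

Because the pattern stops after the second number, it does not matter whether a line has a third value.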

Python Regular Expression: re.findall doesn't see all matches

I would like to find all matches of this pattern: (one letter)(three digits)(two letters)(two or three digits).
So my Python regular expression is:
[А,В,Е,К,М,Н,О,Р,С,Т,У,Х]\d{3}[А,В,Е,К,М,Н,О,Р,С,Т,У,Х]{2}\d{2,3}
where
[А,В,Е,К,М,Н,О,Р,С,Т,У,Х] is the set of letters;
\d{num} matches any digit repeated num times.
I wrote this code to solve my problem:
import re
pattern = r"[А,В,Е,К,М,Н,О,Р,С,Т,У,Х]\d{3}[А,В,Е,К,М,Н,О,Р,С,Т,У,Х]{2}\d{2,3}"
string = "A123AA11 А222АА123 A12AA123 A123CC1234 AA123A12"
re.findall(pattern, string)
I expect to see this list of strings: ['A123AA11', 'А222АА123']
But I got this one: ['А222АА123']
What is the problem? Where did I make a mistake?

The A in your regex is the Cyrillic letter А (U+0410, decimal 1040), not the Latin ASCII A (U+0041, decimal 65), so it cannot match a Latin A such as the one in A123AA11:
print(ord("А")) # 1040
print(ord("A")) # 65
Also, a character class in square brackets matches any single one of the characters listed inside it, so commas are not separators: [А,В,Е,К,М,Н,О,Р,С,Т,У,Х] also matches a literal comma. With Latin letters, all you need is [ABEKMHOPCTYX].
With that change:
import re
string = "A123AA11 A222AA123 A12AA123 A123CC1234 A123A12"
pattern = r"[ABEKMHOPCTYX]\d{3}[ABEKMHOPCTYX]{2}\d{2,3}"
print(re.findall(pattern, string)) # ['A123AA11', 'A222AA123', 'A123CC123']
To match only words that fully match the pattern, use word boundaries \b
pattern = r"\b[ABEKMHOPCTYX]\d{3}[ABEKMHOPCTYX]{2}\d{2,3}\b"
print(re.findall(pattern, string)) # ['A123AA11', 'A222AA123']
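If the real data can contain values typed with either Latin or Cyrillic letters, one option (an assumption, not something the answer above proposes) is to put both lookalike sets in the class. A minimal sketch; the Cyrillic letters are written as \u escapes so they cannot be confused with Latin ones, and unicodedata.name shows which script a character comes from. Note that a token mixing both scripts would also match:

```python
import re
import unicodedata

latin = "ABEKMHOPCTYX"
# Cyrillic capitals that look like the Latin letters above, in the same order.
cyrillic = "\u0410\u0412\u0415\u041a\u041c\u041d\u041e\u0420\u0421\u0422\u0423\u0425"

letters = "[" + latin + cyrillic + "]"
pattern = re.compile(r"\b" + letters + r"\d{3}" + letters + r"{2}\d{2,3}\b")

# The question's test string, with its second token written in Cyrillic.
text = "A123AA11 \u0410222\u0410\u0410123 A12AA123 A123CC1234 AA123A12"
found = pattern.findall(text)
print(found)

# Show which script each letter of the second match comes from.
print([unicodedata.name(ch) for ch in found[1] if not ch.isdigit()])
```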

RegEx for capturing and replacing digits in a pattern

I would like to replace the third argument in the string with a new number (say 100). The matched string always starts with function, the first argument is either true or false, and the second argument is a number.
Expected (only the third argument changes):
'function(true, 0, 15)' --> 'function(true, 0, 100)'
'function(false, 0, 23)' --> 'function(false, 0, 100)'
I have been reading related posts, but I believe I must have misunderstood some regex concept. The following code is what I tried, but it always replaces the whole string:
import re
string = 'function(true, 0, 15)'
regex = re.compile(r'function\([a-zA-Z]*, [0-9]*, ([0-9]*)\)')
res = re.sub(regex, '100', string)
print(res) # 100
# Expected: function(true, 0, 100)
Question: Could you point out why the above code doesn't work? How would I write the code to achieve the expected result?

As the number you are trying to replace is directly followed by a closing parenthesis ), you can use the regex \d+(?=\s*\)) and replace the match with 100 or whatever value you want. Try this Python code:
import re
string = 'function(true, 0, 15)'
regex = re.compile(r'\d+(?=\s*\))')
res = re.sub(regex, '100', string)
print(res)
Prints:
function(true, 0, 100)
Your code replaces the whole string with 100 because your regex matches the entire input, and re.sub replaces everything the regex matches with its second argument. To change only the third argument, the regex should match only that value, which is what \d+(?=\s*\)) does.
Alternatively, if you prefer to match the whole input and replace only the third argument, capture the function name and the first two parameters in group 1, as your original regex was trying to do:
(function\([a-zA-Z]*, [0-9]*, )[0-9]*\)
and replace the match with \g<1>100), where \g<1> inserts the text captured in group 1 and 100) is appended after it. The \g<1> form keeps the group number separate from the digits that follow it.
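The two replacement styles for the full-match pattern can be sketched as follows. The compiled pattern is the one from this answer; set_third_arg is a hypothetical helper showing a callable replacement, which sidesteps template escapes such as \g<1> when the new value comes from a variable:

```python
import re

pattern = re.compile(r"(function\([a-zA-Z]*, [0-9]*, )[0-9]*\)")

def set_third_arg(s: str, value: int) -> str:
    # Callable replacement: group 1 is kept verbatim and the new value
    # plus the closing parenthesis are appended, with no template escapes.
    return pattern.sub(lambda m: f"{m.group(1)}{value})", s)

# Template replacement: \g<1> re-inserts group 1, then the literal "100)".
print(pattern.sub(r"\g<1>100)", "function(true, 0, 15)"))
print(set_third_arg("function(false, 0, 23)", 100))
```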

This expression might also work:
(?:\d+)(\))
It matches the digits (the non-capturing group (?:\d+)) followed by a captured closing parenthesis (\)), so you can replace the match with the new number followed by \1.
Test:
import re
regex = r"(?:\d+)(\))"
test_str = "function(true, 0, 15)"
subst = r"100\1"
# count=0 (the default) replaces every occurrence
result = re.sub(regex, subst, test_str)
print(result)

Alternatively, you can capture everything before and after the number and write those groups back around the new value:
import re
string = 'function(true, 0, 15)'
regex = re.compile(r'(function\([a-zA-Z]*, [0-9]*, )([0-9]*)(\))')
res = re.sub(regex, r'\g<1>100\3', string)
print(res)  # function(true, 0, 100)
Basically, I placed parentheses around the text before and after the expected match, then rebuilt the string as \g<1> (group 1), 100 (the new text) and \3 (group 3). \g<1> is used instead of \1 so that the group reference is not merged with the digits 100 that follow it.
The reason why I propose this particular expression is in case OP specifically needs to only match strings that also contain the preceding "function(" section (or some other pattern). Plus, this is just an extension of OP's solution, so it may be more intuitive to OP.

Match everything except a pattern and replace matched with string

I want to use Python to manipulate a string.
Basically, I want to prepend "\x" to every hex byte, except the bytes that already have "\x" in front of them.
My original string looks like this:
mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
And I want to create the following string from it:
mystr = r"\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00"
I thought of using a regular expression to match everything except /\x../g and prefix every match with "\x". Unfortunately, I struggled with it a lot without any success. I'm also not sure that regex is the best approach here.

Regex: (?:\\x)?([0-9A-Z]{2})   Substitution: \\x\1
Details:
(?:) Non-capturing group
? Matches between zero and one time, so \x is matched if it exists.
() Capturing group
[] Match a single character present in the list 0-9 and A-Z
{n} Matches exactly n times
\\x The literal string \x
\1 Group 1.
Python code:
import re
text = R'30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00'
text = re.sub(R'(?:\\x)?([0-9A-Z]{2})', R'\\x\1', text)
print(text)
Output:
\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00

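You don't need regex for this; simple string manipulation is enough. First remove all of the "\x" from your string, then add it back in front of every two characters.
mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
replaced = mystr.replace(r"\x", "")
# Prefix each two-character chunk with \x; // keeps the range bound an int in Python 3
newstr = "".join([r"\x" + replaced[i*2:(i+1)*2] for i in range(len(replaced) // 2)])
Output:
>>> print(newstr)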
\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00

You can get a list of the hex values to manipulate as you wish, with an even simpler pattern:
import re
mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
pat = r'([a-fA-F0-9]{2})'
match = re.findall(pat, mystr)  # list of two-character hex strings
if match:
    print('\n\nNew string:')
    print('\\x' + '\\x'.join(match))
    # for elem in match:
    #     print('\\x{}'.format(elem), end='')
print('\n\nOriginal string:')
print(mystr)

This can be done without replacing existing \x by using a combination of positive lookbehinds and negative lookaheads.
(?!(?<=\\x)|(?<=\\x[a-f\d]))([a-f\d]{2})
Usage
import re
regex = r"(?!(?<=\\x)|(?<=\\x[a-f\d]))([a-f\d]{2})"
test_str = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
# Python replacement strings use \1 (not $1) for group 1
subst = r"\\x\1"
result = re.sub(regex, subst, test_str, flags=re.IGNORECASE)
print(result)
Explanation
(?!(?<=\\x)|(?<=\\x[a-f\d])) Negative lookahead ensuring that neither of the following matches:
(?<=\\x) Positive lookbehind: what precedes is \x.
(?<=\\x[a-f\d]) Positive lookbehind: what precedes is \x followed by one hexadecimal digit.
([a-f\d]{2}) Capture any two hexadecimal digits into group 1.

How to use a Python regex to find matching strings?

For the string "//div[#id~'objectnavigator-card-list']//li[#class~'outbound-alert-settings']", I want to find each "#..'...'" part, such as "#id~'objectnavigator-card-list'" or "#class~'outbound-alert-settings'". But when I use the regex ((#.+)\~(\'.*?\')), it finds "#id~'objectnavigator-card-list']//li[#class~'outbound-alert-settings'". How do I modify the regex so that it finds each string separately?

Make the inner groups non-capturing, use a lazy quantifier, and match anything except the terminating character, e.g.:
re.findall(r"((?:#[^\~]+)\~(?:\'[^\]]*?\'))", test)
On your test string (here test), this returns:
["#id~'objectnavigator-card-list'", "#class~'outbound-alert-settings'"]

Limit the characters matched between the quotes so that they cannot include the quote itself:
>>> re.findall(r'#[a-z]+~\'[-a-z]*\'', x)
["#id~'objectnavigator-card-list'", "#class~'outbound-alert-settings'"]
I find it much easier to look for only the characters I know will appear in a matching section than to exclude characters from a more permissive match.

For your current test string's input you can try this pattern:
import re
a = "//div[#id~'objectnavigator-card-list']//li[#class~'outbound-alert-settings']"
# find every run that starts with '#' and stops before ']'
regex = re.compile(r'(#[^\]]+)')
strings = re.findall(regex, a)
# Or simply:
# strings = re.findall('(#[^\\]]+)', a)
print(strings)
Output:
["#id~'objectnavigator-card-list'", "#class~'outbound-alert-settings'"]
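To see why the regex in the question overreaches: #.+ is greedy, so it runs as far as the last ~ in the string before backtracking. A minimal comparison, using the question's pattern without its capture groups (so findall returns plain strings instead of tuples), once as written and once with the first quantifier made lazy:

```python
import re

s = "//div[#id~'objectnavigator-card-list']//li[#class~'outbound-alert-settings']"

greedy = re.findall(r"#.+~'.*?'", s)  # .+ runs to the last ~ before backtracking
lazy = re.findall(r"#.+?~'.*?'", s)   # .+? stops at the first ~

print(greedy)
print(lazy)
```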

We Keep Coding
