Please advise on the Python regular expression

Please advise on the Python regular expression - python

message = <#U0104FGR7SL> test111 <#U0106LSJ> test33
There is the above string.
Based on the reference letter corresponding to the pattern <#U......>
I'd like to split the text.
I'd like to cut it by a pattern.
regex = re.compile("<#U[^>]+>")
match = regex.split (message)
If I do this, I get a "test, test22"
<#U0104FGR7SL> test111
<#U0106LSJ> test33
I'd like to split it this way.
Please advise me what to do.

You can do the following:
import re
message = "<#U0104FGR7SL> test111 <#U0106LSJ> test33"
matches = re.findall("<\S+>\s\S+", message)
for x in matches:
print(x)
# <#U0104FGR7SL> test111
# <#U0106LSJ> test33

Another one - using the newer regex module which supports splitting by lookarounds:
import regex as re
string = "<#U0104FGR7SL> test111 <#U0106LSJ> test33"
parts = re.split(r'(?<!\A)(?=<#)', string)
print(parts)
This yields
['<#U0104FGR7SL> test111 ', '<#U0106LSJ> test33']
See a demo on regex101.com.

You may use either of the two re.split solutions:
re.split(r'\s+(?=<#U[^>]+>)', message) # Any Python version, if matches are whitespace separated
[x.strip() for x in re.split(r'(?=<#U[^>]+>)', message) if x] # Starting with Python 3.7
NOTE: In Python 3.7, re.split finally was fixed to split with empty matches.
Details
\s+ - 1+ whitespaces
(?=<#U[^>]+>) - a positive lookahead that requires <#U, 1+ chars other than > and then > immediately to the right of the current location.
See the Python demo:
import re
message = '<#U0104FGR7SL> test111 <#U0106LSJ> test33'
print ( re.split(r'\s+(?=<#U[^>]+>)', message) )
# => '<#U0104FGR7SL> test111', '<#U0106LSJ> test33']
print ( [x.strip() for x in re.split(r'(?=<#U[^>]+>)', message) if x] )
# => '<#U0104FGR7SL> test111', '<#U0106LSJ> test33']

Related

Python replace between two chars (no split function)

I currently investigate a problem that I want to replace something in a string.
For example. I have the following string:
'123.49, 19.30, 02\n'
I only want the first two numbers like '123.49, 19.30'. The split function is not possible, because a I have a lot of data and some with and some without the last number.
I tried something like this:
import re as regex
#result = regex.match(', (.*)\n', string)
result = re.search(', (.*)\\n', string)
print(result.group(1))
This is not working finde. Can someone help me?
Thanks in advance

You could do something like this:
reg=r'(\d+\.\d+), (\d+\.\d+).*'
if(re.search(reg, your_text)):
match = re.search(reg, your_text)
first_num = match.group(1)
second_num = match.group(2)

Alternatively, also adding the ^ sign at the beginning, making sure to always only take the first two.
import re
string = '123.49, 19.30, 02\n'
pattern = re.compile('^(\d*.?\d*), (\d*.?\d*)')
result = re.findall(pattern, string)
result
Output:
[('123.49', '19.30')]

In the code you are using import re as regex. If you do that, you would have to use regex.search instead or re.search.
But in this case you can just use re.
If you use , (.*) you would capture all after the first occurrence of , and you are not taking digits into account.
If you want the first 2 numbers as stated in the question '123.49, 19.30' separated by comma's you can match them without using capture groups:
\b\d+\.\d+,\s*\d+\.\d+\b
Or matching 1 or more repetitions preceded by a comma:
\b\d+\.\d+(?:,\s*\d+\.\d+)+\b
regex demo | Python demo
As re.search can also return None, you can first check if there is a result (no need to run re.search twice)
import re
regex = r"\b\d+\.\d+(?:,\s*\d+\.\d+)+\b"
s = "123.49, 19.30, 02"
match = re.search(regex, s)
if match:
print(match.group())
Output
123.49, 19.30

re.findall -> RegEx in Python

import regex
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
x = regex.findall(r"/((http[s]?:\/\/)?(www\.)?(gamivo\.com\S*){1})", frase)
print(x)
Result:
[('www.gamivo.com/product/sea-of-thieves-pc-xbox-one', '', 'www.', 'gamivo.com/product/sea-of-thieves-pc-xbox-one'), ('www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr', '', 'www.', 'gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]
I want something like:
[('https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]
How can I do this?

You need to
Remove the initial / char that invalidates the match of https:// / http:// since / appears after http
Remove unnecessary capturing group and {1} quantifier
Convert the optional capturing group into a non-capturing one.
See this Python demo:
import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
print( re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", frase) )
# => ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']
See the regex demo, too. Also, see the related re.findall behaves weird post.

Try this, it will take string starting from https to single space or newline.
import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
x = re.findall('(https?://(?:[^\s]*))', frase)
print(x)
# ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']

Match everything except a pattern and replace matched with string

I want to use python in order to manipulate a string I have.
Basically, I want to prepend"\x" before every hex byte except the bytes that already have "\x" prepended to them.
My original string looks like this:
mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
And I want to create the following string from it:
mystr = r"\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00"
I thought of using regular expressions to match everything except /\x../g and replace every match with "\x". Sadly, I struggled with it a lot without any success. Moreover, I'm not sure that using regex is the best approach to solve such case.

Regex: (?:\\x)?([0-9A-Z]{2}) Substitution: \\x$1
Details:
(?:) Non-capturing group
? Matches between zero and one time, match string \x if it exists.
() Capturing group
[] Match a single character present in the list 0-9 and A-Z
{n} Matches exactly n times
\\x String \x
$1 Group 1.
Python code:
import re
text = R'30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00'
text = re.sub(R'(?:\\x)?([0-9A-Z]{2})', R'\\x\1', text)
print(text)
Output:
\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00
Code demo

You don't need regex for this. You can use simple string manipulation. First remove all of the "\x" from your string. Then add add it back at every 2 characters.
replaced = mystr.replace(r"\x", "")
newstr = "".join([r"\x" + replaced[i*2:(i+1)*2] for i in range(len(replaced)/2)])
Output:
>>> print(newstr)
\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00

You can get a list with your values to manipulate as you wish, with an even simpler re pattern
mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
import re
pat = r'([a-fA-F0-9]{2})'
match = re.findall(pat, mystr)
if match:
print('\n\nNew string:')
print('\\x' + '\\x'.join(match))
#for elem in match: # match gives you a list of strings with the hex values
# print('\\x{}'.format(elem), end='')
print('\n\nOriginal string:')
print(mystr)

This can be done without replacing existing \x by using a combination of positive lookbehinds and negative lookaheads.
(?!(?<=\\x)|(?<=\\x[a-f\d]))([a-f\d]{2})
Usage
See code in use here
import re
regex = r"(?!(?<=\\x)|(?<=\\x[a-f\d]))([a-f\d]{2})"
test_str = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
subst = r"\\x$1"
result = re.sub(regex, subst, test_str, 0, re.IGNORECASE)
if result:
print (result)
Explanation
(?!(?<=\\x)|(?<=\\x[a-f\d])) Negative lookahead ensuring either of the following doesn't match.
(?<=\\x) Positive lookbehind ensuring what precedes is \x.
(?<=\\x[a-f\d]) Positive lookbehind ensuring what precedes is \x followed by a hexidecimal digit.
([a-f\d]{2}) Capture any two hexidecimal digits into capture group 1.

Extract string within parentheses - PYTHON

I have a string "Name(something)" and I am trying to extract the portion of the string within the parentheses!
Iv'e tried the following solutions but don't seem to be getting the results I'm looking for.
n.split('()')
name, something = n.split('()')

You can use a simple regex to catch everything between the parenthesis:
>>> import re
>>> s = 'Name(something)'
>>> re.search('\(([^)]+)', s).group(1)
'something'
The regex matches the first "(", then it matches everything that's not a ")":
\( matches the character "(" literally
the capturing group ([^)]+) greedily matches anything that's not a ")"

as an improvement on #Maroun Maroun 's answer:
re.findall('\(([^)]+)', s)
it finds all instances of strings in between parentheses

You can use split as in your example but this way
val = s.split('(', 1)[1].split(')')[0]
or using regex

You can use re.match:
>>> import re
>>> s = "name(something)"
>>> na, so = re.match(r"(.*)\((.*)\)" ,s).groups()
>>> na, so
('name', 'something')
that matches two (.*) which means anything, where the second is between parentheses \( & \).

You can look for ( and ) (need to escape these using backslash in regex) and then match every character using .* (capturing this in a group).
Example:
import re
s = "name(something)"
regex = r'\((.*)\)'
text_inside_paranthesis = re.match(regex, s).group(1)
print(text_inside_paranthesis)
Outputs:
something
Without regex you can do the following:
text_inside_paranthesis = s[s.find('(')+1:s.find(')')]
Outputs:
something

Change a text between two strings in Python with Regex

I found several similar questions, but I cannot fit my problem to any of them. I try to find and replace a string between two other strings in a text.
reg = "%s(.*?)%s" % (str1,str2)
r = re.compile(reg,re.DOTALL)
result = r.sub(newstring, originaltext)
The problem is that the code above replace also str1 and str2, whereas I want to replace only the text between them. Something obviously that I miss?
Update:
I simplified example:
text = 'abcdefghijklmnopqrstuvwxyz'
str1 = 'gh'
str2 = 'op'
newstring = 'stackexchange'
reg = "%s(.*?)%s" % (str1,str2)
r = re.compile(reg,re.DOTALL)
result = r.sub(newstring, text)
print result
The result is abcdefstackexchangeqrstuvwxyz whereas I need abcdefghstackexchangeopqrstuvwxyz

Use a combination of lookarounds in your regular expression.
reg = "(?<=%s).*?(?=%s)" % (str1,str2)
Explanation:
Lookarounds are zero-width assertions. They don't consume any characters on the string.
(?<= # look behind to see if there is:
gh # 'gh'
) # end of look-behind
.*? # any character except \n (0 or more times)
(?= # look ahead to see if there is:
op # 'op'
) # end of look-ahead
Working Demo

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Please advise on the Python regular expression - python

You can do the following: import re message = "<#U0104FGR7SL> test111 <#U0106LSJ> test33" matches = re.findall("<\S+>\s\S+", message) for x in matches: print(x) # <#U0104FGR7SL> test111 # <#U0106LSJ> test33

Related

Python replace between two chars (no split function)

re.findall -> RegEx in Python

Match everything except a pattern and replace matched with string

Extract string within parentheses - PYTHON

Change a text between two strings in Python with Regex

Categories

Resources