How can I avoid matching a string based on a regex?


I am trying to fetch a string that contains only digits (matching the regex below), but it returns matches from both strings.
string1 = '1234843847394645362'
string2 = 'this is what I have 1297643847381737345is a multi'
Regex used:
'\d{15,20}'
This gives me the numbers from both string1 and string2.
Can we avoid getting the number from string2?
I need help.

Try this regex: ^\d{15,20}$

In Python, $ also matches just before a trailing newline. If you don't want to match the digits when they are followed by a newline, use \A and \Z:
\A\d{15,20}\Z
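As a minimal sketch of how the two anchored patterns behave in Python (the helper name only_digits is made up for illustration; string1 and string2 are taken from the question):

```python
import re

string1 = '1234843847394645362'
string2 = 'this is what I have 1297643847381737345is a multi'

anchored = re.compile(r'^\d{15,20}$')    # $ tolerates one trailing newline
strict = re.compile(r'\A\d{15,20}\Z')    # \Z matches only at the very end

def only_digits(pattern, text):
    """Return the matched digits, or None if the text holds anything else."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(only_digits(anchored, string1))         # 1234843847394645362
print(only_digits(anchored, string2))         # None
print(only_digits(anchored, string1 + '\n'))  # 1234843847394645362
print(only_digits(strict, string1 + '\n'))    # None
```

If the whole string must be digits, re.fullmatch(r'\d{15,20}', text) should behave like the strict pattern without needing explicit anchors.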

Related

how to skip backslash followed by integer?

I have this regex: https://regex101.com/r/2H5ew6/1
(\!|\#)(1)
Hello!1 World
I want to keep the first mark (! or #) and change the number 1 to another number, 2.
I tried these replacements:
{\1}2_
\1\\2_
but they add extra text, and I just want to change the number.
I expect the result to be
Hello!2_World
and, if using #, to be
Hello#2_World

Match and capture either ! or # in a named capture group, here called char, if followed by one or more digits and a whitespace:
(?P<char>[!#])\d+\s
Substitute with the named capture, \g<char> followed by 2_:
\g<char>2_
If you only want the substitution if there's a 1 following either ! or #, replace \d+ with 1.
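A short sketch of that named-group substitution in Python (the function name bump is hypothetical; the pattern and replacement are the ones given above):

```python
import re

# Capture ! or # as "char"; the digits and the whitespace after them are
# matched but not captured, so the replacement discards them.
pattern = re.compile(r'(?P<char>[!#])\d+\s')

def bump(text):
    """Replace '<mark><digits><space>' with '<mark>2_'."""
    return pattern.sub(r'\g<char>2_', text)

print(bump('Hello!1 World'))   # Hello!2_World
print(bump('Hello#1 World'))   # Hello#2_World
print(bump('Hello!12 World'))  # Hello!2_World
```

Note that \d+ also rewrites multi-digit numbers such as 12, which is why the answer suggests a literal 1 when only that digit should change.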

In your substitution, change {\1}2_ to \g<1>2_. A bare 2_ would drop the captured ! or #, and \12_ would be read as a reference to group 12.
import re
string = "Hello!1 World"
pattern = r"(\!|\#)(1)"
replacement = r"\g<1>2_"
result = re.sub(pattern, replacement, string)

Why not: string.replace('!1 ', '!2_').replace('#1 ', '#2_') ?
>>> string = "Hello!1 World"
>>> repl = lambda s: s.replace('!1 ', '!2_').replace('#1 ', '#2_')
>>> string2 = repl(string)
>>> string2
'Hello!2_World'
>>> string = "Hello!12 World"
>>> string2 = repl(string)
>>> string2
'Hello!12 World'

The replacement for your pattern should be \g<1>2_
You could also shorten your pattern to a single capture group with a character class [!#] followed by a literal 1, and use the same replacement as above.
([!#])1
Or use a lookbehind assertion without any groups and replace with 2_
(?<=[!#])1
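The three variants can be compared side by side in Python. This is only a sketch; the variable names are invented, and the last pattern (with \s added) is an assumption about what the asker wants, since none of the three patterns above consumes the space after the 1:

```python
import re

text = 'Hello!1 World'

# \g<1> is needed because \12_ would be parsed as a reference to group 12.
original = re.sub(r'(\!|\#)(1)', r'\g<1>2_', text)
char_class = re.sub(r'([!#])1', r'\g<1>2_', text)
lookbehind = re.sub(r'(?<=[!#])1', '2_', text)

# Assumed variant: also consume the whitespace after the digit.
with_space = re.sub(r'([!#])1\s', r'\g<1>2_', text)

print(original)    # Hello!2_ World
print(char_class)  # Hello!2_ World
print(lookbehind)  # Hello!2_ World
print(with_space)  # Hello!2_World
```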

How to extract string information from these two strings?

I want to write a single regular expression to extract a substring from these two strings:
string1 = '#HISEQ:625:HC2T5BCXY:1:1101:1177:2101'
string2 = '#SRR7216015.1 HISEQ:630:HC2VKBCXY:1:1101:1177:2073/1'
I want to extract the text right after the # until it hits the end of the string or a space, to get
HISEQ:625:HC2T5BCXY:1:1101:1177:2101 from string1
or
SRR7216015.1 from string2
How can I do this? I've tested a bunch of regular expressions but couldn't get it to work.
Below is the code I tried:
import re
string1 = '#HISEQ:625:HC2T5BCXY:1:1101:1177:2101'
string2 = '#SRR7216015.1 HISEQ:630:HC2VKBCXY:1:1101:1177:2073/1'
pattern1 = re.compile(r'#(\w*.*:*\d*:*\w*:*\d*:*\d*[$|\s])')
print(pattern1.search(string1).group(1))
Thanks in advance!

Just use
#(\S+)
and take the first group. The lookarounds and alternations suggested in other answers cost more than this simple pattern needs.
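For illustration, a small Python sketch of this pattern applied to the two strings from the question (the helper name first_token is hypothetical):

```python
import re

string1 = '#HISEQ:625:HC2T5BCXY:1:1101:1177:2101'
string2 = '#SRR7216015.1 HISEQ:630:HC2VKBCXY:1:1101:1177:2073/1'

pattern = re.compile(r'#(\S+)')  # \S+ stops at the first whitespace or at the end

def first_token(line):
    """Return the run of non-whitespace characters after '#', or None."""
    m = pattern.search(line)
    return m.group(1) if m else None

print(first_token(string1))  # HISEQ:625:HC2T5BCXY:1:1101:1177:2101
print(first_token(string2))  # SRR7216015.1
```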

You could use this regex for that:
(?<=#).*?(?= |$)
This uses lookarounds: (?<=#) checks for a # sign before the match, (?= |$) asserts that a space or the end of the string follows, and .*? lazily matches everything in between.
https://regex101.com/r/p7kI2O/1
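With the lookaround form there is no capture group, so in Python the result is the whole match, group(0). A minimal sketch (read_id is a made-up helper name):

```python
import re

lookaround = re.compile(r'(?<=#).*?(?= |$)')

def read_id(line):
    """Return the text between '#' and the next space or end of string."""
    m = lookaround.search(line)
    return m.group(0) if m else None

print(read_id('#HISEQ:625:HC2T5BCXY:1:1101:1177:2101'))
print(read_id('#SRR7216015.1 HISEQ:630:HC2VKBCXY:1:1101:1177:2073/1'))
```

This pattern only treats a literal space as a delimiter; swapping the space for \s in the lookahead would be one way to stop at tabs as well.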

findall not returning all the results in Python 3.7

I am trying to create a list of tuples with the data after the strings string1 and string3, but I am not getting the expected result.
s = 'string1:1234string2string3:a1b2c3string1:2345string3:b5c6d7'
re.findall('string1:(\d+)[\s,\S]+string3:([\s\S]+)',s)
Actual result:
[('1234', 'b5c6d7')]
Expected result:
[('1234', 'a1b2c3'), ('2345', 'b5c6d7')]

Your current regex uses [\s,\S]+, which is greedy: it matches all characters until the end of the string and then backtracks only as far as the last string3.
You could make it non-greedy and add a positive lookahead (?=string|$) after the last group, which asserts that what follows is either string or the end of the line $.
string1:(\d+).*?string3:(.*?)(?=string|$)
import re
s = 'string1:1234string2string3:a1b2c3string1:2345string3:b5c6d7'
print(re.findall(r'string1:(\d+).*?string3:(.*?)(?=string|$)', s))

The problem is that [\s,\S]+ is greedy and therefore consumes everything between the first string1 and the last string3.
You can fix that by making the quantifiers non-greedy and adding a positive lookahead. The part between the digits and string3 must be allowed to be empty (*? rather than +?), because nothing separates them in the second pair:
string1:(\d+)[\s\S]*?string3:([\s\S]+?)(?=string|$)
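To see the difference between the greedy and non-greedy forms, here is a small Python sketch on the sample string from the question (the variable names are illustrative):

```python
import re

s = 'string1:1234string2string3:a1b2c3string1:2345string3:b5c6d7'

# Greedy: [\s\S]+ runs to the end, then backtracks to the LAST "string3:".
greedy = r'string1:(\d+)[\s\S]+string3:([\s\S]+)'

# Lazy: stop at the nearest "string3:", and end group 2 just before the
# next "string" or at the end of the input.
lazy = r'string1:(\d+)[\s\S]*?string3:([\s\S]+?)(?=string|$)'

greedy_result = re.findall(greedy, s)
lazy_result = re.findall(lazy, s)

print(greedy_result)  # [('1234', 'b5c6d7')]
print(lazy_result)    # [('1234', 'a1b2c3'), ('2345', 'b5c6d7')]
```

The lookahead on the word string would cut a value short if the data itself could contain that word, so this relies on the values looking like the sample.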

Match everything except a pattern and replace matched with string

I want to use python in order to manipulate a string I have.
Basically, I want to prepend "\x" to every hex byte except the bytes that already have "\x" prepended to them.
My original string looks like this:
mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
And I want to create the following string from it:
mystr = r"\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00"
I thought of using regular expressions to match everything except /\x../g and replace every match with "\x". Sadly, I struggled with it a lot without any success. Moreover, I'm not sure that regex is the best approach for this case.

Regex: (?:\\x)?([0-9A-Z]{2}) Substitution: \\x$1 (written \\x\1 in Python)
Details:
(?:) Non-capturing group
? Matches zero or one time, so the string \x is consumed if it exists
() Capturing group
[] Matches a single character present in the list: 0-9 and A-Z
{n} Matches exactly n times
\\x The literal string \x
$1 Contents of group 1
Python code:
import re
text = R'30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00'
text = re.sub(R'(?:\\x)?([0-9A-Z]{2})', R'\\x\1', text)
print(text)
Output:
\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00
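The character class [0-9A-Z] is wider than hexadecimal and skips lower-case digits. A hedged variant, assuming the input may mix upper- and lower-case hex (the function name prefix_hex_bytes is made up for this sketch), could look like this:

```python
import re

def prefix_hex_bytes(text):
    r"""Put \x in front of every two-digit hex byte, without doubling it."""
    # (?:\\x)? swallows an existing prefix; group 1 is the byte itself.
    return re.sub(r'(?:\\x)?([0-9A-Fa-f]{2})', r'\\x\1', text)

print(prefix_hex_bytes(r'3033\x90\x0a6f'))  # \x30\x33\x90\x0a\x6f
```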

You don't need regex for this. You can use simple string manipulation. First remove all of the "\x" from your string, then add it back before every 2 characters.
replaced = mystr.replace(r"\x", "")
newstr = "".join([r"\x" + replaced[i*2:(i+1)*2] for i in range(len(replaced)//2)])
Output:
>>> print(newstr)
\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00

You can get a list with your values to manipulate as you wish, with an even simpler re pattern
mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
import re
pat = r'([a-fA-F0-9]{2})'
match = re.findall(pat, mystr)
if match:
    print('\n\nNew string:')
    print('\\x' + '\\x'.join(match))
    # for elem in match:  # match is a list of strings with the hex values
    #     print('\\x{}'.format(elem), end='')
    print('\n\nOriginal string:')
    print(mystr)

This can be done without replacing existing \x by using a combination of positive lookbehinds and negative lookaheads.
(?!(?<=\\x)|(?<=\\x[a-f\d]))([a-f\d]{2})
Usage
import re
regex = r"(?!(?<=\\x)|(?<=\\x[a-f\d]))([a-f\d]{2})"
test_str = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
subst = r"\\x\1"
result = re.sub(regex, subst, test_str, flags=re.IGNORECASE)
if result:
    print(result)
Explanation
(?!(?<=\\x)|(?<=\\x[a-f\d])) Negative lookahead ensuring that neither of the following matches.
(?<=\\x) Positive lookbehind ensuring what precedes is \x.
(?<=\\x[a-f\d]) Positive lookbehind ensuring what precedes is \x followed by a hexadecimal digit.
([a-f\d]{2}) Capture any two hexadecimal digits into capture group 1.

regex: match all characters between 2 words, returns strange output

In this text:
"IPAddress":"10.0.0.18","PolicerID":"","IPAddress":"","PolicerID":""
I want to capture all IPs; in this example they are 10.0.0.18 and the empty string.
I tried this regex:
(?<="IPAddress":")(.*?)(?=")
which returns 10.0.0.18 and ",
It took the first " from PolicerID instead of the last " in IPAddress.
Can you please help me?
Thanks

You can keep it simple and just use a capturing group:
>>> import re
>>> text = r'"IPAddress":"10.0.0.18","PolicerID":"","IPAddress":"","PolicerID":""'
>>> print(re.findall(r'"IPAddress":"([^"]*)', text))
['10.0.0.18', '']
However if you have to use lookbehind assertion then use this regex:
(?<="IPAddress":")([^"]*)
([^"]*) is a negated character class that matches 0 or more characters that are not a double quote.
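A quick Python sketch of the lookbehind form on the sample text (Python accepts this lookbehind because it has a fixed width; the variable names are illustrative):

```python
import re

text = r'"IPAddress":"10.0.0.18","PolicerID":"","IPAddress":"","PolicerID":""'

# No capture group is needed: the lookbehind is not part of the match,
# so findall returns just the value after "IPAddress":".
values = re.findall(r'(?<="IPAddress":")[^"]*', text)
print(values)  # ['10.0.0.18', '']
```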

If you want all IPs in that text I would suggest this regex
[0-9]+(?:\.[0-9]+){3}
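One thing to keep in mind with this pattern, sketched below in Python: it only finds values that look like dotted quads, so the empty IPAddress value from the question is not reported.

```python
import re

text = r'"IPAddress":"10.0.0.18","PolicerID":"","IPAddress":"","PolicerID":""'

# Four dot-separated digit runs; the group is non-capturing, so findall
# returns the whole match.
ips = re.findall(r'[0-9]+(?:\.[0-9]+){3}', text)
print(ips)  # ['10.0.0.18']
```

The pattern does not check that each part is in the range 0-255, and it would match any dotted quad in the text, not only those under the IPAddress key.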
