I would like to validate the following expressions:
"CODE1:123/CODE2:3467/CODE1:7686"
"CODE1:9090"
"CODE2:078/CODE1:7788/CODE1:333"
"CODE2:77"
In my case, the patterns 'CODE1:xx' and 'CODE2:xx' can appear in any order.
I could sort the patterns to get something like 'CODE1:XX/CODE1:YY/CODE2:ZZ'
and then check whether the result matches something like
r'[CODE1:\d+]*[CODE2:\d+]*'
Could this be made shorter? Is it possible to solve it with a single regex match?
Thanks
This regex will provide a match for all 4 cases:
CODE[12]:\d+(?:/CODE[12]:\d+)*
See here: https://regex101.com/r/wn30a5/1
It matches CODE followed by either 1 or 2, then a colon and one or more digits, optionally followed by a slash / and the same pattern again, any number of times. A trailing slash is therefore not permitted, a single code on its own is accepted, and the codes can appear in any order, so the input doesn't need to be sorted first.
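Here is a minimal Python sketch of how that pattern could be used for validation. The helper name is_valid and the two rejected samples are my own additions for illustration; fullmatch is used on the assumption that the whole string has to be a code list.

```python
import re

# Pattern from the answer above: one code, then zero or more "/code" repeats.
CODE_LIST = re.compile(r"CODE[12]:\d+(?:/CODE[12]:\d+)*")

def is_valid(text):
    """Return True if the whole string is a '/'-separated list of codes."""
    # fullmatch requires the pattern to cover the entire string.
    return CODE_LIST.fullmatch(text) is not None

samples = [
    "CODE1:123/CODE2:3467/CODE1:7686",
    "CODE1:9090",
    "CODE2:078/CODE1:7788/CODE1:333",
    "CODE2:77",
    "CODE1:123/",   # hypothetical bad input: trailing slash
    "CODE3:5",      # hypothetical bad input: unknown code
]
for s in samples:
    print(s, is_valid(s))
```

The four strings from the question are accepted without any sorting, while the trailing slash and the unknown code are rejected.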
CODE is static, but the digit after it varies. To make the pattern shorter, use CODE\d:\d+ (note that \d also accepts digits other than 1 and 2, such as CODE3).
If you want to match exactly two digits after the colon, use CODE\d:\d{2}
I need a regex to validate a string like "foo.com": a word that contains a dot. I have tried several but could not get any of them to work.
The patterns I have tried:
(\\w+\\.)
(\\w+.)
(\\w.)
(\\W+\\.)
Can someone please help me with this?
Thanks,
Use a regex with a character class:
([\\w.]+)
If you want the string to contain just a single . then use
(\\w+\\.\\w+)
If you want to allow multiple dots that are not adjacent, then use
(\\w+(?:\\.\\w+)+)
To validate a string that contains exactly one dot with at least one word character on each side, match against
\w+\.\w+
which in Java is written as
\\w+\\.\\w+
This regex works:
[\w\[.\]\\]+
Tested for the following combinations:
foo.com
foo.co.in
foo...
..foo
As I understand your question, you need a regex to match a word that has a single dot inside it (not at the start or end).
The regex below will satisfy that need.
^\\w+\\.\\w+$
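The thread is about Java, but the behaviour of the single-dot pattern is easy to check with a short Python sketch. The function name has_single_inner_dot is made up for this example, and fullmatch plays the role of the ^ and $ anchors. Keep in mind that \w also covers digits and the underscore.

```python
import re

# Same pattern as above, written with single backslashes in a raw string.
SINGLE_DOT = re.compile(r"\w+\.\w+")

def has_single_inner_dot(text):
    """True if text is word characters, exactly one dot, word characters."""
    return SINGLE_DOT.fullmatch(text) is not None

for candidate in ["foo.com", "foo.co.in", "foo...", "..foo", "foo"]:
    print(candidate, has_single_inner_dot(candidate))
```

Only foo.com passes here. To also accept foo.co.in, the repeated variant \w+(?:\.\w+)+ from the earlier answer would be the one to test.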
This is an example string:
123456#p654321
Currently, I am using this match to capture 123456 and 654321 into two different groups:
([0-9].*)#p([0-9].*)
But on occasions, the #p654321 part of the string will not be there, so I will only want to capture the first group. I tried to make the second group "optional" by appending ? to it, which works, but only as long as there is a #p at the end of the remaining string.
What would be the best way to solve this problem?
You have the #p outside of the capturing group, which makes it a required piece of the result. You are also using the dot character (.) improperly. Dot (in most reg-ex variants) will match any character. Change it to:
([0-9]*)(?:#p([0-9]*))?
The (?:) syntax is how you get a non-capturing group. We then capture just the digits that you're interested in. Finally, we make the whole thing optional.
Also, most reg-ex variants have a \d character class for digits. So you could simplify even further:
(\d*)(?:#p(\d*))?
As another person has pointed out, the * operator could potentially match zero digits. To prevent this, use the + operator instead:
(\d+)(?:#p(\d+))?
Your regex can actually match zero digits, because you've used * instead of +.
This is what (I think) you want:
(\d+)(?:#p(\d+))?
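As a rough Python illustration of the optional group (the question doesn't say which language is used, so Python and the helper name split_ids are assumptions), the second group simply comes back as None when the #p part is missing:

```python
import re

# First number is required; "#p" plus a second number is optional as a unit.
PAIR = re.compile(r"(\d+)(?:#p(\d+))?")

def split_ids(text):
    """Return (first, second) where second is None if '#p...' is absent."""
    # Assumption: the whole string must match, hence fullmatch.
    m = PAIR.fullmatch(text)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(split_ids("123456#p654321"))  # both groups captured
print(split_ids("123456"))          # second group is None
print(split_ids("123456#p"))        # "#p" without digits is rejected
```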
I am trying to write a regex expression in python that can match the following lines - I am just able to match the very first number by doing something like this
re.compile(r'\d.\d{14}\s+')
but could not do the rest. I also tried [^-\d] to catch the negative sign, but that does not seem to work.
Any help? Thanks!
First, let's start by looking at the numbers. You've already got a decent expression for finding a single number (\d.\d{14}\s+), but there are a couple of things wrong with it.
In regex, . indicates any single character. This means that your expression will accept any character after the first digit.
It's not taking into account the possibility that there could be a negative sign at the beginning.
Both of these problems are really easy to fix. The first can be fixed by simply escaping the period (\.). The second can be fixed by adding the negative sign to the pattern and giving it a quantifier. In this case, the ? quantifier will be the best option because it matches between 0 and 1 times. All this means is that it won't care if the symbol is there, but if it is it will match it. After these 2 changes, the pattern looks like this: -?\d\.\d{14}\s+.
Next, we need to tell it to match more than once. This can be done very easily by putting the pattern in a group and applying a quantifier to said group. Now the question is which quantifier should be used. In your example, there are only 3 numbers before the single character at the end of the line. You can match this pattern exactly 3 times by using the {3} quantifier. If you know there will be at least 1 but don't know how many in total there will be, you can use the + quantifier. For this example I will be using the {3} quantifier just so it's more specific to your question. After adding this, the pattern will look something like this: (-?\d\.\d{14}\s+){3}
Now all that's left is to match the character at the end. You can use \S to match any single non-whitespace character (\w would restrict it to word characters). You can add a quantifier to it, but for the purposes of your question I won't, since there's only a single character. The final expression would look like (-?\d\.\d{14}\s+){3}\S.
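To try the final expression, here is a small Python sketch. The original sample lines are not shown in the question, so the line below is a made-up stand-in: three numbers with one leading digit and 14 decimals, followed by a single character. The names LINE and NUMBER are also just for illustration.

```python
import re

# Final expression from the steps above.
LINE = re.compile(r"(-?\d\.\d{14}\s+){3}\S")
# The single-number part on its own, without the trailing whitespace.
NUMBER = re.compile(r"-?\d\.\d{14}")

# Hypothetical input line, built so every number has exactly 14 decimals.
values = [1.5, -0.25, 3.0]
line = " ".join(f"{v:.14f}" for v in values) + " A"
print(line)

print(bool(LINE.fullmatch(line)))
# A repeated group only keeps its last repetition, so findall is used
# to pull out all three numbers.
print(NUMBER.findall(line))
```

One design note: because the {3} group is repeated, m.group(1) would only hold the last number, which is why the sketch extracts the values with a separate findall.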
I have an input string consisting of a sequence of real numbers separated by a single space. It is also acceptable for the string to contain only one real number (no spaces). My goal is to check whether the string structure matches the following (in this order):
optional (0/1): minus (-)
1/more digits
optional (0/1): a period and 1/more digits
optional (0+): a group consisting of a space and the first group (the first three bullet points)
It should describe the string completely. If not, it should print an error message and exit.
My current regular expression is ^(-?\d+(\.?\d)*)( \1)*$ which I thought would be okay, but even the first group doesn't match all the real numbers individually. And I need it to check the string from the beginning to the end, including the spaces.
My code for this function looks like this:
import re
def structure_check(string):
    structure = r"^(-?\d+(\.?\d)*)( \1)*$"
    if re.match(structure, string):
        return("OK")
    else:
        print("Input error")
        exit()
It should accept strings like: 15 35 -45 8 -2.3 4564.18 56 etc., but it doesn't respond to changes in the input at all (it never matches). It shouldn't match if there are too many spaces, an incorrectly placed . or -, or any characters other than digits, periods, dashes (-) and spaces.
I could also do this with just the first group while iterating over a list created by splitting the input string by space, but I would prefer to check it according to my main goal, since I wouldn't have to split the input in the validation function and it would also save some lines of code by checking the input all together (e.g. for excess spaces or unsupported characters, which I'd otherwise have to check separately).
Sorry if I missed any answered questions, I couldn't find any appropriate for my problem in Python. If you know about any, feel free to link them, please. And thank you, I am a beginner and started learning regex for a project just about yesterday.
You can use:
^((?:[+-]?\d+(?:[.]\d+)?)(?:[ \t]|$))*$
I added + to the optional sign. If you only want to match with no sign or -, just remove that from the optional character class.
You could also use an unrolled version to prevent matching a space at the end.
^-?\d+(?:\.\d+)?(?: -?\d+(?:\.\d+)?)*$
The backreference \1 matches exactly the text that was matched by group 1, so your pattern will match, for example, 123 123 123.
If you want to repeat the group's pattern rather than its matched text, you could recurse the first group using the PyPI regex module and (?1)
^(-?\d+(?:\.\d+)?)(?: (?1))*$
The problem is in your regexp, specifically in the ( \1)* part.
Spelled out, this means: a space followed by the exact string that was matched by group 1, zero or more times.
Thus, your regexp will match the following, for example:
15 15 15
-5.3 -5.3 -5.3 -5.3
And so on.
To fix the regexp, I would replace the group reference with the actual group, like so:
^(-?\d+(\.?\d)*)( -?\d+(\.?\d)*)*$
I would also point out that this regexp allows numbers to have multiple decimal points (e.g. 1.2.3 passes); I'm not sure whether that's intended.
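The difference between the backreference and a repeated sub-pattern can be checked with the standard re module. This is only a sketch: the names BACKREF, NUMBER and REPEATED are mine, and the number sub-pattern uses the stricter (?:\.\d+)? form from the unrolled answer, so 1.2.3 is rejected.

```python
import re

# Pattern from the question: \1 repeats the matched TEXT of group 1.
BACKREF = re.compile(r"^(-?\d+(\.?\d)*)( \1)*$")

# One real number: optional minus, digits, optional fractional part.
NUMBER = r"-?\d+(?:\.\d+)?"
# Repeat the sub-PATTERN instead of the matched text.
REPEATED = re.compile(rf"{NUMBER}(?: {NUMBER})*")

tests = ["15 15 15", "15 35 -45 8 -2.3 4564.18 56", "1.2.3", "15  35", "15 35 "]
for text in tests:
    print(repr(text), bool(BACKREF.fullmatch(text)), bool(REPEATED.fullmatch(text)))
```

Building the pattern from a NUMBER string keeps the two copies of the sub-pattern in sync, which is roughly what (?1) does in the regex module.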
In JavaScript you can use the regex method .test. The same regex should work in Python.
let ok = /^(([+\-]?\d+(\.\d+)?)( |$))+$/.test("15 35 -45 8 -2.3 4564.18 56");
console.log(ok);
Explanation: the whole fractional group (\.\d+)? must be optional. The number can be followed by a space or by the end of the string, ( |$). The pattern is repeated throughout the string, so I wrapped the entire expression in a group. Put ^ at the beginning of the regex and $ at the end to force it to check the string completely.
I would like to intercept string starting with \*#\*
followed by a number between 0 and 7
and ending with: ##
so something like \*#\*0##
but I could not find a regex for this
Assuming you want to allow only one # before and two after, I'd do it like this:
r'^(\#{1}([0-7])\#{2})'
It's important to note that Alex's regex will also match things like
###7######
########1###
which may or may not matter.
My regex above matches a string starting with #[0-7]## and ignores the rest of the string. You could tack a $ onto the end if you wanted it to match only when that's the entire line.
The first capture group gives you the entire #<number>## string and the second capture group gives you the number inside the #.
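A quick Python check of those claims. I'm assuming the other regex being referred to is the unanchored r'\#[0-7]\#\#' that appears elsewhere in this thread; the variable names and the "trailing text" input are made up for the example.

```python
import re

# Pattern from this answer: anchored at the start, no "$" at the end.
anchored = re.compile(r'^(\#{1}([0-7])\#{2})')
# Unanchored pattern from elsewhere in the thread (assumed to be the one meant).
unanchored = re.compile(r'\#[0-7]\#\#')

# Group 1 is the whole "#<number>##" prefix, group 2 is just the digit.
m = anchored.match("#7##trailing text")
print(m.group(1), m.group(2))

# The unanchored pattern finds a hit inside longer runs of "#".
for text in ["###7######", "########1###"]:
    print(text, bool(anchored.match(text)), bool(unanchored.search(text)))
```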
None of the above examples take the *#* into account:
^\*#\*[0-7]##$
Pass : *#*7##
Fail : *#*22324324##
Fail : *#3232#
The ^ character will match the start of the string, \* will match a single asterisk, the # characters do not need to be escaped in this example, and finally the [0-7] will only match a single character between 0 and 7.
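The pass and fail cases listed above can be reproduced with a few lines of Python. The helper name accepts and the extra *#*8## case are my own additions.

```python
import re

# Literal "*#*", one digit from 0 to 7, then "##", and nothing else.
STAR_HASH = re.compile(r"^\*#\*[0-7]##$")

def accepts(text):
    """True if text is exactly '*#*<digit 0-7>##'."""
    return STAR_HASH.match(text) is not None

for text in ["*#*7##", "*#*22324324##", "*#3232#", "*#*8##"]:
    print(text, "Pass" if accepts(text) else "Fail")
```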
r'\#[0-7]\#\#'
The regular expression should be like ^#[0-7]##$
As I understand the question, the simplest regular expression you need is:
rex = re.compile(r'^\*#\*([0-7])##$')
The {1} constructs are redundant.
After doing rex.match (or rex.search, but it's not necessary here), .group(1) of the match object contains the digit given.
EDIT: The whole matched string is always available as match.group(0). If all you need is the complete string, drop any parentheses in the regular expression:
rex = re.compile(r'^\*#\*[0-7]##$')
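Put together, reading the digit out of the match could look like the sketch below. The function name extract_digit and the conversion to int are assumptions for illustration, not part of the answer.

```python
import re

# Version with parentheses, so the digit is available as group(1).
rex = re.compile(r'^\*#\*([0-7])##$')

def extract_digit(text):
    """Return the digit as an int, or None if text doesn't match."""
    match = rex.match(text)
    if match is None:
        return None
    # group(0) is the whole matched string, group(1) only the digit.
    return int(match.group(1))

print(extract_digit("*#*5##"))
print(extract_digit("*#*9##"))
print(rex.match("*#*5##").group(0))
```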