I'm trying to catch the last part after the last backslash
I need the \Web_ERP_Assistant (with the \)
My idea was :
C:\Projects\Ensure_Solution\Assistance\App_WebReferences\Web_ERP_WebService\Web_ERP_Assistant
\\.+?(?!\\) // I know there is something with negative look -ahead `(?!\\)`
But I can't find it.
Your negative lookahead solution would e.g. be this:
\\(?:.(?!\\))+$
One that worked for me was:
.+(\\.+)$
Explanation:
.+ - one or more of any character except newline (greedy, so it runs as far right as it can)
( - create a group
\\.+ - match a backslash, and any characters after it
) - end group
$ - this all has to happen at the end of the string
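A quick way to check this pattern in Python, as a minimal sketch using the path from the question (the variable names are only for illustration):

```python
import re

path = r"C:\Projects\Ensure_Solution\Assistance\App_WebReferences\Web_ERP_WebService\Web_ERP_Assistant"

# The greedy leading .+ first swallows the whole string, then backtracks
# just far enough for the group to begin at the last backslash.
match = re.search(r".+(\\.+)$", path)
last_part = match.group(1)
print(last_part)  # \Web_ERP_Assistant
```

The part you want is in group 1, not in the overall match, which covers the whole path.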
A negative look ahead is a correct answer, but it can be written more cleanly like:
(\\)(?!.*\\)
This matches a \ and then uses a lookahead, which is checked but not included in the match, for any number of characters followed by another \. Because the lookahead is negative, the \ only matches when no later \ is found, which makes it the last backslash in the string.
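Note that this pattern matches only the backslash itself, so you still have to take the rest of the string from that position. A small sketch in Python, assuming the path from the question, with the other lookahead pattern from this thread alongside for comparison:

```python
import re

path = r"C:\Projects\Ensure_Solution\Assistance\App_WebReferences\Web_ERP_WebService\Web_ERP_Assistant"

# (?!.*\\) fails wherever another backslash occurs later in the string,
# so only the final backslash can match.
last_slash = re.search(r"\\(?!.*\\)", path)
tail_from_slice = path[last_slash.start():]

# The alternative: every character after the backslash must not be
# followed by a backslash, all the way to the end of the string.
tail_from_match = re.search(r"\\(?:.(?!\\))+$", path).group(0)

print(tail_from_slice)  # \Web_ERP_Assistant
print(tail_from_match)  # \Web_ERP_Assistant
```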
You can try anchoring it to the end of the string, something like \\[^\\]*$. Though I'm not sure if one absolutely has to use regexp for the task.
What about this regex: \\[^\\]+$
If you don't want to include the backslash, but only the text after it, try this: ([^\\]+)$ or for unix: ([^\/]+)$
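For reference, here is roughly how these look in Python. The Unix path is a made-up example, and the rpartition line is a plain string-method alternative for anyone who does not strictly need a regex:

```python
import re

win_path = r"C:\Projects\Ensure_Solution\Assistance\App_WebReferences\Web_ERP_WebService\Web_ERP_Assistant"
unix_path = "/srv/projects/web_erp/assistant"  # hypothetical path, for illustration only

# With the leading backslash
with_slash = re.search(r"\\[^\\]+$", win_path).group(0)

# Only the text after the last separator
win_name = re.search(r"([^\\]+)$", win_path).group(1)
unix_name = re.search(r"([^/]+)$", unix_path).group(1)

# No regex at all: split once on the last backslash
plain_name = win_path.rpartition("\\")[2]

print(with_slash)  # \Web_ERP_Assistant
print(win_name)    # Web_ERP_Assistant
print(unix_name)   # assistant
print(plain_name)  # Web_ERP_Assistant
```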
I used the regex below to get that result even when the path ends with a \:
(\\[^\\]+)\\?$
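A short check of the trailing-backslash case in Python (a sketch; the shortened path is only for illustration):

```python
import re

pattern = re.compile(r"(\\[^\\]+)\\?$")

without_slash = r"C:\Projects\Web_ERP_WebService\Web_ERP_Assistant"
with_slash = without_slash + "\\"  # the same path ending in a backslash

# The optional \\? absorbs a trailing backslash, so group 1 is the same
# last segment in both cases.
results = [pattern.search(p).group(1) for p in (without_slash, with_slash)]
print(results)
```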
I am new to regexes.
I have the following string : \n(941)\n364\nShackle\n(941)\nRivet\n105\nTop
Out of this string, I want to extract Rivet and I already have (941) as a string in a variable.
My thought process was like this:
Find all the (941)s
filter the results by checking if the string after (941) is followed by \n, followed by a word, and ending with \n
I made a regex for the 2nd part: \n[\w\s\'\d\-\/\.]+$\n.
The problem I am facing is that because of the parenthesis in (941) the regex is taking 941 as a group. In the 3rd step the regex may be wrong, which I can fix later, but 1st I needed help in finding the 2nd (941) so then I can apply the 3rd step on that.
PS.
I know I can use python string methods like find and then loop over the searches, but I wanted to see if this can be done directly using regex only.
I have tried (?:...), (941){1}, and escaping the parentheses so they are treated as literal characters, like this: \(941\), all with no useful results. Maybe I am using them wrong.
Just wanted to know if it is possible to be done using regex. Though it might be useful for others too or a good share for future viewers.
Thanks!
Assuming:
You want to avoid matching only digits;
Want to match a substring made of word-characters (thus including possible digits);
Try to escape the variable and use it in the regular expression through f-string:
import re
s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
var1 = '(941)'
var2 = re.escape(var1)
m = re.findall(fr'{var2}\n(?!\d+\n)(\w+)', s)[0]
print(m)
Prints:
Rivet
If you have text in a variable that should be matched exactly, use re.escape() to escape it when substituting into the regexp.
import re
s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
num = '(941)'
re.findall(rf'(?<=\n{re.escape(num)}\n)[\w\s\'\d\-\/\.]+(?=\n)', s)
This puts (941)\n in a lookbehind, so it's not included in the match. This avoids a problem with the \n at the end of one match overlapping with the \n at the beginning of the next.
I am looking for a pattern that matches everything until the first occurrence of a specific character, say a ";" - a semicolon.
I wrote this:
/^(.*);/
But it actually matches everything (including the semicolon) until the last occurrence of a semicolon.
You need
/^[^;]*/
The [^;] is a character class, it matches everything but a semicolon.
^ (start of line anchor) is added to the beginning of the regex so only the first match on each line is captured. This may or may not be required, depending on whether possible subsequent matches are desired.
To cite the perlre manpage:
You can specify a character class, by enclosing a list of characters in [] , which will match any character from the list. If the first character after the "[" is "^", the class matches any character not in the list.
This should work in most regex dialects.
Would;
/^(.*?);/
work?
The ? is a lazy operator, so the regex grabs as little as possible before matching the ;.
/^[^;]*/
The [^;] says match anything except a semicolon. The square brackets are a set matching operator, it's essentially, match any character in this set of characters, the ^ at the start makes it an inverse match, so match anything not in this set.
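To see the difference between the greedy, lazy, and negated-class versions side by side, here is a small Python sketch (the sample sentence is just test data):

```python
import re

text = "this is a test sentence; to prove this regex; that is g;iven below"

# Greedy: .* runs to the end, then backs up to the LAST semicolon.
greedy = re.match(r"^(.*);", text).group(1)

# Lazy: .*? stops at the FIRST semicolon.
lazy = re.match(r"^(.*?);", text).group(1)

# Negated class: never steps over a semicolon in the first place.
negated = re.match(r"^[^;]*", text).group(0)

print(greedy)   # this is a test sentence; to prove this regex; that is g
print(lazy)     # this is a test sentence
print(negated)  # this is a test sentence
```

The negated class usually needs the least backtracking, since it can never overshoot.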
None of the proposed answers worked for me (e.g. in Notepad++).
But
^.*?(?=\;)
did.
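The same lookahead pattern behaves like this in Python's re module (shown here only as an illustration; Notepad++ uses a different regex engine, and the semicolon does not need escaping in Python):

```python
import re

text = "match everything until first ; blah ; blah end"

# The lazy .*? expands one character at a time until the lookahead sees
# a semicolon; the semicolon itself is not part of the match.
before_first = re.search(r"^.*?(?=;)", text).group(0)
print(repr(before_first))  # 'match everything until first '
```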
Try /[^;]*/
Google regex character classes for details.
sample text:
"this is a test sentence; to prove this regex; that is g;iven below"
If for example we have the sample text above, the regex /(.*?\;)/ will give you everything until the first occurrence of a semicolon (;), including the semicolon: "this is a test sentence;"
Try /[^;]*/
That's a negating character class.
This was very helpful for me as I was trying to figure out how to match all the characters in an xml tag including attributes. I was running into the "matches everything to the end" problem with:
/<simpleChoice.*>/
but was able to resolve the issue with:
/<simpleChoice[^>]*>/
after reading this post. Thanks all.
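For anyone who wants to reproduce it, a minimal Python sketch (the XML snippet and its attributes are made up for the example):

```python
import re

# Hypothetical one-line XML fragment with two opening tags
xml = '<simpleChoice identifier="A" fixed="true">Yes</simpleChoice><simpleChoice identifier="B">No</simpleChoice>'

# .* is greedy and runs to the last > on the line
greedy = re.findall(r"<simpleChoice.*>", xml)

# [^>]* stops at the first > after each opening tag
bounded = re.findall(r"<simpleChoice[^>]*>", xml)

print(len(greedy))  # 1
print(bounded)
```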
This is not a regex solution, but it is simple enough for your problem description. Just split your string and take the first item from the resulting array.
$str = "match everything until first ; blah ; blah end ";
$s = explode(";",$str,2);
print $s[0];
output
$ php test.php
match everything until first
This will match up to the first occurrence only in each string and will ignore subsequent occurrences.
/^([^;]*);*/
"/^([^\/]*)\/$/" worked for me, to get only top "folders" from an array like:
a/ <- this
a/b/
c/ <- this
c/d/
/d/e/
f/ <- this
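The same filter written in Python, as a sketch using the list above (in Python the / needs no escaping):

```python
import re

paths = ["a/", "a/b/", "c/", "c/d/", "/d/e/", "f/"]

# One run of non-slash characters, then exactly one slash at the end
top_level = re.compile(r"^([^/]*)/$")

folders = []
for p in paths:
    m = top_level.match(p)
    if m:
        folders.append(m.group(1))

print(folders)  # ['a', 'c', 'f']
```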
Really kinda sad that no one has given you the correct answer....
In regex, ? makes it non greedy. By default regex will match as much as it can (greedy)
Simply add a ? and it will be non-greedy and match as little as possible!
Good luck, hope that helps.
This works for getting the content from the beginning of a line till the first word,
/^.*?([^\s]+)/gm
I faced a similar problem including all the characters until the first comma after the word entity_id. The solution that worked was this in Bigquery:
SELECT regexp_extract(line_items,r'entity_id*[^,]*')
I have a huge string like this dsdasdludocid=15878284988193842600#lrd=0x3be04dcc5b5ac513:0xdc5b0011ebb625a8,2
I want to get the number after ludocid, only consecutive numbers.
I have tried this regex (ludocid).*(?=\d+\d+) and many more but no luck.
You can try ludocid=(\d+):
s = "dsdasdludocid=15878284988193842600#lrd=0x3be04dcc5b5ac513:0xdc5b0011ebb625a8,2"
import re
re.findall(r"ludocid=(\d+)", s)
# ['15878284988193842600']
You can use this regex:
ludocid\D*(\d+)
This will match literal ludocid followed by 0 or more non-digits and then it will match 1 or more digits in captured group #1
Code:
>>> import re
>>> s = 'dsdasdludocid=15878284988193842600#lrd=0x3be04dcc5b5ac513:0xdc5b0011ebb625a8,2'
>>> print(re.search(r'ludocid\D*(\d+)', s).group(1))
15878284988193842600
It looks like you just threw a bunch of regex bits together... Let's work through that.
First, this is the correct regex: ludocid.(\d+)
(You would want to use it with re.search instead of re.match, by the way. re.match only matches at the beginning of the string, and here ludocid appears after some other characters.)
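A quick check of how the two functions behave on this input, as a small sketch. Note that re.match anchors only at the start of the string; re.fullmatch is the one that requires the whole string to match:

```python
import re

s = "dsdasdludocid=15878284988193842600#lrd=0x3be04dcc5b5ac513:0xdc5b0011ebb625a8,2"
pattern = r"ludocid.(\d+)"

# re.match tries only at position 0, where the string starts with "dsdasd"
from_match = re.match(pattern, s)

# re.search scans forward until the pattern fits somewhere
from_search = re.search(pattern, s).group(1)

# re.match accepts trailing text after the match; re.fullmatch does not
prefix_ok = re.match(r"ludocid", "ludocid=1 and more") is not None
whole_ok = re.fullmatch(r"ludocid", "ludocid=1 and more") is not None

print(from_match)   # None
print(from_search)  # 15878284988193842600
print(prefix_ok, whole_ok)  # True False
```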
But let's look at yours and see what went wrong and how we can get to the correct regex.
(ludocid).*(?=\d+\d+)
Imagine a regex as a function. You pass it the right things, and it gives you the appropriate result. When you wrap things in parentheses, you're saying "Find this and give it back to me." You don't need the ludocid given back to you, I'm guessing... so remove those parentheses.
ludocid.*(?=\d+\d+)
Now you've got a .*. This is dangerous in regular expressions because it literally says "Grab as many of anything as you possibly can!" Often I use the non-greedy version (.*?), but in this case it looks like we're just expecting a single extra character there. If you know the literal character you can use that, but to be safe I'll leave it as ., which says "Grab any one character."
ludocid.(?=\d+\d+)
Now let's go inside the parentheses. You've got \d+\d+, which says "Find a sequence of one or more digits, and then find another sequence of one or more digits." This equates to "Find a sequence of two or more digits." I don't think this is what you wanted (it's not how you described the problem, anyway), so let's reduce that:
ludocid.(?=\d+)
Okay, great. Now... what is (?=...) for? It's called a lookahead assertion. It says "If you find this string, match things in front of it." The example given in the Python 2.7 documentation is:
(?=...)
Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
Essentially this means that your regex will never return the digits. Instead, it looks to see if digits exist, and then it returns things from the rest of the regex. Remove the lookahead assertion and we're there:
ludocid.(\d+)
When you use this with re.search, you'll get the group you want:
>>> s = "dsdasdludocid=15878284988193842600#lrd=0x3be04dcc5b5ac513:0xdc5b0011ebb625a8,2"
>>> import re
>>> re.search(r"ludocid.(\d+)", s).group(1)
'15878284988193842600'
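To see why the lookahead version never hands back the digits, here is a short comparison (a sketch on the same string):

```python
import re

s = "dsdasdludocid=15878284988193842600#lrd=0x3be04dcc5b5ac513:0xdc5b0011ebb625a8,2"

# The lookahead only checks that digits come next; they are not consumed,
# so the match stops right after the "=".
with_lookahead = re.search(r"ludocid.(?=\d+)", s).group(0)

# A capturing group consumes the digits and returns them as group 1.
with_group = re.search(r"ludocid.(\d+)", s).group(1)

print(with_lookahead)  # ludocid=
print(with_group)      # 15878284988193842600
```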
To match only the digits that follow, stopping at the first non-numeric char, try a positive look behind:
(?<=ludocid=)(\d+)
So:
re.findall(r"(?<=ludocid=)(\d+)", s)
The positive look behind will look for what you want, and only match if it is preceded by the 'flag' string.
Note: in Python's re module the = sign has no special meaning, so it does not need escaping; the escaped form (?<=ludocid\=)(\d+) matches the same thing.
I am using Python 2.7 and have a question with regards to regular expressions. My string would be something like this...
"SecurityGroup:Pub HDP SG"
"SecurityGroup:Group-Name"
"SecurityGroup:TestName"
My regular expression looks something like below
[^S^e^c^r^i^t^y^G^r^o^u^p^:].*
The above seems to work but I have the feeling it is not very efficient and also if the string has the word "group" in it, that will fail as well...
What I am looking for is the output should find anything after the colon (:). I also thought I can do something like using group 2 as my match... but the problem with that is, if there are spaces in the name then I won't be able to get the correct name.
(SecurityGroup):(\w{1,})
Why not just do
security_string.split(':')[1]
To grab the second part of the String after the colon?
You could use lookbehind:
import re
pattern = re.compile(r"(?<=SecurityGroup:)(.*)")
matches = re.findall(pattern, your_string)
Breaking it down:
(?<= # positive lookbehind. Matches things preceded by the following group
SecurityGroup: # pattern you want your matches preceded by
) # end positive lookbehind
( # start matching group
.* # any number of characters
) # end matching group
When tested on the string "something something SecurityGroup:stuff and stuff" it returns matches = ['stuff and stuff'].
Edit:
As mentioned in a comment, pattern = re.compile(r"SecurityGroup:(.*)") accomplishes the same thing. In this case you are matching the string "SecurityGroup:" followed by anything, but only returning the stuff that follows. This is probably more clear than my original example using lookbehind.
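Applied to the three example strings from the question, a minimal sketch of the capturing-group version looks like this (the split line is the non-regex alternative, limited to one split in case the name itself contains a colon):

```python
import re

lines = [
    "SecurityGroup:Pub HDP SG",
    "SecurityGroup:Group-Name",
    "SecurityGroup:TestName",
]

pattern = re.compile(r"SecurityGroup:(.*)")

# .* keeps spaces and hyphens, unlike \w{1,}
names = [pattern.search(line).group(1) for line in lines]

# Plain string method, splitting only on the first colon
names_split = [line.split(":", 1)[1] for line in lines]

print(names)  # ['Pub HDP SG', 'Group-Name', 'TestName']
```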
Maybe this:
([^:"]+[^\s](?="))
I would like to intercept a string starting with *#*
followed by a number between 0 and 7
and ending with: ##
so something like *#*0##
but I could not find a regex for this
Assuming you want to allow only one # before and two after, I'd do it like this:
r'^(\#{1}([0-7])\#{2})'
It's important to note that Alex's regex will also match things like
###7######
########1###
which may or may not matter.
My regex above matches a string starting with #[0-7]## and ignores the end of the string. You could tack a $ onto the end if you wanted it to match only if that's the entire line.
The first backreference gives you the entire #<number>## string and the second backreference gives you the number inside the #.
None of the above examples are taking into account the *#*
^\*#\*[0-7]##$
Pass : *#*7##
Fail : *#*22324324##
Fail : *#3232#
The ^ character will match the start of the string, \* will match a single asterisk, the # characters do not need to be escaped in this example, [0-7] will only match a single character between 0 and 7, and the final $ requires the string to end right after the ##.
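The pass/fail cases above can be checked in Python with a few lines (a sketch; the variable names are only for illustration):

```python
import re

pattern = re.compile(r"^\*#\*[0-7]##$")

candidates = ["*#*7##", "*#*22324324##", "*#3232#"]

# True where the whole string is *#*, one digit from 0 to 7, then ##
outcomes = {c: bool(pattern.match(c)) for c in candidates}
print(outcomes)
```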
r'\#[0-7]\#\#'
The regular expression should be like ^#[0-7]##$
As I understand the question, the simplest regular expression you need is:
rex= re.compile(r'^\*#\*([0-7])##$')
The {1} constructs are redundant.
After doing rex.match (or rex.search, but it's not necessary here), .group(1) of the match object contains the digit given.
EDIT: The whole matched string is always available as match.group(0). If all you need is the complete string, drop any parentheses in the regular expression:
rex= re.compile(r'^\*#\*[0-7]##$')
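Putting the two variants together, a short sketch of what the groups contain (the input string is just an example):

```python
import re

rex = re.compile(r"^\*#\*([0-7])##$")

m = rex.match("*#*5##")
whole = m.group(0)  # the entire matched string
digit = m.group(1)  # only the captured digit

print(whole)  # *#*5##
print(digit)  # 5
```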