Regex: how do I get the name before the character ":" - Python

I'm using Python to extract some info.
I want to get the words/names before the character ":",
but the problem is that everything is run together.
From here:
Morgan Stanley.Erik Woodring:
I just want to extract "Erik Woodring:"
Or from here:
market.Operator:
I just want to extract "Operator:"
Sometimes there are questions like this:
to acquire?Tim Cook:
I just want to extract "Tim Cook:"
This is what I tried:
\w*(?=.*:)
This is not getting what I wanted; it's returning a lot of words.

This could be the regex you're looking for:
\b[\w\s]+(?=:)
\b word boundary;
[\w\s]+ matches one or more word characters or whitespace characters;
(?=:) positive lookahead that requires the match to be followed by a colon, without including the colon in the match;
https://regex101.com/r/w86oWv/1
If you want to get the ":" too, you can simply replace the lookahead with a literal colon:
\b[\w\s]+:
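
A minimal Python sketch of both patterns, run against the three sample strings from the question. The variable names are illustrative and not part of the original answer.

```python
import re

# Sample strings taken from the question.
samples = [
    "Morgan Stanley.Erik Woodring:",
    "market.Operator:",
    "to acquire?Tim Cook:",
]

# Lookahead version: the colon is required but not part of the match.
name_only = re.compile(r"\b[\w\s]+(?=:)")
# Literal-colon version: the colon is included in the match.
name_with_colon = re.compile(r"\b[\w\s]+:")

names = [name_only.findall(s) for s in samples]
names_with_colon = [name_with_colon.findall(s) for s in samples]

print(names)             # [['Erik Woodring'], ['Operator'], ['Tim Cook']]
print(names_with_colon)  # [['Erik Woodring:'], ['Operator:'], ['Tim Cook:']]
```

The "." and "?" in the samples are not in [\w\s], so the match cannot extend to the left past them, which is why only the name directly before the colon is returned.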

Related

Regex negative lookahead in python [duplicate]

I am trying to search for all occurrences of "Tom" which are not followed by "Thumb".
I have tried to look for
Tom ^((?!Thumb).)*$
but I still get the lines that contain Tom Thumb.
You don't say what flavor of regex you're using, but this should work in general:
Tom(?!\s+Thumb)
In case you are not looking for whole words, you can use the following regex:
Tom(?!.*Thumb)
If there are more words to check after a wanted match, you may use
Tom(?!.*(?:Thumb|Finger|more words here))
Tom(?!.*Thumb)(?!.*Finger)(?!.*more words here)
To make . match line breaks, please refer to "How do I match any character across multiple lines in a regular expression?"
See this regex demo
If you are looking for whole words (i.e. a whole word Tom should only be matched if there is no whole word Thumb further to the right of it), use
\bTom\b(?!.*\bThumb\b)
See another regex demo
Note that:
\b - matches a leading/trailing word boundary
(?!.*Thumb) - is a negative lookahead that fails the match if Thumb appears anywhere to the right, after zero or more characters (whether . also matches line breaks depends on the engine and its flags).
Tom(?!\s+Thumb) is what you search for.
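
For comparison, here is a small Python sketch that runs three of the patterns above over a few test lines. The test lines and the dictionary labels are made up for illustration; only the patterns come from the answers.

```python
import re

# Hypothetical test lines chosen to show where the patterns differ.
lines = ["Tom Sawyer", "Tom Thumb", "Tom and later Thumb", "Tomcat"]

patterns = {
    # Rejects Tom only when whitespace plus Thumb follows immediately.
    "not directly followed": r"Tom(?!\s+Thumb)",
    # Rejects Tom when Thumb appears anywhere later on the same line.
    "nowhere to the right": r"Tom(?!.*Thumb)",
    # Same as above, but both Tom and Thumb must be whole words.
    "whole words only": r"\bTom\b(?!.*\bThumb\b)",
}

# For each pattern, keep the lines in which it finds a match.
results = {
    label: [line for line in lines if re.search(pattern, line)]
    for label, pattern in patterns.items()
}

for label, matched in results.items():
    print(label, matched)
# not directly followed ['Tom Sawyer', 'Tom and later Thumb', 'Tomcat']
# nowhere to the right ['Tom Sawyer', 'Tomcat']
# whole words only ['Tom Sawyer']
```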

python search word after match dash

Hi, I want to find the word that follows a match. When I search the string "i use a table blue" with (\w+), that regex solves the problem. But with "i use a table blue-green-red", how can I get the entire word without repeating (\w+).(\w+).(\w+) n times? There is always a newline after the phrase, as in "i use a table blue-green-red\n" or "i use a table blue\n". How can I get the entire following word even if it contains any number of dashes?
If I understand correctly, what you are trying to extract is the last word (or trailing word) in the matched search, even if it has dashes. You also indicate that you are guaranteed a newline \n at the end of the phrase.
With that in mind, a possible solution would be to follow \w+ with a greedy .* and end the pattern with a newline, something like:
regex = r"i use a table (\w+.*)\n"
which matches both:
"i use a table blue\n"
and
"i use a table blue-green-red\n"
capturing the trailing word, dashes included, in group 1.
See it in action here: https://regex101.com/r/34MBP3/1
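
A short Python sketch of the pattern above. Since .* takes everything up to the newline, it would also capture any further words on the line; the second pattern is an assumed stricter variant (not from the answer) that stops at the end of the dash-joined word.

```python
import re

# Pattern from the answer: a word, then anything up to the newline.
answer_pattern = re.compile(r"i use a table (\w+.*)\n")

# Assumed alternative, not part of the answer: a word optionally
# followed by any number of "-word" parts, and nothing else.
dashed_word = re.compile(r"i use a table (\w+(?:-\w+)*)")

plain = answer_pattern.search("i use a table blue\n").group(1)
dashed = answer_pattern.search("i use a table blue-green-red\n").group(1)

# Hypothetical input with extra words after the dashed word.
longer = "i use a table blue-green-red and more\n"
greedy_result = answer_pattern.search(longer).group(1)
strict_result = dashed_word.search(longer).group(1)

print(plain)          # blue
print(dashed)         # blue-green-red
print(greedy_result)  # blue-green-red and more
print(strict_result)  # blue-green-red
```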

positive lookbehind multiple times

"Name":"abc"
Expected output: Name
In this case, when I have the value "abc", I need to fetch the word Name by using a positive lookbehind and extracting the word between the occurrences of ".
You don't need anything fancy like look-behinds for this functionality:
"([A-Za-z]*)":"abc"
Regex101
Edit: Since you've added the python tag: if the pattern sits inside a double-quoted Python string, escape the quotes:
\"([A-Za-z]*)\":\"abc\"
New Regex101
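
As a rough Python illustration: writing the pattern as a single-quoted raw string avoids the need to escape the double quotes at all. The lookaround variant is an assumption added here to show what the question asked about; the answer itself only uses a capture group.

```python
import re

text = '"Name":"abc"'

# Capture-group version from the answer, in a single-quoted raw string.
match = re.search(r'"([A-Za-z]*)":"abc"', text)
key = match.group(1) if match else None

# Assumed lookaround variant: the whole match is the key, with the
# surrounding quotes and the value checked but not consumed.
lookaround = re.search(r'(?<=")[A-Za-z]*(?=":"abc")', text)
key_via_lookaround = lookaround.group(0) if lookaround else None

print(key)                 # Name
print(key_via_lookaround)  # Name
```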

match until a certain pattern using regex

I have a string in a text file containing some text as follows:
txt = "java.awt.GridBagLayout.layoutContainer"
I am looking to get everything before the class name, "GridBagLayout".
I have tried something like the following, but I can't figure out how to get rid of the trailing ".":
txt = re.findall(r'java\S?[^A-Z]*', txt)
and I get the following: "java.awt."
instead of what I want: "java.awt"
Any pointers as to how I could fix this?
Without using capture groups, you can use lookahead (the (?= ... ) business).
java\s?[^A-Z]*(?=\.[A-Z]) should match everything you're after. Here it is broken down:
java //Literal word "java"
\s? //Match an optional whitespace character (can change to \s* if there can be multiple)
[^A-Z]* //Any number of characters that are not capital letters
(?=\.[A-Z]) //Look ahead for (but don't add to the match) a literal period and a capital letter.
Make your pattern match a period followed by a capital letter:
'(java\S?[^A-Z]*?)\.[A-Z]'
Everything in capture group one will be what you want.
This seems to do what you want with re.findall(): (java\S?[^A-Z]*)\.[A-Z]
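
Both approaches can be checked in Python with a sketch like the following; the variable names are illustrative.

```python
import re

txt = "java.awt.GridBagLayout.layoutContainer"

# Lookahead version: the period and capital letter are required
# after the match but are not part of it.
with_lookahead = re.findall(r"java\s?[^A-Z]*(?=\.[A-Z])", txt)

# Capture-group version: re.findall returns only group 1, so the
# trailing period and capital letter are dropped from the result.
with_group = re.findall(r"(java\S?[^A-Z]*?)\.[A-Z]", txt)

print(with_lookahead)  # ['java.awt']
print(with_group)      # ['java.awt']
```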
