Hi, I want to capture the word that follows a match. When I search the string "i use a table blue" with (\w+), that regex solves the problem. But for "i use a table blue-green-red", how can I get the entire word without repeating (\w+).(\w+).(\w+) n times? There is always a newline after the phrase, as in "i use a table blue-green-red\n" or "i use a table blue\n". How can I get the entire following word even if it contains any number of dashes?
If I understand correctly, what you are trying to extract is the last word (or trailing word) in the matched search, even if it has dashes. You also indicate that you are guaranteed a newline \n at the end of the phrase.
With that in mind, a possible solution is to follow \w+ with a greedy .* and stop it at the newline (by default . does not match \n), something like:
regex = r"i use a table (\w+.*)\n"
which matches both:
"i use a table blue\n"
and
"i use a table blue-green-red\n"
extracting the last word.
See it in action here: https://regex101.com/r/34MBP3/1
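A minimal sketch of how this could be run in Python. The helper name following_word is hypothetical, and the second pattern, ([\w-]+), is a stricter variation that is not part of the answer above: it stops at the first character that is neither a word character nor a dash, so it does not rely on the trailing newline.

```python
import re

# Pattern from the answer: \w+ followed by a greedy .* that stops at the newline.
greedy = re.compile(r"i use a table (\w+.*)\n")

# Assumed stricter variant: only word characters and dashes are captured.
strict = re.compile(r"i use a table ([\w-]+)")

def following_word(text, pattern=greedy):
    """Return the captured word after the prefix, or None if there is no match."""
    match = pattern.search(text)
    return match.group(1) if match else None

samples = ["i use a table blue\n", "i use a table blue-green-red\n"]
for sample in samples:
    print(following_word(sample), following_word(sample, strict))
```

If more text can follow the word on the same line, the greedy version would capture all of it, while the stricter variant would still return only the first word.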
I'm using Python to extract some info.
I want to get the words/names before the character :
but the problem is that everything is run together.
From here
Morgan Stanley.Erik Woodring:
I just want to extract "Erik Woodring:"
or from here
market.Operator:
I just want to extract Operator:
Sometimes there are questions like this
to acquire?Tim Cook:
I just want to extract "Tim Cook:"
This is what I tried
\w*(?=.*:)
This is not getting what I wanted; it's returning a lot of words.
This could be the regex you're looking for:
\b[\w\s]+(?=:)
\b word boundary;
[\w\s]+ matches one or more word or whitespace characters;
(?=:) positive lookahead that specifies the match must be followed by a colon;
https://regex101.com/r/w86oWv/1
If you want to get the ":" too you can simply remove the lookahead:
\b[\w\s]+:
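As a quick check, here is a small sketch that applies both patterns to the three examples from the question. The helper name last_speaker is hypothetical.

```python
import re

# Both patterns are taken from the answer above.
name_only = re.compile(r"\b[\w\s]+(?=:)")   # stops before the colon
with_colon = re.compile(r"\b[\w\s]+:")      # includes the colon

def last_speaker(text, pattern=name_only):
    """Return the first run of words/spaces that is directly followed by ':'."""
    match = pattern.search(text)
    return match.group(0) if match else None

samples = [
    "Morgan Stanley.Erik Woodring:",
    "market.Operator:",
    "to acquire?Tim Cook:",
]
for sample in samples:
    print(last_speaker(sample), "|", last_speaker(sample, with_colon))
```

This works because ".", "?" and similar punctuation are not matched by [\w\s], so the match cannot extend back past them. Note that \s also matches newlines, so on multi-line text a match could span a line break.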
I have a string like below:
"i'm just returning from work. *oeee* all and we can go into some detail *oo*. what is it that happened as far as you're aware *aouu*"
with some junk characters like above (highlighted with '*' marks). All I could observe was that junk characters come as bunch of vowels knit together. Now, I need to remove any word that has space before and after and has only vowels in it (like oeee, aouu, etc...) and length of 2 or more. How do I achieve this in python?
Currently, I build a tuple of replacement pairs like ((" oeee "," "),(" aouu "," ")) and send it through a for loop with replace. But if the word is 'oeeee', I need to add a new item to the tuple. There must be a better way.
P.S: there will be no '*' in the actual text. I just put it here to highlight.
You need to use re.sub to do a regex replacement in python. You should use this regex:
\b[aeiou]{2,}\b
which will match a sequence of 2 or more vowels in a word by themselves. We use \b to match the boundaries of the word so it will match at the beginning and end of the string (in your string, aouu) as well as words adjacent to punctuation (in your string, oo). If your text may include uppercase vowels too, use the re.I flag to ignore case:
import re
text = "i'm just returning from work. oeee all and we can go into some detail oo. what is it that happened as far as you're aware aouu"
print(re.sub(r'\b[aeiou]{2,}\b', '', text, flags=re.I))
Output
i'm just returning from work. all and we can go into some detail . what is it that happened as far as you're aware
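Removing only the junk word leaves its surrounding spaces behind (for example "detail ." above). One possible variation, which is an assumption and not part of the answer, is to also consume the whitespace in front of the junk word. The function name strip_vowel_words is hypothetical.

```python
import re

# Assumed variation: \s* also removes the whitespace before the vowel-only word.
vowel_word = re.compile(r"\s*\b[aeiou]{2,}\b", flags=re.IGNORECASE)

def strip_vowel_words(text):
    """Remove standalone words made of 2+ vowels, plus the space before them."""
    return vowel_word.sub("", text)

text = ("i'm just returning from work. oeee all and we can go into some "
        "detail oo. what is it that happened as far as you're aware aouu")
print(strip_vowel_words(text))
```

Like the original pattern, this would also delete any genuine word that consists only of vowels, so it may be worth checking the data for such words first.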
I have been fixing some mistakes across all procedures. Now, I need to find the last "end;" in a procedure and replace it with other text.
I wrote: (\s.*)(end|END)(.*(;).*)
But it does not work correctly; it also replaces some words in the middle of the text. I am using the re library from Python.
You can use
result = re.sub(r'(?si)(.*)\bend\b', r'\g<1>some other word', text)
The regex matches
(?si) - an inline re.DOTALL (s) and re.IGNORECASE (i) modifier
(.*) - Group 1: any zero or more chars as many as possible
\bend\b - the whole word end.
The \g<1>some other word replacement is the Group 1 value (I used \g<1> since it will be helpful if your some other word starts with a digit) plus your word.
NOTE: if your some other word can contain literal backslashes, do not forget to double them.
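A minimal sketch of the same idea wrapped in a function. The name replace_last_end and the sample procedure text are made up for illustration. Passing a function as the replacement is one way to sidestep the backslash and digit concerns mentioned above, because the returned string is inserted literally.

```python
import re

def replace_last_end(text, replacement):
    """Replace the last whole word 'end' (any case) with `replacement`."""
    # (?si): '.' also matches newlines, and matching ignores case.
    # The greedy (.*) runs to the last position where \bend\b can still match.
    return re.sub(
        r"(?si)(.*)\bend\b",
        lambda m: m.group(1) + replacement,  # replacement is used literally
        text,
        count=1,
    )

# Hypothetical procedure body, only for demonstration.
source = "begin\n  if x then\n    y := 1;\n  end;\nEND;\n"
print(replace_last_end(source, "finish"))
```

Only the final END is replaced here; the inner end; is part of Group 1 and is written back unchanged.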
I am trying to replace every word within quotes " " to upper case word except those coming after the word "then" in a pandas column:
for example:
0 There was a "quick" "brown" fox who "jumped" over the wall then "fell" and broke its "tooth"
the output should be:
0 There was a "QUICK" "BROWN" fox who "JUMPED" over the wall then "fell" and broke its "TOOTH"
although I am able to find the words in quotes but I am not able to exclude the word coming right after "then".
df.str.replace({r'"(.*?)"':r'\U$1') #this will select and replace all values in quotes to uppercase also values after then
please help.
You can use regex (?<!then\s)"(\w*)" to find the words within quotes that are NOT preceded by 'then' & 'space'
"(\w*)" = Look for words within quotes
(?<!then\s) = Make sure the words matched by "(\w*)" do not have 'then' & 'space' directly before them (negative look-behind)
RegexDemo You can see the demo of the regex here (you can put several other string to check how the regex works on them as well)
Regex-info This is very comprehensive website (kind of the go-to website for all things regex) on regex, almost all concepts of regex should be answered here. It is not programming language dependent & has a lot of information which can be overwhelming.
Regex Cheat-Sheet I would say start with this cheat sheet, it is very simple & explained in simple words. I find it very helpful.
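Python's re module has no \U replacement escape, so the upper-casing has to be done by a replacement function. Below is a minimal sketch using plain re; the function name upper_quoted is hypothetical. For a pandas column, the same pattern and function could likely be passed to Series.str.replace with regex=True, which accepts a callable replacement.

```python
import re

# Pattern from the answer: a quoted word NOT directly preceded by "then ".
quoted = re.compile(r'(?<!then\s)"(\w*)"')

def upper_quoted(text):
    """Upper-case every quoted word except one that directly follows 'then'."""
    return quoted.sub(lambda m: '"' + m.group(1).upper() + '"', text)

line = ('There was a "quick" "brown" fox who "jumped" over the wall '
        'then "fell" and broke its "tooth"')
print(upper_quoted(line))
```

Because "(\w*)" only allows word characters between the quotes, a closing quote followed by a space cannot be mistaken for the start of a new quoted word.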
String= He "ate" a "penguin" then "played with a hamburger.
Turn the string into a list by splitting at the word then. Take list[0] as a string, split it by spaces, and use an if '"' in word check to isolate the quoted words. Capitalize those, then use join to put the whole string back together again, and there you go.
I don't use or do much text searching but have not been able to find an answer as to what the regex is to find all words starting with T and ending with T from a text file where each word is on a newline. Tried a number of suggestions from searches; the following finds all words starting with T and where T occurs next. However, I want to find where the LAST letter is T also, irrespective of how many T's occur between. Apologies if this is actually trivial, but after every combo I can find I have no result. I am unsure why r'^T.*T$' doesn't work.
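import re
# Read the whole word list (one word per line) into a single string.
with open('/Users/../words.txt') as f:
    passage = f.read()
# Finds a T followed by any later T on the same line, not whole words only.
words = re.findall(r'T.+T', passage)
print(words)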
I'd use this expression:
re.findall(r"\bT\w*?T\b", s)
use word boundary
use any numbers of \w to avoid matching spaces in between
use "non-greedy" mode (maybe not that useful here since word boundary already does the job)
Use word boundary anchor \b and non-whitespace character \S:
words = re.findall(r'\bT\S+T\b', passage)
This also matches words such as Trust-TesT, Tough&FasT, etc.
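As for why r'^T.*T$' finds nothing: without the re.MULTILINE flag, ^ matches only at the start of the whole string and $ only at its end (or just before a final newline), not at every line. The sketch below compares that pattern, with and without the flag, against the two answers above. The sample word list is made up for illustration.

```python
import re

# Hypothetical stand-in for the contents of words.txt (one word per line).
passage = "TOAST\nTRUST\nTABLE\nTIGHT\nTrust-TesT\nTENT\n"

# ^ and $ anchor to the whole string here, so no single line can match.
no_flag = re.findall(r"^T.*T$", passage)

# With re.MULTILINE, ^ and $ match at the start and end of every line.
with_flag = re.findall(r"^T.*T$", passage, flags=re.MULTILINE)

# \w does not match '-', so only the 'TesT' part of 'Trust-TesT' is found.
word_chars = re.findall(r"\bT\w*T\b", passage)

# \S matches any non-whitespace, so 'Trust-TesT' is found as a whole.
non_space = re.findall(r"\bT\S+T\b", passage)

print(no_flag)
print(with_flag)
print(word_chars)
print(non_space)
```

Which variant fits best depends on whether hyphenated entries should count as one word. Note that \S+ requires at least one character between the two T's, so a bare "TT" would not match that pattern.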