Regex to find string with multiple dot (.) characters in between [duplicate] - python

I need a regex to validate a string like "foo.com", that is, a word that contains a dot. I have tried several patterns but could not get any of them to work.
The patterns I have tried:
(\\w+\\.)
(\\w+.)
(\\w.)
(\\W+\\.)
Can someone please help me with this?
Thanks,

Use a regex with a character class:
([\\w.]+)
If the string should contain just a single . then use
(\\w+\\.\\w+)
If you want to allow multiple dots that are not adjacent, use
(\\w+(?:\\.\\w+)+)
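A minimal sketch of how the three patterns above behave. It is written in Python rather than Java (an assumption based on the question's tag), so the backslashes are single inside raw strings. The sample strings and variable names are made up for illustration.

```python
import re

# The three patterns from the answer, as Python raw strings.
any_mix = re.compile(r"[\w.]+")            # word characters and dots in any order
single_dot = re.compile(r"\w+\.\w+")       # exactly one dot between two words
multi_dot = re.compile(r"\w+(?:\.\w+)+")   # one or more non-adjacent dots

samples = ["foo.com", "foo.co.in", "foo..com", "foo"]

# fullmatch requires the whole string to fit the pattern.
results = {
    s: (
        bool(any_mix.fullmatch(s)),
        bool(single_dot.fullmatch(s)),
        bool(multi_dot.fullmatch(s)),
    )
    for s in samples
}

for s, flags in results.items():
    print(s, flags)
```

Only the character-class pattern accepts "foo..com" and the dot-free "foo", which is why it is the loosest of the three.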

To validate a string that contains exactly one dot with at least one word character on each side, match against
\w+\.\w+
which in Java is denoted as
\\w+\\.\\w+

This regex works:
[\w\[.\]\\]+
Tested for following combinations:
foo.com
foo.co.in
foo...
..foo

As I understand your question, you need a regex that matches a word with a single dot inside it (not at the start or the end).
The regex below will do that:
^\\w+\\.\\w+$
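To illustrate what the ^ and $ anchors add, here is a small Python sketch (Python is an assumption; the pattern is the same with single backslashes in a raw string). The test strings are hypothetical.

```python
import re

pattern = r"\w+\.\w+"

# Without anchors, search finds "foo.com" inside a longer string.
unanchored_hit = bool(re.search(pattern, "see foo.com now"))

# With ^ and $ the whole string must be a single dotted word.
anchored_hit = bool(re.search(r"^\w+\.\w+$", "see foo.com now"))

# fullmatch is an alternative to writing the anchors by hand.
whole_ok = bool(re.fullmatch(pattern, "foo.com"))
leading_dot = bool(re.fullmatch(pattern, ".foo"))

print(unanchored_hit, anchored_hit, whole_ok, leading_dot)
```

One design note: in Python, $ also matches just before a trailing newline, so re.fullmatch is the stricter choice when the input may end with "\n".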

Related

Exact search of a string that has parenthesis using regex

I am new to regexes.
I have the following string : \n(941)\n364\nShackle\n(941)\nRivet\n105\nTop
Out of this string, I want to extract Rivet and I already have (941) as a string in a variable.
My thought process was like this:
Find all the (941)s
filter the results by checking if the string after (941) is followed by \n, followed by a word, and ending with \n
I made a regex for the 2nd part: \n[\w\s\'\d\-\/\.]+$\n.
The problem I am facing is that, because of the parentheses in (941), the regex treats 941 as a group. The regex in the 3rd step may be wrong, which I can fix later, but first I need help finding the 2nd (941) so that I can apply the 3rd step to it.
PS.
I know I can use python string methods like find and then loop over the searches, but I wanted to see if this can be done directly using regex only.
I have tried the following: (?:...), (941){1}, and escaping the parentheses with \ to make them literal, like \(941\), with no useful results. Maybe I am using them wrong.
I just wanted to know whether this can be done using regex. I thought it might also be useful for others or a good reference for future viewers.
Thanks!
Assuming that you:
want to skip lines that consist only of digits;
want to match a substring made of word characters (which may include digits);
try escaping the variable and using it in the regular expression through an f-string:
import re
s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
var1 = '(941)'
var2 = re.escape(var1)
m = re.findall(fr'{var2}\n(?!\d+\n)(\w+)', s)[0]
print(m)
Prints:
Rivet
If you have text in a variable that should be matched exactly, use re.escape() to escape it when substituting into the regexp.
import re
s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
num = '(941)'
re.findall(rf'(?<=\n{re.escape(num)}\n)[\w\s\'\d\-\/\.]+(?=\n)', s)
This puts (941)\n in a lookbehind, so it's not included in the match. This avoids a problem with the \n at the end of one match overlapping with the \n at the beginning of the next.
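To see why escaping matters here, the following sketch compares the variable used as a pattern with and without re.escape(). It reuses the string from the question; the names unescaped and escaped are illustrative only.

```python
import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
num = '(941)'

# Unescaped, the parentheses form a capture group, so the pattern
# matches a bare "941" and findall returns the group contents.
unescaped = re.findall(num, s)

# Escaped, the parentheses are literal characters in the pattern.
escaped = re.findall(re.escape(num), s)

print(unescaped)        # ['941', '941']
print(escaped)          # ['(941)', '(941)']
print(re.escape(num))   # \(941\)
```

This is the behaviour the question describes: without escaping, the regex treats 941 as a group rather than looking for the parentheses themselves.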

Check if expression matches a regex

I would like to validate the following expressions :
"CODE1:123/CODE2:3467/CODE1:7686"
"CODE1:9090"
"CODE2:078/CODE1:7788/CODE1:333"
"CODE2:77"
In my case, the patterns 'CODE1:xx' and 'CODE2:xx' can appear in any order.
I can sort the patterns to make them like 'CODE1:XX/CODE1:YY/CODE2:ZZ'
and check if matches something like
r'[CODE1:\d+]*[CODE2:\d+]*'
Could this be made shorter? Is it possible to solve it with a single regex match?
Thanks
This regex will provide a match for all 4 cases:
CODE[12]:\d+(?:/CODE[12]:\d+)*
See here: https://regex101.com/r/wn30a5/1
It matches CODE followed by either 1 or 2, then a colon : and one or more digits, optionally followed any number of times by a slash / and that same pattern. So a trailing slash is not permitted, a single code is accepted, and the codes can appear in any order, so they don't need to be sorted first.
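A short sketch of using that pattern for validation in Python. The answer gives only the pattern, so the use of re.fullmatch and the helper name is_valid are assumptions. The invalid samples are made up.

```python
import re

# Pattern from the answer: one code, then zero or more "/code" repeats.
CODE_LIST = re.compile(r"CODE[12]:\d+(?:/CODE[12]:\d+)*")

def is_valid(expr: str) -> bool:
    """Return True if the whole string is a slash-separated list of codes."""
    return CODE_LIST.fullmatch(expr) is not None

valid = [
    "CODE1:123/CODE2:3467/CODE1:7686",
    "CODE1:9090",
    "CODE2:078/CODE1:7788/CODE1:333",
    "CODE2:77",
]
invalid = ["CODE1:123/", "CODE3:12", "CODE1:"]

for expr in valid + invalid:
    print(expr, is_valid(expr))
```

fullmatch is used because re.search or re.match alone would accept a string that merely starts with, or contains, a valid code.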
CODE is static but the digit after it varies, so to make it shorter just use CODE\d:\d+. Note that \d accepts any digit, not only 1 or 2.
If you want to match exactly two digits after the colon, use CODE\d:\d{2}
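A quick check of these shorter patterns, as a sketch. The variable names and the CODE3 sample are hypothetical, and fullmatch is an assumption about how the pattern would be applied.

```python
import re

loose = re.compile(r"CODE\d:\d+")        # any digit after CODE
strict = re.compile(r"CODE[12]:\d+")     # only CODE1 or CODE2
two_digits = re.compile(r"CODE\d:\d{2}")  # exactly two digits after the colon

loose_code3 = bool(loose.fullmatch("CODE3:55"))
strict_code3 = bool(strict.fullmatch("CODE3:55"))
two_ok = bool(two_digits.fullmatch("CODE2:77"))
two_too_long = bool(two_digits.fullmatch("CODE2:078"))

print(loose_code3, strict_code3, two_ok, two_too_long)
```

The {2} limit only holds when the whole string is checked; with re.search, "CODE2:078" would still match on its first two digits.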

Python regex match all sentences include either wordA or wordB [duplicate]

I'm creating a javascript regex to match queries in a search engine string. I am having a problem with alternation. I have the following regex:
.*baidu.com.*[/?].*wd{1}=
I want to be able to match strings that have the string 'word' or 'qw' in addition to 'wd', but everything I try is unsuccessful. I thought I would be able to do something like the following:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
but it does not seem to work.
Replace [wd|word|qw] with (wd|word|qw) or (?:wd|word|qw).
[] denotes a character set; () denotes a group.
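The difference can be seen in a small sketch. It uses Python's re module, which is an assumption since the question mentions JavaScript, but the bracket and group semantics shown here are the same. The input strings are made up.

```python
import re

# A character set: ONE character out of w, d, |, o, r, q, then "=".
char_set = re.compile(r"[wd|word|qw]=")
# A group with alternation: one of the three whole words, then "=".
group = re.compile(r"(?:wd|word|qw)=")

set_on_word = char_set.search("word=abc").group()
group_on_word = group.search("word=abc").group()
set_on_pipe = char_set.search("x|=abc").group()
group_on_pipe = group.search("x|=abc")

print(set_on_word)    # d=
print(group_on_word)  # word=
print(set_on_pipe)    # |=
print(group_on_pipe)  # None
```

Inside brackets the | is just another literal character, which is why the character set happily matches "|=".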
Your expression:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
does need a few changes, including [wd|word|qw] to (wd|word|qw) and getting rid of the redundant {1}, like so:
.*baidu.com.*[/?].*(wd|word|qw)=
But you also need to understand that the first part of your expression (.*baidu.com.*[/?].*) will also match baidu.com hello what spelling/handle????????? or hbaidu-com/ or even something like lkas----jhdf lkja$##!3hdsfbaidugcomlaksjhdf.[($?lakshf. This is because an unescaped dot (.) matches any character except a newline. To match a literal dot, escape it with a backslash (\.).
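A minimal sketch of the effect of escaping the dot, again in Python as an assumption. Both URLs are hypothetical examples, not taken from the question.

```python
import re

unescaped = re.compile(r".*baidu.com.*[/?].*(wd|word|qw)=")
escaped = re.compile(r".*baidu\.com.*[/?].*(wd|word|qw)=")

real = "http://www.baidu.com/s?wd=regex"
lookalike = "http://www.baidugcom/s?wd=regex"   # "g" where the dot should be

# The unescaped dot accepts any character, so the lookalike passes.
unescaped_lookalike = unescaped.match(lookalike) is not None
# The escaped dot requires a literal ".", so the lookalike is rejected.
escaped_lookalike = escaped.match(lookalike) is not None
# The genuine URL still matches, and group 1 holds the parameter name.
param = escaped.match(real).group(1)

print(unescaped_lookalike, escaped_lookalike, param)
```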
There are several approaches you could take to match things in a URL, but we could help you more if you tell us what you are trying to do or accomplish - perhaps regex is not the best solution or (EDIT) only part of the best solution?

Regex to remove underscore followed by number in string in python?

I have set of strings like this:
CLM_ADJUSTMT.CLAIM_DATA.TUDCAP_L_2.CRT_TS_0,
marks.science_0.physics_0,
marks.geo_1
I want to remove an underscore only when it is followed by a number (CRT_TS_0 becomes CRT_TS).
Can someone help me get the right regex?
I tried using
re.sub('_[0-9]+$', '',newstr)
but it removes all underscore and numbers
output:
CLM_ADJUSTMT.CLAIM_DATA.TUDCAP_L.CRT_TS,
marks.science.physics,
marks.geo
As @wiktor said, remove the $ sign: it means the text you are looking for has to come at the END of the string, not just appear somewhere in it.
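A sketch comparing the two patterns on the strings from the question. The list names are illustrative, and each string is processed separately, which is an assumption about how newstr is built.

```python
import re

strings = [
    "CLM_ADJUSTMT.CLAIM_DATA.TUDCAP_L_2.CRT_TS_0",
    "marks.science_0.physics_0",
    "marks.geo_1",
]

# With $, only an underscore-number at the very end of the string is removed.
with_anchor = [re.sub(r"_[0-9]+$", "", s) for s in strings]

# Without $, every underscore followed by digits is removed.
without_anchor = [re.sub(r"_[0-9]+", "", s) for s in strings]

print(with_anchor)
print(without_anchor)
```

If an underscore-number could also appear in the middle of a name, a lookahead such as _[0-9]+(?=\.|$) would restrict the removal to the end of each dot-separated part. That variant is a suggestion beyond what the answer states.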

extract string between two strings in pandas

I have a text column that looks like:
http://start.blabla.com/landing/fb603?&mkw...
I want to extract "start.blabla.com"
which is always between:
http://
and:
/landing/
namely:
start.blabla.com
I do:
df.col.str.extract('http://*?\/landing')
But it doesn't work.
What am I doing wrong?
Your regex matches http:/, then 0+ / symbols as few as possible and then /landing.
You need to match and capture one or more characters other than / after http:// (the extract method requires a regular expression with at least one capture group). It can be done with
http://([^/]+)/landing
       ^^^^^^^
where [^/]+ is a negated character class that matches 1+ occurrences of characters other than /.
See the regex demo
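The pattern can be checked outside pandas with the standard re module, as a sketch. The assumption is that str.extract applies the same regex to each row. The URL is the truncated one from the question, kept as-is.

```python
import re

url = "http://start.blabla.com/landing/fb603?&mkw..."

# Capture everything after http:// up to the next slash.
m = re.search(r"http://([^/]+)/landing", url)
host = m.group(1) if m else None

# The original attempt: "/*?" repeats the slash, not arbitrary characters.
broken = re.search(r"http://*?\/landing", url)

print(host)    # start.blabla.com
print(broken)  # None
```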
Just to answer a question you didn't ask, if you wanted to extract several portions of the string into separate columns, you'd do it this way:
df.col.str.extract('http://(?P<Site>.*?)/landing/(?P<RestUrl>.*)')
You'd get something along the lines of:
Site RestUrl
0 start.blabla.com fb603?&mkw...
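The named groups can be tried on a single string with re alone, as a sketch. In pandas the group names become the column names, whereas here they become dictionary keys. The variable names are illustrative.

```python
import re

url = "http://start.blabla.com/landing/fb603?&mkw..."

# Same pattern as above: the group names label each captured part.
pattern = re.compile(r"http://(?P<Site>.*?)/landing/(?P<RestUrl>.*)")

m = pattern.search(url)
parts = m.groupdict() if m else {}

print(parts)
```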
To understand how this regex (and any other regex for that matter) is constructed I suggest you take a look at the excellent site regex101. I constructed a snippet where you can see the above regex in action here.
