How to improve this email regex? [duplicate] - python

This question already has answers here:
How can I validate an email address using a regular expression?
(79 answers)
Closed 7 years ago.
I am trying to match email addresses in Python using regex with this pattern:
"\w{1,}@\w{1,}.\w{1,}"
However, some email addresses look like firstname.lastname@lol.omg.hahaha.museum, which my pattern will miss.
Is there a way to adjust this regex so that it accepts an arbitrary number of chained ".word" patterns?

You can use the following:
[\w.-]+@[\w-][\w.-]+\w
This replaces {1,} with its equivalent, +.
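For the "arbitrary number of chained .word patterns" part of the question, one option is a repeated non-capturing group. The following is a minimal sketch, not a full RFC-compliant validator; it assumes the separator is @, and the names EMAIL_RE and looks_like_email are made up for illustration.

```python
import re

# Local part, "@", a first label, then one or more ".label" groups.
# (?:...)+ repeats the ".label" part without creating a capture group.
EMAIL_RE = re.compile(r"[\w.-]+@[\w-]+(?:\.[\w-]+)+")


def looks_like_email(text: str) -> bool:
    """Return True if the whole string has the shape local@label.label[.label...]."""
    return EMAIL_RE.fullmatch(text) is not None


print(looks_like_email("firstname.lastname@lol.omg.hahaha.museum"))  # True
print(looks_like_email("user@localhost"))  # False: no ".label" after the first label
```

Using fullmatch rather than search means the whole string has to fit the pattern, not just a part of it.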

You shouldn't try to match email addresses with regex. You'll have to use a more complicated state machine to check whether the address correctly matches RFC 2822.
https://pypi.python.org/pypi/validate_email is one such library you can check out.

This should work for you:
[a-zA-Z0-9._-]+@([a-zA-Z0-9.-]+\.)+[a-zA-Z0-9.-]{2,4}
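One thing worth checking with this pattern is the {2,4} at the end, which caps the last label at four characters. The sketch below tests it against the address from the question, assuming @ as the separator; NO_UPPER_BOUND is an illustrative variant with only the upper bound removed, not part of the original answer.

```python
import re

ADDRESS = "firstname.lastname@lol.omg.hahaha.museum"

# The answer's pattern: the final label is limited to 2-4 characters.
SHORT_TLD = re.compile(r"[a-zA-Z0-9._-]+@([a-zA-Z0-9.-]+\.)+[a-zA-Z0-9.-]{2,4}")
# Hypothetical variant: same pattern with no upper bound on the last label.
NO_UPPER_BOUND = re.compile(r"[a-zA-Z0-9._-]+@([a-zA-Z0-9.-]+\.)+[a-zA-Z0-9.-]{2,}")

# search() finds a match, but it stops four characters into "museum".
print(SHORT_TLD.search(ADDRESS).group(0))  # firstname.lastname@lol.omg.hahaha.muse

# fullmatch() shows the difference when the whole address must fit.
print(SHORT_TLD.fullmatch(ADDRESS) is not None)       # False
print(NO_UPPER_BOUND.fullmatch(ADDRESS) is not None)  # True
```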

Related

Best way to extract the Date from a string [duplicate]

This question already has answers here:
Python/Regex - How to extract date from filename using regular expression?
(5 answers)
Closed 2 years ago.
I am trying to extract the date from a string. I used to be able to just pull the entire line, but the company sending the data keeps adding characters to the front/back of the date, which causes my code to stop functioning until I fix it. My searches give mixed advice on whether I should use regex or the datetime module. Here is what I am currently using, which, as you can see, is cumbersome and inefficient:
line = ' .10/10/2020<=x'
date = line.strip().replace('.', '').replace('<', '').replace('=', '').replace('x', '')
edit:
I ended up taking Yash's regex and it worked perfectly.
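Why not extract it using regex? This will only work for the format xx/xx/xxxx; you will need to change the regex if multiple formats are found.
import re

line = ' .10/10/2020<=x'
# Look for two digits, two digits and four digits separated by slashes
match = re.search(r"([0-9]{2}/[0-9]{2}/[0-9]{4})", line)
if match:  # re.search returns None when nothing matches
    print(match.group(1))  # 10/10/2020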
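Since the question also asks about the datetime module, the two can be combined: the regex locates the date inside the noisy line, and datetime.strptime checks that it is a real calendar date. This is a sketch under assumptions: the helper name extract_date is made up, and the "%m/%d/%Y" format is a guess, because 10/10/2020 does not reveal whether the day or the month comes first.

```python
import re
from datetime import date, datetime
from typing import Optional

# Same shape as the regex above: xx/xx/xxxx
DATE_RE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def extract_date(line: str, fmt: str = "%m/%d/%Y") -> Optional[date]:
    """Find the first xx/xx/xxxx run in line and parse it; return None on failure."""
    match = DATE_RE.search(line)
    if match is None:
        return None
    try:
        # strptime rejects impossible dates such as 99/99/2020
        return datetime.strptime(match.group(0), fmt).date()
    except ValueError:
        return None


print(extract_date(' .10/10/2020<=x'))  # 2020-10-10
```

Returning None for both "no date found" and "not a valid date" keeps the caller simple; raising an exception instead would be a reasonable alternative.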

Regex working in text editor(sublime) but not in python [duplicate]

This question already has answers here:
Case insensitive regular expression without re.compile?
(10 answers)
Closed 2 years ago.
I want to extract a line using regex.
The line that I want to extract from the document is:
":method":"POST",":path":"/api/browser/projects/8bd4d1d3-0b69-515e-8e15-e9c49992f7d5/buckets/b-ao-mock-testing/copy
The regex I am using is:
":method"[:"a-z,/\d-]{20,1000}/copy
The code for the same in python is:
re.findall('":method"[:"a-z,/\d-]{20,1000}/copy', str(s), re.MULTILINE)
It works perfectly in Sublime Text but returns an empty list in Python. How can I resolve this?
You need to use the i modifier (case-insensitive match, which ignores the case of [a-zA-Z]); in Python this is the re.IGNORECASE flag.
Without it, [a-z] cannot match the uppercase POST.
Alternatively, add A-Z to the character class:
":method"[:"a-zA-Z,/\d-]{20,1000}/copy
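A short check of the case-sensitivity explanation, using the line from the question as the input. The variable names are illustrative; re.IGNORECASE is the standard flag in Python's re module.

```python
import re

s = '":method":"POST",":path":"/api/browser/projects/8bd4d1d3-0b69-515e-8e15-e9c49992f7d5/buckets/b-ao-mock-testing/copy'
pattern = r'":method"[:"a-z,/\d-]{20,1000}/copy'

# Case-sensitive: [a-z] cannot match the uppercase "POST", so nothing is found.
case_sensitive = re.findall(pattern, s)
# Case-insensitive: the same pattern now matches the whole line.
case_insensitive = re.findall(pattern, s, re.IGNORECASE)

print(case_sensitive)         # []
print(case_insensitive == [s])  # True
```

Flags can be combined with |, for example re.MULTILINE | re.IGNORECASE, if the original re.MULTILINE flag is still wanted.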

How to do if-else in regex based on previous regex in the pattern (Python)? [duplicate]

This question already has answers here:
How can I make part of regex optional?
(2 answers)
Closed 3 years ago.
I'm trying to match urls of the form
r'https://.*\.mysite.com'
However, if there is no subdomain, this pattern still requires a leading dot, and https://.mysite.com isn't valid. Only when there is a subdomain, such as sub.mysite.com, do I want a dot in front of mysite; otherwise I want zero dots (or, more generally, zero characters) between https:// and mysite.com.
How do I accomplish this?
This doesn't seem to be a Python-specific problem but more of a RegEx one.
You could modify your expression to optionally accept a subdomain as such:
https:\/\/([^.]+\.)?mysite\.com
Or allow for multi-level subdomains:
https:\/\/([^.]+\.)*mysite\.com
Additionally, if you didn't want to use a capture group, you could use a non-capturing group:
https:\/\/(?:[^.]+\.)*mysite\.com
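A quick way to try the non-capturing variant in Python is sketched below. The forward slashes do not need escaping in a Python raw string, so they are written plainly here; SITE_RE and is_mysite_url are illustrative names.

```python
import re

# Zero or more "label." groups, then the literal host mysite.com
SITE_RE = re.compile(r"https://(?:[^.]+\.)*mysite\.com")


def is_mysite_url(url: str) -> bool:
    """Return True if url is https://mysite.com with any number of subdomain labels."""
    return SITE_RE.fullmatch(url) is not None


print(is_mysite_url("https://mysite.com"))        # True: no subdomain, no dot
print(is_mysite_url("https://sub.mysite.com"))    # True: one subdomain
print(is_mysite_url("https://a.b.mysite.com"))    # True: multi-level subdomain
print(is_mysite_url("https://.mysite.com"))       # False: dot without a label
print(is_mysite_url("https://notmysite.com"))     # False: different host
```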

Regex of email works with decimal value as well. How do I fix it? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I have the following two regex patterns.
url(r"^list/(?P<email>[\w.%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4})?/?", MyFunction_ListAPIView.as_view()),
url(r"^list/(?P<id>[\d+])/$", OtherFunction_ListAPIView.as_view()),
I wanted to have two separate functions for email and for id.
If an email is passed, MyFunction should be called; if a decimal value is passed, OtherFunction should be called.
I passed in a decimal value as shown below (here 11 is a plain decimal value), yet it still calls the same function. Any suggestions on what I might be doing wrong?
http://127.0.0.1:8000/api/job/list/11/
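The ?/? at the end of the first regex makes both the email and the trailing slash optional, and there is no $ to anchor the end, so the first pattern matches any path that starts with list/, including list/11/. Since the patterns are tried in order, the second one is never reached. I don't know what else you have in the urls list, but I suggest you try your regex at https://regex101.com/ so you can easily debug any URL.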
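The behaviour can be reproduced with plain re, outside Django. This is a sketch under assumptions: the path seen by these patterns is taken to be "list/11/" (with the api/job/ prefix handled elsewhere), @ is assumed as the email separator, and the STRICT_* patterns are one possible fix rather than the only one. Note that [\d+] in the second pattern is a character class matching a single digit or plus sign, so \d+ is used in the stricter version.

```python
import re

# First pattern as posted: optional email group, optional slash, no end anchor.
LOOSE_EMAIL = re.compile(r"^list/(?P<email>[\w.%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4})?/?")
# Second pattern as posted: [\d+] matches exactly one character.
ORIGINAL_ID = re.compile(r"^list/(?P<id>[\d+])/$")

# Hypothetical stricter versions: required group, required slash, end anchor.
STRICT_EMAIL = re.compile(r"^list/(?P<email>[\w.%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4})/$")
STRICT_ID = re.compile(r"^list/(?P<id>\d+)/$")

path = "list/11/"

loose = LOOSE_EMAIL.search(path)
print(loose is not None, loose.group("email"))  # True None: matches with no email
print(ORIGINAL_ID.search(path))                  # None: "11" is two characters

print(STRICT_EMAIL.search(path))                 # None: 11 is not an email
print(STRICT_ID.search(path).group("id"))        # 11
print(STRICT_EMAIL.search("list/user@example.com/").group("email"))  # user@example.com
```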

What are () (parentheses) for in regex python [duplicate]

This question already has answers here:
Python regex -- extraneous matchings
(5 answers)
Closed 6 years ago.
I searched all over the internet and didn't find a good answer to this.
What do parentheses in a Python regex stand for? It's very weird.
For example, if I do:
re.split(r'(\s*)', "ho from there")
it gives me a list of the separate words with the spaces between them. How does that happen?
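This isn't specific to Python: in regex, parentheses denote a capture group.
When the pattern passed to re.split contains a capture group, the text matched by the group is also returned as part of the resulting list, which is why the spaces show up between the words. This behaviour is described in the re.split documentation.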
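The effect of the capture group on re.split can be seen by splitting the same string three ways. This sketch uses \s+ rather than \s* so that empty matches do not affect the result; the variable names are illustrative.

```python
import re

text = "ho from there"

# No group: the separators are dropped.
without_group = re.split(r"\s+", text)
# Capture group: the matched separators are kept in the result.
with_group = re.split(r"(\s+)", text)
# Non-capturing group: grouping without keeping the separators.
non_capturing = re.split(r"(?:\s+)", text)

print(without_group)  # ['ho', 'from', 'there']
print(with_group)     # ['ho', ' ', 'from', ' ', 'there']
print(non_capturing)  # ['ho', 'from', 'there']
```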
