finding email addresses in python with regex [duplicate] - python

This question already has answers here:
Extract email sub-strings from large document
(14 answers)
Closed 4 years ago.
Hi, I'm trying to extract a list of email addresses from a website. There are 4 email addresses on the website, but my code only returns 2.
I'm using this to help search for the emails.
import re
emails = re.findall(r'[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+', s)  # s holds the page text
count = 1
for item in emails:
    print(count, ' email address found : ', item)
    count += 1

You can try out this regex :
regex = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"

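The following pattern should match most forms of email addresses. Because it is anchored with ^ and $, it only matches when s is a single address; remove the anchors to search inside a larger text:
emails = re.findall(r'^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@(([0-9a-zA-Z])+([-\w]*[0-9a-zA-Z])*\.)+[a-zA-Z]{2,9})$',s)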
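As a rough illustration of how re.findall behaves on a larger text, here is a minimal sketch. The sample string and the simplified pattern are assumptions made for the example, not the page from the question, and the pattern is deliberately loose rather than a full validator. It has no capture groups, so findall returns whole addresses instead of tuples.

```python
import re

# Simplified, unanchored pattern (an assumption for this sketch).
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w+")

# Hypothetical sample text containing four addresses.
sample = (
    "Contact: alice@example.com, bob.smith@mail.example.org; "
    "<carol@example.net> or dave@example.co.uk."
)

emails = EMAIL_RE.findall(sample)
for count, item in enumerate(emails, start=1):
    print(count, "email address found:", item)
```

If a pattern contains capture groups, as the ones above do, findall returns the groups instead of the full match; re.finditer with match.group(0) is one way around that.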

Related

Junk character while reading email body in python using win32com.client [duplicate]

This question already has answers here:
Is there a way to get around unicode issues when using win32api/com modules in python 3?
(2 answers)
Closed last year.
I am trying to read the email body as shown below, but I am getting junk characters:
for account in EmailsAccounts:
    print(account)
    inbox = outlook.Folders(account).Folders('Inbox')
    messages = inbox.Items
    print(len(messages))
    for mail in messages:
        body = mail.Body
        print(body.encode('utf-8'))
If the problem is related to encoding message bodies, try the following code instead:
print(mail.Body.encode('utf8'))
See Is there a way to get around unicode issues when using win32api/com modules in python 3? for more information.
If it is another problem, I'd suggest checking the message type: an Outlook folder may contain different kinds of items, such as appointments, tasks, documents, or mail items.
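One way to check the item type is to compare each item's Class property with 43, the olMail value of the Outlook OlObjectClass enumeration. The sketch below uses stand-in objects so it runs without Outlook; the helper name iter_mail_items and the fake items are hypothetical, and with win32com you would pass inbox.Items instead.

```python
from types import SimpleNamespace

OL_MAIL_CLASS = 43  # OlObjectClass.olMail in the Outlook object model


def iter_mail_items(items):
    """Yield only the items whose Class marks them as mail items."""
    for item in items:
        if getattr(item, "Class", None) == OL_MAIL_CLASS:
            yield item


# Stand-in objects (hypothetical) so the sketch runs without Outlook.
# Any Class value other than 43 represents a non-mail item here.
fake_items = [
    SimpleNamespace(Class=43, Body="Hello"),
    SimpleNamespace(Class=26, Body=""),
    SimpleNamespace(Class=43, Body="Second mail"),
]

bodies = [item.Body for item in iter_mail_items(fake_items)]
print(bodies)
```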

Extract a string from a url field in python or SQL [duplicate]

This question already has answers here:
Retrieving parameters from a URL
(20 answers)
Closed 3 years ago.
Example URL: 'https://stackoverflow.com/questions/ask?newreg=12f6529a5b3449c3be1d14458a4657ef'
I want to return the barcode number after 'newreg=' in the URL. The string is not the same length every time.
Would appreciate any help here
Thanks
Why not just use a simple split()?
url = 'https://stackoverflow.com/questions/ask?newreg=12f6529a5b3449c3be1d14458a4657ef'
barcode = url.split('=')[-1]
Hope this helps!
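If the URL may carry more than one query parameter, splitting on '=' can return the wrong piece. A minimal sketch using the standard library's urllib.parse instead is shown here; the helper name get_query_param is made up for the example.

```python
from urllib.parse import urlparse, parse_qs


def get_query_param(url, name):
    """Return the first value of query parameter `name`, or None if absent."""
    query = urlparse(url).query          # e.g. 'newreg=12f6...'
    values = parse_qs(query).get(name, [])
    return values[0] if values else None


url = 'https://stackoverflow.com/questions/ask?newreg=12f6529a5b3449c3be1d14458a4657ef'
barcode = get_query_param(url, 'newreg')
print(barcode)
```

This works regardless of the value's length or of where the parameter appears in the query string.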

Extract email from string variable using find() [duplicate]

This question already has answers here:
Extract email sub-strings from large document
(14 answers)
Closed 3 years ago.
Do you guys know how I can extract an email address from a string using find()?
info = "message email@gmail.com"
I want to be able to get the entire "email@gmail.com" and output only that to the screen.
You can do this by using regex:
import re
emails_list = re.findall(r'\S+@\S+', info)
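Since the question asks specifically about find(), here is a minimal sketch without regex. It assumes the address is delimited by spaces and that the string contains at most one '@'; the function name is hypothetical.

```python
def extract_email_with_find(text):
    """Return the space-delimited token around the first '@', or None."""
    at = text.find('@')
    if at == -1:
        return None
    # rfind returns -1 when no space precedes the token, so start becomes 0.
    start = text.rfind(' ', 0, at) + 1
    end = text.find(' ', at)
    if end == -1:
        end = len(text)
    return text[start:end]


info = "message email@gmail.com"
print(extract_email_with_find(info))
```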

Using regex to extract email receiver using python 2.7 [duplicate]

This question already has answers here:
Parsing email with Python
(3 answers)
Closed 5 years ago.
I want to extract the addresses of all recipients of an email. I used this regex to extract only the addresses after To:, but it extracts just the first one.
To: ([a-z0-9_\.-]+@[\da-z\.-]+\.[a-z\.]{2,6})
When I use the regex without To:, it extracts all addresses, both sender and recipients.
([a-z0-9_\.-]+@[\da-z\.-]+\.[a-z\.]{2,6})
This is a sample of the data
Message-ID: <7618763.1075855377753.JavaMail.evans@thyme>
Date: Mon, 31 Dec 2001 10:53:43 -0800 (PST)
From: louise.kitchen@enron.com
To: wes.colwell@enron.com, georgeanne.hodges@enron.com, rob.milnthorp@enron.com, john.zufferli@enron.com, peggy.hedstrom@enron.com, thomas.myers@enron.com
Thank you
Try to use something like:
emails = re.findall('write your expression there', emailDataText)
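An alternative to a hand-written regex is to let the standard library's email package parse the headers and read only the To field. The sketch below is written for Python 3 (the question mentions 2.7) and uses a shortened copy of the sample data; the function name recipients_from_raw is made up for the example.

```python
import email
from email.utils import getaddresses


def recipients_from_raw(raw_message):
    """Return the addresses listed in the To header of a raw message."""
    msg = email.message_from_string(raw_message)
    to_headers = msg.get_all('To', [])   # empty list if there is no To header
    return [addr for _name, addr in getaddresses(to_headers)]


# Shortened version of the sample data from the question.
raw = (
    "Message-ID: <7618763.1075855377753.JavaMail.evans@thyme>\n"
    "Date: Mon, 31 Dec 2001 10:53:43 -0800 (PST)\n"
    "From: louise.kitchen@enron.com\n"
    "To: wes.colwell@enron.com, georgeanne.hodges@enron.com\n"
    "\n"
    "Message body\n"
)

recipients = recipients_from_raw(raw)
print(recipients)
```

The sender in From is not included because only the To header is read.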

How to improve this email regex? [duplicate]

This question already has answers here:
How can I validate an email address using a regular expression?
(79 answers)
Closed 7 years ago.
I am trying to match email addresses in Python using regex with this pattern:
"\w{1,}@\w{1,}.\w{1,}"
However, sometimes there are email addresses that look like firstname.lastname@lol.omg.hahaha.museum, which my pattern will miss.
Is there a way to adjust this regex so it will include an arbitrary number of chained ".word" type patterns?
You can use the following:
[\w.-]+@[\w-][\w.-]+\w
(I replaced {1,} with its equivalent, +.)
You shouldn't try to match email addresses with regex. You'll have to use a more complicated state machine to check whether the address correctly matches RFC 2822.
https://pypi.python.org/pypi/validate_email is one such library you can check out.
This should work for you
[a-zA-Z0-9._-]+@([a-zA-Z0-9.-]+\.)+[a-zA-Z0-9.-]{2,4}
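To illustrate the "arbitrary number of chained .word parts" idea from the question, here is a small sketch comparing the original pattern with one that repeats a ".word" group. The chained pattern is an assumption written for this example, not one of the answers above, and it is still a loose matcher rather than an RFC-compliant validator.

```python
import re

address = "firstname.lastname@lol.omg.hahaha.museum"

# Pattern from the question: the unescaped '.' and single \w runs cut the match short.
original = re.compile(r"\w{1,}@\w{1,}.\w{1,}")

# Sketch: a local part with dots/hyphens, then one or more ".word" groups after the first label.
chained = re.compile(r"[\w.-]+@[\w-]+(?:\.[\w-]+)+")

original_matches = original.findall(address)
chained_matches = chained.findall(address)

print(original_matches)
print(chained_matches)
```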
