Using Regex to find and replace email addresses - python

New to Python and would like to use it with regex to work with a list of 5k+ email addresses. I need to wrap each address in quotes. I am using \b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,}\b to identify each email address. How would I replace each entry such as user#email.com with "user#email.com", adding quotes around each of the 5k email addresses?

You can use re.sub with a back-reference like this:
>>> import re
>>> a = "this is email: someone#mail.com and this one is another email foo#bar.com"
>>> re.sub(r'([A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,})', r'"\1"', a)
'this is email: "someone#mail.com" and this one is another email "foo#bar.com"'
UPDATE: If you have a file and want to replace the emails on each line of it, you can use readlines() like this:
import re
with open("email.txt", "r") as file:
    lines = file.readlines()
new_lines = []
for line in lines:
    new_lines.append(re.sub(r'([A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,})', r'"\1"', line))
with open("email-new.txt", "w") as file:
    file.writelines(new_lines)
email.txt:
this is test#something.com and another email here foo#bar.com
another email abc#bcd.com
still remaining someone#something.com
email-new.txt (after running the code):
this is "test#something.com" and another email here "foo#bar.com"
another email "abc#bcd.com"
still remaining "someone#something.com"
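As a minimal sketch of the same idea, the asker's original pattern can also be kept as written if it is compiled with re.IGNORECASE, so the A-Z classes match lowercase too. The helper name quote_emails is hypothetical, and the "#" separator only mirrors the sample data in this thread (a real address list would presumably use "@"). Already-quoted addresses are not detected and would be quoted again.

```python
import re

# The asker's pattern, compiled case-insensitively.
# Assumption: "#" separates user and domain, as in the samples above.
EMAIL_RE = re.compile(r'\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,}\b', re.IGNORECASE)


def quote_emails(text):
    """Wrap every address matched by EMAIL_RE in double quotes."""
    # \g<0> refers to the whole match, so no capture group is needed.
    return EMAIL_RE.sub(r'"\g<0>"', text)


if __name__ == "__main__":
    print(quote_emails("still remaining someone#something.com"))
```

Compiling the pattern once avoids re-parsing it for each of the 5k lines, although re caches recent patterns anyway, so this is mostly a readability choice.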

Related

Insert a value before comma in a file

I'm new to Python and I'm trying to create a simple program for login which reads/writes the information to/from a text file, but I'm having an issue.
Let's say I have the following content in a text file:
mytest#gmail.com, testPass123
First the email and after the comma the password. How can I read those two separately?
I have used .split(',') but it stores the whole line.
If I run this:
email = []
for line in file:
    email.append(line.split(','))
print(email[0])
I get the following output:
['mytest#gmail.com', ' testPass123\n']
I think your variable naming is confusing you here. If you rename email to accounts, things might become clearer:
accounts = []
for line in file:
    accounts.append(line.strip().split(','))
for email, password in accounts:
    print("Email:", email, "Password:", password)
You may be looking for multiple assignment:
>>> a, b = "em#ail, pass".split(",")
>>> a
'em#ail'
>>> b
' pass'
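Note that both answers leave the space after the comma attached to the password. A small sketch that combines the two suggestions and strips each field might look like this; parse_account is a hypothetical helper name, and it assumes one "email, password" pair per line with no comma inside the email.

```python
def parse_account(line):
    """Split one 'email, password' line into a clean (email, password) tuple."""
    # maxsplit=1 keeps any further commas inside the password field.
    email, password = line.split(',', 1)
    # strip() drops the space after the comma and the trailing newline.
    return email.strip(), password.strip()


if __name__ == "__main__":
    email, password = parse_account("mytest#gmail.com, testPass123\n")
    print("Email:", email, "Password:", password)
```

A line without any comma raises ValueError on the unpacking, which is probably what you want for a malformed file, but that is a design choice rather than a requirement.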

Check variable against specific line in a text file | Python 3.6.x

Pretend I am making an email script. The user has already made a username and password, which have been stored in a text file so they can log in later at any time.
The user needs to be able to log in. I want Python to check that the user's input matches the information in the text file from earlier, on the corresponding line. Capitalization doesn't matter.
The text file that was created initially reads:
johncitizen
johnspassword
My Python script should read something like:
## Reads text file
guessusername = input('What is your username? ')
guesspassword = input('What is your password? ')
if guessusername.lower() == lines[0] and guesspassword == lines[1]:
    ## Grant access
I don't mind if capitalization is wrong, as long as the string itself matches up.
Before anything else: storing passwords in plain text is ill-advised. You should be using hashing and salting, or even better, pick a decent framework to work in and learn from how it does it.
That said, your data storage format should be more record-like:
user_id<tab>username<tab>password
user_id<tab>username<tab>password
user_id<tab>username<tab>password
In that case, you are able to read the file like this:
username = ... #user input
password = ... #user input
found_user_id = None
with open('pass.txt', 'rt') as f:
    for line in f:
        # strip the trailing newline so the last field compares cleanly
        fields = line.rstrip("\n").split("\t")
        if fields[1] == username and fields[2] == password:
            found_user_id = fields[0]
            break
#okay, here if found_user_id is not None, then you have found them
#if it is None, then you did not find them.
Truly, a database is much more useful than a text file, but this is how it works!
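To make the hashing and salting advice concrete, here is a rough sketch using only the standard library (hashlib.pbkdf2_hmac, os.urandom, hmac.compare_digest). The function names, the SHA-256 choice, and the iteration count are assumptions for illustration, not something the answer above prescribes; a maintained framework will usually make better choices for you.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune it for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each user
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


if __name__ == "__main__":
    salt, digest = hash_password('johnspassword')
    print(verify_password('johnspassword', salt, digest))
```

You would store the salt and digest (for example hex-encoded) in the password column instead of the password itself, and compare usernames with .lower() on both sides if capitalization should not matter.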

Regex to find consecutive IP Addresses

I finally have to throw in the towel after working with this for quite some time today. I am trying to retrieve all the IP addresses from an output that looks like this:
My Address: 10.10.10.1
Explicit Route: 192.168.238.90 192.168.252.209 192.168.252.241 192.168.192.209
192.168.192.223
Record Route:
I need to pull all the IP addresses between 'Explicit Route' and 'Record Route'. I am using textfsm and I can't seem to get everything I need.
Use regex and string operations:
import re
s = '''My Address: 10.10.10.1
Explicit Route: 192.168.238.90 192.168.252.209 192.168.252.241 192.168.192.209
192.168.192.223
Record Route:'''
ips = re.findall(r'\d+\.\d+\.\d+\.\d+', s[s.find('Explicit Route'):s.find('Record Route')])
import re
with open('file.txt', 'r') as file:
    lines = file.read().splitlines()
for line in lines:
    found = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', line)
    for ip in found:
        print(ip)
Edit:
We open the text file and read it line by line, then use a regular expression on each line to find the IPs (1 to 3 digits, then a dot, repeated for four groups in total).
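Neither regex above checks that each number is at most 255, and the line-by-line version also picks up 'My Address'. A sketch that slices between the two markers and then validates each candidate with the standard ipaddress module could look like this; extract_route_ips and its default marker arguments are hypothetical names chosen for the example.

```python
import ipaddress
import re


def extract_route_ips(text, start='Explicit Route', end='Record Route'):
    """Return the valid IPv4 addresses found between the two markers."""
    begin = text.find(start)
    if begin == -1:
        return []  # start marker missing: nothing to extract
    stop = text.find(end, begin)
    section = text[begin:stop] if stop != -1 else text[begin:]
    ips = []
    for candidate in re.findall(r'\d{1,3}(?:\.\d{1,3}){3}', section):
        try:
            ipaddress.ip_address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        ips.append(candidate)
    return ips


if __name__ == "__main__":
    sample = '''My Address: 10.10.10.1
Explicit Route: 192.168.238.90 192.168.252.209 192.168.252.241 192.168.192.209
192.168.192.223
Record Route:'''
    print(extract_route_ips(sample))
```

Because the slice covers several lines, the address that wraps onto the line after 'Explicit Route' is included as well.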

Parse email fields

I want to parse email addresses from a To: email field.
Indeed, when looping over the emails in an mbox:
import mailbox
mbox = mailbox.mbox('test.mbox')
for m in mbox:
    print(m['To'])
we can get things like:
info#test.org, Blahblah <blah#test.com>, <another#blah.org>, "Hey" <last#one.com>
That should be parsed into:
[{email: "info#test.org", name: ""},
{email: "blah#test.com", name: "Blahblah"},
{email: "another#blah.org", name: ""},
{email: "last#one.com", name: "Hey"}]
Is there something already built-in (in mailbox or another module) for this or nothing?
I read the docs a few times but didn't find anything relevant.
You can use email.utils.getaddresses() for this:
>>> from email.utils import getaddresses
>>> getaddresses(['info#test.org, Blahblah <blah#test.com>, <another#blah.org>, "Hey" <last#one.com>'])
[('', 'info#test.org'), ('Blahblah', 'blah#test.com'), ('', 'another#blah.org'), ('Hey', 'last#one.com')]
(Note that the function expects a list, so you have to enclose the string in [...].)
email.parser has the modules you're looking for. email.message is still relevant, because the parser will return messages using this structure, so you'll be getting your header data from that. But to actually read the files in, email.parser is the way to go.
As pointed out by @TheSpooniest, email has a parser:
from email.utils import parseaddr
s = 'info#test.org, Blahblah <blah#test.com>, <another#blah.org>, "Hey" <last#one.com>'
for em in s.split(','):
    print(parseaddr(em))
gives:
('', 'info#test.org')
('Blahblah', 'blah#test.com')
('', 'another#blah.org')
('Hey', 'last#one.com')
Python provides email.header.decode_header() for decoding headers. The function decodes each atom and returns a list of (text, encoding) tuples that you still have to decode and join to get the full text.
For addresses, Python provides email.utils.getaddresses(), which splits addresses into a list of (display-name, address) tuples. The display-name needs to be decoded too, and addresses must match the RFC 2822 syntax. The function getmailaddresses() does the whole job.
Here's a tutorial that might help: http://blog.magiksys.net/parsing-email-using-python-header
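To get from the tuples returned by getaddresses() to the list of dicts the question asks for, something like the following sketch should do. parse_recipients is a hypothetical helper name, and the sample header here is written with "@" (an assumption about what real To: fields look like), whereas the samples in this thread show "#". Unlike splitting on ',' yourself, getaddresses() also copes with commas inside quoted display names.

```python
from email.utils import getaddresses


def parse_recipients(header_value):
    """Turn a To:/Cc: header string into a list of {'email', 'name'} dicts."""
    # getaddresses() expects a list of header values and returns (name, address) pairs.
    return [{'email': address, 'name': name}
            for name, address in getaddresses([header_value])]


if __name__ == "__main__":
    to_header = 'info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>'
    for entry in parse_recipients(to_header):
        print(entry)
```

Display names that arrive RFC 2047 encoded would still need decoding, as described above.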

Determine unique "from" email addresses in Maildir folder

I want to build a list of the "From" addresses in a Maildir folder. The following script illustrates the varying formats that are valid in From:
import mailbox
mbox = mailbox.Maildir("/home/paul/Maildir/.folder")
for message in mbox:
    print(message["from"])
"John Smith" <jsmith#domain.com>
Tony <tony#domain2.com>
brendang#domain.net
All I need is the email address, for any valid (or common) "From:" field format. This must have been solved a crazillion times before, so I was expecting a library. All I can find is various regexes.
Is there a standard approach?
email.utils.parseaddr is your friend:
>>> emails = """"John Smith" <jsmith#domain.com>
Tony <tony#domain2.com>
brendang#domain.net"""
>>> lines = emails.splitlines()
>>> from email.utils import parseaddr
>>> [parseaddr(email)[1] for email in lines]
['jsmith#domain.com', 'tony#domain2.com', 'brendang#domain.net']
So you should just be able to work with:
for message in mbox:
    print(parseaddr(message['from']))
Then, I guess if you just want unique email addresses, you can build a set directly over mbox, e.g.:
mbox = mailbox.Maildir('/some/path')
uniq_emails = set(parseaddr(email['from'])[1] for email in mbox)
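One refinement you may want, sketched here under a few assumptions: messages without a From header return None, and the same address can appear with different capitalization. The helper below takes any iterable of From header values, so it can be tried without a real Maildir; unique_from_addresses is a hypothetical name, and the sample addresses use "@" rather than the "#" shown in this thread.

```python
from email.utils import parseaddr


def unique_from_addresses(from_headers):
    """Return the set of lowercased addresses found in an iterable of From values."""
    addresses = set()
    for value in from_headers:
        if not value:
            continue  # message had no From header
        address = parseaddr(str(value))[1]
        if address:  # parseaddr returns '' when it cannot parse the value
            addresses.add(address.lower())
    return addresses


if __name__ == "__main__":
    headers = ['"John Smith" <jsmith@domain.com>', 'Tony <tony@domain2.com>', 'brendang@domain.net']
    print(sorted(unique_from_addresses(headers)))
```

With a real mailbox you would pass a generator such as (message['from'] for message in mbox). Lowercasing the whole address is a simplification, since the local part is technically case-sensitive.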
