Python replace in list - python

I have a list with different email formatting, which I want to unify all in one.
There are 2 types of email:
name.surname#dom
name.surname.extern#dom
My program allows the user to input emails. So that they don't have to enter "#dom" every time (it's always the same one), the user can write name.surname or name.surname.e, and the script then completes those usernames with #dom or .extern#dom.
The problem arises when I have all the mails in different formats stored in a list, and I want them all filled out to the standard form, so that if I have
["john.doe#dom", "john2.doe", "john3.doe.e","john4.doe.extern#dom"]
it all ends up looking like this
["john.doe#dom", "john2.doe#dom", "john3.doe.extern#dom","john4.doe.extern#dom"]
I have tried with list comprehensions, but all I got was three concatenations:
["%sxtern#dom" % x for x in mails if x[-2:] == ".e"] +
["%s#dom" % x for x in mails if "#dom" not in x and x[-2:] != ".e"] +
[x for x in mails if "#dom" in x]
I'm sure there's a better way that does not require me to do 3 list comprehensions and that does not require me to do
for i,v in enumerate(mails):
    if "#dom" not in v:
        mails[i] = "%s#dom" % v
etc.

You can use a string's endswith() method to determine what you need to do with the input:
mails = ["john.doe#dom", "john2.doe", "john3.doe.e", "john4.doe.extern#dom"]
final_mails = []
for mail in mails:
    if mail.endswith("#dom"):
        # Use as-is if it ends with #dom.
        final_mails.append(mail)
    elif mail.endswith(".e"):
        # Swap only the trailing ".e" for ".extern#dom"; str.replace() would
        # also change any earlier ".e" in the name (e.g. "john.edwards.e").
        final_mails.append(mail[:-2] + ".extern#dom")
    else:
        # Add #dom in all other cases.
        final_mails.append("{}#dom".format(mail))
print(final_mails)
# Result: ['john.doe#dom', 'john2.doe#dom', 'john3.doe.extern#dom', 'john4.doe.extern#dom']
It might need more thorough checks to not accept things like #dom right in the middle of the name and whatnot. Hope that helps you out though!
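One way to add the stricter check mentioned above is to reject any input where "#" appears anywhere other than in a single trailing "#dom". This is only a minimal sketch, written for Python 3; the helper name is_acceptable and the exact rule are assumptions, since the question does not say what counts as invalid:

```python
def is_acceptable(mail, domain="#dom"):
    """Return True for a bare username or a name ending in exactly one domain suffix."""
    if "#" not in mail:
        # Bare usernames such as "john2.doe" or "john3.doe.e" are fine if non-empty.
        return bool(mail)
    # Exactly one "#", it must begin the trailing suffix, and a name must precede it.
    return mail.count("#") == 1 and mail.endswith(domain) and len(mail) > len(domain)


# Hypothetical inputs, including a few malformed ones.
candidates = ["john.doe#dom", "john2.doe", "jo#domhn.doe", "john.doe#dom#dom", "#dom"]
accepted = [m for m in candidates if is_acceptable(m)]
print(accepted)
# ['john.doe#dom', 'john2.doe']
```

Running the filter before the normalization loop keeps the loop itself simple.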
Edit:
Just for fun, if you insist on a list comprehension:
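mails = ["john.doe#dom", "john2.doe", "john3.doe.e", "john4.doe.extern#dom"]
# Strip a trailing ".e" or "#dom" by slicing, then add the suffix back.
# (rstrip("#dom") removes any trailing '#', 'd', 'o', 'm' characters, not the
# suffix as a whole, so it would mangle a name such as "jane.adam".)
final_mails = ["{}#dom".format(mail[:-2] + ".extern" if mail.endswith(".e")
                               else mail[:-4] if mail.endswith("#dom")
                               else mail)
               for mail in mails]
print(final_mails)
# Result: ['john.doe#dom', 'john2.doe#dom', 'john3.doe.extern#dom', 'john4.doe.extern#dom']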
Personally I find list comprehensions are best when they are short and readable, so I would stick with the first option.

Option without list comprehensions:
maillist = ["john.doe#dom", "john2.doe", "john3.doe.e", "john4.doe.extern#dom"]
for i in range(len(maillist)):
    if "#" not in maillist[i]:
        if maillist[i][-2:] == ".e":
            maillist[i] += "xtern#dom"
        else:
            maillist[i] += "#dom"

#Green Cell was faster than me and his answer seems to be correct. Here is a list comprehension that does the same thing :
mails = ["john.doe#dom", "john2.doe", "john3.doe.e","john4.doe.extern#dom"]
print mails
mails = [mail if mail.endswith("#dom") else mail[:-2] + ".extern#dom" if mail.endswith(".e") else "{}#dom".format(mail) for mail in mails]
print mails
Which outputs :
['john.doe#dom', 'john2.doe', 'john3.doe.e', 'john4.doe.extern#dom']
['john.doe#dom', 'john2.doe#dom', 'john3.doe.extern#dom', 'john4.doe.extern#dom']
Hope this helps.

If you want a one-liner, combine a list comprehension with chained conditional (if/else) expressions:
first_list = ["john.doe#dom", "john2.doe", "john3.doe.e", "john4.doe.extern#dom"]
second_list = [email + 'xtern#dom' if email.endswith(".e") else
               email if email.endswith("#dom") else "{}#dom".format(email)
               for email in first_list]
print(second_list)
Gives:
['john.doe#dom', 'john2.doe#dom', 'john3.doe.extern#dom', 'john4.doe.extern#dom']

I came up with a different solution in the end: declaring a helper function.
def mailfy(mail):
    if mail.endswith(".c"):
        return mail[:-2] + "#dom"
    ...
mails = [mailfy(x) for x in mails]
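The body of the helper is elided in the post, so the following is only a sketch of what a complete version might look like, assuming the two rules from the question (a trailing ".e" becomes ".extern#dom", anything else without "#dom" gets "#dom" appended). The ".c" branch is left out because its meaning is not stated; the domain parameter is an addition for illustration. Written for Python 3:

```python
def mailfy(mail, domain="#dom"):
    """Normalize one address to name.surname#dom or name.surname.extern#dom."""
    if mail.endswith(domain):
        return mail  # already complete
    if mail.endswith(".e"):
        # Slice off the trailing ".e" rather than calling replace(), so a ".e"
        # elsewhere in the name (e.g. "john.edwards.e") is left alone.
        return mail[:-2] + ".extern" + domain
    return mail + domain


mails = ["john.doe#dom", "john2.doe", "john3.doe.e", "john4.doe.extern#dom"]
mails = [mailfy(x) for x in mails]
print(mails)
# ['john.doe#dom', 'john2.doe#dom', 'john3.doe.extern#dom', 'john4.doe.extern#dom']
```

Keeping the rules in one named function leaves the list comprehension short, and each rule can be tested on its own.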

Related

Long list in python, remove all none email (or "|") values ... and then, perhaps custom emails

I've been working on something for a week. A PDF was converted automatically to XML (ERAS medical program details), a very large and imperfect result. The problem is that this strange result returned errors for a lot of things I tried. And it seems that regex doesn't work for lists, at least this one ... . I just need to have only the emails. I could do that by getting anything with "#" in it, or removing anything with "|" in it.
How can I do this? It doesn't seem like turning it into a string works. But I could be wrong.
import xml.etree.ElementTree as ET
tree = ET.parse(r'C:\Users\Iainc\Downloads\ERAS application 2022 emails.xml')
root = tree.getroot()
import re
email = ['']
for x in root.iter():
    email.append(x.text)
editedemail = ['']
search_term = 'Email:'
for i in range(len(email)-1):
    if email[i] == search_term:
        editedemail.append(email[i+1])
for i in range(len(email)-1):
    if email[i] == search_term:
        editedemail.append(email[i-1])
phonelesseditedemail = list(filter(lambda a: a != 'Phone:', editedemail))
The only things left to remove are entries like:
'Emergency Medicine | NRMP Program Code:********* | Categorical',
But the rest are email addresses I can use. Next I want to write a program to automate sending custom emails, but for now I just need to remove the entries mentioned above.
Use a list comprehension:
pipelesseditedemail = [x for x in editedemail if '|' not in x]
You could do all of this in one loop:
editedemail = []
for i in range(len(email)):
    item1 = item2 = ''
    if i > 0:
        item1 = email[i-1]
    if i < len(email)-1:
        item2 = email[i+1]
    for item in [item1, item2]:
        # Skip None/empty entries, then keep only items that look like emails.
        if item and '#' in item and '|' not in item:
            editedemail.append(item)
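Since Element.text is None for elements that carry no text, the membership tests above can raise a TypeError on real XML. A small filter that tolerates such entries might look like the sketch below (Python 3). The function name keep_emails and the sample values are made up for illustration; the "#" test follows the question's own description of what an email looks like here:

```python
def keep_emails(items):
    """Keep only strings that contain '#' and no '|'; skip None and blanks."""
    kept = []
    for item in items:
        if not isinstance(item, str):
            continue  # Element.text is None for elements without text
        item = item.strip()
        if "#" in item and "|" not in item:
            kept.append(item)
    return kept


# Hypothetical values of the kind root.iter() might yield.
sample = [None, "Email:", "jane.doe#example.org ",
          "Emergency Medicine | NRMP Program Code:********* | Categorical",
          "Phone:", "", "info#example.org"]
print(keep_emails(sample))
# ['jane.doe#example.org', 'info#example.org']
```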

Return all values contains a specific text from a dictionary in python

How do I return all values that contain a specific text/string from a list, as comma-separated values?
I have a list of emails like this:
emails = ['email#example.com',
'email1#example.com',
'email2#example.com',
'emaila#emailexample.com',
'emailb#emailexample.com',
'email33#examplex.com',
'emailas44#exampley.com',
'emailoi45#exampley.com',
'emailgh#exampley.com']
what i want to do is get all emails from the same domain like this:
Website = 'example.com'
Email = 'email#example.com','email1#example.com','email2#example.com'
and so on....
I tried this so far but can't figure out how to achieve it. It would be great if anyone could help. Thanks in advance.
def Email(values, search):
    for i in values:
        if search in i:
            return i
    return None
data = Email(emails, 'example.com')
print(data)
You never needed a regex. Use a list-comprehension taking advantage of str.endswith() to look for strings with matching characters towards the end:
emails = ['email#example.com',
'email1#example.com',
'email2#example.com',
'emaila#emailexample.com',
'emailb#emailexample.com',
'email33#examplex.com',
'emailas44#exampley.com',
'emailoi45#exampley.com',
'emailgh#exampley.com']
Website = 'example.com'
print([email for email in emails if email.endswith(f'#{Website}')])
# ['email#example.com', 'email1#example.com', 'email2#example.com']
You are returning a value on the very first match, which is why you don't get the full result. Instead, collect the matching emails in a list and then return them as comma-separated values.
Modifying your approach:
def Email(values, search):
    x = list()
    for i in values:
        if i.endswith("#" + search):
            x.append(i)
    return ", ".join(x)  # Return the matches as one comma-separated string
emails = ["email#example.com","email1#example.com","email2#example.com","emaila#emailexample.com","emailb#emailexample.com","email33#examplex.com","emailas44#exampley.com","emailoi45#exampley.com","emailgh#exampley.com"]
website = 'example.com'
data = Email(emails, website)
print("Website = " + website)
print("Email = " + data)
Hope this answers your question!!!
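The question also asks for the same output for every domain ("and so on...."). One possible way to do that in a single pass is to group the addresses in a dictionary keyed by domain. This is a sketch for Python 3; group_by_domain and its sep parameter are names chosen here, not part of either answer:

```python
from collections import defaultdict


def group_by_domain(addresses, sep="#"):
    """Map each domain to the list of addresses that use it."""
    groups = defaultdict(list)
    for address in addresses:
        # rpartition splits on the last separator; the domain is the final piece.
        _, found, domain = address.rpartition(sep)
        if found:
            groups[domain].append(address)
    return dict(groups)


emails = ["email#example.com", "email1#example.com",
          "emaila#emailexample.com", "email33#examplex.com"]
for website, members in group_by_domain(emails).items():
    print("Website = " + website)
    print("Email = " + ", ".join(members))
```

Because the whole domain is used as the key, "emailexample.com" and "examplex.com" end up in their own groups instead of being mistaken for "example.com".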

If not found assign value

I am pulling data out of a file that looks like this
"LIC_ARP11|104100000X|33"
I collect the taxonomy number (taxonomies) out of the second field and translate it using another file (IDVtaxo) that looks like this:
"104100000X Behavioral Health & Social Service Providers Social Worker"
If the taxonomy number is not in IDVtaxo I want to append "Not Found"
if taxofile.startswith('IDV'):
    for nums in taxonomies:
        IDVfile = open (os.path.join(taxodir,IDVtaxo))
        for line in IDVfile:
            text = line.rstrip('\n')
            text = text.split("\t")
            if nums in line:
                data = text[2:]
                final.append(data)
            else:
                final.append('Not Found')
Then I print the original data along with the translated taxonomy. Currently I get:
"LIC_ARP11|104100000X|33| Not Found"
I want:
"LIC_ARP11|104100000X|33 | Social Worker"
The issue seems to be that the "else" appends "Not Found" for each line instead of just when the taxonomy isn't found in IDVtaxo.
taxonomies = ['152W00000X', '156FX1800X', '200000000X', '261QD0000X', '3336C0003X', '333600000X', '261QD0000X']
translations = {'261QD0000X': 'Clinic/Center Dental', '3336C0003X': 'Pharmacy Community/Retail Pharmacy', '333600000X': 'Pharmacy'}
a = 0
final = []
for nums in taxonomies:
    final.append(translations.get(nums, 'Not Found'))
for nums in taxonomies:
    print nums, "|", final[a]
    a = a + 1
The equality operator in Python is ==:
>>> if data == 'Not Found':
...     final.append(data)
For "not equal", use !=:
>>> if data != 'Not Found':
...     final.append(data)
It appears you are testing for presence of nums in each line, appending 'Not Found' every time you fail to find nums.
Instead, keep a variable (e.g. job_title) set to the string 'Not Found' before the inner loop starts. If nums is found, reassign job_title to the correct value, and append it to final once, outside of the inner loop.
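The default-variable idea could be sketched as follows (Python 3). The file contents are a stand-in: it assumes IDVtaxo is tab-separated with the code in the first field and the title in the third, which matches the split("\t") and text[2:] in the question but is not confirmed by it. The second sample row is invented:

```python
import io

# Stand-in for the IDVtaxo file; real data would come from open(...).
idv_text = (
    "104100000X\tBehavioral Health & Social Service Providers\tSocial Worker\n"
    "333600000X\tSuppliers\tPharmacy\n"
)
taxonomies = ["104100000X", "152W00000X"]

final = []
for nums in taxonomies:
    job_title = "Not Found"          # default until a matching line is seen
    for line in io.StringIO(idv_text):
        fields = line.rstrip("\n").split("\t")
        if fields[0] == nums:
            job_title = fields[2]
            break                    # stop scanning once found
    final.append(job_title)          # exactly one append per taxonomy

print(final)
# ['Social Worker', 'Not Found']
```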
I believe you can get a more efficient solution if you load IDVtaxo into a dictionary structure! https://docs.python.org/2/tutorial/datastructures.html#dictionaries
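Loading the file into a dictionary once might look like the sketch below (Python 3). As before, the tab-separated layout with the code in the first field is an assumption, and load_translations is a name made up for this example:

```python
import io


def load_translations(lines):
    """Build {taxonomy code: description} from tab-separated lines."""
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            table[fields[0]] = " ".join(fields[2:])
    return table


# Stand-in for open(os.path.join(taxodir, IDVtaxo)).
idv_file = io.StringIO(
    "104100000X\tBehavioral Health & Social Service Providers\tSocial Worker\n"
)
translations = load_translations(idv_file)
for code in ["104100000X", "152W00000X"]:
    print(code, "|", translations.get(code, "Not Found"))
```

The file is read only once, and dict.get supplies 'Not Found' for a missing code, so no inner loop or else branch is needed.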

Regular Expression in Python

I'm trying to build a list of domain names from an Enom API call. I get back a lot of information and need to locate the domain name related lines, and then join them together.
The string that comes back from Enom looks somewhat like this:
SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1
I'd like to build a list from that which looks like this:
[domain1.com, domain2.org, domain3.co.uk, domain4.net]
To find the different domain name components I've tried the following (where "enom" is the string above) but have only been able to get the SLD and TLD matches.
re.findall("^.*(SLD|TLD).*$", enom, re.M)
Edit:
Every time I see a question asking for a regular expression solution I have this bizarre urge to try and solve it without regular expressions. Most of the time that is more efficient than using a regex, but I encourage the OP to test which of the solutions is fastest.
Here is the naive approach:
a = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""
b = a.split("\n")
c = [x.split("=")[1] for x in b if x != 'TLDOverride=1']
for x in range(0,len(c),2):
    print(".".join(c[x:x+2]))
>> domain1.com
>> domain2.org
>> domain3.co.uk
>> domain4.net
You have a capturing group in your expression. re.findall documentation says:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
That's why only the content of the capturing group is returned.
Try:
re.findall(r"^.*((?:SLD|TLD)\d*)=(.*)$", enom, re.M)
This would return a list of tuples:
[('SLD1', 'domain1'), ('TLD1', 'com'), ('SLD2', 'domain2'), ('TLD2', 'org'), ('SLD3', 'domain3'), ('TLD4', 'co.uk'), ('SLD5', 'domain4'), ('TLD5', 'net')]
Combining SLDs and TLDs is then up to you.
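One possible way to do that combining step is shown below (Python 3). It assumes each SLD line is followed by its TLD line, and pairs them by order rather than by number, since the sample has SLD3 followed by TLD4. The pattern here is a slightly tightened variant of the one above, not the answer's exact regex:

```python
import re

enom = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""

# \d+ requires a number after SLD/TLD, so "TLDOverride=1" is skipped.
pairs = re.findall(r"^((?:SLD|TLD)\d+)=(.*)$", enom, re.M)

domains = []
sld = None
for key, value in pairs:
    if key.startswith("SLD"):
        sld = value                      # remember the name until its TLD arrives
    elif sld is not None:
        domains.append(sld + "." + value)
        sld = None

print(domains)
# ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
```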
This works for your example:
>>> sld_list = re.findall("^.*SLD[0-9]*?=(.*?)$", enom, re.M)
>>> tld_list = re.findall("^.*TLD[0-9]*?=(.*?)$", enom, re.M)
>>> map(lambda x: x[0] + '.' + x[1], zip(sld_list, tld_list))
['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
I'm not sure why you are talking about regular expressions. I mean, why don't you just run a for loop?
A famous quote seems to be appropriate here:
Some people, when confronted with a problem, think “I know, I'll use
regular expressions.” Now they have two problems.
domains = []
components = []
for line in enom.split('\n'):
    k,v = line.split('=')
    if k == 'TLDOverride':
        continue
    components.append(v)
    if k.startswith('TLD'):
        domains.append('.'.join(components))
        components = []
P.S. I'm not sure what this TLDOverride is, so the code just ignores it.
Here's one way:
import re
print map('.'.join, zip(*[iter(re.findall(r'^(?:S|T)LD\d+=(.*)$', text, re.M))]*2))
# ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
Just for fun, map -> filter -> map:
input = """
SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
"""
splited = map(lambda x: x.split("="), input.split())
slds = filter(lambda x: x[1][0].startswith('SLD'), enumerate(splited))
print map(lambda x: '.'.join([x[1][1], splited[x[0] + 1][1], ]), slds)
>>> ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
This appears to do what you want:
domains = re.findall('SLD\d+=(.+)', re.sub(r'\nTLD\d+=', '.', enom))
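It assumes that the lines are sorted and that each SLD always comes directly before its TLD. If that might not be the case, try this slightly more verbose code without regexes (note that it pairs SLDn with TLDn by number, so it relies on the numbers matching):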
d = dict(x.split('=') for x in enom.strip().splitlines())
domains = [
d[key] + '.' + d.get('T' + key[1:], '')
for key in d if key.startswith('SLD')
]
You need to use multiline regex for this. This is similar to this post.
data = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""
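# [\w.-]+ instead of \w+ so that dotted TLDs such as "co.uk" are captured whole;
# \d+ instead of \d so that numbers above 9 also match.
domain_seq = re.compile(r"SLD\d+=([\w.-]+)\nTLD\d+=([\w.-]+)", re.M)
for item in domain_seq.finditer(data):
    domain, tld = item.group(1), item.group(2)
    print("%s.%s" % (domain, tld))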
As some other answers already said, there's no need to use a regular expression here. A simple split and some filtering will do nicely:
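lines = data.split("\n")  # assuming data contains your input string
# Skip the TLDOverride lines, which would otherwise be picked up as TLDs.
sld, tld = [[x.split("=")[1] for x in lines
             if x.startswith(t) and not x.startswith("TLDOverride")]
            for t in ("SLD", "TLD")]
result = [x + "." + y for x, y in zip(sld, tld)]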

Python - How to print a specific text

def handler_users_answ(coze, res, type, source):
    if res:
        if res.getType() == 'result':
            aa=res.getQueryChildren()
            if aa:
                print 'workz1'
                for x in aa:
                    m=x.getAttr('jid')
                    if m:
                        print m
so this code returns me the values like this:
roomname#domain.com/nickname1
roomname#domain.com/nickname2
and so on, but i want it to print the value after the '/' only.
like:
nickname1
nickname2
Thanks in advance.
You can use rpartition to get the part after the last / in the string, e.g. a.rpartition('/')[2]. A plain split also works when there is exactly one /:
a = 'roomname#domain.com/nickname1'
b = a.split('/')
c = b[1]
You can use rsplit, which does the splitting from the right:
a = 'roomname#domain.com/nickname1'
try:
    print(a.rsplit('/', 1)[1])
except IndexError:
    print("No username was found")
I think that this is efficient and readable. If you really need it to be fast you can use rfind:
a = 'roomname#domain.com/nickname1'
index = a.rfind('/')
if index != -1:
    print(a[index+1:])
else:
    print("No username was found")
To fully parse and validate the JID correctly, see this answer. There's a bunch of odd little edge cases that you might not expect.
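One of those edge cases is a nickname that itself contains a slash. Assuming the resource part of a JID is everything after the first "/" (which is how I understand the XMPP address format), a sketch of a splitter could look like this in Python 3. It does no validation at all; split_jid is a made-up helper, and its at parameter defaults to "#" only because that is the separator shown in the post:

```python
def split_jid(jid, at="#"):
    """Split a JID-like string into (local, domain, resource)."""
    # The resource is everything after the FIRST '/', and may itself contain '/'.
    bare, _, resource = jid.partition("/")
    # The local part ends at the first separator in the bare JID.
    local, found, domain = bare.partition(at)
    if not found:
        local, domain = "", bare   # no local part, e.g. a bare server name
    return local, domain, resource


print(split_jid("roomname#domain.com/nickname1"))
# ('roomname', 'domain.com', 'nickname1')
print(split_jid("roomname#domain.com/nick/with/slashes"))
# ('roomname', 'domain.com', 'nick/with/slashes')
```

With such a nickname, splitting from the right (rsplit, rpartition or rfind) would return only the last segment, "slashes".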
