Junk character while reading email body in python using win32com.client [duplicate]

This question already has answers here:
Is there a way to get around unicode issues when using win32api/com modules in python 3?
(2 answers)
Closed last year.
This post was edited and submitted for review last year and failed to reopen the post:
Original close reason(s) were not resolved
I am trying to read the email body as shown below, but I am getting junk characters:
for account in EmailsAccounts:
    print(account)
    inbox = outlook.Folders(account).Folders('Inbox')
    messages = inbox.Items
    print(len(messages))
    for mail in messages:
        body = mail.Body
        print(body.encode('utf-8'))

If the problem is related to encoding message bodies, try to use the following code instead:
print (mail.Body.encode('utf8'))
See Is there a way to get around unicode issues when using win32api/com modules in python 3? for more information.
If it is a different problem, I'd suggest checking the message type: an Outlook folder may contain different kinds of items, such as appointments, tasks, documents, or mail items.
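The item-type check suggested above can be sketched as follows. This is only a sketch, assuming each item exposes the Outlook object model's Class property and that 43 is the value of OlObjectClass.olMail; the helper name mail_bodies is made up for illustration.

```python
OL_MAIL = 43  # OlObjectClass.olMail in the Outlook object model


def mail_bodies(items):
    """Yield the Body of each mail item, skipping appointments, tasks, etc."""
    for item in items:
        # Non-mail items report a different Class value (or none at all).
        if getattr(item, "Class", None) == OL_MAIL:
            yield item.Body


# Hypothetical usage with the loop from the question:
#     for body in mail_bodies(inbox.Items):
#         print(body)
```

In Python 3, mail.Body is normally already a str, so printing body.encode('utf-8') shows a bytes literal (b'...' with \x escapes) rather than readable text; print(body) may be all that is needed.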

Related

How to get multiple ip's from a domain loop [duplicate]

This question already has answers here:
How can I find all matches to a regular expression in Python?
(1 answer)
Writing a list to a file with Python, with newlines
(26 answers)
Closed 4 months ago.
This post was edited and submitted for review 3 months ago and failed to reopen the post:
Original close reason(s) were not resolved
As you can see below, there are multiple strings in a list that goes through a loop.
from urllib.request import urlopen
import re
These are the website URLs I want to get the IP from.
exist = ["https://www.youtube.com/", "https://www.twitch.tv/", 'https://twitter.com/']
for c in exist:
    try:
        def getIP():
            d = str(urlopen(c).read())
            return re.compile(r'Address: (\d+\.\d+\.\d+\.\d+)').search(d).group(1)
        print(getIP())
    except AttributeError:
        print('error')
But when I run the loop, it only reports "certificate verify failed". Is there a reason why it outputs that?
Thank you!
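If the goal is only the IP addresses behind each domain, downloading the page is not required; a DNS lookup is enough. The "certificate verify failed" message typically means urlopen could not validate the site's HTTPS certificate against the local CA store, and a DNS lookup never opens an HTTPS connection. A minimal sketch, assuming IPv4 addresses are wanted and using the standard library's socket.gethostbyname_ex; the helper names hostname_of and resolve_ips and the resolver parameter are made up for illustration.

```python
import socket
from urllib.parse import urlparse


def hostname_of(url):
    """Return the host part of a URL, e.g. 'www.youtube.com'."""
    return urlparse(url).hostname


def resolve_ips(url, resolver=socket.gethostbyname_ex):
    """Return the list of IPv4 addresses the resolver reports for the URL's host."""
    _name, _aliases, addresses = resolver(hostname_of(url))
    return addresses


# Hypothetical usage (needs network access; a failed lookup raises socket.gaierror):
#     for url in ["https://www.youtube.com/", "https://www.twitch.tv/"]:
#         print(url, resolve_ips(url))
```

The resolver is passed in as a parameter only so the function can be exercised without network access.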

Best way to extract the Date from a string [duplicate]

This question already has answers here:
Python/Regex - How to extract date from filename using regular expression?
(5 answers)
Closed 2 years ago.
I am trying to extract the date from a string. I used to be able to just pull the entire line, but the company sending the data keeps adding characters to the front/back of the date, which causes my code to stop working until I fix it. My searches give mixed advice on whether I should use regex or the datetime module. Here is what I am currently using, which as you can see is cumbersome and inefficient.
line = ' .10/10/2020<=x'
date = line.strip().replace('.', '').replace('<', '').replace('=', '').replace('x', '')
edit:
I ended up taking Yash's regex and it worked perfectly.

Why not extract it using regex? This only works for the format xx/xx/xxxx; the regex needs to change if multiple formats are found.
import re
line=' .10/10/2020<=x'
a=re.search("([0-9]{2}/[0-9]{2}/[0-9]{4})", line)
print(a.group(1))
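The regex and the datetime module can also be combined: the regex finds the candidate, and datetime checks that it is a real date. A minimal sketch, assuming the month/day/year order (the sample 10/10/2020 does not reveal which order the sender uses); the name extract_date and its fmt parameter are made up for illustration.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")


def extract_date(line, fmt="%m/%d/%Y"):
    """Return the first xx/xx/xxxx date in line as a date object, or None."""
    match = DATE_PATTERN.search(line)
    if match is None:
        return None
    try:
        # strptime rejects impossible values such as month 99.
        return datetime.strptime(match.group(0), fmt).date()
    except ValueError:
        return None
```

Extra characters before or after the date no longer matter, because the pattern searches anywhere in the line.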

Using regex to extract email receiver using python 2.7 [duplicate]

This question already has answers here:
Parsing email with Python
(3 answers)
Closed 5 years ago.
I want to extract the addresses of everyone who received the email. I used this regex to extract only the addresses after To:, but it extracts only the first address.
To: ([a-z0-9_\.-]+@[\da-z\.-]+\.[a-z\.]{2,6})
When I use this regex without To:, it extracts all addresses, whether they belong to the receiver or the sender.
([a-z0-9_\.-]+@[\da-z\.-]+\.[a-z\.]{2,6})
This is a sample of the data:
Message-ID: <7618763.1075855377753.JavaMail.evans@thyme>
Date: Mon, 31 Dec 2001 10:53:43 -0800 (PST)
From: louise.kitchen@enron.com
To: wes.colwell@enron.com, georgeanne.hodges@enron.com, rob.milnthorp@enron.com, john.zufferli@enron.com, peggy.hedstrom@enron.com, thomas.myers@enron.com
Thank you

Try to use something like:
emails = re.findall('write your expression there', emailDataText)
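To keep sender addresses out of the result, the To header can be isolated before any matching happens. The sketch below swaps the hand-written regex for the standard library's email package; it is written for Python 3 (the question targets 2.7), and the function name recipients is made up for illustration.

```python
from email import message_from_string
from email.utils import getaddresses


def recipients(raw_message):
    """Return every address listed in the To header(s) of a raw message."""
    msg = message_from_string(raw_message)
    to_headers = msg.get_all("To", [])  # empty list when there is no To header
    return [address for _name, address in getaddresses(to_headers)]
```

Because only the To header is parsed, the From address and the Message-ID are never considered.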

Python Request in unicode [duplicate]

This question already has answers here:
urllib.quote() throws KeyError
(3 answers)
Closed 7 years ago.
I am trying to make a program that sends a request to Steam to get the cheapest price for an item. For this I will be using StatTrak™ P250 | Supernova (Factory New) as an example.
The problem is that the request needs a URL like this:
http://www.steamcommunity.com/market/priceoverview/?country=SG&currency=13&appid=730&market_hash_name=StatTrak™%20P250%20%7C%20Supernova%20%28Factory%20New%29
Afterwards, (I am using the requests module) I do this:
url = "http://www.steamcommunity.com/market/priceoverview/?country=SG&currency=13&appid=730&market_hash_name=StatTrak™%20P250%20%7C%20Supernova%20%28Factory%20New%29"
requests.get(url)
However, the server will return an error.
I can't seem to find a way to replace ™. I have tried %2122. In Python I tried using u'\u084a' but that didn't work either. The problem is that Python sends a literal \u084a in the request. Is there any way to solve this?

Just use URL encoding. You can't use unicode in urls.
>>> import urllib
>>> f = {'market_hash_name': 'StatTrak™'}
>>> urllib.urlencode(f)
'market_hash_name=StatTrak%E2%84%A2'
Also possible
>>> urllib.quote_plus('StatTrak™')
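The session above is Python 2; in Python 3 the same functions live in urllib.parse. A minimal sketch, assuming the endpoint and query parameters shown in the question; the helper name build_price_url is made up for illustration.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://www.steamcommunity.com/market/priceoverview/"


def build_price_url(item_name, country="SG", currency=13, appid=730):
    """Build the price-overview URL with every query value percent-encoded."""
    params = {
        "country": country,
        "currency": currency,
        "appid": appid,
        "market_hash_name": item_name,
    }
    # quote_via=quote encodes spaces as %20 instead of the default '+'.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


# Example call:
#     build_price_url("StatTrak™ P250 | Supernova (Factory New)")
```

The ™ character is encoded as its UTF-8 bytes, %E2%84%A2, which matches the output of the urlencode call in the answer. With the requests module, passing the same dictionary as requests.get(url, params=...) lets the library do the encoding.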

How to improve this email regex? [duplicate]

This question already has answers here:
How can I validate an email address using a regular expression?
(79 answers)
Closed 7 years ago.
I am trying to match email addresses in Python using regex with this pattern:
"\w{1,}@\w{1,}.\w{1,}"
However sometimes there are email addresses that look like firstname.lastname@lol.omg.hahaha.museum which my pattern will miss.
Is there a way to adjust this regex so it will include an arbitrary number of chained ".word" type patterns?

You can use the following:
[\w.-]+@[\w-][\w.-]+\w //replaced {1,} with its equivalent.. "+"

You shouldn't try to match email addresses with regex. You'll have to use a more complicated state machine to check whether the address correctly matches RFC 2822.
https://pypi.python.org/pypi/validate_email is one such library you can check out.

This should work for you
[a-zA-Z0-9._-]+@([a-zA-Z0-9.-]+\.)+[a-zA-Z0-9.-]{2,4}
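The "arbitrary number of chained .word parts" requirement can be expressed with a repeated group. The pattern below is a deliberately loose sketch for finding address-like strings in text, not an RFC 2822 validator; the names EMAIL_PATTERN and find_emails are made up for illustration.

```python
import re

# Local part, "@", a first domain label, then one or more ".label" parts.
# The group is non-capturing so findall returns whole matches.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_emails(text):
    """Return every address-like substring found in text."""
    return EMAIL_PATTERN.findall(text)
```

Because each repetition must start with a dot followed by at least one word character, a sentence-ending period after an address is not swallowed into the match.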
