LangDetectException No features in text - python

I have a script that detects the language of incoming text (which arrives in bulk) with the help of the langdetect module. I have also set up an email alert script for errors, so whenever I get an error a mail is sent to me. My problem is that whenever langdetect is not able to recognise a language (which happens a lot, as I get many random texts from the internet) it throws the exception "No features in text". Because of this, my daily email sending capacity gets exhausted. What I want is to check whether the error is "No features in text": if it is, skip sending the email; otherwise send it.
How can I do this?
I tried using if case:
if LangDetectException.code == 'no features in text':
    pass
else:
    sendmail()
Thank you

I solved it using the get_code method as follows:
if error.get_code() == 5:
    pass
else:
    sendmail()
5 is for no features in text.
Thank you
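For readers who want the check in one place, here is a minimal sketch. It assumes, as in the answer above, that the exception exposes get_code() and that 5 is the code for "No features in text". The names NO_FEATURES_CODE, should_send_alert and detect_all are hypothetical, and the detection and mail functions are passed in as arguments, so nothing here depends on langdetect being installed.

```python
# Code reported in the answer above for "No features in text" (assumption
# taken from the post, not checked against the library here).
NO_FEATURES_CODE = 5


def should_send_alert(error):
    """Return False only for the 'no features in text' error, True otherwise."""
    get_code = getattr(error, "get_code", None)
    if callable(get_code) and get_code() == NO_FEATURES_CODE:
        return False
    return True


def detect_all(texts, detect, send_alert):
    """Run detect() on each text and call send_alert(error) only when warranted.

    detect and send_alert are supplied by the caller (for example
    langdetect.detect and the poster's sendmail function).
    """
    results = []
    for text in texts:
        try:
            results.append(detect(text))
        except Exception as error:  # narrow this to LangDetectException in real code
            results.append(None)
            if should_send_alert(error):
                send_alert(error)
    return results
```

Texts that fail with code 5 come back as None and no mail is sent for them; any other error still triggers the alert.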

Related

How to validate phone number and email in aws lex code hook(in lambda)

How can I validate a phone number and email in AWS Lex code hook (in Lambda).
I had tried using the following code to validate the phone number and email address in AWS Lex chatbot. I am getting errors.
import re
EMAIL_REGEX = re.compile(r"[^@]+@[^@]+\.[^@]+")
if len(str(phonenumber)) <= 10 or len(str(phonenumber)) >= 10:
    return build_validation_result(False,
                                   'PhoneNumber',
                                   'Please enter valid phone number which contains 10 digits'
                                   )
if not EMAIL_REGEX.match(email):
    return build_validation_result(False,
                                   'Email',
                                   'Please enter valid email address'
                                   )
Firstly, you will want to fix some of your formatting. Following the guide here will serve you well both to improve the readability of your code for yourself and others who you want help from or who need to maintain code later on.
Secondly, I am assuming you are omitting the vast majority of your code here, and that some of the errors in your indenting come from issues pasting to stackoverflow. I have fixed these errors, but if you are missing other important information regarding interacting with the aws api no one can help you until you post the code and ideally a full traceback of your error.
Not everyone might agree with me on this, but unless you are an expert with regular expressions, it is generally best to copy regex made by gurus and test it thoroughly to verify it produces your desired result rather than making one yourself. The regex I am using below was copied from here. I have tested it with a long list of valid emails I have and not one of them failed to match.
import re
PHONE_REGEX = re.compile(r'^[0-9]{10}$')  # anchored so that only exactly 10 digits match
EMAIL_REGEX = re.compile(r"""(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#'$"""+
r"""%&*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d"""+
r"""-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*"""+
r"""[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4]["""+
r"""0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|["""+
r"""a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|"""+
r"""\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])""")
if not PHONE_REGEX.match(phonenumber):
    return build_validation_result(
        False,
        'PhoneNumber',
        'Please enter valid phone number which contains 10 digits'
    )
if not EMAIL_REGEX.match(email):
    return build_validation_result(
        False,
        'Email',
        'Please enter valid email address'
    )
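The fragment above only makes sense inside a function, so here is a self-contained sketch of both checks that can be run on its own. Several things in it are assumptions: build_validation_result is a hypothetical stand-in for the helper in the post (its real return shape is not shown), validate_contact is an invented name, and the email pattern is a deliberately simple shape check rather than the long pattern quoted in the answer.

```python
import re

# Ten ASCII digits; used with fullmatch so longer or shorter input is rejected.
PHONE_REGEX = re.compile(r"[0-9]{10}")
# Simple shape check (something@something.tld); an assumption for illustration.
EMAIL_REGEX = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def build_validation_result(is_valid, violated_slot, message):
    """Hypothetical stand-in for the helper used in the post."""
    return {"isValid": is_valid, "violatedSlot": violated_slot, "message": message}


def validate_contact(phonenumber, email):
    """Return a validation result for the phone number and email slots."""
    if not PHONE_REGEX.fullmatch(str(phonenumber)):
        return build_validation_result(
            False,
            "PhoneNumber",
            "Please enter valid phone number which contains 10 digits",
        )
    if not EMAIL_REGEX.fullmatch(str(email)):
        return build_validation_result(
            False,
            "Email",
            "Please enter valid email address",
        )
    return build_validation_result(True, None, None)
```

Using fullmatch means the whole string has to match the pattern, so a value with extra digits or trailing text is rejected instead of being accepted on a partial match.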

Accessing website with urllib returns error, retrieving information from Results Page

I wrote some Python code to access a reverse phone lookup site and determine whether a phone is a cell phone or a land line. The website I am using is whitepages, whose results page only includes the phrase "VoIP" if the phone is a land line (which I determined after looking at many results). However, I am getting an error at the stage where the website is accessed. So far my code looks like:
import urllib.parse
import urllib.request

def Phone_Checker(number):
    url = 'http://www.whitepages.com/reverse_phone'
    enter = {'e.g. 206-867-5309': number}
    door = urllib.parse.urlencode(enter)
    open = door.encode('UTF-8')  # urlopen requires the POST data as bytes
    fight = urllib.request.urlopen(url, open)
    d = fight.read()
    v = "VoIP"
    vv = v.encode("UTF-8")
    if vv in d:  # if VoIP it is landline
        return False
    else:
        return True
I changed my strings into bytes because urlopen requires my variable "open" to be bytes. A version of the code I made to access a different site required a few other conversions from strings to bytes, but I cannot quite remember which values needed them. (Just a heads up in case the code after the variable fight looks incorrect: I have not been able to debug it because of my difficulty with urlopen.) Whenever I run my code I receive this error:
File "C:\Users\aa364\Anaconda3\lib\urllib\request.py", line 589, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
HTTPError: Requested Range Not Satisfiable
I was wondering how I could circumvent this error and if there is any possible alternative to creating a program to verify if a phone is mobile or a landline for DOMESTIC (USA) phone numbers. Thank you in advance!
Based on what I'm reading and experimenting with to try to find an answer on this, I think this is likely whitepages' doing. I have 3 reasons:
The error seems to be a result of whitepages only accepting requests from certain browsers ('User-Agents').
Upon changing the 'User-Agent' I get kicked to robots.txt (which is basically a response meaning "don't automate this").
Both of these things are likely the result of whitepages having a paid/premium-access API: obviously, they'll do whatever they can to stop people from accessing their information for free if they're trying to charge for it.
So, I think the answer in this case is, unfortunately, to find another phone number lookup service.
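For anyone repeating the experiment described in this answer, the two steps (sending an explicit User-Agent and checking robots.txt rules) can be sketched with the standard library alone. The function names, the example URL and the user agent string are assumptions for illustration; the sketch makes no network request and says nothing about what whitepages actually serves.

```python
import urllib.request
import urllib.robotparser


def build_request(url, user_agent):
    """Create a request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def allowed_by_robots(robots_lines, user_agent, url):
    """Check a URL against robots.txt rules that were already downloaded.

    robots_lines is the robots.txt content split into lines.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)
```

A request built this way can be passed to urllib.request.urlopen in place of the bare URL. If allowed_by_robots returns False for the page you want, the site is asking automated clients to stay away from it.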

smtplib sometimes can't send a string variable from sqlite concatenated to a string literal

I am trying to send a message in Python with smtplib. I have a sendEmail(message) function:
import smtplib

def sendEmail(message):
    server = smtplib.SMTP_SSL('mailserver.example.com', 465)
    server.ehlo()
    server.login("username", "password")
    print message
    msg = message
    fromaddr = "myaddr@someplace.com"
    toaddr = "theiraddr@someotherplace.com"
    server.sendmail(fromaddr, toaddr, msg)
    server.quit()
I am trying to do something like this:
myMessage = "A String Literal" + someString
sendEmail(myMessage)
But when I do that I sometimes receive an empty email message. someString is a unicode string pulled from sqlite. I have tried various string concatenations combining literals with literals, sqlite strings with sqlite strings, sqlite strings with literals, each by themselves. I have used + and .join and anything else I can think of to join the strings. It seems like everything works sometimes but nothing works every time.
The "print message" inside sendEmail function always prints the string I expect. smtplib always sends the email message to the correct email address without complaining, it is just sometimes an empty message.
Since I have had everything work and also not work, I don't really trust any solution I come up with if I don't fully understand what is going on. Can anybody help me understand what is happening with these strings so I can construct my message in the most appropriate way and be confident that my application will work reliably?
Here is a more detailed example of what I am trying to do:
#connect to the database
dbconn = sqlite3.connect(config["database"])
dbconn.row_factory = sqlite3.Row
dbc = dbconn.cursor()
while True:
    #step 1: Check the things and set alarm status in the alarms table
    #this step reads from and writes to the database
    #step 2: Check each alarm and send a message if necessary
    dbc.execute('SELECT * FROM alarms')
    theAlarms = dbc.fetchall()
    for theAlarm in theAlarms:
        if bool(theAlarm['alertShouldBeSent']):
            sendEmail('ALARM!!: ' + theAlarm['message'])
            dbc.execute('UPDATE alarms SET alertShouldBeSent=0 WHERE id=?', (theAlarm['id'],))
        elif bool(theAlarm['allClearShouldBeSent']):
            sendEmail('NORMAL: ' + theAlarm['message'])
            dbc.execute('UPDATE alarms SET allClearShouldBeSent=0 WHERE id=?', (theAlarm['id'],))
    dbconn.commit()
Each row in the alarms table defines a condition that should trigger an alarm, and has a field that indicates whether the alarm condition is met, making the alarm active.
theAlarm['message'] from the database is something like "The batteries are on fire" or "Somebody left the door open." There are currently only 4 or 5 rows in the alarms table. The logic that determines theAlarm['alertShouldBeSent'] makes sure an alert is only sent once per alarm condition and won't be continuously sent if the alarm doesn't first reset. The same is true for theAlarm['allClearShouldBeSent']. The loop runs continuously but emails are sent infrequently. In testing, I set one row into an alarm state. I can verify that the code to send the email is triggered when it should be.
Had the same problem as you, so I went googling and I found your post. As I was reading I realised it must be the damn parser smtplib is using. It doesn't know how to parse ":".
There may be other characters it can't parse, but I haven't found any.
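One plausible reading of the ":" observation is that an email consists of header lines, a blank line, and then the body. A raw string such as "ALARM!!: The batteries are on fire" has the shape of a header line ("name: value"), so software that parses it finds one odd header and no body. The sketch below, written for Python 3, shows that effect with the standard email package and builds a message with explicit headers instead. The function names and the addresses in the usage note are hypothetical.

```python
import email
from email.message import EmailMessage


def body_after_parsing(raw_message):
    """Return the body that a mail parser finds in a raw message string."""
    return email.message_from_string(raw_message).get_payload()


def build_alarm_email(fromaddr, toaddr, subject, body):
    """Build a message whose headers and body are explicitly separated."""
    msg = EmailMessage()
    msg["From"] = fromaddr
    msg["To"] = toaddr
    msg["Subject"] = subject
    msg.set_content(body)  # the body goes after the blank line that ends the headers
    return msg
```

With a message built like this, server.sendmail(fromaddr, toaddr, msg.as_string()) sends text in which the body can contain colons freely, because it is no longer the first line of the message.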

Gmail IMAP is not returning uids

I am connecting to my Gmail account via IMAP to sync some of my emails and parse them. Sometimes I need to download some emails again because I made some kind of fix, and now Gmail is not returning the uids of those emails at all. Here is some code to explain myself better:
typ, data = self.connection.uid('search', None, '(SINCE 14-Dec-2012 BEFORE 20-Dec-2012)')
17:05.55 > HJBM3 UID SEARCH (SINCE 14-Dec-2012 BEFORE 20-Dec-2012)
17:05.69 < * SEARCH
17:05.69 < HJBM3 OK SEARCH completed (Success)
('OK', [''])
I have a good number of emails on those dates, including the ones I want to parse, and the search doesn't return anything. Depending on the date it does return some uids, so it is not completely broken.
I checked whether Thunderbird synced those emails correctly, and it got them with no problem.
I am using the Python 2.6 imaplib (version 2.58).
Maybe this will help someone so I'll answer it here:
I had in gmail this setting on:
When I changed it to "Do not limit" It worked like a charm.
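As a side note for anyone rebuilding the search from the question, the date criterion can be generated instead of typed by hand. This is a small sketch with hypothetical helper names (imap_date, since_before). It spells out the month abbreviations so the result does not depend on the system locale.

```python
import datetime

# IMAP dates use fixed English month abbreviations, so they are listed here
# instead of relying on locale-dependent strftime("%b").
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(day):
    """Format a date the way IMAP SEARCH expects, e.g. 14-Dec-2012."""
    return "%d-%s-%d" % (day.day, _MONTHS[day.month - 1], day.year)


def since_before(start, end):
    """Build a SEARCH criterion for messages on or after start and before end."""
    return "(SINCE %s BEFORE %s)" % (imap_date(start), imap_date(end))
```

The returned string can be passed as the last argument of connection.uid('search', None, ...), as in the question.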

get email unread content, without affecting unread state [duplicate]

This question already has answers here:
Fetch an email with imaplib but do not mark it as SEEN
(4 answers)
Closed 7 years ago.
Right now it's a Gmail box, but sooner or later I want it to scale.
I want to sync a copy of a live personal mailbox (inbox and outbox) somewhere else, but I don't want to affect the unread state of any unread messages.
What type of access will make this easiest? I can't find any information on whether IMAP will affect the read state, but it appears I can manually reset a message to unread. POP by definition doesn't affect unread state, but nobody seems to use POP to access their Gmail. Why?
In the IMAP world, each message has flags. You can set the individual flags on each message. When you Fetch a message, it's actually possible to read the message, without applying the \Seen flag.
Most mail clients will apply the \Seen flag when the message is read. So, if the message has already been read, outside of your app, then you will need to remove the \Seen flag.
Just as fyi...here is the relevant part about flags from the RFCs:
    A system flag is a flag name that is pre-defined in this
    specification. All system flags begin with "\". Certain system
    flags (\Deleted and \Seen) have special semantics described
    elsewhere. The currently-defined system flags are:
    \Seen
        Message has been read
    \Answered
        Message has been answered
    \Flagged
        Message is "flagged" for urgent/special attention
    \Deleted
        Message is "deleted" for removal by later EXPUNGE
    \Draft
        Message has not completed composition (marked as a draft).
    \Recent
        Message is "recently" arrived in this mailbox. This session
        is the first session to have been notified about this
        message; if the session is read-write, subsequent sessions
        will not see \Recent set for this message. This flag can not
        be altered by the client.
        If it is not possible to determine whether or not this
        session is the first session to be notified about a message,
        then that message SHOULD be considered recent.
        If multiple connections have the same mailbox selected
        simultaneously, it is undefined which of these connections
        will see newly-arrived messages with \Recent set and which
        will see it without \Recent set.
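Removing the \Seen flag, as this answer describes, might look like the following with imaplib. This is a minimal sketch: mark_unread is a hypothetical helper name, and connection is assumed to be an imaplib.IMAP4 (or IMAP4_SSL) object that is already logged in with a mailbox selected in read-write mode.

```python
def mark_unread(connection, message_num):
    """Remove the Seen flag from one message and return the server status."""
    # "-FLAGS" removes the listed flags; "+FLAGS" would add them.
    typ, _data = connection.store(message_num, "-FLAGS", "\\Seen")
    return typ
```

A return value of "OK" means the server accepted the change and the message shows as unread again.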
There is a .PEEK option on the FETCH command in IMAP that will explicitly not set the \Seen flag.
Look at the FETCH command in RFC 3501 and scroll down a bit to page 57 or search for "BODY.PEEK".
You need to specify a section when you use BODY.PEEK. Sections are explained in the IMAP FETCH command documentation under BODY[<section>]<<partial>>
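import getpass, imaplib
M = imaplib.IMAP4()
M.login(getpass.getuser(), getpass.getpass())
M.select()
typ, data = M.search(None, 'ALL')
for num in data[0].split():
    # BODY.PEEK[] fetches the whole message without setting \Seen
    typ, data = M.fetch(num, '(BODY.PEEK[])')
    # data[0] is an (envelope, message) pair; the message text is at index 1
    print 'Message %s\n%s\n' % (num, data[0][1])
M.close()
M.logout()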
PS: I wanted to fix the answer given by Gene Wood but was not allowed to because the edit was smaller than 6 characters (BODY.PEEK -> BODY.PEEK[])
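The same fetch can be wrapped in a small Python 3 helper. This is a sketch under stated assumptions: fetch_without_marking_seen is a hypothetical name, and connection is assumed to be an imaplib connection that is already logged in with a mailbox selected.

```python
def fetch_without_marking_seen(connection, message_num):
    """Fetch the full raw message with BODY.PEEK[], which does not set \\Seen."""
    typ, data = connection.fetch(message_num, "(BODY.PEEK[])")
    if typ != "OK":
        raise RuntimeError("FETCH failed for message %r" % (message_num,))
    # For a successful fetch, data[0] is an (envelope, raw_message_bytes) pair.
    return data[0][1]
```

The returned bytes can be handed to email.message_from_bytes for parsing, and the message stays unread on the server.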
Nobody uses POP because typically they want the extra functionality of IMAP, such as tracking message state. When that functionality is only getting in your way and needs workarounds, I think using POP's your best bet!-)
If it helps anyone, GAE allows you to receive email as an HTTP request, so for now I'm just forwarding emails there.
To follow up on Dan Goldstein's answer above, in python the syntax to use the ".PEEK" option would be to call IMAP4.fetch and pass it "BODY.PEEK"
To apply this to the example in the python docs :
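import getpass, imaplib
M = imaplib.IMAP4()
M.login(getpass.getuser(), getpass.getpass())
M.select()
typ, data = M.search(None, 'ALL')
for num in data[0].split():
    # Another answer in this thread notes that a section is required: '(BODY.PEEK[])'
    typ, data = M.fetch(num, '(BODY.PEEK)')
    # data[0] is an (envelope, message) pair; the message text is at index 1
    print 'Message %s\n%s\n' % (num, data[0][1])
M.close()
M.logout()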
