I want to grab data from a BigQuery database at a set interval and have the results emailed automatically using smtplib.
I have an example below. I can pull data from BigQuery just fine, and I can send email via smtplib just fine; what I need to do is combine the two. I want to store the results of a for loop in the email body of the message. I believe I do that by calling the function, but when I do, I receive this error:
File "bqtest5.py", line 52, in server.sendmail(login_email,
recipients, query_named_params('corpus', 'min_word_count')) File
"/usr/lib/python2.7/smtplib.py", line 729, in sendmail
esmtp_opts.append("size=%d" % len(msg)) TypeError: object of type
'NoneType' has no len()
from google.cloud import bigquery
import smtplib

#Variables
login_email = 'MYEMAIL'
login_pwd = 'MYPASSWORD'
recipients = 'EMAILSENDINGTO'

#Create a function that takes two parameters
def query_named_params(corpus, min_word_count):
    #Create a client
    client = bigquery.Client()
    #Define the query
    query = """
        SELECT word, word_count
        FROM `bigquery-public-data.samples.shakespeare`
        WHERE corpus = @corpus
        AND word_count >= @min_word_count
        ORDER BY word_count DESC;
    """
    #Define the parameters
    query_params = [
        bigquery.ScalarQueryParameter('corpus', 'STRING', 'sonnets'),
        bigquery.ScalarQueryParameter('min_word_count', 'INT64', 10)
    ]
    #Create the job configuration
    job_config = bigquery.QueryJobConfig()
    #Add the query parameters
    job_config.query_parameters = query_params
    #Run the query
    query_job = client.query(query, job_config=job_config)
    #Print the results
    destination_table_ref = query_job.destination
    table = client.get_table(destination_table_ref)
    resulters = client.list_rows(table)
    for row in resulters:
        print("{} : {} views".format(row.word, row.word_count))

# -------------------- EMAIL PORTION --------------------
#smtplib connection
server = smtplib.SMTP('smtp.gmail.com', 587)
server.starttls()
server.login(login_email, login_pwd)
msg = """
"""
server.sendmail(login_email, recipients,
                query_named_params('corpus', 'min_word_count'))
server.quit()

if __name__ == '__main__':
    query_named_params('corpus', 'min_word_count')
Your function is not returning any value, so None is passed to server.sendmail as the message; that is what raises the TypeError from len(msg).
Try this instead:
def query_named_params(corpus, min_word_count):
    (...)
    query_params = [
        bigquery.ScalarQueryParameter('corpus', 'STRING', corpus),
        bigquery.ScalarQueryParameter('min_word_count', 'INT64', min_word_count)
    ]
    (...)
    s = ""
    for row in resulters:
        s += "{} : {} views\n".format(row.word, row.word_count)
    return s

(...)
server.sendmail(login_email, recipients, query_named_params('sonnets', 10))
This probably will not send a very readable message, though. Depending on the complexity of the results from your BQ table, I'd recommend using Jinja2 to create an HTML template and then rendering it to be sent as the mail body.
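For example, a minimal sketch of the Jinja2 approach, assuming you change query_named_params to return the list of rows instead of a preformatted string (the template and subject line here are illustrative, not part of the original code):

from email.mime.text import MIMEText
from jinja2 import Template

template = Template("""\
<html><body>
  <h2>Word counts for {{ corpus }}</h2>
  <table>
  {% for row in rows %}
    <tr><td>{{ row.word }}</td><td>{{ row.word_count }} views</td></tr>
  {% endfor %}
  </table>
</body></html>
""")

rows = query_named_params('sonnets', 10)  # assumed to return the rows themselves
html = template.render(corpus='sonnets', rows=rows)
msg = MIMEText(html, 'html')
msg['Subject'] = 'BigQuery word counts'
msg['From'] = login_email
msg['To'] = recipients
server.sendmail(login_email, recipients, msg.as_string())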
I've connected a SQL query to Python and am trying to automate an email with an attachment if the SQL query returns results. The query shows the discrepancies between our prices and the prices our client is selling our products at.
If there are differences between the 'Unit Price' and 'Selling Price', I want to email these to my director. However, if there are no differences, the query will return 0 results and I do not want an empty email to be sent out.
My difficulty is that the Python script sends the attachment even if the query returns no results. I only want to send the attachment when the query returns results.
Any help on this would be greatly appreciated!
import pandas as pd
import win32com.client as win32

sql_query = pd.read_sql_query('''
SELECT SaleH.[Order No], SaleH.[Reference], SaleL.[Product code], SaleL.[Description], SaleL.[Quantity], SaleL.[Unit Price], SaleP.[Unit Price] AS 'Selling Price'
FROM [Sales Header] SaleH
INNER JOIN [Sales Line] SaleL ON SaleH.[Order No] = SaleL.[Order No]
LEFT JOIN [Sales Price] SaleP ON SaleP.[Product Code] = SaleL.[Product Code] AND SaleH.[Customer No] = SaleP.[Customer No]
WHERE SaleH.[Customer No] = 'Cust01' AND SaleH.[Date] > DATEADD(DD,-1,getdate()) AND SaleP.[Unit Price] != SaleL.[Unit Price]
''', conn)

df = pd.DataFrame(sql_query)
df.to_csv(r'G:\Customer Folder\Customer_Sales_Orders.csv', index=False)

outlook = win32.Dispatch('outlook.application')
email = outlook.CreateItem(0)
mail_from = "Sender"
mail_to = "Recipient"
mail_subject = "Customer Sales Orders"
mail_attachment = 'Customer_Sales_Orders.csv'
mail_attachment_name = "Customer_Sales_Orders" + '.csv'
This will do:
import smtplib
from datetime import date
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

results = pd.read_csv('Data.csv')  # count no. of rows
if len(results) > 0:
    # send email
    from_address = "<<from which email>>"
    to_address = "<<to which email>>"

    # Create message container - the correct MIME type is multipart/alternative.
    msg = MIMEMultipart('alternative')
    msg['Subject'] = "Customer has made purchases in the last 24 hours for {}".format(date.today())
    msg['From'] = from_address
    msg['To'] = to_address

    # Create the message (HTML).
    html = """\
    This is an automated email.
    We are sending an email using Python and Gmail, how fun! We can fill this with html, and gmail supports a decent range of css style attributes too - https://developers.google.com/gmail/design/css#example.
    """

    # Record the MIME type - text/html.
    part1 = MIMEText(html, 'html')
    # Attach parts into message container
    msg.attach(part1)

    # Credentials
    username = '<your email id>'
    password = '<google_app_password>'

    # Sending the email
    # note - this smtp config worked for me; you may have to tweak the port (587) to get yours to work
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.login(username, password)
    server.sendmail(from_address, to_address, msg.as_string())
    server.quit()
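One gap worth noting: the snippet above only sends an HTML body, while the original question also attaches the CSV. A minimal, untested sketch of adding the file (the path comes from the question's df.to_csv call, and the container is switched to 'mixed', the usual type for messages with attachments):

from email.mime.application import MIMEApplication

msg = MIMEMultipart('mixed')  # 'mixed' rather than 'alternative' when attaching files
msg['Subject'] = "Customer has made purchases in the last 24 hours for {}".format(date.today())
msg['From'] = from_address
msg['To'] = to_address
msg.attach(MIMEText(html, 'html'))

with open(r'G:\Customer Folder\Customer_Sales_Orders.csv', 'rb') as f:
    part = MIMEApplication(f.read(), Name='Customer_Sales_Orders.csv')
part['Content-Disposition'] = 'attachment; filename="Customer_Sales_Orders.csv"'
msg.attach(part)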
I have a sufficiently large dataset whose JSON objects I would like to bulk index in AWS OpenSearch.
I cannot see how to achieve this using any of: boto3, awswrangler, opensearch-py, elasticsearch, elasticsearch-py.
Is there a way to do this without using a python request (PUT/POST) directly?
Note that this is not for: ElasticSearch, AWS ElasticSearch.
Many thanks!
I finally found a way to do it using opensearch-py, as follows.
First, establish the client:
# First fetch credentials from environment defaults.
# If you can get this far you probably know how to tailor them
# for your particular situation. Otherwise SO is a safe bet :)
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

credentials = boto3.Session().get_credentials()
region = 'eu-west-2'  # for example

# Now set up the AWS 'Signer'
auth = AWSV4SignerAuth(credentials, region)

# And finally the OpenSearch client
host = f"...{region}.es.amazonaws.com"  # fill in your hostname (minus the https://) here
client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
Phew! Let's create the data now:
# Spot the deliberate mistake(s) :D
document1 = {
    "title": "Moneyball",
    "director": "Bennett Miller",
    "year": "2011"
}
document2 = {
    "title": "Apollo 13",
    "director": "Richie Cunningham",
    "year": "1994"
}
data = [document1, document2]
TIP! Create the index if you need to -
my_index = 'my_index'
try:
    response = client.indices.create(my_index)
    print('\nCreating index:')
    print(response)
except Exception as e:
    # If, for example, my_index already exists, don't do much!
    print(e)
This is where things go a bit nutty. I hadn't realised that every single bulk action needs an, er, action, e.g. "index", "search" etc., so let's define that now:
action = {
    "index": {
        "_index": my_index
    }
}
You can read all about the bulk REST API there.
The next quirk is that the OpenSearch bulk API requires Newline Delimited JSON (see https://www.ndjson.org), which is basically JSON serialized as strings and separated by newlines. Someone wrote on SO that this "bizarre" API looked like one designed by a data scientist - far from taking offence, I think that rocks. (I agree ndjson is weird though.)
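Concretely, for the two example documents above, the body sent to the bulk endpoint would look like this (one action line immediately before each document line, every line a complete JSON object):

{"index": {"_index": "my_index"}}
{"title": "Moneyball", "director": "Bennett Miller", "year": "2011"}
{"index": {"_index": "my_index"}}
{"title": "Apollo 13", "director": "Richie Cunningham", "year": "1994"}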
Hideously, now let's build up the full JSON string, combining the data and actions. A helper fn is at hand!
import json

def payload_constructor(data, action):
    # "All my own work"
    action_string = json.dumps(action) + "\n"
    payload_string = ""
    for datum in data:
        payload_string += action_string
        this_line = json.dumps(datum) + "\n"
        payload_string += this_line
    return payload_string
OK so now we can finally invoke the bulk API. I suppose you could mix in all sorts of actions (out of scope here) - go for it!
response = client.bulk(body=payload_constructor(data, action), index=my_index)
That's probably the most boring punchline ever but there you have it.
You can also just get (geddit) .bulk() to use index= on its own and set the action to:
action={"index": {}}
Hey presto!
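Spelled out, that variant is just (a sketch, reusing payload_constructor from above):

response = client.bulk(body=payload_constructor(data, {"index": {}}), index=my_index)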
Now, choose your poison - the other solution looks crazily shorter and neater.
PS The well-hidden opensearch-py documentation on this is located here.
import awswrangler as wr

# (These snippets appear to come from inside a class, hence the self. references.)
conn = wr.opensearch.connect(
    host=self.hosts,  # URL
    port=443,
    username=self.username,
    password=self.password
)

def insert_index_data(data, index_name='stocks', delete_index_data=False):
    """ Bulk create.
    args: data [{doc1}, {doc2}, ...]
    """
    if delete_index_data:
        index_name = 'symbol'
        self.delete_es_index(index_name)

    resp = wr.opensearch.index_documents(
        self.conn,
        documents=data,
        index=index_name
    )
    print(resp)
    return resp
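For reference, the same flow outside a class might look like this; it is a sketch, and the host, credentials and index name are placeholders:

import awswrangler as wr

conn = wr.opensearch.connect(
    host='search-xxxxxxxxxx.us-east-1.es.amazonaws.com',  # placeholder
    port=443,
    username='username',
    password='password'
)
response = wr.opensearch.index_documents(conn, documents=data, index='stocks')
print(response)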
I have used the code below to bulk insert records from Postgres into OpenSearch (ES 7.2).
import sqlalchemy as sa
from sqlalchemy import text
import pandas as pd
import numpy as np
from opensearchpy import OpenSearch
from opensearchpy.helpers import bulk
import json
engine = sa.create_engine('postgresql+psycopg2://postgres:postgres@127.0.0.1:5432/postgres')
host = 'search-xxxxxxxxxx.us-east-1.es.amazonaws.com'
port = 443
auth = ('username', 'password') # For testing only. Don't store credentials in code.
# Create the client with SSL/TLS enabled, but hostname verification disabled.
client = OpenSearch(
    hosts=[{'host': host, 'port': port}],
    http_compress=True,
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    ssl_assert_hostname=False,
    ssl_show_warn=False
)
with engine.connect() as connection:
    result = connection.execute(text("select * from account_1_study_1.stg_pred where domain='LB'"))
    records = []
    for row in result:
        record = dict(row)
        record.update(record['item_dataset'])
        del record['item_dataset']
        records.append(record)

df = pd.DataFrame(records)
#df['Date'] = df['Date'].astype(str)
df = df.fillna("null")
print(df.keys())
documents = df.to_dict(orient='records')

#bulk(es, documents, index='search-irl-poc-dump', raise_on_error=True)
#response = client.bulk(body=documents, index='sample-index')
bulk(client, documents, index='search-irl-poc-dump', raise_on_error=True, refresh=True)
I have a Python list of every document that has been updated within a set timeframe; the documents are identified by a set value in the list, and there may be one or many in the result. What I am trying to figure out is how to loop through this list of documents and, for each one, loop through another list of email addresses, so that one email is sent per address per document. I tried to "stack" loops on top of each other (code snippet below), but this sends each address one email per document with the full list of documents in each (i.e. if there are two documents in the list, two emails are sent to each address, both with the details of both documents).
import boto3
from botocore.exceptions import ClientError
import requests
from datetime import datetime, timedelta

#first GET request to pull all current end users
GET1 = "https://abdc.com/api"
r1 = requests.get(url=GET1, auth=('username/token', 'APItoken'))
#convert to python dict
data = r1.json()
#create a list of all users' email addresses
emails = [user["email"] for user in data["users"]]

#create a timestamp of the previous day and convert to epoch
ts1 = datetime.today() - timedelta(days=1)
ts2 = ts1.strftime("%s")
#set start time attribute as a parameter
params = {'start_time': ts2}

#second GET request to pull all articles updated in the last 24 hrs
GET2 = "https://efdg.com/api"
r2 = requests.get(url=GET2, params=params, auth=('username/token', 'APItoken'))
#convert to python dict
data2 = r2.json()
#create lists of all the target article titles and html urls
updated_docs = [articles["html_url"] for articles in data2["articles"]]
doc_title = [articles["title"] for articles in data2["articles"]]

for y in updated_docs:
    #loop through all the email addresses and send individual emails to users
    for x in emails:
        # This address must be verified with Amazon SES.
        SENDER = "example@example.com"
        #To list
        RECIPIENT = x
        # Specify a configuration set. If you do not want to use a configuration
        # set, comment the following variable, and the
        # ConfigurationSetName=CONFIGURATION_SET argument below.
        #CONFIGURATION_SET = "ConfigSet"
        # If necessary, replace us-west-2 with the AWS Region you're using for Amazon SES.
        AWS_REGION = "us-east-1"
        # The subject line for the email.
        SUBJECT = "blah blah "
        # The email body for recipients with non-HTML email clients.
        BODY_TEXT = ("Amazon SES Test (Python)\r\n"
                     "This email was sent with Amazon SES using the "
                     "AWS SDK for Python (Boto)."
                     )
        # The HTML body of the email.
        BODY_HTML = """<html>
        <head></head>
        <body>
        <h1>l Documentation Notification</h1>
        <p1>Please click the link below for the most current version of this document.<br>
        <br>
        """ + str(doc_title) + """<br>
        <br>
        """ + str(updated_docs) + """
        </p1>
        </body>
        </html>
        """
        # The character encoding for the email.
        CHARSET = "UTF-8"
        # Create a new SES resource and specify a region.
        client = boto3.client('ses', region_name=AWS_REGION)
        # Try to send the email.
        try:
            # Provide the contents of the email.
            response = client.send_email(
                Destination={
                    'ToAddresses': [
                        RECIPIENT,
                    ],
                },
                Message={
                    'Body': {
                        'Html': {
                            'Charset': CHARSET,
                            'Data': BODY_HTML,
                        },
                        'Text': {
                            'Charset': CHARSET,
                            'Data': BODY_TEXT,
                        },
                    },
                    'Subject': {
                        'Charset': CHARSET,
                        'Data': SUBJECT,
                    },
                },
                Source=SENDER,
                # If you are not using a configuration set, comment or delete the
                # following line
                #ConfigurationSetName=CONFIGURATION_SET,
            )
        # Display an error if something goes wrong.
        except ClientError as e:
            print(e.response['Error']['Message'])
        else:
            print("Email sent! Message ID:")
            print(response['MessageId'])
So, you've completely changed your code, so I'm well confused, but if I can read between the lines properly you meant something like this:
#loop through all the documents in the list
for article in data2["articles"]:
    #loop through all the email addresses and send individual emails to users
    for x in emails:
        # This address must be verified with Amazon SES.
        SENDER = "example@example.com"
        #To list
        RECIPIENT = x
        # The HTML body of the email.
        BODY_HTML = """<html>
        <head></head>
        <body>
        <h1>l Documentation Notification</h1>
        <p1>Please click the link below for the most current version of this document.<br>
        <br>
        """ + str(article["title"]) + """<br>
        <br>
        """ + str(article["html_url"]) + """
        </p1>
        </body>
        </html>
        """
Notice how I iterate through data2["articles"] and build BODY_HTML from article["title"] and article["html_url"], so each email contains only the single document for that iteration.
I'm working on something that uses a regex to search for text in an email fetched via the imaplib module. Right now I can't get it to work, even after using the str() function.
result, data = mail.fetch(x, '(RFC822)')
eemail = email.message_from_bytes(data[0][1])
print(str(eemail))
trying to regex it:
print(re.search("button", eemail))
Regex gives me no matches even after making the email a string object.
This is what I use:
import imaplib
import email
import re
mail = imaplib.IMAP4_SSL(SMTP_SERVER, SMTP_PORT)
mail.login(FROM_EMAIL,FROM_PWD)
mail.select('inbox')
status, response = mail.search(None, '(UNSEEN)')
unread_msg_nums = response[0].split()
for e_id in unread_msg_nums:
    _, response = mail.fetch(e_id, '(UID BODY[TEXT])')
    # fetch returns bytes, so use message_from_bytes rather than message_from_string
    b = email.message_from_bytes(response[0][1])
    if b.is_multipart():
        # get_payload() with no arguments returns the list of sub-parts
        for payload in b.get_payload():
            print(re.search("button", payload.get_payload(decode=True).decode(errors="ignore")))
    else:
        print(re.search("button", b.get_payload(decode=True).decode(errors="ignore")))
Below is my code, and I am hoping someone can help me clean it up and make it more efficient. Basically, the code should iterate through all the volumes in my AWS account, list all untagged volumes, and then send out an email. However, it times out when run as a Lambda function in AWS, and if I run it locally it takes over 30 minutes to complete (though it does complete). I'm sure it's iterating through things it doesn't need to.
Also, if I print the ec2_instances list, I can see duplicate values, so I want to keep only unique values so that the script isn't repeated for each EC2 instance.
import logging
import boto3
from smtplib import SMTP, SMTPException
from email.mime.text import MIMEText

logger = logging.getLogger()
logger.setLevel(logging.INFO)

session = boto3.Session(profile_name="prod")
client = session.client('ec2')

untagged_volumes = []
detached_volumes = []
ec2_instances = []

response = client.describe_volumes()
for volume in response['Volumes']:
    if 'Tags' in str(volume):
        continue
    else:
        if 'available' in str(volume):
            detached_volumes.append(volume['VolumeId'])
        else:
            untagged_volumes.append(volume['VolumeId'])
            untagged_volumes.append(volume['Attachments'][0]['InstanceId'])
            ec2_instances.append(volume['Attachments'][0]['InstanceId'])

unique_instances = list(set(ec2_instances))

# Create the msg body.
msg_body_list = []
for instance in unique_instances:
    desc_instance = client.describe_instances()
    # append to the msg_body_list the lines that we would like to show in the email
    msg_body_list.append("VolumeID: {}".format(desc_instance['Reservations'][0]['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['VolumeId']))
    msg_body_list.append("Attached Instance: {}".format(desc_instance['Reservations'][0]['Instances'][0]['InstanceId']))
    # if there are tags, append them as single lines, one per tag
    if 'Tags' in desc_instance['Reservations'][0]['Instances'][0]:
        msg_body_list.append("Tags:")
        for tag in desc_instance['Reservations'][0]['Instances'][0]['Tags']:
            msg_body_list.append("  Key: {} | Value: {}".format(tag['Key'], tag['Value']))
    # in case we don't have tags, just append no tags.
    else:
        msg_body_list.append("Tags: no tags")
    msg_body_list.append("--------------------")

# send email
mail_from = "xxx@xxx.com"
mail_to = 'xxx@xxx.com'
msg = MIMEText("\n".join(msg_body_list))
msg["Subject"] = "EBS Tagged Instance Report for"
msg["From"] = mail_from
msg["To"] = mail_to
try:
    server = SMTP('xxx.xxx.xxx.xxx', 'xx')
    server.sendmail(mail_from, mail_to.split(','), msg.as_string())
    server.quit()
    print('Email sent')
except SMTPException:
    print('ERROR! Unable to send mail')
Lambda functions have a time limit of 15 minutes. That is the reason for the timeout - if you need to run scripts for longer, look up AWS Fargate.
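Timeout aside, most of the runtime in the posted code likely comes from calling describe_instances() with no arguments inside the loop, which fetches every instance in the account on each iteration (and then only ever reads Reservations[0]). A sketch of fetching just the collected instances in one call; this is my guess at the intended behaviour, not a tested fix:

# One API call for all instances seen on untagged volumes.
# boto3's describe_instances accepts an InstanceIds filter.
desc = client.describe_instances(InstanceIds=unique_instances)
msg_body_list = []
for reservation in desc['Reservations']:
    for inst in reservation['Instances']:
        msg_body_list.append("Attached Instance: {}".format(inst['InstanceId']))
        if inst.get('Tags'):
            msg_body_list.append("Tags:")
            for tag in inst['Tags']:
                msg_body_list.append("  Key: {} | Value: {}".format(tag['Key'], tag['Value']))
        else:
            msg_body_list.append("Tags: no tags")
        msg_body_list.append("--------------------")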